Commit Graph

202 Commits

Linus Torvalds
1880df2cf4 block-6.16-20250704

Merge tag 'block-6.16-20250704' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - NVMe fixes via Christoph:
     - fix incorrect cdw15 value in passthru error logging (Alok Tiwari)
     - fix memory leak of bio integrity in nvmet (Dmitry Bogdanov)
     - refresh visible attrs after being checked (Eugen Hristev)
     - fix suspicious RCU usage warning in the multipath code (Geliang Tang)
     - correctly account for namespace head reference counter (Nilay Shroff)

 - Fix for a regression introduced in ublk in this cycle, where it would
   attempt to queue a canceled request.

 - brd RCU sleeping fix, also introduced in this cycle. Bare bones fix,
   should be improved upon for the next release.

* tag 'block-6.16-20250704' of git://git.kernel.dk/linux:
  brd: fix sleeping function called from invalid context in brd_insert_page()
  ublk: don't queue request if the associated uring_cmd is canceled
  nvme-multipath: fix suspicious RCU usage warning
  nvme-pci: refresh visible attrs after being checked
  nvmet: fix memory leak of bio integrity
  nvme: correctly account for namespace head reference counter
  nvme: Fix incorrect cdw15 value in passthru error logging
2025-07-04 09:33:59 -07:00
Geliang Tang
d681107420 nvme-multipath: fix suspicious RCU usage warning
When I run the NVME over TCP test in virtme-ng, I get the following
"suspicious RCU usage" warning in nvme_mpath_add_sysfs_link():

'''
[    5.024557][   T44] nvmet: Created nvm controller 1 for subsystem nqn.2025-06.org.nvmexpress.mptcp for NQN nqn.2014-08.org.nvmexpress:uuid:f7f6b5e0-ff97-4894-98ac-c85309e0bc77.
[    5.027401][  T183] nvme nvme0: creating 2 I/O queues.
[    5.029017][  T183] nvme nvme0: mapped 2/0/0 default/read/poll queues.
[    5.032587][  T183] nvme nvme0: new ctrl: NQN "nqn.2025-06.org.nvmexpress.mptcp", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:f7f6b5e0-ff97-4894-98ac-c85309e0bc77
[    5.042214][   T25]
[    5.042440][   T25] =============================
[    5.042579][   T25] WARNING: suspicious RCU usage
[    5.042705][   T25] 6.16.0-rc3+ #23 Not tainted
[    5.042812][   T25] -----------------------------
[    5.042934][   T25] drivers/nvme/host/multipath.c:1203 RCU-list traversed in non-reader section!!
[    5.043111][   T25]
[    5.043111][   T25] other info that might help us debug this:
[    5.043111][   T25]
[    5.043341][   T25]
[    5.043341][   T25] rcu_scheduler_active = 2, debug_locks = 1
[    5.043502][   T25] 3 locks held by kworker/u9:0/25:
[    5.043615][   T25]  #0: ffff888008730948 ((wq_completion)async){+.+.}-{0:0}, at: process_one_work+0x7ed/0x1350
[    5.043830][   T25]  #1: ffffc900001afd40 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0xcf3/0x1350
[    5.044084][   T25]  #2: ffff888013ee0020 (&head->srcu){.+.+}-{0:0}, at: nvme_mpath_add_sysfs_link.part.0+0xb4/0x3a0
[    5.044300][   T25]
[    5.044300][   T25] stack backtrace:
[    5.044439][   T25] CPU: 0 UID: 0 PID: 25 Comm: kworker/u9:0 Not tainted 6.16.0-rc3+ #23 PREEMPT(full)
[    5.044441][   T25] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    5.044442][   T25] Workqueue: async async_run_entry_fn
[    5.044445][   T25] Call Trace:
[    5.044446][   T25]  <TASK>
[    5.044449][   T25]  dump_stack_lvl+0x6f/0xb0
[    5.044453][   T25]  lockdep_rcu_suspicious.cold+0x4f/0xb1
[    5.044457][   T25]  nvme_mpath_add_sysfs_link.part.0+0x2fb/0x3a0
[    5.044459][   T25]  ? queue_work_on+0x90/0xf0
[    5.044461][   T25]  ? lockdep_hardirqs_on+0x78/0x110
[    5.044466][   T25]  nvme_mpath_set_live+0x1e9/0x4f0
[    5.044470][   T25]  nvme_mpath_add_disk+0x240/0x2f0
[    5.044472][   T25]  ? __pfx_nvme_mpath_add_disk+0x10/0x10
[    5.044475][   T25]  ? add_disk_fwnode+0x361/0x580
[    5.044480][   T25]  nvme_alloc_ns+0x81c/0x17c0
[    5.044483][   T25]  ? kasan_quarantine_put+0x104/0x240
[    5.044487][   T25]  ? __pfx_nvme_alloc_ns+0x10/0x10
[    5.044495][   T25]  ? __pfx_nvme_find_get_ns+0x10/0x10
[    5.044496][   T25]  ? rcu_read_lock_any_held+0x45/0xa0
[    5.044498][   T25]  ? validate_chain+0x232/0x4f0
[    5.044503][   T25]  nvme_scan_ns+0x4c8/0x810
[    5.044506][   T25]  ? __pfx_nvme_scan_ns+0x10/0x10
[    5.044508][   T25]  ? find_held_lock+0x2b/0x80
[    5.044512][   T25]  ? ktime_get+0x16d/0x220
[    5.044517][   T25]  ? kvm_clock_get_cycles+0x18/0x30
[    5.044520][   T25]  ? __pfx_nvme_scan_ns_async+0x10/0x10
[    5.044522][   T25]  async_run_entry_fn+0x97/0x560
[    5.044523][   T25]  ? rcu_is_watching+0x12/0xc0
[    5.044526][   T25]  process_one_work+0xd3c/0x1350
[    5.044532][   T25]  ? __pfx_process_one_work+0x10/0x10
[    5.044536][   T25]  ? assign_work+0x16c/0x240
[    5.044539][   T25]  worker_thread+0x4da/0xd50
[    5.044545][   T25]  ? __pfx_worker_thread+0x10/0x10
[    5.044546][   T25]  kthread+0x356/0x5c0
[    5.044548][   T25]  ? __pfx_kthread+0x10/0x10
[    5.044549][   T25]  ? ret_from_fork+0x1b/0x2e0
[    5.044552][   T25]  ? __lock_release.isra.0+0x5d/0x180
[    5.044553][   T25]  ? ret_from_fork+0x1b/0x2e0
[    5.044555][   T25]  ? rcu_is_watching+0x12/0xc0
[    5.044557][   T25]  ? __pfx_kthread+0x10/0x10
[    5.044559][   T25]  ret_from_fork+0x218/0x2e0
[    5.044561][   T25]  ? __pfx_kthread+0x10/0x10
[    5.044562][   T25]  ret_from_fork_asm+0x1a/0x30
[    5.044570][   T25]  </TASK>
'''

This patch uses the SRCU-aware helper list_for_each_entry_srcu()
instead of list_for_each_entry_rcu() to fix it.
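
A minimal sketch of the change, assuming field names along the lines
of the nvme multipath code (illustrative, not the exact diff):

  struct nvme_ns *ns;
  int idx;

  idx = srcu_read_lock(&head->srcu);

  /* Before: an RCU list primitive under SRCU protection trips lockdep. */
  list_for_each_entry_rcu(ns, &head->list, siblings) {
          /* ... */
  }

  /* After: the SRCU-aware variant takes a lockdep condition. */
  list_for_each_entry_srcu(ns, &head->list, siblings,
                           srcu_read_lock_held(&head->srcu)) {
          /* ... */
  }

  srcu_read_unlock(&head->srcu, idx);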

Fixes: 4dbd2b2ebe ("nvme-multipath: Add visibility for round-robin io-policy")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-07-01 08:17:02 +02:00
Nilay Shroff
ba806c9003 nvme: correctly account for namespace head reference counter
The blktests nvme/058 test manifests an issue where the NVMe subsystem
kobject entry remains stale in sysfs, causing a failure during
subsequent NVMe module reloads[1]. Specifically, when attempting to
register a new NVMe subsystem, the driver encounters a kobject name
collision because a stale kobject still exists. Note that nvme/058
itself passes without reporting any failure; the stale nvme subsystem
kobject entry in sysfs only causes the observed symptom during
subsequent NVMe module reloads[1].

This issue stems from an imbalance in the get/put usage of the namespace
head (nshead) reference counter. The nshead holds a reference to the
associated NVMe subsystem. If the nshead reference is not properly
released, it prevents the cleanup of the subsystem's kobject, leaving
a stale nvme subsystem entry behind in sysfs.

During the failure case, the last namespace path referencing an nshead
is removed, but the nshead reference is not released. This occurs
because the release logic currently only puts the nshead reference
when its state is LIVE. However, in configurations where ANA
(Asymmetric Namespace Access) is enabled, a namespace may be associated
with an ANA state that is neither optimized nor non-optimized. In this
case, the nshead may never transition to LIVE, and the corresponding
nshead reference is then never dropped. In fact, nvme/058 associates
some of the nvme namespaces with an inaccessible ANA state, so the
nshead is created but its state never transitions to LIVE. The current
logic then causes the nshead reference to be leaked for all non-LIVE
states.

In another scenario, during namespace allocation, the driver first
allocates an nshead and then issues an Identify Namespace command. If
this command fails (which can happen in tests like nvme/058 that
rapidly enable and disable namespaces), we must release the reference
to the newly allocated nshead. However, this reference release is
currently missing in the failure path, causing an nshead reference
leak.

To fix this, we now unconditionally release the nshead reference when
the last nvme path referencing the nshead is removed, regardless of
the head's state. We also properly release the nshead reference in the
Identify Namespace failure case. This ensures proper cleanup of the
nshead and, consequently, of the NVMe subsystem and its associated
kobject.

This change prevents stale kobject entries from lingering in sysfs and
eliminates the module reload failures observed just after running
nvme/058.

[1] https://lore.kernel.org/all/CAHj4cs8fOBS-eSjsd5LUBzy7faKXJtgLkCN+mDy_-ezCLLLq+Q@mail.gmail.com/
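
A minimal sketch of the two release points described above; the helper
and field names are assumptions based on the nvme core, not the exact
diff:

  /* 1) Last path gone: drop the nshead reference unconditionally,
   *    not only when the head state is LIVE. */
  if (list_empty(&head->list))
          nvme_put_ns_head(head); /* releases the ref pinning the subsystem kobject */

  /* 2) Identify Namespace failed after the nshead was allocated:
   *    release the newly taken reference in the error path. */
  ret = nvme_identify_ns(ctrl, info->nsid, &id);
  if (ret) {
          nvme_put_ns_head(head); /* previously missing */
          goto out;
  }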

Reported-by: yi.zhang@redhat.com
Closes: https://lore.kernel.org/all/CAHj4cs8fOBS-eSjsd5LUBzy7faKXJtgLkCN+mDy_-ezCLLLq+Q@mail.gmail.com/
Fixes: 62188639ec ("nvme-multipath: introduce delayed removal of the multipath head node")
Tested-by: yi.zhang@redhat.com
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-06-30 08:31:49 +02:00
Linus Torvalds
e540341508 block-6.16-20250626

Merge tag 'block-6.16-20250626' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - Fixes for ublk:
      - fix C++ narrowing warnings in the uapi header
      - update/improve UBLK_F_SUPPORT_ZERO_COPY comment in uapi header
      - fix for the ublk ->queue_rqs() implementation, limiting a batch
        to just the specific task AND ring
      - ublk_get_data() error handling fix
      - sanity check more arguments in ublk_ctrl_add_dev()
      - selftest addition

 - NVMe pull request via Christoph:
      - reset delayed remove_work after reconnect
      - fix atomic write size validation

 - Fix for a warning introduced in bdev_count_inflight_rw() in this
   merge window

* tag 'block-6.16-20250626' of git://git.kernel.dk/linux:
  block: fix false warning in bdev_count_inflight_rw()
  ublk: sanity check add_dev input for underflow
  nvme: fix atomic write size validation
  nvme: refactor the atomic write unit detection
  nvme: reset delayed remove_work after reconnect
  ublk: setup ublk_io correctly in case of ublk_get_data() failure
  ublk: update UBLK_F_SUPPORT_ZERO_COPY comment in UAPI header
  ublk: fix narrowing warnings in UAPI header
selftests: ublk: don't take same backing file for more than one ublk device
  ublk: build batch from IOs in same io_ring_ctx and io task
2025-06-27 09:02:33 -07:00
Keith Busch
dd2c185489 nvme: reset delayed remove_work after reconnect
The remove_work will proceed with permanently disconnecting on the
initial final path failure if the head shows no paths after the delay.
If a new path connects while the remove_work is pending, and if that new
path happens to disconnect before that remove_work executes, the delayed
removal should reset based on the most recent path disconnect time, but
queue_delayed_work() won't do anything if the work is already pending.
Attempt to cancel the delayed work when a new path connects, and use
mod_delayed_work() in case the remove_work remains pending anyway.
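
A minimal sketch of the rescheduling pattern; nvme_wq and the field
names are assumptions based on the commit text:

  /* On new path connect: try to stop a pending delayed removal. */
  cancel_delayed_work(&head->remove_work);

  /* On last path disconnect: unlike queue_delayed_work(),
   * mod_delayed_work() re-arms the timer even when the work is
   * already pending, so the delay restarts from the most recent
   * disconnect. */
  mod_delayed_work(nvme_wq, &head->remove_work,
                   head->delayed_removal_secs * HZ);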

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-06-26 13:04:35 +02:00
Ingo Molnar
41cb08555c treewide, timers: Rename from_timer() to timer_container_of()
Move this API to the canonical timer_*() namespace.

[ tglx: Redone against pre rc1 ]

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2025-06-08 09:07:37 +02:00
Yi Zhang
44e479d720 nvme: spelling fixes
Fix various spelling errors in comments.

Signed-off-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-06-04 10:23:28 +02:00
Linus Torvalds
6f59de9bc0 for-6.16/block-20250523

Merge tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - ublk updates:
      - Add support for updating the size of a ublk instance
      - Zero-copy improvements
      - Auto-registering of buffers for zero-copy
      - Series simplifying and improving GET_DATA and request lookup
      - Series adding quiesce support
      - Lots of selftests additions
      - Various cleanups

 - NVMe updates via Christoph:
      - add per-node DMA pools and use them for PRP/SGL allocations
        (Caleb Sander Mateos, Keith Busch)
      - nvme-fcloop refcounting fixes (Daniel Wagner)
      - support delayed removal of the multipath node and optionally
        support the multipath node for private namespaces (Nilay Shroff)
      - support shared CQs in the PCI endpoint target code (Wilfred
        Mallawa)
      - support admin-queue only authentication (Hannes Reinecke)
      - use the crc32c library instead of the crypto API (Eric Biggers)
      - misc cleanups (Christoph Hellwig, Marcelo Moreira, Hannes
        Reinecke, Leon Romanovsky, Gustavo A. R. Silva)

 - MD updates via Yu:
      - Fix that normal IO can be starved by sync IO, found by mkfs on
        newly created large raid5, with some clean up patches for bdev
        inflight counters

 - Clean up brd, getting rid of atomic kmaps and bvec poking

 - Add loop driver specifically for zoned IO testing

 - Eliminate blk-rq-qos calls with a static key, if not enabled

 - Improve hctx locking for when a plug has IO for multiple queues
   pending

 - Remove block layer bouncing support, which in turn means we can
   remove the per-node bounce stat as well

 - Improve blk-throttle support

 - Improve delay support for blk-throttle

 - Improve brd discard support

 - Unify IO scheduler switching. This should also fix a bunch of lockdep
   warnings we've been seeing, after enabling lockdep support for queue
   freezing/unfreezeing

 - Add support for block write streams via FDP (flexible data placement)
   on NVMe

 - Add a bunch of block helpers, facilitating the removal of a bunch of
   duplicated boilerplate code

 - Remove obsolete BLK_MQ pci and virtio Kconfig options

 - Add atomic/untorn write support to blktrace

 - Various little cleanups and fixes

* tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux: (186 commits)
  selftests: ublk: add test for UBLK_F_QUIESCE
  ublk: add feature UBLK_F_QUIESCE
  selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE
  traceevent/block: Add REQ_ATOMIC flag to block trace events
ublk: run auto buf unregistering in same io_ring_ctx with registering
  io_uring: add helper io_uring_cmd_ctx_handle()
  ublk: remove io argument from ublk_auto_buf_reg_fallback()
  ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch()
  selftests: ublk: add test for covering UBLK_AUTO_BUF_REG_FALLBACK
  selftests: ublk: support UBLK_F_AUTO_BUF_REG
  ublk: support UBLK_AUTO_BUF_REG_FALLBACK
  ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG
  ublk: prepare for supporting to register request buffer automatically
  ublk: convert to refcount_t
  selftests: ublk: make IO & device removal test more stressful
  nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk
  nvme: introduce multipath_always_on module param
  nvme-multipath: introduce delayed removal of the multipath head node
  nvme-pci: derive and better document max segments limits
  nvme-pci: use struct_size for allocation struct nvme_dev
  ...
2025-05-26 11:39:36 -07:00
Nilay Shroff
9e221d8cf9 nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk
In the NVMe context, the term "shutdown" has a specific technical
meaning. To avoid confusion, this commit renames the
nvme_mpath_shutdown_disk function to nvme_mpath_remove_disk to better
reflect its purpose (i.e. removing the disk from the system). However,
nvme_mpath_remove_disk was already in use, and its functionality is
related to releasing or putting the head node disk. To resolve this
naming conflict and improve clarity, the existing
nvme_mpath_remove_disk function is also renamed to
nvme_mpath_put_disk.

This renaming improves code readability and better aligns function
names with their actual roles.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20 05:34:52 +02:00
Nilay Shroff
737af5f001 nvme: introduce multipath_always_on module param
Currently, a multipath head disk node is not created for single-
ported NVMe adapters or private namespaces with non-unique NSID.
However, creating a head node in these cases can help transparently
handle transient PCIe link failures. Without a head node, features
like delayed removal cannot be leveraged, making it difficult to
tolerate such link failures. To address this, this commit introduces
the nvme_core module parameter multipath_always_on.

When multipath_always_on is set to true, it forces the creation of a
multipath head node regardless of the NVMe disk or namespace type.
This option therefore allows the delayed head node removal
functionality to be used even for single-ported NVMe disks and private
namespaces with a unique NSID, and thus helps transparently handle
transient PCIe link failures.

By default, multipath_always_on is set to false, preserving the
existing behavior. Setting it to true enables improved fault tolerance
in PCIe setups. Note that enabling this option also implicitly enables
nvme_core.multipath.
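
A minimal sketch of such a bool module parameter; the exact declaration
and description string in nvme-core may differ:

  static bool multipath_always_on;
  module_param(multipath_always_on, bool, 0444);
  MODULE_PARM_DESC(multipath_always_on,
          "create multipath head node for every namespace; implicitly enables nvme_core.multipath");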

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20 05:34:46 +02:00
Nilay Shroff
62188639ec nvme-multipath: introduce delayed removal of the multipath head node
Currently, the multipath head node of an NVMe disk is removed
immediately as soon as all paths of the disk are removed. However,
this can cause issues in scenarios where:

- Disk hot-removal followed by re-addition.
- Transient PCIe link failures that trigger re-enumeration,
  temporarily removing and then restoring the disk.

In these cases, removing the head node prematurely may lead to a head
disk node name change upon re-addition, requiring applications to
reopen their handles if they were performing I/O during the failure.

To address this, introduce a delayed removal mechanism for the head
disk node. During a transient failure, instead of removing the head
disk node immediately, the system waits for a configurable timeout,
allowing the disk to recover.

During a transient disk failure, any I/O an application sends is
queued instead of being failed immediately. If the disk comes back
online within the timeout, the queued I/Os are resubmitted to the disk,
ensuring seamless operation. If the disk cannot recover from the
failure, the queued I/Os are failed on completion and the application
receives the error.

This way, if the disk comes back online within the configured period,
the head node remains unchanged, ensuring uninterrupted workloads
without requiring applications to reopen device handles.

A new sysfs attribute named "delayed_removal_secs" is added under the
head disk blkdev for users who wish to configure the time for the
delayed removal of the head disk node. The default value of this
attribute is zero seconds, ensuring no behavior change unless
explicitly configured.
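
A minimal sketch of the read side of such an attribute, assuming it
hangs off the head gendisk (field and helper names are illustrative):

  static ssize_t delayed_removal_secs_show(struct device *dev,
                  struct device_attribute *attr, char *buf)
  {
          struct gendisk *disk = dev_to_disk(dev);
          struct nvme_ns_head *head = disk->private_data;

          return sysfs_emit(buf, "%u\n", head->delayed_removal_secs);
  }
  /* a matching _store() and the DEVICE_ATTR wiring omitted for brevity */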

Link: https://lore.kernel.org/linux-nvme/Y9oGTKCFlOscbPc2@infradead.org/
Link: https://lore.kernel.org/linux-nvme/Y+1aKcQgbskA2tra@kbusch-mbp.dhcp.thefacebook.com/
Suggested-by: Keith Busch <kbusch@kernel.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
[nilay: reworked based on the original idea/POC from Christoph and Keith]
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20 05:34:27 +02:00
Linus Torvalds
6462c247b2 block-6.15-20250515

Merge tag 'block-6.15-20250515' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - NVMe pull request via Christoph:
      - fixes for atomic writes (Alan Adamson)
      - fixes for polled CQs in nvmet-epf (Damien Le Moal)
      - fix for polled CQs in nvme-pci (Keith Busch)
      - fix compile on odd configs that need to be forced to inline
        (Kees Cook)
      - one more quirk (Ilya Guterman)

 - Fix for missing allocation of an integrity buffer for some cases

 - Fix for a regression with ublk command cancelation

* tag 'block-6.15-20250515' of git://git.kernel.dk/linux:
  ublk: fix dead loop when canceling io command
  nvme-pci: add NVME_QUIRK_NO_DEEPEST_PS quirk for SOLIDIGM P44 Pro
  nvme: all namespaces in a subsystem must adhere to a common atomic write size
  nvme: multipath: enable BLK_FEAT_ATOMIC_WRITES for multipathing
  nvmet: pci-epf: remove NVMET_PCI_EPF_Q_IS_SQ
  nvmet: pci-epf: improve debug message
  nvmet: pci-epf: cleanup nvmet_pci_epf_raise_irq()
  nvmet: pci-epf: do not fall back to using INTX if not supported
  nvmet: pci-epf: clear completion queue IRQ flag on delete
  nvme-pci: acquire cq_poll_lock in nvme_poll_irqdisable
  nvme-pci: make nvme_pci_npages_prp() __always_inline
  block: always allocate integrity buffer when required
2025-05-16 10:21:25 -07:00
Alan Adamson
a21675ee3b nvme: multipath: enable BLK_FEAT_ATOMIC_WRITES for multipathing
A change to QEMU resulted in all nvme controllers (in single- and
multi-controller subsystems) having their CMIC.MCTRS bit set, which
indicates that the subsystem supports multiple controllers and that a
namespace may be shared between those multiple controllers in a
multipath configuration.

When a namespace of a CMIC.MCTRS enabled subsystem is allocated, a
multipath node is created.  The queue limits for this node are inherited
from the namespace being allocated. When inheriting queue limits, the
features being inherited need to be specified. The atomic write feature
(BLK_FEAT_ATOMIC_WRITES) was not specified, so the atomic queue limits
were not inherited by the multipath disk node, which resulted in the sysfs
atomic write attributes being zeroed. The fix is to include
BLK_FEAT_ATOMIC_WRITES in the list of features to be inherited.
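
A minimal sketch of the fix in the multipath disk allocation path; the
surrounding feature flags are illustrative:

  struct queue_limits lim;

  blk_set_stacking_limits(&lim);
  lim.features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT |
                  BLK_FEAT_ATOMIC_WRITES; /* last flag previously missing */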

Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-14 07:33:10 +02:00
Linus Torvalds
f7c2ca2584 block-6.15-20250417

Merge tag 'block-6.15-20250417' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - MD pull via Yu:
      - fix raid10 missing discard IO accounting (Yu Kuai)
      - fix bitmap stats for bitmap file (Zheng Qixing)
      - fix oops while reading all member disks failed during
        check/repair (Meir Elisha)

 - NVMe pull via Christoph:
      - fix scan failure for non-ANA multipath controllers (Hannes
        Reinecke)
      - fix multipath sysfs links creation for some cases (Hannes
        Reinecke)
      - PCIe endpoint fixes (Damien Le Moal)
      - use NULL instead of 0 in the auth code (Damien Le Moal)

 - Various ublk fixes:
      - Slew of selftest additions
      - Improvements and fixes for IO cancelation
      - Tweak to Kconfig verbiage

 - Fix for page dirtying for blk integrity mapped pages

 - loop fixes:
      - buffered IO fix
      - uevent fixes
      - request priority inheritance fix

 - Various little fixes

* tag 'block-6.15-20250417' of git://git.kernel.dk/linux: (38 commits)
  selftests: ublk: add generic_06 for covering fault inject
  ublk: simplify aborting ublk request
  ublk: remove __ublk_quiesce_dev()
  ublk: improve detection and handling of ublk server exit
  ublk: move device reset into ublk_ch_release()
  ublk: rely on ->canceling for dealing with ublk_nosrv_dev_should_queue_io
  ublk: add ublk_force_abort_dev()
  ublk: properly serialize all FETCH_REQs
  selftests: ublk: move creating UBLK_TMP into _prep_test()
  selftests: ublk: add test_stress_05.sh
  selftests: ublk: support user recovery
  selftests: ublk: support target specific command line
  selftests: ublk: increase max nr_queues and queue depth
  selftests: ublk: set queue pthread's cpu affinity
  selftests: ublk: setup ring with IORING_SETUP_SINGLE_ISSUER/IORING_SETUP_DEFER_TASKRUN
  selftests: ublk: add two stress tests for zero copy feature
  selftests: ublk: run stress tests in parallel
  selftests: ublk: make sure _add_ublk_dev can return in sub-shell
  selftests: ublk: cleanup backfile automatically
  selftests: ublk: add io_uring uapi header
  ...
2025-04-18 09:21:14 -07:00
Hannes Reinecke
08937bcd4c nvme-multipath: sysfs links may not be created for devices
When rapidly rescanning for new namespaces, nvme_mpath_add_sysfs_link()
may be called for a block device that has not been added to sysfs yet.
But NVME_NS_SYSFS_ATTR_LINK has already been set, so when checking this
device a second time we will fail to create the link.

Fix this by exchanging the order of the block device check and the
NVME_NS_SYSFS_ATTR_LINK bit check.

Fixes: 4dbd2b2ebe ("nvme-multipath: Add visibility for round-robin io-policy")
Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-16 07:37:36 +02:00
Linus Torvalds
ff88562529 block-6.15-20250411

Merge tag 'block-6.15-20250411' of git://git.kernel.dk/linux

Pull more block fixes from Jens Axboe:
 "Apparently my internal clock was off, or perhaps it was just wishful
  thinking, but I sent out block fixes yesterday as my brain assumed it
  was Friday. Subsequently, that missed the NVMe fixes that should go
  into this weeks release as well. Hence, here's a followup with those,
  and another simple fix.

   - NVMe pull request via Christoph:
       - nvmet fc/fcloop refcounting fixes (Daniel Wagner)
       - fix missed namespace/ANA scans (Hannes Reinecke)
       - fix a use after free in the new TCP netns support (Kuniyuki
         Iwashima)
       - fix a NULL instead of false return in multipath (Uday Shankar)

   - Use strscpy() for null_blk disk name copy"

* tag 'block-6.15-20250411' of git://git.kernel.dk/linux:
  null_blk: Use strscpy() instead of strscpy_pad() in null_add_dev()
  nvmet-fc: put ref when assoc->del_work is already scheduled
  nvmet-fc: take tgtport reference only once
  nvmet-fc: update tgtport ref per assoc
  nvmet-fc: inline nvmet_fc_free_hostport
  nvmet-fc: inline nvmet_fc_delete_assoc
  nvmet-fcloop: add ref counting to lport
  nvmet-fcloop: replace kref with refcount
  nvmet-fcloop: swap list_add_tail arguments
  nvme-tcp: fix use-after-free of netns by kernel TCP socket.
  nvme: multipath: fix return value of nvme_available_path
  nvme: re-read ANA log page after ns scan completes
  nvme: requeue namespace scan on missed AENs
2025-04-11 08:29:35 -07:00
Uday Shankar
e3105f54a5 nvme: multipath: fix return value of nvme_available_path
The function returns bool so we should return false, not NULL. No
functional changes are expected.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-08 07:07:18 +02:00
Thomas Gleixner
8fa7292fee treewide: Switch/rename to timer_delete[_sync]()
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.

Conversion was done with coccinelle plus manual fixups where necessary.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-05 10:30:12 +02:00
Nilay Shroff
7cbafa3ff0 nvme-multipath: Add visibility for queue-depth io-policy
This patch adds nvme native multipath visibility for the queue-depth
io-policy. It adds a new attribute file named "queue_depth" under the
namespace device path node, which prints the number of active/in-flight
I/O requests currently queued for the given path.

For instance, if we have a shared namespace accessible from two different
controllers/paths then accessing head block node of the shared namespace
would show the following output:

$ ls -l /sys/block/nvme1n1/multipath/
nvme1c1n1 -> ../../../../../pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
nvme1c3n1 -> ../../../../../pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1

In the above example, nvme1n1 is the head gendisk node created for a
shared namespace, and the namespace is accessible from the nvme1c1n1
and nvme1c3n1 paths. For the queue-depth io-policy we can then refer to the "queue_depth"
attribute file created under each namespace path:

$ cat /sys/block/nvme1n1/multipath/nvme1c1n1/queue_depth
518

$ cat /sys/block/nvme1n1/multipath/nvme1c3n1/queue_depth
504

From the above output, we can infer that an I/O workload targeted at
nvme1n1 uses the two paths nvme1c1n1 and nvme1c3n1, and that the
current queue depth of each path is 518 and 504 respectively. Reading
the "queue_depth" file when the configured io-policy is anything but
queue-depth shows no output.
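
A minimal sketch of a per-path queue_depth show routine; the field
names are assumptions based on the queue-depth iopolicy code:

  static ssize_t queue_depth_show(struct device *dev,
                  struct device_attribute *attr, char *buf)
  {
          struct nvme_ns *ns = nvme_get_ns_from_dev(dev);

          /* No output unless the queue-depth policy is active. */
          if (READ_ONCE(ns->head->subsys->iopolicy) != NVME_IOPOLICY_QD)
                  return 0;

          return sysfs_emit(buf, "%d\n", atomic_read(&ns->ctrl->nr_active));
  }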

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-03-20 16:53:55 -07:00
Nilay Shroff
6546cc4a56 nvme-multipath: Add visibility for numa io-policy
This patch adds nvme native multipath visibility for the numa
io-policy. It adds a new attribute file named "numa_nodes" under the
namespace gendisk device path node, which prints the list of numa nodes
preferred by the given namespace path. The value is a comma-delimited
list of nodes or an A-B range of nodes.

For instance, if we have a shared namespace accessible from two different
controllers/paths then accessing head node of the shared namespace would
show the following output:

$ ls -l /sys/block/nvme1n1/multipath/
nvme1c1n1 -> ../../../../../pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
nvme1c3n1 -> ../../../../../pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1

In the above example, nvme1n1 is the head gendisk node created for a
shared namespace, and this namespace is accessible from the nvme1c1n1
and nvme1c3n1 paths. For the numa io-policy we can then refer to the
"numa_nodes" attribute file created under each namespace path:

$ cat /sys/block/nvme1n1/multipath/nvme1c1n1/numa_nodes
0-1

$ cat /sys/block/nvme1n1/multipath/nvme1c3n1/numa_nodes
2-3

From the above output, we infer that an I/O workload targeted at
nvme1n1 and running on numa nodes 0 and 1 would prefer using path
nvme1c1n1. Similarly, an I/O workload running on numa nodes 2 and 3
would prefer using path nvme1c3n1. Reading the "numa_nodes" file when
the configured io-policy is anything but numa shows no output.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-03-20 16:53:54 -07:00
Nilay Shroff
4dbd2b2ebe nvme-multipath: Add visibility for round-robin io-policy
This patch adds nvme native multipath visibility for the round-robin
io-policy. It creates a "multipath" sysfs directory under the head
gendisk device node directory, and from that directory adds a link to
each namespace path device the head node refers to.

For instance, if we have a shared namespace accessible from two different
controllers/paths then we create a soft link to each path device from head
disk node as shown below:

$ ls -l /sys/block/nvme1n1/multipath/
nvme1c1n1 -> ../../../../../pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
nvme1c3n1 -> ../../../../../pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1

In the above example, nvme1n1 is the head gendisk node created for a
shared namespace, and the namespace is accessible from the nvme1c1n1
and nvme1c3n1 paths.

For the round-robin I/O policy, we can easily infer from the above
output that an I/O workload targeted at nvme1n1 would toggle across the
paths nvme1c1n1 and nvme1c3n1.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-03-20 16:53:54 -07:00
Christoph Hellwig
1e1a9cecfa block: force noio scope in blk_mq_freeze_queue
When block drivers or the core block code perform allocations with a
frozen queue, this could try to recurse into the block device to
reclaim memory and deadlock.  Thus all allocations done by a process
that froze a queue need to be done without __GFP_IO and __GFP_FS.
Instead of trying to track all of them down, force a noio scope as
part of freezing the queue.

Note that nvme is a bit of a mess here due to the non-owner freezes,
and they will be addressed separately.
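
A minimal sketch of the idea; per the commit text, the scope handling
is folded into the queue-freeze path itself rather than left to
callers:

  unsigned int memflags = memalloc_noio_save();

  blk_mq_freeze_queue(q);
  /* Allocations made here implicitly lose __GFP_IO/__GFP_FS, so they
   * cannot recurse into the block device for reclaim. */
  blk_mq_unfreeze_queue(q);

  memalloc_noio_restore(memflags);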

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250131120352.1315351-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-31 07:20:08 -07:00
Breno Leitao
5dd18f09ce nvme/multipath: Fix RCU list traversal to use SRCU primitive
The code currently uses list_for_each_entry_rcu() while holding an SRCU
lock, triggering false positive warnings with CONFIG_PROVE_RCU=y
enabled:

	drivers/nvme/host/multipath.c:168 RCU-list traversed in non-reader section!!
	drivers/nvme/host/multipath.c:227 RCU-list traversed in non-reader section!!
	drivers/nvme/host/multipath.c:260 RCU-list traversed in non-reader section!!

While the list is properly protected by SRCU lock, the code uses the
wrong list traversal primitive. Replace list_for_each_entry_rcu() with
list_for_each_entry_srcu() to correctly indicate SRCU-based protection
and eliminate the false warning.

Signed-off-by: Breno Leitao <leitao@debian.org>
Fixes: be647e2c76 ("nvme: use srcu for iterating namespace list")
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-18 09:11:24 -08:00
Christoph Hellwig
0b4ace9da5 nvme-multipath: don't bother clearing max_hw_zone_append_sectors
The limits stacking now properly zeroes it if at least one of the
underlying limits clears it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20241108154657.845768-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11 09:20:36 -07:00
Christoph Hellwig
559218d43e block: pre-calculate max_zone_append_sectors
max_zone_append_sectors differs from all other queue limits in that the
final value used is not stored in the queue_limits but needs to be
obtained using the queue_limits_max_zone_append_sectors helper.  This not
only adds (tiny) extra overhead to the I/O path, but also can be easily
forgotten in file system code.

Add a new max_hw_zone_append_sectors value to queue_limits which is
set by the driver, and calculate max_zone_append_sectors from that and
the other inputs in blk_validate_zoned_limits, similar to how
max_sectors is calculated to fix this.
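
A minimal sketch of the derived-limit calculation in
blk_validate_zoned_limits(); the exact clamping may differ:

  lim->max_zone_append_sectors =
          min_not_zero(lim->max_hw_zone_append_sectors,
                       min(lim->chunk_sectors, lim->max_hw_sectors));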

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241104073955.112324-3-hch@lst.de
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20241108154657.845768-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11 09:20:36 -07:00
Jens Axboe
ab9bc81c1c Revert "block: pre-calculate max_zone_append_sectors"
This causes issues on, at least, nvme-mpath where my boot fails with:

WARNING: CPU: 354 PID: 2729 at block/blk-settings.c:75 blk_validate_limits+0x356/0x380
Modules linked in: tg3(+) nvme usbcore scsi_mod ptp i2c_piix4 libphy nvme_core crc32c_intel scsi_common usb_common pps_core i2c_smbus
CPU: 354 UID: 0 PID: 2729 Comm: kworker/u2061:1 Not tainted 6.12.0-rc6+ #181
Hardware name: Dell Inc. PowerEdge R7625/06444F, BIOS 1.8.3 04/02/2024
Workqueue: async async_run_entry_fn
RIP: 0010:blk_validate_limits+0x356/0x380
Code: f6 47 01 04 75 28 83 bf 94 00 00 00 00 75 39 83 bf 98 00 00 00 00 75 34 83 7f 68 00 75 32 31 c0 83 7f 5c 00 0f 84 9b fd ff ff <0f> 0b eb 13 0f 0b eb 0f 48 c7 c0 74 12 58 92 48 89 c7 e8 13 76 46
RSP: 0018:ffffa8a1dfb93b30 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff9232829c8388 RCX: 0000000000000088
RDX: 0000000000000080 RSI: 0000000000000200 RDI: ffffa8a1dfb93c38
RBP: 000000000000000c R08: 00000000ffffffff R09: 000000000000ffff
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9232829b9000
R13: ffff9232829b9010 R14: ffffa8a1dfb93c38 R15: ffffa8a1dfb93c38
FS:  0000000000000000(0000) GS:ffff923867c80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055c1b92480a8 CR3: 0000002484ff0002 CR4: 0000000000370ef0
Call Trace:
 <TASK>
 ? __warn+0xca/0x1a0
 ? blk_validate_limits+0x356/0x380
 ? report_bug+0x11a/0x1a0
 ? handle_bug+0x5e/0x90
 ? exc_invalid_op+0x16/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? blk_validate_limits+0x356/0x380
 blk_alloc_queue+0x7a/0x250
 __blk_alloc_disk+0x39/0x80
 nvme_mpath_alloc_disk+0x13d/0x1b0 [nvme_core]
 nvme_scan_ns+0xcc7/0x1010 [nvme_core]
 async_run_entry_fn+0x27/0x120
 process_scheduled_works+0x1a0/0x360
 worker_thread+0x2bc/0x350
 ? pr_cont_work+0x1b0/0x1b0
 kthread+0x111/0x120
 ? kthread_unuse_mm+0x90/0x90
 ret_from_fork+0x30/0x40
 ? kthread_unuse_mm+0x90/0x90
 ret_from_fork_asm+0x11/0x20
 </TASK>
---[ end trace 0000000000000000 ]---

presumably due to max_zone_append_sectors not being cleared to zero,
resulting in blk_validate_zoned_limits() complaining and failing.

This reverts commit 2a8f6153e1.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-07 05:45:34 -07:00
Christoph Hellwig
2a8f6153e1 block: pre-calculate max_zone_append_sectors
max_zone_append_sectors differs from all other queue limits in that the
final value used is not stored in the queue_limits but needs to be
obtained using the queue_limits_max_zone_append_sectors helper.  This not
only adds (tiny) extra overhead to the I/O path, but also can be easily
forgotten in file system code.

Add a new max_hw_zone_append_sectors value to queue_limits which is
set by the driver, and calculate max_zone_append_sectors from that and
the other inputs in blk_validate_zoned_limits, similar to how
max_sectors is calculated to fix this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241104073955.112324-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-04 10:34:07 -07:00
Keith Busch
1f021341ee nvme-multipath: defer partition scanning
We need to suppress the partition scan from occurring within the
controller's scan_work context. If a path error occurs here, the IO
will wait until a path becomes available or all paths are torn down,
but that action also occurs within scan_work, so it would deadlock.
Defer the partition scan to a different context that does not block
scan_work.
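
A minimal sketch of the deferral; the gendisk flag exists in the block
layer, while the surrounding call details and the work item name are
assumptions:

  /* Keep device_add_disk() from scanning partitions in scan_work. */
  set_bit(GD_SUPPRESS_PART_SCAN, &head->disk->state);
  device_add_disk(&head->subsys->dev, head->disk, nvme_ns_attr_groups);

  /* Scan partitions later, from a context that cannot block scan_work. */
  queue_work(nvme_wq, &head->partition_scan_work);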

Reported-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-10-15 08:32:07 -07:00
Tokunori Ikegami
9c7072df53 nvme: delete unnecessary fallthru comment
Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-10-08 13:50:44 -07:00
Hannes Reinecke
3b97f5a05c nvme-multipath: avoid hang on inaccessible namespaces
During repetitive namespace remapping operations on the target, the
namespace might have changed between the time the initial scan was
performed and the time the partition scan was invoked by
device_add_disk() in nvme_mpath_set_live(). We then end up with a stuck
scanning process:

[<0>] folio_wait_bit_common+0x12a/0x310
[<0>] filemap_read_folio+0x97/0xd0
[<0>] do_read_cache_folio+0x108/0x390
[<0>] read_part_sector+0x31/0xa0
[<0>] read_lba+0xc5/0x160
[<0>] efi_partition+0xd9/0x8f0
[<0>] bdev_disk_changed+0x23d/0x6d0
[<0>] blkdev_get_whole+0x78/0xc0
[<0>] bdev_open+0x2c6/0x3b0
[<0>] bdev_file_open_by_dev+0xcb/0x120
[<0>] disk_scan_partitions+0x5d/0x100
[<0>] device_add_disk+0x402/0x420
[<0>] nvme_mpath_set_live+0x4f/0x1f0 [nvme_core]
[<0>] nvme_mpath_add_disk+0x107/0x120 [nvme_core]
[<0>] nvme_alloc_ns+0xac6/0xe60 [nvme_core]
[<0>] nvme_scan_ns+0x2dd/0x3e0 [nvme_core]
[<0>] nvme_scan_work+0x1a3/0x490 [nvme_core]

This happens when we have several paths, some of which are inaccessible,
and the active paths are removed first. Then nvme_find_path() will requeue
I/O in the ns_head (as paths are present), but the requeue list is never
triggered as all remaining paths are inactive.

This patch checks for NVME_NSHEAD_DISK_LIVE in nvme_available_path(),
and requeues I/O after NVME_NSHEAD_DISK_LIVE has been cleared once
the last path has been removed to properly terminate pending I/O.

Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-09-24 23:30:28 -07:00
Hannes Reinecke
63bcf9014e nvme-multipath: system fails to create generic nvme device
NVME_NSHEAD_DISK_LIVE is a flag for struct nvme_ns_head, not nvme_ns.
The current code has a typo causing NVME_NSHEAD_DISK_LIVE never to
be cleared once device_add_disk() fails, causing the system never to
create the 'generic' character device. Even several rescan attempts
will not change the situation; the system has to be rebooted to fix
the issue.
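
A minimal sketch of the typo being fixed: the flag lives on the
ns_head, so it must be cleared via the head, not the ns (surrounding
code is illustrative):

  if (device_add_disk(&head->subsys->dev, head->disk,
                      nvme_ns_attr_groups)) {
          /* was: clear_bit(NVME_NSHEAD_DISK_LIVE, &ns->flags); */
          clear_bit(NVME_NSHEAD_DISK_LIVE, &head->flags);
          return;
  }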

Fixes: 11384580e3 ("nvme-multipath: add error handling support for add_disk()")
Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-09-24 23:30:28 -07:00
Christoph Hellwig
28982ad73d nvme: set BLK_FEAT_ZONED for ZNS multipath disks
The new stricter limits validation doesn't like a
max_zone_append_sectors value being set without BLK_FEAT_ZONED. Set the
feature before allocating the disk to fix this instead of just
inheriting it later.
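
A minimal sketch of setting the feature before the disk is allocated;
the CSI check is an assumption for how a ZNS namespace is identified:

  blk_set_stacking_limits(&lim);
  if (head->ids.csi == NVME_CSI_ZNS)
          lim.features |= BLK_FEAT_ZONED;
  head->disk = blk_alloc_disk(&lim, ctrl->numa_node);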

Fixes: d690cb8ae1 ("block: add an API to atomically update queue limits")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-09-03 10:00:10 -07:00
Christoph Hellwig
18f03a063d nvme: implement ->get_unique_id
Implement the get_unique_id method to allow pNFS SCSI layout access to
NVMe namespaces.

This is the server side implementation of RFC 9561 "Using the Parallel
NFS (pNFS) SCSI Layout to Access Non-Volatile Memory Express (NVMe)
Storage Devices".

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-07-08 10:25:39 -07:00
Thomas Song
f227345f0a nvme-multipath: implement "queue-depth" iopolicy
The round-robin path selector is inefficient in cases where there is a
difference in latency between paths.  In the presence of one or more
high latency paths the round-robin selector continues to use the high
latency path equally. This results in a bias towards the highest latency
path and can cause a significant decrease in overall performance as IOs
pile on the highest latency path. This problem is acute with NVMe-oF
controllers.

The queue-depth path selector sends I/O down the path with the lowest
number of requests in its request queue. Paths with lower latency will
clear requests more quickly and have less requests queued compared to
higher latency paths. The goal of this path selector is to make more use
of lower latency paths which will bring down overall IO latency and
increase throughput and performance.
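
A minimal sketch of queue-depth path selection: pick the usable path
whose controller has the fewest in-flight requests (illustrative; the
real selector also weighs ANA states):

  static struct nvme_ns *nvme_queue_depth_path(struct nvme_ns_head *head)
  {
          struct nvme_ns *best = NULL, *ns;
          unsigned int min_depth = UINT_MAX;

          list_for_each_entry_srcu(ns, &head->list, siblings,
                                   srcu_read_lock_held(&head->srcu)) {
                  unsigned int depth = atomic_read(&ns->ctrl->nr_active);

                  if (nvme_path_is_disabled(ns))
                          continue;
                  if (depth < min_depth) {
                          min_depth = depth;
                          best = ns;
                  }
          }
          return best;
  }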

Signed-off-by: Thomas Song <tsong@purestorage.com>
[emilne: commandeered patch developed by Thomas Song @ Pure Storage]
Co-developed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Co-developed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: John Meneghini <jmeneghi@redhat.com>
Link: https://lore.kernel.org/linux-nvme/20240509202929.831680-1-jmeneghi@redhat.com/
Tested-by: Marco Patalano <mpatalan@redhat.com>
Tested-by: Jyoti Rani <jrani@purestorage.com>
Tested-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Randy Jennings <randyj@purestorage.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-07-02 06:47:19 -07:00
John Meneghini
3d7c2fd2ea nvme-multipath: prepare for "queue-depth" iopolicy
This patch prepares for the introduction of a new iopolicy by breaking up
the nvme_find_path() code path into sub-routines.

Signed-off-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-06-26 09:45:57 -07:00
Weiwen Hu
d89a5c6705 nvme: fix status magic numbers
Replace some magic numbers related to SC and SCT with enums and macros.

Signed-off-by: Weiwen Hu <huweiwen@linux.alibaba.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-06-24 12:53:42 -07:00
Christoph Hellwig
8023e144f9 block: move the poll flag to queue_limits
Move the poll flag into the queue_limits feature field so that it can
be set atomically with the queue frozen.

Stacking drivers are simplified in that they now can simply set the
flag, and blk_stack_limits will clear it when the feature is not
supported by any of the underlying devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-22-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19 07:58:28 -06:00
Christoph Hellwig
f76af42f8b block: move the nowait flag to queue_limits
Move the nowait flag into the queue_limits feature field so that it can
be set atomically with the queue frozen.

Stacking drivers are simplified in that they now can simply set the
flag, and blk_stack_limits will clear it when the feature is not
supported by any of the underlying devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-20-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19 07:58:28 -06:00
Christoph Hellwig
1a02f3a73f block: move the stable_writes flag to queue_limits
Move the stable_writes flag into the queue_limits feature field so that
it can be set atomically with the queue frozen.

The flag is now inherited by blk_stack_limits, which greatly simplifies
the code in dm, and fixes md, which previously did not pass on the flag
set on lower devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19 07:58:28 -06:00
Christoph Hellwig
cdb2497918 block: move the io_stat flag setting to queue_limits
Move the io_stat flag into the queue_limits feature field so that it can
be set atomically with the queue frozen.

Simplify md and dm to set the flag unconditionally instead of avoiding
setting a simple flag for cases where it already is set by other means,
which is a bit pointless.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19 07:58:28 -06:00
Christoph Hellwig
bd4a633b6f block: move the nonrot flag to queue_limits
Move the nonrot flag into the queue_limits feature field so that it can
be set atomically with the queue frozen.

Use the chance to switch to defaulting to non-rotational and require
the driver to opt into rotational, which matches the polarity of the
sysfs interface.

For the z2ram, ps3vram, 2x memstick, ubiblock and dcssblk the new
rotational flag is not set as they clearly are not rotational despite
this being a behavior change.  There are some other drivers that
unconditionally set the rotational flag to keep the existing behavior
as they arguably can be used on rotational devices even if that is
probably not their main use today (e.g. virtio_blk and drbd).

The flag is automatically inherited in blk_stack_limits matching the
existing behavior in dm and md.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19 07:58:28 -06:00
Christoph Hellwig
1122c0c1cc block: move cache control settings out of queue->flags
Move the cache control settings into the queue_limits so that the flags
can be set atomically with the device queue frozen.

Add new features and flags field for the driver set flags, and internal
(usually sysfs-controlled) flags in the block layer.  Note that we'll
eventually remove enough field from queue_limits to bring it back to the
previous size.

The disable flag is inverted compared to the previous meaning, which
means it now survives a rescan, similar to the max_sectors and
max_discard_sectors user limits.

The FLUSH and FUA flags are now inherited by blk_stack_limits, which
simplified the code in dm a lot, but also causes a slight behavior
change in that dm-switch and dm-unstripe now advertise a write cache
despite setting num_flush_bios to 0.  The I/O path will handle this
gracefully, but as far as I can tell the lack of num_flush_bios
and thus flush support is a pre-existing data integrity bug in those
targets that really needs fixing, after which a non-zero num_flush_bios
should be required in dm for targets that map to underlying devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19 07:58:28 -06:00
Christoph Hellwig
3c3e85ddff block: bypass the STABLE_WRITES flag for protection information
Currently registering a checksum-enabled (aka PI) integrity profile sets
the QUEUE_FLAG_STABLE_WRITES flag, and unregistering it clears the flag.
This can incorrectly clear the flag when the driver requires stable
writes even without PI, e.g. in case of iSCSI or NVMe/TCP with data
digest enabled.

Fix this by looking at the csum_type directly in bdev_stable_writes and
not setting the queue flag.  Also remove the blk_queue_stable_writes
helper as the only user in nvme wants to only look at the actual
QUEUE_FLAG_STABLE_WRITES flag as it inherits the integrity configuration
by other means.
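
A minimal sketch of checking the csum_type directly in
bdev_stable_writes(), per the commit description (close to, but not
necessarily identical to, the final helper):

  static inline bool bdev_stable_writes(struct block_device *bdev)
  {
          struct request_queue *q = bdev_get_queue(bdev);

          if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
              q->limits.integrity.csum_type != BLK_INTEGRITY_CSUM_NONE)
                  return true;
          return test_bit(QUEUE_FLAG_STABLE_WRITES, &q->queue_flags);
  }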

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240613084839.1044015-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-14 10:20:06 -06:00
Keith Busch
be647e2c76 nvme: use srcu for iterating namespace list
The nvme pci driver synchronizes with all the namespace queues during a
reset to ensure that there's no pending timeout work.

Meanwhile the timeout work potentially iterates those same namespaces to
freeze their queues.

Each of those namespace iterations uses the same read lock. If a write
lock should somehow get between the synchronize and freeze steps, then
forward progress is deadlocked.

We had been relying on the nvme controller state machine to ensure the
reset work wouldn't conflict with timeout work. That guarantee may be a
bit fragile to rely on, so iterate the namespace lists without taking
potentially circular locks, as reported by lockdep.

Link: https://lore.kernel.org/all/20220930001943.zdbvolc3gkekfmcv@shindev/
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-28 09:43:32 -07:00
Keith Busch
a2e4c5f5f6 nvme-multipath: fix io accounting on failover
There is I/O stats accounting that needs to be handled, so don't call
blk_mq_end_request() directly. Use the existing nvme_end_req() helper
that already handles everything.

Fixes: d4d957b53d ("nvme-multipath: support io stats on the mpath device")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-23 13:44:42 -07:00
Nilay Shroff
d3a043733f nvme-multipath: find NUMA path only for online numa-node
In the current native multipath design, when a shared namespace is
created we loop through each possible numa-node, calculate the NUMA
distance of that node from each nvme controller, and then cache the
optimal IO path for future reference while sending IO. The issue with
this design is that we may refer to the NUMA distance table for an
offline node, which may not be populated at the time, and so we may
inadvertently end up finding and caching a non-optimal path for IO.
Later, when the corresponding numa-node comes online and the NUMA
distance table entry for that node is created, we should ideally
re-calculate the multipath node distance for the newly added node;
however, that doesn't happen unless we rescan/reset the controller. So
essentially, we may keep using a non-optimal IO path for a node which
came online after the namespace was created.

This patch fixes the issue by ensuring that when a shared namespace is
created, we calculate the multipath node distance for each online
numa-node instead of each possible numa-node. Later, when a node comes
online and we receive any IO on that newly added node, we calculate the
multipath node distance for it, but by then the NUMA distance table
will already have been populated for the node. Hence we can correctly
calculate the multipath node distance and choose the optimal path for
the IO.
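
A minimal sketch of the loop change; __nvme_find_path() here stands in
for whatever per-node distance/caching helper the driver runs:

  int node;

  /* was: for_each_node(node) -- includes offline nodes whose NUMA
   * distance table entries may not be populated yet */
  for_each_online_node(node)
          __nvme_find_path(head, node);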

Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-21 06:43:08 -07:00
Nilay Shroff
863fe60ed2 nvme: find numa distance only if controller has valid numa id
On systems where native nvme multipath is configured and the iopolicy
is set to numa, but the nvme controller numa node id is undefined or
-1 (NUMA_NO_NODE), avoid calculating the node distance for finding the
optimal io path. In such a case we may access the numa distance table
with an invalid index, which may refer to incorrect memory. So this
patch ensures that if the nvme controller numa node id is -1, then
instead of calculating the node distance for finding the optimal io
path, we set the numa node distance of such a controller to the
default 10 (LOCAL_DISTANCE).
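
A minimal sketch of the guard described above, using the kernel's NUMA
helpers:

  int distance;

  if (ctrl->numa_node == NUMA_NO_NODE)
          distance = LOCAL_DISTANCE; /* default 10 */
  else
          distance = node_distance(node, ctrl->numa_node);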

Link: https://lore.kernel.org/all/20240413090614.678353-1-nilay@linux.ibm.com/
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-01 02:58:42 -07:00
Christoph Hellwig
c5be5df721 nvme-multipath: pass queue_limits to blk_alloc_disk
The multipath disk starts out with the stacking default limits.
The one interesting part here is that blk_set_stacking_limits
sets max_zone_append_sectors to UINT_MAX, which fails the
validation for non-zoned devices.  With the old one-call-per-limit
scheme this was fine because nothing verified this weird
mismatch, and it was fixed by blk_stack_limits a little later,
before I/O could be issued.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-04 08:24:56 -08:00
Christoph Hellwig
74fa8f9c55 block: pass a queue_limits argument to blk_alloc_disk
Pass a queue_limits to blk_alloc_disk and apply it if non-NULL.  This
will allow allocating queues with valid queue limits instead of setting
the values one at a time later.

Also change blk_alloc_disk to return an ERR_PTR instead of just NULL
which can't distinguish errors.
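
A minimal sketch of the updated calling convention:

  struct gendisk *disk = blk_alloc_disk(&lim, NUMA_NO_NODE);

  if (IS_ERR(disk))
          return PTR_ERR(disk); /* NULL could not distinguish errors */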

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20240215071055.2201424-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-19 16:58:23 -07:00
Keith Busch
6d3c7fb17b nvme: use ctrl state accessor
The ctrl->state value is updated in another thread using WRITE_ONCE, so
ensure all the readers use the appropriate accessor.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-01-29 07:02:50 -08:00