The error path for the SCSI check condition of not ready, target in ALUA
state transition, will result in the failure of that path after the retries
are exhausted. In most cases that is well ahead of the transition timeout
established in the SCSI ALUA device handler.
Instead, reprep the command and re-add it to the queue after a 1 second
delay. This will allow the handler to take care of the timeout and only
fail the path if the target has exceeded the transition expiry timeout
(default 60 seconds). If the expiry timeout is exceeded, the handler will
change the path state from transitioning to standby leading to a path
failure eliminating the potential of this re-prep to continue endlessly. In
most cases the target will exit the transitioning state well before the
expiry timeout but after the retries are exhausted as mentioned.
Additionally remove the scsi_io_completion_reprep() function which provides
little value.
Link: https://lore.kernel.org/r/20220729214110.58576-1-brian@purestorage.com
Reviewed-by: Martin Wilck <mwilck@suse.com>
Acked-by: Krishna Kant <krishna.kant@purestorage.com>
Acked-by: Seamus Connor <sconnor@purestorage.com>
Signed-off-by: Brian Bunker <brian@purestorage.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This partially reverts commit d2b292c3f6 ("scsi: qla2xxx: Enable ATIO
interrupt handshake for ISP27XX")
For some workloads where the host sends a batch of commands and then
pauses, ATIO interrupt coalesce can cause some incoming ATIO entries to be
ignored for extended periods of time, resulting in slow performance,
timeouts, and aborted commands.
Disable interrupt coalesce and re-enable the dedicated ATIO MSI-X
interrupt.
Link: https://lore.kernel.org/r/97dcf365-89ff-014d-a3e5-1404c6af511c@cybernetics.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The variable cmd_type is assigned a value but it is never read. The
variable and the assignment are redundant and can be removed.
Cleans up clang scan build warning:
drivers/scsi/megaraid/megaraid_sas_fusion.c:3228:10: warning: Although
the value stored to 'cmd_type' is used in the enclosing expression, the
value is never actually read from 'cmd_type' [deadcode.DeadStores]
Link: https://lore.kernel.org/r/20220730124509.148457-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The variable bm_int_st is assigned a value but it is never read. The
variable and the assignment are redundant and can be removed.
Cleans up clang scan build warning:
drivers/scsi/FlashPoint.c:1726:7: warning: Although the value stored
to 'bm_int_st' is used in the enclosing expression, the value is never
actually read from 'bm_int_st' [deadcode.DeadStores]
Link: https://lore.kernel.org/r/20220730123736.147758-1-colin.i.king@gmail.com
Acked-by: Khalid Aziz <khalid@gonehiking.org>
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Case (1):
The only waiter on wka_port->completion_wq is zfcp_fc_wka_port_get()
trying to open a WKA port. As such it should only be woken up by WKA port
*open* responses, not by WKA port close responses.
Case (2):
A close WKA port response coming in just after having sent a new open WKA
port request and before blocking for the open response with wait_event()
in zfcp_fc_wka_port_get() erroneously renders the wait_event a NOP
because the close handler overwrites wka_port->status. Hence the
wait_event condition is erroneously true and it does not enter blocking
state.
With non-negligible probability, the following time space sequence happens
depending on timing without this fix:
user process ERP thread zfcp work queue tasklet system work queue
============ ========== =============== ======= =================
$ echo 1 > online
zfcp_ccw_set_online
zfcp_ccw_activate
zfcp_erp_adapter_reopen
msleep scan backoff zfcp_erp_strategy
| ...
| zfcp_erp_action_cleanup
| ...
| queue delayed scan_work
| queue ns_up_work
| ns_up_work:
| zfcp_fc_wka_port_get
| open wka request
| open response
| GSPN FC-GS
| RSPN FC-GS [NPIV-only]
| zfcp_fc_wka_port_put
| (--wka->refcount==0)
| sched delayed wka->work
|
~~~Case (1)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
zfcp_erp_wait
flush scan_work
| wka->work:
| wka->status=CLOSING
| close wka request
| scan_work:
| zfcp_fc_wka_port_get
| (wka->status==CLOSING)
| wka->status=OPENING
| open wka request
| wait_event
| | close response
| | wka->status=OFFLINE
| | wake_up /*WRONG*/
~~~Case (2)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| wka->work:
| wka->status=CLOSING
| close wka request
zfcp_erp_wait
flush scan_work
| scan_work:
| zfcp_fc_wka_port_get
| (wka->status==CLOSING)
| wka->status=OPENING
| open wka request
| close response
| wka->status=OFFLINE
| wake_up /*WRONG&NOP*/
| wait_event /*NOP*/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| (wka->status!=ONLINE)
| return -EIO
| return early
open response
wka->status=ONLINE
wake_up /*NOP*/
So we erroneously end up with no automatic port scan. This is a big problem
when it happens during boot. The timing is influenced by v3.19 commit
18f87a67e6 ("zfcp: auto port scan resiliency").
Fix it by fully mutually excluding zfcp_fc_wka_port_get() and
zfcp_fc_wka_port_offline(). For that to work, we make the latter block
until we got the response for a close WKA port. In order not to penalize
the system workqueue, we move wka_port->work to our own adapter workqueue.
Note that before v2.6.30 commit 828bc1212a ("[SCSI] zfcp: Set WKA-port to
offline on adapter deactivation"), zfcp did block in
zfcp_fc_wka_port_offline() as well, but with a different condition.
While at it, make non-functional cleanups to improve code reading in
zfcp_fc_wka_port_get(). If we cannot send the WKA port open request, don't
rely on the subsequent wait_event condition to immediately let this case
pass without blocking. Also don't want to rely on the additional condition
handling the refcount to be skipped just to finally return with -EIO.
Link: https://lore.kernel.org/r/20220729162529.1620730-1-maier@linux.ibm.com
Fixes: 5ab944f97e ("[SCSI] zfcp: attach and release SAN nameserver port on demand")
Cc: <stable@vger.kernel.org> #v2.6.28+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are two .exit_cmd_priv implementations. Both implementations use
resources associated with the SCSI host. Make sure that these resources are
still available when .exit_cmd_priv is called by moving the .exit_cmd_priv
calls from scsi_host_dev_release() to scsi_forget_host(). Moving
blk_mq_free_tag_set() from scsi_host_dev_release() to scsi_remove_host() is
safe because scsi_forget_host() waits until all SCSI devices associated
with the host have been removed.
Fix the following use-after-free:
==================================================================
BUG: KASAN: use-after-free in srp_exit_cmd_priv+0x27/0xd0 [ib_srp]
Read of size 8 at addr ffff888100337000 by task multipathd/16727
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
print_report.cold+0x5e/0x5db
kasan_report+0xab/0x120
srp_exit_cmd_priv+0x27/0xd0 [ib_srp]
scsi_mq_exit_request+0x4d/0x70
blk_mq_free_rqs+0x143/0x410
__blk_mq_free_map_and_rqs+0x6e/0x100
blk_mq_free_tag_set+0x2b/0x160
scsi_host_dev_release+0xf3/0x1a0
device_release+0x54/0xe0
kobject_put+0xa5/0x120
device_release+0x54/0xe0
kobject_put+0xa5/0x120
scsi_device_dev_release_usercontext+0x4c1/0x4e0
execute_in_process_context+0x23/0x90
device_release+0x54/0xe0
kobject_put+0xa5/0x120
scsi_disk_release+0x3f/0x50
device_release+0x54/0xe0
kobject_put+0xa5/0x120
disk_release+0x17f/0x1b0
device_release+0x54/0xe0
kobject_put+0xa5/0x120
dm_put_table_device+0xa3/0x160 [dm_mod]
dm_put_device+0xd0/0x140 [dm_mod]
free_priority_group+0xd8/0x110 [dm_multipath]
free_multipath+0x94/0xe0 [dm_multipath]
dm_table_destroy+0xa2/0x1e0 [dm_mod]
__dm_destroy+0x196/0x350 [dm_mod]
dev_remove+0x10c/0x160 [dm_mod]
ctl_ioctl+0x2c2/0x590 [dm_mod]
dm_ctl_ioctl+0x5/0x10 [dm_mod]
__x64_sys_ioctl+0xb4/0xf0
dm_ctl_ioctl+0x5/0x10 [dm_mod]
__x64_sys_ioctl+0xb4/0xf0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Link: https://lore.kernel.org/r/20220728221851.1822295-5-bvanassche@acm.org
Fixes: 65ca846a53 ("scsi: core: Introduce {init,exit}_cmd_priv()")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: John Garry <john.garry@huawei.com>
Cc: Li Zhijian <lizhijian@fujitsu.com>
Reported-by: Li Zhijian <lizhijian@fujitsu.com>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Under huge load there is a possibility of race condition in updating
se_dev_entry object in ACL removal procedure:
NIP [c0080000154093d0] transport_lookup_cmd_lun+0x1f8/0x3d0 [target_core_mod]
LR [c00800001542ab34] target_submit_cmd_map_sgls+0x11c/0x300 [target_core_mod]
Call Trace:
target_submit_cmd_map_sgls+0x11c/0x300 [target_core_mod]
target_submit_cmd+0x44/0x60 [target_core_mod]
tcm_qla2xxx_handle_cmd+0x88/0xe0 [tcm_qla2xxx]
qlt_do_work+0x2e4/0x3d0 [qla2xxx]
process_one_work+0x298/0x5c
Despite usage of RCU primitives with deve->se_lun pointer, it has not
become dereference-safe because deve->se_lun is updated and not
synchronized with a reader. That change might be in a release function
called by synchronize_rcu(). But, in fact, there is no point in setting
that pointer to NULL for deleting deve. All access to deve->se_lun is
already under rcu_read_lock. And either deve->se_lun is always valid or
deve is not valid itself and will not be found in the list_for_*. The same
applicable for deve->se_lun_acl too. So a better solution is to remove
that NULLing.
Link: https://lore.kernel.org/r/20220727214125.19647-2-d.bogdanov@yadro.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After ufshcd_wl_shutdown() set device power off and link off,
ufshcd_shutdown() could turn off clock/power. Also remove
pm_runtime_get_sync.
The reason why it is safe to remove pm_runtime_get_sync() is because:
- ufshcd_wl_shutdown() -> pm_runtime_get_sync() will resume hba->dev too.
- device resume(turn on clk/power) is not required, even if device is in
RPM_SUSPENDED.
Link: https://lore.kernel.org/r/20220727030526.31022-1-peter.wang@mediatek.com
Fixes: b294ff3e34 ("scsi: ufs: core: Enable power management for wlun")
Cc: <stable@vger.kernel.org> # 5.15.x
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Measurements for one particular UFS controller + UFS device show a 25%
higher read bandwidth if the maximum data buffer size is increased from 512
KiB to 1 MiB. Hence increase the maximum size of the data buffer associated
with a single request from SCSI_DEFAULT_MAX_SECTORS (1024) * 512 bytes =
512 KiB to 1 MiB.
Notes:
- The maximum data buffer size supported by the UFSHCI specification
is 65535 * 256 KiB or about 16 GiB.
- The maximum data buffer size for READ(10) commands is 65535 logical
blocks. To transfer more than 65535 * 4096 bytes = 255 MiB with a single
SCSI command, the READ(16) command is required. Support for READ(16) is
optional in the UFS 3.1 and UFS 4.0 standards.
Link: https://lore.kernel.org/r/20220726225232.1362251-1-bvanassche@acm.org
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Bean Huo <beanhuo@micron.com>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Tested-by: Avri Altman <avri.altman@wdc.com>
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch removes XDWRITEREAD support because it never fully worked when
it was added in the initial LIO merge and it's been fully broken since 2013
from commit a289008749 ("target: Add compare_and_write_post() completion
callback fall through").
The two issues above are:
1. XDWRITEREAD support was just never completed when LIO was merged. We
never did the DISABLE WRITE check and so we never write data out.
2. Since the commit above, we never complete the command. After we do the
XOR, we return from xdreadwrite_callback and that's it. We never send a
response for the command, so the command will always time out and fail.
Since this has been fully broken for almost nine years this patch just
removes emulated support.
Link: https://lore.kernel.org/r/20220726235339.14551-1-michael.christie@oracle.com
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
RFC7143 states that Initiator decides what type of authentication to
use:
The initiator MUST continue with:
CHAP_N=<N> CHAP_R=<R>
or, if it requires target authentication, with:
CHAP_N=<N> CHAP_R=<R> CHAP_I=<I> CHAP_C=<C>
Allow one way authentication if mutual authentication is configured. That
passes some tests from Windows HLK for Mutual CHAP with iSNS.
Link: https://lore.kernel.org/r/20220718152555.17084-5-d.bogdanov@yadro.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Variable length SCSI commands are transferred over iSCSI via two CDB
buffers - in Basic Header Segment and in Additional Header Segment (AHS).
Since AHS is not supported yet, a target reads just BHS (48 byte) from TCP
and treats the remaining octets as a next new iSCSI PDU that causes
protocol errors.
Add support for the Extended CDB AHS type.
Link: https://lore.kernel.org/r/20220718152555.17084-2-d.bogdanov@yadro.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When building with Clang we encounter these warnings:
| drivers/target/iscsi/iscsi_target_login.c:719:24: error: format
| specifies type 'unsigned short' but the argument has type 'int'
| [-Werror,-Wformat] " from node: %s\n", atomic_read(&sess->nconn),
-
| drivers/target/iscsi/iscsi_target_login.c:767:12: error: format
| specifies type 'unsigned short' but the argument has type 'int'
| [-Werror,-Wformat] " %s\n", atomic_read(&sess->nconn),
-
| drivers/target/iscsi/iscsi_target.c:4365:12: error: format specifies
| type 'unsigned short' but the argument has type 'int' [-Werror,-Wformat]
| " %s\n", atomic_read(&sess->nconn)
For all warnings, the format specifier is '%hu' which describes an unsigned
short. The resulting type of atomic_read is an int. The proposed fix is to
listen to Clang and swap the format specifier.
Link: https://github.com/ClangBuiltLinux/linux/issues/378
Link: https://lore.kernel.org/r/20220718180421.49697-1-justinstitt@google.com
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
UFS storage devices require bRefClkFreq attribute to be set to operate
correctly at high speed mode. The necessary value is determined by what the
SoC / board supports. The standard doesn't specify a method to query the
value, so the information needs to be fed in separately.
DT information feeds into setting up the clock framework, so platforms
using DT can get the UFS reference clock frequency from the clock
framework. A special node "ref_clk" from the clock array for the UFS
controller node is used as the source for the information.
On the platforms that do not use DT (e.g. Intel), the alternative mechanism
to feed the intended reference clock frequency is necessary. Specifying the
necessary information in DSD of the UFS controller ACPI node is an
alternative mechanism proposed in this patch. Those can be accessed via
firmware property facility in the kernel and in many ways simillar to
querying properties defined in DT.
This patch introduces a small helper function to query a predetermined ACPI
supplied property of the UFS controller, and uses it to attempt retrieving
reference clock value, unless that was already done by the clock
infrastructure.
Link: https://lore.kernel.org/r/20220715210230.1.I365d113d275117dee8fd055ce4fc7e6aebd0bce9@changeid
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Daniil Lunev <dlunev@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is some clean up necessary before returning. Smatch complains:
drivers/scsi/mpi3mr/mpi3mr_fw.c:4786 mpi3mr_soft_reset_handler()
warn: inconsistent returns '&mrioc->reset_mutex'.
Locked on : 4730
Unlocked on: 4786
Link: https://lore.kernel.org/r/YtVCEsxMU8buuMjP@kili
Fixes: f10af05732 ("scsi: mpi3mr: Resource Based Metering")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update driver to track cumulative pending large data size at the controller
level and at the throttle group level. When one of the values meet or
exceed the controller's firmware-determined high threshold value, then the
driver will divert future selective I/O to the firmware. Once both
controller level and at the throttle group level cumulative pending large
data size reach controller's firmware determined low threshold value, then
the driver will stop diverting I/Os to the firmware.
Link: https://lore.kernel.org/r/20220708195020.8323-2-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a SCSI device is removed while in active use, currently sg will
immediately return -ENODEV on any attempt to wait for active commands that
were sent before the removal. This is problematic for commands that use
SG_FLAG_DIRECT_IO since the data buffer may still be in use by the kernel
when userspace frees or reuses it after getting ENODEV, leading to
corrupted userspace memory (in the case of READ-type commands) or corrupted
data being sent to the device (in the case of WRITE-type commands). This
has been seen in practice when logging out of a iscsi_tcp session, where
the iSCSI driver may still be processing commands after the device has been
marked for removal.
Change the policy to allow userspace to wait for active sg commands even
when the device is being removed. Return -ENODEV only when there are no
more responses to read.
Link: https://lore.kernel.org/r/5ebea46f-fe83-2d0b-233d-d0dcb362dd0a@cybernetics.com
Cc: <stable@vger.kernel.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On some platforms, the current logic of relying on finding new packet
solely based on signature pattern can lead to driver reading stale
packets. Though this is a bug in those platforms, reduce such exposures by
limiting reading packets until the IN pointer.
Two module parameters are introduced:
ql2xrspq_follow_inptr:
When set, on newer adapters that has queue pointer shadowing, look for
response packets only until response queue in pointer.
When reset, response packets are read based on a signature pattern
logic (old way).
ql2xrspq_follow_inptr_legacy:
Like ql2xrspq_follow_inptr, but for those adapters where there is no
queue pointer shadowing.
Link: https://lore.kernel.org/r/20220713052045.10683-5-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>