linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-03-22 07:27:12 +08:00

Author	SHA1	Message	Date
Yu Kuai	2c04718edc	blk-mq: add documentation for new queue attribute async_dpeth Explain the attribute and the default value in different case. Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-03 07:45:37 -07:00
Yu Kuai	2110858c51	block, bfq: convert to use request_queue->async_depth The default limits is unchanged, and user can configure async_depth now. Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-03 07:45:36 -07:00
Yu Kuai	988bb1b9ed	mq-deadline: covert to use request_queue->async_depth In downstream kernel, we test with mq-deadline with many fio workloads, and we found a performance regression after commit `39823b47bb` ("block/mq-deadline: Fix the tag reservation code") with following test: [global] rw=randread direct=1 ramp_time=1 ioengine=libaio iodepth=1024 numjobs=24 bs=1024k group_reporting=1 runtime=60 [job1] filename=/dev/sda Root cause is that mq-deadline now support configuring async_depth, although the default value is nr_request, however the minimal value is 1, hence min_shallow_depth is set to 1, causing wake_batch to be 1. For consequence, sbitmap_queue will be waken up after each IO instead of 8 IO. In this test case, sda is HDD and max_sectors is 128k, hence each submitted 1M io will be splited into 8 sequential 128k requests, however due to there are 24 jobs and total tags are exhausted, the 8 requests are unlikely to be dispatched sequentially, and changing wake_batch to 1 will make this much worse, accounting blktrace D stage, the percentage of sequential io is decreased from 8% to 0.8%. Fix this problem by converting to request_queue->async_depth, where min_shallow_depth is set each time async_depth is updated. Noted elevator attribute async_depth is now removed, queue attribute with the same name is used instead. Fixes: `39823b47bb` ("block/mq-deadline: Fix the tag reservation code") Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-03 07:45:36 -07:00
Yu Kuai	8cbe62f4d8	kyber: covert to use request_queue->async_depth Instead of the internal async_depth, remove kqd->async_depth and related helpers. Noted elevator attribute async_depth is now removed, queue attribute with the same name is used instead. Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-03 07:45:36 -07:00
Yu Kuai	f98afe4f31	blk-mq: add a new queue sysfs attribute async_depth Add a new field async_depth to request_queue and related APIs, this is currently not used, following patches will convert elevators to use this instead of internal async_depth. Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-03 07:45:36 -07:00
Yu Kuai	cf02d7d41b	blk-mq: factor out a helper blk_mq_limit_depth() There are no functional changes, just make code cleaner. Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-03 07:45:36 -07:00
Yu Kuai	1db61b0afd	blk-mq-sched: unify elevators checking for async requests bfq and mq-deadline consider sync writes as async requests and only reserve tags for sync reads by async_depth, however, kyber doesn't consider sync writes as async requests for now. Consider the case there are lots of dirty pages, and user use fsync to flush dirty pages. In this case sched_tags can be exhausted by sync writes and sync reads can stuck waiting for tag. Hence let kyber follow what mq-deadline and bfq did, and unify async requests checking for all elevators. Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-03 07:45:36 -07:00
Yu Kuai	9fc7900b14	block: convert nr_requests to unsigned int This value represents the number of requests for elevator tags, or drivers tags if elevator is none. The max value for elevator tags is 2048, and in drivers at most 16 bits is used for tag. Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-03 07:45:36 -07:00
Johannes Thumshirn	ee4784a83f	block: don't use strcpy to copy blockdev name 0-day bot flagged the use of strcpy() in blk_trace_setup(), because the source buffer can theoretically be bigger than the destination buffer. While none of the current callers pass a string bigger than BLKTRACE_BDEV_SIZE, use strscpy() to prevent eventual future misuse and silence the checker warnings. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202602020718.GUEIRyG9-lkp@intel.com/ Fixes: `113cbd6282` ("blktrace: pass blk_user_trace2 to setup functions") Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-03 07:15:31 -07:00
Yu Kuai	65d466b629	blk-mq-debugfs: warn about possible deadlock Creating new debugfs entries can trigger fs reclaim, hence we can't do this with queue frozen, meanwhile, other locks that can be held while queue is frozen should not be held as well. Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-02 07:05:19 -07:00
Yu Kuai	9d20fd6ce1	blk-mq-debugfs: add missing debugfs_mutex in blk_mq_debugfs_register_hctxs() In blk_mq_update_nr_hw_queues(), debugfs_mutex is not held while creating debugfs entries for hctxs. Hence add debugfs_mutex there, it's safe because queue is not frozen. Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-02 07:05:19 -07:00
Yu Kuai	5ae4b12ee6	blk-mq-debugfs: remove blk_mq_debugfs_unregister_rqos() Because this helper is only used by iocost and iolatency, while they don't have debugfs entries. Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-02 07:05:19 -07:00
Yu Kuai	70bafa5e31	blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static Because it's only used inside blk-mq-debugfs.c now. Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-02 07:05:19 -07:00
Yu Kuai	3c17a346ff	blk-rq-qos: fix possible debugfs_mutex deadlock Currently rq-qos debugfs entries are created from rq_qos_add(), while rq_qos_add() can be called while queue is still frozen. This can deadlock because creating new entries can trigger fs reclaim. Fix this problem by delaying creating rq-qos debugfs entries after queue is unfrozen. - For wbt, 1) it can be initialized by default, fix it by calling new helper after wbt_init() from wbt_init_enable_default(); 2) it can be initialized by sysfs, fix it by calling new helper after queue is unfrozen from wbt_set_lat(). - For iocost and iolatency, they can only be initialized by blkcg configuration, however, they don't have debugfs entries for now, hence they are not handled yet. Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-02 07:05:19 -07:00
Yu Kuai	3f0bea9f3b	blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos There is already a helper blk_mq_debugfs_register_rqos() to register one rqos, however this helper is called synchronously when the rqos is created with queue frozen. Prepare to fix possible deadlock to create blk-mq debugfs entries while queue is still frozen. Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-02 07:05:19 -07:00
Yu Kuai	41afaeeda5	blk-wbt: fix possible deadlock to nest pcpu_alloc_mutex under q_usage_counter If wbt is disabled by default and user configures wbt by sysfs, queue will be frozen first and then pcpu_alloc_mutex will be held in blk_stat_alloc_callback(). Fix this problem by allocating memory first before queue frozen. Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-02 07:05:19 -07:00
Yu Kuai	2751b90051	blk-wbt: factor out a helper wbt_set_lat() To move implementation details inside blk-wbt.c, prepare to fix possible deadlock to call wbt_init() while queue is frozen in the next patch. Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-02 07:05:19 -07:00
Ondrej Kozina	06564bae93	sed-opal: ignore locking ranges array when not enabling SUM. The locking ranges count and the array items are always ignored unless Single User Mode (SUM) is requested in the activate method. It is useless to enforce limits of unused array in the non-SUM case. Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-02 07:04:43 -07:00
Jens Axboe	229f412574	Merge tag 'md-7.0-20260202' of git://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux into for-7.0/block Pull MD fixes from Yu Kuai: "Bug Fixes: - Fix return value of mddev_trylock (Xiao Ni) - Fix memory leak in raid1_run() (Zilin Guan) Maintainers: - Add Li Nan as mdraid reviewer (Li Nan)" * tag 'md-7.0-20260202' of git://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux: MAINTAINERS: Add Li Nan as md/raid reviewer md: fix return value of mddev_trylock md/raid1: fix memory leak in raid1_run()	2026-02-02 07:03:23 -07:00
Li Nan	b36844f7d1	MAINTAINERS: Add Li Nan as md/raid reviewer I've long contributed to and reviewed the md/raid subsystem. I've fixed many bugs and done code refactors, with dozens of patches merged. I now volunteer to work as a reviewer for this subsystem. Link: https://lore.kernel.org/linux-raid/20260202083203.3017096-1-linan666@huaweicloud.com Signed-off-by: Li Nan <linan122@huawei.com> Signed-off-by: Yu Kuai <yukuai@fnnas.com>	2026-02-02 20:31:40 +08:00
Xiao Ni	05c8de4f09	md: fix return value of mddev_trylock A return value of 0 is treaded as successful lock acquisition. In fact, a return value of 1 means getting the lock successfully. Link: https://lore.kernel.org/linux-raid/20260127073951.17248-1-xni@redhat.com Fixes: `9e59d60976` ("md: call del_gendisk in control path") Reported-by: Bart Van Assche <bvanassche@acm.org> Closes: https://lore.kernel.org/linux-raid/20250611073108.25463-1-xni@redhat.com/T/#mfa369ef5faa4aa58e13e6d9fdb88aecd862b8f2f Signed-off-by: Xiao Ni <xni@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Li Nan <linan122@huawei.com> Signed-off-by: Yu Kuai <yukuai@fnnas.com>	2026-02-02 15:39:55 +08:00
Zilin Guan	6abc7d5dcf	md/raid1: fix memory leak in raid1_run() raid1_run() calls setup_conf() which registers a thread via md_register_thread(). If raid1_set_limits() fails, the previously registered thread is not unregistered, resulting in a memory leak of the md_thread structure and the thread resource itself. Add md_unregister_thread() to the error path to properly cleanup the thread, which aligns with the error handling logic of other paths in this function. Compile tested only. Issue found using a prototype static analysis tool and code review. Link: https://lore.kernel.org/linux-raid/20260126071533.606263-1-zilin@seu.edu.cn Fixes: `97894f7d3c` ("md/raid1: use the atomic queue limit update APIs") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: Li Nan <linan122@huawei.com> Signed-off-by: Yu Kuai <yukuai@fnnas.com>	2026-02-02 15:35:03 +08:00
Ming Lei	5314d25afb	selftests: ublk: improve I/O ordering test with bpftrace Remove test_generic_01.sh since block layer may reorder I/O, making the test prone to false positives. Apply the improvements to test_generic_02.sh instead, which supposes for covering ublk dispatch io order. Rework test_generic_02 to verify that ublk dispatch doesn't reorder I/O by comparing request start order with completion order using bpftrace. The bpftrace script now: - Tracks each request's start sequence number in a map keyed by sector - On completion, verifies the request's start order matches expected completion order - Reports any out-of-order completions detected The test script: - Wait bpftrace BEGIN code block is run - Pins fio to CPU 0 for deterministic behavior - Uses block_io_start and block_rq_complete tracepoints - Checks bpftrace output for reordering errors Reported-and-tested-by: Alexander Atanasov <alex@zazolabs.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 14:56:28 -07:00
Ming Lei	d9a36ab302	selftests: ublk: reorganize tests into integrity and recover groups Move integrity-focused tests into new 'integrity' group: - test_null_04.sh -> test_integrity_01.sh - test_loop_08.sh -> test_integrity_02.sh Move recovery-focused tests into new 'recover' group: - test_generic_04.sh -> test_recover_01.sh - test_generic_05.sh -> test_recover_02.sh - test_generic_11.sh -> test_recover_03.sh - test_generic_14.sh -> test_recover_04.sh Update Makefile to reflect the reorganization. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 14:56:28 -07:00
Ming Lei	56a08b87f9	selftests: ublk: increase timeouts for parallel test execution When running tests in parallel with high JOBS count (e.g., JOBS=64), the existing timeouts can be insufficient due to system load: - Increase state wait loops from 20/50 to 100 iterations in _recover_ublk_dev(), __ublk_quiesce_dev(), and __ublk_kill_daemon() to handle slower state transitions under heavy load - Add --timeout=20 to udevadm settle calls to prevent indefinite hangs when udev event queue is overwhelmed by rapid device creation/deletion Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 14:56:28 -07:00
Ming Lei	64406dd2f6	selftests: ublk: add _ublk_sleep helper for parallel execution Add _ublk_sleep() helper function that uses different sleep times depending on whether tests run in parallel or sequential mode. Usage: _ublk_sleep <normal_secs> <parallel_secs> Export JOBS variable from Makefile so test scripts can detect parallel execution, and use _ublk_sleep in test_part_02.sh to handle the partition scan delay (1s normal, 5s parallel). Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 14:56:28 -07:00
Ming Lei	b6bbc3bec1	selftests: ublk: add group-based test targets Add convenient Makefile targets for running specific test groups: - run_generic, run_batch, run_null, run_loop, run_stripe, run_stress, etc. - run_all for running all tests Test groups are auto-detected from TEST_PROGS using pattern matching (test_<group>_<num>.sh -> group), and targets are generated dynamically using define/eval templates. Supports parallel execution via JOBS variable: - JOBS=1 (default): sequential with kselftest TAP output - JOBS>1: parallel execution with xargs -P Usage examples: make run_null # Sequential execution make run_stress JOBS=4 # Parallel with 4 jobs make run_all JOBS=8 # Run all tests with 8 parallel jobs With JOBS=8, running time of `make run_all` is reduced to 2m2s from 6m5s in my test VM. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 14:56:28 -07:00
Ming Lei	2021e6109d	selftests: ublk: track created devices for per-test cleanup Track device IDs in UBLK_DEVS array when created. Update _cleanup_test() to only delete devices created by this test instead of using 'del -a' which removes all devices. This prepares for running tests concurrently where each test should only clean up its own devices. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 14:56:28 -07:00
Ming Lei	92734a4f3a	selftests: ublk: add _ublk_del_dev helper function Add _ublk_del_dev() to delete a specific ublk device by ID and use it in all test scripts instead of calling UBLK_PROG directly. Also remove unused _remove_ublk_devices() function. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 14:56:28 -07:00
Ming Lei	842b6520e5	selftests: ublk: refactor test_loop_08 into separate functions Encapsulate each test case in its own function for better organization and maintainability: - _setup_device(): device and backfile initialization - _test_fill_and_verify(): initial data population - _test_corrupted_reftag(): reftag corruption detection test - _test_corrupted_data(): data corruption detection test - _test_bad_apptag(): apptag mismatch detection test Also fix temp file creation to use ${UBLK_TEST_DIR}/fio_err_XXXXX instead of creating in current directory. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 14:56:28 -07:00
Ming Lei	5af302a15a	selftests: ublk: simplify UBLK_TEST_DIR handling Remove intermediate TDIR variable and set UBLK_TEST_DIR directly in _prep_test(). Remove default initialization since the directory is created dynamically when tests run. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 14:56:28 -07:00
Caleb Sander Mateos	491af20b3c	ublk: remove "can't touch 'ublk_io' any more" comments The struct ublk_io is in fact accessed in __ublk_complete_rq() after the comment. But it's not racy to access the ublk_io between clearing its UBLK_IO_FLAG_OWNED_BY_SRV flag and completing the request, as no other thread can use the ublk_io in the meantime. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 06:38:43 -07:00
Alexander Atanasov	2feca79ef8	selftests: ublk: move test temp files into a sub directory Create and use a temporary directory for the files created during test runs. If TMPDIR environment variable is set use it as a base for the temporary directory path. TMPDIR=/mnt/scratch make run_tests and TMPDIR=/mnt/scratch ./test_generic_01.sh will place test directory under /mnt/scratch Signed-off-by: Alexander Atanasov <alex@zazolabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 06:36:41 -07:00
Alexander Atanasov	4e0d293af9	selftests: ublk: mark each test start and end time in dmesg Log test start and end time in dmesg, so generated log messages during the test run can be linked to specific test from the test suite. (switch to `date +%F %T`) Signed-off-by: Alexander Atanasov <alex@zazolabs.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 06:36:41 -07:00
Ming Lei	76334de7da	selftests: ublk: disable partition scan for integrity tests The null target doesn't handle IO, so disable partition scan to avoid IO failures caused by integrity verification during the kernel's partition table read. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 06:36:41 -07:00
Ming Lei	130975353b	selftests: ublk: refactor test_null_04 into separate functions Encapsulate each test case in its own function that creates the device, runs checks, and deletes only that device. This avoids calling _cleanup_test multiple times. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 06:36:41 -07:00
Ming Lei	7a30d3dfea	selftests: ublk: rename test_generic_15 to test_part_02 This test exercises partition scanning behavior, so move it to the test_part_* group for consistency. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 06:36:41 -07:00
Ming Lei	e07a2039b6	selftests: ublk: add selftest for UBLK_F_NO_AUTO_PART_SCAN Add test_part_01.sh to test the UBLK_F_NO_AUTO_PART_SCAN feature flag which allows suppressing automatic partition scanning during device startup while still allowing manual partition probing. The test verifies: - Normal behavior: partitions are auto-detected without the flag - With flag: partitions are not auto-detected during START_DEV - Manual scan: blockdev --rereadpt works with the flag Also update kublk tool to support --no_auto_part_scan option and recognize the feature flag. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 06:36:41 -07:00
Ming Lei	3a4d8bed0b	selftests: ublk: derive TID automatically from script name Add automatic TID derivation in test_common.sh based on the script filename. The TID is extracted by stripping the "test_" prefix and ".sh" suffix from the script name (e.g., test_loop_01.sh -> loop_01). This removes the need for each test script to manually define TID, reducing boilerplate and preventing potential mismatches between the script name and TID. Scripts can still override TID after sourcing test_common.sh if needed. Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 06:36:41 -07:00
Ming Lei	8443e2087e	ublk: add UBLK_F_NO_AUTO_PART_SCAN feature flag Add a new feature flag UBLK_F_NO_AUTO_PART_SCAN to allow users to suppress automatic partition scanning when starting a ublk device. This is useful for some cases in which user don't want to scan partitions. Users still can manually trigger partition scanning later when appropriate using standard tools (e.g., partprobe, blockdev --rereadpt). Reported-by: Yoav Cohen <yoav@nvidia.com> Link: https://lore.kernel.org/linux-block/DM4PR12MB63280C5637917C071C2F0D65A9A8A@DM4PR12MB6328.namprd12.prod.outlook.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 06:36:41 -07:00
Ming Lei	66d3af8d5d	ublk: check list membership before cancelling batch fetch command Add !list_empty(&fcmd->node) check in ublk_batch_cancel_cmd() to ensure the fcmd hasn't already been removed from the list. Once an fcmd is removed from the list, it's considered claimed by whoever removed it and will be freed by that path. Meantime switch to list_del_init() for deleting it from list. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 06:36:41 -07:00
Caleb Sander Mateos	373df2c025	ublk: drop ublk_ctrl_start_recovery() header argument ublk_ctrl_start_recovery() only uses its const struct ublksrv_ctrl_cmd * header argument to log the dev_id. But this value is already available in struct ublk_device's ub_number field. So log ub_number instead and drop the unused header argument. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 06:36:11 -07:00
Caleb Sander Mateos	ed9f54cc1e	ublk: use READ_ONCE() to read struct ublksrv_ctrl_cmd struct ublksrv_ctrl_cmd is part of the io_uring_sqe, which may lie in userspace-mapped memory. It's racy to access its fields with normal loads, as userspace may write to them concurrently. Use READ_ONCE() to copy the ublksrv_ctrl_cmd from the io_uring_sqe to the stack. Use the local copy in place of the one in the io_uring_sqe. Fixes: `87213b0d84` ("ublk: allow non-blocking ctrl cmds in IO_URING_F_NONBLOCK issue") Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 06:36:11 -07:00
Govindarajulu Varadarajan	da7e4b75e5	ublk: Validate SQE128 flag before accessing the cmd ublk_ctrl_cmd_dump() accesses (header *)sqe->cmd before IO_URING_F_SQE128 flag check. This could cause out of boundary memory access. Move the SQE128 flag check earlier in ublk_ctrl_uring_cmd() to return -EINVAL immediately if the flag is not set. Fixes: `71f28f3136` ("ublk_drv: add io_uring based userspace block driver") Signed-off-by: Govindarajulu Varadarajan <govind.varadar@gmail.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 06:36:11 -07:00
Damien Le Moal	da562d92e6	block: introduce bdev_rot() Introduce the helper function bdev_rot() to test if a block device is a rotational one. The existing function bdev_nonrot() which tests for the opposite condition is redefined using this new helper. This avoids the double negation (operator and name) that appears when testing if a block device is a rotational device, thus making the code a little easier to read. Call sites of bdev_nonrot() in the block layer are updated to use this new helper. Remaining users in other subsystems are left unchanged for now. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-30 08:11:09 -07:00
Caleb Sander Mateos	ad5f2e2908	ublk: restore auto buf unregister refcount optimization Commit `1ceeedb597` ("ublk: optimize UBLK_IO_UNREGISTER_IO_BUF on daemon task") optimized ublk request buffer unregistration to use a non-atomic reference count decrement when performed on the ublk_io's daemon task. The optimization applied to auto buffer unregistration, which happens as part of handling UBLK_IO_COMMIT_AND_FETCH_REQ on the daemon task. However, commit `b749965edd` ("ublk: remove ublk_commit_and_fetch()") reordered the ublk_sub_req_ref() for the completed request before the io_buffer_unregister_bvec() call. As a result, task_registered_buffers is already 0 when io_buffer_unregister_bvec() calls ublk_io_release() and the non-atomic refcount optimization doesn't apply. Move the io_buffer_unregister_bvec() call back to before ublk_need_complete_req() to restore the reference counting optimization. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Fixes: `b749965edd` ("ublk: remove ublk_commit_and_fetch()") Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-29 19:53:54 -07:00
Damien Le Moal	2719bd1ee1	block: introduce blk_queue_rot() To check if a request queue is for a rotational device, a double negation is needed with the pattern "!blk_queue_nonrot(q)". Simplify this with the introduction of the helper blk_queue_rot() which tests if a requests queue limit has the BLK_FEAT_ROTATIONAL feature set. All call sites of blk_queue_nonrot() are modified to use blk_queue_rot() and blk_queue_nonrot() definition removed. No functional changes. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-29 13:15:50 -07:00
Damien Le Moal	068f5b5ef5	block: cleanup queue limit features definition Unwrap the definition of BLK_FEAT_ATOMIC_WRITES and renumber this feature to be sequential with BLK_FEAT_SKIP_TAGSET_QUIESCE. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-29 13:15:50 -07:00
Ming Lei	0921abdcbd	ublk: document IO reference counting design Add comprehensive documentation for ublk's split reference counting model (io->ref + io->task_registered_buffers) above ublk_init_req_ref() given this model isn't very straightforward. Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-29 05:47:21 -07:00
Thorsten Blum	f46ebb9109	block: Replace snprintf with strscpy in check_partition Replace snprintf("%s", ...) with the faster and more direct strscpy(). Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-28 05:28:13 -07:00

1 2 3 4 5 ...

1412712 Commits