2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

954 Commits

Author SHA1 Message Date
Logan Gunthorpe
df6b0e205d md/raid5: Move stripe_request_ctx up
Move stripe_request_ctx up. No functional changes intended.

This will be necessary in the next patch to release the batch_last
in the context before sleeping.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 21:08:53 -06:00
Logan Gunthorpe
9734fe7bd5 md/raid5: Drop unnecessary call to r5c_check_stripe_cache_usage()
Now that raid5_get_active_stripe() has been refactored it is appearant
that r5c_check_stripe_cache_usage() doesn't need to be called in
the wait_for_stripe branch.

r5c_check_stripe_cache_usage() will only conditionally call
r5l_wake_reclaim(), but that function is called two lines later.

Drop the call for cleanup.

Reported-by: Martin Oliveira <martin.oliveira@eideticom.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 21:08:53 -06:00
Logan Gunthorpe
3514da58be md/raid5: Make is_inactive_blocked() helper
The logic to wait_for_stripe is difficult to parse being on so many
lines and with confusing operator precedence. Move it to a helper
function to make it easier to read.

No functional changes intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 21:08:53 -06:00
Logan Gunthorpe
5165ed40a1 md/raid5: Refactor raid5_get_active_stripe()
Refactor the raid5_get_active_stripe() to read more linearly in
the order it's typically executed.

The init_stripe() call is called if a free stripe is found and the
function is exited early which removes a lot of if (sh) checks and
unindents the following code.

Remove the while loop in favour of the 'goto retry' pattern, which
reduces indentation further. And use a 'goto wait_for_stripe' instead
of an additional indent seeing it is the unusual path and this makes
the code easier to read.

No functional changes intended. Will make subsequent changes
in patches easier to understand.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 21:08:53 -06:00
Jackie Liu
a20d636bee raid5: fix duplicate checks for rdev->saved_raid_disk
'first' will always be greater than or equal to 0, it is unnecessary to
repeat the 0 check, clean it up.

Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:22:46 -06:00
Logan Gunthorpe
ee1aa06ba3 md/raid5: Convert prepare_to_wait() to wait_woken() api
raid5_get_active_stripe() can sleep in various situations and it
is called by make_stripe_request() while inside the
prepare_to_wait()/finish_wait() section. Nested waits like this are
not supported.

This was noticed while making other changes that add different sleeps
to raid5_get_active_stripe() that caused a WARNING with
CONFIG_DEBUG_ATOMIC_SLEEP.

No ill effects have been noticed with the code as is, but theoretically
a nested and here could cause a dead lock so it should be fixed.

To fix this, convert the prepare_to_wait() call to use wake_woken()
which supports nested sleeps.

Link: https://lwn.net/Articles/628628/
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:22:43 -06:00
Logan Gunthorpe
b9f91d80de md/raid5: Fix sectors_to_do bitmap overflow in raid5_make_request()
For unaligned IO that have nearly maximum sectors, the number of stripes
will end up being one greater than the size of the bitmap. When this
happens, the last stripe in the IO will not be processed as it should
be, resulting in data corruption.

However, this is not normally seen when the backing block devices have
4K physical block sizes since the block layer will split the request
before that happens.

To fix this increase the bitmap size by one bit and ensure the full
number of stripes are checked when calling find_first_bit().

Reported-by: David Sloan <David.Sloan@eideticom.com>
Fixes: 7e55c60acf ("md/raid5: Pivot raid5_make_request()")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:22:41 -06:00
Logan Gunthorpe
9ad1a74ff0 md/raid5: Increase restriction on max segments per request
The block layer defaults the maximum segments to 128, which means
requests tend to get split around the 512KB depending on how many
pages can be merged. There's no such restriction in the raid5 code
so increase the limit to USHRT_MAX so that larger requests can be
sent as one.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:43 -06:00
Logan Gunthorpe
df1b620a3e md/raid5: Improve debug prints
Add a debug print for raid5_make_request() so that each request is
printed and add the logical sector number to the debug print in
__add_stripe_bio().

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:42 -06:00
Logan Gunthorpe
7e55c60acf md/raid5: Pivot raid5_make_request()
raid5_make_request() loops through every page in the request,
finds the appropriate stripe and adds the bio for that page in the
disk.

This causes a great deal of contention on the hash_lock and extra
work seeing each stripe must be found once for every data disk.

The number of times a stripe must be found can be reduced by pivoting
raid5_make_request() so that it loops through every stripe and then
loops through every disk in that stripe to see if the bio must be
added. This reduces the number of times the hash lock must be taken
by a factor equal to the number of data disks.

To accomplish this, the logical sectors that have already been added
must be tracked. Tracking them is done with a bitmap: the bits
for all pages are set at the start of the request and each bit
is cleared once the bio is added to a stripe.

Finding the next sector to be done is then just a call to
find_first_bit() so that sectors that have been done can simply be
skipped.

One minor downside is that the maximum sectors for a request must be
limited so that the bitmap can be appropriately sized on the stack.
This limit is arbitrarily chosen to be 256 stripe pages which works out
to 1MB if PAGE_SIZE == DEFAULT_STRIPE_SIZE. This doesn't actually
restrict the maximum request further seeing the default block queue
settings are used which restricts the number of segments to 128 (which
results in request sizes that are approximately 512KB).

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:42 -06:00
Logan Gunthorpe
486f605586 md/raid5: Check all disks in a stripe_head for reshape progress
When testing if a previous stripe has had reshape expand past it, use
the earliest or latest logical sector in all the disks for that stripe
head. This will allow adding multiple disks at a time in a subesquent
patch.

To do this cleaner, refactor the check into a helper function called
stripe_ahead_of_reshape().

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:42 -06:00
Logan Gunthorpe
4ad1d9849f md/raid5: Refactor add_stripe_bio()
Factor out two helper functions from add_stripe_bio(): one to check for
overlap (stripe_bio_overlaps()), and one to actually add the bio to the
stripe (__add_stripe_bio()). The latter function will always succeed.

This will be useful in the next patch so that overlap can be checked for
multiple disks before adding any

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:42 -06:00
Logan Gunthorpe
3312e6c887 md/raid5: Keep a reference to last stripe_head for batch
When batching, every stripe head has to find the previous stripe head to
add to the batch list. This involves taking the hash lock which is
highly contended during IO.

Instead of finding the previous stripe_head each time, store a
reference to the previous stripe_head in a pointer so that it doesn't
require taking the contended lock another time.

The reference to the previous stripe must be released before scheduling
and waiting for work to get done. Otherwise, it can hold up
raid5_activate_delayed() and deadlock.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:42 -06:00
Logan Gunthorpe
0a2d1694de md/raid5: Refactor for loop in raid5_make_request() into while loop
The for loop with retry label can be more cleanly expressed as a while
loop by moving the logical_sector increment into the success path.

No functional changes intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:42 -06:00
Logan Gunthorpe
4f35456076 md/raid5: Move read_seqcount_begin() into make_stripe_request()
Now that prepare_to_wait() isn't in the way, move read_sequcount_begin()
into make_stripe_request().

No functional changes intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:42 -06:00
Logan Gunthorpe
1cdb5b4170 md/raid5: Drop the do_prepare flag in raid5_make_request()
prepare_to_wait() can be reasonably called after schedule instead of
setting a flag and preparing in the next loop iteration.

This means that prepare_to_wait() will be called before
read_seqcount_begin(), but there shouldn't be any reason that the order
matters here. On the first iteration of the loop prepare_to_wait() is
already called first.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:42 -06:00
Logan Gunthorpe
f4aec6a097 md/raid5: Factor out helper from raid5_make_request() loop
Factor out the inner loop of raid5_make_request() into it's own helper
called make_stripe_request().

The helper returns a number of statuses: SUCCESS, RETRY,
SCHEDULE_AND_RETRY and FAIL. This makes the code a bit easier to
understand and allows the SCHEDULE_AND_RETRY path to be made common.

A context structure is added to contain do_flush. It will be used
more in subsequent patches for state that needs to be kept
outside the loop.

No functional changes intended. This will be cleaned up further in
subsequent patches to untangle the gen_lock and do_prepare logic
further.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:42 -06:00
Logan Gunthorpe
1baa1126e0 md/raid5: Move common stripe get code into new find_get_stripe() helper
Both uses of find_stripe() require a fairly complicated dance to
increment the reference count. Move this into a common find_get_stripe()
helper.

No functional changes intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:40 -06:00
Logan Gunthorpe
8757fef675 md/raid5: Move stripe_add_to_batch_list() call out of add_stripe_bio()
stripe_add_to_batch_list() is better done in the loop in make_request
instead of inside add_stripe_bio(). This is clearer and allows for
storing the batch_head state outside the loop in a subsequent patch.

The call to add_stripe_bio() in retry_aligned_read() is for read
and batching only applies to write. So it's impossible for batching
to happen at that call site.

No functional changes intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:40 -06:00
Logan Gunthorpe
27fb701046 md/raid5: Refactor raid5_make_request loop
Break immediately if raid5_get_active_stripe() returns NULL and deindent
the rest of the loop. Annotate this check with an unlikely().

This makes the code easier to read and reduces the indentation level.

No functional changes intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:40 -06:00
Logan Gunthorpe
a8bb304ca5 md/raid5: Factor out ahead_of_reshape() function
There are a few uses of an ugly ternary operator in raid5_make_request()
to check if a sector is a head of a reshape sector.

Factor this out into a simple helper called ahead_of_reshape().

No functional changes intended.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:40 -06:00
Logan Gunthorpe
6e3f50d30a md/raid5: Make logic blocking check consistent with logic that blocks
The check in raid5_make_request differs very slightly from the logic
that causes it to block lower down. This likely does not cause a bug
as the check is fuzzy anyway (as reshape may move on between the first
check and the subsequent check). However, make it consistent so it can
be cleaned up in a subsequent patch.

The condition which causes the schedule is:

 !(mddev->reshape_backwards ? logical_sector < conf->reshape_progress :
   logical_sector >= conf->reshape_progress) &&
  (mddev->reshape_backwards ? logical_sector < conf->reshape_safe :
   logical_sector >= conf->reshape_safe)

The condition that causes the early bailout is made to match this.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:40 -06:00
Logan Gunthorpe
c629f345b4 md/raid5: suspend the array for calls to log_exit()
The raid5-cache code relies on there being no IO in flight when
log_exit() is called. There are two places where this is not
guaranteed so add mddev_suspend() and mddev_resume() calls to these
sites.

The site in raid5_change_consistency_policy() is in the error path,
and another similar call site already has suspend/resume calls just
below it; so it should be equally safe to make that change here.

There is one remaining site in raid5_remove_disk() that we call log_exit()
without suspending the array. Unfortunately, as the comment stated, we
cannot call mddev_suspend from raid5d.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:31 -06:00
Linus Torvalds
c013d0af81 for-5.20/block-2022-07-29
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLko3gQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpmQaD/90NKFj4v8I456TUQyg1jimXEsL+e84E6o2
 ALWVb6JzQvlPVQXNLnK5YKIunMWOTtTMz0nyB8sVRwVJVJO0P5d7QopAkZM8fkyU
 MK5OCzoryENw4DTc2wJS4in6cSbGylIuN74wMzlf7+M67JTImfoZQhbTMcjwzZfn
 b3OlL6sID7zMXwGcuOJPZyUJICCpDhzdSF9JXqKma5PQuG2SBmQyvFxJAcsoFBPc
 YetnoRIOIN6yBvsIZaPaYq7XI9MIvF0e67EQtyCEHj4tHpyVnyDWkeObVFULsISU
 gGEKbkYPvNUzRAU5Q1NBBHh1tTfkf/MaUxTuZwoEwZ/s04IGBGMmrZGyfvdfzYo6
 M7NwSEg/TrUSNfTwn65mQi7uOXu1pGkJrqz84Flm8u9Qid9Vd7LExLG5p/ggnWdH
 5th93MDEmtEg29e9DXpEAuS5d0t3TtSvosflaKpyfNNfr+P0rWCN6GM/uW62VUTK
 ls69SQh/AQJRbg64jU4xper6WhaYtSXK7TKEnxJycoEn9gYNyCcdot2uekth0xRH
 ChHGmRlteiqe/y4uFWn/2dcxWjoleiHbFjTaiRL75WVl8wIDEjw02LGuoZ61Ss9H
 WOV+MT7KqNjBGe6lreUY+O/PO02dzmoR6heJXN19p8zr/pBuLCTGX7UpO7rzgaBR
 4N1HEozvIw==
 =celk
 -----END PGP SIGNATURE-----

Merge tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - Improve the type checking of request flags (Bart)

 - Ensure queue mapping for a single queues always picks the right queue
   (Bart)

 - Sanitize the io priority handling (Jan)

 - rq-qos race fix (Jinke)

 - Reserved tags handling improvements (John)

 - Separate memory alignment from file/disk offset aligment for O_DIRECT
   (Keith)

 - Add new ublk driver, userspace block driver using io_uring for
   communication with the userspace backend (Ming)

 - Use try_cmpxchg() to cleanup the code in various spots (Uros)

 - Finally remove bdevname() (Christoph)

 - Clean up the zoned device handling (Christoph)

 - Clean up independent access range support (Christoph)

 - Clean up and improve block sysfs handling (Christoph)

 - Clean up and improve teardown of block devices.

   This turns the usual two step process into something that is simpler
   to implement and handle in block drivers (Christoph)

 - Clean up chunk size handling (Christoph)

 - Misc cleanups and fixes (Bart, Bo, Dan, GuoYong, Jason, Keith, Liu,
   Ming, Sebastian, Yang, Ying)

* tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-block: (178 commits)
  ublk_drv: fix double shift bug
  ublk_drv: make sure that correct flags(features) returned to userspace
  ublk_drv: fix error handling of ublk_add_dev
  ublk_drv: fix lockdep warning
  block: remove __blk_get_queue
  block: call blk_mq_exit_queue from disk_release for never added disks
  blk-mq: fix error handling in __blk_mq_alloc_disk
  ublk: defer disk allocation
  ublk: rewrite ublk_ctrl_get_queue_affinity to not rely on hctx->cpumask
  ublk: fold __ublk_create_dev into ublk_ctrl_add_dev
  ublk: cleanup ublk_ctrl_uring_cmd
  ublk: simplify ublk_ch_open and ublk_ch_release
  ublk: remove the empty open and release block device operations
  ublk: remove UBLK_IO_F_PREFLUSH
  ublk: add a MAINTAINERS entry
  block: don't allow the same type rq_qos add more than once
  mmc: fix disk/queue leak in case of adding disk failure
  ublk_drv: fix an IS_ERR() vs NULL check
  ublk: remove UBLK_IO_F_INTEGRITY
  ublk_drv: remove unneeded semicolon
  ...
2022-08-02 13:46:35 -07:00
Linus Torvalds
d945404f74 block-5.19-2022-07-21
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLaEtsQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjuZD/4joqo0PVz+O5MUadOmFxQ+Js2RCzxzDYwN
 rfTlzsS2ybcoG2ehs3TPGU2c3Q+vODcgFOLJtrKPsyxEZM49uuu7+romi9/jxW4P
 Pju8DmA0037elEGmml4i3EiMoIGyz1ScXFpR2Ii+LLTD6y7hNxHYxC/6PgWWaO59
 gIydFz9PenJmz+orlpRI08vkvifdaCCdFfqgnP+KtvJsee2FPUdg2xpA021a1Tzj
 7u1qPS1JnylQP1VQypqXnZTiHzmzmkQL4GNQIo/pje/RBstTFb3HOPXEoeVvQKrl
 WjNntc80LZvYCMBzcksDkGAT8YLxXOckKvDiaYGObIoEoh/K6E7CBoIo4Smu3mwr
 NqoaN3jZXnfTtZ40VEnxI6HnIbXG8Jc96P0/7Us3lL/W/wP0fD4fqNlumbkeMW2d
 ualU91J6+YTGrYYWClSTSNyWQF/MNr1XRL9LQiO4GDs0fkIL1WoxaY/+AnUp0Jcc
 PF8tm83852PcP+YJOOYgXD04gBmg2JCOrdarE87x+fIKp2Z7V4HPeR/hH5iOu9F3
 CbXaBTICJlo7jmVUMKMh5yXvTyx4TqDvpWGbpQweXG4t8c77TPMI2cIHVdj1yqgj
 L015BORXfsWnI0yX4IA32RqcoNwZHrrb930OxGDvUlxTrjfodlaLypYJ1wDKwRMi
 H8LTxE6igA==
 =zviX
 -----END PGP SIGNATURE-----

Merge tag 'block-5.19-2022-07-21' of git://git.kernel.dk/linux-block

Pull block fix from Jens Axboe:
 "Just a single fix for missing error propagation for an allocation
  failure in raid5"

* tag 'block-5.19-2022-07-21' of git://git.kernel.dk/linux-block:
  md/raid5: missing error code in setup_conf()
2022-07-22 12:41:14 -07:00
Dan Carpenter
5f7ef4875f md/raid5: missing error code in setup_conf()
Return -ENOMEM if the allocation fails.  Don't return success.

Fixes: 8fbcba6b99 ("md/raid5: Cleanup setup_conf() error returns")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
2022-07-19 10:58:33 -07:00
Bart Van Assche
a9010741ce md/raid5: Use the enum req_op and blk_opf_t types
Improve static type checking by using the enum req_op type for variables
that represent a request operation and the new blk_opf_t type for
variables that represent request flags.

Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-37-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:32 -06:00
Roman Gushchin
e33c267ab7 mm: shrinkers: provide shrinkers with names
Currently shrinkers are anonymous objects.  For debugging purposes they
can be identified by count/scan function names, but it's not always
useful: e.g.  for superblock's shrinkers it's nice to have at least an
idea of to which superblock the shrinker belongs.

This commit adds names to shrinkers.  register_shrinker() and
prealloc_shrinker() functions are extended to take a format and arguments
to master a name.

In some cases it's not possible to determine a good name at the time when
a shrinker is allocated.  For such cases shrinker_debugfs_rename() is
provided.

The expected format is:
    <subsystem>-<shrinker_type>[:<instance>]-<id>
For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.

After this change the shrinker debugfs directory looks like:
  $ cd /sys/kernel/debug/shrinker/
  $ ls
    dquota-cache-16     sb-devpts-28     sb-proc-47       sb-tmpfs-42
    mm-shadow-18        sb-devtmpfs-5    sb-proc-48       sb-tmpfs-43
    mm-zspool:zram0-34  sb-hugetlbfs-17  sb-pstore-31     sb-tmpfs-44
    rcu-kfree-0         sb-hugetlbfs-33  sb-rootfs-2      sb-tmpfs-49
    sb-aio-20           sb-iomem-12      sb-securityfs-6  sb-tracefs-13
    sb-anon_inodefs-15  sb-mqueue-21     sb-selinuxfs-22  sb-xfs:vda1-36
    sb-bdev-3           sb-nsfs-4        sb-sockfs-8      sb-zsmalloc-19
    sb-bpf-32           sb-pipefs-14     sb-sysfs-26      thp-deferred_split-10
    sb-btrfs:vda2-24    sb-proc-25       sb-tmpfs-1       thp-zero-9
    sb-cgroup2-30       sb-proc-39       sb-tmpfs-27      xfs-buf:vda1-37
    sb-configfs-23      sb-proc-41       sb-tmpfs-29      xfs-inodegc:vda1-38
    sb-dax-11           sb-proc-45       sb-tmpfs-35
    sb-debugfs-7        sb-proc-46       sb-tmpfs-40

[roman.gushchin@linux.dev: fix build warnings]
  Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle
  Reported-by: kernel test robot <lkp@intel.com>
Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03 18:08:40 -07:00
Mikulas Patocka
617b365872 dm raid: fix KASAN warning in raid5_add_disks
There's a KASAN warning in raid5_add_disk when running the LVM testsuite.
The warning happens in the test
lvconvert-raid-reshape-linear_to_raid6-single-type.sh. We fix the warning
by verifying that rdev->saved_raid_disk is within limits.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-06-29 19:48:04 -04:00
Mikulas Patocka
1ebc2cec0b dm raid: fix KASAN warning in raid5_remove_disk
There's a KASAN warning in raid5_remove_disk when running the LVM
testsuite. We fix this warning by verifying that the "number" variable is
within limits.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-06-29 19:47:36 -04:00
Christoph Hellwig
913cce5a1e md: remove most calls to bdevname
Use the %pg format specifier to save on stack consumption and code size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-05-22 23:07:21 -07:00
Christoph Hellwig
3d50d368c9 raid5: don't set the discard_alignment queue limit
The discard_alignment queue limit is named a bit misleading means the
offset into the block device at which the discard granularity starts.
Setting it to the discard granularity as done by raid5 is mostly
harmless but also useless.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03 10:38:50 -06:00
Logan Gunthorpe
4631f39f05 md/raid5: Annotate functions that hold device_lock with __must_hold
A handful of functions note the device_lock must be held with a comment
but this is not comprehensive. Many other functions hold the lock when
taken so add an __must_hold() to each call to annotate when the lock is
held.

This makes it a bit easier to analyse device_lock.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-04-25 14:00:37 -07:00
Logan Gunthorpe
9aeb7f99a1 md/raid5: Annotate rdev/replacement access when mddev_lock is held
The mddev_lock should be held during raid5_remove_disk() which is when
the rdev/replacement pointers are modified. So any access to these
pointers marked __rcu should be safe whenever the mddev_lock is held.

There are numerous such access that currently produce sparse warnings.
Add a helper function, rdev_mdlock_deref() that wraps
rcu_dereference_protected() in all these instances.

This annotation fixes a number of sparse warnings.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-04-25 14:00:37 -07:00
Logan Gunthorpe
e38b043255 md/raid5: Annotate rdev/replacement accesses when nr_pending is elevated
There are a number of accesses to __rcu variables that should be safe
because nr_pending in the disk is known to be elevated.

Create a wrapper around rcu_dereference_protected() to annotate these
accesses and verify that nr_pending is non-zero.

This fixes a number of sparse warnings.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-04-25 14:00:36 -07:00
Logan Gunthorpe
b0920ede08 md/raid5: Add __rcu annotation to struct disk_info
rdev and replacement are protected in some circumstances with
rcu_dereference and synchronize_rcu (in raid5_remove_disk()). However,
they were not annotated with __rcu so a sparse warning is emitted for
every rcu_dereference() call.

Add the __rcu annotation and fix up the initialization with
RCU_INIT_POINTER, all pointer modifications with rcu_assign_pointer(),
a few cases where the pointer value is tested with rcu_access_pointer()
and one case where READ_ONCE() is used instead of rcu_dereference(),
a case in print_raid5_conf() that should have rcu_dereference() and
rcu_read_[un]lock() calls.

Additional sparse issues will be fixed up in further commits.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-04-25 14:00:36 -07:00
Logan Gunthorpe
8fbcba6b99 md/raid5: Cleanup setup_conf() error returns
Be more careful about the error returns. Most errors in this function
are actually ENOMEM, but it forcibly returns EIO if conf has been
allocated.

Instead return ret and ensure it is set appropriately before each goto
abort.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-04-25 14:00:36 -07:00
Mariusz Tkaczyk
57668f0a4c raid5: introduce MD_BROKEN
Raid456 module had allowed to achieve failed state. It was fixed by
fb73b357fb ("raid5: block failing device if raid will be failed").
This fix introduces a bug, now if raid5 fails during IO, it may result
with a hung task without completion. Faulty flag on the device is
necessary to process all requests and is checked many times, mainly in
analyze_stripe().
Allow to set faulty on drive again and set MD_BROKEN if raid is failed.

As a result, this level is allowed to achieve failed state again, but
communication with userspace (via -EBUSY status) will be preserved.

This restores possibility to fail array via #mdadm --set-faulty command
and will be fixed by additional verification on mdadm side.

Reproduction steps:
 mdadm -CR imsm -e imsm -n 3 /dev/nvme[0-2]n1
 mdadm -CR r5 -e imsm -l5 -n3 /dev/nvme[0-2]n1 --assume-clean
 mkfs.xfs /dev/md126 -f
 mount /dev/md126 /mnt/root/

 fio --filename=/mnt/root/file --size=5GB --direct=1 --rw=randrw
--bs=64k --ioengine=libaio --iodepth=64 --runtime=240 --numjobs=4
--time_based --group_reporting --name=throughput-test-job
--eta-newline=1 &

 echo 1 > /sys/block/nvme2n1/device/device/remove
 echo 1 > /sys/block/nvme1n1/device/device/remove

 [ 1475.787779] Call Trace:
 [ 1475.793111] __schedule+0x2a6/0x700
 [ 1475.799460] schedule+0x38/0xa0
 [ 1475.805454] raid5_get_active_stripe+0x469/0x5f0 [raid456]
 [ 1475.813856] ? finish_wait+0x80/0x80
 [ 1475.820332] raid5_make_request+0x180/0xb40 [raid456]
 [ 1475.828281] ? finish_wait+0x80/0x80
 [ 1475.834727] ? finish_wait+0x80/0x80
 [ 1475.841127] ? finish_wait+0x80/0x80
 [ 1475.847480] md_handle_request+0x119/0x190
 [ 1475.854390] md_make_request+0x8a/0x190
 [ 1475.861041] generic_make_request+0xcf/0x310
 [ 1475.868145] submit_bio+0x3c/0x160
 [ 1475.874355] iomap_dio_submit_bio.isra.20+0x51/0x60
 [ 1475.882070] iomap_dio_bio_actor+0x175/0x390
 [ 1475.889149] iomap_apply+0xff/0x310
 [ 1475.895447] ? iomap_dio_bio_actor+0x390/0x390
 [ 1475.902736] ? iomap_dio_bio_actor+0x390/0x390
 [ 1475.909974] iomap_dio_rw+0x2f2/0x490
 [ 1475.916415] ? iomap_dio_bio_actor+0x390/0x390
 [ 1475.923680] ? atime_needs_update+0x77/0xe0
 [ 1475.930674] ? xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
 [ 1475.938455] xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
 [ 1475.946084] xfs_file_read_iter+0xba/0xd0 [xfs]
 [ 1475.953403] aio_read+0xd5/0x180
 [ 1475.959395] ? _cond_resched+0x15/0x30
 [ 1475.965907] io_submit_one+0x20b/0x3c0
 [ 1475.972398] __x64_sys_io_submit+0xa2/0x180
 [ 1475.979335] ? do_io_getevents+0x7c/0xc0
 [ 1475.986009] do_syscall_64+0x5b/0x1a0
 [ 1475.992419] entry_SYSCALL_64_after_hwframe+0x65/0xca
 [ 1476.000255] RIP: 0033:0x7f11fc27978d
 [ 1476.006631] Code: Bad RIP value.
 [ 1476.073251] INFO: task fio:3877 blocked for more than 120 seconds.

Cc: stable@vger.kernel.org
Fixes: fb73b357fb ("raid5: block failing device if raid will be failed")
Reviewd-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Song Liu <song@kernel.org>
2022-04-25 14:00:35 -07:00
Christoph Hellwig
70200574cc block: remove QUEUE_FLAG_DISCARD
Just use a non-zero max_discard_sectors as an indicator for discard
support, similar to what is done for write zeroes.

The only places where needs special attention is the RAID5 driver,
which must clear discard support for security reasons by default,
even if the default stacking rules would allow for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd]
Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390]
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:49:59 -06:00
Christoph Hellwig
10f0d2a517 block: add a bdev_nonrot helper
Add a helper to check the nonrot flag based on the block_device instead
of having to poke into the block layer internal request_queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:49:59 -06:00
Linus Torvalds
561593a048 for-5.18/write-streams-2022-03-18
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmI1AHwQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgplPjEACVJzKg5NkxpdkDThvq5tejws9KxB/4mHJg
 NoDMcv1TF+Orsd/HNW6XrgYnbU0ObHom3568/xb8BNegRVFe7V4ME/4IYNRyGOmV
 qbfciu04L1UkJhI52CIidkOioFABL3r1zgLCIz5vk0Cv9X7Le9x0UabHxJf7u9S+
 Z3lNdyxezN0SGx8VT86l/7lSoHtG3VHO9IsQCuNGF02SB+6uGpXBlptbEoQ4nTxd
 T7/H9FNOe2Wf7eKvcOOds8UlvZYAfYcY0GcRrIOXdHIy25mKFWwn5cDgFTMOH5ID
 xXpm+JFkDkrfSW1o4FFPxbN9Z6RbVXbGCsrXlIragLO2MJQdXiIUxS1OPT5oAado
 H9MlX6QtkwziLW9zUWa/N/jmRjc2vzHAxD6JFg/wXxNdtY0kd8TQpaxwTB8mVDPN
 VCGutt7lJS1CQInQ+ppzbdqzzuLHC1RHAyWSmfUE9rb8cbjxtJBnSIorYRLUesMT
 GRwqVTXW0osxSgCb1iDiBCJANrX1yPZcemv4Wh1gzbT6IE9sWxWXsE5sy9KvswNc
 M+E4nu/TYYTfkynItJjLgmDLOoi+V0FBY6ba0mRPBjkriSP4AVlwsZLGVsAHQzuA
 o5paW1GjRCCwhIQ6+AzZIoOz6wqvprBlUgUkUneyYAQ2ZKC3pZi8zPnpoVdFucVa
 VaTzP71C1Q==
 =efaq
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block

Pull NVMe write streams removal from Jens Axboe:
 "This removes the write streams support in NVMe. No vendor ever really
  shipped working support for this, and they are not interested in
  supporting it.

  With the NVMe support gone, we have nothing in the tree that supports
  this. Remove passing around of the hints.

  The only discussion point in this patchset imho is the fact that the
  file specific write hint setting/getting fcntl helpers will now return
  -1/EINVAL like they did before we supported write hints. No known
  applications use these functions, I only know of one prototype that I
  help do for RocksDB, and that's not used. That said, with a change
  like this, it's always a bit controversial. Alternatively, we could
  just make them return 0 and pretend it worked. It's placement based
  hints after all"

* tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block:
  fs: remove fs.f_write_hint
  fs: remove kiocb.ki_hint
  block: remove the per-bio/request write hint
  nvme: remove support or stream based temperature hint
2022-03-26 11:51:46 -07:00
Linus Torvalds
6f2689a766 SCSI misc on 20220324
This series consists of the usual driver updates (qla2xxx, pm8001,
 libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates
 and bug fixes.  The high blast radius core update is the removal of
 write same, which affects block and several non-SCSI devices.  The
 other big change, which is more local, is the removal of the SCSI
 pointer.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYjzDQyYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishQMYAQDEWUGV
 6U0+736AHVtOfiMNfiRN79B1HfXVoHvemnPcTwD/UlndwFfy/3GGOtoZmqEpc73J
 Ec1HDuUCE18H1H2QAh0=
 =/Ty9
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This series consists of the usual driver updates (qla2xxx, pm8001,
  libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates
  and bug fixes.

  The high blast radius core update is the removal of write same, which
  affects block and several non-SCSI devices. The other big change,
  which is more local, is the removal of the SCSI pointer"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (281 commits)
  scsi: scsi_ioctl: Drop needless assignment in sg_io()
  scsi: bsg: Drop needless assignment in scsi_bsg_sg_io_fn()
  scsi: lpfc: Copyright updates for 14.2.0.0 patches
  scsi: lpfc: Update lpfc version to 14.2.0.0
  scsi: lpfc: SLI path split: Refactor BSG paths
  scsi: lpfc: SLI path split: Refactor Abort paths
  scsi: lpfc: SLI path split: Refactor SCSI paths
  scsi: lpfc: SLI path split: Refactor CT paths
  scsi: lpfc: SLI path split: Refactor misc ELS paths
  scsi: lpfc: SLI path split: Refactor VMID paths
  scsi: lpfc: SLI path split: Refactor FDISC paths
  scsi: lpfc: SLI path split: Refactor LS_RJT paths
  scsi: lpfc: SLI path split: Refactor LS_ACC paths
  scsi: lpfc: SLI path split: Refactor the RSCN/SCR/RDF/EDC/FARPR paths
  scsi: lpfc: SLI path split: Refactor PLOGI/PRLI/ADISC/LOGO paths
  scsi: lpfc: SLI path split: Refactor base ELS paths and the FLOGI path
  scsi: lpfc: SLI path split: Introduce lpfc_prep_wqe
  scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4
  scsi: lpfc: SLI path split: Refactor lpfc_iocbq
  scsi: lpfc: Use kcalloc()
  ...
2022-03-24 19:37:53 -07:00
Christoph Hellwig
03a6b195e8 raid5: initialize the stripe_head embeeded bios as needed
Use bio_init to initialize the bios when needed to the full state
instead of a partial initialization plus later setting of dev and op
and bio_reset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 22:55:13 -08:00
Christoph Hellwig
c75e707fe1 block: remove the per-bio/request write hint
With the NVMe support for this gone, there are no consumers of these hints
left, so remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220304175556.407719-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 12:45:57 -07:00
Christoph Hellwig
10fa225c33 scsi: md: Remove WRITE_SAME support
There are no more end-users of REQ_OP_WRITE_SAME left, so we can start
deleting it.

Link: https://lore.kernel.org/r/20220209082828.2629273-6-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-22 21:11:08 -05:00
Christoph Hellwig
abfc426d1b block: pass a block_device to bio_clone_fast
Pass a block_device to bio_clone_fast and __bio_clone_fast and give
the functions more suitable names.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-04 07:43:18 -07:00
Christoph Hellwig
a7c50c9404 block: pass a block_device and opf to bio_reset
Pass the block_device that we plan to use this bio for and the
operation to bio_reset to optimize the assigment.  A NULL block_device
can be passed, both for the passthrough case on a raw request_queue and
to temporarily avoid refactoring some nasty code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-20-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02 07:50:00 -07:00
Christoph Hellwig
49add4966d block: pass a block_device and opf to bio_init
Pass the block_device that we plan to use this bio for and the
operation to bio_init to optimize the assignment.  A NULL block_device
can be passed, both for the passthrough case on a raw request_queue and
to temporarily avoid refactoring some nasty code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-19-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02 07:49:59 -07:00
Xiao Ni
0c031fd37f md: Move alloc/free acct bioset in to personality
bioset acct is only needed for raid0 and raid5. Therefore, md_run only
allocates it for raid0 and raid5. However, this does not cover
personality takeover, which may cause uninitialized bioset. For example,
the following repro steps:

  mdadm -CR /dev/md0 -l1 -n2 /dev/loop0 /dev/loop1
  mdadm --wait /dev/md0
  mkfs.xfs /dev/md0
  mdadm /dev/md0 --grow -l5
  mount /dev/md0 /mnt

causes panic like:

[  225.933939] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  225.934903] #PF: supervisor instruction fetch in kernel mode
[  225.935639] #PF: error_code(0x0010) - not-present page
[  225.936361] PGD 0 P4D 0
[  225.936677] Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
[  225.937525] CPU: 27 PID: 1133 Comm: mount Not tainted 5.16.0-rc3+ #706
[  225.938416] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.module_el8.4.0+547+a85d02ba 04/01/2014
[  225.939922] RIP: 0010:0x0
[  225.940289] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[  225.941196] RSP: 0018:ffff88815897eff0 EFLAGS: 00010246
[  225.941897] RAX: 0000000000000000 RBX: 0000000000092800 RCX: ffffffff81370a39
[  225.942813] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000092800
[  225.943772] RBP: 1ffff1102b12fe04 R08: fffffbfff0b43c01 R09: fffffbfff0b43c01
[  225.944807] R10: ffffffff85a1e007 R11: fffffbfff0b43c00 R12: ffff88810eaaaf58
[  225.945757] R13: 0000000000000000 R14: ffff88810eaaafb8 R15: ffff88815897f040
[  225.946709] FS:  00007ff3f2505080(0000) GS:ffff888fb5e00000(0000) knlGS:0000000000000000
[  225.947814] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  225.948556] CR2: ffffffffffffffd6 CR3: 000000015aa5a006 CR4: 0000000000370ee0
[  225.949537] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  225.950455] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  225.951414] Call Trace:
[  225.951787]  <TASK>
[  225.952120]  mempool_alloc+0xe5/0x250
[  225.952625]  ? mempool_resize+0x370/0x370
[  225.953187]  ? rcu_read_lock_sched_held+0xa1/0xd0
[  225.953862]  ? rcu_read_lock_bh_held+0xb0/0xb0
[  225.954464]  ? sched_clock_cpu+0x15/0x120
[  225.955019]  ? find_held_lock+0xac/0xd0
[  225.955564]  bio_alloc_bioset+0x1ed/0x2a0
[  225.956080]  ? lock_downgrade+0x3a0/0x3a0
[  225.956644]  ? bvec_alloc+0xc0/0xc0
[  225.957135]  bio_clone_fast+0x19/0x80
[  225.957651]  raid5_make_request+0x1370/0x1b70
[  225.958286]  ? sched_clock_cpu+0x15/0x120
[  225.958797]  ? __lock_acquire+0x8b2/0x3510
[  225.959339]  ? raid5_get_active_stripe+0xce0/0xce0
[  225.959986]  ? lock_is_held_type+0xd8/0x130
[  225.960528]  ? rcu_read_lock_sched_held+0xa1/0xd0
[  225.961135]  ? rcu_read_lock_bh_held+0xb0/0xb0
[  225.961703]  ? sched_clock_cpu+0x15/0x120
[  225.962232]  ? lock_release+0x27a/0x6c0
[  225.962746]  ? do_wait_intr_irq+0x130/0x130
[  225.963302]  ? lock_downgrade+0x3a0/0x3a0
[  225.963815]  ? lock_release+0x6c0/0x6c0
[  225.964348]  md_handle_request+0x342/0x530
[  225.964888]  ? set_in_sync+0x170/0x170
[  225.965397]  ? blk_queue_split+0x133/0x150
[  225.965988]  ? __blk_queue_split+0x8b0/0x8b0
[  225.966524]  ? submit_bio_checks+0x3b2/0x9d0
[  225.967069]  md_submit_bio+0x127/0x1c0
[...]

Fix this by moving alloc/free of acct bioset to pers->run and pers->free.

While we are on this, properly handle md_integrity_register() error in
raid0_run().

Fixes: daee202471 (md: check level before create and exit io_acct_set)
Cc: stable@vger.kernel.org
Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
2022-01-06 08:37:03 -08:00
Vishal Verma
bf2c411bb1 md: raid456 add nowait support
Returns EAGAIN in case the raid456 driver would block waiting for reshape.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Vishal Verma <vverma@digitalocean.com>
Signed-off-by: Song Liu <song@kernel.org>
2022-01-06 08:37:02 -08:00
Davidlohr Bueso
770b1d216d md/raid5: play nice with PREEMPT_RT
raid_run_ops() relies on the implicitly disabled preemption for
its percpu ops, although this is really about CPU locality. This
breaks RT semantics as it can take regular (and thus sleeping)
spinlocks, such as stripe_lock.

Add a local_lock such that non-RT does not change and continues
to be just map to preempt_disable/enable, but makes RT happy as
the region will use a per-CPU spinlock and thus be preemptible
and still guarantee CPU locality.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Song Liu <songliubraving@fb.com>
2022-01-06 08:37:02 -08:00
Guoqing Jiang
5467948604 md: remove unused argument from md_new_event
Actually, mddev is not used by md_new_event.

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:50:33 -06:00
Guoqing Jiang
c6efe4341d md/raid5: call roundup_pow_of_two in raid5_run
Let's call roundup_pow_of_two here instead of open code.

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:50:31 -06:00
Sebastian Andrzej Siewior
252034e03f md/raid5: Replace deprecated CPU-hotplug functions.
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().

Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20210803141621.780504-16-bigeasy@linutronix.de
2021-08-28 01:46:17 +02:00
Linus Torvalds
440462198d for-5.14/drivers-2021-06-29
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmDbd5UQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpvsNEADCJKP81boFzRcdJo7EqaNDAzZyKOIg9Oq7
 4GZE0Wm0SgA6+04bKrNVd9KLcKvQ+NC1pK7UJemSSH2y9ir+zHfyYgAV0/+wFmYm
 NgHlDjBvf80XSI5wezcb6MxZT+R7IaIpDsW1ZvV9hFtPSncn5o2OIWiSdJtHT/Rv
 enlgZPc7OwNWoVMX8eR58IoO0k3S6GLpctUZHt/AUukaKgoOks0X523qhEPf3Upr
 RkbIZuqLWVgpdT6457iSE/OijUczD4thTI8bdprxzhgimOm2vV52sO6F5HtHc7GX
 qW+PWYUaiUk7UpObuOuyv0yyUG45ii73iY1W0w66RiyCjVTgtpdwwMQ38VlBcoOg
 zcE1jneAEJt6TiS6zfRaER/10JoCIG4gp1+apPuaXud/o3BqWI0cagVHAgaLziBI
 F7bDJkbJZIR6GrWMgemBI+mc5/LACBePxzPGLScKFptejtQ/ysfZQ6aCLROJWB2U
 4EnysAaUBf6tywj30JqfQvqFNGkHIgY95FKiXJW6GzqqwgBouNf48vS15BgkwI+2
 EijcqUhlOVNfc3RIc0ZL5c9KcPIN9t5sqBrWZe3wgCErhxAx6w6Za9nDdP+US9bl
 /apCpvDFlu59g8n1wtkNE/uC+XqdKDwsplYhnfpX0FGni5wIknhQq3bSe4dPFgSn
 pG5VMrw3pA==
 =D6dS
 -----END PGP SIGNATURE-----

Merge tag 'for-5.14/drivers-2021-06-29' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:
 "Pretty calm round, mostly just NVMe and a bit of MD:

   - NVMe updates (via Christoph)
        - improve the APST configuration algorithm (Alexey Bogoslavsky)
        - look for StorageD3Enable on companion ACPI device
          (Mario Limonciello)
        - allow selecting the network interface for TCP connections
          (Martin Belanger)
        - misc cleanups (Amit Engel, Chaitanya Kulkarni, Colin Ian King,
          Christoph)
        - move the ACPI StorageD3 code to drivers/acpi/ and add quirks
          for certain AMD CPUs (Mario Limonciello)
        - zoned device support for nvmet (Chaitanya Kulkarni)
        - fix the rules for changing the serial number in nvmet
          (Noam Gottlieb)
        - various small fixes and cleanups (Dan Carpenter, JK Kim,
          Chaitanya Kulkarni, Hannes Reinecke, Wesley Sheng, Geert
          Uytterhoeven, Daniel Wagner)

   - MD updates (Via Song)
        - iostats rewrite (Guoqing Jiang)
        - raid5 lock contention optimization (Gal Ofri)

   - Fall through warning fix (Gustavo)

   - Misc fixes (Gustavo, Jiapeng)"

* tag 'for-5.14/drivers-2021-06-29' of git://git.kernel.dk/linux-block: (78 commits)
  nvmet: use NVMET_MAX_NAMESPACES to set nn value
  loop: Fix missing discard support when using LOOP_CONFIGURE
  nvme.h: add missing nvme_lba_range_type endianness annotations
  nvme: remove zeroout memset call for struct
  nvme-pci: remove zeroout memset call for struct
  nvmet: remove zeroout memset call for struct
  nvmet: add ZBD over ZNS backend support
  nvmet: add Command Set Identifier support
  nvmet: add nvmet_req_bio put helper for backends
  nvmet: add req cns error complete helper
  block: export blk_next_bio()
  nvmet: remove local variable
  nvmet: use nvme status value directly
  nvmet: use u32 type for the local variable nsid
  nvmet: use u32 for nvmet_subsys max_nsid
  nvmet: use req->cmd directly in file-ns fast path
  nvmet: use req->cmd directly in bdev-ns fast path
  nvmet: make ver stable once connection established
  nvmet: allow mn change if subsys not discovered
  nvmet: make sn stable once connection was established
  ...
2021-06-30 12:21:16 -07:00
Gal Ofri
97ae27252f md/raid5: avoid device_lock in read_one_chunk()
There is a lock contention on device_lock in read_one_chunk().
device_lock is taken to sync conf->active_aligned_reads and
conf->quiesce.
read_one_chunk() takes the lock, then waits for quiesce=0 (resumed)
before incrementing active_aligned_reads.
raid5_quiesce() takes the lock, sets quiesce=2 (in-progress), then waits
for active_aligned_reads to be zero before setting quiesce=1
(suspended).

Introduce a fast (lockless) path in read_one_chunk(): activate aligned
read without taking device_lock.  In case quiesce starts while
activating the aligned-read in fast path, deactivate it and revert to
old behavior (take device_lock and wait for quiesce to finish).

Add smp store/load in raid5_quiesce()/read_one_chunk() respectively to
gaurantee that read_one_chunk() does not miss an ongoing quiesce.

My setups:
1. 8 local nvme drives (each up to 250k iops).
2. 8 ram disks (brd).

Each setup with raid6 (6+2), 1024 io threads on a 96 cpu-cores (48 per
socket) system. Record both iops and cpu spent on this contention with
rand-read-4k. Record bw with sequential-read-128k.  Note: in most cases
cpu is still busy but due to "new" bottlenecks.

nvme:
              | iops           | cpu  | bw
-----------------------------------------------
without patch | 1.6M           | ~50% | 5.5GB/s
with patch    | 2M (throttled) | 0%   | 16GB/s (throttled)

ram (brd):
              | iops           | cpu  | bw
-----------------------------------------------
without patch | 2M             | ~80% | 24GB/s
with patch    | 4M             | 0%   | 55GB/s

CC: Song Liu <song@kernel.org>
CC: Neil Brown <neilb@suse.de>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Gal Ofri <gal.ofri@storing.io>
Signed-off-by: Song Liu <song@kernel.org>
2021-06-14 22:32:07 -07:00
Rikard Falkeborn
c32dc04059 md: Constify attribute_group structs
The attribute_group structs are never modified, they're only passed to
sysfs_create_group() and sysfs_remove_group(). Make them const to allow
the compiler to put them in read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Song Liu <song@kernel.org>
2021-06-14 22:32:07 -07:00
Guoqing Jiang
1147f58e10 md/raid5: avoid redundant bio clone in raid5_read_one_chunk
After enable io accounting, chunk read bio could be cloned twice which
is not good. To avoid such inefficiency, let's clone align_bio from
io_acct_set too, then we need only call md_account_bio in make_request
unconditionally.

Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>
2021-06-14 22:32:06 -07:00
Guoqing Jiang
c82aa1b767 md/raid5: move checking badblock before clone bio in raid5_read_one_chunk
We don't need to clone bio if the relevant region has badblock.

Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>
2021-06-14 22:32:06 -07:00
Guoqing Jiang
10764815ff md: add io accounting for raid0 and raid5
We introduce a new bioset (io_acct_set) for raid0 and raid5 since they
don't own clone infrastructure to accounting io. And the bioset is added
to mddev instead of to raid0 and raid5 layer, because with this way, we
can put common functions to md.h and reuse them in raid0 and raid5.

Also struct md_io_acct is added accordingly which includes io start_time,
the origin bio and cloned bio. Then we can call bio_{start,end}_io_acct
to get related io status.

Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>
2021-06-14 22:32:06 -07:00
Christoph Hellwig
cc14626791 md/raid5: remove an incorrect assert in in_chunk_boundary
Now that the original bdev is stored in the bio this assert is incorrect
and will trigger for any partitioned raid5 device.

Reported-by: Florian Dazinger <spam02@dazinger.net>
Tested-by: Florian Dazinger <spam02@dazinger.net>
Cc: stable@vger.kernel.org # 5.12
Fixes: 309dca309f ("block: store a block_device pointer in struct bio"),
Reviewed-by:  Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2021-05-25 18:03:15 -07:00
Sami Tolvanen
4f0f586bf0 treewide: Change list_sort to use const pointers
list_sort() internally casts the comparison function passed to it
to a different type with constant struct list_head pointers, and
uses this pointer to call the functions, which trips indirect call
Control-Flow Integrity (CFI) checking.

Instead of removing the consts, this change defines the
list_cmp_func_t type and changes the comparison function types of
all list_sort() callers to use const pointers, thus avoiding type
mismatches.

Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
2021-04-08 16:04:22 -07:00
Linus Torvalds
9820b4dca0 for-5.12/drivers-2021-02-17
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmAtnZ8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpoOhD/9TZJN6mgvlmO2zuNlZwko0jD+HUYNRHdfa
 UiZhKs55ShlT/Wd8MMLcmMU2/+iztq4c/ZLK9eS7NHgKTu3GbgsICEZK+XLSTVJh
 gCwwEnY2dnBAIwBWxLeDG04DQvcwnOhVN1OSmhFbXG3dpElXSyfEjx00niGtl0tE
 5YtvmStpqkC0tHlxq8CMyyfaL1ODGmBK2uhDeQCO12SXIgKondJUaI3/H1l1dC5t
 +yg4PsSLqezo0oWmqdTEE7lcEJs4GK1ZOhIBLtWe6tl/zaVD6DuzJL83pChm8vF+
 qV4LCpJL0wUL7IG601AemFcUmEg34oC0FD6GYYhXxVOrlk43V6AfkycZ2rljNRop
 /+Ff+CmXfWPAwSfJi/vDlveCvgyAJNMpv4/GwynUM1563v3TYy0YjT6Jlz6M3TUn
 pS0MW7iUHj3t36U3JvcYnITqTPSfTMtYMOsWDx+V4+E9iGsQF1d7KZlCPMvjWAI2
 c3QuWjitXd10BZ2qUvSzAg6piv1taBKJxg1PsGlu707mHZp5J6VPAkYQn4rFgjua
 uCBbmRQCDF/wJ02IBmBUiMPP64UGbkhr+O3MILPqki967BdDDrLzjTs4e5zbMeu/
 qB9XRr1Yd95GCyS8I42OCC906NXrJ2R2E8dtV1XoASWGusL/wFZrLd1th8Uq8ibb
 Os+G4t1Qug==
 =vx6U
 -----END PGP SIGNATURE-----

Merge tag 'for-5.12/drivers-2021-02-17' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:

 - Remove the skd driver. It's been EOL for a long time (Damien)

 - NVMe pull requests
      - fix multipath handling of ->queue_rq errors (Chao Leng)
      - nvmet cleanups (Chaitanya Kulkarni)
      - add a quirk for buggy Amazon controller (Filippo Sironi)
      - avoid devm allocations in nvme-hwmon that don't interact well
        with fabrics (Hannes Reinecke)
      - sysfs cleanups (Jiapeng Chong)
      - fix nr_zones for multipath (Keith Busch)
      - nvme-tcp crash fix for no-data commands (Sagi Grimberg)
      - nvmet-tcp fixes (Sagi Grimberg)
      - add a missing __rcu annotation (Christoph)
      - failed reconnect fixes (Chao Leng)
      - various tracing improvements (Michal Krakowiak, Johannes
        Thumshirn)
      - switch the nvmet-fc assoc_list to use RCU protection (Leonid
        Ravich)
      - resync the status codes with the latest spec (Max Gurtovoy)
      - minor nvme-tcp improvements (Sagi Grimberg)
      - various cleanups (Rikard Falkeborn, Minwoo Im, Chaitanya
        Kulkarni, Israel Rukshin)

 - Floppy O_NDELAY fix (Denis)

 - MD pull request
      - raid5 chunk_sectors fix (Guoqing)

 - Use lore links (Kees)

 - Use DEFINE_SHOW_ATTRIBUTE for nbd (Liao)

 - loop lock scaling (Pavel)

 - mtip32xx PCI fixes (Bjorn)

 - bcache fixes (Kai, Dongdong)

 - Misc fixes (Tian, Yang, Guoqing, Joe, Andy)

* tag 'for-5.12/drivers-2021-02-17' of git://git.kernel.dk/linux-block: (64 commits)
  lightnvm: pblk: Replace guid_copy() with export_guid()/import_guid()
  lightnvm: fix unnecessary NULL check warnings
  nvme-tcp: fix crash triggered with a dataless request submission
  block: Replace lkml.org links with lore
  nbd: Convert to DEFINE_SHOW_ATTRIBUTE
  nvme: add 48-bit DMA address quirk for Amazon NVMe controllers
  nvme-hwmon: rework to avoid devm allocation
  nvmet: remove else at the end of the function
  nvmet: add nvmet_req_subsys() helper
  nvmet: use min of device_path and disk len
  nvmet: use invalid cmd opcode helper
  nvmet: use invalid cmd opcode helper
  nvmet: add helper to report invalid opcode
  nvmet: remove extra variable in id-ns handler
  nvmet: make nvmet_find_namespace() req based
  nvmet: return uniform error for invalid ns
  nvmet: set status to 0 in case for invalid nsid
  nvmet-fc: add a missing __rcu annotation to nvmet_fc_tgt_assoc.queues
  nvme-multipath: set nr_zones for zoned namespaces
  nvmet-tcp: fix potential race of tcp socket closing accept_work
  ...
2021-02-21 11:06:54 -08:00
Guoqing Jiang
c5eec74f25 md/raid5: cast chunk_sectors to sector_t value
Currently, raid5 calculates dev_sectors from chunk_sectors without
proper cast, which is problematic.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2021-02-03 22:48:16 -08:00
Christoph Hellwig
e82ed3a4fb md/raid6: refactor raid5_read_one_chunk
Refactor raid5_read_one_chunk so that all simple checks are done
before allocating the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
309dca309f block: store a block_device pointer in struct bio
Replace the gendisk pointer in struct bio with a pointer to the newly
improved struct block device.  From that the gendisk can be trivially
accessed with an extra indirection, but it also allows to directly
look up all information related to partition remapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-24 18:17:20 -07:00
Christoph Hellwig
1c02fca620 block: remove the request_queue argument to the block_bio_remap tracepoint
The request_queue can trivially be derived from the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04 09:42:00 -07:00
Song Liu
b44c018cdf md/raid5: fix oops during stripe resizing
KoWei reported crash during raid5 reshape:

[ 1032.252932] Oops: 0002 [#1] SMP PTI
[...]
[ 1032.252943] RIP: 0010:memcpy_erms+0x6/0x10
[...]
[ 1032.252947] RSP: 0018:ffffba1ac0c03b78 EFLAGS: 00010286
[ 1032.252949] RAX: 0000784ac0000000 RBX: ffff91bec3d09740 RCX: 0000000000001000
[ 1032.252951] RDX: 0000000000001000 RSI: ffff91be6781c000 RDI: 0000784ac0000000
[ 1032.252953] RBP: ffffba1ac0c03bd8 R08: 0000000000001000 R09: ffffba1ac0c03bf8
[ 1032.252954] R10: 0000000000000000 R11: 0000000000000000 R12: ffffba1ac0c03bf8
[ 1032.252955] R13: 0000000000001000 R14: 0000000000000000 R15: 0000000000000000
[ 1032.252958] FS:  0000000000000000(0000) GS:ffff91becf500000(0000) knlGS:0000000000000000
[ 1032.252959] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1032.252961] CR2: 0000784ac0000000 CR3: 000000031780a002 CR4: 00000000001606e0
[ 1032.252962] Call Trace:
[ 1032.252969]  ? async_memcpy+0x179/0x1000 [async_memcpy]
[ 1032.252977]  ? raid5_release_stripe+0x8e/0x110 [raid456]
[ 1032.252982]  handle_stripe_expansion+0x15a/0x1f0 [raid456]
[ 1032.252988]  handle_stripe+0x592/0x1270 [raid456]
[ 1032.252993]  handle_active_stripes.isra.0+0x3cb/0x5a0 [raid456]
[ 1032.252999]  raid5d+0x35c/0x550 [raid456]
[ 1032.253002]  ? schedule+0x42/0xb0
[ 1032.253006]  ? schedule_timeout+0x10e/0x160
[ 1032.253011]  md_thread+0x97/0x160
[ 1032.253015]  ? wait_woken+0x80/0x80
[ 1032.253019]  kthread+0x104/0x140
[ 1032.253022]  ? md_start_sync+0x60/0x60
[ 1032.253024]  ? kthread_park+0x90/0x90
[ 1032.253027]  ret_from_fork+0x35/0x40

This is because cache_size_mutex was unlocked too early in resize_stripes,
which races with grow_one_stripe() that grow_one_stripe() allocates a
stripe with wrong pool_size.

Fix this issue by unlocking cache_size_mutex after updating pool_size.

Cc: <stable@vger.kernel.org> # v4.4+
Reported-by: KoWei Sung <winders@amazon.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-10-08 22:38:10 -07:00
Yufen Yu
3891258443 md/raid5: reallocate page array after setting new stripe_size
When try to resize stripe_size, we also need to free old
shared page array and allocate new.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:45 -07:00
Yufen Yu
f16acaf328 md/raid5: resize stripe_head when reshape array
When reshape array, we try to reuse shared pages of old stripe_head,
and allocate more for the new one if needed.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:45 -07:00
Yufen Yu
046169f048 md/raid5: let multiple devices of stripe_head share page
In current implementation, grow_buffers() uses alloc_page() to
allocate the buffers for each stripe_head, i.e. allocate a page
for each dev[i] in stripe_head.

After setting stripe_size as a configurable value by writing
sysfs entry, it means that we always allocate 64K buffers, but
just use 4K of them when stripe_size is 4K in 64KB arm64.

To avoid wasting memory, we try to let multiple sh->dev share
one real page. That means, multiple sh->dev[i].page will point
to the only page with different offset. Example of 64K PAGE_SIZE
and 4K stripe_size as following:

                    64K PAGE_SIZE
          +---+---+---+---+------------------------------+
          |   |   |   |   |
          |   |   |   |   |
          +-+-+-+-+-+-+-+-+------------------------------+
            ^   ^   ^   ^
            |   |   |   +----------------------------+
            |   |   |                                |
            |   |   +-------------------+            |
            |   |                       |            |
            |   +----------+            |            |
            |              |            |            |
            +-+            |            |            |
              |            |            |            |
        +-----+-----+------+-----+------+-----+------+------+
sh      | offset(0) | offset(4K) | offset(8K) | offset(12K) |
 +      +-----------+------------+------------+-------------+
 +----> dev[0].page  dev[1].page  dev[2].page  dev[3].page

A new 'pages' array will be added into stripe_head to record shared
page used by this stripe_head. Allocate them when grow_buffers()
and free them when shrink_buffers().

After trying to share page, the users of sh->dev[i].page need to take
care of the related page offset: page of issued bio and page passed
to xor compution functions. But thanks for previous different page offset
supported. Here, we just need to set correct dev[i].offset.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:44 -07:00
Yufen Yu
4f86ff5580 md/raid6: let async recovery function support different page offset
For now, asynchronous raid6 recovery calculate functions are require
common offset for pages. But, we expect them to support different page
offset after introducing stripe shared page. Do that by simplily adding
page offset where each page address are referred. Then, replace the
old interface with the new ones in raid6 and raid6test.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:44 -07:00
Yufen Yu
d69454bc9f md/raid6: let syndrome computor support different page offset
For now, syndrome compute functions require common offset in the pages
array. However, we expect them to support different offset when try to
use shared page in the following. Simplily covert them by adding page
offset where each page address are referred.

Since the only caller of async_gen_syndrome() and async_syndrome_val()
are in raid6, we don't want to reserve the old interface but modify the
interface directly. After that, replacing old interfaces with new ones
for raid6 and raid6test.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:44 -07:00
Yufen Yu
a7c224a820 md/raid5: convert to new xor compution interface
We try to replace async_xor() and async_xor_val() with the new
introduced interface async_xor_offs() and async_xor_val_offs()
for raid456.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:44 -07:00
Yufen Yu
248728dd04 md/raid5: make async_copy_data() to support different page offset
ops_run_biofill() and ops_run_biodrain() will call async_copy_data()
to copy sh->dev[i].page from or to bio page. For now, it implies the
offset of dev[i].page is 0. But we want to support different page offset
in the following.

Thus, pass page offset to these functions and replace 'page_offset'
with 'page_offset + poff'.

No functional change.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:44 -07:00
Yufen Yu
7aba13b715 md/raid5: add a new member of offset into r5dev
Add a new member of offset into struct r5dev. It indicates the
offset of related dev[i].page. For now, since each device have a
privated page, the value is always 0. Thus, we set offset as 0
when allcate page in grow_buffers() and resize_stripes().

To support following different page offset, we try to use the page
offset rather than '0' directly for async_memcpy() and ops_run_io().

We try to support different page offset for xor compution functions
in the following. To avoid repeatly allocate a new array each time,
we add a memory region into scribble buffer to record offset.

No functional change.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-09-24 16:44:44 -07:00
Christoph Hellwig
1cb039f3dc bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag
The BDI_CAP_STABLE_WRITES is one of the few bits of information in the
backing_dev_info shared between the block drivers and the writeback code.
To help untangling the dependency replace it with a queue flag and a
superblock flag derived from it.  This also helps with the case of e.g.
a file system requiring stable writes due to its own checksumming, but
not forcing it on other users of the block device like the swap code.

One downside is that we an't support the stable_pages_required bdi
attribute in sysfs anymore.  It is replaced with a queue attribute which
also is writable for easier testing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24 13:43:39 -06:00
Christoph Hellwig
c2e4cd57cf block: lift setting the readahead size into the block layer
Drivers shouldn't really mess with the readahead size, as that is a VM
concept.  Instead set it based on the optimal I/O size by lifting the
algorithm from the md driver when registering the disk.  Also set
bdi->io_pages there as well by applying the same scheme based on
max_sectors.  To ensure the limits work well for stacking drivers a
new helper is added to update the readahead limits from the block
limits, which is also called from disk_stack_limits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24 13:43:39 -06:00
Christoph Hellwig
16ef510139 md: update the optimal I/O size on reshape
The raid5 and raid10 drivers currently update the read-ahead size,
but not the optimal I/O size on reshape.  To prepare for deriving the
read-ahead size from the optimal I/O size make sure it is updated
as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24 13:43:39 -06:00
Linus Torvalds
4d41ead6ea block-5.9-2020-08-28
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl9JYHoQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjMMD/9pp/oWPRv2pRXf7JBdQUh7zznIg84Uzv6c
 6Bkfq3eGRk/Rvv21kqbI5gZOnp3/se4M3DNBph89/6Q3idlHAot9UCIAVtsQ2+I6
 p/kNKMMkzIguxP0B6pb9aQTjuvSxUOuxNgGVDU8akPktjDj7Gv/gxIQVchAHqXob
 GL1vpBK/fvjiDHrXCCFVldCsm98P3FT7DzRvXkGXVW7Hg5KHuUVgCZtzeDMaB/dx
 YeGenGm5ZCOLvM1N+jgmF89XSqUmrWlZz8iqFVGJfHRf0G0QwDc2mpdNusnZdXDr
 sf1oG+uXszTTcK/w7/MPaHJyj6s7cKJOsXEnNUjU4VloOZvuemk3W8+t5qQ/+S/C
 usPuUa+QhiqoDLa7YyYgNSO6lx70s+S58TLAEgRbU7SHq/g8huOBQ3YT0MYuhX9C
 ZVGZfa2Yb+zWoQVFPqJcClCNjqzLihSo4qZnGSXiY7aDYM088mzLWnnFLkQ+axUq
 lecWHGOkF5ComTePQV31YCHx9uPtRZzyRJ+bENbFIcQc/CztpARGbADAaphZjSmV
 LXqHJyyNHlIo5IVITQm/qbdU9SdWS8bGNoYkB9Fa53sSWt6NIRfvXyWPE1OuyJ7C
 ICaXyMvo+pM5nXo28BnPwO30tfN5byqnvb7P0g3l7j3Ej/ujkhGPmWDyvj27cnve
 nrKgMZ9bXw==
 =UytK
 -----END PGP SIGNATURE-----

Merge tag 'block-5.9-2020-08-28' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - nbd timeout fix (Hou)

 - device size fix for loop LOOP_CONFIGURE (Martijn)

 - MD pull from Song with raid5 stripe size fix (Yufen)

* tag 'block-5.9-2020-08-28' of git://git.kernel.dk/linux-block:
  md/raid5: make sure stripe_size as power of two
  loop: Set correct device size when using LOOP_CONFIGURE
  nbd: restore default timeout when setting it to zero
2020-08-28 16:38:29 -07:00
Yufen Yu
6af10a33c5 md/raid5: make sure stripe_size as power of two
Commit 3b5408b98e ("md/raid5: support config stripe_size by sysfs
entry") make stripe_size as a configurable value. It just requires
stripe_size as multiple of 4KB.

In fact, we should make sure stripe_size as power of two. Otherwise,
stripe_shift which is the result of ilog2 can not represent the real
stripe_size. Then, stripe_hash() and stripe_hash_locks_hash() may
get unexpected value.

Fixes: 3b5408b98e ("md/raid5: support config stripe_size by sysfs entry")
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-08-27 22:41:03 -07:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Linus Torvalds
97d052ea3f A set of locking fixes and updates:
- Untangle the header spaghetti which causes build failures in various
     situations caused by the lockdep additions to seqcount to validate that
     the write side critical sections are non-preemptible.
 
   - The seqcount associated lock debug addons which were blocked by the
     above fallout.
 
     seqcount writers contrary to seqlock writers must be externally
     serialized, which usually happens via locking - except for strict per
     CPU seqcounts. As the lock is not part of the seqcount, lockdep cannot
     validate that the lock is held.
 
     This new debug mechanism adds the concept of associated locks.
     sequence count has now lock type variants and corresponding
     initializers which take a pointer to the associated lock used for
     writer serialization. If lockdep is enabled the pointer is stored and
     write_seqcount_begin() has a lockdep assertion to validate that the
     lock is held.
 
     Aside of the type and the initializer no other code changes are
     required at the seqcount usage sites. The rest of the seqcount API is
     unchanged and determines the type at compile time with the help of
     _Generic which is possible now that the minimal GCC version has been
     moved up.
 
     Adding this lockdep coverage unearthed a handful of seqcount bugs which
     have been addressed already independent of this.
 
     While generaly useful this comes with a Trojan Horse twist: On RT
     kernels the write side critical section can become preemtible if the
     writers are serialized by an associated lock, which leads to the well
     known reader preempts writer livelock. RT prevents this by storing the
     associated lock pointer independent of lockdep in the seqcount and
     changing the reader side to block on the lock when a reader detects
     that a writer is in the write side critical section.
 
  - Conversion of seqcount usage sites to associated types and initializers.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8xmPYTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoTuQEACyzQCjU8PgehPp9oMqWzaX2fcVyuZO
 QU2yw6gmz2oTz3ZHUNwdW8UnzGh2OWosK3kDruoD9FtSS51lER1/ISfSPCGfyqxC
 KTjOcB1Kvxwq/3LcCx7Zi3ZxWApat74qs3EhYhKtEiQ2Y9xv9rLq8VV1UWAwyxq0
 eHpjlIJ6b6rbt+ARslaB7drnccOsdK+W/roNj4kfyt+gezjBfojGRdMGQNMFcpnv
 shuTC+vYurAVIiVA/0IuizgHfwZiXOtVpjVoEWaxg6bBH6HNuYMYzdSa/YrlDkZs
 n/aBI/Xkvx+Eacu8b1Zwmbzs5EnikUK/2dMqbzXKUZK61eV4hX5c2xrnr1yGWKTs
 F/juh69Squ7X6VZyKVgJ9RIccVueqwR2EprXWgH3+RMice5kjnXH4zURp0GHALxa
 DFPfB6fawcH3Ps87kcRFvjgm6FBo0hJ1AxmsW1dY4ACFB9azFa2euW+AARDzHOy2
 VRsUdhL9CGwtPjXcZ/9Rhej6fZLGBXKr8uq5QiMuvttp4b6+j9FEfBgD4S6h8csl
 AT2c2I9LcbWqyUM9P4S7zY/YgOZw88vHRuDH7tEBdIeoiHfrbSBU7EQ9jlAKq/59
 f+Htu2Io281c005g7DEeuCYvpzSYnJnAitj5Lmp/kzk2Wn3utY1uIAVszqwf95Ul
 81ppn2KlvzUK8g==
 =7Gj+
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Thomas Gleixner:
 "A set of locking fixes and updates:

   - Untangle the header spaghetti which causes build failures in
     various situations caused by the lockdep additions to seqcount to
     validate that the write side critical sections are non-preemptible.

   - The seqcount associated lock debug addons which were blocked by the
     above fallout.

     seqcount writers contrary to seqlock writers must be externally
     serialized, which usually happens via locking - except for strict
     per CPU seqcounts. As the lock is not part of the seqcount, lockdep
     cannot validate that the lock is held.

     This new debug mechanism adds the concept of associated locks.
     sequence count has now lock type variants and corresponding
     initializers which take a pointer to the associated lock used for
     writer serialization. If lockdep is enabled the pointer is stored
     and write_seqcount_begin() has a lockdep assertion to validate that
     the lock is held.

     Aside of the type and the initializer no other code changes are
     required at the seqcount usage sites. The rest of the seqcount API
     is unchanged and determines the type at compile time with the help
     of _Generic which is possible now that the minimal GCC version has
     been moved up.

     Adding this lockdep coverage unearthed a handful of seqcount bugs
     which have been addressed already independent of this.

     While generally useful this comes with a Trojan Horse twist: On RT
     kernels the write side critical section can become preemtible if
     the writers are serialized by an associated lock, which leads to
     the well known reader preempts writer livelock. RT prevents this by
     storing the associated lock pointer independent of lockdep in the
     seqcount and changing the reader side to block on the lock when a
     reader detects that a writer is in the write side critical section.

   - Conversion of seqcount usage sites to associated types and
     initializers"

* tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  locking/seqlock, headers: Untangle the spaghetti monster
  locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header
  x86/headers: Remove APIC headers from <asm/smp.h>
  seqcount: More consistent seqprop names
  seqcount: Compress SEQCNT_LOCKNAME_ZERO()
  seqlock: Fold seqcount_LOCKNAME_init() definition
  seqlock: Fold seqcount_LOCKNAME_t definition
  seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
  hrtimer: Use sequence counter with associated raw spinlock
  kvm/eventfd: Use sequence counter with associated spinlock
  userfaultfd: Use sequence counter with associated spinlock
  NFSv4: Use sequence counter with associated spinlock
  iocost: Use sequence counter with associated spinlock
  raid5: Use sequence counter with associated spinlock
  vfs: Use sequence counter with associated spinlock
  timekeeping: Use sequence counter with associated raw spinlock
  xfrm: policy: Use sequence counters with associated lock
  netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  netfilter: conntrack: Use sequence counter with associated spinlock
  sched: tasks: Use sequence counter with associated spinlock
  ...
2020-08-10 19:07:44 -07:00
Linus Torvalds
e0fc99e21e for-5.9/drivers-20200803
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl8od3oQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgppkpD/9D+XqD9qYcYTj+ShVCc5+3RtMG5ZiAAX0y
 l4QXomentn/1Y0UYXFGJH7JLZWrKYT0QiktLtfpe5pmTqRUkckTIyJQlsHb+K6Dz
 lFjtywRK9pcFYgiWIUg80wlJKrTa8QdnrlS/Esn4YITKGRbgMIdFvq2jymXC+1ho
 RgodlgzcBUREgHSLo0H3cqEKA53fQiJhKC6CbFrFdrkpf2yUpcTfEDtpSwuIuPj3
 2AUed1qXUtNjdHciCn3N37OuHqXKAA9noXAWfg9Gx/5zfGUNX9QJvlsny1AopgS0
 jJvPSDVAhu/qRLHW6q/ZOT0JAlHegguuTAOtgMh2cMpAS5sumCAtltxVcI7Qnx41
 HalMpTefXsVoBo0gfjqldnIPt34ZNj5aH5GYaH/wPpSg6VkTVBJK8GuQDBvg27qT
 w+U/T6EzuqniWXh/P3COhfrMCR9ueUOY1qWCRwzomlpeIfBhCzidt2wUqIxX1TOA
 Q0Ltf0eERDevsZbE+tIm+VAAg98kHehcS2t8lfFYFO6/PKu2iJpJt/HtJbZNBE+W
 rm96E4qXRiy1UuL7D9vBkaWsbnosuNHgGQXx57GlokQU+2IGBmOxV52XHiSxxpXd
 AS1ZTd56ItmID8VaU09Pbf7ZFbiCgdEAxIbUFzaCuvo+lxryHFphIUARNi/zPnNT
 UC2OzunCqA==
 =oADH
 -----END PGP SIGNATURE-----

Merge tag 'for-5.9/drivers-20200803' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:

 - NVMe:
      - ZNS support (Aravind, Keith, Matias, Niklas)
      - Misc cleanups, optimizations, fixes (Baolin, Chaitanya, David,
        Dongli, Max, Sagi)

 - null_blk zone capacity support (Aravind)

 - MD:
      - raid5/6 fixes (ChangSyun)
      - Warning fixes (Damien)
      - raid5 stripe fixes (Guoqing, Song, Yufen)
      - sysfs deadlock fix (Junxiao)
      - raid10 deadlock fix (Vitaly)

 - struct_size conversions (Gustavo)

 - Set of bcache updates/fixes (Coly)

* tag 'for-5.9/drivers-20200803' of git://git.kernel.dk/linux-block: (117 commits)
  md/raid5: Allow degraded raid6 to do rmw
  md/raid5: Fix Force reconstruct-write io stuck in degraded raid5
  raid5: don't duplicate code for different paths in handle_stripe
  raid5-cache: hold spinlock instead of mutex in r5c_journal_mode_show
  md: print errno in super_written
  md/raid5: remove the redundant setting of STRIPE_HANDLE
  md: register new md sysfs file 'uuid' read-only
  md: fix max sectors calculation for super 1.0
  nvme-loop: remove extra variable in create ctrl
  nvme-loop: set ctrl state connecting after init
  nvme-multipath: do not fall back to __nvme_find_path() for non-optimized paths
  nvme-multipath: fix logic for non-optimized paths
  nvme-rdma: fix controller reset hang during traffic
  nvme-tcp: fix controller reset hang during traffic
  nvmet: introduce the passthru Kconfig option
  nvmet: introduce the passthru configfs interface
  nvmet: Add passthru enable/disable helpers
  nvmet: add passthru code to process commands
  nvme: export nvme_find_get_ns() and nvme_put_ns()
  nvme: introduce nvme_ctrl_get_by_path()
  ...
2020-08-05 10:51:40 -07:00
Linus Torvalds
99ea1521a0 Remove uninitialized_var() macro for v5.9-rc1
- Clean up non-trivial uses of uninitialized_var()
 - Update documentation and checkpatch for uninitialized_var() removal
 - Treewide removal of uninitialized_var()
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8oYLQWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJsfjEACvf0D3WL3H7sLHtZ2HeMwOgAzq
 il08t6vUscINQwiIIK3Be43ok3uQ1Q+bj8sr2gSYTwunV2IYHFferzgzhyMMno3o
 XBIGd1E+v1E4DGBOiRXJvacBivKrfvrdZ7AWiGlVBKfg2E0fL1aQbe9AYJ6eJSbp
 UGqkBkE207dugS5SQcwrlk1tWKUL089lhDAPd7iy/5RK76OsLRCJFzIerLHF2ZK2
 BwvA+NWXVQI6pNZ0aRtEtbbxwEU4X+2J/uaXH5kJDszMwRrgBT2qoedVu5LXFPi8
 +B84IzM2lii1HAFbrFlRyL/EMueVFzieN40EOB6O8wt60Y4iCy5wOUzAdZwFuSTI
 h0xT3JI8BWtpB3W+ryas9cl9GoOHHtPA8dShuV+Y+Q2bWe1Fs6kTl2Z4m4zKq56z
 63wQCdveFOkqiCLZb8s6FhnS11wKtAX4czvXRXaUPgdVQS1Ibyba851CRHIEY+9I
 AbtogoPN8FXzLsJn7pIxHR4ADz+eZ0dQ18f2hhQpP6/co65bYizNP5H3h+t9hGHG
 k3r2k8T+jpFPaddpZMvRvIVD8O2HvJZQTyY6Vvneuv6pnQWtr2DqPFn2YooRnzoa
 dbBMtpon+vYz6OWokC5QNWLqHWqvY9TmMfcVFUXE4AFse8vh4wJ8jJCNOFVp8On+
 drhmmImUr1YylrtVOw==
 =xHmk
 -----END PGP SIGNATURE-----

Merge tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull uninitialized_var() macro removal from Kees Cook:
 "This is long overdue, and has hidden too many bugs over the years. The
  series has several "by hand" fixes, and then a trivial treewide
  replacement.

   - Clean up non-trivial uses of uninitialized_var()

   - Update documentation and checkpatch for uninitialized_var() removal

   - Treewide removal of uninitialized_var()"

* tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  compiler: Remove uninitialized_var() macro
  treewide: Remove uninitialized_var() usage
  checkpatch: Remove awareness of uninitialized_var() macro
  mm/debug_vm_pgtable: Remove uninitialized_var() usage
  f2fs: Eliminate usage of uninitialized_var() macro
  media: sur40: Remove uninitialized_var() usage
  KVM: PPC: Book3S PR: Remove uninitialized_var() usage
  clk: spear: Remove uninitialized_var() usage
  clk: st: Remove uninitialized_var() usage
  spi: davinci: Remove uninitialized_var() usage
  ide: Remove uninitialized_var() usage
  rtlwifi: rtl8192cu: Remove uninitialized_var() usage
  b43: Remove uninitialized_var() usage
  drbd: Remove uninitialized_var() usage
  x86/mm/numa: Remove uninitialized_var() usage
  docs: deprecated.rst: Add uninitialized_var()
2020-08-04 13:49:43 -07:00
ChangSyun Peng
45a4d8fd6c md/raid5: Allow degraded raid6 to do rmw
Degraded raid6 always do reconstruct-write now. With raid6 xor supported,
we can do rmw in degraded raid6. This patch can reduce many read IOs to
improve performance.

If the failed disk is P, Q or the disk we want to write to, we may need to
do reconstruct-write in max degraded raid6. In this situation we can not
read enough data from handle_stripe_dirtying() so we have to set force_rcw
in handle_stripe_fill() to read all data.

Reviewed-by: Alex Wu <alexwu@synology.com>
Reviewed-by: BingJing Chang <bingjingc@synology.com>
Reviewed-by: Danny Shih <dannyshih@synology.com>
Signed-off-by: ChangSyun Peng <allenpeng@synology.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-08-02 23:54:31 -07:00
ChangSyun Peng
a1c6ae3d9f md/raid5: Fix Force reconstruct-write io stuck in degraded raid5
In degraded raid5, we need to read parity to do reconstruct-write when
data disks fail. However, we can not read parity from
handle_stripe_dirtying() in force reconstruct-write mode.

Reproducible Steps:

1. Create degraded raid5
mdadm -C /dev/md2 --assume-clean -l5 -n3 /dev/sda2 /dev/sdb2 missing
2. Set rmw_level to 0
echo 0 > /sys/block/md2/md/rmw_level
3. IO to raid5

Now some io may be stuck in raid5. We can use handle_stripe_fill() to read
the parity in this situation.

Cc: <stable@vger.kernel.org> # v4.4+
Reviewed-by: Alex Wu <alexwu@synology.com>
Reviewed-by: BingJing Chang <bingjingc@synology.com>
Reviewed-by: Danny Shih <dannyshih@synology.com>
Signed-off-by: ChangSyun Peng <allenpeng@synology.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-08-02 23:54:25 -07:00
Guoqing Jiang
3a31cf3d21 raid5: don't duplicate code for different paths in handle_stripe
As we can see, R5_LOCKED is set and s.locked is increased whether
R5_ReWrite is set or not, so move it to common path.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-08-02 23:03:52 -07:00
Guoqing Jiang
e3914d596f md/raid5: remove the redundant setting of STRIPE_HANDLE
The flag is already set before compare rcw with rmw, so it is
not necessary to do it again.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-08-02 23:03:51 -07:00
Ahmed S. Darwish
0a87b25ff2 raid5: Use sequence counter with associated spinlock
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-20-a.darwish@linutronix.de
2020-07-29 16:14:27 +02:00
Yufen Yu
83c3e5e17b md/raid5: use do_div() for 64 bit divisions in raid5_sync_request
We get compilation error on 32-bit architectures (e.g. m68k), as:

  ERROR: modpost: "__udivdi3" [drivers/md/raid456.ko] undefined!

Since 'sync_blocks' is defined as u64, use do_div() to fix this error.

Fixes: c911c46c01 ("md/raid456: convert macro STRIPE_* to RAID5_STRIPE_*")
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-07-22 22:49:46 -07:00
Yufen Yu
3b5408b98e md/raid5: support config stripe_size by sysfs entry
Adding a new 'stripe_size' sysfs entry to set and show stripe_size.
stripe_size should not be bigger than PAGE_SIZE, and it requires to
be multiple of 4096. We can adjust stripe_size by writing value into
sysfs entry, likely, set stripe_size as 16KB:

          echo 16384 > /sys/block/md1/md/stripe_size

Show current stripe_size value:

          cat /sys/block/md1/md/stripe_size

For PAGE_SIZE is equal to 4096, 'stripe_size' can just be read.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-07-21 17:18:22 -07:00
Yufen Yu
e236858243 md/raid5: set default stripe_size as 4096
In RAID5, if issued bio size is bigger than stripe_size, it will be
split in the unit of stripe_size and process them one by one. Even
for size less then stripe_size, RAID5 also request data from disk at
least of stripe_size.

Nowdays, stripe_size is equal to the value of PAGE_SIZE. Since filesystem
usually issue bio in the unit of 4KB, there is no problem for PAGE_SIZE
as 4KB. But, for 64KB PAGE_SIZE, bio from filesystem requests 4KB data
while RAID5 issue IO at least stripe_size (64KB) each time. That will
waste resource of disk bandwidth and compute xor.

To avoding the waste, we want to make stripe_size configurable. This
patch just set default stripe_size as 4096. User can also set the value
bigger than 4KB for some special requirements, such as we know the
issued io size is more than 4KB.

To evaluate the new feature, we create raid5 device '/dev/md5' with
4 SSD disk and test it on arm64 machine with 64KB PAGE_SIZE.

1) We format /dev/md5 with mkfs.ext4 and mount ext4 with default
 configure on /mnt directory. Then, trying to test it by dbench with
 command: dbench -D /mnt -t 1000 10. Result show as:

 'stripe_size = 64KB'

  Operation      Count    AvgLat    MaxLat
  ----------------------------------------
  NTCreateX    9805011     0.021    64.728
  Close        7202525     0.001     0.120
  Rename        415213     0.051    44.681
  Unlink       1980066     0.079    93.147
  Deltree          240     1.793     6.516
  Mkdir            120     0.004     0.007
  Qpathinfo    8887512     0.007    37.114
  Qfileinfo    1557262     0.001     0.030
  Qfsinfo      1629582     0.012     0.152
  Sfileinfo     798756     0.040    57.641
  Find         3436004     0.019    57.782
  WriteX       4887239     0.021    57.638
  ReadX        15370483     0.005    37.818
  LockX          31934     0.003     0.022
  UnlockX        31933     0.001     0.021
  Flush         687205    13.302   530.088

 Throughput 307.799 MB/sec  10 clients  10 procs  max_latency=530.091 ms
 -------------------------------------------------------

 'stripe_size = 4KB'

  Operation      Count    AvgLat    MaxLat
  ----------------------------------------
  NTCreateX    11999166     0.021    36.380
  Close        8814128     0.001     0.122
  Rename        508113     0.051    29.169
  Unlink       2423242     0.070    38.141
  Deltree          300     1.885     7.155
  Mkdir            150     0.004     0.006
  Qpathinfo    10875921     0.007    35.485
  Qfileinfo    1905837     0.001     0.032
  Qfsinfo      1994304     0.012     0.125
  Sfileinfo     977450     0.029    26.489
  Find         4204952     0.019     9.361
  WriteX       5981890     0.019    27.804
  ReadX        18809742     0.004    33.491
  LockX          39074     0.003     0.025
  UnlockX        39074     0.001     0.014
  Flush         841022    10.712   458.848

 Throughput 376.777 MB/sec  10 clients  10 procs  max_latency=458.852 ms
 -------------------------------------------------------

 It show that setting stripe_size as 4KB has higher thoughput, i.e.
 (376.777 vs 307.799) and has smaller latency than that setting as 64KB.

 2) We try to evaluate IO throughput for /dev/md5 by fio with config:

 [4KB randwrite]
 direct=1
 numjob=2
 iodepth=64
 ioengine=libaio
 filename=/dev/md5
 bs=4KB
 rw=randwrite

 [64KB write]
 direct=1
 numjob=2
 iodepth=64
 ioengine=libaio
 filename=/dev/md5
 bs=1MB
 rw=write

 The result as follow:

               +                   +
               | stripe_size(64KB) | stripe_size(4KB)
 +----------------------------------------------------+
 4KB randwrite |     15MB/s        |      100MB/s
 +----------------------------------------------------+
 1MB write     |   1000MB/s        |      700MB/s

 The result show that when size of io is bigger than 4KB (64KB),
 64KB stripe_size has much higher IOPS. But for 4KB randwrite, that
 means, size of io issued to device are smaller, 4KB stripe_size
 have better performance.

Normally, default value (4096) can get relatively good performance.
But if each issued io is bigger than 4096, setting value more than
4096 may get better performance.

Here, we just set default stripe_size as 4096, and we will try to
support setting different stripe_size by sysfs interface in the
following patch.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-07-21 17:18:17 -07:00
Yufen Yu
c911c46c01 md/raid456: convert macro STRIPE_* to RAID5_STRIPE_*
Convert macro STRIPE_SIZE, STRIPE_SECTORS and STRIPE_SHIFT to
RAID5_STRIPE_SIZE(), RAID5_STRIPE_SECTORS() and RAID5_STRIPE_SHIFT().

This patch is prepare for the following adjustable stripe_size.
It will not change any existing functionality.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-07-21 17:18:12 -07:00
Kees Cook
3f649ab728 treewide: Remove uninitialized_var() usage
Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
	xargs perl -pi -e \
		's/\buninitialized_var\(([^\)]+)\)/\1/g;
		 s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-16 12:35:15 -07:00
Guoqing Jiang
1684e97538 raid5: remove the meaningless check in raid5_make_request
We can't guarntee the batched stripe to be set with STRIPE_HANDLE since
there are lots of functions could set the flag, such as sync_request,
ops_complete_* and end_{read,write}_request etc.

Also clear_batch_ready called in handle_stripe ensures the batched list
can't continue to be handled by handle_stripe.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-07-16 10:14:08 -07:00
Guoqing Jiang
cb9902db38 raid5: put the comment of clear_batch_ready to the right place
To make people understand the function well, let's put the comment to
the right place.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-07-16 10:14:04 -07:00
Guoqing Jiang
a377a472b9 raid5: call clear_batch_ready before set STRIPE_ACTIVE
We tried to only put the head sh of batch list to handle_list, then the
handle_stripe doesn't handle other members in the batch list. However,
we still got the calltrace in break_stripe_batch_list.

[593764.644269] stripe state: 2003
kernel: [593764.644299] ------------[ cut here ]------------
kernel: [593764.644308] WARNING: CPU: 12 PID: 856 at drivers/md/raid5.c:4625 break_stripe_batch_list+0x203/0x240 [raid456]
[...]
kernel: [593764.644363] Call Trace:
kernel: [593764.644370]  handle_stripe+0x907/0x20c0 [raid456]
kernel: [593764.644376]  ? __wake_up_common_lock+0x89/0xc0
kernel: [593764.644379]  handle_active_stripes.isra.57+0x35f/0x570 [raid456]
kernel: [593764.644382]  ? raid5_wakeup_stripe_thread+0x96/0x1f0 [raid456]
kernel: [593764.644385]  raid5d+0x480/0x6a0 [raid456]
kernel: [593764.644390]  ? md_thread+0x11f/0x160
kernel: [593764.644392]  md_thread+0x11f/0x160
kernel: [593764.644394]  ? wait_woken+0x80/0x80
kernel: [593764.644396]  kthread+0xfc/0x130
kernel: [593764.644398]  ? find_pers+0x70/0x70
kernel: [593764.644399]  ? kthread_create_on_node+0x70/0x70
kernel: [593764.644401]  ret_from_fork+0x1f/0x30

As we can see, the stripe was set with STRIPE_ACTIVE and STRIPE_HANDLE,
and only handle_stripe could set those flags then return. And since the
stipe was already in the batch list, we need to return earlier before
set the two flags.

And after dig a little about git history especially commit 3664847d95
("md/raid5: fix a race condition in stripe batch"), it seems the batched
stipe still could be handled by handle_stipe, then handle_stipe needs to
return earlier if clear_batch_ready to return true.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-07-16 10:12:18 -07:00
Damien Le Moal
2aada5b14b md: raid5: Fix compilation warning
Remove the if statement around the calls to sysfs_link_rdev() to avoid
the compilation warning "suggest braces around empty body in an ‘if’
statement" when compiling with W=1.

Also fix function description comments to avoid kdoc format warnings.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-07-15 22:46:07 -07:00
Junxiao Bi
e1a86dbbbd md: fix deadlock causing by sysfs_notify
The following deadlock was captured. The first process is holding 'kernfs_mutex'
and hung by io. The io was staging in 'r1conf.pending_bio_list' of raid1 device,
this pending bio list would be flushed by second process 'md127_raid1', but
it was hung by 'kernfs_mutex'. Using sysfs_notify_dirent_safe() to replace
sysfs_notify() can fix it. There were other sysfs_notify() invoked from io
path, removed all of them.

 PID: 40430  TASK: ffff8ee9c8c65c40  CPU: 29  COMMAND: "probe_file"
  #0 [ffffb87c4df37260] __schedule at ffffffff9a8678ec
  #1 [ffffb87c4df372f8] schedule at ffffffff9a867f06
  #2 [ffffb87c4df37310] io_schedule at ffffffff9a0c73e6
  #3 [ffffb87c4df37328] __dta___xfs_iunpin_wait_3443 at ffffffffc03a4057 [xfs]
  #4 [ffffb87c4df373a0] xfs_iunpin_wait at ffffffffc03a6c79 [xfs]
  #5 [ffffb87c4df373b0] __dta_xfs_reclaim_inode_3357 at ffffffffc039a46c [xfs]
  #6 [ffffb87c4df37400] xfs_reclaim_inodes_ag at ffffffffc039a8b6 [xfs]
  #7 [ffffb87c4df37590] xfs_reclaim_inodes_nr at ffffffffc039bb33 [xfs]
  #8 [ffffb87c4df375b0] xfs_fs_free_cached_objects at ffffffffc03af0e9 [xfs]
  #9 [ffffb87c4df375c0] super_cache_scan at ffffffff9a287ec7
 #10 [ffffb87c4df37618] shrink_slab at ffffffff9a1efd93
 #11 [ffffb87c4df37700] shrink_node at ffffffff9a1f5968
 #12 [ffffb87c4df37788] do_try_to_free_pages at ffffffff9a1f5ea2
 #13 [ffffb87c4df377f0] try_to_free_mem_cgroup_pages at ffffffff9a1f6445
 #14 [ffffb87c4df37880] try_charge at ffffffff9a26cc5f
 #15 [ffffb87c4df37920] memcg_kmem_charge_memcg at ffffffff9a270f6a
 #16 [ffffb87c4df37958] new_slab at ffffffff9a251430
 #17 [ffffb87c4df379c0] ___slab_alloc at ffffffff9a251c85
 #18 [ffffb87c4df37a80] __slab_alloc at ffffffff9a25635d
 #19 [ffffb87c4df37ac0] kmem_cache_alloc at ffffffff9a251f89
 #20 [ffffb87c4df37b00] alloc_inode at ffffffff9a2a2b10
 #21 [ffffb87c4df37b20] iget_locked at ffffffff9a2a4854
 #22 [ffffb87c4df37b60] kernfs_get_inode at ffffffff9a311377
 #23 [ffffb87c4df37b80] kernfs_iop_lookup at ffffffff9a311e2b
 #24 [ffffb87c4df37ba8] lookup_slow at ffffffff9a290118
 #25 [ffffb87c4df37c10] walk_component at ffffffff9a291e83
 #26 [ffffb87c4df37c78] path_lookupat at ffffffff9a293619
 #27 [ffffb87c4df37cd8] filename_lookup at ffffffff9a2953af
 #28 [ffffb87c4df37de8] user_path_at_empty at ffffffff9a295566
 #29 [ffffb87c4df37e10] vfs_statx at ffffffff9a289787
 #30 [ffffb87c4df37e70] SYSC_newlstat at ffffffff9a289d5d
 #31 [ffffb87c4df37f18] sys_newlstat at ffffffff9a28a60e
 #32 [ffffb87c4df37f28] do_syscall_64 at ffffffff9a003949
 #33 [ffffb87c4df37f50] entry_SYSCALL_64_after_hwframe at ffffffff9aa001ad
     RIP: 00007f617a5f2905  RSP: 00007f607334f838  RFLAGS: 00000246
     RAX: ffffffffffffffda  RBX: 00007f6064044b20  RCX: 00007f617a5f2905
     RDX: 00007f6064044b20  RSI: 00007f6064044b20  RDI: 00007f6064005890
     RBP: 00007f6064044aa0   R8: 0000000000000030   R9: 000000000000011c
     R10: 0000000000000013  R11: 0000000000000246  R12: 00007f606417e6d0
     R13: 00007f6064044aa0  R14: 00007f6064044b10  R15: 00000000ffffffff
     ORIG_RAX: 0000000000000006  CS: 0033  SS: 002b

 PID: 927    TASK: ffff8f15ac5dbd80  CPU: 42  COMMAND: "md127_raid1"
  #0 [ffffb87c4df07b28] __schedule at ffffffff9a8678ec
  #1 [ffffb87c4df07bc0] schedule at ffffffff9a867f06
  #2 [ffffb87c4df07bd8] schedule_preempt_disabled at ffffffff9a86825e
  #3 [ffffb87c4df07be8] __mutex_lock at ffffffff9a869bcc
  #4 [ffffb87c4df07ca0] __mutex_lock_slowpath at ffffffff9a86a013
  #5 [ffffb87c4df07cb0] mutex_lock at ffffffff9a86a04f
  #6 [ffffb87c4df07cc8] kernfs_find_and_get_ns at ffffffff9a311d83
  #7 [ffffb87c4df07cf0] sysfs_notify at ffffffff9a314b3a
  #8 [ffffb87c4df07d18] md_update_sb at ffffffff9a688696
  #9 [ffffb87c4df07d98] md_update_sb at ffffffff9a6886d5
 #10 [ffffb87c4df07da8] md_check_recovery at ffffffff9a68ad9c
 #11 [ffffb87c4df07dd0] raid1d at ffffffffc01f0375 [raid1]
 #12 [ffffb87c4df07ea0] md_thread at ffffffff9a680348
 #13 [ffffb87c4df07f08] kthread at ffffffff9a0b8005
 #14 [ffffb87c4df07f50] ret_from_fork at ffffffff9aa00344

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-07-14 22:58:51 -07:00