2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

468 Commits

Author SHA1 Message Date
Dave Chinner
89563e7dc0 xfs: factor out filestreams from xfs_bmap_btalloc_nullfb
There's many if (filestreams) {} else {} branches in this function.
Split it out into a filestreams specific function so that we can
then work directly on cleaning up the filestreams code without
impacting the rest of the allocation algorithms.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13 09:14:54 +11:00
Dave Chinner
230e8fe846 xfs: fold xfs_alloc_ag_vextent() into callers
We don't need the multiplexing xfs_alloc_ag_vextent() provided
anymore - we can just call the exact/near/size variants directly.
This allows us to remove args->type completely and stop using
args->fsbno as an input to the allocator algorithms.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13 09:14:54 +11:00
Dave Chinner
5f36b2ce79 xfs: introduce xfs_alloc_vextent_exact_bno()
Two of the callers to xfs_alloc_vextent_this_ag() actually want
exact block number allocation, not anywhere-in-ag allocation. Split
this out from _this_ag() as a first class citizen so no external
extent allocation code needs to care about args->type anymore.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13 09:14:54 +11:00
Dave Chinner
db4710fd12 xfs: introduce xfs_alloc_vextent_near_bno()
The remaining callers of xfs_alloc_vextent() are all doing NEAR_BNO
allocations. We can replace that function with a new
xfs_alloc_vextent_near_bno() function that does this explicitly.

We also multiplex NEAR_BNO allocations through
xfs_alloc_vextent_this_ag via args->type. Replace all of these with
direct calls to xfs_alloc_vextent_near_bno(), too.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13 09:14:54 +11:00
Dave Chinner
2a7f6d41d8 xfs: use xfs_alloc_vextent_start_bno() where appropriate
Change obvious callers of single AG allocation to use
xfs_alloc_vextent_start_bno(). Callers no long need to specify
XFS_ALLOCTYPE_START_BNO, and so the type can be driven inward and
removed.

While doing this, also pass the allocation target fsb as a parameter
rather than encoding it in args->fsbno.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13 09:14:53 +11:00
Dave Chinner
319c9e874a xfs: use xfs_alloc_vextent_first_ag() where appropriate
Change obvious callers of single AG allocation to use
xfs_alloc_vextent_first_ag(). This gets rid of
XFS_ALLOCTYPE_FIRST_AG as the type used within
xfs_alloc_vextent_first_ag() during iteration is _THIS_AG. Hence we
can remove the setting of args->type from all the callers of
_first_ag() and remove the alloctype.

While doing this, pass the allocation target fsb as a parameter
rather than encoding it in args->fsbno. This starts the process
of making args->fsbno an output only variable rather than
input/output.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13 09:14:53 +11:00
Dave Chinner
8584332709 xfs: factor xfs_bmap_btalloc()
There are several different contexts xfs_bmap_btalloc() handles, and
large chunks of the code execute independent allocation contexts.
Try to untangle this mess a bit.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13 09:14:53 +11:00
Dave Chinner
74c36a8689 xfs: use xfs_alloc_vextent_this_ag() where appropriate
Change obvious callers of single AG allocation to use
xfs_alloc_vextent_this_ag(). Drive the per-ag grabbing out to the
callers, too, so that callers with active references don't need
to do new lookups just for an allocation in a context that already
has a perag reference.

The only remaining caller that does single AG allocation through
xfs_alloc_vextent() is xfs_bmap_btalloc() with
XFS_ALLOCTYPE_NEAR_BNO. That is going to need more untangling before
it can be converted cleanly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13 09:14:53 +11:00
Dave Chinner
76257a1587 xfs: introduce xfs_for_each_perag_wrap()
In several places we iterate every AG from a specific start agno and
wrap back to the first AG when we reach the end of the filesystem to
continue searching. We don't have a primitive for this iteration
yet, so add one for conversion of these algorithms to per-ag based
iteration.

The filestream AG select code is a mess, and this initially makes it
worse. The per-ag selection needs to be driven completely into the
filestream code to clean this up and it will be done in a future
patch that makes the filestream allocator use active per-ag
references correctly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13 09:14:53 +11:00
Dave Chinner
7ac2ff8bb3 xfs: perags need atomic operational state
We currently don't have any flags or operational state in the
xfs_perag except for the pagf_init and pagi_init flags. And the
agflreset flag. Oh, there's also the pagf_metadata and pagi_inodeok
flags, too.

For controlling per-ag operations, we are going to need some atomic
state flags. Hence add an opstate field similar to what we already
have in the mount and log, and convert all these state flags across
to atomic bit operations.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13 09:14:52 +11:00
Dave Chinner
692b6cddeb xfs: t_firstblock is tracking AGs not blocks
The tp->t_firstblock field is now raelly tracking the highest AG we
have locked, not the block number of the highest allocation we've
made. It's purpose is to prevent AGF locking deadlocks, so rename it
to "highest AG" and simplify the implementation to just track the
agno rather than a fsbno.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-11 04:11:06 +11:00
Dave Chinner
36b6ad2d9c xfs: drop firstblock constraints from allocation setup
Now that xfs_alloc_vextent() does all the AGF deadlock prevention
filtering for multiple allocations in a single transaction, we no
longer need the allocation setup code to care about what AGs we
might already have locked.

Hence we can remove all the "nullfb" conditional logic in places
like xfs_bmap_btalloc() and instead have them focus simply on
setting up locality constraints. If the allocation fails due to
AGF lock filtering in xfs_alloc_vextent, then we just fall back as
we normally do to more relaxed allocation constraints.

As a result, any allocation that allows AG scanning (i.e. not
confined to a single AG) and does not force a worst case full
filesystem scan will now be able to attempt allocation from AGs
lower than that defined by tp->t_firstblock. This is because
xfs_alloc_vextent() allows try-locking of the AGFs and hence enables
low space algorithms to at least -try- to get space from AGs lower
than the one that we have currently locked and allocated from. This
is a significant improvement in the low space allocation algorithm.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-11 04:10:06 +11:00
Dave Chinner
d5753847b2 xfs: block reservation too large for minleft allocation
When we enter xfs_bmbt_alloc_block() without having first allocated
a data extent (i.e. tp->t_firstblock == NULLFSBLOCK) because we
are doing something like unwritten extent conversion, the transaction
block reservation is used as the minleft value.

This works for operations like unwritten extent conversion, but it
assumes that the block reservation is only for a BMBT split. THis is
not always true, and sometimes results in larger than necessary
minleft values being set. We only actually need enough space for a
btree split, something we already handle correctly in
xfs_bmapi_write() via the xfs_bmapi_minleft() calculation.

We should use xfs_bmapi_minleft() in xfs_bmbt_alloc_block() to
calculate the number of blocks a BMBT split on this inode is going to
require, not use the transaction block reservation that contains the
maximum number of blocks this transaction may consume in it...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-11 04:09:06 +11:00
Dave Chinner
1dd0510f6d xfs: fix low space alloc deadlock
I've recently encountered an ABBA deadlock with g/476. The upcoming
changes seem to make this much easier to hit, but the underlying
problem is a pre-existing one.

Essentially, if we select an AG for allocation, then lock the AGF
and then fail to allocate for some reason (e.g. minimum length
requirements cannot be satisfied), then we drop out of the
allocation with the AGF still locked.

The caller then modifies the allocation constraints - usually
loosening them up - and tries again. This can result in trying to
access AGFs that are lower than the AGF we already have locked from
the failed attempt. e.g. the failed attempt skipped several AGs
before failing, so we have locks an AG higher than the start AG.
Retrying the allocation from the start AG then causes us to violate
AGF lock ordering and this can lead to deadlocks.

The deadlock exists even if allocation succeeds - we can do a
followup allocations in the same transaction for BMBT blocks that
aren't guaranteed to be in the same AG as the original, and can move
into higher AGs. Hence we really need to move the tp->t_firstblock
tracking down into xfs_alloc_vextent() where it can be set when we
exit with a locked AG.

xfs_alloc_vextent() can also check there if the requested
allocation falls within the allow range of AGs set by
tp->t_firstblock. If we can't allocate within the range set, we have
to fail the allocation. If we are allowed to to non-blocking AGF
locking, we can ignore the AG locking order limitations as we can
use try-locks for the first iteration over requested AG range.

This invalidates a set of post allocation asserts that check that
the allocation is always above tp->t_firstblock if it is set.
Because we can use try-locks to avoid the deadlock in some
circumstances, having a pre-existing locked AGF doesn't always
prevent allocation from lower order AGFs. Hence those ASSERTs need
to be removed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-11 04:07:06 +11:00
Darrick J. Wong
ddccb81b26 xfs: pass the xfs_bmbt_irec directly through the log intent code
Instead of repeatedly boxing and unboxing the incore extent mapping
structure as it passes through the BUI code, pass the pointer directly
through.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05 08:48:11 -08:00
Darrick J. Wong
ddfdd530e4 xfs: invalidate xfs_bufs when allocating cow extents
While investigating test failures in xfs/17[1-3] in alwayscow mode, I
noticed through code inspection that xfs_bmap_alloc_userdata isn't
setting XFS_ALLOC_USERDATA when allocating extents for a file's CoW
fork.  COW staging extents should be flagged as USERDATA, since user
data are persisted to these blocks before being remapped into a file.

This mis-classification has a few impacts on the behavior of the system.
First, the filestreams allocator is supposed to keep allocating from a
chosen AG until it runs out of space in that AG.  However, it only does
that for USERDATA allocations, which means that COW allocations aren't
tied to the filestreams AG.  Fortunately, few people use filestreams, so
nobody's noticed.

A more serious problem is that xfs_alloc_ag_vextent_small looks for a
buffer to invalidate *if* the USERDATA flag is set and the AG is so full
that the allocation had to come from the AGFL because the cntbt is
empty.  The consequences of not invalidating the buffer are severe --
if the AIL incorrectly checkpoints a buffer that is now being used to
store user data, that action will clobber the user's written data.

Fix filestreams and yet another data corruption vector by flagging COW
allocations as USERDATA.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-12-01 09:36:16 -08:00
Dave Chinner
304a68b9c6 xfs: use iomap_valid method to detect stale cached iomaps
Now that iomap supports a mechanism to validate cached iomaps for
buffered write operations, hook it up to the XFS buffered write ops
so that we can avoid data corruptions that result from stale cached
iomaps. See:

https://lore.kernel.org/linux-xfs/20220817093627.GZ3600936@dread.disaster.area/

or the ->iomap_valid() introduction commit for exact details of the
corruption vector.

The validity cookie we store in the iomap is based on the type of
iomap we return. It is expected that the iomap->flags we set in
xfs_bmbt_to_iomap() is not perturbed by the iomap core and are
returned to us in the iomap passed via the .iomap_valid() callback.
This ensures that the validity cookie is always checking the correct
inode fork sequence numbers to detect potential changes that affect
the extent cached by the iomap.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2022-11-29 09:09:17 +11:00
Zeng Heng
78b0f58bdf xfs: clean up "%Ld/%Lu" which doesn't meet C standard
The "%Ld" specifier, which represents long long unsigned,
doesn't meet C language standard, and even more,
it makes people easily mistake with "%ld", which represent
long unsigned. So replace "%Ld" with "lld".

Do the same with "%Lu".

Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-09-19 06:47:14 +10:00
Darrick J. Wong
c01147d929 xfs: replace inode fork size macros with functions
Replace the shouty macros here with typechecked helper functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-07-12 11:17:27 -07:00
Darrick J. Wong
932b42c66c xfs: replace XFS_IFORK_Q with a proper predicate function
Replace this shouty macro with a real C function that has a more
descriptive name.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-07-12 11:17:27 -07:00
Darrick J. Wong
e45d7cb235 xfs: use XFS_IFORK_Q to determine the presence of an xattr fork
Modify xfs_ifork_ptr to return a NULL pointer if the caller asks for the
attribute fork but i_forkoff is zero.  This eliminates the ambiguity
between i_forkoff and i_af.if_present, which should make it easier to
understand the lifetime of attr forks.

While we're at it, remove the if_present checks around calls to
xfs_idestroy_fork and xfs_ifork_zap_attr since they can both handle attr
forks that have already been torn down.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-07-09 15:17:21 -07:00
Darrick J. Wong
2ed5b09b3e xfs: make inode attribute forks a permanent part of struct xfs_inode
Syzkaller reported a UAF bug a while back:

==================================================================
BUG: KASAN: use-after-free in xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127
Read of size 4 at addr ffff88802cec919c by task syz-executor262/2958

CPU: 2 PID: 2958 Comm: syz-executor262 Not tainted
5.15.0-0.30.3-20220406_1406 #3
Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29
04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x82/0xa9 lib/dump_stack.c:106
 print_address_description.constprop.9+0x21/0x2d5 mm/kasan/report.c:256
 __kasan_report mm/kasan/report.c:442 [inline]
 kasan_report.cold.14+0x7f/0x11b mm/kasan/report.c:459
 xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127
 xfs_attr_get+0x378/0x4c2 fs/xfs/libxfs/xfs_attr.c:159
 xfs_xattr_get+0xe3/0x150 fs/xfs/xfs_xattr.c:36
 __vfs_getxattr+0xdf/0x13d fs/xattr.c:399
 cap_inode_need_killpriv+0x41/0x5d security/commoncap.c:300
 security_inode_need_killpriv+0x4c/0x97 security/security.c:1408
 dentry_needs_remove_privs.part.28+0x21/0x63 fs/inode.c:1912
 dentry_needs_remove_privs+0x80/0x9e fs/inode.c:1908
 do_truncate+0xc3/0x1e0 fs/open.c:56
 handle_truncate fs/namei.c:3084 [inline]
 do_open fs/namei.c:3432 [inline]
 path_openat+0x30ab/0x396d fs/namei.c:3561
 do_filp_open+0x1c4/0x290 fs/namei.c:3588
 do_sys_openat2+0x60d/0x98c fs/open.c:1212
 do_sys_open+0xcf/0x13c fs/open.c:1228
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0x0
RIP: 0033:0x7f7ef4bb753d
Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73
01 c3 48 8b 0d 1b 79 2c 00 f7 d8 64 89 01 48
RSP: 002b:00007f7ef52c2ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
RAX: ffffffffffffffda RBX: 0000000000404148 RCX: 00007f7ef4bb753d
RDX: 00007f7ef4bb753d RSI: 0000000000000000 RDI: 0000000020004fc0
RBP: 0000000000404140 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0030656c69662f2e
R13: 00007ffd794db37f R14: 00007ffd794db470 R15: 00007f7ef52c2fc0
 </TASK>

Allocated by task 2953:
 kasan_save_stack+0x19/0x38 mm/kasan/common.c:38
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:434 [inline]
 __kasan_slab_alloc+0x68/0x7c mm/kasan/common.c:467
 kasan_slab_alloc include/linux/kasan.h:254 [inline]
 slab_post_alloc_hook mm/slab.h:519 [inline]
 slab_alloc_node mm/slub.c:3213 [inline]
 slab_alloc mm/slub.c:3221 [inline]
 kmem_cache_alloc+0x11b/0x3eb mm/slub.c:3226
 kmem_cache_zalloc include/linux/slab.h:711 [inline]
 xfs_ifork_alloc+0x25/0xa2 fs/xfs/libxfs/xfs_inode_fork.c:287
 xfs_bmap_add_attrfork+0x3f2/0x9b1 fs/xfs/libxfs/xfs_bmap.c:1098
 xfs_attr_set+0xe38/0x12a7 fs/xfs/libxfs/xfs_attr.c:746
 xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59
 __vfs_setxattr+0x11b/0x177 fs/xattr.c:180
 __vfs_setxattr_noperm+0x128/0x5e0 fs/xattr.c:214
 __vfs_setxattr_locked+0x1d4/0x258 fs/xattr.c:275
 vfs_setxattr+0x154/0x33d fs/xattr.c:301
 setxattr+0x216/0x29f fs/xattr.c:575
 __do_sys_fsetxattr fs/xattr.c:632 [inline]
 __se_sys_fsetxattr fs/xattr.c:621 [inline]
 __x64_sys_fsetxattr+0x243/0x2fe fs/xattr.c:621
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0x0

Freed by task 2949:
 kasan_save_stack+0x19/0x38 mm/kasan/common.c:38
 kasan_set_track+0x1c/0x21 mm/kasan/common.c:46
 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:360
 ____kasan_slab_free mm/kasan/common.c:366 [inline]
 ____kasan_slab_free mm/kasan/common.c:328 [inline]
 __kasan_slab_free+0xe2/0x10e mm/kasan/common.c:374
 kasan_slab_free include/linux/kasan.h:230 [inline]
 slab_free_hook mm/slub.c:1700 [inline]
 slab_free_freelist_hook mm/slub.c:1726 [inline]
 slab_free mm/slub.c:3492 [inline]
 kmem_cache_free+0xdc/0x3ce mm/slub.c:3508
 xfs_attr_fork_remove+0x8d/0x132 fs/xfs/libxfs/xfs_attr_leaf.c:773
 xfs_attr_sf_removename+0x5dd/0x6cb fs/xfs/libxfs/xfs_attr_leaf.c:822
 xfs_attr_remove_iter+0x68c/0x805 fs/xfs/libxfs/xfs_attr.c:1413
 xfs_attr_remove_args+0xb1/0x10d fs/xfs/libxfs/xfs_attr.c:684
 xfs_attr_set+0xf1e/0x12a7 fs/xfs/libxfs/xfs_attr.c:802
 xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59
 __vfs_removexattr+0x106/0x16a fs/xattr.c:468
 cap_inode_killpriv+0x24/0x47 security/commoncap.c:324
 security_inode_killpriv+0x54/0xa1 security/security.c:1414
 setattr_prepare+0x1a6/0x897 fs/attr.c:146
 xfs_vn_change_ok+0x111/0x15e fs/xfs/xfs_iops.c:682
 xfs_vn_setattr_size+0x5f/0x15a fs/xfs/xfs_iops.c:1065
 xfs_vn_setattr+0x125/0x2ad fs/xfs/xfs_iops.c:1093
 notify_change+0xae5/0x10a1 fs/attr.c:410
 do_truncate+0x134/0x1e0 fs/open.c:64
 handle_truncate fs/namei.c:3084 [inline]
 do_open fs/namei.c:3432 [inline]
 path_openat+0x30ab/0x396d fs/namei.c:3561
 do_filp_open+0x1c4/0x290 fs/namei.c:3588
 do_sys_openat2+0x60d/0x98c fs/open.c:1212
 do_sys_open+0xcf/0x13c fs/open.c:1228
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0x0

The buggy address belongs to the object at ffff88802cec9188
 which belongs to the cache xfs_ifork of size 40
The buggy address is located 20 bytes inside of
 40-byte region [ffff88802cec9188, ffff88802cec91b0)
The buggy address belongs to the page:
page:00000000c3af36a1 refcount:1 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x2cec9
flags: 0xfffffc0000200(slab|node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0000200 ffffea00009d2580 0000000600000006 ffff88801a9ffc80
raw: 0000000000000000 0000000080490049 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88802cec9080: fb fb fb fc fc fa fb fb fb fb fc fc fb fb fb fb
 ffff88802cec9100: fb fc fc fb fb fb fb fb fc fc fb fb fb fb fb fc
>ffff88802cec9180: fc fa fb fb fb fb fc fc fa fb fb fb fb fc fc fb
                            ^
 ffff88802cec9200: fb fb fb fb fc fc fb fb fb fb fb fc fc fb fb fb
 ffff88802cec9280: fb fb fc fc fa fb fb fb fb fc fc fa fb fb fb fb
==================================================================

The root cause of this bug is the unlocked access to xfs_inode.i_afp
from the getxattr code paths while trying to determine which ILOCK mode
to use to stabilize the xattr data.  Unfortunately, the VFS does not
acquire i_rwsem when vfs_getxattr (or listxattr) call into the
filesystem, which means that getxattr can race with a removexattr that's
tearing down the attr fork and crash:

xfs_attr_set:                          xfs_attr_get:
xfs_attr_fork_remove:                  xfs_ilock_attr_map_shared:

xfs_idestroy_fork(ip->i_afp);
kmem_cache_free(xfs_ifork_cache, ip->i_afp);

                                       if (ip->i_afp &&

ip->i_afp = NULL;

                                           xfs_need_iread_extents(ip->i_afp))
                                       <KABOOM>

ip->i_forkoff = 0;

Regrettably, the VFS is much more lax about i_rwsem and getxattr than
is immediately obvious -- not only does it not guarantee that we hold
i_rwsem, it actually doesn't guarantee that we *don't* hold it either.
The getxattr system call won't acquire the lock before calling XFS, but
the file capabilities code calls getxattr with and without i_rwsem held
to determine if the "security.capabilities" xattr is set on the file.

Fixing the VFS locking requires a treewide investigation into every code
path that could touch an xattr and what i_rwsem state it expects or sets
up.  That could take years or even prove impossible; fortunately, we
can fix this UAF problem inside XFS.

An earlier version of this patch used smp_wmb in xfs_attr_fork_remove to
ensure that i_forkoff is always zeroed before i_afp is set to null and
changed the read paths to use smp_rmb before accessing i_forkoff and
i_afp, which avoided these UAF problems.  However, the patch author was
too busy dealing with other problems in the meantime, and by the time he
came back to this issue, the situation had changed a bit.

On a modern system with selinux, each inode will always have at least
one xattr for the selinux label, so it doesn't make much sense to keep
incurring the extra pointer dereference.  Furthermore, Allison's
upcoming parent pointer patchset will also cause nearly every inode in
the filesystem to have extended attributes.  Therefore, make the inode
attribute fork structure part of struct xfs_inode, at a cost of 40 more
bytes.

This patch adds a clunky if_present field where necessary to maintain
the existing logic of xattr fork null pointer testing in the existing
codebase.  The next patch switches the logic over to XFS_IFORK_Q and it
all goes away.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-07-09 15:17:21 -07:00
Darrick J. Wong
732436ef91 xfs: convert XFS_IFORK_PTR to a static inline helper
We're about to make this logic do a bit more, so convert the macro to a
static inline function for better typechecking and fewer shouty macros.
No functional changes here.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-07-09 15:17:21 -07:00
Dave Chinner
08d3e84fee xfs: pass perag to xfs_alloc_read_agf()
xfs_alloc_read_agf() initialises the perag if it hasn't been done
yet, so it makes sense to pass it the perag rather than pull a
reference from the buffer. This allows callers to be per-ag centric
rather than passing mount/agno pairs everywhere.

Whilst modifying the xfs_reflink_find_shared() function definition,
declare it static and remove the extern declaration as it is an
internal function only these days.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2022-07-07 19:07:40 +10:00
Dave Chinner
76b47e528e xfs: kill xfs_alloc_pagf_init()
Trivial wrapper around xfs_alloc_read_agf(), can be easily replaced
by passing a NULL agfbp to xfs_alloc_read_agf().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2022-07-07 19:07:32 +10:00
Darrick J. Wong
4ed6435cc3 xfs: stop artificially limiting the length of bunmap calls
In commit e1a4e37cc7, we clamped the length of bunmapi calls on the
data forks of shared files to avoid two failure scenarios: one where the
extent being unmapped is so sparsely shared that we exceed the
transaction reservation with the sheer number of refcount btree updates
and EFI intent items; and the other where we attach so many deferred
updates to the transaction that we pin the log tail and later the log
head meets the tail, causing the log to livelock.

We avoid triggering the first problem by tracking the number of ops in
the refcount btree cursor and forcing a requeue of the refcount intent
item any time we think that we might be close to overflowing.  This has
been baked into XFS since before the original e1a4 patch.

A recent patchset fixed the second problem by changing the deferred ops
code to finish all the work items created by each round of trying to
complete a refcount intent item, which eliminates the long chains of
deferred items (27dad); and causing long-running transactions to relog
their intent log items when space in the log gets low (74f4d).

Because this clamp affects /any/ unmapping request regardless of the
sharing factors of the component blocks, it degrades the performance of
all large unmapping requests -- whereas with an unshared file we can
unmap millions of blocks in one go, shared files are limited to
unmapping a few thousand blocks at a time, which causes the upper level
code to spin in a bunmapi loop even if it wasn't needed.

This also eliminates one more place where log recovery behavior can
differ from online behavior, because bunmapi operations no longer need
to requeue.  The fstest generic/447 was created to test the old fix, and
it still passes with this applied.

Partial-revert-of: e1a4e37cc7 ("xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent")
Depends: 27dada070d ("xfs: change the order in which child and parent defer ops ar finished")
Depends: 74f4d6a1e0 ("xfs: only relog deferred intent items if free space in the log gets low")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-04-28 10:24:59 -07:00
Dave Chinner
a44a027a8b Merge tag 'large-extent-counters-v9' of https://github.com/chandanr/linux into xfs-5.19-for-next
xfs: Large extent counters

The commit xfs: fix inode fork extent count overflow
(3f8a4f1d87) mentions that 10 billion
data fork extents should be possible to create. However the
corresponding on-disk field has a signed 32-bit type. Hence this
patchset extends the per-inode data fork extent counter to 64 bits
(out of which 48 bits are used to store the extent count).

Also, XFS has an attribute fork extent counter which is 16 bits
wide. A workload that,
1. Creates 1 million 255-byte sized xattrs,
2. Deletes 50% of these xattrs in an alternating manner,
3. Tries to insert 400,000 new 255-byte sized xattrs
   causes the xattr extent counter to overflow.

Dave tells me that there are instances where a single file has more
than 100 million hardlinks. With parent pointers being stored in
xattrs, we will overflow the signed 16-bits wide attribute extent
counter when large number of hardlinks are created. Hence this
patchset extends the on-disk field to 32-bits.

The following changes are made to accomplish this,
1. A 64-bit inode field is carved out of existing di_pad and
   di_flushiter fields to hold the 64-bit data fork extent counter.
2. The existing 32-bit inode data fork extent counter will be used to
   hold the attribute fork extent counter.
3. A new incompat superblock flag to prevent older kernels from mounting
   the filesystem.

Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21 16:46:17 +10:00
Dave Chinner
e7d410ac33 xfs: convert bmapi flags to unsigned.
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned
fields to be unsigned.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21 10:46:09 +10:00
Dave Chinner
0e5b8e4522 xfs: convert bmap extent type flags to unsigned.
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned
fields to be unsigned.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21 10:46:01 +10:00
Chandan Babu R
4f86bb4b66 xfs: Conditionally upgrade existing inodes to use large extent counters
This commit enables upgrading existing inodes to use large extent counters
provided that underlying filesystem's superblock has large extent counter
feature enabled.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-13 07:02:44 +00:00
Chandan Babu R
83a21c1844 xfs: Directory's data fork extent counter can never overflow
The maximum file size that can be represented by the data fork extent counter
in the worst case occurs when all extents are 1 block in length and each block
is 1KB in size.

With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and with
1KB sized blocks, a file can reach upto,
(2^31) * 1KB = 2TB

This is much larger than the theoretical maximum size of a directory
i.e. XFS_DIR2_SPACE_SIZE * 3 = ~96GB.

Since a directory's inode can never overflow its data fork extent counter,
this commit removes all the overflow checks associated with
it. xfs_dinode_verify() now performs a rough check to verify if a diretory's
data fork is larger than 96GB.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-13 07:02:07 +00:00
Chandan Babu R
df9ad5cc7a xfs: Introduce macros to represent new maximum extent counts for data/attr forks
This commit defines new macros to represent maximum extent counts allowed by
filesystems which have support for large per-inode extent counters.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11 04:11:19 +00:00
Chandan Babu R
0c35e7ba18 xfs: Use uint64_t to count maximum blocks that can be used by BMBT
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11 04:11:19 +00:00
Chandan Babu R
755c38ffe1 xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively
A future commit will introduce a 64-bit on-disk data extent counter and a
32-bit on-disk attr extent counter. This commit promotes xfs_extnum_t and
xfs_aextnum_t to 64 and 32-bits in order to correctly handle in-core versions
of these quantities.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11 04:11:18 +00:00
Chandan Babu R
bb1d50494c xfs: Use xfs_extnum_t instead of basic data types
xfs_extnum_t is the type to use to declare variables which have values
obtained from xfs_dinode->di_[a]nextents. This commit replaces basic
types (e.g. uint32_t) with xfs_extnum_t for such variables.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11 04:11:17 +00:00
Chandan Babu R
9feb8f1966 xfs: Introduce xfs_iext_max_nextents() helper
xfs_iext_max_nextents() returns the maximum number of extents possible for one
of data, cow or attribute fork. This helper will be extended further in a
future commit when maximum extent counts associated with data/attribute forks
are increased.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11 04:11:17 +00:00
Chandan Babu R
95f0b95e2b xfs: Define max extent length based on on-disk format definition
The maximum extent length depends on maximum block count that can be stored in
a BMBT record. Hence this commit defines MAXEXTLEN based on
BMBT_BLOCKCOUNT_BITLEN.

While at it, the commit also renames MAXEXTLEN to XFS_MAX_BMBT_EXTLEN.

Suggested-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2022-04-11 04:11:17 +00:00
Christoph Hellwig
740fd671e0 xfs: pass the mapping flags to xfs_bmbt_to_iomap
To prepare for looking at the IOMAP_DAX flag in xfs_bmbt_to_iomap pass in
the input mapping flags to xfs_bmbt_to_iomap.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-24-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04 08:58:53 -08:00
Darrick J. Wong
c201d9ca53 xfs: rename xfs_bmap_add_free to xfs_free_extent_later
xfs_bmap_add_free isn't a block mapping function; it schedules deferred
freeing operations for a later point in a compound transaction chain.
While it's primarily used by bunmapi, its use has expanded beyond that.
Move it to xfs_alloc.c and rename the function since it's now general
freeing functionality.  Bring the slab cache bits in line with the
way we handle the other intent items.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
2021-10-22 16:04:36 -07:00
Darrick J. Wong
f3c799c22c xfs: create slab caches for frequently-used deferred items
Create slab caches for the high-level structures that coordinate
deferred intent items, since they're used fairly heavily.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
2021-10-22 16:04:36 -07:00
Darrick J. Wong
182696fb02 xfs: rename _zone variables to _cache
Now that we've gotten rid of the kmem_zone_t typedef, rename the
variables to _cache since that's what they are.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
2021-10-22 16:04:20 -07:00
Darrick J. Wong
e7720afad0 xfs: remove kmem_zone typedef
Remove these typedefs by referencing kmem_cache directly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
2021-10-22 16:00:31 -07:00
Darrick J. Wong
0ed5f7356d xfs: compute absolute maximum nlevels for each btree type
Add code for all five btree types so that we can compute the absolute
maximum possible btree height for each btree type.  This is a setup for
the next patch, which makes every btree type have its own cursor cache.

The functions are exported so that we can have xfs_db report the
absolute maximum btree heights for each btree type, rather than making
everyone run their own ad-hoc computations.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-10-19 11:45:16 -07:00
Darrick J. Wong
c0643f6fdd xfs: encode the max btree height in the cursor
Encode the maximum btree height in the cursor, since we're soon going to
allow smaller cursors for AG btrees and larger cursors for file btrees.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-10-19 11:45:15 -07:00
Darrick J. Wong
6ca444cfd6 xfs: prepare xfs_btree_cur for dynamic cursor heights
Split out the btree level information into a separate struct and put it
at the end of the cursor structure as a VLA.  Files with huge data forks
(and in the future, the realtime rmap btree) will require the ability to
support many more levels than a per-AG btree cursor, which means that
we're going to create per-btree type cursor caches to conserve memory
for the more common case.

Note that a subsequent patch actually introduces dynamic cursor heights.
This one merely rearranges the structure to prepare for that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-10-19 11:45:14 -07:00
Darrick J. Wong
ae127f087d xfs: remove xfs_btree_cur_t typedef
Get rid of this old typedef before we start changing other things.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-10-14 09:19:32 -07:00
Dave Chinner
9343ee7690 xfs: convert bp->b_bn references to xfs_buf_daddr()
Stop directly referencing b_bn in code outside the buffer cache, as
b_bn is supposed to be used only as an internal cache index.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19 10:07:15 -07:00
Dave Chinner
04fcad80cd xfs: introduce xfs_buf_daddr()
Introduce a helper function xfs_buf_daddr() to extract the disk
address of the buffer from the struct xfs_buf. This will replace
direct accesses to bp->b_bn and bp->b_maps[0].bm_bn, as well as
the XFS_BUF_ADDR() macro.

This patch introduces the helper function and replaces all uses of
XFS_BUF_ADDR() as this is just a simple sed replacement.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19 10:07:14 -07:00
Dave Chinner
75c8c50fa1 xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdown
Remove the shouty macro and instead use the inline function that
matches other state/feature check wrapper naming. This conversion
was done with sed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19 10:07:13 -07:00
Dave Chinner
0560f31a09 xfs: convert mount flags to features
Replace m_flags feature checks with xfs_has_<feature>() calls and
rework the setup code to set flags in m_features.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19 10:07:12 -07:00
Dave Chinner
38c26bfd90 xfs: replace xfs_sb_version checks with feature flag checks
Convert the xfs_sb_version_hasfoo() to checks against
mp->m_features. Checks of the superblock itself during disk
operations (e.g. in the read/write verifiers and the to/from disk
formatters) are not converted - they operate purely on the
superblock state. Everything else should use the mount features.

Large parts of this conversion were done with sed with commands like
this:

for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do
	sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f
done

With manual cleanups for things like "xfs_has_extflgbit" and other
little inconsistencies in naming.

The result is ia lot less typing to check features and an XFS binary
size reduced by a bit over 3kB:

$ size -t fs/xfs/built-in.a
	text	   data	    bss	    dec	    hex	filenam
before	1130866  311352     484 1442702  16038e (TOTALS)
after	1127727  311352     484 1439563  15f74b (TOTALS)

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19 10:07:12 -07:00
Darrick J. Wong
8b943d21d4 xfs: assorted fixes for 5.14, part 1
This branch contains the first round of various small fixes for 5.14.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmC5YvwACgkQ+H93GTRK
 tOuDTxAAmW+Mc4oPx/Fsa2xqvkDYCGaM1QtkIY9McrGsMdzF0JbB7bR5rrUET2+N
 /FsFTIQ31vZfMEJ69HM6NjQ4xXIKhdEj+PlGsjvg62BqhpPBRuFH7dLBWChiRcf2
 M5SN1P9kcAjIoMR2tKzh8FMDmJTcJlJxxBi9kxb3YK1+UesUHsPS8/jSW3OuIov3
 AJzAZ08SrA8ful5uCMy9Mf5uBgfRuAUjDzA5CM5kA1xqlHREQi+oHl62E81N33mu
 RR8tdHQzvPO5rHyX84GzV5cu2CsmDuOPF2nA5SUxRhIZFfMo5mEezA/nxqqACYti
 rnRGxZVwIG9YYBESVxXFQXIjn5lHoQ2jomk/CraszeVEJCteNRnpbVGzd1xczM3u
 0iuDuHy+aVB/3QhfA6/0vjfttCzkMEld9U9c3WEjIaw5iUCxe531yfrvVEyF9blx
 NBjnQyHGbt+y26BzBjD33NJEdDoZqS3UIQ/rmb4f2mitGN5d9faAcJ754uRJt3o4
 K9HXGjuR+iOH/tCZKDL1hBc4M/pFgNdeBWyFYdQhh8eSj9HSCTCG58zAyQ2WPOSr
 6D/f4BMqivKzzz0HicZPJAoazrtcKGrWTHTxidHIkI4le367NOwv6YJquJ0pFMBs
 8L7OdRqYT3yw6+qErCjEn03WkP9O7V8lHf8hFxt7+dNxDukIj40=
 =6eZB
 -----END PGP SIGNATURE-----

Merge tag 'assorted-fixes-5.14-1_2021-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-5.14-merge2

xfs: assorted fixes for 5.14, part 1

This branch contains the first round of various small fixes for 5.14.

* tag 'assorted-fixes-5.14-1_2021-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: don't take a spinlock unconditionally in the DIO fastpath
  xfs: mark xfs_bmap_set_attrforkoff static
  xfs: Remove redundant assignment to busy
  xfs: sort variable alphabetically to avoid repeated declaration
2021-06-08 09:22:34 -07:00
Darrick J. Wong
c3eabd3650 xfs: initial agnumber -> perag conversions for shrink
If we want to use active references to the perag to be able to gate
 shrink removing AGs and hence perags safely, we've got a fair bit of
 work to do actually use perags in all the places we need to.
 
 There's a lot of code that iterates ag numbers and then
 looks up perags from that, often multiple times for the same perag
 in the one operation. If we want to use reference counted perags for
 access control, then we need to convert all these uses to perag
 iterators, not agno iterators.
 
 [Patches 1-4]
 
 The first step of this is consolidating all the perag management -
 init, free, get, put, etc into a common location. THis is spread all
 over the place right now, so move it all into libxfs/xfs_ag.[ch].
 This does expose kernel only bits of the perag to libxfs and hence
 userspace, so the structures and code is rearranged to minimise the
 number of ifdefs that need to be added to the userspace codebase.
 The perag iterator in xfs_icache.c is promoted to a first class API
 and expanded to the needs of the code as required.
 
 [Patches 5-10]
 
 These are the first basic perag iterator conversions and changes to
 pass the perag down the stack from those iterators where
 appropriate. A lot of this is obvious, simple changes, though in
 some places we stop passing the perag down the stack because the
 code enters into an as yet unconverted subsystem that still uses raw
 AGs.
 
 [Patches 11-16]
 
 These replace the agno passed in the btree cursor for per-ag btree
 operations with a perag that is passed to the cursor init function.
 The cursor takes it's own reference to the perag, and the reference
 is dropped when the cursor is deleted. Hence we get reference
 coverage for the entire time the cursor is active, even if the code
 that initialised the cursor drops it's reference before the cursor
 or any of it's children (duplicates) have been deleted.
 
 The first patch adds the perag infrastructure for the cursor, the
 next four patches convert a btree cursor at a time, and the last
 removes the agno from the cursor once it is unused.
 
 [Patches 17-21]
 
 These patches are a demonstration of the simplifications and
 cleanups that come from plumbing the perag through interfaces that
 select and then operate on a specific AG. In this case the inode
 allocation algorithm does up to three walks across all AGs before it
 either allocates an inode or fails. Two of these walks are purely
 just to select the AG, and even then it doesn't guarantee inode
 allocation success so there's a third walk if the selected AG
 allocation fails.
 
 These patches collapse the selection and allocation into a single
 loop, simplifies the error handling because xfs_dir_ialloc() always
 returns ENOSPC if no AG was selected for inode allocation or we fail
 to allocate an inode in any AG, gets rid of xfs_dir_ialloc()
 wrapper, converts inode allocation to run entirely from a single
 perag instance, and then factors xfs_dialloc() into a much, much
 simpler loop which is easy to understand.
 
 Hence we end up with the same inode allocation logic, but it only
 needs two complete iterations at worst, makes AG selection and
 allocation atomic w.r.t. shrink and chops out out over 100 lines of
 code from this hot code path.
 
 [Patch 22]
 
 Converts the unlink path to pass perags through it.
 
 There's more conversion work to be done, but this patchset gets
 through a large chunk of it in one hit. Most of the iterators are
 converted, so once this is solidified we can move on to converting
 these to active references for being able to free perags while the
 fs is still active.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEmJOoJ8GffZYWSjj/regpR/R1+h0FAmC3HUgUHGRhdmlkQGZy
 b21vcmJpdC5jb20ACgkQregpR/R1+h2yaw/+P0JzpI+6n06Ei00mjgE/Du/WhMLi
 0JQ93Grlj+miuGGT9DgGCiRpoZnefhEk+BH6JqoEw1DQ3T5ilmAzrHLUUHSQC3+S
 dv85sJduheQ6yHuoO+4MzkaSq6JWKe7E9gZwAsVyBul5aSjdmaJaQdPwYMTXSXo0
 5Uqq8ECFkMcaHVNjcBfasgR/fdyWy2Qe4PFTHTHdQpd+DNZ9UXgFKHW2og+1iry/
 zDIvdIppJULA09TvVcZuFjd/1NzHQ/fLj5PAzz8GwagB4nz2x3s78Zevmo5yW/jK
 3/+50vXa8ldhiHDYGTS3QXvS0xJRyqUyD47eyWOOiojZw735jEvAlCgjX6+0X1HC
 k3gCkQLv8l96fRkvUpgnLf/fjrUnlCuNBkm9d1Eq2Tied8dvLDtiEzoC6L05Nqob
 yd/nIUb1zwJFa9tsoheHhn0bblTGX1+zP0lbRJBje0LotpNO9DjGX5JoIK4GR7F8
 y1VojcdgRI14HlxUnbF3p8wmQByN+M2tnp6GSdv9BA65bjqi05Rj/steFdZHBV6x
 wiRs8Yh6BTvMwKgufHhRQHfRahjNHQ/T/vOE+zNbWqemS9wtEUDop+KvPhC36R/k
 o/cmr23cF8ESX2eChk7XM4On3VEYpcvp2zSFgrFqZYl6RWOwEis3Htvce3KuSTPp
 8Xq70te0gr2DVUU=
 =YNzW
 -----END PGP SIGNATURE-----

Merge tag 'xfs-perag-conv-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs into xfs-5.14-merge2

xfs: initial agnumber -> perag conversions for shrink

If we want to use active references to the perag to be able to gate
shrink removing AGs and hence perags safely, we've got a fair bit of
work to do actually use perags in all the places we need to.

There's a lot of code that iterates ag numbers and then
looks up perags from that, often multiple times for the same perag
in the one operation. If we want to use reference counted perags for
access control, then we need to convert all these uses to perag
iterators, not agno iterators.

[Patches 1-4]

The first step of this is consolidating all the perag management -
init, free, get, put, etc into a common location. THis is spread all
over the place right now, so move it all into libxfs/xfs_ag.[ch].
This does expose kernel only bits of the perag to libxfs and hence
userspace, so the structures and code is rearranged to minimise the
number of ifdefs that need to be added to the userspace codebase.
The perag iterator in xfs_icache.c is promoted to a first class API
and expanded to the needs of the code as required.

[Patches 5-10]

These are the first basic perag iterator conversions and changes to
pass the perag down the stack from those iterators where
appropriate. A lot of this is obvious, simple changes, though in
some places we stop passing the perag down the stack because the
code enters into an as yet unconverted subsystem that still uses raw
AGs.

[Patches 11-16]

These replace the agno passed in the btree cursor for per-ag btree
operations with a perag that is passed to the cursor init function.
The cursor takes it's own reference to the perag, and the reference
is dropped when the cursor is deleted. Hence we get reference
coverage for the entire time the cursor is active, even if the code
that initialised the cursor drops it's reference before the cursor
or any of it's children (duplicates) have been deleted.

The first patch adds the perag infrastructure for the cursor, the
next four patches convert a btree cursor at a time, and the last
removes the agno from the cursor once it is unused.

[Patches 17-21]

These patches are a demonstration of the simplifications and
cleanups that come from plumbing the perag through interfaces that
select and then operate on a specific AG. In this case the inode
allocation algorithm does up to three walks across all AGs before it
either allocates an inode or fails. Two of these walks are purely
just to select the AG, and even then it doesn't guarantee inode
allocation success so there's a third walk if the selected AG
allocation fails.

These patches collapse the selection and allocation into a single
loop, simplifies the error handling because xfs_dir_ialloc() always
returns ENOSPC if no AG was selected for inode allocation or we fail
to allocate an inode in any AG, gets rid of xfs_dir_ialloc()
wrapper, converts inode allocation to run entirely from a single
perag instance, and then factors xfs_dialloc() into a much, much
simpler loop which is easy to understand.

Hence we end up with the same inode allocation logic, but it only
needs two complete iterations at worst, makes AG selection and
allocation atomic w.r.t. shrink and chops out out over 100 lines of
code from this hot code path.

[Patch 22]

Converts the unlink path to pass perags through it.

There's more conversion work to be done, but this patchset gets
through a large chunk of it in one hit. Most of the iterators are
converted, so once this is solidified we can move on to converting
these to active references for being able to free perags while the
fs is still active.

* tag 'xfs-perag-conv-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (23 commits)
  xfs: remove xfs_perag_t
  xfs: use perag through unlink processing
  xfs: clean up and simplify xfs_dialloc()
  xfs: inode allocation can use a single perag instance
  xfs: get rid of xfs_dir_ialloc()
  xfs: collapse AG selection for inode allocation
  xfs: simplify xfs_dialloc_select_ag() return values
  xfs: remove agno from btree cursor
  xfs: use perag for ialloc btree cursors
  xfs: convert allocbt cursors to use perags
  xfs: convert refcount btree cursor to use perags
  xfs: convert rmap btree cursor to using a perag
  xfs: add a perag to the btree cursor
  xfs: pass perags around in fsmap data dev functions
  xfs: push perags through the ag reservation callouts
  xfs: pass perags through to the busy extent code
  xfs: convert secondary superblock walk to use perags
  xfs: convert xfs_iwalk to use perag references
  xfs: convert raw ag walks to use for_each_perag
  xfs: make for_each_perag... a first class citizen
  ...
2021-06-08 09:13:13 -07:00
Christoph Hellwig
5a981e4ea8 xfs: mark xfs_bmap_set_attrforkoff static
xfs_bmap_set_attrforkoff is only used inside of xfs_bmap.c, so mark it
static.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-06-02 14:58:59 -07:00
Dave Chinner
9bbafc7191 xfs: move xfs_perag_get/put to xfs_ag.[ch]
They are AG functions, not superblock functions, so move them to the
appropriate location.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-02 10:48:24 +10:00
Dave Chinner
0fe0bbe00a xfs: bunmapi has unnecessary AG lock ordering issues
large directory block size operations are assert failing because
xfs_bunmapi() is not completely removing fragmented directory blocks
like so:

XFS: Assertion failed: done, file: fs/xfs/libxfs/xfs_dir2.c, line: 677
....
Call Trace:
 xfs_dir2_shrink_inode+0x1a8/0x210
 xfs_dir2_block_to_sf+0x2ae/0x410
 xfs_dir2_block_removename+0x21a/0x280
 xfs_dir_removename+0x195/0x1d0
 xfs_rename+0xb79/0xc50
 ? avc_has_perm+0x8d/0x1a0
 ? avc_has_perm_noaudit+0x9a/0x120
 xfs_vn_rename+0xdb/0x150
 vfs_rename+0x719/0xb50
 ? __lookup_hash+0x6a/0xa0
 do_renameat2+0x413/0x5e0
 __x64_sys_rename+0x45/0x50
 do_syscall_64+0x3a/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xae

We are aborting the bunmapi() pass because of this specific chunk of
code:

                /*
                 * Make sure we don't touch multiple AGF headers out of order
                 * in a single transaction, as that could cause AB-BA deadlocks.
                 */
                if (!wasdel && !isrt) {
                        agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
                        if (prev_agno != NULLAGNUMBER && prev_agno > agno)
                                break;
                        prev_agno = agno;
                }

This is designed to prevent deadlocks in AGF locking when freeing
multiple extents by ensuring that we only ever lock in increasing
AG number order. Unfortunately, this also violates the "bunmapi will
always succeed" semantic that some high level callers depend on,
such as xfs_dir2_shrink_inode(), xfs_da_shrink_inode() and
xfs_inactive_symlink_rmt().

This AG lock ordering was introduced back in 2017 to fix deadlocks
triggered by generic/299 as reported here:

https://lore.kernel.org/linux-xfs/800468eb-3ded-9166-20a4-047de8018582@gmail.com/

This codebase is old enough that it was before we were defering all
AG based extent freeing from within xfs_bunmapi(). THat is, we never
actually lock AGs in xfs_bunmapi() any more - every non-rt based
extent free is added to the defer ops list, as is all BMBT block
freeing. And RT extents are not RT based, so there's no lock
ordering issues associated with them.

Hence this AGF lock ordering code is both broken and dead. Let's
just remove it so that the large directory block code works reliably
again.

Tested against xfs/538 and generic/299 which is the original test
that exposed the deadlocks that this code fixed.

Fixes: 5b094d6dac ("xfs: fix multi-AG deadlock in xfs_bunmapi")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-05-27 08:11:24 -07:00
Dave Chinner
991c2c5980 xfs: btree format inode forks can have zero extents
xfs/538 is assert failing with this trace when testing with
directory block sizes of 64kB:

XFS: Assertion failed: !xfs_need_iread_extents(ifp), file: fs/xfs/libxfs/xfs_bmap.c, line: 608
....
Call Trace:
 xfs_bmap_btree_to_extents+0x2a9/0x470
 ? kmem_cache_alloc+0xe7/0x220
 __xfs_bunmapi+0x4ca/0xdf0
 xfs_bunmapi+0x1a/0x30
 xfs_dir2_shrink_inode+0x71/0x210
 xfs_dir2_block_to_sf+0x2ae/0x410
 xfs_dir2_block_removename+0x21a/0x280
 xfs_dir_removename+0x195/0x1d0
 xfs_remove+0x244/0x460
 xfs_vn_unlink+0x53/0xa0
 ? selinux_inode_unlink+0x13/0x20
 vfs_unlink+0x117/0x220
 do_unlinkat+0x1a2/0x2d0
 __x64_sys_unlink+0x42/0x60
 do_syscall_64+0x3a/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xae

This is a check to ensure that the extents have been read into
memory before we are doing a ifork btree manipulation. This assert
is bogus in the above case.

We have a fragmented directory block that has more extents in it
than can fit in extent format, so the inode data fork is in btree
format. xfs_dir2_shrink_inode() asks to remove all remaining 16
filesystem blocks from the inode so it can convert to short form,
and __xfs_bunmapi() removes all the extents. We now have a data fork
in btree format but have zero extents in the fork. This incorrectly
trips the xfs_need_iread_extents() assert because it assumes that an
empty extent btree means the extent tree has not been read into
memory yet. This is clearly not the case with xfs_bunmapi(), as it
has an explicit call to xfs_iread_extents() in it to pull the
extents into memory before it starts unmapping.

Also, the assert directly after this bogus one is:

	ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE);

Which covers the context in which it is legal to call
xfs_bmap_btree_to_extents just fine. Hence we should just remove the
bogus assert as it is clearly wrong and causes a regression.

The returns the test behaviour to the pre-existing assert failure in
xfs_dir2_shrink_inode() that indicates xfs_bunmapi() has failed to
remove all the extents in the range it was asked to unmap.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-05-27 08:11:24 -07:00
Christoph Hellwig
b2197a36c0 xfs: remove XFS_IFEXTENTS
The in-memory XFS_IFEXTENTS is now only used to check if an inode with
extents still needs the extents to be read into memory before doing
operations that need the extent map.  Add a new xfs_need_iread_extents
helper that returns true for btree format forks that do not have any
entries in the in-memory extent btree, and use that instead of checking
the XFS_IFEXTENTS flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-15 09:35:51 -07:00
Christoph Hellwig
0779f4a68d xfs: remove XFS_IFINLINE
Just check for an inline format fork instead of the using the equivalent
in-memory XFS_IFINLINE flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-15 09:35:51 -07:00
Christoph Hellwig
ac1e067211 xfs: remove XFS_IFBROOT
Just check for a btree format fork instead of the using the equivalent
in-memory XFS_IFBROOT flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-15 09:35:51 -07:00
Christoph Hellwig
2ac131df03 xfs: rename and simplify xfs_bmap_one_block
xfs_bmap_one_block is only called for the attribute fork.  Move it to
xfs_attr.c, drop the unused whichfork argument and code only executed for
the data fork and rename the result to xfs_attr_is_leaf.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-15 09:35:50 -07:00
Christoph Hellwig
862a804aae xfs: move the XFS_IFEXTENTS check into xfs_iread_extents
Move the XFS_IFEXTENTS check from the callers into xfs_iread_extents to
simplify the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-15 09:35:50 -07:00
Dave Chinner
b2941046ea xfs: precalculate default inode attribute offset
Default attr fork offset is based on inode size, so is a fixed
geometry parameter of the inode. Move it to the xfs_ino_geometry
structure and stop calculating it on every call to
xfs_default_attroffset().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2021-04-07 14:37:07 -07:00
Dave Chinner
683ec9ba88 xfs: default attr fork size does not handle device inodes
Device inodes have a non-default data fork size of 8 bytes
as checked/enforced by xfs_repair. xfs_default_attroffset() doesn't
handle this, so lets do a minor refactor so it does.

Fixes: e6a688c332 ("xfs: initialise attr fork on inode create")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2021-04-07 14:37:07 -07:00
Chandan Babu R
b6785e279d xfs: Use struct xfs_bmdr_block instead of struct xfs_btree_block to calculate root node size
The incore data fork of an inode stores the bmap btree root node as 'struct
xfs_btree_block'. However, the ondisk version of the inode stores the bmap
btree root node as a 'struct xfs_bmdr_block'.

xfs_bmap_add_attrfork_btree() checks if the btree root node fits inside the
data fork of the inode. However, it incorrectly uses 'struct xfs_btree_block'
to compute the size of the bmap btree root node. Since size of 'struct
xfs_btree_block' is larger than that of 'struct xfs_bmdr_block',
xfs_bmap_add_attrfork_btree() could end up unnecessarily demoting the current
root node as the child of newly allocated root node.

This commit optimizes space usage by modifying xfs_bmap_add_attrfork_btree()
to use 'struct xfs_bmdr_block' to check if the bmap btree root node fits
inside the data fork of the inode.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-07 14:37:06 -07:00
Christoph Hellwig
7821ea302d xfs: move the di_forkoff field to struct xfs_inode
In preparation of removing the historic icinode struct, move the
forkoff field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-07 14:37:05 -07:00
Christoph Hellwig
031474c28a xfs: move the di_extsize field to struct xfs_inode
In preparation of removing the historic icinode struct, move the extsize
field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-07 14:37:04 -07:00
Christoph Hellwig
6e73a545f9 xfs: move the di_nblocks field to struct xfs_inode
In preparation of removing the historic icinode struct, move the nblocks
field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-07 14:37:03 -07:00
Christoph Hellwig
13d2c10b05 xfs: move the di_size field to struct xfs_inode
In preparation of removing the historic icinode struct, move the on-disk
size field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-07 14:37:03 -07:00
Chandan Babu R
6e8bd39d72 xfs: Initialize xfs_alloc_arg->total correctly when allocating minlen extents
xfs/538 can cause the following call trace to be printed when executing on a
multi-block directory configuration,

 WARNING: CPU: 1 PID: 2578 at fs/xfs/libxfs/xfs_bmap.c:717 xfs_bmap_extents_to_btree+0x520/0x5d0
 Call Trace:
  ? xfs_buf_rele+0x4f/0x450
  xfs_bmap_add_extent_hole_real+0x747/0x960
  xfs_bmapi_allocate+0x39a/0x440
  xfs_bmapi_write+0x507/0x9e0
  xfs_da_grow_inode_int+0x1cd/0x330
  ? up+0x12/0x60
  xfs_dir2_grow_inode+0x62/0x110
  ? xfs_trans_log_inode+0x234/0x2d0
  xfs_dir2_sf_to_block+0x103/0x940
  ? xfs_dir2_sf_check+0x8c/0x210
  ? xfs_da_compname+0x19/0x30
  ? xfs_dir2_sf_lookup+0xd0/0x3d0
  xfs_dir2_sf_addname+0x10d/0x910
  xfs_dir_createname+0x1ad/0x210
  xfs_create+0x404/0x620
  xfs_generic_create+0x24c/0x320
  path_openat+0xda6/0x1030
  do_filp_open+0x88/0x130
  ? kmem_cache_alloc+0x50/0x210
  ? __cond_resched+0x16/0x40
  ? kmem_cache_alloc+0x50/0x210
  do_sys_openat2+0x97/0x150
  __x64_sys_creat+0x49/0x70
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae

This occurs because xfs_bmap_exact_minlen_extent_alloc() initializes
xfs_alloc_arg->total to xfs_bmalloca->minlen. In the context of
xfs_bmap_exact_minlen_extent_alloc(), xfs_bmalloca->minlen has a value of 1
and hence the space allocator could choose an AG which has less than
xfs_bmalloca->total number of free blocks available. As the transaction
proceeds, one of the future space allocation requests could fail due to
non-availability of free blocks in the AG that was originally chosen.

This commit fixes the bug by assigning xfs_alloc_arg->total to the value of
xfs_bmalloca->total.

Fixes: 3015196746 ("xfs: Introduce error injection to allocate only minlen size extents for files")
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-04-07 14:36:34 -07:00
Dave Chinner
e6a688c332 xfs: initialise attr fork on inode create
When we allocate a new inode, we often need to add an attribute to
the inode as part of the create. This can happen as a result of
needing to add default ACLs or security labels before the inode is
made visible to userspace.

This is highly inefficient right now. We do the create transaction
to allocate the inode, then we do an "add attr fork" transaction to
modify the just created empty inode to set the inode fork offset to
allow attributes to be stored, then we go and do the attribute
creation.

This means 3 transactions instead of 1 to allocate an inode, and
this greatly increases the load on the CIL commit code, resulting in
excessive contention on the CIL spin locks and performance
degradation:

 18.99%  [kernel]                [k] __pv_queued_spin_lock_slowpath
  3.57%  [kernel]                [k] do_raw_spin_lock
  2.51%  [kernel]                [k] __raw_callee_save___pv_queued_spin_unlock
  2.48%  [kernel]                [k] memcpy
  2.34%  [kernel]                [k] xfs_log_commit_cil

The typical profile resulting from running fsmark on a selinux enabled
filesytem is adds this overhead to the create path:

  - 15.30% xfs_init_security
     - 15.23% security_inode_init_security
	- 13.05% xfs_initxattrs
	   - 12.94% xfs_attr_set
	      - 6.75% xfs_bmap_add_attrfork
		 - 5.51% xfs_trans_commit
		    - 5.48% __xfs_trans_commit
		       - 5.35% xfs_log_commit_cil
			  - 3.86% _raw_spin_lock
			     - do_raw_spin_lock
				  __pv_queued_spin_lock_slowpath
		 - 0.70% xfs_trans_alloc
		      0.52% xfs_trans_reserve
	      - 5.41% xfs_attr_set_args
		 - 5.39% xfs_attr_set_shortform.constprop.0
		    - 4.46% xfs_trans_commit
		       - 4.46% __xfs_trans_commit
			  - 4.33% xfs_log_commit_cil
			     - 2.74% _raw_spin_lock
				- do_raw_spin_lock
				     __pv_queued_spin_lock_slowpath
			       0.60% xfs_inode_item_format
		      0.90% xfs_attr_try_sf_addname
	- 1.99% selinux_inode_init_security
	   - 1.02% security_sid_to_context_force
	      - 1.00% security_sid_to_context_core
		 - 0.92% sidtab_entry_to_string
		    - 0.90% sidtab_sid2str_get
			 0.59% sidtab_sid2str_put.part.0
	   - 0.82% selinux_determine_inode_label
	      - 0.77% security_transition_sid
		   0.70% security_compute_sid.part.0

And fsmark creation rate performance drops by ~25%. The key point to
note here is that half the additional overhead comes from adding the
attribute fork to the newly created inode. That's crazy, considering
we can do this same thing at inode create time with a couple of
lines of code and no extra overhead.

So, if we know we are going to add an attribute immediately after
creating the inode, let's just initialise the attribute fork inside
the create transaction and chop that whole chunk of code out of
the create fast path. This completely removes the performance
drop caused by enabling SELinux, and the profile looks like:

     - 8.99% xfs_init_security
         - 9.00% security_inode_init_security
            - 6.43% xfs_initxattrs
               - 6.37% xfs_attr_set
                  - 5.45% xfs_attr_set_args
                     - 5.42% xfs_attr_set_shortform.constprop.0
                        - 4.51% xfs_trans_commit
                           - 4.54% __xfs_trans_commit
                              - 4.59% xfs_log_commit_cil
                                 - 2.67% _raw_spin_lock
                                    - 3.28% do_raw_spin_lock
                                         3.08% __pv_queued_spin_lock_slowpath
                                   0.66% xfs_inode_item_format
                        - 0.90% xfs_attr_try_sf_addname
                  - 0.60% xfs_trans_alloc
            - 2.35% selinux_inode_init_security
               - 1.25% security_sid_to_context_force
                  - 1.21% security_sid_to_context_core
                     - 1.19% sidtab_entry_to_string
                        - 1.20% sidtab_sid2str_get
                           - 0.86% sidtab_sid2str_put.part.0
                              - 0.62% _raw_spin_lock_irqsave
                                 - 0.77% do_raw_spin_lock
                                      __pv_queued_spin_lock_slowpath
               - 0.84% selinux_determine_inode_label
                  - 0.83% security_transition_sid
                       0.86% security_compute_sid.part.0

Which indicates the XFS overhead of creating the selinux xattr has
been halved. This doesn't fix the CIL lock contention problem, just
means it's not a limiting factor for this workload. Lock contention
in the security subsystems is going to be an issue soon, though...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[djwong: fix compilation error when CONFIG_SECURITY=n]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
2021-03-25 16:47:51 -07:00
Darrick J. Wong
3de4eb106f xfs: allow reservation of rtblocks with xfs_trans_alloc_inode
Make it so that we can reserve rt blocks with the xfs_trans_alloc_inode
wrapper function, then convert a few more callsites.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:18:49 -08:00
Darrick J. Wong
3a1af6c317 xfs: refactor common transaction/inode/quota allocation idiom
Create a new helper xfs_trans_alloc_inode that allocates a transaction,
locks and joins an inode to it, and then reserves the appropriate amount
of quota against that transction.  Then replace all the open-coded
idioms with a single call to this helper.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:18:49 -08:00
Darrick J. Wong
02b7ee4eb6 xfs: reserve data and rt quota at the same time
Modify xfs_trans_reserve_quota_nblks so that we can reserve data and
realtime blocks from the dquot at the same time.  This change has the
theoretical side effect that for allocations to realtime files we will
reserve from the dquot both the number of rtblocks being allocated and
the number of bmbt blocks that might be needed to add the mapping.
However, since the mount code disables quota if it finds a realtime
device, this should not result in any behavior changes.

Now that we've moved the inode creation callers away from using the
_nblks function, we can repurpose the (now unused) ninos argument for
realtime blocks, so make that change.  This also replaces the flags
argument with a boolean parameter to force the reservation since we
don't need to distinguish between data and rt quota reservations any
more, and the only flag being passed in was FORCE_RES.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:18:49 -08:00
Darrick J. Wong
8554650003 xfs: create convenience wrappers for incore quota block reservations
Create a couple of convenience wrappers for creating and deleting quota
block reservations against future changes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:18:49 -08:00
Darrick J. Wong
4abe21ad67 xfs: clean up quota reservation callsites
Convert a few xfs_trans_*reserve* callsites that are open-coding other
convenience functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03 09:18:49 -08:00
Chandan Babu R
560ab6c0d1 xfs: Fix 'set but not used' warning in xfs_bmap_compute_alignments()
With both CONFIG_XFS_DEBUG and CONFIG_XFS_WARN disabled, the only reference to
local variable "error" in xfs_bmap_compute_alignments() gets eliminated during
pre-processing stage of the compilation process. This causes the compiler to
generate a "set but not used" warning.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-01 09:44:24 -08:00
Chandan Babu R
3015196746 xfs: Introduce error injection to allocate only minlen size extents for files
This commit adds XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT error tag which
helps userspace test programs to get xfs_bmap_btalloc() to always
allocate minlen sized extents.

This is required for test programs which need a guarantee that minlen
extents allocated for a file do not get merged with their existing
neighbours in the inode's BMBT. "Inode fork extent overflow check" for
Directories, Xattrs and extension of realtime inodes need this since the
file offset at which the extents are being allocated cannot be
explicitly controlled from userspace.

One way to use this error tag is to,
1. Consume all of the free space by sequentially writing to a file.
2. Punch alternate blocks of the file. This causes CNTBT to contain
   sufficient number of one block sized extent records.
3. Inject XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT error tag.
After step 3, xfs_bmap_btalloc() will issue space allocation
requests for minlen sized extents only.

ENOSPC error code is returned to userspace when there aren't any "one
block sized" extents left in any of the AGs.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:49 -08:00
Chandan Babu R
07c72e5562 xfs: Process allocated extent in a separate function
This commit moves over the code in xfs_bmap_btalloc() which is
responsible for processing an allocated extent to a new function. Apart
from xfs_bmap_btalloc(), the new function will be invoked by another
function introduced in a future commit.

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:49 -08:00
Chandan Babu R
0961fddfdd xfs: Compute bmap extent alignments in a separate function
This commit moves over the code which computes stripe alignment and
extent size hint alignment into a separate function. Apart from
xfs_bmap_btalloc(), the new function will be used by another function
introduced in a future commit.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:49 -08:00
Chandan Babu R
aff4db57d5 xfs: Remove duplicate assert statement in xfs_bmap_btalloc()
The check for verifying if the allocated extent is from an AG whose
index is greater than or equal to that of tp->t_firstblock is already
done a couple of statements earlier in the same function. Hence this
commit removes the redundant assert statement.

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:49 -08:00
Chandan Babu R
02092a2f03 xfs: Check for extent overflow when renaming dir entries
A rename operation is essentially a directory entry remove operation
from the perspective of parent directory (i.e. src_dp) of rename's
source. Hence the only place where we check for extent count overflow
for src_dp is in xfs_bmap_del_extent_real(). xfs_bmap_del_extent_real()
returns -ENOSPC when it detects a possible extent count overflow and in
response, the higher layers of directory handling code do the following:
1. Data/Free blocks: XFS lets these blocks linger until a future remove
   operation removes them.
2. Dabtree blocks: XFS swaps the blocks with the last block in the Leaf
   space and unmaps the last block.

For target_dp, there are two cases depending on whether the destination
directory entry exists or not.

When destination directory entry does not exist (i.e. target_ip ==
NULL), extent count overflow check is performed only when transaction
has a non-zero sized space reservation associated with it.  With a
zero-sized space reservation, XFS allows a rename operation to continue
only when the directory has sufficient free space in its data/leaf/free
space blocks to hold the new entry.

When destination directory entry exists (i.e. target_ip != NULL), all
we need to do is change the inode number associated with the already
existing entry. Hence there is no need to perform an extent count
overflow check.

Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:47 -08:00
Chandan Babu R
0dbc5cb1a9 xfs: Check for extent overflow when removing dir entries
Directory entry removal must always succeed; Hence XFS does the
following during low disk space scenario:
1. Data/Free blocks linger until a future remove operation.
2. Dabtree blocks would be swapped with the last block in the leaf space
   and then the new last block will be unmapped.

This facility is reused during low inode extent count scenario i.e. this
commit causes xfs_bmap_del_extent_real() to return -ENOSPC error code so
that the above mentioned behaviour is exercised causing no change to the
directory's extent count.

Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:47 -08:00
Chandan Babu R
727e1acd29 xfs: Check for extent overflow when trivally adding a new extent
When adding a new data extent (without modifying an inode's existing
extents) the extent count increases only by 1. This commit checks for
extent count overflow in such cases.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-22 16:54:47 -08:00
Dave Chinner
e82226138b xfs: remove xfs_buf_t typedef
Prepare for kernel xfs_buf  alignment by getting rid of the
xfs_buf_t typedef from userspace.

[darrick: This patch is a port of a userspace patch removing the
xfs_buf_t typedef in preparation to make the userspace xfs_buf code
behave more like its kernel counterpart.]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-12-16 16:07:34 -08:00
Darrick J. Wong
33005fd0a5 xfs: refactor file range validation
Refactor all the open-coded validation of file block ranges into a
single helper, and teach the bmap scrubber to check the ranges.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-12-09 09:49:38 -08:00
Darrick J. Wong
18695ad425 xfs: refactor realtime volume extent validation
Refactor all the open-coded validation of realtime device extents into a
single helper.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-12-09 09:49:38 -08:00
Darrick J. Wong
67457eb0d2 xfs: refactor data device extent validation
Refactor all the open-coded validation of non-static data device extents
into a single helper.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-12-09 09:49:38 -08:00
Darrick J. Wong
acf104c233 xfs: detect overflows in bmbt records
Detect file block mappings with a blockcount that's either so large that
integer overflows occur or are zero, because neither are valid in the
filesystem.  Worse yet, attempting directory modifications causes the
iext code to trip over the bmbt key handling and takes the filesystem
down.  We can fix most of this by preventing the bad metadata from
entering the incore structures in the first place.

Found by setting blockcount=0 in a directory data fork mapping and
watching the fireworks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-12-09 09:49:38 -08:00
Darrick J. Wong
8df0fa39bd xfs: don't free rt blocks when we're doing a REMAP bunmapi call
When callers pass XFS_BMAPI_REMAP into xfs_bunmapi, they want the extent
to be unmapped from the given file fork without the extent being freed.
We do this for non-rt files, but we forgot to do this for realtime
files.  So far this isn't a big deal since nobody makes a bunmapi call
to a rt file with the REMAP flag set, but don't leave a logic bomb.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-09-23 08:58:51 -07:00
Darrick J. Wong
d0c20d38af xfs: fix xfs_bmap_validate_extent_raw when checking attr fork of rt files
The realtime flag only applies to the data fork, so don't use the
realtime block number checks on the attr fork of a realtime file.

Fixes: 30b0984d91 ("xfs: refactor bmap record validation")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2020-09-03 08:33:50 -07:00
Carlos Maiolino
32a2b11f46 xfs: Remove kmem_zone_zalloc() usage
Use kmem_cache_zalloc() directly.

With the exception of xlog_ticket_alloc() which will be dealt on the
next patch for readability.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-07-28 20:24:14 -07:00
Carlos Maiolino
3050bd0bfe xfs: Remove kmem_zone_alloc() usage
Use kmem_cache_alloc() directly.

All kmem_zone_alloc() users pass 0 as flags, which are translated into:
GFP_KERNEL | __GFP_NOWARN, and kmem_zone_alloc() loops forever until the
allocation succeeds.

We can use __GFP_NOFAIL to tell the allocator to loop forever rather
than doing it ourself, and because the allocation will never fail, we do
not need to use __GFP_NOWARN anymore. Hence, all callers can be
converted to use GFP_KERNEL | __GFP_NOFAIL

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: add a comment back in about nofail]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-07-28 20:24:14 -07:00
Darrick J. Wong
a5949d3fae xfs: force writes to delalloc regions to unwritten
When writing to a delalloc region in the data fork, commit the new
allocations (of the da reservation) as unwritten so that the mappings
are only marked written once writeback completes successfully.  This
fixes the problem of stale data exposure if the system goes down during
targeted writeback of a specific region of a file, as tested by
generic/042.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-05-27 08:49:28 -07:00
Christoph Hellwig
f7e67b20ec xfs: move the fork format fields into struct xfs_ifork
Both the data and attr fork have a format that is stored in the legacy
idinode.  Move it into the xfs_ifork structure instead, where it uses
up padding.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:58 -07:00
Christoph Hellwig
daf83964a3 xfs: move the per-fork nextents fields into struct xfs_ifork
There are there are three extents counters per inode, one for each of
the forks.  Two are in the legacy icdinode and one is directly in
struct xfs_inode.  Switch to a single counter in the xfs_ifork structure
where it uses up padding at the end of the structure.  This simplifies
various bits of code that just wants the number of extents counter and
can now directly dereference it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:58 -07:00
Christoph Hellwig
4b516ff4e7 xfs: remove the NULL fork handling in xfs_bmapi_read
Now that we fully verify the inode forks before they are added to the
inode cache, the crash reported in

  https://bugzilla.kernel.org/show_bug.cgi?id=204031

can't happen anymore, as we'll never let an inode that has inconsistent
nextents counts vs the presence of an in-core attr fork leak into the
inactivate code path.  So remove the work around to try to handle the
case, and just return an error and warn if the fork is not present.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:58 -07:00
Christoph Hellwig
1a1c57b282 xfs: remove the special COW fork handling in xfs_bmapi_read
We don't call xfs_bmapi_read for the COW fork anymore, so remove the
special casing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:58 -07:00
Christoph Hellwig
e9e2eae89d xfs: only check the superblock version for dinode size calculation
The size of the dinode structure is only dependent on the file system
version, so instead of checking the individual inode version just use
the newly added xfs_sb_version_has_large_dinode helper, and simplify
various calling conventions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-19 08:48:47 -07:00
Dave Chinner
8ef547976a xfs: rename btree cursor private btree member flags
BPRV is not longer appropriate because bc_private is going away.
Script:

$ sed -i 's/BTCUR_BPRV/BTCUR_BMBT/g' fs/xfs/*[ch] fs/xfs/*/*[ch]

With manual cleanup to the definitions in fs/xfs/libxfs/xfs_btree.h

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: change "BC_BT" to "BTCUR_BMBT", fix subject line typo]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-13 10:37:14 -07:00