2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

527 Commits

Author SHA1 Message Date
Kent Overstreet
91b5d97fdf bcachefs: get_unlocked_mut_path -> bch2_path_get_unlocked_mut
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet
5dd8c60e1e bcachefs: iter/update/trigger/str_hash flag cleanup
Combine iter/update/trigger/str_hash flags into a single enum, and
x-macroize them for a to_text() function later.

These flags are all for a specific iter/key/update context, so it makes
sense to group them together - iter/update/trigger flags were already
given distinct bits, this cleans up and unifies that handling.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:18 -04:00
Kent Overstreet
bf5f6a689b bcachefs: __BTREE_ITER_ALL_SNAPSHOTS -> BTREE_ITER_SNAPSHOT_FIELD
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:18 -04:00
Kent Overstreet
be31bf439c bcachefs: When traversing to interior nodes, propagate result to paths to same leaf node
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:18 -04:00
Kent Overstreet
00589cadb1 bcachefs: bch2_btree_path_to_text()
Long form version of bch2_btree_path_to_text() - useful in error
messages and tracepoints.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:18 -04:00
Kent Overstreet
7423330e30 bcachefs: prt_printf() now respects \r\n\t
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:17 -04:00
Kent Overstreet
79032b0781 bcachefs: Improved topology repair checks
Consolidate bch2_gc_check_topology() and btree_node_interior_verify(),
and replace them with an improved version,
bch2_btree_node_check_topology().

This checks that children of an interior node correctly span the full
range of the parent node with no overlaps.

Also, ensure that topology repairs at runtime are always a fatal error;
in particular, this adds a check in btree_iter_down() - if we don't find
a key while walking down the btree that's indicative of a topology error
and should be flagged as such, not a null ptr deref.

Some checks in btree_update_interior.c remaining BUG_ONS(), because we
already checked the node for topology errors when starting the update,
and the assertions indicate that we _just_ corrupted the btree node -
i.e. the problem can't be that existing on disk corruption, they
indicate an actual algorithmic bug.

In the future, we'll be annotating the fsck errors list with which
recovery pass corrects them; the open coded "run explicit recovery pass
or fatal error" in bch2_btree_node_check_topology() will in the future
be done for every fsck_err() call.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Hongbo Li
36f9ef109b bcachefs: fix trans->mem realloc in __bch2_trans_kmalloc
The old code doesn't consider the mem alloced from mempool when call
krealloc on trans->mem. Also in bch2_trans_put, using mempool_free to
free trans->mem by condition "trans->mem_bytes == BTREE_TRANS_MEM_MAX"
is inaccurate when trans->mem was allocated by krealloc function.
Instead, we use used_mempool stuff to record the situation, and realloc
or free the trans->mem in elegant way.

Also, after krealloc failed in __bch2_trans_kmalloc, the old data
should be copied to the new buffer when alloc from mempool_alloc.

Fixes: 31403dca5b ("bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONS")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet
5ca8ff157d bcachefs: Use kvzalloc() when dynamically allocating btree paths
THis silences a mm/page_alloc.c warning about allocating more than a
page with GFP_NOFAIL - and there's no reason for this to not have a
vmalloc fallback anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
83bd5985fa bcachefs: Track iter->ip_allocated at bch2_trans_copy_iter()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
3254c1b0e5 bcachefs: Save key_cache_path in peek_slot()
When bch2_btree_iter_peek_slot() clones the iterator to search for the
next key, and then discovers that the key from the cloned iterator is
the key we want to return - we also want to save the
iter->key_cache_path as well, for the update path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
52946d828a bcachefs: Kill more -EIO error codes
This converts -EIOs related to btree node errors to private error codes,
which will help with some ongoing debugging by giving us better error
messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:23 -04:00
Kent Overstreet
fc634d8e46 bcachefs: btree_and_journal_iter.trans
we now always have a btree_trans when using a btree_and_journal_iter;
prep work for adding prefetching to btree_and_journal_iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10 15:34:08 -04:00
Kent Overstreet
fadc6067f2 bcachefs: Set path->uptodate when no node at level
We were failing to set path->uptodate when reaching the end of a btree
node iterator, causing the new prefetch code for backpointers gc to go
into an infinite loop.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10 15:30:56 -04:00
Kent Overstreet
ba89083e9f bcachefs: Fix journal replay with unreadable btree roots
When a btree root is unreadable, we still might be able to get some data
back by replaying what's in the journal. Previously though, we got
confused when journal replay would attempt to replay a key for a level
that didn't exist.

This adds bch2_btree_increase_depth(), so that journal replay can handle
this.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10 15:18:13 -04:00
Kent Overstreet
204f45140f bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree
If we're in FILTER_SNAPSHOTS mode and we start scanning a range of the
keyspace where no keys are visible in the current snapshot, we have a
problem - we'll scan for a very long time before scanning terminates.

Awhile back, this was fixed for most cases with peek_upto() (and
assertions that enforce that it's being used).

But the fix missed the fact that the inodes btree is different - every
key offset is in a different snapshot tree, not just the inode field.

Fixes:
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-24 20:41:46 -05:00
Kent Overstreet
b97de45365 bcachefs: Improve trace_trans_restart_relock
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet
49a5192c0e bcachefs: Add an option to control btree node prefetching
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:20 -05:00
Kent Overstreet
89056f245b bcachefs: track transaction durations
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:19 -05:00
Kent Overstreet
83322e8ca8 bcachefs: btree_trans always has stats
reserve slot 0 for unknown (when we overflow), to avoid some branches

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:19 -05:00
Kent Overstreet
c558c577cb bcachefs: bch2_btree_trans_peek_slot_updates
refactoring the BTREE_ITER_WITH_UPDATES code, prep for removing the flag
and making it always-on

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:44 -05:00
Kent Overstreet
359e89add5 bcachefs: bch2_btree_trans_peek_prev_updates
bch2_btree_iter_peek_prev() now supports BTREE_ITER_WITH_UPDATES

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:44 -05:00
Kent Overstreet
eb6863598a bcachefs: bch2_btree_trans_peek_updates
refactoring the BTREE_ITER_WITH_UPDATES code, prep for removing the flag
and making it always-on

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:44 -05:00
Kent Overstreet
0c99e17d3b bcachefs: growable btree_paths
XXX: we're allocating memory with btree locks held - bad

We need to plumb through an error path so we can do
allocate_dropping_locks() - but we're merging this now because it fixes
a transaction path overflow caused by indirect extent fragmentation, and
the resize path is rare.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:44 -05:00
Kent Overstreet
2c3b0fc3bd bcachefs: trans->nr_paths
Start to plumb through dynamically growable btree_paths; this patch
replaces most BTREE_ITER_MAX references with trans->nr_paths.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:44 -05:00
Kent Overstreet
5cc6daf749 bcachefs: trans->updates will also be resizable
the reflink triggers are also bumping up against the maximum number of
paths in a transaction - and generating proportional numbers of updates.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:44 -05:00
Kent Overstreet
31403dca5b bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONS
- Some tweaks to greatly reduce locking overhead for the list of btree
   transactions, so that it can always be enabled: leave btree_trans
   objects on the list when they're on the percpu single item freelist,
   and only check for duplicates in the same process when
   CONFIG_BCACHEFS_DEBUG is enabled

 - don't zero out the full btree_trans() unless we allocated it from
   the mempool

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:44 -05:00
Kent Overstreet
fea153a845 bcachefs: rcu protect trans->paths
Upcoming patches are going to be changing trans->paths to a
reallocatable buffer. We need to guard against use after free when it's
used by other threads; this introduces RCU protection to those paths and
changes them to check for trans->paths == NULL

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:44 -05:00
Kent Overstreet
6474b70610 bcachefs: Clean up btree_trans
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:44 -05:00
Kent Overstreet
398c98347d bcachefs: kill btree_path.idx
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:44 -05:00
Kent Overstreet
542e639674 bcachefs: bch2_btree_iter_peek_prev() no longer uses path->idx
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:44 -05:00
Kent Overstreet
566eabd36f bcachefs: bch2_path_get() no longer uses path->idx
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:44 -05:00
Kent Overstreet
b0b6737822 bcachefs: trans_for_each_path_with_node() no longer uses path->idx
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
ccb7b08fbb bcachefs: trans_for_each_path() no longer uses path->idx
path->idx is now a code smell: we should be using path_idx_t, since it's
stable across btree path reallocation.

This is also a bit faster, using the same loop counter vs. fetching
path->idx from each path we iterate over.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
311e446a41 bcachefs: bch2_btree_path_to_text() -> btree_path_idx_t
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
1f75ba4e65 bcachefs: struct trans_for_each_path_inorder_iter
reducing our usage of path->idx

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
7f9821a7c1 bcachefs: btree_insert_entry -> btree_path_idx_t
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
07f383c71f bcachefs: btree_iter -> btree_path_idx_t
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
788cc25d15 bcachefs: btree_path_alloc() -> btree_path_idx_t
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
96ed47d130 bcachefs: bch2_btree_path_traverse() -> btree_path_idx_t
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
f6363acaa6 bcachefs: bch2_btree_path_make_mut() -> btree_path_idx_t
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
4617d94617 bcachefs: bch2_btree_path_set_pos() -> btree_path_idx_t
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
74e600c19a bcachefs; bch2_path_put() -> btree_path_idx_t
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
255ebbbf75 bcachefs: bch2_path_get() -> btree_path_idx_t
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
5ce8b92da0 bcachefs: minor bch2_btree_path_set_pos() optimization
bpos_eq() is cheaper than bpos_cmp()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
0bc64d7e26 bcachefs: kill __bch2_btree_iter_peek_upto_and_restart()
dead code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
79904fa2bb bcachefs: bch2_trans_srcu_lock() should be static
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
559e6c2336 bcachefs: trans_for_each_update() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
e06af20719 bcachefs: fix userspace build errors
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
679972348d bcachefs: kill btree_trans->wb_updates
the btree write buffer path now creates a journal entry directly

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
f33600057f bcachefs: bch2_trans_node_add no longer uses trans_for_each_path()
In the future we'll be making trans->paths resizable and potentially
having _many_ more paths (for fsck); we need to start fixing algorithms
that walk each path in a transaction where possible.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
24de63dacb bcachefs: Improve trans->extra_journal_entries
Instead of using a darray, we now allocate journal entries for the
transaction commit path with our normal bump allocator - with an inlined
fastpath, and using btree_transaction_stats to remember how much to
initially allocate so as to avoid transaction restarts.

This is prep work for converting write buffer updates to use this
mechanism.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
a83b6c895c bcachefs: kill btree_path->(alloc_seq|downgrade_seq)
These were for extra info in tracepoints for debugging a specialized
issue - we do not want to bloat btree_path for this, at least in release
builds.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
f8fd5871be bcachefs: reserve path idx 0 for sentinal
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet
6e92d15546 bcachefs: Refactor trans->paths_allocated to be standard bitmap
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet
a564c9fad5 bcachefs: Include btree_trans in more tracepoints
This gives us more context information - e.g. which codepath is invoking
btree node reads.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:40 -05:00
Kent Overstreet
3398124444 bcachefs: Improve trace_trans_restart_would_deadlock
In the CI, we're seeing tests failing due to excessive would_deadlock
transaction restarts - the tracepoint now includes the lock cycle that
occured.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:39 -05:00
Kent Overstreet
e153a0d70b bcachefs: Improve trace_trans_restart_too_many_iters()
We now include the list of paths in use.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:39 -05:00
Kent Overstreet
3c471b6588 bcachefs: convert bch_fs_flags to x-macro
Now we can print out filesystem flags in sysfs, useful for debugging
various "what's my filesystem doing" issues.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
ad9c7992eb bcachefs: Kill btree_iter->journal_pos
For BTREE_ITER_WITH_JOURNAL, we memoize lookups in the journal keys, to
avoid the binary search overhead.

Previously we stashed the pos of the last key returned from the journal,
in order to force the lookup to be redone when rewinding.

Now bch2_journal_keys_peek_upto() handles rewinding itself when
necessary - so we can slim down btree_iter.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
e56978c80d bcachefs: Kill BTREE_ITER_ALL_LEVELS
As discussed in the previous patch, BTREE_ITER_ALL_LEVELS appears to be
racy with concurrent interior node updates - and perhaps it is fixable,
but it's tricky and unnecessary.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
cd5bd16282 bcachefs: Fix redundant variable initialization
path->level was being read, but never used.

Reported-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
fa014953f9 bcachefs: Fix extents iteration + snapshots interaction
peek_upto() checks against the end position and bails out before
FILTER_SNAPSHOTS checks; this is because if we end up at a different
inode number than the original search key none of the keys we see might
be visibile in the current snapshot - we might be looking at inode in a
completely different subvolume.

But this is broken, because when we're iterating over extents we're
checking against the extent start position to decide when to bail out,
and the extent start position isn't monotonically increasing until after
we've run FILTER_SNAPSHOTS.

Fix this by adding a simple inode number check where the old bailout
check was, and moving the main check to the correct position.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reported-by: "Carl E. Thompson" <list-bcachefs@carlthompson.net>
2024-01-01 11:42:39 -05:00
Thomas Bertschinger
50a8a732d2 bcachefs: fix invalid memory access in bch2_fs_alloc() error path
When bch2_fs_alloc() gets an error before calling
bch2_fs_btree_iter_init(), bch2_fs_btree_iter_exit() makes an invalid
memory access because btree_trans_list is uninitialized.

Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Fixes: 6bd68ec266 ("bcachefs: Heap allocate btree_trans")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-14 15:24:14 -05:00
Kent Overstreet
8a443d3ea1 bcachefs: Proper refcounting for journal_keys
The btree iterator code overlays keys from the journal until journal
replay is finished; since we're now starting copygc/rebalance etc.
before replay is finished, this is multithreaded access and thus needs
refcounting.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24 02:43:12 -05:00
Kent Overstreet
006ccc3090 bcachefs: Kill journal pre-reservations
This deletes the complicated and somewhat expensive journal
pre-reservation machinery in favor of just using journal watermarks:
when the journal is more than half full, we run journal reclaim more
aggressively, and when the journal is more than 3/4s full we only allow
journal reclaim to get new journal reservations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-14 23:44:43 -05:00
Kent Overstreet
9fcdd23b6e bcachefs: Add a comment for BTREE_INSERT_NOJOURNAL usage
BTREE_INSERT_NOJOURNAL is primarily used for a performance optimization
related to inode updates and fsync - document it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-04 22:19:13 -04:00
Kent Overstreet
f82755e4e8 bcachefs: Data move path now uses bch2_trans_unlock_long()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-04 22:19:11 -04:00
Kent Overstreet
c4accde498 bcachefs: Ensure srcu lock is not held too long
The SRCU read lock that btree_trans takes exists to make it safe for
bch2_trans_relock() to deref pointers to btree nodes/key cache items we
don't have locked, but as a side effect it blocks reclaim from freeing
those items.

Thus, it's important to not hold it for too long: we need to
differentiate between bch2_trans_unlock() calls that will be only for a
short duration, and ones that will be for an unbounded duration.

This introduces bch2_trans_unlock_long(), to be used mainly by the data
move paths.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-04 14:17:11 -04:00
Kent Overstreet
be9e782df3 bcachefs: Don't downgrade locks on transaction restart
We should only be downgrading locks on success - otherwise, our
transaction restarts won't be getting the correct locks and we'll
livelock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01 21:11:08 -04:00
Kent Overstreet
88dfe193bd bcachefs: bch2_btree_id_str()
Since we can run with unknown btree IDs, we can't directly index btree
IDs into fixed size arrays.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31 12:18:37 -04:00
Kent Overstreet
6bd68ec266 bcachefs: Heap allocate btree_trans
We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocatation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Kent Overstreet
96dea3d599 bcachefs: Fix W=12 build errors
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Colin Ian King
6bf3766b52 bcachefs: Fix a handful of spelling mistakes in various messages
There are several spelling mistakes in error messages. Fix these.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Kent Overstreet
5b7fbdcd5b bcachefs: Fix silent enum conversion error
This changes mark_btree_node_locked() to take an enum
btree_node_locked_type, not a six_lock_type, since BTREE_NODE_UNLOCKED
is -1 which may cause problems converting back and forth to
six_lock_type if short enums are in use.

With this change, we never store BTREE_NODE_UNLOCKED in a six_lock_type
enum.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:12 -04:00
Kent Overstreet
8e877caaad bcachefs: Split out snapshot.c
subvolume.c has gotten a bit large, this splits out a separate file just
for managing snapshot trees - BTREE_ID_snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:11 -04:00
Kent Overstreet
ff5b741c25 bcachefs: Zero btree_paths on allocation
This fixes a bug in the cycle detector, bch2_check_for_deadlock() - we
have to make sure the node pointers in the btree paths array are set to
something not-garbage before another thread may see them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:11 -04:00
Kent Overstreet
401585fe87 bcachefs: btree_journal_iter.c
Split out a new file from recovery.c for managing the list of keys we
read from the journal: before journal replay finishes the btree iterator
code needs to be able to iterate over and return keys from the journal
as well, so there's a fair bit of code here.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet
1e81f89b02 bcachefs: Fix assorted checkpatch nits
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet
bf5a261c7a bcachefs: Assorted fixes for clang
clang had a few more warnings about enum conversion, and also didn't
like the opts.c initializer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:09 -04:00
Kent Overstreet
73bd774d28 bcachefs: Assorted sparse fixes
- endianness fixes
 - mark some things static
 - fix a few __percpu annotations
 - fix silent enum conversions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet
faa6cb6c13 bcachefs: Allow for unknown btree IDs
We need to allow filesystems with metadata from newer versions to be
mountable and usable by older versions.

This patch enables us to roll out new btrees without a new major version
number; we can now handle btree roots for unknown btree types.

The unknown btree roots will be retained, and fsck (including
backpointers) will check them, the same as other btree types.

We add a dynamic array for the extra, unknown btree roots, in addition
to the fixed size btree root array, and add new helpers for looking up
btree roots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:05 -04:00
Kent Overstreet
a5b696ee6e bcachefs: seqmutex; fix a lockdep splat
We can't be holding btree_trans_lock while copying to user space, which
might incur a page fault. To fix this, convert it to a seqmutex so we
can unlock/relock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:04 -04:00
Kent Overstreet
49c7cd9d8d bcachefs: More drop_locks_do() conversions
Using drop_locks_do() ensures that every unlock() is paired with a
relock(), with proper error checking.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:03 -04:00
Kent Overstreet
78367aaa5a bcachefs: bch2_trans_kmalloc no longer allocates memory with btree locks held
When allocating memory, gfp flags should generally be

 - GFP_NOWAIT|__GFP_NOWARN if btree locks are held
 - GFP_NOFS if in the IO path or otherwise holding resources needed for
   IO submission
 - GFP_KERNEL otherwise

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:03 -04:00
Kent Overstreet
b5fd75669a bcachefs: drop_locks_do()
Add a new helper for the common pattern of:
 - trans_unlock()
 - do something
 - trans_relock()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:03 -04:00
Kent Overstreet
f154c3eb42 bcachefs: trans_for_each_path_safe()
bch2_btree_trans_to_text() is used on btree_trans objects that are owned
by different threads - when printing out deadlock cycles - so we need a
safe version of trans_for_each_path(), else we race with seeing a
btree_path that was just allocated and not fully initialized:

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:02 -04:00
Kent Overstreet
32913f49f5 six locks: Seq now only incremented on unlock
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:02 -04:00
Kent Overstreet
1fb4fe6317 six locks: Kill six_lock_state union
As suggested by Linus, this drops the six_lock_state union in favor of
raw bitmasks.

On the one hand, bitfields give more type-level structure to the code.
However, a significant amount of the code was working with
six_lock_state as a u64/atomic64_t, and the conversions from the
bitfields to the u64 were deemed a bit too out-there.

More significantly, because bitfield order is poorly defined (#ifdef
__LITTLE_ENDIAN_BITFIELD can be used, but is gross), incrementing the
sequence number would overflow into the rest of the bitfield if the
compiler didn't put the sequence number at the high end of the word.

The new code is a bit saner when we're on an architecture without real
atomic64_t support - all accesses to lock->state now go through
atomic64_*() operations.

On architectures with real atomic64_t support, we additionally use
atomic bit ops for setting/clearing individual bits.

Text size: 7467 bytes -> 4649 bytes - compilers still suck at
bitfields.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:02 -04:00
Kent Overstreet
f375d6ca58 bcachefs: Don't call local_clock() twice in trans_begin()
local_clock() is not as cheap as we'd like it to be, alas

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:01 -04:00
Kent Overstreet
11f117374a bcachefs: Call bch2_path_put_nokeep() before bch2_path_put()
bch2_path_put_nokeep() is sketchy, and we should consider removing it:
it unconditionally frees btree_paths once their ref hits 0.

The assumption is that we only use it for paths that have never been
visible outside the btree core btree code; i.e. higher level code will
never be making assumptions about locking based on these paths.

However, there's subtle brokenness with this approach:

 - If we call bch2_path_put(), then bch2_path_put_nokeep(),
   bch2_path_put() may free the first path on the assumption that we we
   have another path keeping a node locked - but then
   bch2_path_put_nokeep() just unconditionally frees it.

The same bug may arise if we're calling bch2_path_put() and
bch2_path_put_nokeep() on the same (refcounted) path, or two adjacent
paths that point to the same btree node.

This patch hacks around one of these bugs by calling
bch2_path_put_nokeep() first in bch2_trans_iter_exit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:58 -04:00
Kent Overstreet
65d48e3525 bcachefs: Private error codes: ENOMEM
This adds private error codes for most (but not all) of our ENOMEM uses,
which makes it easier to track down assorted allocation failures.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:57 -04:00
Kent Overstreet
511b629aca bcachefs: bch2_btree_iter_peek_node_and_restart()
Minor refactoring for the Rust interface.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:56 -04:00
Kent Overstreet
1306f87de3 bcachefs: Plumb btree_trans through btree cache code
Soon, __bch2_btree_node_write() is going to require a btree_trans: zoned
device support is going to require a new allocation for every btree node
write. This is a bit of prep work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet
0f2ea6550f bcachefs: bch2_btree_iter_peek_and_restart_outlined()
Needed for interfacing with Rust - bindgen can't handle inline
functions, alas.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet
5e2d8be8bd bcachefs: Split trans->last_begin_ip and trans->last_restarted_ip
These are two different things - this improves our debug assert
messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:53 -04:00
Kent Overstreet
6623c0fcdf bcachefs: Add an assertion for using multiple btree_trans
A thread should never be using more than one btree_trans - doing so is
an invitation for deadlocks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:53 -04:00
Kent Overstreet
920e69bc3d bcachefs: Btree write buffer
This adds a new method of doing btree updates - a straight write buffer,
implemented as a flat fixed size array.

This is only useful when we don't need to read from the btree in order
to do the update, and when reading is infrequent - perfect for the LRU
btree.

This will make LRU btree updates fast enough that we'll be able to use
it for persistently indexing buckets by fragmentation, which will be a
massive boost to copygc performance.

Changes:
 - A new btree_insert_type enum, for btree_insert_entries. Specifies
   btree, btree key cache, or btree write buffer.

 - bch2_trans_update_buffered(): updates via the btree write buffer
   don't need a btree path, so we need a new update path.

 - Transaction commit path changes:
   The update to the btree write buffer both mutates global, and can
   fail if there isn't currently room. Therefore we do all write buffer
   updates in the transaction all at once, and also if it fails we have
   to revert filesystem usage counter changes.

   If there isn't room we flush the write buffer in the transaction
   commit error path and retry.

 - A new persistent option, for specifying the number of entries in the
   write buffer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:50 -04:00
Kent Overstreet
60b5538877 bcachefs: trans->notrace_relock_fail
When we unlock in order to submit IO, the next relock event is likely to
fail if submit_bio() blocked - we shouldn't those events in our _fail
stats, since those are expected events and shouldn't cause test
failures.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:50 -04:00
Kent Overstreet
434b1c75a4 bcachefs: Switch a BUG_ON() to a panic()
This assert is popping - rarely - in the CI, this will help us track it
down from the logs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:50 -04:00