2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

169 Commits

Author SHA1 Message Date
Kent Overstreet
6bd68ec266 bcachefs: Heap allocate btree_trans
We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocatation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Kent Overstreet
96dea3d599 bcachefs: Fix W=12 build errors
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Kent Overstreet
9d2a7bd8b7 bcachefs: Improve bch2_moving_ctxt_to_text()
Print more information out about moving contexts - fold in the output of
the redundant bch2_data_jobs_to_text(), and also include information
relevant to whether move_data() should be blocked.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:11 -04:00
Kent Overstreet
73bd774d28 bcachefs: Assorted sparse fixes
- endianness fixes
 - mark some things static
 - fix a few __percpu annotations
 - fix silent enum conversions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet
e53a961c6b bcachefs: Rename enum alloc_reserve -> bch_watermark
This is prep work for consolidating with JOURNAL_WATERMARK.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:04 -04:00
Kent Overstreet
a5b696ee6e bcachefs: seqmutex; fix a lockdep splat
We can't be holding btree_trans_lock while copying to user space, which
might incur a page fault. To fix this, convert it to a seqmutex so we
can unlock/relock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:04 -04:00
Brian Foster
fec4fc82b5 bcachefs: create internal disk_groups sysfs file
We have bch2_sb_disk_groups_to_text() to dump disk group labels, but
no good information on device group membership at runtime. Add
bch2_disk_groups_to_text() and an associated 'disk_groups' sysfs
file to print group and device relationships.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:03 -04:00
Kent Overstreet
8669199438 bcachefs: Print out counters correctly
Most counters aren't in units of sectors, and the ones that are should
just be switched to bytes, for simplicity.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:59 -04:00
Kent Overstreet
b9fa375bab bcachefs: bch2_fs_moving_ctxts_to_text()
This also adds bch2_write_op_to_text(): now we can see outstand moves,
useful for debugging shutdown with the upcoming BCH_WRITE_WAIT_FOR_EC
and likely for other things in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:57 -04:00
Kent Overstreet
b1cfe5ed2b bcachefs: Improve dev_alloc_debug_to_text()
Now we also print the number of buckets reserved for each watermark.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet
c85d779609 bcachefs: bch2_copygc_wait_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet
2f4e9472fa bcachefs: bch2_open_bucket_to_text()
Factor out a common helper

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet
350175bf9b bcachefs: Improved nocow locking
This improves the nocow lock table so that hash table entries have
multiple locks, and locks specify which bucket they're for - i.e. we can
now resolve hash collisions.

This is important because the allocator has to skip buckets that are
locked in the nocow lock table, and previously hash collisions would
cause it to spuriously skip unlocked buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:52 -04:00
Daniel Hill
71fe14655f bcachefs: expose nocow_lock table in sysfs
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:51 -04:00
Kent Overstreet
d94189ad56 bcachefs: Debug mode for c->writes references
This adds a debug mode where we split up the c->writes refcount into
distinct refcounts for every codepath that takes a reference, and adds
sysfs code to print the value of each ref.

This will make it easier to debug shutdown hangs due to refcount leaks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:50 -04:00
Kent Overstreet
a101957649 bcachefs: More style fixes
Fixes for various checkpatch errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Kent Overstreet
46fee692ee bcachefs: Improved btree write statistics
This replaces sysfs btree_avg_write_size with btree_write_stats, which
now breaks out statistics by the source of the btree write.

Btree writes that are too small are a source of inefficiency, and
excessive btree resort overhead - this will let us see what's causing
them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Daniel Hill
597c6d17b1 bcachefs: make durability a read-write sysfs option
Sometimes the user may need to change durability after formatting to
match current hardware setup, this option provides a quick and flexible
alternative to removing then adding the device.
It is HIGHLY ADVISED TO RUN REREPLICATE after changing this value so the
system doesn't remain degraded.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:44 -04:00
Kent Overstreet
2da671dc4a bcachefs: Use btree_type_has_ptrs() more consistently
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
5c1ef830f6 bcachefs: Errcodes can now subtype standard error codes
The next patch is going to be adding private error codes for all the
places we return -ENOSPC.

Additionally, this patch updates return paths at all module boundaries
to call bch2_err_class(), to return the standard error code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
b8eec67591 bcachefs: Add a manual trigger for lock wakeups
Spotted a lockup once that appeared to be a lost wakeup. Adding a manual
trigger for lock wakeups will make it easy to tell if that's what it is
next time it occurs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
674cfc2624 bcachefs: Add persistent counters for all tracepoints
Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:39 -04:00
Kent Overstreet
a3d7afa5c1 bcachefs: Always use percpu_ref_tryget_live() on c->writes
If we're trying to get a ref and the refcount has been killed, it means
we're doing an emergency shutdown - we always want tryget_live().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
50b13beef0 bcachefs: Improve an error message
When inserting a key type that's not valid for a given btree, we should
print out which btree we were inserting into.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:34 -04:00
Kent Overstreet
0e96f5dcd7 bcachefs: Call bch2_do_invalidates() when going read write
Like bch2_do_discards(), we should check if this needs to be done when
going rw.

Also, add some sysfs code for debugging bucket invalidation.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
401ec4db63 bcachefs: Printbuf rework
This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:33 -04:00
Daniel Hill
104c69745f bcachefs: Add persistent counters
This adds a new superblock field for persisting counters
and adds a sysfs interface in counters/ exposing these counters.

The superblock field is ignored by older versions letting us avoid
an on disk version bump.

Each sysfs file outputs a counter that tracks since filesystem
creation and a counter for the current mount session.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:32 -04:00
Kent Overstreet
42796f74f4 bcachefs: Ensure sysfs show fns print a newline
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:31 -04:00
Kent Overstreet
822835ffea bcachefs: Fold bucket_state in to BCH_DATA_TYPES()
Previously, we were missing accounting for buckets in need_gc_gens and
need_discard states. This matters because buckets in those states need
other btree operations done before they can be used, so they can't be
conuted when checking current number of free buckets against the
allocation watermark.

Also, we weren't directly counting free buckets at all. Now, data type 0
== BCH_DATA_free, and free buckets are counted; this means we can get
rid of the separate (poorly defined) count of unavailable buckets.

This is a new on disk format version, with upgrade and fsck required for
the accounting changes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00
Kent Overstreet
8058ea64c3 bcachefs: Add a sysfs attr for triggering discards
We're currently debugging an issue with discards not getting run; this
patch adds a manual trigger so we can then watch the tracepoint while it
runs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00
Kent Overstreet
f25d8215f4 bcachefs: Kill allocator threads & freelists
Now that we have new persistent data structures for the allocator, this
patch converts the allocator to use them.

Now, foreground bucket allocation uses the freespace btree to find
buckets to allocate, instead of popping buckets off the freelist.

The background allocator threads are no longer needed and are deleted,
as well as the allocator freelists. Now we only need background tasks
for invalidating buckets containing cached data (when we are low on
empty buckets), and for issuing discards.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
b17d3cec14 bcachefs: Run btree updates after write out of write_point
In the write path, after the write to the block device(s) complete we
have to punt to process context to do the btree update.

Instead of using the work item embedded in op->cl, this patch switches
to a per write-point work item. This helps with two different issues:

 - lock contention: btree updates to the same writepoint will (usually)
   be updating the same alloc keys
 - context switch overhead: when we're bottlenecked on btree updates,
   having a thread (running out of a work item) checking the write point
   for completed ops is cheaper than queueing up a new work item and
   waking up a kworker.

In an arbitrary benchmark, 4k random writes with fio running inside a
VM, this patch resulted in a 10% improvement in total iops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
3e1547116f bcachefs: x-macroize alloc_reserve enum
This makes an array of strings available, like our other enums.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
63c4b25453 bcachefs: Better superblock opt validation
This moves validation of superblock options to bch2_sb_validate(), so
they'll be checked in the write path as well.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:28 -04:00
Kent Overstreet
f0cc5d2931 bcachefs: Check for rw before setting opts via sysfs
This isn't a correctness issue, it just eliminates errors in the dmesg
log when we're RO.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:27 -04:00
Kent Overstreet
5521b1dfa2 bcachefs: Convert bch2_sb_to_text to master option list
Options no longer have to be manually added to bch2_sb_to_text() - it
now uses the master list of options in opts.h. Also, improve some of the
formatting by converting it to tabstops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:27 -04:00
Kent Overstreet
75ef2c59bc bcachefs: Start moving debug info from sysfs to debugfs
In sysfs, files can only output at most PAGE_SIZE. This is a problem for
debug info that needs to list an arbitrary number of times, and because
of this limit some of our debug info has been terser and harder to read
than we'd like.

This patch moves info about journal pins and cached btree nodes to
debugfs, and greatly expands and improves the output we return.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:26 -04:00
Kent Overstreet
fa8e94faee bcachefs: Heap allocate printbufs
This patch changes printbufs dynamically allocate and reallocate a
buffer as needed. Stack usage has become a bit of a problem, and a major
cause of that has been static size string buffers on the stack.

The most involved part of this refactoring is that printbufs must now be
exited with printbuf_exit().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:25 -04:00
Kent Overstreet
12bf93a429 bcachefs: Add .to_text() methods for all superblock sections
This patch improves the superblock .to_text() methods and adds methods
for all types that were missing them. It also improves printbufs by
allowing them to specfiy what units we want to be printing in, and adds
new wrapper methods for unifying our kernel and userspace environments.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:24 -04:00
Kent Overstreet
c45c866761 bcachefs: bch2_gc_gens() no longer uses bucket array
Like the previous patches, this converts bch2_gc_gens() to use the alloc
btree directly, and private arrays of generation numbers for its own
recalculation of oldest_gen.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:23 -04:00
Kent Overstreet
acc3e09b67 bcachefs: Rename data_op_data_progress -> data_jobs
Mild refactoring.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:22 -04:00
Kent Overstreet
862bfd5062 bcachefs: Update sysfs compression_stats for snapshots
- BTREE_ITER_ALL_SNAPSHOTS flag is required here
 - change it to also walk the reflink btree
 - change it to accumulate stats for all pointers in an extent
 - change it to account for incompressible data

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:20 -04:00
Kent Overstreet
abe19d458e bcachefs: Refactor open_bucket code
Prep work for adding a hash table of open buckets - instead of embedding
a bch_extent_ptr, we need to refer to the bucket directly so that we're
not calling sector_to_bucket() in the hash table lookup code, which has
an expensive divide.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:20 -04:00
Kent Overstreet
4141fde0be bcachefs: Fix bch2_journal_meta()
This patch ensures that the journal entry written gets written as flush
entry, which is important for the shutdown path - the last entry written
needs to be a flush entry.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:19 -04:00
Kent Overstreet
7243498de7 bcachefs: Kill non-lru cache replacement policies
Prep work for persistent LRUs and getting rid of the in memory bucket
array.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:19 -04:00
Kent Overstreet
8244f3209b bcachefs: Option improvements
This adds flags for options that must be a power of two (block size and
btree node size), and options that are stored in the superblock as a
power of two (encoded extent max).

Also: options are now stored in memory in the same units they're
displayed in (bytes): we now convert when getting and setting from the
superblock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:19 -04:00
Kent Overstreet
6df893fb11 bcachefs: Kill some obsolete sysfs code
fs internal/alloc_debug doesn't show anything bcachefs fs usage shows.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:19 -04:00
Kent Overstreet
506717865b bcachefs: Specify filesystem options
We've got three types of options now - filesystem, device and inode, and
a given option may belong to more than one of those types.

This patch changes the options to specify explicitly when they're a
filesystem option - in the future we'll probably be adding more device
options.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:18 -04:00
Kent Overstreet
2430e72f42 bcachefs: Convert journal sysfs params to regular options
This converts journal_write_delay, journal_flush_disabled, and
journal_reclaim_delay to normal filesystems options, and also adds them
to the superblock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:18 -04:00
Kent Overstreet
e15a57ac05 bcachefs: Kill bucket quantiles sysfs code
We're getting rid of code that uses the in memory bucket array - and we
now have better mechanisms for viewing most of what the bucket quantiles
code gave us (especially internal fragmentation).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:16 -04:00
Kent Overstreet
9a796fdb06 bcachefs: bch2_trans_exit() no longer returns errors
Now that peek_node()/next_node() are converted to return errors
directly, we don't need bch2_trans_exit() to return errors - it's
cleaner this way and wasn't used much anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:14 -04:00
Kent Overstreet
67e0dd8f0d bcachefs: btree_path
This splits btree_iter into two components: btree_iter is now the
externally visible componont, and it points to a btree_path which is now
reference counted.

This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:11 -04:00
Brett Holman
8dd6ed9451 bcachefs: add progress stats to sysfs
This adds progress stats to sysfs for copygc, rebalance, recovery, and the
cmd_job ioctls.

Signed-off-by: Brett Holman <bholman.devel@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:10 -04:00
Kent Overstreet
2e655e6de2 bcachefs: Add open_buckets to sysfs
This is to help debug a rare shutdown deadlock in the allocator code -
the btree code is leaking open_buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
c0ebe3e48c bcachefs: Assorted endianness fixes
Found by sparse

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
d44a6e350e bcachefs: Drop old style btree node coalescing
We have foreground btree node merging now, and any future btree node
merging improvements are going to be based off of that code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:00 -04:00
Kent Overstreet
ac516d0e7d bcachefs: Add the status of bucket gen gc to sysfs
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:00 -04:00
Kent Overstreet
ba5f03d362 bcachefs: Add a sysfs var for average btree write size
Useful number for performance tuning.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:58 -04:00
Kent Overstreet
2436cb9fad bcachefs: Use x-macros for more enums
This patch standardizes all the enums that have associated string tables
(probably more enums should have string tables).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:55 -04:00
Kent Overstreet
41f8b09edc bcachefs: Rename BTREE_ID enums for consistency with other enums
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:55 -04:00
Kent Overstreet
bae895a5a3 bcachefs: Add allocator thread state to sysfs
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:54 -04:00
Kent Overstreet
51c66fedc0 bcachefs: Rip out copygc pd controller
We have a separate mechanism for ratelimiting copygc now - the pd
controller has only been causing problems.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:54 -04:00
Kent Overstreet
5bbe4bf95b bcachefs: Add copygc wait to sysfs
Currently debugging an issue with copygc not running when it's supposed
to, and this is an obvious first step.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:54 -04:00
Kent Overstreet
cb66fc5fe4 bcachefs: Fix copygc threshold
Awhile back the meaning of is_available_bucket() and thus also
bch_dev_usage->buckets_unavailable changed to include buckets that are
owned by the allocator - this was so that the stat could be persisted
like other allocation information, and wouldn't have to be regenerated
by walking each bucket at mount time.

This broke copygc, which needs to consider buckets that are reclaimable
and haven't yet been grabbed by the allocator thread and moved onta
freelist. This patch fixes that by adding dev_buckets_reclaimable() for
copygc and the allocator thread, and cleans up some of the callers a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:54 -04:00
Kent Overstreet
fcb3431be8 bcachefs: Redo checks for sufficient devices
When the replicas mechanism was added, for tracking data by which drives
it's replicated on, the check for whether we have sufficient devices was
never updated to make use of it. This patch finally does that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:53 -04:00
Kent Overstreet
2abe542087 bcachefs: Persist 64 bit io clocks
Originally, bcachefs - going back to bcache - stored, for each bucket, a
16 bit counter corresponding to how long it had been since the bucket
was read from. But, this required periodically rescaling counters on
every bucket to avoid wraparound. That wasn't an issue in bcache, where
we'd perodically rewrite the per bucket metadata all at once, but in
bcachefs we're trying to avoid having to walk every single bucket.

This patch switches to persisting 64 bit io clocks, corresponding to the
64 bit bucket timestaps introduced in the previous patch with
KEY_TYPE_alloc_v2.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:52 -04:00
Kent Overstreet
5b593ee172 bcachefs: Add support for doing btree updates prior to journal replay
Some errors may need to be fixed in order for GC to successfully run -
walk and mark all metadata. But we can't start the allocators and do
normal btree updates until after GC has completed, and allocation
information is known to be consistent, so we need a different method of
doing btree updates.

Fortunately, we already have code for walking the btree while overlaying
keys from the journal to be replayed. This patch adds an update path
that adds keys to the list of keys to be replayed by journal replay, and
also fixes up iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:52 -04:00
Kent Overstreet
72eab8da47 bcachefs: Refactor dev usage
This is to make it more amenable for serialization.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:52 -04:00
Kent Overstreet
3187aa8d57 bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much
Previously, we were using BTREE_INSERT_RESERVE in a lot of places where
it no longer makes sense.

 - we now have more open_buckets than we used to, and the reserves work
   better, so we shouldn't need to use BTREE_INSERT_RESERVE just because
   we're holding open_buckets pinned anymore.

 - We have the btree key cache for updates to the alloc btree, meaning
   we no longer need the btree reserve to ensure the allocator can make
   forward progress.

This means that we should only need a reserve for btree updates to
ensure that copygc can make forward progress.

Since it's now just for copygc, we can also fold RESERVE_BTREE into
RESERVE_MOVINGGC (the allocator's freelist reserve).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:50 -04:00
Kent Overstreet
ec3d21a9f2 bcachefs: Add error handling to unit & perf tests
This way, these tests can be used with tests that inject IO errors and
shut down the filesystem.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:48 -04:00
Kent Overstreet
d8ebed7d24 bcachefs: Add btree cache stats to sysfs
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:47 -04:00
Kent Overstreet
1676a398d3 bcachefs: Delete dead journalling code
Usage of the journal has gotten somewhat simpler over time - neat.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:47 -04:00
Kent Overstreet
29364f3453 bcachefs: Drop sysfs interface to debug parameters
It's not used much anymore, the module paramter interface is better.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet
7807e14384 bcachefs: Convert various code to printbuf
printbufs know how big the buffer is that was allocated, so we can get
rid of the random PAGE_SIZEs all over the place.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
3d080aa52f bcachefs: Delete unused arguments
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
e6d1161530 bcachefs: Make copygc thread global
Per device copygc threads don't move data to different devices and they
make fragmentation works - they don't make much sense anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:42 -04:00
Kent Overstreet
89fd25be70 bcachefs: Use x-macros for data types
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:42 -04:00
Kent Overstreet
703e2a43bf bcachefs: Move stripe creation to workqueue
This is mainly to solve a lock ordering issue, and also simplifies the
code a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:42 -04:00
Kent Overstreet
ba6dd1dd49 bcachefs: Improve stripe triggers/heap code
Soon we'll be able to modify existing stripes - replacing empty blocks
with new blocks and new p/q blocks. This patch updates the trigger code
to handle pointers changing in an existing stripe; also, it
significantly improves how the stripes heap works, which means we can
get rid of the stripe creation/deletion lock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:42 -04:00
Kent Overstreet
649a9b68ac bcachefs: Track sectors of erasure coded data
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:42 -04:00
Kent Overstreet
d211b408ab bcachefs: Fix lock ordering with new btree cache code
The code that checks lock ordering was recently changed to go off of the
pos of the btree node, rather than the iterator, but the btree cache
code didn't update to handle iterators that point to cached bkeys. Oops

Also, update various debug code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:41 -04:00
Kent Overstreet
5d20ba48f0 bcachefs: Use cached iterators for alloc btree
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:41 -04:00
Kent Overstreet
451570a5bc bcachefs: Implement a new gc that only recalcs oldest gen
Full mark and sweep gc doesn't (yet?) work with the new btree key cache
code, but it also blocks updates to interior btree nodes for the
duration and isn't really necessary in practice; we aren't currently
attempting to repair errors in allocation info at runtime.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:41 -04:00
Kent Overstreet
1ada160618 bcachefs: Turn c->state_lock into an rwsem
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:41 -04:00
Kent Overstreet
495aabede3 bcachefs: Add debug code to print btree transactions
Intented to help debug deadlocks, since we can't use lockdep to check
btree node lock ordering.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:40 -04:00
Kent Overstreet
ab05de4ce4 bcachefs: Track incompressible data
This fixes the background_compression option: wihout some way of marking
data as incompressible, rebalance will keep rewriting incompressible
data over and over.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:34 -04:00
Kent Overstreet
3e548da8f5 bcachefs: Don't print anything when device doesn't have a label
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:34 -04:00
Kent Overstreet
1c3ff72c0f bcachefs: Convert some enums to x-macros
Helps for preventing things from getting out of sync.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:33 -04:00
Kent Overstreet
5873efbfd9 bcachefs: Make io timers less buggy
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:33 -04:00
Kent Overstreet
bd7e82ee2a bcachefs: kill ca->freelist_lock
All uses were supposed to be switched over to c->freelist_lock

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet
20bceecb31 bcachefs: More work to avoid transaction restarts
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:22 -04:00
Kent Overstreet
5e82a9a1f4 bcachefs: Write out fs usage consistently
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:21 -04:00
Kent Overstreet
94f651e2c7 bcachefs: Return errors from for_each_btree_key()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:20 -04:00
Kent Overstreet
3ea2b1e128 bcachefs: cmp_int()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:20 -04:00
Kent Overstreet
a0e0bda117 bcachefs: Pass flags arg to bch2_alloc_write()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:20 -04:00
Kent Overstreet
a1d58243f9 bcachefs: add ability to run gc on metadata only
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:20 -04:00
Kent Overstreet
424eb88130 bcachefs: Only get btree iters from btree transactions
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:18 -04:00
Kent Overstreet
134915f3d3 bcachefs: Go rw lazily
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:18 -04:00
Kent Overstreet
768ac63924 bcachefs: Add a mechanism for blocking the journal
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:16 -04:00
Kent Overstreet
4c97e04aa8 bcachefs: percpu utility code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:15 -04:00