- Centralize format strings in bcachefs.h
- Add bch2_fmt_inum_offset() and related helpers
- Switch error messages for inodes to also print out the offset, in
bytes
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Warnings ought to always have a format string/log message - makes them
considerably more useful.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This switches where we take quota reservations to be per bch_wirte_op
instead of per dio_write, so we can drop the quota reservation in the
same place as we call i_sectors_acct(), and only take/release
ei_quota_lock once.
In the future we'd like ei_quota_lock to not be a mutex, so that we can
avoid punting to process context before deliving write completions in
nocow mode.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We have a unique lock used for controlling adding to the pagecache: the
lock has two states, where both states are shared - the lock may be held
multiple times for either state - but not both states at the same time.
This is exactly what we need for nocow mode locking, so this patch pulls
it out of fs.c into its own file.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
BCH_WRITE_FLUSH is a write flag that causes a journal flush. It's only
used in the direct IO path, and this will allow for some consolidation
with the regular fsync path, which will help with the upcoming nocow
mode.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- With BCH_WRITE_SYNC, we no longer need the completion in struct
dio_write
- Pull out bch2_dio_write_copy_iov() into a separate non-inline
function, it's code that doesn't run in the common case
- Copy mapping and inode pointers into dio_write, avoiding pointer
chasing at the start of bch2_dio_write_loop()
- kthread_use_mm() is not needed in the common case; move it into
bch2_dio_write_loop_async()
- factor out various helpers from bch2_dio_write_loop() and rework
control flow for better icache utilization
Other small optimizations:
- bch2_keylist_free() is only used in one place, at the end of the
bch2_write() path - drop the reinit
- in bch2_disk_reservation_put(), check if res->sectors is nonzero
before touching c->online_reserved, since that will likely be a cache
miss
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: More DIO write path optimization
Better code prefetching (?)
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds a new flag for the write path, BCH_WRITE_SYNC, and switches
the O_DIRECT write path to use it when we're not running asynchronously.
It runs the btree update after the write in the original thread's
context instead of a kworker, cutting context switches in half.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Per fstests generic/275, on -ENOSPC we're supposed write until the
filesystem is full - i.e. do a partial write instead of failing the full
write.
This is a partial fix for the buffered write path: we'll still fail on a
page boundary.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:
ASSIGN_IN_IF
UNSPECIFIED_INT - bcachefs coding style prefers single token type names
NEW_TYPEDEFS - typedefs are occasionally good
FUNCTION_ARGUMENTS - we prefer to look at functions in .c files
(hopefully with docbook documentation), not .h
file prototypes
MULTISTATEMENT_MACRO_USE_DO_WHILE
- we have _many_ x-macros and other macros where
we can't do this
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- We now correctly allow soft limits to be exceeded, instead of always
returning -EDQUOT
- Disk quota grate times/warnings can now be set, not just the
systemwide defaults
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When modifying a file, we may be required to drop the suid/sgid bits -
we were missing a file_modified() call to do this.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
An error case was jumping to the wrong label, creating an infinite loop
- oops.
This fixes fstests generic/648.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds a new helper, quota_reserve_range(), which takes a quota
reservation for unallocated blocks in a given file range, and uses it in
bch2_remap_file_range().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This used to be needed more for buffered IO, but now the block layer has
writeback throttling - we can delete this now.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Continuing the saga of introducing private dedicated error codes for
each error path, this patch converts ENOSPC to error codes that are
subtypes of ENOSPC. We've recently had a test failure where we got
-ENOSPC where we shouldn't have, and didn't have enough information to
tell where it came from, so this patch will solve that problem.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The next patch is going to be adding private error codes for all the
places we return -ENOSPC.
Additionally, this patch updates return paths at all module boundaries
to call bch2_err_class(), to return the standard error code.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now that we have error codes, with subtypes, we can switch to our own
error code for transaction restarts - and even better, a distinct error
code for each transaction restart reason: clearer code and better
debugging.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
If we're trying to get a ref and the refcount has been killed, it means
we're doing an emergency shutdown - we always want tryget_live().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Per Dave Chinner and the xfs folks, .writepage is no longer needed, and
it's better not to define it if .writepages is the intended path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This improves some of our warnings and assertions - they imply possible
filesystem inconsistencies, so they should be calling
bch2_fs_inconsistent().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
With backpointers this doesn't work anymore - backpointers always need
to be updated to point to the new extent position.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
These warnings are symptomatic of something else going wrong, we don't
want them spamming up the logs as that'll make it harder to find the
real issue.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
When the iov_iter is a bvec iter, it's possible the IO was submitted
from a kthread that didn't have an mm to switch to.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This fixes a bug in the DIO read path where, when using a loopback
device in DIO mode, we'd allocate a biovec that would get overwritten
and leaked in bio_iov_iter_get_pages() -> bio_iov_bvec_set().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Since we retry reads when we discover we read from a pointer that went
stale, if a dirty pointer is erroniously stale it would cause us to loop
retrying that read forever - unless we check before issuing the read,
while the btree is still locked, when we know that a dirty pointer
should never be stale.
This patch adds that check, along with printing some helpful debug info.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We recently added an assertion that when we truncate a file to 0,
i_blocks should also go to 0 - but that's not necessarily true if we're
doing an emergency shutdown, lots of invariants no longer hold true in
that case.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
A user reported hitting this assertion, and we can't reproduce it yet,
but it shouldn't be fatal - so convert it to a warning.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This tweaks the fallocate code to also update the page cache to reflect
the new on disk reservations, giving us better i_sectors consistency.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This patch adds code to read page state before writing to pages that
aren't uptodate, which corrects i_sectors being tempororarily too large
and means we may not need to get a disk reservation.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
# Conflicts:
# fs/bcachefs/fs-io.c
Reading from cached data, which calls bch2_bucket_io_time_reset(), is
leading to transaction iterator overflows - this standardizes the
workaround.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This fixes another i_sectors accounting bug - we need to differentiate
between dirty writes that overwrite a reservation and dirty writes to
unallocated space - dirty writes to unallocated space increase
i_sectors, dirty writes over a reservation do not.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
When bch2_truncate_page() discards dirty sectors in the page cache, we
need to account for that - we don't need to account for allocated
sectors because that'll be done by the bch2_fpunch() call when it
updates the btree.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
If at all possible we'd prefer to not fail page writeback unless the
filesystem has been shutdown; allowing errors in page writeback means
things we'd like to assert about i_size consistency between the VFS and
the btree go out the window.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
- fpunch wasn't always correctly updating i_size - when we drop buffered
writes that were extending a file, we become responsible for writing
i_size.
- fzero was sometimes zeroing out more data that it should have -
block_start and block_end were being rounded in the wrong directions
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Now that we're recording in each inode the journal sequence number of
the most recent update, fsync becomes a lot simpler and we can delete
all the plumbing for ei_journal_seq.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Code that uses for_each_btree_key often wants transaction restarts to be
handled locally and not returned. Originally, we wouldn't return
transaction restarts if there was a single iterator in the transaction -
the reasoning being if there weren't other iterators being invalidated,
and the current iterator was being advanced/retraversed, there weren't
any locks or iterators we were required to preserve.
But with the btree_path conversion that approach doesn't work anymore -
even when we're using for_each_btree_key() with a single iterator there
will still be two paths in the transaction, since we now always preserve
the path at the pos the iterator was initialized at - the reason being
that on restart we often restart from the same place.
And it turns out there's now a lot of for_each_btree_key() uses that _do
not_ want transaction restarts handled locally, and should be returning
them.
This patch splits out for_each_btree_key_norestart() and
for_each_btree_key_continue_norestart(), and converts existing users as
appropriate. for_each_btree_key(), for_each_btree_key_continue(), and
for_each_btree_node() now handle transaction restarts themselves by
calling bch2_trans_begin() when necessary - and the old hack to not
return transaction restarts when there's a single path in the
transaction has been deleted.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Now that peek_node()/next_node() are converted to return errors
directly, we don't need bch2_trans_exit() to return errors - it's
cleaner this way and wasn't used much anymore.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This plumbs around the subvolume ID as was done previously for other
filesystem code, but now for the IO paths - the control flow in the IO
paths is trickier so the changes in this patch are more involved.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
To implement snapshots, we need every filesystem btree operation (every
btree operation without a subvolume) to start by looking up the
subvolume and getting the current snapshot ID, with
bch2_subvolume_get_snapshot() - then, that snapshot ID is used for doing
btree lookups in BTREE_ITER_FILTER_SNAPSHOTS mode.
This patch adds those bch2_subvolume_get_snapshot() calls, and also
switches to passing around a subvol_inum instead of just an inode
number.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This splits btree_iter into two components: btree_iter is now the
externally visible componont, and it points to a btree_path which is now
reference counted.
This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
__bch2_read() -> __bch2_read_extent() -> bch2_bucket_io_time_reset() may
cause a transaction restart, which we don't return an error for because
it doesn't prevent us from making forward progress on the read we're
submitting.
Instead, change __bch2_read() and bchfs_read() to check for transaction
restarts.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Upcoming patch will require that a transaction restart is always
immediately followed by bch2_trans_begin().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
On transaction restart iterators won't be locked anymore - make sure
we're always checking for errors.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This is needed for snapshots because we need to start handling lock
restarts even when just calling bch2_inode_peek().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Do not attempt to shortcut a truncate when the given new size is
the same as the current size. There may be blocks allocated to the
file that extend beyond the i_size. The ctime and mtime should
not be updated in this case.
Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
After the v5.12 rebase, we started oopsing when truncate was passed
ATTR_MODE, due to not passing mnt_userns to setattr_copy(). This
refactors things so that truncate/extend finish by using
bch2_setattr_nonsize(), which solves the problem.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Adding iter->should_be_locked introduced a regression where it ended up
not being set on the iterator passed to bch2_btree_update_start(), which
is definitely not what we want.
This patch requires it to be set when calling bch2_trans_update(), and
adds various fixups to make that happen.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Commit c42bca92be "bio: don't copy bvec
for direct IO" changed bio_iov_iter_get_pages() to point bio->bi_iovec
at the incoming biovec, meaning if we already allocated one, it'll be
leaked.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We already had op->end_io as an alternative mechanism to op->cl.parent
for delivering write completions; this switches all code paths to using
op->end_io.
Two reasons:
- op->end_io is more efficient, due to fewer atomic ops, this completes
the conversion that was originally only done for the direct IO path.
- We'll be restructing the write path to use a different mechanism for
punting to process context, refactoring to not use op->cl will make
that easier.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Buffered writes may have to increase their disk reservation at btree
update time, due to compression and erasure coding being unpredictable:
O_DIRECT writes should be checking for -ENOSPC, but buffered writes have
already been accepted and should not.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Writeback throttling is a kernel config option and not always enabled.
When it's not enabled we need a fallback, to avoid unbounded memory
pinning and work item backlogs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Upcoming patch is going to disallow multiple btree_trans on the stack.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We keep running into occasional bugs with btree transaction iterators
overflowing - this will make those bugs more visible.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
An option was added to control whether reflink support was on or off
because for a long time, reflink + inline data extent support was
missing - but that's since been fixed, so we can drop the option now.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In the read path, for retry of indirect extents to work we need to
differentiate between the location in the btree the read was for, vs.
the location where we found the data. This patch adds that plumbing to
bch_read_bio.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This makes bch2_btree_iter_peek_prev() and bch2_btree_iter_prev()
consistent with peek() and next(), w.r.t. iter->pos.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We had a deadlock on page_lock, because buffered reads signal completion
by unlocking the page, but the dio read path normally dirties the pages
it's reading to with set_page_dirty_lock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With various newer key types - stripe keys, inline data extents - the
old approach of calculating the maximum size of the value is becoming
more and more error prone. Better to switch to bkey_on_stack, which can
dynamically allocate if necessary to handle any size bkey.
In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Originally, we'd check for -ENOSPC when getting a disk reservation
whenever the new extent took up more space on disk than the old extent.
Erasure coding screwed this up, because with erasure coding writes are
initially replicated, and then in the background the extra replicas are
dropped when the stripe is created. This means that with erasure coding
enabled, writes will always take up more space on disk than the data
they're overwriting - but, according to posix, overwrites aren't
supposed to return ENOSPC.
So, in this patch we fudge things: if the new extent has more replicas
than the _effective_ replicas of the old extent, or if the old extent is
compressed and the new one isn't, we check for ENOSPC when getting the
disk reservation - otherwise, we don't.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With the btree key cache code, we don't need to update the alloc btree
lazily - and this will mean we can remove the bch2_alloc_write() call in
the shutdown path.
Future work: we really need to expend the bucket IO clocks from 16 to 64
bits, so that we don't have to rescale them.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
On write error, the vfs inode's i_size may be inconsistent with the
btree inode's i_size - flag this so we don't have spurious assertions.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
it's useful to know whether an error was for a read or a write - this
also standardizes error messages a bit more.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Since we now always preallocate the maximum number of iterators when we
initialize a btree transaction, getting an iterator never fails - we can
delete a fair amount of error path code.
This patch also simplifies the iterator allocation code a bit.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We were incorrectly ignoring the return value of __readahead_batch,
leading to a null ptr deref in __bch2_page_state_create().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In the dio write path, when get_user_pages() invokes the fault handler
we have a recursive locking situation - we have to handle the lock
ordering ourselves or we have a deadlock: this patch addresses that by
checking for locking ordering violations and doing the unlock/relock
dance if necessary.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
These recently added helpers simplify the code.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is dead code; delete the function.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If the bkey_on_stack_reassemble() call in __bch2_read_indirect_extent()
reallocates the buffer, k in bch2_read - which we pointed at the
bkey_on_stack buffer - will now point to a stale buffer. Whoops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
__bch2_truncate_page() will mark some of the blocks in a page as
unallocated. But, if the page is mmapped (and writable), every block in
the page needs to be marked dirty, else those blocks won't be written by
__bch2_writepage().
The solution is to change those userspace mappings to RO, so that we
force bch2_page_mkwrite() to be called again.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In the buffered write path, we have to check for short writes that write
to the full page, where the page wasn't UpToDate; when this happens, the
page is partly garbage, so we have to zero it out and revert that part
of the write.
This check was wrong - we reverted total from copied, but didn't revert
the iov_iter, probably also leading to corrupted writes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
It appears this was erronious, a different bug was responsible
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a bug where the BCH_WRITE_SKIP_CLOSURE_PUT was set
incorrectly, causing the completion to be delivered multiple times.
oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reflink might be buggy, so we're adding an option so users can help
bisect what's going on.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When a bkey_on_stack is passed to bch_read_indirect_extent, there is no
guarantee that it will be big enough to hold the bkey. And
bch_read_indirect_extent is not aware of bkey_on_stack to call realloc
on it. This cause a stack corruption.
This commit makes bch_read_indirect_extent aware of bkey_on_stack so it
can call realloc when appropriate.
Tested-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
the bcachefs io path in io.c can't bounce writes larger than that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This was another bug because of bch2_btree_iter_set_pos() invalidating
iterators.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
All iterators should be released now with bch2_trans_iter_put(), so
TRANS_RESET_ITERS shouldn't be needed anymore, and TRANS_RESET_MEM is
always used.
Also convert more code to __bch2_trans_do().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, when doing multiple update in the same transaction commit
that overwrote each other, we relied on doing the updates in the same
order as the bch2_trans_update() calls in order to get the correct
result. But that wasn't correct for triggers; bch2_trans_mark_update()
when marking overwrites would do the wrong thing because it hadn't seen
the update that was being overwritten.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The trigger flags really belong with individual btree_insert_entries,
not the transaction commit flags - this splits out those flags and
unifies them with the BCH_BUCKET_MARK flags. Todo - split out
btree_trigger.c from buckets.c
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
BTREE_INSERT_ATOMIC should really be the default mode, and there's not
that much code that doesn't need it - so this is prep work for getting
rid of the flag.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
It needs to be called when we get -EINTR due to e.g. lock restart - this
fixes a transaction iterators overflow bug.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Normally the in memory i_size is always greater than or equal to i_size
on disk; this doesn't hold on filesystem error.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This implements extents that have their data inline, in the value,
instead of the bkey value being pointers to the data - and the read and
write paths are updated to read from these new extent types and write
them out, when the write size is small enough.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This changes bch2_cut_front and bch2_cut_back so that they're able to
shorten the size of the value, and it also changes the extent update
path to update the accounting in the btree node when this happens.
When the size of the value is shortened, they zero out the space that's
no longer used, so it's interpreted as noops (as implemented in the last
patch).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>