The rest of the set_nameidata() callers treat IS_ERR(pathname) as
"bail out immediately with PTR_ERR(pathname) as error". Makes
life simpler for callers; do_filp_open() is the only exception
and its callers would also benefit from such calling conventions
change.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... or visible outside of audit, really. Note that references
held in delayed_filename always have refcount 1, and from the
moment of complete_getname() or equivalent point in getname...()
there won't be any references to struct filename instance left
in places visible to other threads.
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
There are two filename-related problems in io_uring and its
interplay with audit.
Filenames are imported when request is submitted and used when
it is processed. Unfortunately, the latter may very well
happen in a different thread. In that case the reference to
filename is put into the wrong audit_context - that of submitting
thread, not the processing one. Audit logics is called by
the latter, and it really wants to be able to find the names
in audit_context current (== processing) thread.
Another related problem is the headache with refcounts -
normally all references to given struct filename are visible
only to one thread (the one that uses that struct filename).
io_uring violates that - an extra reference is stashed in
audit_context of submitter. It gets dropped when submitter
returns to userland, which can happen simultaneously with
processing thread deciding to drop the reference it got.
We paper over that by making refcount atomic, but that means
pointless headache for everyone.
Solution: the notion of partially imported filenames. Namely,
already copied from userland, but *not* exposed to audit yet.
io_uring can create that in submitter thread, and complete the
import (obtaining the usual reference to struct filename) in
processing thread.
Object: struct delayed_filename.
Primitives for working with it:
delayed_getname(&delayed_filename, user_string) - copies the name from
userland, returning 0 and stashing the address of (still incomplete)
struct filename in delayed_filename on success and returning -E... on
error.
delayed_getname_uflags(&delayed_filename, user_string, atflags) -
similar, in the same relation to delayed_getname() as getname_uflags()
is to getname()
complete_getname(&delayed_filename) - completes the import of filename
stashed in delayed_filename and returns struct filename to caller,
emptying delayed_filename.
CLASS(filename_complete_delayed, name)(&delayed_filename) - variant of
CLASS(filename) with complete_getname() for constructor.
dismiss_delayed_filename(&delayed_filename) - destructor; drops whatever
might be stashed in delayed_filename, emptying it.
putname_to_delayed(&delayed_filename, name) - if name is shared, stashes
its copy into delayed_filename and drops the reference to name, otherwise
stashes the name itself in there.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
s/names_cachep/names_cache/ for consistency with dentry cache.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Always allocate struct filename from names_cachep, long name or short;
short names would be embedded into struct filename. Longer ones do
not cannibalize the original struct filename - put them into kmalloc'ed
buffers (PATH_MAX-sized for import from userland, strlen() + 1 - for
ones originating kernel-side, where we know the length beforehand).
Cutoff length for short names is chosen so that struct filename would be
192 bytes long - that's both a multiple of 64 and large enough to cover
the majority of real-world uses.
Simplifies logics in getname()/putname() and friends.
[fixed an embarrassing braino in EMBEDDED_NAME_MAX, first reported by
Dan Carpenter]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Instances of struct filename come from names_cachep (via
__getname()). That is done by getname_flags() and getname_kernel()
and these two are the main callers of __getname(). However, there are
other callers that simply want to allocate PATH_MAX bytes for uses that
have nothing to do with struct filename.
We want saner allocation rules for long pathnames, so that struct
filename would *always* come from names_cachep, with the out-of-line
pathname getting kmalloc'ed. For that we need to be able to change the
size of objects allocated by getname_flags()/getname_kernel().
That requires the rest of __getname() users to stop using
names_cachep; we could explicitly switch all of those to kmalloc(),
but that would cause quite a bit of noise. So the plan is to switch
getname_...() to new helpers and turn __getname() into a wrapper for
kmalloc(). Remaining __getname() users could be converted to explicit
kmalloc() at leisure, hopefully along with figuring out what size do
they really want - PATH_MAX is an overkill for some of them, used out
of laziness ("we have a convenient helper that does 4K allocations and
that's large enough, let's use it").
As a side benefit, names_cachep is no longer used outside
of fs/namei.c, so we can move it there and be done with that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Take the "long name" case into a helper (getname_long()). In
case of failure have the caller deal with freeing the original
struct filename.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In case of long name don't reread what we'd already copied.
memmove() it instead. That avoids the possibility of ending
up with empty name there and the need to look at the flags
on the slow path.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... so don't use __getname() there. Switch it (and ntfs_d_hash(), while
we are at it) to kmalloc(PATH_MAX, GFP_NOWAIT). Yes, ntfs_d_hash()
almost certainly can do with smaller allocations, but let ntfs folks
deal with that - keep the allocation size as-is for now.
Stop abusing names_cachep in ntfs, period - various uses of that thing
in there have nothing to do with pathnames; just use k[mz]alloc() and
be done with that. For now let's keep sizes as-in, but AFAICS none of
the users actually want PATH_MAX.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Originally we tried to avoid multiple insertions into audit names array
during retry loop by a cute hack - memorize the userland pointer and
if there already is a match, just grab an extra reference to it.
Cute as it had been, it had problems - two identical pointers had
audit aux entries merged, two identical strings did not. Having
different behaviour for syscalls that differ only by addresses of
otherwise identical string arguments is obviously wrong - if nothing
else, compiler can decide to merge identical string literals.
Besides, this hack does nothing for non-audited processes - they get
a fresh copy for retry. It's not time-critical, but having behaviour
subtly differ that way is bogus.
These days we have very few places that import filename more than once
(9 functions total) and it's easy to massage them so we get rid of all
re-imports. With that done, we don't need audit_reusename() anymore.
There's no need to memorize userland pointer either.
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Take getname_flags() and putname() outside of retry loop.
Since getname_flags() is the only thing that cares about LOOKUP_EMPTY,
don't bother with setting LOOKUP_EMPTY in lookup_flags - just pass it
to getname_flags() and be done with that.
The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.
In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent
to plain getname().
The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.
In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent
to plain getname().
The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.
In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent
to plain getname().
The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.
In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent
to plain getname().
The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.
Since we have the default logics for use of LOOKUP_EMPTY (passed iff
AT_EMPTY_PATH is present in flags), just use getname_uflags() and
don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags()
will pass the right thing to getname_flags() and filename_lookup()
doesn't care about LOOKUP_EMPTY at all.
The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.
Since we have the default logics for use of LOOKUP_EMPTY (passed iff
AT_EMPTY_PATH is present in flags), just use getname_uflags() and
don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags()
will pass the right thing to getname_flags() and filename_lookup()
doesn't care about LOOKUP_EMPTY at all.
The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.
Since we have the default logics for use of LOOKUP_EMPTY (passed iff
AT_EMPTY_PATH is present in flags), just use getname_uflags() and
don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags()
will pass the right thing to getname_flags() and filename_lookup()
doesn't care about LOOKUP_EMPTY at all.
The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.
Since we have the default logics for use of LOOKUP_EMPTY (passed iff
AT_EMPTY_PATH is present in flags), just use getname_uflags() and
don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags()
will pass the right thing to getname_flags() and filename_lookup()
doesn't care about LOOKUP_EMPTY at all.
The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Not all users match that model, but most of them do. By the end of
the series we'll be left with very few irregular ones...
Added:
CLASS(filename, name)(user_path) =>
getname(user_path)
CLASS(filename_kernel, name)(string) =>
getname_kernel(string)
CLASS(filename_flags, name)(user_path, flags) =>
getname_flags(user_path, flags)
CLASS(filename_uflags, name)(user_path, flags) =>
getname_uflags(user_path, flags)
CLASS(filename_maybe_null, name)(user_path, flags) =>
getname_maybe_null(user_path, flags)
all with putname() as destructor.
"flags" in filename_flags is in LOOKUP_... space, only LOOKUP_EMPTY matters.
"flags" in filename_uflags and filename_maybe_null is in AT_...... space,
and only AT_EMPTY_PATH matters.
filename_flags conventions might be worth reconsidering later (it might or
might not be better off with boolean instead)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull crypto library fixes from Eric Biggers:
- A couple more fixes for the lib/crypto KUnit tests
- Fix missing MMU protection for the AES S-box
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: aes: Fix missing MMU protection for AES S-box
MAINTAINERS: add test vector generation scripts to "CRYPTO LIBRARY"
lib/crypto: tests: Fix syntax error for old python versions
lib/crypto: tests: polyval_kunit: Increase iterations for preparekey in IRQs
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for some reported issues.
Included in here is:
- much reported rust_binder fix
- counter driver fixes
- new device ids for the mei driver
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
rust_binder: remove spin_lock() in rust_shrink_free_page()
mei: me: add nova lake point S DID
counter: 104-quad-8: Fix incorrect return value in IRQ handler
counter: interrupt-cnt: Drop IRQF_NO_THREAD flag
Pull x86 fix from Ingo Molnar:
"Disable GCOV instrumentation in the SEV noinstr.c collection of SEV
noinstr methods, to further robustify the code"
* tag 'x86-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Disable GCOV on noinstr object
Pull scheduler fix from Ingo Molnar:
"Fix a crash in sched_mm_cid_after_execve()"
* tag 'sched-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/mm_cid: Prevent NULL mm dereference in sched_mm_cid_after_execve()
Pull misc irqchip fixes from Ingo Molnar:
- Fix an endianness bug in the gic-v5 irqchip driver
- Revert a broken commit from the riscv-imsic irqchip driver
* tag 'irq-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "irqchip/riscv-imsic: Embed the vector array in lpriv"
irqchip/gic-v5: Fix gicv5_its_map_event() ITTE read endianness
In a vain attempt to consolidate the email zoo switch everything to the
kernel.org account.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull RISC-V fixes from Paul Walmsley:
"Notable changes include a fix to close one common microarchitectural
attack vector for out-of-order cores. Another patch exposed an
omission in my boot test coverage, which is currently missing
relocatable kernels. Otherwise, the fixes seem to be settling down for
us.
- Fix CONFIG_RELOCATABLE=y boots by building Image files from
vmlinux, rather than vmlinux.unstripped, now that the .modinfo
section is included in vmlinux.unstripped
- Prevent branch predictor poisoning microarchitectural attacks that
use the syscall index as a vector by using array_index_nospec() to
clamp the index after the bounds check (as x86 and ARM64 already
do)
- Fix a crash in test_kprobes when building with Clang
- Fix a deadlock possible when tracing is enabled for SBI ecalls
- Fix the definition of the Zk standard RISC-V ISA extension bundle,
which was missing the Zknh extension
- A few other miscellaneous non-functional cleanups, removing unused
macros, fixing an out-of-date path in code comments, resolving a
compile-time warning for a type mismatch in a pr_crit(), and
removing an unnecessary header file inclusion"
* tag 'riscv-for-linus-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: trace: fix snapshot deadlock with sbi ecall
riscv: remove irqflags.h inclusion in asm/bitops.h
riscv: cpu_ops_sbi: smp_processor_id() returns int, not unsigned int
riscv: configs: Clean up references to non-existing configs
riscv: kexec_image: Fix dead link to boot-image-header.rst
riscv: pgtable: Cleanup useless VA_USER_XXX definitions
riscv: cpufeature: Fix Zk bundled extension missing Zknh
riscv: fix KUnit test_kprobes crash when building with Clang
riscv: Sanitize syscall table indexing under speculation
riscv: boot: Always make Image from vmlinux, not vmlinux.unstripped
Pull driver core fixes from Danilo Krummrich:
- Fix swapped example values for the `family` and `machine` attributes
in the sysfs SoC bus ABI documentation
- Fix Rust build and intra-doc issues when optional subsystems
(CONFIG_PCI, CONFIG_AUXILIARY_BUS, CONFIG_PRINTK) are disabled
- Fix typos and incorrect safety comments in Rust PCI, DMA, and
device ID documentation
* tag 'driver-core-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
rust: device: Remove explicit import of CStrExt
rust: pci: fix typos in Bar struct's comments
rust: device: fix broken intra-doc links
rust: dma: fix broken intra-doc links
rust: driver: fix broken intra-doc links to example driver types
rust: device_id: replace incorrect word in safety documentation
rust: dma: remove incorrect safety documentation
docs: ABI: sysfs-devices-soc: Fix swapped sample values
Pull kselftest fix from Shuah Khan:
"Fix tracing test_multiple_writes stalls when buffer_size_kb is less
than 12KB"
* tag 'linux_kselftest-fixes-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/tracing: Fix test_multiple_writes stall
Pull iomu fixes from Joerg Roedel:
- several Kconfig-related build fixes
- fix for when gcc 8.5 on PPC refuses to inline a function from a
header file
* tag 'iommu-fixes-v6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommupt: Make pt_feature() always_inline
iommufd/selftest: Prevent module/builtin conflicts in kconfig
iommufd/selftest: Add missing kconfig for DMA_SHARED_BUFFER
iommupt: Fix the kunit building
Sheng Yong reported [1] that Android APEX images didn't work with commit
072a7c7cdb ("erofs: don't bother with s_stack_depth increasing for
now") because "EROFS-formatted APEX file images can be stored within an
EROFS-formatted Android system partition."
In response, I sent a quick fat-fingered [PATCH v3] to address the
report. Unfortunately, the updated condition was incorrect:
if (erofs_is_fileio_mode(sbi)) {
- sb->s_stack_depth =
- file_inode(sbi->dif0.file)->i_sb->s_stack_depth + 1;
- if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
- erofs_err(sb, "maximum fs stacking depth exceeded");
+ inode = file_inode(sbi->dif0.file);
+ if ((inode->i_sb->s_op == &erofs_sops && !sb->s_bdev) ||
+ inode->i_sb->s_stack_depth) {
The condition `!sb->s_bdev` is always true for all file-backed EROFS
mounts, making the check effectively a no-op.
The real fix tested and confirmed by Sheng Yong [2] at that time was
[PATCH v3 RESEND], which correctly ensures the following EROFS^2 setup
works:
EROFS (on a block device) + EROFS (file-backed mount)
But sadly I screwed it up again by upstreaming the outdated [PATCH v3].
This patch applies the same logic as the delta between the upstream
[PATCH v3] and the real fix [PATCH v3 RESEND].
Reported-by: Sheng Yong <shengyong1@xiaomi.com>
Closes: https://lore.kernel.org/r/3acec686-4020-4609-aee4-5dae7b9b0093@gmail.com [1]
Fixes: 072a7c7cdb ("erofs: don't bother with s_stack_depth increasing for now")
Link: https://lore.kernel.org/r/243f57b8-246f-47e7-9fb1-27a771e8e9e8@gmail.com [2]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gcc 8.5 on powerpc does not automatically inline these functions even
though they evaluate to constants in key cases. Since the constant
propagation is essential for some code elimination and built-time checks
this causes a build failure:
ERROR: modpost: "__pt_no_sw_bit" [drivers/iommu/generic_pt/fmt/iommu_amdv1.ko] undefined!
Caused by this:
if (pts_feature(&pts, PT_FEAT_DMA_INCOHERENT) &&
!pt_test_sw_bit_acquire(&pts,
SW_BIT_CACHE_FLUSH_DONE))
flush_writes_item(&pts);
Where pts_feature() evaluates to a constant false. Mark them as
__always_inline to force it to evaluate to a constant and trigger the code
elimination.
Fixes: 7c5b184db7 ("genpt: Generic Page Table base API")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512230720.9y9DtWIo-lkp@intel.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The selftest now depends on the AMDv1 page table, however the selftest
kconfig itself is just an sub-option of the main IOMMUFD module kconfig.
This means it cannot be modular and so kconfig allowed a modular
IOMMU_PT_AMDV1 with a built in IOMMUFD. This causes link failures:
ld: vmlinux.o: in function `mock_domain_alloc_pgtable.isra.0':
selftest.c:(.text+0x12e8ad3): undefined reference to `pt_iommu_amdv1_init'
ld: vmlinux.o: in function `BSWAP_SHUFB_CTL':
sha1-avx2-asm.o:(.rodata+0xaa36a8): undefined reference to `pt_iommu_amdv1_read_and_clear_dirty'
ld: sha1-avx2-asm.o:(.rodata+0xaa36f0): undefined reference to `pt_iommu_amdv1_map_pages'
ld: sha1-avx2-asm.o:(.rodata+0xaa36f8): undefined reference to `pt_iommu_amdv1_unmap_pages'
ld: sha1-avx2-asm.o:(.rodata+0xaa3720): undefined reference to `pt_iommu_amdv1_iova_to_phys'
Adjust the kconfig to disable IOMMUFD_TEST if IOMMU_PT_AMDV1 is incompatible.
Fixes: e93d5945ed ("iommufd: Change the selftest to use iommupt instead of xarray")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512210135.freQWpxa-lkp@intel.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The test doesn't build without it, dma-buf.h does not provide stub
functions if it is not enabled. Compilation can fail with:
ERROR:root:ld: vmlinux.o: in function `iommufd_test':
(.text+0x3b1cdd): undefined reference to `dma_buf_get'
ld: (.text+0x3b1d08): undefined reference to `dma_buf_put'
ld: (.text+0x3b2105): undefined reference to `dma_buf_export'
ld: (.text+0x3b211f): undefined reference to `dma_buf_fd'
ld: (.text+0x3b2e47): undefined reference to `dma_buf_move_notify'
Add the missing select.
Fixes: d2041f1f11 ("iommufd/selftest: Add some tests for the dmabuf flow")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The kunit doesn't work since the below commit made GENERIC_PT
unselectable:
$ make ARCH=x86_64 O=build_kunit_x86_64 olddefconfig
ERROR:root:Not all Kconfig options selected in kunitconfig were in the generated .config.
This is probably due to unsatisfied dependencies.
Missing: CONFIG_DEBUG_GENERIC_PT=y, CONFIG_IOMMUFD_TEST=y,
CONFIG_IOMMU_PT_X86_64=y, CONFIG_GENERIC_PT=y, CONFIG_IOMMU_PT_AMDV1=y,
CONFIG_IOMMU_PT_VTDSS=y, CONFIG_IOMMU_PT=y, CONFIG_IOMMU_PT_KUNIT_TEST=y
Also remove the unneeded CONFIG_IOMMUFD_TEST reference as the iommupt kunit
doesn't interact with iommufd, and it doesn't currently build for the
kunit due problems with DMA_SHARED buffer either.
Fixes: 01569c216d ("genpt: Make GENERIC_PT invisible")
Fixes: 1dd4187f53 ("iommupt: Add a kunit test for Generic Page Table")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Pull erofs fix from Gao Xiang:
- Don't increase s_stack_depth which caused regressions in some
composefs mount setups (EROFS + ovl^2)
Instead just allow one extra unaccounted fs stacking level for
straightforward cases.
* tag 'erofs-for-6.19-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: don't bother with s_stack_depth increasing for now
Previously, commit d53cd891f0 ("erofs: limit the level of fs stacking
for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel
stack overflow when stacking an unlimited number of EROFS on top of
each other.
This fix breaks composefs mounts, which need EROFS+ovl^2 sometimes
(and such setups are already used in production for quite a long time).
One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH
from 2 to 3, but proving that this is safe in general is a high bar.
After a long discussion on GitHub issues [1] about possible solutions,
one conclusion is that there is no need to support nesting file-backed
EROFS mounts on stacked filesystems, because there is always the option
to use loopback devices as a fallback.
As a quick fix for the composefs regression for this cycle, instead of
bumping `s_stack_depth` for file backed EROFS mounts, we disallow
nesting file-backed EROFS over EROFS and over filesystems with
`s_stack_depth` > 0.
This works for all known file-backed mount use cases (composefs,
containerd, and Android APEX for some Android vendors), and the fix is
self-contained.
Essentially, we are allowing one extra unaccounted fs stacking level of
EROFS below stacking filesystems, but EROFS can only be used in the read
path (i.e. overlayfs lower layers), which typically has much lower stack
usage than the write path.
We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more
stack usage analysis or using alternative approaches, such as splitting
the `s_stack_depth` limitation according to different combinations of
stacking.
Fixes: d53cd891f0 ("erofs: limit the level of fs stacking for file-backed mounts")
Reported-and-tested-by: Dusty Mabe <dusty@dustymabe.com>
Reported-by: Timothée Ravier <tim@siosm.fr>
Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1]
Reported-by: "Alekséi Naidénov" <an@digitaltide.io>
Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com
Acked-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Alexander Larsson <alexl@redhat.com>
Reviewed-and-tested-by: Sheng Yong <shengyong1@xiaomi.com>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Pull block fixes from Jens Axboe:
- Kill unlikely checks for blk-rq-qos. These checks are really
all-or-nothing, either the branch is taken all the time, or it's not.
Depending on the configuration, either one of those cases may be
true. Just remove the annotation
- Fix for merging bios with different app tags set
- Fix for a recently introduced slowdown due to RCU synchronization
- Fix for a status change on loop while it's in use, and then a later
fix for that fix
- Fix for the async partition scanning in ublk
* tag 'block-6.19-20260109' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
ublk: fix use-after-free in ublk_partition_scan_work
blk-mq: avoid stall during boot due to synchronize_rcu_expedited
loop: add missing bd_abort_claiming in loop_set_status
block: don't merge bios with different app_tags
blk-rq-qos: Remove unlikely() hints from QoS checks
loop: don't change loop device under exclusive opener in loop_set_status