Commit Graph

1413152 Commits

Author SHA1 Message Date
Al Viro
2e2d64aea5 do_filp_open(): DTRT when getting ERR_PTR() as pathname
The rest of the set_nameidata() callers treat IS_ERR(pathname) as
"bail out immediately with PTR_ERR(pathname) as error".  Makes
life simpler for callers; do_filp_open() is the only exception
and its callers would also benefit from such calling conventions
change.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:18:07 -05:00
Al Viro
ba33ac100d ksmbd_vfs_rename(): vfs_path_parent_lookup() accepts ERR_PTR() as name
no need to check in the caller

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:18:07 -05:00
Al Viro
edefe6bda7 ksmbd_vfs_path_lookup(): vfs_path_parent_lookup() accepts ERR_PTR() as name
no need to check in the caller

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:18:07 -05:00
Al Viro
1c38f1f9b0 move_mount(): filename_lookup() accepts ERR_PTR() as filename
no need to check it in the caller

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:18:07 -05:00
Al Viro
def2a02a4c file_setattr(): filename_lookup() accepts ERR_PTR() as filename
no need to check it in the caller

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:18:07 -05:00
Al Viro
58a49cc9eb file_getattr(): filename_lookup() accepts ERR_PTR() as filename
no need to check it in the caller

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:18:07 -05:00
Al Viro
741c97fecb struct filename ->refcnt doesn't need to be atomic
... or visible outside of audit, really.  Note that references
held in delayed_filename always have refcount 1, and from the
moment of complete_getname() or equivalent point in getname...()
there won't be any references to struct filename instance left
in places visible to other threads.

Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:18:07 -05:00
Al Viro
9fa3ec8458 allow incomplete imports of filenames
There are two filename-related problems in io_uring and its
interplay with audit.

Filenames are imported when request is submitted and used when
it is processed.  Unfortunately, the latter may very well
happen in a different thread.  In that case the reference to
filename is put into the wrong audit_context - that of submitting
thread, not the processing one.  Audit logics is called by
the latter, and it really wants to be able to find the names
in audit_context current (== processing) thread.

Another related problem is the headache with refcounts -
normally all references to given struct filename are visible
only to one thread (the one that uses that struct filename).
io_uring violates that - an extra reference is stashed in
audit_context of submitter.  It gets dropped when submitter
returns to userland, which can happen simultaneously with
processing thread deciding to drop the reference it got.

We paper over that by making refcount atomic, but that means
pointless headache for everyone.

Solution: the notion of partially imported filenames.  Namely,
already copied from userland, but *not* exposed to audit yet.

io_uring can create that in submitter thread, and complete the
import (obtaining the usual reference to struct filename) in
processing thread.

Object: struct delayed_filename.

Primitives for working with it:

delayed_getname(&delayed_filename, user_string) - copies the name from
userland, returning 0 and stashing the address of (still incomplete)
struct filename in delayed_filename on success and returning -E... on
error.

delayed_getname_uflags(&delayed_filename, user_string, atflags) -
similar, in the same relation to delayed_getname() as getname_uflags()
is to getname()

complete_getname(&delayed_filename) - completes the import of filename
stashed in delayed_filename and returns struct filename to caller,
emptying delayed_filename.

CLASS(filename_complete_delayed, name)(&delayed_filename) - variant of
CLASS(filename) with complete_getname() for constructor.

dismiss_delayed_filename(&delayed_filename) - destructor; drops whatever
might be stashed in delayed_filename, emptying it.

putname_to_delayed(&delayed_filename, name) - if name is shared, stashes
its copy into delayed_filename and drops the reference to name, otherwise
stashes the name itself in there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:18:07 -05:00
Al Viro
a9900a27df switch __getname_maybe_null() to CLASS(filename_flags)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:18:07 -05:00
Mateusz Guzik
7ca83f8ebe fs: hide names_cache behind runtime const machinery
s/names_cachep/names_cache/ for consistency with dentry cache.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:17:26 -05:00
Al Viro
8c888b3190 struct filename: saner handling of long names
Always allocate struct filename from names_cachep, long name or short;
short names would be embedded into struct filename.  Longer ones do
not cannibalize the original struct filename - put them into kmalloc'ed
buffers (PATH_MAX-sized for import from userland, strlen() + 1 - for
ones originating kernel-side, where we know the length beforehand).

Cutoff length for short names is chosen so that struct filename would be
192 bytes long - that's both a multiple of 64 and large enough to cover
the majority of real-world uses.

Simplifies logics in getname()/putname() and friends.

[fixed an embarrassing braino in EMBEDDED_NAME_MAX, first reported by
Dan Carpenter]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:16:44 -05:00
Al Viro
c3a3577cdb struct filename: use names_cachep only for getname() and friends
Instances of struct filename come from names_cachep (via
__getname()).  That is done by getname_flags() and getname_kernel()
and these two are the main callers of __getname().  However, there are
other callers that simply want to allocate PATH_MAX bytes for uses that
have nothing to do with struct filename.

	We want saner allocation rules for long pathnames, so that struct
filename would *always* come from names_cachep, with the out-of-line
pathname getting kmalloc'ed.  For that we need to be able to change the
size of objects allocated by getname_flags()/getname_kernel().

	That requires the rest of __getname() users to stop using
names_cachep; we could explicitly switch all of those to kmalloc(),
but that would cause quite a bit of noise.  So the plan is to switch
getname_...() to new helpers and turn __getname() into a wrapper for
kmalloc().  Remaining __getname() users could be converted to explicit
kmalloc() at leisure, hopefully along with figuring out what size do
they really want - PATH_MAX is an overkill for some of them, used out
of laziness ("we have a convenient helper that does 4K allocations and
that's large enough, let's use it").

	As a side benefit, names_cachep is no longer used outside
of fs/namei.c, so we can move it there and be done with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:16:44 -05:00
Al Viro
8f2ac84817 getname_flags() massage, part 2
Take the "long name" case into a helper (getname_long()). In
case of failure have the caller deal with freeing the original
struct filename.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:16:44 -05:00
Al Viro
8ba29c85e2 getname_flags() massage, part 1
In case of long name don't reread what we'd already copied.
memmove() it instead.  That avoids the possibility of ending
up with empty name there and the need to look at the flags
on the slow path.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:16:44 -05:00
Al Viro
ca2a04e84a ntfs: ->d_compare() must not block
... so don't use __getname() there.  Switch it (and ntfs_d_hash(), while
we are at it) to kmalloc(PATH_MAX, GFP_NOWAIT).  Yes, ntfs_d_hash()
almost certainly can do with smaller allocations, but let ntfs folks
deal with that - keep the allocation size as-is for now.

Stop abusing names_cachep in ntfs, period - various uses of that thing
in there have nothing to do with pathnames; just use k[mz]alloc() and
be done with that.  For now let's keep sizes as-in, but AFAICS none of
the users actually want PATH_MAX.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:16:44 -05:00
Al Viro
41670a5900 get rid of audit_reusename()
Originally we tried to avoid multiple insertions into audit names array
during retry loop by a cute hack - memorize the userland pointer and
if there already is a match, just grab an extra reference to it.

Cute as it had been, it had problems - two identical pointers had
audit aux entries merged, two identical strings did not.  Having
different behaviour for syscalls that differ only by addresses of
otherwise identical string arguments is obviously wrong - if nothing
else, compiler can decide to merge identical string literals.

Besides, this hack does nothing for non-audited processes - they get
a fresh copy for retry.  It's not time-critical, but having behaviour
subtly differ that way is bogus.

These days we have very few places that import filename more than once
(9 functions total) and it's easy to massage them so we get rid of all
re-imports.  With that done, we don't need audit_reusename() anymore.
There's no need to memorize userland pointer either.

Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:16:44 -05:00
Al Viro
1ee5220eb3 do_readlinkat(): import pathname only once
Take getname_flags() and putname() outside of retry loop.

Since getname_flags() is the only thing that cares about LOOKUP_EMPTY,
don't bother with setting LOOKUP_EMPTY in lookup_flags - just pass it
to getname_flags() and be done with that.

The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:16:44 -05:00
Al Viro
cf6b819c22 do_sys_truncate(): import pathname only once
Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.

In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent
to plain getname().

The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:16:44 -05:00
Al Viro
85a4fe3c99 user_statfs(): import pathname only once
Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.

In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent
to plain getname().

The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:16:44 -05:00
Al Viro
c3fa2b7cf5 chroot(2): import pathname only once
Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.

In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent
to plain getname().

The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:16:44 -05:00
Al Viro
592ab7fbb8 chdir(2): import pathname only once
Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.

In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent
to plain getname().

The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:16:44 -05:00
Al Viro
b756d8ba83 do_utimes_path(): import pathname only once
Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.

Since we have the default logics for use of LOOKUP_EMPTY (passed iff
AT_EMPTY_PATH is present in flags), just use getname_uflags() and
don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags()
will pass the right thing to getname_flags() and filename_lookup()
doesn't care about LOOKUP_EMPTY at all.

The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:16:44 -05:00
Al Viro
2e2d892fe9 do_fchownat(): import pathname only once
Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.

Since we have the default logics for use of LOOKUP_EMPTY (passed iff
AT_EMPTY_PATH is present in flags), just use getname_uflags() and
don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags()
will pass the right thing to getname_flags() and filename_lookup()
doesn't care about LOOKUP_EMPTY at all.

The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:16:44 -05:00
Al Viro
67591df968 do_fchmodat(): import pathname only once
Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.

Since we have the default logics for use of LOOKUP_EMPTY (passed iff
AT_EMPTY_PATH is present in flags), just use getname_uflags() and
don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags()
will pass the right thing to getname_flags() and filename_lookup()
doesn't care about LOOKUP_EMPTY at all.

The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:16:44 -05:00
Al Viro
0cf1149673 do_faccessat(): import pathname only once
Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.

Since we have the default logics for use of LOOKUP_EMPTY (passed iff
AT_EMPTY_PATH is present in flags), just use getname_uflags() and
don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags()
will pass the right thing to getname_flags() and filename_lookup()
doesn't care about LOOKUP_EMPTY at all.

The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:16:43 -05:00
Al Viro
24df85ffb9 allow to use CLASS() for struct filename *
Not all users match that model, but most of them do.  By the end of
the series we'll be left with very few irregular ones...

Added:
CLASS(filename, name)(user_path) =>
	getname(user_path)
CLASS(filename_kernel, name)(string) =>
	getname_kernel(string)
CLASS(filename_flags, name)(user_path, flags) =>
	getname_flags(user_path, flags)
CLASS(filename_uflags, name)(user_path, flags) =>
	getname_uflags(user_path, flags)
CLASS(filename_maybe_null, name)(user_path, flags) =>
	getname_maybe_null(user_path, flags)
all with putname() as destructor.

"flags" in filename_flags is in LOOKUP_... space, only LOOKUP_EMPTY matters.
"flags" in filename_uflags and filename_maybe_null is in AT_...... space,
and only AT_EMPTY_PATH matters.

filename_flags conventions might be worth reconsidering later (it might or
might not be better off with boolean instead)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:15:47 -05:00
Al Viro
12b5bc2a0d init_link(): turn into a trivial wrapper for do_linkat()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:03:32 -05:00
Al Viro
8714a249da init_symlink(): turn into a trivial wrapper for do_symlinkat()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:02:36 -05:00
Al Viro
b0f27ace08 init_mkdir(): turn into a trivial wrapper for do_mkdirat()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:01:38 -05:00
Al Viro
4bfe0692d6 init_mknod(): turn into a trivial wrapper for do_mknodat()
Same as init_unlink() and init_rmdir() already are; the only obstacle
is do_mknodat() being static.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13 15:01:32 -05:00
Linus Torvalds
0f61b1860c Linux 6.19-rc5 v6.19-rc5 2026-01-11 17:03:14 -10:00
Linus Torvalds
7143203341 Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library fixes from Eric Biggers:

 - A couple more fixes for the lib/crypto KUnit tests

 - Fix missing MMU protection for the AES S-box

* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  lib/crypto: aes: Fix missing MMU protection for AES S-box
  MAINTAINERS: add test vector generation scripts to "CRYPTO LIBRARY"
  lib/crypto: tests: Fix syntax error for old python versions
  lib/crypto: tests: polyval_kunit: Increase iterations for preparekey in IRQs
2026-01-11 15:07:56 -10:00
Linus Torvalds
9c7ef209cd Merge tag 'char-misc-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
 "Here are some small char/misc driver fixes for some reported issues.
  Included in here is:

   - much reported rust_binder fix

   - counter driver fixes

   - new device ids for the mei driver

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  rust_binder: remove spin_lock() in rust_shrink_free_page()
  mei: me: add nova lake point S DID
  counter: 104-quad-8: Fix incorrect return value in IRQ handler
  counter: interrupt-cnt: Drop IRQF_NO_THREAD flag
2026-01-11 07:27:44 -10:00
Linus Torvalds
316a94cb63 Merge tag 'x86-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
 "Disable GCOV instrumentation in the SEV noinstr.c collection of SEV
  noinstr methods, to further robustify the code"

* tag 'x86-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Disable GCOV on noinstr object
2026-01-11 07:19:43 -10:00
Linus Torvalds
fac4bdbaca Merge tag 'sched-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
 "Fix a crash in sched_mm_cid_after_execve()"

* tag 'sched-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/mm_cid: Prevent NULL mm dereference in sched_mm_cid_after_execve()
2026-01-11 07:11:53 -10:00
Linus Torvalds
fe948326e9 Merge tag 'perf-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fix from Ingo Molnar:
 "Fix perf swevent hrtimer deinit regression"

* tag 'perf-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Ensure swevent hrtimer is properly destroyed
2026-01-11 06:55:27 -10:00
Linus Torvalds
88730166f3 Merge tag 'irq-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc irqchip fixes from Ingo Molnar:

 - Fix an endianness bug in the gic-v5 irqchip driver

 - Revert a broken commit from the riscv-imsic irqchip driver

* tag 'irq-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "irqchip/riscv-imsic: Embed the vector array in lpriv"
  irqchip/gic-v5: Fix gicv5_its_map_event() ITTE read endianness
2026-01-11 06:36:20 -10:00
Thomas Gleixner
2e4b28c48f treewide: Update email address
In a vain attempt to consolidate the email zoo switch everything to the
kernel.org account.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-01-11 06:09:11 -10:00
Linus Torvalds
755bc1335e Merge tag 'riscv-for-linus-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
 "Notable changes include a fix to close one common microarchitectural
  attack vector for out-of-order cores. Another patch exposed an
  omission in my boot test coverage, which is currently missing
  relocatable kernels. Otherwise, the fixes seem to be settling down for
  us.

   - Fix CONFIG_RELOCATABLE=y boots by building Image files from
     vmlinux, rather than vmlinux.unstripped, now that the .modinfo
     section is included in vmlinux.unstripped

   - Prevent branch predictor poisoning microarchitectural attacks that
     use the syscall index as a vector by using array_index_nospec() to
     clamp the index after the bounds check (as x86 and ARM64 already
     do)

   - Fix a crash in test_kprobes when building with Clang

   - Fix a deadlock possible when tracing is enabled for SBI ecalls

   - Fix the definition of the Zk standard RISC-V ISA extension bundle,
     which was missing the Zknh extension

   - A few other miscellaneous non-functional cleanups, removing unused
     macros, fixing an out-of-date path in code comments, resolving a
     compile-time warning for a type mismatch in a pr_crit(), and
     removing an unnecessary header file inclusion"

* tag 'riscv-for-linus-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: trace: fix snapshot deadlock with sbi ecall
  riscv: remove irqflags.h inclusion in asm/bitops.h
  riscv: cpu_ops_sbi: smp_processor_id() returns int, not unsigned int
  riscv: configs: Clean up references to non-existing configs
  riscv: kexec_image: Fix dead link to boot-image-header.rst
  riscv: pgtable: Cleanup useless VA_USER_XXX definitions
  riscv: cpufeature: Fix Zk bundled extension missing Zknh
  riscv: fix KUnit test_kprobes crash when building with Clang
  riscv: Sanitize syscall table indexing under speculation
  riscv: boot: Always make Image from vmlinux, not vmlinux.unstripped
2026-01-10 15:54:41 -10:00
Linus Torvalds
0fa27899e0 Merge tag 'driver-core-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fixes from Danilo Krummrich:

 - Fix swapped example values for the `family` and `machine` attributes
   in the sysfs SoC bus ABI documentation

 - Fix Rust build and intra-doc issues when optional subsystems
   (CONFIG_PCI, CONFIG_AUXILIARY_BUS, CONFIG_PRINTK) are disabled

 - Fix typos and incorrect safety comments in Rust PCI, DMA, and
   device ID documentation

* tag 'driver-core-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
  rust: device: Remove explicit import of CStrExt
  rust: pci: fix typos in Bar struct's comments
  rust: device: fix broken intra-doc links
  rust: dma: fix broken intra-doc links
  rust: driver: fix broken intra-doc links to example driver types
  rust: device_id: replace incorrect word in safety documentation
  rust: dma: remove incorrect safety documentation
  docs: ABI: sysfs-devices-soc: Fix swapped sample values
2026-01-10 15:04:04 -10:00
Linus Torvalds
b061fcffe3 Merge tag 'linux_kselftest-fixes-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fix from Shuah Khan:
 "Fix tracing test_multiple_writes stalls when buffer_size_kb is less
  than 12KB"

* tag 'linux_kselftest-fixes-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/tracing: Fix test_multiple_writes stall
2026-01-10 14:57:55 -10:00
Linus Torvalds
97313d6113 Merge tag 'iommu-fixes-v6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iomu fixes from Joerg Roedel:

 - several Kconfig-related build fixes

 - fix for when gcc 8.5 on PPC refuses to inline a function from a
   header file

* tag 'iommu-fixes-v6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommupt: Make pt_feature() always_inline
  iommufd/selftest: Prevent module/builtin conflicts in kconfig
  iommufd/selftest: Add missing kconfig for DMA_SHARED_BUFFER
  iommupt: Fix the kunit building
2026-01-10 07:14:40 -10:00
Gao Xiang
7893cc1225 erofs: fix file-backed mounts no longer working on EROFS partitions
Sheng Yong reported [1] that Android APEX images didn't work with commit
072a7c7cdb ("erofs: don't bother with s_stack_depth increasing for
now") because "EROFS-formatted APEX file images can be stored within an
EROFS-formatted Android system partition."

In response, I sent a quick fat-fingered [PATCH v3] to address the
report.  Unfortunately, the updated condition was incorrect:

         if (erofs_is_fileio_mode(sbi)) {
-            sb->s_stack_depth =
-                file_inode(sbi->dif0.file)->i_sb->s_stack_depth + 1;
-            if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
-                erofs_err(sb, "maximum fs stacking depth exceeded");
+            inode = file_inode(sbi->dif0.file);
+            if ((inode->i_sb->s_op == &erofs_sops && !sb->s_bdev) ||
+                inode->i_sb->s_stack_depth) {

The condition `!sb->s_bdev` is always true for all file-backed EROFS
mounts, making the check effectively a no-op.

The real fix tested and confirmed by Sheng Yong [2] at that time was
[PATCH v3 RESEND], which correctly ensures the following EROFS^2 setup
works:
    EROFS (on a block device) + EROFS (file-backed mount)

But sadly I screwed it up again by upstreaming the outdated [PATCH v3].

This patch applies the same logic as the delta between the upstream
[PATCH v3] and the real fix [PATCH v3 RESEND].

Reported-by: Sheng Yong <shengyong1@xiaomi.com>
Closes: https://lore.kernel.org/r/3acec686-4020-4609-aee4-5dae7b9b0093@gmail.com [1]
Fixes: 072a7c7cdb ("erofs: don't bother with s_stack_depth increasing for now")
Link: https://lore.kernel.org/r/243f57b8-246f-47e7-9fb1-27a771e8e9e8@gmail.com [2]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-01-10 06:39:20 -10:00
Jason Gunthorpe
6a3d5fda2c iommupt: Make pt_feature() always_inline
gcc 8.5 on powerpc does not automatically inline these functions even
though they evaluate to constants in key cases. Since the constant
propagation is essential for some code elimination and built-time checks
this causes a build failure:

 ERROR: modpost: "__pt_no_sw_bit" [drivers/iommu/generic_pt/fmt/iommu_amdv1.ko] undefined!

Caused by this:

	if (pts_feature(&pts, PT_FEAT_DMA_INCOHERENT) &&
	    !pt_test_sw_bit_acquire(&pts,
				    SW_BIT_CACHE_FLUSH_DONE))
		flush_writes_item(&pts);

Where pts_feature() evaluates to a constant false. Mark them as
__always_inline to force it to evaluate to a constant and trigger the code
elimination.

Fixes: 7c5b184db7 ("genpt: Generic Page Table base API")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512230720.9y9DtWIo-lkp@intel.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-01-10 10:50:45 +01:00
Jason Gunthorpe
7adfd68274 iommufd/selftest: Prevent module/builtin conflicts in kconfig
The selftest now depends on the AMDv1 page table, however the selftest
kconfig itself is just an sub-option of the main IOMMUFD module kconfig.

This means it cannot be modular and so kconfig allowed a modular
IOMMU_PT_AMDV1 with a built in IOMMUFD. This causes link failures:

   ld: vmlinux.o: in function `mock_domain_alloc_pgtable.isra.0':
   selftest.c:(.text+0x12e8ad3): undefined reference to `pt_iommu_amdv1_init'
   ld: vmlinux.o: in function `BSWAP_SHUFB_CTL':
   sha1-avx2-asm.o:(.rodata+0xaa36a8): undefined reference to `pt_iommu_amdv1_read_and_clear_dirty'
   ld: sha1-avx2-asm.o:(.rodata+0xaa36f0): undefined reference to `pt_iommu_amdv1_map_pages'
   ld: sha1-avx2-asm.o:(.rodata+0xaa36f8): undefined reference to `pt_iommu_amdv1_unmap_pages'
   ld: sha1-avx2-asm.o:(.rodata+0xaa3720): undefined reference to `pt_iommu_amdv1_iova_to_phys'

Adjust the kconfig to disable IOMMUFD_TEST if IOMMU_PT_AMDV1 is incompatible.

Fixes: e93d5945ed ("iommufd: Change the selftest to use iommupt instead of xarray")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512210135.freQWpxa-lkp@intel.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10 10:40:58 +01:00
Jason Gunthorpe
faa37ff3bf iommufd/selftest: Add missing kconfig for DMA_SHARED_BUFFER
The test doesn't build without it, dma-buf.h does not provide stub
functions if it is not enabled. Compilation can fail with:

 ERROR:root:ld: vmlinux.o: in function `iommufd_test':
 (.text+0x3b1cdd): undefined reference to `dma_buf_get'
 ld: (.text+0x3b1d08): undefined reference to `dma_buf_put'
 ld: (.text+0x3b2105): undefined reference to `dma_buf_export'
 ld: (.text+0x3b211f): undefined reference to `dma_buf_fd'
 ld: (.text+0x3b2e47): undefined reference to `dma_buf_move_notify'

Add the missing select.

Fixes: d2041f1f11 ("iommufd/selftest: Add some tests for the dmabuf flow")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10 10:40:58 +01:00
Jason Gunthorpe
cefd81e76a iommupt: Fix the kunit building
The kunit doesn't work since the below commit made GENERIC_PT
unselectable:

 $ make ARCH=x86_64 O=build_kunit_x86_64 olddefconfig
 ERROR:root:Not all Kconfig options selected in kunitconfig were in the generated .config.
 This is probably due to unsatisfied dependencies.
 Missing: CONFIG_DEBUG_GENERIC_PT=y, CONFIG_IOMMUFD_TEST=y,
 CONFIG_IOMMU_PT_X86_64=y, CONFIG_GENERIC_PT=y, CONFIG_IOMMU_PT_AMDV1=y,
 CONFIG_IOMMU_PT_VTDSS=y, CONFIG_IOMMU_PT=y, CONFIG_IOMMU_PT_KUNIT_TEST=y

Also remove the unneeded CONFIG_IOMMUFD_TEST reference as the iommupt kunit
doesn't interact with iommufd, and it doesn't currently build for the
kunit due problems with DMA_SHARED buffer either.

Fixes: 01569c216d ("genpt: Make GENERIC_PT invisible")
Fixes: 1dd4187f53 ("iommupt: Add a kunit test for Generic Page Table")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10 10:40:57 +01:00
Linus Torvalds
b6151c4e60 Merge tag 'erofs-for-6.19-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fix from Gao Xiang:

 - Don't increase s_stack_depth which caused regressions in some
   composefs mount setups (EROFS + ovl^2)

   Instead just allow one extra unaccounted fs stacking level for
   straightforward cases.

* tag 'erofs-for-6.19-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: don't bother with s_stack_depth increasing for now
2026-01-09 19:34:50 -10:00
Gao Xiang
072a7c7cdb erofs: don't bother with s_stack_depth increasing for now
Previously, commit d53cd891f0 ("erofs: limit the level of fs stacking
for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel
stack overflow when stacking an unlimited number of EROFS on top of
each other.

This fix breaks composefs mounts, which need EROFS+ovl^2 sometimes
(and such setups are already used in production for quite a long time).

One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH
from 2 to 3, but proving that this is safe in general is a high bar.

After a long discussion on GitHub issues [1] about possible solutions,
one conclusion is that there is no need to support nesting file-backed
EROFS mounts on stacked filesystems, because there is always the option
to use loopback devices as a fallback.

As a quick fix for the composefs regression for this cycle, instead of
bumping `s_stack_depth` for file backed EROFS mounts, we disallow
nesting file-backed EROFS over EROFS and over filesystems with
`s_stack_depth` > 0.

This works for all known file-backed mount use cases (composefs,
containerd, and Android APEX for some Android vendors), and the fix is
self-contained.

Essentially, we are allowing one extra unaccounted fs stacking level of
EROFS below stacking filesystems, but EROFS can only be used in the read
path (i.e. overlayfs lower layers), which typically has much lower stack
usage than the write path.

We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more
stack usage analysis or using alternative approaches, such as splitting
the `s_stack_depth` limitation according to different combinations of
stacking.

Fixes: d53cd891f0 ("erofs: limit the level of fs stacking for file-backed mounts")
Reported-and-tested-by: Dusty Mabe <dusty@dustymabe.com>
Reported-by: Timothée Ravier <tim@siosm.fr>
Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1]
Reported-by: "Alekséi Naidénov" <an@digitaltide.io>
Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com
Acked-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Alexander Larsson <alexl@redhat.com>
Reviewed-and-tested-by: Sheng Yong <shengyong1@xiaomi.com>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2026-01-10 13:01:15 +08:00
Linus Torvalds
cb2076b091 Merge tag 'block-6.19-20260109' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:

 - Kill unlikely checks for blk-rq-qos. These checks are really
   all-or-nothing, either the branch is taken all the time, or it's not.
   Depending on the configuration, either one of those cases may be
   true. Just remove the annotation

 - Fix for merging bios with different app tags set

 - Fix for a recently introduced slowdown due to RCU synchronization

 - Fix for a status change on loop while it's in use, and then a later
   fix for that fix

 - Fix for the async partition scanning in ublk

* tag 'block-6.19-20260109' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  ublk: fix use-after-free in ublk_partition_scan_work
  blk-mq: avoid stall during boot due to synchronize_rcu_expedited
  loop: add missing bd_abort_claiming in loop_set_status
  block: don't merge bios with different app_tags
  blk-rq-qos: Remove unlikely() hints from QoS checks
  loop: don't change loop device under exclusive opener in loop_set_status
2026-01-09 15:42:46 -10:00