linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-03-22 07:27:12 +08:00

Author	SHA1	Message	Date
Paulo Alcantara	12b4c5d98c	smb: client: fix krb5 mount with username option Customer reported that some of their krb5 mounts were failing against a single server as the client was trying to mount the shares with wrong credentials. It turned out the client was reusing SMB session from first mount to try mounting the other shares, even though a different username= option had been specified to the other mounts. By using username mount option along with sec=krb5 to search for principals from keytab is supported by cifs.upcall(8) since cifs-utils-4.8. So fix this by matching username mount option in match_session() even with Kerberos. For example, the second mount below should fail with -ENOKEY as there is no 'foobar' principal in keytab (/etc/krb5.keytab). The client ends up reusing SMB session from first mount to perform the second one, which is wrong. ``` $ ktutil ktutil: add_entry -password -p testuser -k 1 -e aes256-cts Password for testuser@ZELDA.TEST: ktutil: write_kt /etc/krb5.keytab ktutil: quit $ klist -ke Keytab name: FILE:/etc/krb5.keytab KVNO Principal ---- ---------------------------------------------------------------- 1 testuser@ZELDA.TEST (aes256-cts-hmac-sha1-96) $ mount.cifs //w22-root2/scratch /mnt/1 -o sec=krb5,username=testuser $ mount.cifs //w22-root2/scratch /mnt/2 -o sec=krb5,username=foobar $ mount -t cifs \| grep -Po 'username=\K\w+' testuser testuser ``` Reported-by: Oscar Santos <ossantos@redhat.com> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Cc: David Howells <dhowells@redhat.com> Cc: linux-cifs@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-15 20:53:09 -05:00
Linus Torvalds	2c361c9b7f	Merge tag 'ceph-for-7.0-rc4' of https://github.com/ceph/ceph-client Pull ceph fixes from Ilya Dryomov: "A small pile of CephFS and messenger bug fixes, all marked for stable" * tag 'ceph-for-7.0-rc4' of https://github.com/ceph/ceph-client: libceph: Fix potential out-of-bounds access in ceph_handle_auth_reply() libceph: Use u32 for non-negative values in ceph_monmap_decode() MAINTAINERS: update email address of Dongsheng Yang libceph: reject preamble if control segment is empty libceph: admit message frames only in CEPH_CON_S_OPEN state libceph: prevent potential out-of-bounds reads in process_message_header() ceph: do not skip the first folio of the next object in writeback ceph: fix memory leaks in ceph_mdsc_build_path() ceph: add a bunch of missing ceph_path_info initializers ceph: fix i_nlink underrun during async unlink	2026-03-13 14:03:58 -07:00
Linus Torvalds	399af66228	Merge tag 'xfs-fixes-7.0-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Carlos Maiolino: "A couple race fixes found on the new healthmon mechanism, and another flushing dquots during filesystem shutdown" * tag 'xfs-fixes-7.0-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix integer overflow in bmap intent sort comparator xfs: fix undersized l_iclog_roundoff values xfs: ensure dquot item is deleted from AIL only after log shutdown xfs: remove redundant set null for ip->i_itemp xfs: fix returned valued from xfs_defer_can_append xfs: Remove redundant NULL check after __GFP_NOFAIL xfs: fix race between healthmon unmount and read_iter xfs: remove scratch field from struct xfs_gc_bio	2026-03-13 10:49:15 -07:00
Linus Torvalds	d874ca0522	Merge tag 'v7.0-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - Fix reconnect when using non-default port - Fix default retransmission behavior - Fix open handle reuse in cifs_open - Fix export for smb2-mapperror-test - Fix potential corruption on write retry - Fix potentially uninitialized superblock flags - Fix missing O_DIRECT and O_SYNC flags on create * tag 'v7.0-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: make default value of retrans as zero smb: client: fix open handle lookup in cifs_open() smb: client: fix iface port assignment in parse_server_interfaces smb/client: only export symbol for 'smb2maperror-test' module smb: client: fix in-place encryption corruption in SMB2_write() smb: client: fix sbflags initialization smb: client: fix atomic open with O_DIRECT & O_SYNC	2026-03-13 10:46:32 -07:00
Linus Torvalds	8004279c41	Merge tag 'nfs-for-7.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client fixes from Anna Schumaker: - Fix NFS KConfig typos - Decrement re_receiving on the early exit paths - return EISDIR on nfs3_proc_create if d_alias is a dir * tag 'nfs-for-7.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Fix NFS KConfig typos xprtrdma: Decrement re_receiving on the early exit paths nfs: return EISDIR on nfs3_proc_create if d_alias is a dir	2026-03-12 12:38:17 -07:00
Linus Torvalds	e0b38d286e	Merge tag 'for-7.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - detect possible file name hash collision earlier so it does not lead to transaction abort - handle b-tree leaf overflows when snapshotting a subvolume with set received UUID, leading to transaction abort - in zoned mode, reorder relocation block group initialization after the transaction kthread start - fix orphan cleanup state tracking of subvolume, this could lead to invalid dentries under some conditions - add locking around updates of dynamic reclain state update - in subpage mode, add missing RCU unlock when trying to releae extent buffer - remap tree fixes: - add missing description strings for the newly added remap tree - properly update search key when iterating backrefs * tag 'for-7.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: remove duplicated definition of btrfs_printk_in_rcu() btrfs: remove unnecessary transaction abort in the received subvol ioctl btrfs: abort transaction on failure to update root in the received subvol ioctl btrfs: fix transaction abort on set received ioctl due to item overflow btrfs: fix transaction abort when snapshotting received subvolumes btrfs: fix transaction abort on file creation due to name hash collision btrfs: read key again after incrementing slot in move_existing_remaps() btrfs: add missing RCU unlock in error path in try_release_subpage_extent_buffer() btrfs: set BTRFS_ROOT_ORPHAN_CLEANUP during subvol create btrfs: zoned: move btrfs_zoned_reserve_data_reloc_bg() after kthread start btrfs: hold space_info->lock when clearing periodic reclaim ready btrfs: print-tree: add remap tree definitions	2026-03-12 12:15:27 -07:00
Linus Torvalds	2c7e63d702	Merge tag 'net-7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from CAN and netfilter. Current release - regressions: - eth: mana: Null service_wq on setup error to prevent double destroy Previous releases - regressions: - nexthop: fix percpu use-after-free in remove_nh_grp_entry - sched: teql: fix NULL pointer dereference in iptunnel_xmit on TEQL slave xmit - bpf: fix nd_tbl NULL dereference when IPv6 is disabled - neighbour: restore protocol != 0 check in pneigh update - tipc: fix divide-by-zero in tipc_sk_filter_connect() - eth: - mlx5: - fix crash when moving to switchdev mode - fix DMA FIFO desync on error CQE SQ recovery - iavf: fix PTP use-after-free during reset - bonding: fix type confusion in bond_setup_by_slave() - lan78xx: fix WARN in __netif_napi_del_locked on disconnect Previous releases - always broken: - core: add xmit recursion limit to tunnel xmit functions - net-shapers: don't free reply skb after genlmsg_reply() - netfilter: - fix stack out-of-bounds read in pipapo_drop() - fix OOB read in nfnl_cthelper_dump_table() - mctp: - fix device leak on probe failure - i2c: fix skb memory leak in receive path - can: keep the max bitrate error at 5% - eth: - bonding: fix nd_tbl NULL dereference when IPv6 is disabled - bnxt_en: fix RSS table size check when changing ethtool channels - amd-xgbe: prevent CRC errors during RX adaptation with AN disabled - octeontx2-af: devlink: fix NIX RAS reporter recovery condition" * tag 'net-7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits) net: prevent NULL deref in ip[6]tunnel_xmit() octeontx2-af: devlink: fix NIX RAS reporter to use RAS interrupt status octeontx2-af: devlink: fix NIX RAS reporter recovery condition net: ethernet: ti: am65-cpsw-nuss: Fix rx_filter value for PTP support net/mana: Null service_wq on setup error to prevent double destroy selftests: rtnetlink: add neighbour update test neighbour: restore protocol != 0 check in pneigh update net: dsa: realtek: Fix LED group port bit for non-zero LED group tipc: fix divide-by-zero in tipc_sk_filter_connect() net: dsa: microchip: Fix error path in PTP IRQ setup bpf: bpf_out_neigh_v6: Fix nd_tbl NULL dereference when IPv6 is disabled bpf: bpf_out_neigh_v4: Fix nd_tbl NULL dereference when IPv6 is disabled net: bonding: Fix nd_tbl NULL dereference when IPv6 is disabled ipv6: move the disable_ipv6_mod knob to core code net: bcmgenet: fix broken EEE by converting to phylib-managed state net-shapers: don't free reply skb after genlmsg_reply() net: dsa: mxl862xx: don't set user_mii_bus net: ethernet: arc: emac: quiesce interrupts before requesting IRQ page_pool: store detach_time as ktime_t to avoid false-negatives net: macb: Shuffle the tx ring before enabling tx ...	2026-03-12 11:33:35 -07:00
Shyam Prasad N	e3beefd3af	cifs: make default value of retrans as zero When retrans mount option was introduced, the default value was set as 1. However, in the light of some bugs that this has exposed recently we should change it to 0 and retain the old behaviour before this option was introduced. Cc: <stable@vger.kernel.org> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-11 18:46:42 -05:00
Paulo Alcantara	40e75e42f4	smb: client: fix open handle lookup in cifs_open() When looking up open handles to be re-used in cifs_open(), calling cifs_get_{writable,readable}_path() is wrong as it will look up for the first matching open handle, and if @file->f_flags doesn't match, it will ignore the remaining open handles in cifsInodeInfo::openFileList that might potentially match @file->f_flags. For writable and readable handles, fix this by calling __cifs_get_writable_file() and __find_readable_file(), respectively, with FIND_OPEN_FLAGS set. With the patch, the following program ends up with two opens instead of three sent over the wire. ``` #define _GNU_SOURCE #include <unistd.h> #include <string.h> #include <fcntl.h> int main(int argc, char *argv[]) { int fd; fd = open("/mnt/1/foo", O_CREAT \| O_WRONLY \| O_TRUNC, 0664); close(fd); fd = open("/mnt/1/foo", O_DIRECT \| O_WRONLY); close(fd); fd = open("/mnt/1/foo", O_WRONLY); close(fd); fd = open("/mnt/1/foo", O_DIRECT \| O_WRONLY); close(fd); return 0; } ``` ``` $ mount.cifs //srv/share /mnt/1 -o ... $ gcc test.c && ./a.out ``` Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Cc: David Howells <dhowells@redhat.com> Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-11 18:46:40 -05:00
Henrique Carvalho	d4c7210d2f	smb: client: fix iface port assignment in parse_server_interfaces parse_server_interfaces() initializes interface socket addresses with CIFS_PORT. When the mount uses a non-default port this overwrites the configured destination port. Later, cifs_chan_update_iface() copies this sockaddr into server->dstaddr, causing reconnect attempts to use the wrong port after server interface updates. Use the existing port from server->dstaddr instead. Cc: stable@vger.kernel.org Fixes: `fe856be475` ("CIFS: parse and store info on iface queries") Tested-by: Dr. Thomas Orgis <thomas.orgis@uni-hamburg.de> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-11 18:46:28 -05:00
Long Li	362c490980	xfs: fix integer overflow in bmap intent sort comparator xfs_bmap_update_diff_items() sorts bmap intents by inode number using a subtraction of two xfs_ino_t (uint64_t) values, with the result truncated to int. This is incorrect when two inode numbers differ by more than INT_MAX (2^31 - 1), which is entirely possible on large XFS filesystems. Fix this by replacing the subtraction with cmp_int(). Cc: <stable@vger.kernel.org> # v4.9 Fixes: `9f3afb57d5` ("xfs: implement deferred bmbt map/unmap operations") Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-11 13:21:42 +01:00
Ye Bin	88d37abb36	smb/client: only export symbol for 'smb2maperror-test' module Only export smb2_get_err_map_test smb2_error_map_table_test and smb2_error_map_num symbol for 'smb2maperror-test' module. Fixes: `7d0bf050a5` ("smb/client: make SMB2 maperror KUnit tests a separate module") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-10 17:22:04 -05:00
Bharath SM	d78840a6a3	smb: client: fix in-place encryption corruption in SMB2_write() SMB2_write() places write payload in iov[1..n] as part of rq_iov. smb3_init_transform_rq() pointer-shares rq_iov, so crypt_message() encrypts iov[1] in-place, replacing the original plaintext with ciphertext. On a replayable error, the retry sends the same iov[1] which now contains ciphertext instead of the original data, resulting in corruption. The corruption is most likely to be observed when connections are unstable, as reconnects trigger write retries that re-send the already-encrypted data. This affects SFU mknod, MF symlinks, etc. On kernels before 6.10 (prior to the netfs conversion), sync writes also used this path and were similarly affected. The async write path wasn't unaffected as it uses rq_iter which gets deep-copied. Fix by moving the write payload into rq_iter via iov_iter_kvec(), so smb3_init_transform_rq() deep-copies it before encryption. Cc: stable@vger.kernel.org #6.3+ Acked-by: Henrique Carvalho <henrique.carvalho@suse.com> Acked-by: Shyam Prasad N <sprasad@microsoft.com> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-10 17:22:03 -05:00
Arnd Bergmann	fae11330dc	smb: client: fix sbflags initialization The newly introduced variable is initialized in an #ifdef block but used outside of it, leading to undefined behavior when CONFIG_CIFS_ALLOW_INSECURE_LEGACY is disabled: fs/smb/client/dir.c:417:9: error: variable 'sbflags' is uninitialized when used here [-Werror,-Wuninitialized] 417 \| if (sbflags & CIFS_MOUNT_DYNPERM) \| ^~~~~~~ Move the initialization into the declaration, the same way as the other similar function do it. Fixes: `4fc3a433c1` ("smb: client: use atomic_t for mnt_cifs_flags") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-10 17:22:03 -05:00
Paulo Alcantara	4a7d2729dc	smb: client: fix atomic open with O_DIRECT & O_SYNC When user application requests O_DIRECT\|O_SYNC along with O_CREAT on open(2), CREATE_NO_BUFFER and CREATE_WRITE_THROUGH bits were missed in CREATE request when performing an atomic open, thus leading to potentially data integrity issues. Fix this by setting those missing bits in CREATE request when O_DIRECT\|O_SYNC has been specified in cifs_do_create(). Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Reviewed-by: David Howells <dhowells@redhat.com> Acked-by: Henrique Carvalho <henrique.carvalho@suse.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-10 17:21:42 -05:00
Darrick J. Wong	52a8a1ba88	xfs: fix undersized l_iclog_roundoff values If the superblock doesn't list a log stripe unit, we set the incore log roundoff value to 512. This leads to corrupt logs and unmountable filesystems in generic/617 on a disk with 4k physical sectors... XFS (sda1): Mounting V5 Filesystem ff3121ca-26e6-4b77-b742-aaff9a449e1c XFS (sda1): Torn write (CRC failure) detected at log block 0x318e. Truncating head block from 0x3197. XFS (sda1): failed to locate log tail XFS (sda1): log mount/recovery failed: error -74 XFS (sda1): log mount failed XFS (sda1): Mounting V5 Filesystem ff3121ca-26e6-4b77-b742-aaff9a449e1c XFS (sda1): Ending clean mount ...on the current xfsprogs for-next which has a broken mkfs. xfs_info shows this... meta-data=/dev/sda1 isize=512 agcount=4, agsize=644992 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=1 = reflink=1 bigtime=1 inobtcount=1 nrext64=1 = exchange=1 metadir=1 data = bsize=4096 blocks=2579968, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1, parent=1 log =internal log bsize=4096 blocks=16384, version=2 = sectsz=4096 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 = rgcount=0 rgsize=268435456 extents = zoned=0 start=0 reserved=0 ...observe that the log section has sectsz=4096 sunit=0, which means that the roundoff factor is 512, not 4096 as you'd expect. We should fix mkfs not to generate broken filesystems, but anyone can fuzz the ondisk superblock so we should be more cautious. I think the inadequate logic predates commit `a6a65fef5e`, but that's clearly going to require a different backport. Cc: stable@vger.kernel.org # v5.14 Fixes: `a6a65fef5e` ("xfs: log stripe roundoff is a property of the log") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-10 16:19:31 +01:00
Long Li	186ac39b8a	xfs: ensure dquot item is deleted from AIL only after log shutdown In xfs_qm_dqflush(), when a dquot flush fails due to corruption (the out_abort error path), the original code removed the dquot log item from the AIL before calling xfs_force_shutdown(). This ordering introduces a subtle race condition that can lead to data loss after a crash. The AIL tracks the oldest dirty metadata in the journal. The position of the tail item in the AIL determines the log tail LSN, which is the oldest LSN that must be preserved for crash recovery. When an item is removed from the AIL, the log tail can advance past the LSN of that item. The race window is as follows: if the dquot item happens to be at the tail of the log, removing it from the AIL allows the log tail to advance. If a concurrent log write is sampling the tail LSN at the same time and subsequently writes a complete checkpoint (i.e., one containing a commit record) to disk before the shutdown takes effect, the journal will no longer protect the dquot's last modification. On the next mount, log recovery will not replay the dquot changes, even though they were never written back to disk, resulting in silent data loss. Fix this by calling xfs_force_shutdown() before xfs_trans_ail_delete() in the out_abort path. Once the log is shut down, no new log writes can complete with an updated tail LSN, making it safe to remove the dquot item from the AIL. Cc: stable@vger.kernel.org Fixes: `b707fffda6` ("xfs: abort consistently on dquot flush failure") Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-10 09:40:38 +01:00
Long Li	f1d77b863b	xfs: remove redundant set null for ip->i_itemp ip->i_itemp has been set null in xfs_inode_item_destroy(), so there is no need set it null again in xfs_inode_free_callback(). Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-10 09:39:53 +01:00
Hristo Venev	081a0b78ef	ceph: do not skip the first folio of the next object in writeback When `ceph_process_folio_batch` encounters a folio past the end of the current object, it should leave it in the batch so that it is picked up in the next iteration. Removing the folio from the batch means that it does not get written back and remains dirty instead. This makes `fsync()` silently skip some of the data, delays capability release, and breaks coherence with `O_DIRECT`. The link below contains instructions for reproducing the bug. Cc: stable@vger.kernel.org Fixes: `ce80b76dd3` ("ceph: introduce ceph_process_folio_batch() method") Link: https://tracker.ceph.com/issues/75156 Signed-off-by: Hristo Venev <hristo@venev.name> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-03-09 12:34:40 +01:00
Max Kellermann	040d159a45	ceph: fix memory leaks in ceph_mdsc_build_path() Add __putname() calls to error code paths that did not free the "path" pointer obtained by __getname(). If ownership of this pointer is not passed to the caller via path_info.path, the function must free it before returning. Cc: stable@vger.kernel.org Fixes: `3fd945a79e` ("ceph: encode encrypted name in ceph_mdsc_build_path and dentry release") Fixes: `550f7ca98e` ("ceph: give up on paths longer than PATH_MAX") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-03-09 12:34:40 +01:00
Max Kellermann	43323a5934	ceph: add a bunch of missing ceph_path_info initializers ceph_mdsc_build_path() must be called with a zero-initialized ceph_path_info parameter, or else the following ceph_mdsc_free_path_info() may crash. Example crash (on Linux 6.18.12): virt_to_cache: Object is not a Slab page! WARNING: CPU: 184 PID: `2871736` at mm/slub.c:6732 kmem_cache_free+0x316/0x400 [...] Call Trace: [...] ceph_open+0x13d/0x3e0 do_dentry_open+0x134/0x480 vfs_open+0x2a/0xe0 path_openat+0x9a3/0x1160 [...] cache_from_obj: Wrong slab cache. names_cache but object is from ceph_inode_info WARNING: CPU: 184 PID: `2871736` at mm/slub.c:6746 kmem_cache_free+0x2dd/0x400 [...] kernel BUG at mm/slub.c:634! Oops: invalid opcode: 0000 [#1] SMP NOPTI RIP: 0010:__slab_free+0x1a4/0x350 Some of the ceph_mdsc_build_path() callers had initializers, but others had not, even though they were all added by commit `15f519e9f8` ("ceph: fix race condition validating r_parent before applying state"). The ones without initializer are suspectible to random crashes. (I can imagine it could even be possible to exploit this bug to elevate privileges.) Unfortunately, these Ceph functions are undocumented and its semantics can only be derived from the code. I see that ceph_mdsc_build_path() initializes the structure only on success, but not on error. Calling ceph_mdsc_free_path_info() after a failed ceph_mdsc_build_path() call does not even make sense, but that's what all callers do, and for it to be safe, the structure must be zero-initialized. The least intrusive approach to fix this is therefore to add initializers everywhere. Cc: stable@vger.kernel.org Fixes: `15f519e9f8` ("ceph: fix race condition validating r_parent before applying state") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-03-09 12:34:40 +01:00
Max Kellermann	ce0123cbb4	ceph: fix i_nlink underrun during async unlink During async unlink, we drop the `i_nlink` counter before we receive the completion (that will eventually update the `i_nlink`) because "we assume that the unlink will succeed". That is not a bad idea, but it races against deletions by other clients (or against the completion of our own unlink) and can lead to an underrun which emits a WARNING like this one: WARNING: CPU: 85 PID: 25093 at fs/inode.c:407 drop_nlink+0x50/0x68 Modules linked in: CPU: 85 UID: 3221252029 PID: 25093 Comm: php-cgi8.1 Not tainted 6.14.11-cm4all1-ampere #655 Hardware name: Supermicro ARS-110M-NR/R12SPD-A, BIOS 1.1b 10/17/2023 pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : drop_nlink+0x50/0x68 lr : ceph_unlink+0x6c4/0x720 sp : ffff80012173bc90 x29: ffff80012173bc90 x28: ffff086d0a45aaf8 x27: ffff0871d0eb5680 x26: ffff087f2a64a718 x25: 0000020000000180 x24: 0000000061c88647 x23: 0000000000000002 x22: ffff07ff9236d800 x21: 0000000000001203 x20: ffff07ff9237b000 x19: ffff088b8296afc0 x18: 00000000f3c93365 x17: 0000000000070000 x16: ffff08faffcbdfe8 x15: ffff08faffcbdfec x14: 0000000000000000 x13: 45445f65645f3037 x12: 34385f6369706f74 x11: 0000a2653104bb20 x10: ffffd85f26d73290 x9 : ffffd85f25664f94 x8 : 00000000000000c0 x7 : 0000000000000000 x6 : 0000000000000002 x5 : 0000000000000081 x4 : 0000000000000481 x3 : 0000000000000000 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff08727d3f91e8 Call trace: drop_nlink+0x50/0x68 (P) vfs_unlink+0xb0/0x2e8 do_unlinkat+0x204/0x288 __arm64_sys_unlinkat+0x3c/0x80 invoke_syscall.constprop.0+0x54/0xe8 do_el0_svc+0xa4/0xc8 el0_svc+0x18/0x58 el0t_64_sync_handler+0x104/0x130 el0t_64_sync+0x154/0x158 In ceph_unlink(), a call to ceph_mdsc_submit_request() submits the CEPH_MDS_OP_UNLINK to the MDS, but does not wait for completion. Meanwhile, between this call and the following drop_nlink() call, a worker thread may process a CEPH_CAP_OP_IMPORT, CEPH_CAP_OP_GRANT or just a CEPH_MSG_CLIENT_REPLY (the latter of which could be our own completion). These will lead to a set_nlink() call, updating the `i_nlink` counter to the value received from the MDS. If that new `i_nlink` value happens to be zero, it is illegal to decrement it further. But that is exactly what ceph_unlink() will do then. The WARNING can be reproduced this way: 1. Force async unlink; only the async code path is affected. Having no real clue about Ceph internals, I was unable to find out why the MDS wouldn't give me the "Fxr" capabilities, so I patched get_caps_for_async_unlink() to always succeed. (Note that the WARNING dump above was found on an unpatched kernel, without this kludge - this is not a theoretical bug.) 2. Add a sleep call after ceph_mdsc_submit_request() so the unlink completion gets handled by a worker thread before drop_nlink() is called. This guarantees that the `i_nlink` is already zero before drop_nlink() runs. The solution is to skip the counter decrement when it is already zero, but doing so without a lock is still racy (TOCTOU). Since ceph_fill_inode() and handle_cap_grant() both hold the `ceph_inode_info.i_ceph_lock` spinlock while set_nlink() runs, this seems like the proper lock to protect the `i_nlink` updates. I found prior art in NFS and SMB (using `inode.i_lock`) and AFS (using `afs_vnode.cb_lock`). All three have the zero check as well. Cc: stable@vger.kernel.org Fixes: `2ccb45462a` ("ceph: perform asynchronous unlink if we have sufficient caps") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-03-09 12:34:40 +01:00
Thorsten Blum	441336115d	ksmbd: Don't log keys in SMB3 signing and encryption key generation When KSMBD_DEBUG_AUTH logging is enabled, generate_smb3signingkey() and generate_smb3encryptionkey() log the session, signing, encryption, and decryption key bytes. Remove the logs to avoid exposing credentials. Fixes: `e2f34481b2` ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-08 21:28:39 -05:00
Marios Makassikis	1e689a5617	smb: server: fix use-after-free in smb2_open() The opinfo pointer obtained via rcu_dereference(fp->f_opinfo) is dereferenced after rcu_read_unlock(), creating a use-after-free window. Cc: stable@vger.kernel.org Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-08 21:28:39 -05:00
Namjae Jeon	eac3361e3d	ksmbd: fix use-after-free in smb_lazy_parent_lease_break_close() opinfo pointer obtained via rcu_dereference(fp->f_opinfo) is being accessed after rcu_read_unlock() has been called. This creates a race condition where the memory could be freed by a concurrent writer between the unlock and the subsequent pointer dereferences (opinfo->is_lease, etc.), leading to a use-after-free. Fixes: `5fb282ba4f` ("ksmbd: fix possible null-deref in smb_lazy_parent_lease_break_close") Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-08 21:28:39 -05:00
Namjae Jeon	1dfd062caa	ksmbd: fix use-after-free by using call_rcu() for oplock_info ksmbd currently frees oplock_info immediately using kfree(), even though it is accessed under RCU read-side critical sections in places like opinfo_get() and proc_show_files(). Since there is no RCU grace period delay between nullifying the pointer and freeing the memory, a reader can still access oplock_info structure after it has been freed. This can leads to a use-after-free especially in opinfo_get() where atomic_inc_not_zero() is called on already freed memory. Fix this by switching to deferred freeing using call_rcu(). Fixes: `18b4fac5ef` ("ksmbd: fix use-after-free in smb_break_all_levII_oplock()") Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-08 21:28:39 -05:00
Ali Khaledi	40955015fa	ksmbd: fix use-after-free in proc_show_files due to early rcu_read_unlock The opinfo pointer obtained via rcu_dereference(fp->f_opinfo) is dereferenced after rcu_read_unlock(), creating a use-after-free window. A concurrent opinfo_put() can free the opinfo between the unlock and the subsequent access to opinfo->is_lease, opinfo->o_lease->state, and opinfo->level. Fix this by deferring rcu_read_unlock() until after all opinfo field accesses are complete. The values needed (const_names, count, level) are copied into local variables under the RCU read lock, and the potentially-sleeping seq_printf calls happen after the lock is released. Found by AI-assisted code review (Claude Opus 4.6, Anthropic) in collaboration with Ali Khaledi. Cc: stable@vger.kernel.org Fixes: `b38f99c121` ("ksmbd: add procfs interface for runtime monitoring and statistics") Signed-off-by: Ali Khaledi <ali.khaledi1989@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-08 21:28:39 -05:00
Guenter Roeck	c15e7c62fe	smb/server: Fix another refcount leak in smb2_open() If ksmbd_override_fsids() fails, we jump to err_out2. At that point, fp is NULL because it hasn't been assigned dh_info.fp yet, so ksmbd_fd_put(work, fp) will not be called. However, dh_info.fp was already inserted into the session file table by ksmbd_reopen_durable_fd(), so it will leak in the session file table until the session is closed. Move fp = dh_info.fp; ahead of the ksmbd_override_fsids() check to fix the problem. Found by an experimental AI code review agent at Google. Fixes: `c8efcc7861` ("ksmbd: add support for durable handles v1/v2") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-08 21:28:38 -05:00
Miaoqian Lin	4245a79003	rxrpc, afs: Fix missing error pointer check after rxrpc_kernel_lookup_peer() rxrpc_kernel_lookup_peer() can also return error pointers in addition to NULL, so just checking for NULL is not sufficient. Fix this by: (1) Changing rxrpc_kernel_lookup_peer() to return -ENOMEM rather than NULL on allocation failure. (2) Making the callers in afs use IS_ERR() and PTR_ERR() to pass on the error code returned. Fixes: `72904d7b9b` ("rxrpc, afs: Allow afs to pin rxrpc_peer objects") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Co-developed-by: David Howells <dhowells@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/368272.1772713861@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-06 17:49:52 -08:00
Linus Torvalds	e0c505cb76	Merge tag 'v7.0-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - Fix potential oops on open failure - Fix unmount to better free deferred closes - Use proper constant-time MAC comparison function - Two buffer allocation size fixes - Two minor cleanups - make SMB2 kunit tests a distinct module * tag 'v7.0-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: fix oops due to uninitialised var in smb2_unlink() cifs: open files should not hold ref on superblock smb: client: Compare MACs in constant time smb/client: remove unused SMB311_posix_query_info() smb/client: fix buffer size for smb311_posix_qinfo in SMB311_posix_query_info() smb/client: fix buffer size for smb311_posix_qinfo in smb2_compound_op() smb: update some doc references smb/client: make SMB2 maperror KUnit tests a separate module	2026-03-06 16:07:22 -08:00
Carlos Maiolino	54fcd2f95f	xfs: fix returned valued from xfs_defer_can_append xfs_defer_can_append returns a bool, it shouldn't be returning a NULL. Found by code inspection. Fixes: `4dffb2cbb4` ("xfs: allow pausing of pending deferred work items") Cc: <stable@vger.kernel.org> # v6.8 Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Souptick Joarder <souptick.joarder@hpe.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-06 09:30:07 +01:00
Paulo Alcantara	048efe129a	smb: client: fix oops due to uninitialised var in smb2_unlink() If SMB2_open_init() or SMB2_close_init() fails (e.g. reconnect), the iovs set @rqst will be left uninitialised, hence calling SMB2_open_free(), SMB2_close_free() or smb2_set_related() on them will oops. Fix this by initialising @close_iov and @open_iov before setting them in @rqst. Reported-by: Thiago Becker <tbecker@redhat.com> Fixes: `1cf9f2a6a5` ("smb: client: handle unlink(2) of files open by different clients") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Cc: David Howells <dhowells@redhat.com> Cc: linux-cifs@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-05 20:41:16 -06:00
Linus Torvalds	5ee8dbf546	Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux Pull fsverity fix from Eric Biggers: "Prevent CONFIG_FS_VERITY from being enabled when the page size is 256K, since it doesn't work in that case" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux: fsverity: add dependency on 64K or smaller pages	2026-03-05 11:52:03 -08:00
hongao	281cb17787	xfs: Remove redundant NULL check after __GFP_NOFAIL kzalloc() is called with __GFP_NOFAIL, so a NULL return is not expected. Drop the redundant !map check in xfs_dabuf_map(). Also switch the nirecs-sized allocation to kcalloc(). Signed-off-by: hongao <hongao@uniontech.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-05 10:02:45 +01:00
Linus Torvalds	0b3bb20580	Merge tag 'vfs-7.0-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - kthread: consolidate kthread exit paths to prevent use-after-free - iomap: - don't mark folio uptodate if read IO has bytes pending - don't report direct-io retries to fserror - reject delalloc mappings during writeback - ns: tighten visibility checks - netfs: Fix unbuffered/DIO writes to dispatch subrequests in strict sequence * tag 'vfs-7.0-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: reject delalloc mappings during writeback iomap: don't mark folio uptodate if read IO has bytes pending selftests: fix mntns iteration selftests nstree: tighten permission checks for listing nsfs: tighten permission checks for handle opening nsfs: tighten permission checks for ns iteration ioctls netfs: Fix unbuffered/DIO writes to dispatch subrequests in strict sequence kthread: consolidate kthread exit paths to prevent use-after-free iomap: don't report direct-io retries to fserror	2026-03-04 15:03:16 -08:00
Shyam Prasad N	340cea84f6	cifs: open files should not hold ref on superblock Today whenever we deal with a file, in addition to holding a reference on the dentry, we also get a reference on the superblock. This happens in two cases: 1. when a new cinode is allocated 2. when an oplock break is being processed The reasoning for holding the superblock ref was to make sure that when umount happens, if there are users of inodes and dentries, it does not try to clean them up and wait for the last ref to superblock to be dropped by last of such users. But the side effect of doing that is that umount silently drops a ref on the superblock and we could have deferred closes and lease breaks still holding these refs. Ideally, we should ensure that all of these users of inodes and dentries are cleaned up at the time of umount, which is what this code is doing. This code change allows these code paths to use a ref on the dentry (and hence the inode). That way, umount is ensured to clean up SMB client resources when it's the last ref on the superblock (For ex: when same objects are shared). The code change also moves the call to close all the files in deferred close list to the umount code path. It also waits for oplock_break workers to be flushed before calling kill_anon_super (which eventually frees up those objects). Fixes: `24261fc23d` ("cifs: delay super block destruction until all cifsFileInfo objects are gone") Fixes: `705c79101c` ("smb: client: fix use-after-free in cifs_oplock_break") Cc: <stable@vger.kernel.org> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-04 10:11:39 -06:00
Darrick J. Wong	d320f160aa	iomap: reject delalloc mappings during writeback Filesystems should never provide a delayed allocation mapping to writeback; they're supposed to allocate the space before replying. This can lead to weird IO errors and crashes in the block layer if the filesystem is being malicious, or if it hadn't set iomap->dev because it's a delalloc mapping. Fix this by failing writeback on delalloc mappings. Currently no filesystems actually misbehave in this manner, but we ought to be stricter about things like that. Cc: stable@vger.kernel.org # v5.5 Fixes: `598ecfbaa7` ("iomap: lift the xfs writeback code to iomap") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/20260302173002.GL13829@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-03-04 14:31:56 +01:00
Joanne Koong	debc1a492b	iomap: don't mark folio uptodate if read IO has bytes pending If a folio has ifs metadata attached to it and the folio is partially read in through an async IO helper with the rest of it then being read in through post-EOF zeroing or as inline data, and the helper successfully finishes the read first, then post-EOF zeroing / reading inline will mark the folio as uptodate in iomap_set_range_uptodate(). This is a problem because when the read completion path later calls iomap_read_end(), it will call folio_end_read(), which sets the uptodate bit using XOR semantics. Calling folio_end_read() on a folio that was already marked uptodate clears the uptodate bit. Fix this by not marking the folio as uptodate if the read IO has bytes pending. The folio uptodate state will be set in the read completion path through iomap_end_read() -> folio_end_read(). Reported-by: Wei Gao <wegao@suse.com> Suggested-by: Sasha Levin <sashal@kernel.org> Tested-by: Wei Gao <wegao@suse.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Cc: stable@vger.kernel.org # v6.19 Link: https://lore.kernel.org/linux-fsdevel/aYbmy8JdgXwsGaPP@autotest-wegao.qe.prg2.suse.org/ Fixes: `b2f35ac414` ("iomap: add caller-provided callbacks for read and readahead") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20260303233420.874231-2-joannelkoong@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-03-04 14:18:54 +01:00
Darrick J. Wong	0ca1a8331c	xfs: fix race between healthmon unmount and read_iter xfs/1879 on one of my test VMs got stuck due to the xfs_io healthmon subcommand sleeping in wait_event_interruptible at: xfs_healthmon_read_iter+0x558/0x5f8 [xfs] vfs_read+0x248/0x320 ksys_read+0x78/0x120 Looking at xfs_healthmon_read_iter, in !O_NONBLOCK mode it will sleep until the mount cookie == DETACHED_MOUNT_COOKIE, there are events waiting to be formatted, or there are formatted events in the read buffer that could be copied to userspace. Poking into the running kernel, I see that there are zero events in the list, the read buffer is empty, and the mount cookie is indeed in DETACHED state. IOWs, xfs_healthmon_has_eventdata should have returned true, but instead we're asleep waiting for a wakeup. I think what happened here is that xfs_healthmon_read_iter and xfs_healthmon_unmount were racing with each other, and _read_iter lost the race. _unmount queued an unmount event, which woke up _read_iter. It found, formatted, and copied the event out to userspace. That cleared out the pending event list and emptied the read buffer. xfs_io then called read() again, so _has_eventdata decided that we should sleep on the empty event queue. Next, _unmount called xfs_healthmon_detach, which set the mount cookie to DETACHED. Unfortunately, it didn't call wake_up_all on the hm, so the wait_event_interruptible in the _read_iter thread remains asleep. That's why the test stalled. Fix this by moving the wake_up_all call to xfs_healthmon_detach. Fixes: `b3a289a2a9` ("xfs: create event queuing, formatting, and discovery infrastructure") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-04 10:11:47 +01:00
Damien Le Moal	6270b8ac2f	xfs: remove scratch field from struct xfs_gc_bio The scratch field in struct xfs_gc_bio is unused. Remove it. Fixes: `102f444b57` ("xfs: rework zone GC buffer management") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2026-03-04 09:34:12 +01:00
Eric Biggers	26bc83b88b	smb: client: Compare MACs in constant time To prevent timing attacks, MAC comparisons need to be constant-time. Replace the memcmp() with the correct function, crypto_memneq(). Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-03 20:56:36 -06:00
ZhangGuoDong	8098179dc9	smb/client: remove unused SMB311_posix_query_info() It is currently unused, as now we are doing compounding instead (see smb2_query_path_info()). Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-03 18:03:56 -06:00
ZhangGuoDong	9621b996e4	smb/client: fix buffer size for smb311_posix_qinfo in SMB311_posix_query_info() SMB311_posix_query_info() is currently unused, but it may still be used in some stable versions, so these changes are submitted as a separate patch. Use `sizeof(struct smb311_posix_qinfo)` instead of sizeof its pointer, so the allocated buffer matches the actual struct size. Fixes: `b1bc1874b8` ("smb311: Add support for SMB311 query info (non-compounded)") Reported-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-03 18:03:56 -06:00
ZhangGuoDong	12c43a062a	smb/client: fix buffer size for smb311_posix_qinfo in smb2_compound_op() Use `sizeof(struct smb311_posix_qinfo)` instead of sizeof its pointer, so the allocated buffer matches the actual struct size. Fixes: `6a5f6592a0` ("SMB311: Add support for query info using posix extensions (level 100)") Reported-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-03-03 18:03:56 -06:00
Linus Torvalds	c44db6c820	Merge tag 'for-7.0-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "One-liner or short fixes for minor/moderate problems reported recently: - fixes or level adjustments of error messages - fix leaked transaction handles after aborted transactions, when using the remap tree feature - fix a few leaked chunk maps after errors - fix leaked page array in io_uring encoded read if an error occurs and the 'finished' is not called - fix double release of reserved extents when doing a range COW - don't commit super block when the filesystem is in shutdown state - fix squota accounting condition when checking members vs parent usage - other error handling fixes" * tag 'for-7.0-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: check block group lookup in remove_range_from_remap_tree() btrfs: fix transaction handle leaks in btrfs_last_identity_remap_gone() btrfs: fix chunk map leak in btrfs_map_block() after btrfs_translate_remap() btrfs: fix chunk map leak in btrfs_map_block() after btrfs_chunk_map_num_copies() btrfs: fix compat mask in error messages in btrfs_check_features() btrfs: print correct subvol num if active swapfile prevents deletion btrfs: fix warning in scrub_verify_one_metadata() btrfs: fix objectid value in error message in check_extent_data_ref() btrfs: fix incorrect key offset in error message in check_dev_extent_item() btrfs: fix error message order of parameters in btrfs_delete_delayed_dir_index() btrfs: don't commit the super block when unmounting a shutdown filesystem btrfs: free pages on error in btrfs_uring_read_extent() btrfs: fix referenced/exclusive check in squota_check_parent_usage() btrfs: remove pointless WARN_ON() in cache_save_setup() btrfs: convert log messages to error level in btrfs_replay_log() btrfs: remove btrfs_handle_fs_error() after failure to recover log trees btrfs: remove redundant warning message in btrfs_check_uuid_tree() btrfs: change warning messages to error level in open_ctree() btrfs: fix a double release on reserved extents in cow_one_range() btrfs: handle discard errors in in btrfs_finish_extent_commit()	2026-03-03 09:08:00 -08:00
Filipe Manana	0749cab617	btrfs: remove duplicated definition of btrfs_printk_in_rcu() It's defined twice in a row for the !CONFIG_PRINTK case, so remove one of the duplicates. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-03-03 17:20:51 +01:00
Filipe Manana	8dd0e6807b	btrfs: remove unnecessary transaction abort in the received subvol ioctl If we fail to remove an item from the uuid tree, we don't need to abort the transaction since we have not done any change before. So remove that transaction abort. Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-03-03 17:20:39 +01:00
Filipe Manana	0f475ee0eb	btrfs: abort transaction on failure to update root in the received subvol ioctl If we failed to update the root we don't abort the transaction, which is wrong since we already used the transaction to remove an item from the uuid tree. Fixes: `dd5f9615fc` ("Btrfs: maintain subvolume items in the UUID tree") CC: stable@vger.kernel.org # 3.12+ Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-03-03 17:03:59 +01:00
Filipe Manana	87f2c46003	btrfs: fix transaction abort on set received ioctl due to item overflow If the set received ioctl fails due to an item overflow when attempting to add the BTRFS_UUID_KEY_RECEIVED_SUBVOL we have to abort the transaction since we did some metadata updates before. This means that if a user calls this ioctl with the same received UUID field for a lot of subvolumes, we will hit the overflow, trigger the transaction abort and turn the filesystem into RO mode. A malicious user could exploit this, and this ioctl does not even requires that a user has admin privileges (CAP_SYS_ADMIN), only that he/she owns the subvolume. Fix this by doing an early check for item overflow before starting a transaction. This is also race safe because we are holding the subvol_sem semaphore in exclusive (write) mode. A test case for fstests will follow soon. Fixes: `dd5f9615fc` ("Btrfs: maintain subvolume items in the UUID tree") CC: stable@vger.kernel.org # 3.12+ Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-03-03 17:03:59 +01:00
Filipe Manana	e1b18b9590	btrfs: fix transaction abort when snapshotting received subvolumes Currently a user can trigger a transaction abort by snapshotting a previously received snapshot a bunch of times until we reach a BTRFS_UUID_KEY_RECEIVED_SUBVOL item overflow (the maximum item size we can store in a leaf). This is very likely not common in practice, but if it happens, it turns the filesystem into RO mode. The snapshot, send and set_received_subvol and subvol_setflags (used by receive) don't require CAP_SYS_ADMIN, just inode_owner_or_capable(). A malicious user could use this to turn a filesystem into RO mode and disrupt a system. Reproducer script: $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi # Use smallest node size to make the test faster. mkfs.btrfs -f --nodesize 4K $DEV mount $DEV $MNT # Create a subvolume and set it to RO so that it can be used for send. btrfs subvolume create $MNT/sv touch $MNT/sv/foo btrfs property set $MNT/sv ro true # Send and receive the subvolume into snaps/sv. mkdir $MNT/snaps btrfs send $MNT/sv \| btrfs receive $MNT/snaps # Now snapshot the received subvolume, which has a received_uuid, a # lot of times to trigger the leaf overflow. total=500 for ((i = 1; i <= $total; i++)); do echo -ne "\rCreating snapshot $i/$total" btrfs subvolume snapshot -r $MNT/snaps/sv $MNT/snaps/sv_$i > /dev/null done echo umount $MNT When running the test: $ ./test.sh (...) Create subvolume '/mnt/sdi/sv' At subvol /mnt/sdi/sv At subvol sv Creating snapshot 496/500ERROR: Could not create subvolume: Value too large for defined data type Creating snapshot 497/500ERROR: Could not create subvolume: Read-only file system Creating snapshot 498/500ERROR: Could not create subvolume: Read-only file system Creating snapshot 499/500ERROR: Could not create subvolume: Read-only file system Creating snapshot 500/500ERROR: Could not create subvolume: Read-only file system And in dmesg/syslog: $ dmesg (...) [251067.627338] BTRFS warning (device sdi): insert uuid item failed -75 (0x4628b21c4ac8d898, 0x2598bee2b1515c91) type 252! [251067.629212] ------------[ cut here ]------------ [251067.630033] BTRFS: Transaction aborted (error -75) [251067.630871] WARNING: fs/btrfs/transaction.c:1907 at create_pending_snapshot.cold+0x52/0x465 [btrfs], CPU#10: btrfs/615235 [251067.632851] Modules linked in: btrfs dm_zero (...) [251067.644071] CPU: 10 UID: 0 PID: 615235 Comm: btrfs Tainted: G W 6.19.0-rc8-btrfs-next-225+ #1 PREEMPT(full) [251067.646165] Tainted: [W]=WARN [251067.646733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [251067.648735] RIP: 0010:create_pending_snapshot.cold+0x55/0x465 [btrfs] [251067.649984] Code: f0 48 0f (...) [251067.653313] RSP: 0018:ffffce644908fae8 EFLAGS: 00010292 [251067.653987] RAX: 00000000ffffff01 RBX: ffff8e5639e63a80 RCX: 00000000ffffffd3 [251067.655042] RDX: ffff8e53faa76b00 RSI: 00000000ffffffb5 RDI: ffffffffc0919750 [251067.656077] RBP: ffffce644908fbd8 R08: 0000000000000000 R09: ffffce644908f820 [251067.657068] R10: ffff8e5adc1fffa8 R11: 0000000000000003 R12: ffff8e53c0431bd0 [251067.658050] R13: ffff8e5414593600 R14: ffff8e55efafd000 R15: 00000000ffffffb5 [251067.659019] FS: 00007f2a4944b3c0(0000) GS:ffff8e5b27dae000(0000) knlGS:0000000000000000 [251067.660115] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [251067.660943] CR2: 00007ffc5aa57898 CR3: 00000005813a2003 CR4: 0000000000370ef0 [251067.661972] Call Trace: [251067.662292] <TASK> [251067.662653] create_pending_snapshots+0x97/0xc0 [btrfs] [251067.663413] btrfs_commit_transaction+0x26e/0xc00 [btrfs] [251067.664257] ? btrfs_qgroup_convert_reserved_meta+0x35/0x390 [btrfs] [251067.665238] ? _raw_spin_unlock+0x15/0x30 [251067.665837] ? record_root_in_trans+0xa2/0xd0 [btrfs] [251067.666531] btrfs_mksubvol+0x330/0x580 [btrfs] [251067.667145] btrfs_mksnapshot+0x74/0xa0 [btrfs] [251067.667827] __btrfs_ioctl_snap_create+0x194/0x1d0 [btrfs] [251067.668595] btrfs_ioctl_snap_create_v2+0x107/0x130 [btrfs] [251067.669479] btrfs_ioctl+0x1580/0x2690 [btrfs] [251067.670093] ? count_memcg_events+0x6d/0x180 [251067.670849] ? handle_mm_fault+0x1a0/0x2a0 [251067.671652] __x64_sys_ioctl+0x92/0xe0 [251067.672406] do_syscall_64+0x50/0xf20 [251067.673129] entry_SYSCALL_64_after_hwframe+0x76/0x7e [251067.674096] RIP: 0033:0x7f2a495648db [251067.674812] Code: 00 48 89 (...) [251067.678227] RSP: 002b:00007ffc5aa57840 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [251067.679691] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2a495648db [251067.681145] RDX: 00007ffc5aa588b0 RSI: 0000000050009417 RDI: 0000000000000004 [251067.682511] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000 [251067.683842] R10: 000000000000000a R11: 0000000000000246 R12: 00007ffc5aa59910 [251067.685176] R13: 00007ffc5aa588b0 R14: 0000000000000004 R15: 0000000000000006 [251067.686524] </TASK> [251067.686972] ---[ end trace 0000000000000000 ]--- [251067.687890] BTRFS: error (device sdi state A) in create_pending_snapshot:1907: errno=-75 unknown [251067.689049] BTRFS info (device sdi state EA): forced readonly [251067.689054] BTRFS warning (device sdi state EA): Skipping commit of aborted transaction. [251067.690119] BTRFS: error (device sdi state EA) in cleanup_transaction:2043: errno=-75 unknown [251067.702028] BTRFS info (device sdi state EA): last unmount of filesystem 46dc3975-30a2-4a69-a18f-418b859cccda Fix this by ignoring -EOVERFLOW errors from btrfs_uuid_tree_add() in the snapshot creation code when attempting to add the BTRFS_UUID_KEY_RECEIVED_SUBVOL item. This is OK because it's not critical and we are still able to delete the snapshot, as snapshot/subvolume deletion ignores if a BTRFS_UUID_KEY_RECEIVED_SUBVOL is missing (see inode.c:btrfs_delete_subvolume()). As for send/receive, we can still do send/receive operations since it always peeks the first root ID in the existing BTRFS_UUID_KEY_RECEIVED_SUBVOL (it could peek any since all snapshots have the same content), and even if the key is missing, it falls back to searching by BTRFS_UUID_KEY_SUBVOL key. A test case for fstests will be sent soon. Fixes: `dd5f9615fc` ("Btrfs: maintain subvolume items in the UUID tree") CC: stable@vger.kernel.org # 3.12+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-03-03 17:03:59 +01:00

1 2 3 4 5 ...

104325 Commits