Commit Graph

102029 Commits

Author SHA1 Message Date
Linus Torvalds
1cc41c88ef Merge tag 'nfs-for-6.18-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:

 - Various fixes when using NFS with TLS

 - Localio direct-IO fixes

 - Fix error handling in nfs_atomic_open_v23()

 - Fix sysfs memory leak when nfs_client kobject add fails

 - Fix an incorrect parameter when calling nfs4_call_sync()

 - Fix a failing LTP test when using delegated timestamps

* tag 'nfs-for-6.18-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Fix LTP test failures when timestamps are delegated
  NFSv4: Fix an incorrect parameter when calling nfs4_call_sync()
  NFS: sysfs: fix leak when nfs_client kobject add fails
  NFSv2/v3: Fix error handling in nfs_atomic_open_v23()
  nfs/localio: do not issue misaligned DIO out-of-order
  nfs/localio: Ensure DIO WRITE's IO on stable storage upon completion
  nfs/localio: backfill missing partial read support for misaligned DIO
  nfs/localio: add refcounting for each iocb IO associated with NFS pgio header
  nfs/localio: remove unecessary ENOTBLK handling in DIO WRITE support
  NFS: Check the TLS certificate fields in nfs_match_client()
  pnfs: Set transport security policy to RPC_XPRTSEC_NONE unless using TLS
  pnfs: Fix TLS logic in _nfs4_pnfs_v4_ds_connect()
  pnfs: Fix TLS logic in _nfs4_pnfs_v3_ds_connect()
2025-11-14 13:44:23 -08:00
Linus Torvalds
95baf63fe8 Merge tag 'v6.18-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:

 - Multichannel reconnect channel selection fix

 - Fix for smbdirect (RDMA) disconnect bug

 - Fix for incorrect username length check

 - Fix memory leak in mount parm processing

* tag 'v6.18-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: let smbd_disconnect_rdma_connection() turn CREATED into DISCONNECTED
  smb: fix invalid username check in smb3_fs_context_parse_param()
  cifs: client: fix memory leak in smb3_fs_context_parse_param
  smb: client: fix cifs_pick_channel when channel needs reconnect
2025-11-14 08:30:48 -08:00
Linus Torvalds
2ccec59446 Merge tag 'erofs-for-6.18-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:

 - Add Chunhai Guo as a EROFS reviewer to get more eyes from interested
   industry vendors

 - Fix infinite loop caused by incomplete crafted zstd-compressed data
   (thanks to Robert again!)

* tag 'erofs-for-6.18-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: avoid infinite loop due to incomplete zstd-compressed data
  MAINTAINERS: erofs: add myself as reviewer
2025-11-13 05:02:59 -08:00
Linus Torvalds
967a72fa7f Merge tag 'v6.18-rc5-smb-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:

 - Fix smbdirect (RDMA) disconnect hang bug

 - Fix potential Denial of Service when connection limit exceeded

 - Fix smbdirect (RDMA) connection (potentially accessing freed memory)
   bug

* tag 'v6.18-rc5-smb-server-fixes' of git://git.samba.org/ksmbd:
  smb: server: let smb_direct_disconnect_rdma_connection() turn CREATED into DISCONNECTED
  ksmbd: close accepted socket when per-IP limit rejects connection
  smb: server: rdma: avoid unmapping posted recv on accept failure
2025-11-13 04:57:38 -08:00
Linus Torvalds
6fa9041b71 Merge tag 'nfsd-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
 "Address recently reported issues or issues found at the recent NFS
  bake-a-thon held in Raleigh, NC.

  Issues reported with v6.18-rc:
   - Address a kernel build issue
   - Reorder SEQUENCE processing to avoid spurious NFS4ERR_SEQ_MISORDERED

  Issues that need expedient stable backports:
   - Close a refcount leak exposure
   - Report support for NFSv4.2 CLONE correctly
   - Fix oops during COPY_NOTIFY processing
   - Prevent rare crash after XDR encoding failure
   - Prevent crash due to confused or malicious NFSv4.1 client"

* tag 'nfsd-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  Revert "SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it"
  nfsd: ensure SEQUENCE replay sends a valid reply.
  NFSD: Never cache a COMPOUND when the SEQUENCE operation fails
  NFSD: Skip close replay processing if XDR encoding fails
  NFSD: free copynotify stateid in nfs4_free_ol_stateid()
  nfsd: add missing FATTR4_WORD2_CLONE_BLKSIZE from supported attributes
  nfsd: fix refcount leak in nfsd_set_fh_dentry()
2025-11-12 18:41:01 -08:00
Linus Torvalds
8341374f67 Merge tag 'for-6.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:

 - fix new inode name tracking in tree-log

 - fix conventional zone and stripe calculations in zoned mode

 - fix bio reference counts on error paths in relocation and scrub

* tag 'for-6.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: release root after error in data_reloc_print_warning_inode()
  btrfs: scrub: put bio after errors in scrub_raid56_parity_stripe()
  btrfs: do not update last_log_commit when logging inode due to a new name
  btrfs: zoned: fix stripe width calculation
  btrfs: zoned: fix conventional zone capacity calculation
2025-11-11 10:13:17 -08:00
Stefan Metzmacher
d93a89684d smb: client: let smbd_disconnect_rdma_connection() turn CREATED into DISCONNECTED
When smbd_disconnect_rdma_connection() turns SMBDIRECT_SOCKET_CREATED
into SMBDIRECT_SOCKET_ERROR, we'll have the situation that
smbd_disconnect_rdma_work() will set SMBDIRECT_SOCKET_DISCONNECTING
and call rdma_disconnect(), which likely fails as we never reached
the RDMA_CM_EVENT_ESTABLISHED. it means that
wait_event(sc->status_wait, sc->status == SMBDIRECT_SOCKET_DISCONNECTED)
in smbd_destroy() will hang forever in SMBDIRECT_SOCKET_DISCONNECTING
never reaching SMBDIRECT_SOCKET_DISCONNECTED.

So we directly go from SMBDIRECT_SOCKET_CREATED to
SMBDIRECT_SOCKET_DISCONNECTED.

Fixes: ffbfc73e84 ("smb: client: let smbd_disconnect_rdma_connection() set SMBDIRECT_SOCKET_ERROR...")
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-11-11 11:05:35 -06:00
Yiqi Sun
ed6612165b smb: fix invalid username check in smb3_fs_context_parse_param()
Since the maximum return value of strnlen(..., CIFS_MAX_USERNAME_LEN)
is CIFS_MAX_USERNAME_LEN, length check in smb3_fs_context_parse_param()
is always FALSE and invalid.

Fix the comparison in if statement.

Signed-off-by: Yiqi Sun <sunyiqixm@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-11-11 10:01:47 -06:00
Stefan Metzmacher
55286b1e1b smb: server: let smb_direct_disconnect_rdma_connection() turn CREATED into DISCONNECTED
When smb_direct_disconnect_rdma_connection() turns SMBDIRECT_SOCKET_CREATED
into SMBDIRECT_SOCKET_ERROR, we'll have the situation that
smb_direct_disconnect_rdma_work() will set SMBDIRECT_SOCKET_DISCONNECTING
and call rdma_disconnect(), which likely fails as we never reached
the RDMA_CM_EVENT_ESTABLISHED. it means that
wait_event(sc->status_wait, sc->status == SMBDIRECT_SOCKET_DISCONNECTED)
in free_transport() will hang forever in SMBDIRECT_SOCKET_DISCONNECTING
never reaching SMBDIRECT_SOCKET_DISCONNECTED.

So we directly go from SMBDIRECT_SOCKET_CREATED to
SMBDIRECT_SOCKET_DISCONNECTED.

Fixes: b3fd52a0d8 ("smb: server: let smb_direct_disconnect_rdma_connection() set SMBDIRECT_SOCKET_ERROR...")
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-11-11 09:50:35 -06:00
Dai Ngo
b623390045 NFS: Fix LTP test failures when timestamps are delegated
The utimes01 and utime06 tests fail when delegated timestamps are
enabled, specifically in subtests that modify the atime and mtime
fields using the 'nobody' user ID.

The problem can be reproduced as follow:

# echo "/media *(rw,no_root_squash,sync)" >> /etc/exports
# export -ra
# mount -o rw,nfsvers=4.2 127.0.0.1:/media /tmpdir
# cd /opt/ltp
# ./runltp -d /tmpdir -s utimes01
# ./runltp -d /tmpdir -s utime06

This issue occurs because nfs_setattr does not verify the inode's
UID against the caller's fsuid when delegated timestamps are
permitted for the inode.

This patch adds the UID check and if it does not match then the
request is sent to the server for permission checking.

Fixes: e12912d941 ("NFSv4: Add support for delegated atime and mtime attributes")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10 16:55:12 -05:00
Trond Myklebust
1f214e9c3a NFSv4: Fix an incorrect parameter when calling nfs4_call_sync()
The Smatch static checker noted that in _nfs4_proc_lookupp(), the flag
RPC_TASK_TIMEOUT is being passed as an argument to nfs4_init_sequence(),
which is clearly incorrect.
Since LOOKUPP is an idempotent operation, nfs4_init_sequence() should
not ask the server to cache the result. The RPC_TASK_TIMEOUT flag needs
to be passed down to the RPC layer.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Fixes: 76998ebb91 ("NFSv4: Observe the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookupp")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10 14:30:46 -05:00
Yang Xiuwei
7a7a345652 NFS: sysfs: fix leak when nfs_client kobject add fails
If adding the second kobject fails, drop both references to avoid sysfs
residue and memory leak.

Fixes: e96f9268ee ("NFS: Make all of /sys/fs/nfs network-namespace unique")

Signed-off-by: Yang Xiuwei <yangxiuwei@kylinos.cn>
Reviewed-by: Benjamin Coddington <ben.coddington@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10 14:30:45 -05:00
Trond Myklebust
85d2c2392a NFSv2/v3: Fix error handling in nfs_atomic_open_v23()
When nfs_do_create() returns an EEXIST error, it means that a regular
file could not be created. That could mean that a symlink needs to be
resolved. If that's the case, a lookup needs to be kicked off.

Reported-by: Stephen Abbene <sabbene87@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220710
Fixes: 7c6c5249f0 ("NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10 14:30:45 -05:00
Mike Snitzer
6a218b9c31 nfs/localio: do not issue misaligned DIO out-of-order
From https://lore.kernel.org/linux-nfs/aQHASIumLJyOoZGH@infradead.org/

On Wed, Oct 29, 2025 at 12:20:40AM -0700, Christoph Hellwig wrote:
> On Mon, Oct 27, 2025 at 12:18:30PM -0400, Mike Snitzer wrote:
> > LOCALIO's misaligned DIO will issue head/tail followed by O_DIRECT
> > middle (via AIO completion of that aligned middle). So out of order
> > relative to file offset.
>
> That's in general a really bad idea.  It will obviously work, but
> both on SSDs and out of place write file systems it is a sure way
> to increase your garbage collection overhead a lot down the line.

Fix this by never issuing misaligned DIO out of order. This fix means
the DIO-aligned middle will only use AIO completion if there is no
misaligned end segment. Otherwise, all 3 segments of a misaligned DIO
will be issued without AIO completion to ensure file offset increases
properly for all partial READ or WRITE situations.

Factoring out nfs_local_iter_setup() helps standardize repetitive
nfs_local_iters_setup_dio() code and is inspired by cleanup work that
Chuck Lever did on the NFSD Direct code.

Fixes: c817248fc8 ("nfs/localio: add proper O_DIRECT support for READ and WRITE")
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10 13:28:45 -05:00
Mike Snitzer
d32ddfeb55 nfs/localio: Ensure DIO WRITE's IO on stable storage upon completion
LOCALIO's misaligned DIO WRITE support requires synchronous IO for any
misaligned head and/or tail that are issued using buffered IO.  In
addition, it is important that the O_DIRECT middle be on stable
storage upon its completion via AIO.

Otherwise, a misaligned DIO WRITE could mix buffered IO for the
head/tail and direct IO for the DIO-aligned middle -- which could lead
to problems associated with deferred writes to stable storage (such as
out of order partial completions causing incorrect advancement of the
file's offset, etc).

Fixes: c817248fc8 ("nfs/localio: add proper O_DIRECT support for READ and WRITE")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10 10:32:28 -05:00
Mike Snitzer
d0497dd274 nfs/localio: backfill missing partial read support for misaligned DIO
Misaligned DIO read can be split into 3 IOs, must handle potential for
short read from each component IO (follows same pattern used for
handling partial writes, except upper layer read code handles advancing
offset before retry).

Fixes: c817248fc8 ("nfs/localio: add proper O_DIRECT support for READ and WRITE")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10 10:32:28 -05:00
Mike Snitzer
f2060bdc21 nfs/localio: add refcounting for each iocb IO associated with NFS pgio header
Improve completion handling of as many as 3 IOs associated with each
misaligned DIO by using a atomic_t to track completion of each IO.

Update nfs_local_pgio_done() to use precise atomic_t accounting for
remaining iov_iter (up to 3) associated with each iocb, so that each
NFS LOCALIO pgio header is only released after all IOs have completed.
But also allow early return if/when a short read or write occurs.

Fixes reported BUG: KASAN: slab-use-after-free in nfs_local_call_read:
https://lore.kernel.org/linux-nfs/aPSvi5Yr2lGOh5Jh@dell-per750-06-vm-07.rhts.eng.pek2.redhat.com/

Reported-by: Yongcheng Yang <yoyang@redhat.com>
Fixes: c817248fc8 ("nfs/localio: add proper O_DIRECT support for READ and WRITE")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10 10:32:28 -05:00
Mike Snitzer
51a491f270 nfs/localio: remove unecessary ENOTBLK handling in DIO WRITE support
Each filesystem is meant to fallback to retrying DIO in terms buffered
IO when it might encounter -ENOTBLK when issuing DIO (which can happen
if the VFS cannot invalidate the page cache).

So NFS doesn't need special handling for -ENOTBLK.

Also, explicitly initialize a couple DIO related iocb members rather
than simply rely on data structure zeroing.

Fixes: c817248fc8 ("nfs/localio: add proper O_DIRECT support for READ and WRITE")
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10 10:32:28 -05:00
Trond Myklebust
fb2cba0854 NFS: Check the TLS certificate fields in nfs_match_client()
If the TLS security policy is of type RPC_XPRTSEC_TLS_X509, then the
cert_serial and privkey_serial fields need to match as well since they
define the client's identity, as presented to the server.

Fixes: 90c9550a8d ("NFS: support the kernel keyring for TLS")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10 10:32:28 -05:00
Trond Myklebust
8ab523ce78 pnfs: Set transport security policy to RPC_XPRTSEC_NONE unless using TLS
The default setting for the transport security policy must be
RPC_XPRTSEC_NONE, when using a TCP or RDMA connection without TLS.
Conversely, when using TLS, the security policy needs to be set.

Fixes: 6c0a8c5fcf ("NFS: Have struct nfs_client carry a TLS policy field")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10 10:32:28 -05:00
Trond Myklebust
28e19737e1 pnfs: Fix TLS logic in _nfs4_pnfs_v4_ds_connect()
Don't try to add an RDMA transport to a client that is already marked as
being a TCP/TLS transport.

Fixes: a35518cae4 ("NFSv4.1/pnfs: fix NFS with TLS in pnfs")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10 10:32:28 -05:00
Trond Myklebust
7aca00d950 pnfs: Fix TLS logic in _nfs4_pnfs_v3_ds_connect()
Don't try to add an RDMA transport to a client that is already marked as
being a TCP/TLS transport.

Fixes: 04a1526366 ("pnfs/flexfiles: connect to NFSv3 DS using TLS if MDS connection uses TLS")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10 10:32:27 -05:00
NeilBrown
1cff14b7fc nfsd: ensure SEQUENCE replay sends a valid reply.
nfsd4_enc_sequence_replay() uses nfsd4_encode_operation() to encode a
new SEQUENCE reply when replaying a request from the slot cache - only
ops after the SEQUENCE are replayed from the cache in ->sl_data.

However it does this in nfsd4_replay_cache_entry() which is called
*before* nfsd4_sequence() has filled in reply fields.

This means that in the replayed SEQUENCE reply:
 maxslots will be whatever the client sent
 target_maxslots will be -1 (assuming init to zero, and
      nfsd4_encode_sequence() subtracts 1)
 status_flags will be zero

The incorrect maxslots value, in particular, can cause the client to
think the slot table has been reduced in size so it can discard its
knowledge of current sequence number of the later slots, though the
server has not discarded those slots.  When the client later wants to
use a later slot, it can get NFS4ERR_SEQ_MISORDERED from the server.

This patch moves the setup of the reply into a new helper function and
call it *before* nfsd4_replay_cache_entry() is called.  Only one of the
updated fields was used after this point - maxslots.  So the
nfsd4_sequence struct has been extended to have separate maxslots for
the request and the response.

Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Closes: https://lore.kernel.org/linux-nfs/20251010194449.10281-1-okorniev@redhat.com/
Tested-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-11-10 09:31:52 -05:00
Chuck Lever
c96573c0d7 NFSD: Never cache a COMPOUND when the SEQUENCE operation fails
RFC 8881 normatively mandates that operations where the initial
SEQUENCE operation in a compound fails must not modify the slot's
replay cache.

nfsd4_cache_this() doesn't prevent such caching. So when SEQUENCE
fails, cstate.data_offset is not set, allowing
read_bytes_from_xdr_buf() to access uninitialized memory.

Reported-by: rtm@csail.mit.edu
Closes: https://lore.kernel.org/linux-nfs/c3628d57-94ae-48cf-8c9e-49087a28cec9@oracle.com/T/#t
Fixes: 468de9e54a ("nfsd41: expand solo sequence check")
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-11-10 09:31:52 -05:00
Chuck Lever
ff8141e49c NFSD: Skip close replay processing if XDR encoding fails
The replay logic added by commit 9411b1d4c7 ("nfsd4: cleanup
handling of nfsv4.0 closed stateid's") cannot be done if encoding
failed due to a short send buffer; there's no guarantee that the
operation encoder has actually encoded the data that is being copied
to the replay cache.

Reported-by: rtm@csail.mit.edu
Closes: https://lore.kernel.org/linux-nfs/c3628d57-94ae-48cf-8c9e-49087a28cec9@oracle.com/T/#t
Fixes: 9411b1d4c7 ("nfsd4: cleanup handling of nfsv4.0 closed stateid's")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-11-10 09:31:52 -05:00
Olga Kornievskaia
4aa17144d5 NFSD: free copynotify stateid in nfs4_free_ol_stateid()
Typically copynotify stateid is freed either when parent's stateid
is being close/freed or in nfsd4_laundromat if the stateid hasn't
been used in a lease period.

However, in case when the server got an OPEN (which created
a parent stateid), followed by a COPY_NOTIFY using that stateid,
followed by a client reboot. New client instance while doing
CREATE_SESSION would force expire previous state of this client.
It leads to the open state being freed thru release_openowner->
nfs4_free_ol_stateid() and it finds that it still has copynotify
stateid associated with it. We currently print a warning and is
triggerred

WARNING: CPU: 1 PID: 8858 at fs/nfsd/nfs4state.c:1550 nfs4_free_ol_stateid+0xb0/0x100 [nfsd]

This patch, instead, frees the associated copynotify stateid here.

If the parent stateid is freed (without freeing the copynotify
stateids associated with it), it leads to the list corruption
when laundromat ends up freeing the copynotify state later.

[ 1626.839430] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
[ 1626.842828] Modules linked in: nfnetlink_queue nfnetlink_log bluetooth cfg80211 rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd nfs_acl lockd grace nfs_localio ext4 crc16 mbcache jbd2 overlay uinput snd_seq_dummy snd_hrtimer qrtr rfkill vfat fat uvcvideo snd_hda_codec_generic videobuf2_vmalloc videobuf2_memops snd_hda_intel uvc snd_intel_dspcfg videobuf2_v4l2 videobuf2_common snd_hda_codec snd_hda_core videodev snd_hwdep snd_seq mc snd_seq_device snd_pcm snd_timer snd soundcore sg loop auth_rpcgss vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock xfs 8021q garp stp llc mrp nvme ghash_ce e1000e nvme_core sr_mod nvme_keyring nvme_auth cdrom vmwgfx drm_ttm_helper ttm sunrpc dm_mirror dm_region_hash dm_log iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse dm_multipath dm_mod nfnetlink
[ 1626.855594] CPU: 2 UID: 0 PID: 199 Comm: kworker/u24:33 Kdump: loaded Tainted: G    B   W           6.17.0-rc7+ #22 PREEMPT(voluntary)
[ 1626.857075] Tainted: [B]=BAD_PAGE, [W]=WARN
[ 1626.857573] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.24006586.BA64.2406042154 06/04/2024
[ 1626.858724] Workqueue: nfsd4 laundromat_main [nfsd]
[ 1626.859304] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 1626.860010] pc : __list_del_entry_valid_or_report+0x148/0x200
[ 1626.860601] lr : __list_del_entry_valid_or_report+0x148/0x200
[ 1626.861182] sp : ffff8000881d7a40
[ 1626.861521] x29: ffff8000881d7a40 x28: 0000000000000018 x27: ffff0000c2a98200
[ 1626.862260] x26: 0000000000000600 x25: 0000000000000000 x24: ffff8000881d7b20
[ 1626.862986] x23: ffff0000c2a981e8 x22: 1fffe00012410e7d x21: ffff0000920873e8
[ 1626.863701] x20: ffff0000920873e8 x19: ffff000086f22998 x18: 0000000000000000
[ 1626.864421] x17: 20747562202c3839 x16: 3932326636383030 x15: 3030666666662065
[ 1626.865092] x14: 6220646c756f6873 x13: 0000000000000001 x12: ffff60004fd9e4a3
[ 1626.865713] x11: 1fffe0004fd9e4a2 x10: ffff60004fd9e4a2 x9 : dfff800000000000
[ 1626.866320] x8 : 00009fffb0261b5e x7 : ffff00027ecf2513 x6 : 0000000000000001
[ 1626.866938] x5 : ffff00027ecf2510 x4 : ffff60004fd9e4a3 x3 : 0000000000000000
[ 1626.867553] x2 : 0000000000000000 x1 : ffff000096069640 x0 : 000000000000006d
[ 1626.868167] Call trace:
[ 1626.868382]  __list_del_entry_valid_or_report+0x148/0x200 (P)
[ 1626.868876]  _free_cpntf_state_locked+0xd0/0x268 [nfsd]
[ 1626.869368]  nfs4_laundromat+0x6f8/0x1058 [nfsd]
[ 1626.869813]  laundromat_main+0x24/0x60 [nfsd]
[ 1626.870231]  process_one_work+0x584/0x1050
[ 1626.870595]  worker_thread+0x4c4/0xc60
[ 1626.870893]  kthread+0x2f8/0x398
[ 1626.871146]  ret_from_fork+0x10/0x20
[ 1626.871422] Code: aa1303e1 aa1403e3 910e8000 97bc55d7 (d4210000)
[ 1626.871892] SMP: stopping secondary CPUs

Reported-by: rtm@csail.mit.edu
Closes: https://lore.kernel.org/linux-nfs/d8f064c1-a26f-4eed-b4f0-1f7f608f415f@oracle.com/T/#t
Fixes: 624322f1ad ("NFSD add COPY_NOTIFY operation")
Cc: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-11-10 09:31:52 -05:00
Edward Adam Davis
9a6b60cb14 nilfs2: avoid having an active sc_timer before freeing sci
Because kthread_stop did not stop sc_task properly and returned -EINTR,
the sc_timer was not properly closed, ultimately causing the problem [1]
reported by syzbot when freeing sci due to the sc_timer not being closed.

Because the thread sc_task main function nilfs_segctor_thread() returns 0
when it succeeds, when the return value of kthread_stop() is not 0 in
nilfs_segctor_destroy(), we believe that it has not properly closed
sc_timer.

We use timer_shutdown_sync() to sync wait for sc_timer to shutdown, and
set the value of sc_task to NULL under the protection of lock
sc_state_lock, so as to avoid the issue caused by sc_timer not being
properly shutdowned.

[1]
ODEBUG: free active (active state 0) object: 00000000dacb411a object type: timer_list hint: nilfs_construction_timeout
Call trace:
 nilfs_segctor_destroy fs/nilfs2/segment.c:2811 [inline]
 nilfs_detach_log_writer+0x668/0x8cc fs/nilfs2/segment.c:2877
 nilfs_put_super+0x4c/0x12c fs/nilfs2/super.c:509

Link: https://lkml.kernel.org/r/20251029225226.16044-1-konishi.ryusuke@gmail.com
Fixes: 3f66cc261c ("nilfs2: use kthread_create and kthread_stop for the log writer thread")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+24d8b70f039151f65590@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=24d8b70f039151f65590
Tested-by: syzbot+24d8b70f039151f65590@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Cc: <stable@vger.kernel.org>	[6.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-09 21:19:46 -08:00
Wei Yang
895b4c0c79 fs/proc: fix uaf in proc_readdir_de()
Pde is erased from subdir rbtree through rb_erase(), but not set the node
to EMPTY, which may result in uaf access.  We should use RB_CLEAR_NODE()
set the erased node to EMPTY, then pde_subdir_next() will return NULL to
avoid uaf access.

We found an uaf issue while using stress-ng testing, need to run testcase
getdent and tun in the same time.  The steps of the issue is as follows:

1) use getdent to traverse dir /proc/pid/net/dev_snmp6/, and current
   pde is tun3;

2) in the [time windows] unregister netdevice tun3 and tun2, and erase
   them from rbtree.  erase tun3 first, and then erase tun2.  the
   pde(tun2) will be released to slab;

3) continue to getdent process, then pde_subdir_next() will return
   pde(tun2) which is released, it will case uaf access.

CPU 0                                      |    CPU 1
-------------------------------------------------------------------------
traverse dir /proc/pid/net/dev_snmp6/      |   unregister_netdevice(tun->dev)   //tun3 tun2
sys_getdents64()                           |
  iterate_dir()                            |
    proc_readdir()                         |
      proc_readdir_de()                    |     snmp6_unregister_dev()
        pde_get(de);                       |       proc_remove()
        read_unlock(&proc_subdir_lock);    |         remove_proc_subtree()
                                           |           write_lock(&proc_subdir_lock);
        [time window]                      |           rb_erase(&root->subdir_node, &parent->subdir);
                                           |           write_unlock(&proc_subdir_lock);
        read_lock(&proc_subdir_lock);      |
        next = pde_subdir_next(de);        |
        pde_put(de);                       |
        de = next;    //UAF                |

rbtree of dev_snmp6
                        |
                    pde(tun3)
                     /    \
                  NULL  pde(tun2)

Link: https://lkml.kernel.org/r/20251025024233.158363-1-albin_yang@163.com
Signed-off-by: Wei Yang <albinwyang@tencent.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: wangzijie <wangzijie1@honor.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-09 21:19:43 -08:00
Joshua Rogers
98a5fd31cb ksmbd: close accepted socket when per-IP limit rejects connection
When the per-IP connection limit is exceeded in ksmbd_kthread_fn(),
the code sets ret = -EAGAIN and continues the accept loop without
closing the just-accepted socket. That leaks one socket per rejected
attempt from a single IP and enables a trivial remote DoS.

Release client_sk before continuing.

This bug was found with ZeroPath.

Cc: stable@vger.kernel.org
Signed-off-by: Joshua Rogers <linux@joshua.hu>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-11-09 17:47:52 -06:00
Joshua Rogers
e904d81ad1 smb: server: rdma: avoid unmapping posted recv on accept failure
smb_direct_prepare_negotiation() posts a recv and then, if
smb_direct_accept_client() fails, calls put_recvmsg() on the same
buffer. That unmaps and recycles a buffer that is still posted on
the QP., which can lead to device DMA into unmapped or reused memory.

Track whether the recv was posted and only return it if it was never
posted. If accept fails after a post, leave it for teardown to drain
and complete safely.

Signed-off-by: Joshua Rogers <linux@joshua.hu>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-11-09 17:47:49 -06:00
Edward Adam Davis
e8c73eb7db cifs: client: fix memory leak in smb3_fs_context_parse_param
The user calls fsconfig twice, but when the program exits, free() only
frees ctx->source for the second fsconfig, not the first.
Regarding fc->source, there is no code in the fs context related to its
memory reclamation.

To fix this memory leak, release the source memory corresponding to ctx
or fc before each parsing.

syzbot reported:
BUG: memory leak
unreferenced object 0xffff888128afa360 (size 96):
  backtrace (crc 79c9c7ba):
    kstrdup+0x3c/0x80 mm/util.c:84
    smb3_fs_context_parse_param+0x229b/0x36c0 fs/smb/client/fs_context.c:1444

BUG: memory leak
unreferenced object 0xffff888112c7d900 (size 96):
  backtrace (crc 79c9c7ba):
    smb3_fs_context_fullpath+0x70/0x1b0 fs/smb/client/fs_context.c:629
    smb3_fs_context_parse_param+0x2266/0x36c0 fs/smb/client/fs_context.c:1438

Reported-by: syzbot+72afd4c236e6bc3f4bac@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=72afd4c236e6bc3f4bac
Cc: stable@vger.kernel.org
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-11-09 17:30:17 -06:00
Henrique Carvalho
79280191c2 smb: client: fix cifs_pick_channel when channel needs reconnect
cifs_pick_channel iterates candidate channels using cur. The
reconnect-state test mistakenly used a different variable.

This checked the wrong slot and would cause us to skip a healthy channel
and to dispatch on one that needs reconnect, occasionally failing
operations when a channel was down.

Fix by replacing for the correct variable.

Fixes: fc43a8ac39 ("cifs: cifs_pick_channel should try selecting active channels")
Cc: stable@vger.kernel.org
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-11-09 17:30:17 -06:00
Linus Torvalds
7bb4d65125 Merge tag 'v6.18rc4-SMB-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:

 - Fix change notify packet validation check

 - Refcount fix (e.g. rename error paths)

 - Fix potential UAF due to missing locks on directory lease refcount

* tag 'v6.18rc4-SMB-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: validate change notify buffer before copy
  smb: client: fix refcount leak in smb2_set_path_attr
  smb: client: fix potential UAF in smb2_close_cached_fid()
2025-11-08 10:17:30 -08:00
Linus Torvalds
e284d5118a Merge tag 'xfs-fixes-6.18-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
 "This contain fixes for the RT and zoned allocator, and a few fixes for
  atomic writes"

* tag 'xfs-fixes-6.18-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: free xfs_busy_extents structure when no RT extents are queued
  xfs: fix zone selection in xfs_select_open_zone_mru
  xfs: fix a rtgroup leak when xfs_init_zone fails
  xfs: fix various problems in xfs_atomic_write_cow_iomap_begin
  xfs: fix delalloc write failures in software-provided atomic writes
2025-11-08 08:43:01 -08:00
Joshua Rogers
4012abe8a7 smb: client: validate change notify buffer before copy
SMB2_change_notify called smb2_validate_iov() but ignored the return
code, then kmemdup()ed using server provided OutputBufferOffset/Length.

Check the return of smb2_validate_iov() and bail out on error.

Discovered with help from the ZeroPath security tooling.

Signed-off-by: Joshua Rogers <linux@joshua.hu>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Cc: stable@vger.kernel.org
Fixes: e3e9463414 ("smb3: improve SMB3 change notification support")
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-11-07 10:15:43 -06:00
Linus Torvalds
cff0a1be08 Merge tag 'v6.18-rc4-smb-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:

 - More safely detect RDMA capable devices correctly

* tag 'v6.18-rc4-smb-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: detect RDMA capable netdevs include IPoIB
  ksmbd: detect RDMA capable lower devices when bridge and vlan netdev is used
2025-11-07 07:39:57 -08:00
Linus Torvalds
c668da99b9 Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux
Pull fscrypt fix from Eric Biggers:
 "Fix an UBSAN warning that started occurring when the block layer
  started supporting logical_block_size > PAGE_SIZE"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
  fscrypt: fix left shift underflow when inode->i_blkbits > PAGE_SHIFT
2025-11-06 12:45:15 -08:00
Gao Xiang
f2a12cc3b9 erofs: avoid infinite loop due to incomplete zstd-compressed data
Currently, the decompression logic incorrectly spins if compressed
data is truncated in crafted (deliberately corrupted) images.

Fixes: 7c35de4df1 ("erofs: Zstandard compression support")
Reported-by: Robert Morris <rtm@csail.mit.edu>
Closes: https://lore.kernel.org/r/50958.1761605413@localhost
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chunhai Guo <guochunhai@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
2025-11-07 04:10:45 +08:00
Christoph Hellwig
d8a823c6f0 xfs: free xfs_busy_extents structure when no RT extents are queued
kmemleak occasionally reports leaking xfs_busy_extents structure
from xfs_scrub calls after running xfs/528 (but attributed to following
tests), which seems to be caused by not freeing the xfs_busy_extents
structure when tr.queued is 0 and xfs_trim_rtgroup_extents breaks out
of the main loop.  Free the structure in this case.

Fixes: a3315d1130 ("xfs: use rtgroup busy extent list for FITRIM")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-06 08:59:19 +01:00
Zilin Guan
c367af440e btrfs: release root after error in data_reloc_print_warning_inode()
data_reloc_print_warning_inode() calls btrfs_get_fs_root() to obtain
local_root, but fails to release its reference when paths_from_inode()
returns an error. This causes a potential memory leak.

Add a missing btrfs_put_root() call in the error path to properly
decrease the reference count of local_root.

Fixes: b9a9a85059 ("btrfs: output affected files when relocation fails")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-05 20:01:12 +01:00
Zilin Guan
5fea61aa1c btrfs: scrub: put bio after errors in scrub_raid56_parity_stripe()
scrub_raid56_parity_stripe() allocates a bio with bio_alloc(), but
fails to release it on some error paths, leading to a potential
memory leak.

Add the missing bio_put() calls to properly drop the bio reference
in those error cases.

Fixes: 1009254bf2 ("btrfs: scrub: use scrub_stripe to implement RAID56 P/Q scrub")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-05 20:01:12 +01:00
Filipe Manana
bfe3d755ef btrfs: do not update last_log_commit when logging inode due to a new name
When logging that a new name exists, we skip updating the inode's
last_log_commit field to prevent a later explicit fsync against the inode
from doing nothing (as updating last_log_commit makes btrfs_inode_in_log()
return true). We are detecting, at btrfs_log_inode(), that logging a new
name is happening by checking the logging mode is not LOG_INODE_EXISTS,
but that is not enough because we may log parent directories when logging
a new name of a file in LOG_INODE_ALL mode - we need to check that the
logging_new_name field of the log context too.

An example scenario where this results in an explicit fsync against a
directory not persisting changes to the directory is the following:

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt

  $ touch /mnt/foo

  $ sync

  $ mkdir /mnt/dir

  # Write some data to our file and fsync it.
  $ xfs_io -c "pwrite -S 0xab 0 64K" -c "fsync" /mnt/foo

  # Add a new link to our file. Since the file was logged before, we
  # update it in the log tree by calling btrfs_log_new_name().
  $ ln /mnt/foo /mnt/dir/bar

  # fsync the root directory - we expect it to persist the dentry for
  # the new directory "dir".
  $ xfs_io -c "fsync" /mnt

  <power fail>

After mounting the fs the entry for directory "dir" does not exists,
despite the explicit fsync on the root directory.

Here's why this happens:

1) When we fsync the file we log the inode, so that it's present in the
   log tree;

2) When adding the new link we enter btrfs_log_new_name(), and since the
   inode is in the log tree we proceed to updating the inode in the log
   tree;

3) We first set the inode's last_unlink_trans to the current transaction
   (early in btrfs_log_new_name());

4) We then eventually enter btrfs_log_inode_parent(), and after logging
   the file's inode, we call btrfs_log_all_parents() because the inode's
   last_unlink_trans matches the current transaction's ID (updated in the
   previous step);

5) So btrfs_log_all_parents() logs the root directory by calling
   btrfs_log_inode() for the root's inode with a log mode of LOG_INODE_ALL
   so that new dentries are logged;

6) At btrfs_log_inode(), because the log mode is LOG_INODE_ALL, we
   update root inode's last_log_commit to the last transaction that
   changed the inode (->last_sub_trans field of the inode), which
   corresponds to the current transaction's ID;

7) Then later when user space explicitly calls fsync against the root
   directory, we enter btrfs_sync_file(), which calls skip_inode_logging()
   and that returns true, since its call to btrfs_inode_in_log() returns
   true and there are no ordered extents (it's a directory, never has
   ordered extents). This results in btrfs_sync_file() returning without
   syncing the log or committing the current transaction, so all the
   updates we did when logging the new name, including logging the root
   directory,  are not persisted.

So fix this by but updating the inode's last_log_commit if we are sure
we are not logging a new name (if ctx->logging_new_name is false).

A test case for fstests will follow soon.

Reported-by: Vyacheslav Kovalevsky <slava.kovalevskiy.2014@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/03c5d7ec-5b3d-49d1-95bc-8970a7f82d87@gmail.com/
Fixes: 130341be7f ("btrfs: always update the logged transaction when logging new names")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-05 20:01:01 +01:00
Naohiro Aota
6a1ab50135 btrfs: zoned: fix stripe width calculation
The stripe offset calculation in the zoned code for raid0 and raid10
wrongly uses map->stripe_size to calculate it. In fact, map->stripe_size is
the size of the device extent composing the block group, which always is
the zone_size on the zoned setup.

Fix it by using BTRFS_STRIPE_LEN and BTRFS_STRIPE_LEN_SHIFT. Also, optimize
the calculation a bit by doing the common calculation only once.

Fixes: c0d90a79e8 ("btrfs: zoned: fix alloc_offset calculation for partly conventional block groups")
CC: stable@vger.kernel.org # 6.17+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-05 20:00:08 +01:00
Naohiro Aota
94f54924b9 btrfs: zoned: fix conventional zone capacity calculation
When a block group contains both conventional zone and sequential zone, the
capacity of the block group is wrongly set to the block group's full
length. The capacity should be calculated in btrfs_load_block_group_* using
the last allocation offset.

Fixes: 568220fa96 ("btrfs: zoned: support RAID0/1/10 on top of raid stripe tree")
CC: stable@vger.kernel.org # v6.12+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-05 20:00:06 +01:00
Christoph Hellwig
21ab5179aa xfs: fix zone selection in xfs_select_open_zone_mru
xfs_select_open_zone_mru needs to pass XFS_ZONE_ALLOC_OK to
xfs_try_use_zone because we only want to tightly pack into zones of the
same or a compatible temperature instead of any available zone.

This got broken in commit 0301dae732 ("xfs: refactor hint based zone
allocation"), which failed to update this particular caller when
switching to an enum.  xfs/638 sometimes, but not reliably fails due to
this change.

Fixes: 0301dae732 ("xfs: refactor hint based zone allocation")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-05 16:54:38 +01:00
Christoph Hellwig
f5714a3c1a xfs: fix a rtgroup leak when xfs_init_zone fails
Drop the rtgrop reference when xfs_init_zone fails for a conventional
device.

Fixes: 4e4d520755 ("xfs: add the zoned space allocator")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-05 16:53:49 +01:00
Darrick J. Wong
8d7bba1e83 xfs: fix various problems in xfs_atomic_write_cow_iomap_begin
I think there are several things wrong with this function:

A) xfs_bmapi_write can return a much larger unwritten mapping than what
   the caller asked for.  We convert part of that range to written, but
   return the entire written mapping to iomap even though that's
   inaccurate.

B) The arguments to xfs_reflink_convert_cow_locked are wrong -- an
   unwritten mapping could be *smaller* than the write range (or even
   the hole range).  In this case, we convert too much file range to
   written state because we then return a smaller mapping to iomap.

C) It doesn't handle delalloc mappings.  This I covered in the patch
   that I already sent to the list.

D) Reassigning count_fsb to handle the hole means that if the second
   cmap lookup attempt succeeds (due to racing with someone else) we
   trim the mapping more than is strictly necessary.  The changing
   meaning of count_fsb makes this harder to notice.

E) The tracepoint is kinda wrong because @length is mutated.  That makes
   it harder to chase the data flows through this function because you
   can't just grep on the pos/bytecount strings.

F) We don't actually check that the br_state = XFS_EXT_NORM assignment
   is accurate, i.e that the cow fork actually contains a written
   mapping for the range we're interested in

G) Somewhat inadequate documentation of why we need to xfs_trim_extent
   so aggressively in this function.

H) Not sure why xfs_iomap_end_fsb is used here, the vfs already clamped
   the write range to s_maxbytes.

Fix these issues, and then the atomic writes regressions in generic/760,
generic/617, generic/091, generic/263, and generic/521 all go away for
me.

Cc: stable@vger.kernel.org # v6.16
Fixes: bd1d2c21d5 ("xfs: add xfs_atomic_write_cow_iomap_begin()")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-05 16:52:49 +01:00
Darrick J. Wong
8d54eacd82 xfs: fix delalloc write failures in software-provided atomic writes
With the 20 Oct 2025 release of fstests, generic/521 fails for me on
regular (aka non-block-atomic-writes) storage:

QA output created by 521
dowrite: write: Input/output error
LOG DUMP (8553 total operations):
1(  1 mod 256): SKIPPED (no operation)
2(  2 mod 256): WRITE    0x7e000 thru 0x8dfff	(0x10000 bytes) HOLE
3(  3 mod 256): READ     0x69000 thru 0x79fff	(0x11000 bytes)
4(  4 mod 256): FALLOC   0x53c38 thru 0x5e853	(0xac1b bytes) INTERIOR
5(  5 mod 256): COPY 0x55000 thru 0x59fff	(0x5000 bytes) to 0x25000 thru 0x29fff
6(  6 mod 256): WRITE    0x74000 thru 0x88fff	(0x15000 bytes)
7(  7 mod 256): ZERO     0xedb1 thru 0x11693	(0x28e3 bytes)

with a warning in dmesg from iomap about XFS trying to give it a
delalloc mapping for a directio write.  Fix the software atomic write
iomap_begin code to convert the reservation into a written mapping.
This doesn't fix the data corruption problems reported by generic/760,
but it's a start.

Cc: stable@vger.kernel.org # v6.16
Fixes: bd1d2c21d5 ("xfs: add xfs_atomic_write_cow_iomap_begin()")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-05 16:52:49 +01:00
Yongpeng Yang
1e39da974c fscrypt: fix left shift underflow when inode->i_blkbits > PAGE_SHIFT
When simulating an nvme device on qemu with both logical_block_size and
physical_block_size set to 8 KiB, an error trace appears during
partition table reading at boot time. The issue is caused by
inode->i_blkbits being larger than PAGE_SHIFT, which leads to a left
shift of -1 and triggering a UBSAN warning.

[    2.697306] ------------[ cut here ]------------
[    2.697309] UBSAN: shift-out-of-bounds in fs/crypto/inline_crypt.c:336:37
[    2.697311] shift exponent -1 is negative
[    2.697315] CPU: 3 UID: 0 PID: 274 Comm: (udev-worker) Not tainted 6.18.0-rc2+ #34 PREEMPT(voluntary)
[    2.697317] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[    2.697320] Call Trace:
[    2.697324]  <TASK>
[    2.697325]  dump_stack_lvl+0x76/0xa0
[    2.697340]  dump_stack+0x10/0x20
[    2.697342]  __ubsan_handle_shift_out_of_bounds+0x1e3/0x390
[    2.697351]  bh_get_inode_and_lblk_num.cold+0x12/0x94
[    2.697359]  fscrypt_set_bio_crypt_ctx_bh+0x44/0x90
[    2.697365]  submit_bh_wbc+0xb6/0x190
[    2.697370]  block_read_full_folio+0x194/0x270
[    2.697371]  ? __pfx_blkdev_get_block+0x10/0x10
[    2.697375]  ? __pfx_blkdev_read_folio+0x10/0x10
[    2.697377]  blkdev_read_folio+0x18/0x30
[    2.697379]  filemap_read_folio+0x40/0xe0
[    2.697382]  filemap_get_pages+0x5ef/0x7a0
[    2.697385]  ? mmap_region+0x63/0xd0
[    2.697389]  filemap_read+0x11d/0x520
[    2.697392]  blkdev_read_iter+0x7c/0x180
[    2.697393]  vfs_read+0x261/0x390
[    2.697397]  ksys_read+0x71/0xf0
[    2.697398]  __x64_sys_read+0x19/0x30
[    2.697399]  x64_sys_call+0x1e88/0x26a0
[    2.697405]  do_syscall_64+0x80/0x670
[    2.697410]  ? __x64_sys_newfstat+0x15/0x20
[    2.697414]  ? x64_sys_call+0x204a/0x26a0
[    2.697415]  ? do_syscall_64+0xb8/0x670
[    2.697417]  ? irqentry_exit_to_user_mode+0x2e/0x2a0
[    2.697420]  ? irqentry_exit+0x43/0x50
[    2.697421]  ? exc_page_fault+0x90/0x1b0
[    2.697422]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    2.697425] RIP: 0033:0x75054cba4a06
[    2.697426] Code: 5d e8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 19 83 e2 39 83 fa 08 75 11 e8 26 ff ff ff 66 0f 1f 44 00 00 48 8b 45 10 0f 05 <48> 8b 5d f8 c9 c3 0f 1f 40 00 f3 0f 1e fa 55 48 89 e5 48 83 ec 08
[    2.697427] RSP: 002b:00007fff973723a0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
[    2.697430] RAX: ffffffffffffffda RBX: 00005ea9a2c02760 RCX: 000075054cba4a06
[    2.697432] RDX: 0000000000002000 RSI: 000075054c190000 RDI: 000000000000001b
[    2.697433] RBP: 00007fff973723c0 R08: 0000000000000000 R09: 0000000000000000
[    2.697434] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[    2.697434] R13: 00005ea9a2c027c0 R14: 00005ea9a2be5608 R15: 00005ea9a2be55f0
[    2.697436]  </TASK>
[    2.697436] ---[ end trace ]---

This situation can happen for block devices because when
CONFIG_TRANSPARENT_HUGEPAGE is enabled, the maximum logical_block_size
is 64 KiB. set_init_blocksize() then sets the block device
inode->i_blkbits to 13, which is within this limit.

File I/O does not trigger this problem because for filesystems that do
not support the FS_LBS feature, sb_set_blocksize() prevents
sb->s_blocksize_bits from being larger than PAGE_SHIFT. During inode
allocation, alloc_inode()->inode_init_always() assigns inode->i_blkbits
from sb->s_blocksize_bits. Currently, only xfs_fs_type has the FS_LBS
flag, and since xfs I/O paths do not reach submit_bh_wbc(), it does not
hit the left-shift underflow issue.

Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Fixes: 47dd675323 ("block/bdev: lift block size restrictions to 64k")
Cc: stable@vger.kernel.org
[EB: use folio_pos() and consolidate the two shifts by i_blkbits]
Link: https://lore.kernel.org/r/20251105003642.42796-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-04 16:37:38 -08:00
Shuhao Fu
b540de9e3b smb: client: fix refcount leak in smb2_set_path_attr
Fix refcount leak in `smb2_set_path_attr` when path conversion fails.

Function `cifs_get_writable_path` returns `cfile` with its reference
counter `cfile->count` increased on success. Function `smb2_compound_op`
would decrease the reference counter for `cfile`, as stated in its
comment. By calling `smb2_rename_path`, the reference counter of `cfile`
would leak if `cifs_convert_path_to_utf16` fails in `smb2_set_path_attr`.

Fixes: 8de9e86c67 ("cifs: create a helper to find a writeable handle by path name")
Acked-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-11-04 16:03:56 -06:00