linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00

Author	SHA1	Message	Date
Jakub Kicinski	af2d6148d2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc7). Conflicts: Documentation/netlink/specs/ovpn.yaml `880d43ca9a` ("netlink: specs: clean up spaces in brackets") `af52020fc5` ("ovpn: reject unexpected netlink attributes") drivers/net/phy/phy_device.c `a44312d58e` ("net: phy: Don't register LEDs for genphy") `f0f2b992d8` ("net: phy: Don't register LEDs for genphy") https://lore.kernel.org/20250710114926.7ec3a64f@kernel.org drivers/net/wireless/intel/iwlwifi/fw/regulatory.c drivers/net/wireless/intel/iwlwifi/mld/regulatory.c `5fde0fcbd7` ("wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap") `ea045a0de3` ("wifi: iwlwifi: add support for accepting raw DSM tables by firmware") net/ipv6/mcast.c `ae3264a25a` ("ipv6: mcast: Delay put pmc->idev in mld_del_delrec()") `a8594c956c` ("ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()") https://lore.kernel.org/8cc52891-3653-4b03-a45e-05464fe495cf@kernel.org No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-17 11:00:33 -07:00
Linus Torvalds	3f31a806a6	19 hotfixes. A whopping 16 are cc:stable and the remainder address post-6.15 issues or aren't considered necessary for -stable kernels. 14 are for MM. Three gdb-script fixes and a kallsyms build fix. -----BEGIN PGP SIGNATURE----- iHQEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaHGbTgAKCRDdBJ7gKXxA jowqAPiCWBFfcFaX20BxVaMU1PjC3Lh9llDXqQwBhBNdcadSAP44SGQ8nrfV+piB OcNz2AEwBBfS354G0Etlh4k08YoAAw== =IDDc -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2025-07-11-16-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "19 hotfixes. A whopping 16 are cc:stable and the remainder address post-6.15 issues or aren't considered necessary for -stable kernels. 14 are for MM. Three gdb-script fixes and a kallsyms build fix" * tag 'mm-hotfixes-stable-2025-07-11-16-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: Revert "sched/numa: add statistics of numa balance task" mm: fix the inaccurate memory statistics issue for users mm/damon: fix divide by zero in damon_get_intervals_score() samples/damon: fix damon sample mtier for start failure samples/damon: fix damon sample wsse for start failure samples/damon: fix damon sample prcl for start failure kasan: remove kasan_find_vm_area() to prevent possible deadlock scripts: gdb: vfs: support external dentry names mm/migrate: fix do_pages_stat in compat mode mm/damon/core: handle damon_call_control as normal under kdmond deactivation mm/rmap: fix potential out-of-bounds page table access during batched unmap mm/hugetlb: don't crash when allocating a folio if there are no resv scripts/gdb: de-reference per-CPU MCE interrupts scripts/gdb: fix interrupts.py after maple tree conversion maple_tree: fix mt_destroy_walk() on root leaf node mm/vmalloc: leave lazy MMU mode on PTE mapping error scripts/gdb: fix interrupts display after MCP on x86 lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users() kallsyms: fix build without execinfo	2025-07-12 10:30:47 -07:00
Linus Torvalds	3b428e1cfc	Changes since last update: - Address cache aliasing for mappable page cache folios; - Allow readdir() to be interrupted; - Fix large fragment handling which was errored out by mistake; - Add missing tracepoints; - Use memcpy_to_folio() to replace copy_to_iter() for inline data. -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmhyeJQRHHhpYW5nQGtl cm5lbC5vcmcACgkQUXZn5Zlu5qqlBBAAgPWmt8AqJBa+8BRI2VzM7dAygHODp14d 2m0NojMiONLh+vveCt/BTLnapqiOrnfUP9HXCzFjULClzLAjm7zUe3y1m304WGT+ WIgQpu6ZKEvoMLKAPWEjGmevixX6W3eeGSjoKJv8XUHBhLrH2QdLGu7GoM1j1Qk4 mf40VvzAyA7HkCf3jFOo7BOhMhzuAWfCGy+lMN4taDK+eQ3kpcola60Sjy0pUrew HHH4qFDO/wJ1Mh5DVFFcH82QBVFNuNlbqY/0twyENrPuDUSrnbTgXTIHjNYsdO5p kWSHQMBEPS9R4vJBYUG8yKWGR1nVT3MCfm8e0eebawazLiKBbTTRa9PHTdzC2w9F gVyMcJBSPtZTera4z+KoZVSBXU7Om0YS7TZdFAbocrMv06/l/F88mlbsy0b+uHRU k0WcyMmR+TbdJicsQ57jJ1xoNBpe12NDtoLjeCZLhC0Sd9bNS2LkxzthqQk33v/I 8SqzGoTyISyxALGZm07HI+e4GBTmGAgKjJEAEjcFRl5pFQivExJq59lg2Gp4vUo5 DD2ZN3uENERpPBrXFmXpDLwDYCBoZYUJCOfByr5zwBhy8/JjtKwXT0Bkcr6QQ+pT 8rraONl56ijBv4n6AjnjVM4ZScvoBEynAgYZnYAJ8tprix81+MQv8yx+iTKXQT5q AujV/p1p+lQ= =7VXc -----END PGP SIGNATURE----- Merge tag 'erofs-for-6.16-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "Fix for a cache aliasing issue by adding missing flush_dcache_folio(), which causes execution failures on some arm32 setups. Fix for large compressed fragments, which could be generated by -Eall-fragments option (but should be rare) and was rejected by mistake due to an on-disk hardening commit. The remaining ones are small fixes. Summary: - Address cache aliasing for mappable page cache folios - Allow readdir() to be interrupted - Fix large fragment handling which was errored out by mistake - Add missing tracepoints - Use memcpy_to_folio() to replace copy_to_iter() for inline data" * tag 'erofs-for-6.16-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix large fragment handling erofs: allow readdir() to be interrupted erofs: address D-cache aliasing erofs: use memcpy_to_folio() to replace copy_to_iter() erofs: fix to add missing tracepoint in erofs_read_folio() erofs: fix to add missing tracepoint in erofs_readahead()	2025-07-12 10:20:03 -07:00
Linus Torvalds	4412b8b23d	bcachefs fixes for 6.16-rc6 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmhxCtkACgkQE6szbY3K bnaiLA//bTSyfeCXu35Lk5DObSSsZdKS/EUyL68yaahdF5BNjwX17jS4pOb05glO lp0S/vCI6ut9XJ+emLfzxH+X3YOzSRFBzrRZwxC2RsMDojbbLb1O/4L2S1Auu6ep YtH0mbjxiOe/GukW+dJ9XL8/fKBRM6FLeW1qsnPV9WSn2oBgk6xcEQJMbdC1Jh5G oex7GU0e336rYMQh3HHxEP4McN+Nk7FWe+hctrcxAUrS2M8EH7q8CTEklypd51+H MAfoyvmMw52Ctk9wtinoQ7nRw7fNztYo3umqYKvhNkfzeTcGNJxyABbmh6H6ZQ0z R6WBf3dszM8A/fziT39zPt9kHQGmcQVLtxSXvXbh5rgcK8x1XXPGUZaqYcEgG4aL 804LZOw2p9vTgbxi1hL6qo24ieXjEZR94bdccE7Ju8/sI34H374jvWevsQlHncQJ y1jlCTIp3TXsqqVprAkoQcP4J/d7wQUyHY41hKFCn1Gk8kMrsfRZfAeI4E7Qc5iL q9KzCYWTIVMc63dGtYK1wmxc7kd2tdPQOA1JnfeZD/1I8PHQHE4VNIGn4XOjG90g tIy3UFjlgnUoGRt+LCyeI0XVtbgw7n0OpN/G7emgiSx7FxHkEjdncy9FPz8hfVZm 3u2MowY0QL0cAG70Iv2H9TdoPQW7Asb2TSku+NMnelCWs4YxKeo= =7KDN -----END PGP SIGNATURE----- Merge tag 'bcachefs-2025-07-11' of git://evilpiepirate.org/bcachefs Pull bcachefs fixes from Kent Overstreet. * tag 'bcachefs-2025-07-11' of git://evilpiepirate.org/bcachefs: bcachefs: Don't set BCH_FS_error on transaction restart bcachefs: Fix additional misalignment in journal space calculations bcachefs: Don't schedule non persistent passes persistently bcachefs: Fix bch2_btree_transactions_read() synchronization bcachefs: btree read retry fixes bcachefs: btree node scan no longer uses btree cache bcachefs: Tweak btree cache helpers for use by btree node scan bcachefs: Fix btree for nonexistent tree depth bcachefs: Fix bch2_io_failures_to_text() bcachefs: bch2_fpunch_snapshot()	2025-07-12 10:13:27 -07:00
Linus Torvalds	2632d81f5a	three ksmbd server fixes -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmhxBE4ACgkQiiy9cAdy T1EwyAv/c0ruIh4ywIT84F2Mu73ks1td9ODlSUY7msxLNNhyMx8U0pUeGV0toi1b n2LxTunf/LmfmHedjGDXDn2YjPSvEPcAyKXM1nc2mT6QVuUoa9eST7Hmm/4EhPF2 RXrWSsYydLyHnwfXC4zvMGSNY1dSecm2l/hkNs4i7fkNkij5PsieEY2mQBxci4G3 9Y1KisQac8he8vLc+e/TFMQvhyA6ns20/JskgYCEnP00FwNOb7lXASy/cuRd149b TfsE4K4aDG2f2oXgh5O9zOmJCJRxrP/O8sJpSotXUoStt0HviJ5G0mujHEZwhTNh vINBa0yXo1VU429he5Z69Vq5rZHzKpFprvSc1rF0feLDal66QsJpSwU6p+o/oZ42 R0RiFnUbVz6VdDfr5q7kWk1nHfQwy7Zj62Aac167D6+H9ymB7in9EJ7ZjPQbP8Yy eVahqjdtL16ZIXn9T0o7gcqp3VlYjdzSHZl9D0ZHpbHAwlt8MN/C7lYshdQe121Y wnfpKu5z =/Xzo -----END PGP SIGNATURE----- Merge tag 'v6.16-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd Pull smb server fixes from Steve French: - fix use after free in lease break - small fix for freeing rdma transport (fixes missing logging of cm_qp_destroy) - fix write count leak * tag 'v6.16-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix potential use-after-free in oplock/lease break ack ksmbd: fix a mount write count leak in ksmbd_vfs_kern_path_locked() smb: server: make use of rdma_destroy_qp()	2025-07-12 10:06:06 -07:00
Linus Torvalds	5f02b80c21	Revert "eventpoll: Fix priority inversion problem" This reverts commit `8c44dac8ad`. I haven't figured out what the actual bug in this commit is, but I did spend a lot of time chasing it down and eventually succeeded in bisecting it down to this. For some reason, this eventpoll commit ends up causing delays and stuck user space processes, but it only happens on one of my machines, and only during early boot or during the flurry of initial activity when logging in. I must be triggering some very subtle timing issue, but once I figured out the behavior pattern that made it reasonably reliable to trigger, it did bisect right to this, and reverting the commit fixes the problem. Of course, that was only after I had failed at bisecting it several times, and had flailed around blaming both the drm people and the netlink people for the odd problems. The most obvious of which happened at the time of the first graphical login (the most common symptom being that some gnome app aborted due to a 30s timeout, often leading to the whole session then failing if it was some critical component like gnome-shell or similar). Acked-by: Nam Cao <namcao@linutronix.de> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2025-07-11 17:10:32 -07:00
Gao Xiang	b44686c839	erofs: fix large fragment handling Fragments aren't limited by Z_EROFS_PCLUSTER_MAX_DSIZE. However, if a fragment's logical length is larger than Z_EROFS_PCLUSTER_MAX_DSIZE but the fragment is not the whole inode, it currently returns -EOPNOTSUPP because m_flags has the wrong EROFS_MAP_ENCODED flag set. It is not intended by design but should be rare, as it can only be reproduced by mkfs with `-Eall-fragments` in a specific case. Let's normalize fragment m_flags using the new EROFS_MAP_FRAGMENT. Reported-by: Axel Fontaine <axel@axelfontaine.com> Closes: https://github.com/erofs/erofs-utils/issues/23 Fixes: `7c3ca1838a` ("erofs: restrict pcluster size limitations") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250711195826.3601157-1-hsiangkao@linux.alibaba.com	2025-07-12 04:02:44 +08:00
Jakub Kicinski	3321e97eab	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc6). No conflicts. Adjacent changes: Documentation/devicetree/bindings/net/allwinner,sun8i-a83t-emac.yaml `0a12c435a1` ("dt-bindings: net: sun8i-emac: Add A100 EMAC compatible") `b3603c0466` ("dt-bindings: net: sun8i-emac: Rename A523 EMAC0 to GMAC0") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 10:10:49 -07:00
Chao Yu	d31fbdc4c7	erofs: allow readdir() to be interrupted In a quick slow device, readdir() may loop for long time in large directory, let's give a chance to allow it to be interrupted by userspace. Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250710073619.4083422-1-chao@kernel.org [ Gao Xiang: move cond_resched() to the end of the while loop. ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2025-07-10 17:08:27 +08:00
Gao Xiang	27917e8194	erofs: address D-cache aliasing Flush the D-cache before unlocking folios for compressed inodes, as they are dirtied during decompression. Avoid calling flush_dcache_folio() on every CPU write, since it's more like playing whack-a-mole without real benefit. It has no impact on x86 and arm64/risc-v: on x86, flush_dcache_folio() is a no-op, and on arm64/risc-v, PG_dcache_clean (PG_arch_1) is clear for new page cache folios. However, certain ARM boards are affected, as reported. Fixes: `3883a79abd` ("staging: erofs: introduce VLE decompression support") Closes: https://lore.kernel.org/r/c1e51e16-6cc6-49d0-a63e-4e9ff6c4dd53@pengutronix.de Closes: https://lore.kernel.org/r/38d43fae-1182-4155-9c5b-ffc7382d9917@siemens.com Tested-by: Jan Kiszka <jan.kiszka@siemens.com> Tested-by: Stefan Kerkmann <s.kerkmann@pengutronix.de> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250709034614.2780117-2-hsiangkao@linux.alibaba.com	2025-07-10 17:08:27 +08:00
Gao Xiang	f5443d0d1a	erofs: use memcpy_to_folio() to replace copy_to_iter() Using copy_to_iter() here is overkill and even messy. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250709034614.2780117-1-hsiangkao@linux.alibaba.com	2025-07-10 17:08:26 +08:00
Chao Yu	99f7619a77	erofs: fix to add missing tracepoint in erofs_read_folio() Commit `771c994ea5` ("erofs: convert all uncompressed cases to iomap") converts to use iomap interface, it removed trace_erofs_readpage() tracepoint in the meantime, let's add it back. Fixes: `771c994ea5` ("erofs: convert all uncompressed cases to iomap") Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250708111942.3120926-1-chao@kernel.org Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2025-07-10 17:08:26 +08:00
Chao Yu	d53238b614	erofs: fix to add missing tracepoint in erofs_readahead() Commit `771c994ea5` ("erofs: convert all uncompressed cases to iomap") converts to use iomap interface, it removed trace_erofs_readahead() tracepoint in the meantime, let's add it back. Fixes: `771c994ea5` ("erofs: convert all uncompressed cases to iomap") Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250707084832.2725677-1-chao@kernel.org Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2025-07-10 17:08:25 +08:00
Baolin Wang	82241a83cd	mm: fix the inaccurate memory statistics issue for users On some large machines with a high number of CPUs running a 64K pagesize kernel, we found that the 'RES' field is always 0 displayed by the top command for some processes, which will cause a lot of confusion for users. PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 875525 root 20 0 12480 0 0 R 0.3 0.0 0:00.08 top 1 root 20 0 172800 0 0 S 0.0 0.0 0:04.52 systemd The main reason is that the batch size of the percpu counter is quite large on these machines, caching a significant percpu value, since converting mm's rss stats into percpu_counter by commit `f1a7941243` ("mm: convert mm's rss stats into percpu_counter"). Intuitively, the batch number should be optimized, but on some paths, performance may take precedence over statistical accuracy. Therefore, introducing a new interface to add the percpu statistical count and display it to users, which can remove the confusion. In addition, this change is not expected to be on a performance-critical path, so the modification should be acceptable. In addition, the 'mm->rss_stat' is updated by using add_mm_counter() and dec/inc_mm_counter(), which are all wrappers around percpu_counter_add_batch(). In percpu_counter_add_batch(), there is percpu batch caching to avoid 'fbc->lock' contention. This patch changes task_mem() and task_statm() to get the accurate mm counters under the 'fbc->lock', but this should not exacerbate kernel 'mm->rss_stat' lock contention due to the percpu batch caching of the mm counters. The following test also confirm the theoretical analysis. I run the stress-ng that stresses anon page faults in 32 threads on my 32 cores machine, while simultaneously running a script that starts 32 threads to busy-loop pread each stress-ng thread's /proc/pid/status interface. From the following data, I did not observe any obvious impact of this patch on the stress-ng tests. w/o patch: stress-ng: info: [6848] 4,399,219,085,152 CPU Cycles 67.327 B/sec stress-ng: info: [6848] 1,616,524,844,832 Instructions 24.740 B/sec (0.367 instr. per cycle) stress-ng: info: [6848] 39,529,792 Page Faults Total 0.605 M/sec stress-ng: info: [6848] 39,529,792 Page Faults Minor 0.605 M/sec w/patch: stress-ng: info: [2485] 4,462,440,381,856 CPU Cycles 68.382 B/sec stress-ng: info: [2485] 1,615,101,503,296 Instructions 24.750 B/sec (0.362 instr. per cycle) stress-ng: info: [2485] 39,439,232 Page Faults Total 0.604 M/sec stress-ng: info: [2485] 39,439,232 Page Faults Minor 0.604 M/sec On comparing a very simple app which just allocates & touches some memory against v6.1 (which doesn't have `f1a7941243`) and latest Linus tree (`4c06e63b92`) I can see that on latest Linus tree the values for VmRSS, RssAnon and RssFile from /proc/self/status are all zeroes while they do report values on v6.1 and a Linus tree with this patch. Link: https://lkml.kernel.org/r/f4586b17f66f97c174f7fd1f8647374fdb53de1c.1749119050.git.baolin.wang@linux.alibaba.com Fixes: `f1a7941243` ("mm: convert mm's rss stats into percpu_counter") Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by Donet Tom <donettom@linux.ibm.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: SeongJae Park <sj@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-09 21:07:55 -07:00
Linus Torvalds	8c2e52ebbe	eventpoll: don't decrement ep refcount while still holding the ep mutex Jann Horn points out that epoll is decrementing the ep refcount and then doing a mutex_unlock(&ep->mtx); afterwards. That's very wrong, because it can lead to a use-after-free. That pattern is actually fine for the very last reference, because the code in question will delay the actual call to "ep_free(ep)" until after it has unlocked the mutex. But it's wrong for the much subtler "next to last" case when somebody else may also be dropping their reference and free the ep while we're still using the mutex. Note that this is true even if that other user is also using the same ep mutex: mutexes, unlike spinlocks, can not be used for object ownership, even if they guarantee mutual exclusion. A mutex "unlock" operation is not atomic, and as one user is still accessing the mutex as part of unlocking it, another user can come in and get the now released mutex and free the data structure while the first user is still cleaning up. See our mutex documentation in Documentation/locking/mutex-design.rst, in particular the section [1] about semantics: "mutex_unlock() may access the mutex structure even after it has internally released the lock already - so it's not safe for another context to acquire the mutex and assume that the mutex_unlock() context is not using the structure anymore" So if we drop our ep ref before the mutex unlock, but we weren't the last one, we may then unlock the mutex, another user comes in, drops _their_ reference and releases the 'ep' as it now has no users - all while the mutex_unlock() is still accessing it. Fix this by simply moving the ep refcount dropping to outside the mutex: the refcount itself is atomic, and doesn't need mutex protection (that's the whole _point_ of refcounts: unlike mutexes, they are inherently about object lifetimes). Reported-by: Jann Horn <jannh@google.com> Link: https://docs.kernel.org/locking/mutex-design.html#semantics [1] Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2025-07-09 10:38:29 -07:00
Kent Overstreet	fec5e6f97d	bcachefs: Don't set BCH_FS_error on transaction restart This started showing up more when we started logging the error being corrected in the journal - but __bch2_fsck_err() could return transaction restarts before that. Setting BCH_FS_error incorrectly causes recovery passes to not be cleared, among other issues. Fixes: `b43f724927` ("bcachefs: Log fsck errors in the journal") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-07-08 15:24:15 -04:00
Namjae Jeon	50f930db22	ksmbd: fix potential use-after-free in oplock/lease break ack If ksmbd_iov_pin_rsp return error, use-after-free can happen by accessing opinfo->state and opinfo_put and ksmbd_fd_put could called twice. Reported-by: Ziyan Xu <research@securitygossip.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-07-08 11:25:44 -05:00
Al Viro	277627b431	ksmbd: fix a mount write count leak in ksmbd_vfs_kern_path_locked() If the call of ksmbd_vfs_lock_parent() fails, we drop the parent_path references and return an error. We need to drop the write access we just got on parent_path->mnt before we drop the mount reference - callers assume that ksmbd_vfs_kern_path_locked() returns with mount write access grabbed if and only if it has returned 0. Fixes: `864fb5d371` ("ksmbd: fix possible deadlock in smb2_open") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-07-08 11:25:44 -05:00
Stefan Metzmacher	0c2b53997e	smb: server: make use of rdma_destroy_qp() The qp is created by rdma_create_qp() as t->cm_id->qp and t->qp is just a shortcut. rdma_destroy_qp() also calls ib_destroy_qp(cm_id->qp) internally, but it is protected by a mutex, clears the cm_id and also calls trace_cm_qp_destroy(). This should make the tracing more useful as both rdma_create_qp() and rdma_destroy_qp() are traces and it makes the code look more sane as functions from the same layer are used for the specific qp object. trace-cmd stream -e rdma_cma:cm_qp_create -e rdma_cma:cm_qp_destroy shows this now while doing a mount and unmount from a client: <...>-80 [002] 378.514182: cm_qp_create: cm.id=1 src=172.31.9.167:5445 dst=172.31.9.166:37113 tos=0 pd.id=0 qp_type=RC send_wr=867 recv_wr=255 qp_num=1 rc=0 <...>-6283 [001] 381.686172: cm_qp_destroy: cm.id=1 src=172.31.9.167:5445 dst=172.31.9.166:37113 tos=0 qp_num=1 Before we only saw the first line. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <stfrench@microsoft.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Fixes: `0626e6641f` ("cifsd: add server handler for central processing and tranport layers") Signed-off-by: Stefan Metzmacher <metze@samba.org> Reviewed-by: Tom Talpey <tom@talpey.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-07-08 11:25:43 -05:00
Kent Overstreet	74f3931a1b	bcachefs: Fix additional misalignment in journal space calculations Additional fix on top of `f54b2a80d0` bcachefs: Fix misaligned bucket check in journal space calculations Make sure that when we calculate space for the next entry it's not misaligned: we need to round_down() to filesystem block size in multiple places (next entry size calculation as well as total space available). Reported-by: Ondřej Kraus <neverberlerfellerer@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-07-07 18:19:30 -04:00
Kent Overstreet	7de3c8b407	bcachefs: Don't schedule non persistent passes persistently if (!(in_recovery && (flags & RUN_RECOVERY_PASS_nopersistent))) should have been if (!in_recovery && !(flags & RUN_RECOVERY_PASS_nopersistent))) But the !in_recovery part was also wrong: the assumption is that if we're in recovery we'll just rewind and run the recovery pass immediately, but we're not able to do so if we've already gone RW and the pass must be run before we go RW. In that case, we need to schedule it in the superblock so it can be run on the next mount attempt. Scheduling it persistently is fine, because it'll be cleared in the superblock immediately when the pass completes successfully. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-07-07 14:10:47 -04:00
Linus Torvalds	bab5cac627	fix for the breakage spotted by Neil in the interplay between /proc/sys ->d_compare() weirdness and parallel lookups Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaGc7XwAKCRBZ7Krx/gZQ 6+vZAQDRVSFm5thegwyUQfUawE/Ocl/4lqJyumiHfjy36wESKgEA9UH2Vug83YK8 pTvs2qRy+2uSX3G+9DqA2iATnIrCtAk= =ali5 -----END PGP SIGNATURE----- Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull /proc/sys dcache lookup fix from Al Viro: "Fix for the breakage spotted by Neil in the interplay between /proc/sys ->d_compare() weirdness and parallel lookups" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix proc_sys_compare() handling of in-lookup dentries	2025-07-06 13:10:39 -07:00
Linus Torvalds	05df91921d	five smb3 client fixes -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmhocdcACgkQiiy9cAdy T1FIdgwAp96u6cP9gpZcNSkJd5nkLfFlR5q6vsZkoOS99lI/OcBJyp2Jn1cL1se3 oRH8aLIku++ma5wjhbwCTDelPC2SwZr8RL09KA2NwcroNpR5nHBmRrSWSY5EYH+r HIQXZaMruzUECv9G/hABxh0hHMmMAU2QoE5z+OfJ5aKGhVSxy6XeAcX7vGfbfrOW kyK7vZsm/dfg6/G0dnxMHr2Yu8qQcCyJAogHAf6w9aJ6Jb2/fYYHL6jgWIar6PEy 6QhJvADco23ppF86omX6BBpwhlQLuKVADXPkhdypG8fMWwV/IEqyioy0pyHYdF0w nDaSJ0YfEAvnJKX8AEOQCbV4sN/vb8KwTrFVggav3Ref4YVWCpXt9qnlyq09HQd7 LXQdqxhrkbX8XRTeKUIshF4CNXmu2QBWRUfCKMyQAi8YzqGF7O51a+/Rz1ZQ3hiK 8s6gKnRGXz3Cn63MNoxfCoY6UJuvHb/7aXpjWCPmoRLLJmVqd+hM4j7Np4lHQ9tE WSdyHQeY =rrCJ -----END PGP SIGNATURE----- Merge tag 'v6.16-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - Two reconnect fixes including one for a reboot/reconnect race - Fix for incorrect file type that can be returned by SMB3.1.1 POSIX extensions - tcon initialization fix - Fix for resolving Windows symlinks with absolute paths * tag 'v6.16-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: fix native SMB symlink traversal smb: client: fix race condition in negotiate timeout by using more precise timing cifs: all initializations for tcon should happen in tcon_info_alloc smb: client: fix warning when reconnecting channel smb: client: fix readdir returning wrong type with POSIX extensions	2025-07-05 13:05:28 -07:00
Kent Overstreet	63a83463d2	bcachefs: Fix bch2_btree_transactions_read() synchronization Since we're accessing btree_trans objects owned by another thread, we need to guard against using pointers to freed key cache entries: we need our own srcu read lock, and we should skip a btree_trans if it didn't hold the srcu lock (and thus it might have pointers to freed key cache entries). 00693 Mem abort info: 00693 ESR = 0x0000000096000005 00693 EC = 0x25: DABT (current EL), IL = 32 bits 00693 SET = 0, FnV = 0 00693 EA = 0, S1PTW = 0 00693 FSC = 0x05: level 1 translation fault 00693 Data abort info: 00693 ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 00693 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 00693 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 00693 user pgtable: 4k pages, 39-bit VAs, pgdp=000000012e650000 00693 [000000008fb96218] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 00693 Internal error: Oops: 0000000096000005 [#1] SMP 00693 Modules linked in: 00693 CPU: 0 UID: 0 PID: 4307 Comm: cat Not tainted 6.16.0-rc2-ktest-g9e15af94fd86 #27578 NONE 00693 Hardware name: linux,dummy-virt (DT) 00693 pstate: 60001005 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--) 00693 pc : six_lock_counts+0x20/0xe8 00693 lr : bch2_btree_bkey_cached_common_to_text+0x38/0x130 00693 sp : ffffff80ca98bb60 00693 x29: ffffff80ca98bb60 x28: 000000008fb96200 x27: 0000000000000007 00693 x26: ffffff80eafd06b8 x25: 0000000000000000 x24: ffffffc080d75a60 00693 x23: ffffff80eafd0000 x22: ffffffc080bdfcc0 x21: ffffff80eafd0210 00693 x20: ffffff80c192ff08 x19: 000000008fb96200 x18: 00000000ffffffff 00693 x17: 0000000000000000 x16: 0000000000000000 x15: 00000000ffffffff 00693 x14: 0000000000000000 x13: ffffff80ceb5a29a x12: 20796220646c6568 00693 x11: 72205d3e303c5b20 x10: 0000000000000020 x9 : ffffffc0805fb6b0 00693 x8 : 0000000000000020 x7 : 0000000000000000 x6 : 0000000000000020 00693 x5 : ffffff80ceb5a29c x4 : 0000000000000001 x3 : 000000000000029c 00693 x2 : 0000000000000000 x1 : ffffff80ef66c000 x0 : 000000008fb96200 00693 Call trace: 00693 six_lock_counts+0x20/0xe8 (P) 00693 bch2_btree_bkey_cached_common_to_text+0x38/0x130 00693 bch2_btree_trans_to_text+0x260/0x2a8 00693 bch2_btree_transactions_read+0xac/0x1e8 00693 full_proxy_read+0x74/0xd8 00693 vfs_read+0x90/0x300 00693 ksys_read+0x6c/0x108 00693 __arm64_sys_read+0x20/0x30 00693 invoke_syscall.constprop.0+0x54/0xe8 00693 do_el0_svc+0x44/0xc8 00693 el0_svc+0x18/0x58 00693 el0t_64_sync_handler+0x104/0x130 00693 el0t_64_sync+0x154/0x158 00693 Code: 910003fd f9423c22 f90017e2 d2800002 (f9400c01) 00693 ---[ end trace 0000000000000000 ]--- Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-07-05 12:42:41 -04:00
Kent Overstreet	14dd95647e	bcachefs: btree read retry fixes Fix btree node read retries after validate errors: __btree_err() is the wrong place to flag a topology error: that is done by btree_lost_data(). Additionally, some calls to bch2_bkey_pick_read_device() were not updated in the 6.16 rework for improved log messages; we were failing to signal that we still had a retry. Cc: Nikita Ofitserov <himikof@gmail.com> Cc: Alan Huang <mmpgouride@gmail.com> Reported-and-tested-by: Edoardo Codeglia <bcachefs@404.blue> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-07-05 12:42:41 -04:00
Kent Overstreet	a77ffbe34d	bcachefs: btree node scan no longer uses btree cache Previously, btree node scan used the btree node cache to check if btree nodes were readable, but this is subject to interference from threads scanning different devices trying to read the same node - and more critically, nodes that we already attempted and failed to read before kicking off scan. Instead, we now allocate a 'struct btree' that does not live in the btree node cache, and call bch2_btree_node_read_done() directly. Cc: Nikita Ofitserov <himikof@gmail.com> Reviewed-by: Nikita Ofitserov <himikof@gmail.com> Reported-and-tested-by: Edoardo Codeglia <bcachefs@404.blue> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-07-05 12:42:41 -04:00
Kent Overstreet	c2b2c7d1da	bcachefs: Tweak btree cache helpers for use by btree node scan btree node scan needs to not use the btree node cache: that causes interference from prior failed reads and parallel workers. Instead we need to allocate btree nodes that don't live in the btree cache, so that we can call bch2_btree_node_read_done() directly. This patch tweaks the low level helpers so they don't touch the btree cache lists. Cc: Nikita Ofitserov <himikof@gmail.com> Reviewed-by: Nikita Ofitserov <himikof@gmail.com> Reported-and-tested-by: Edoardo Codeglia <bcachefs@404.blue> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-07-04 23:17:07 -04:00
Kent Overstreet	c72d628469	bcachefs: Fix btree for nonexistent tree depth The fix for when we should increase tree depth in journal replay was entirely bogus. We should only increase the tree depth in journal replay when recovery from btree node scan, and then only for keys found by btree node scan. This needs additional work - we should be shooting down existing interior node pointers when recovery from scan, they shouldn't be showing up here. Fixes: `b47a82ff47` ("bcachefs: Only run 'increase_depth' for keys from btree node csan") Cc: Alan Huang <mmpgouride@gmail.com> Reported-by: syzbot+8deb6ff4415db67a9f18@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-07-04 15:47:13 -04:00
Kent Overstreet	ddb9680a72	bcachefs: Fix bch2_io_failures_to_text() This wasn't updated when we added tracking for btree validate errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-07-04 15:47:13 -04:00
Kent Overstreet	63d6e93119	bcachefs: bch2_fpunch_snapshot() Add a new version of fpunch for operating on a snapshot ID, not a subvolume - and use it for "extent past end of inode" repair. Previously, repair would try to delete everything at once, but deleting too many extents at once can overflow the btree_trans bump allocator, as well as causing other problems - the new helper properly uses bch2_extent_trim_atomic(). Reported-and-tested-by: Edoardo Codeglia <bcachefs@404.blue> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-07-04 15:45:22 -04:00
Linus Torvalds	482deed9df	bcachefs fixes for 6.16-rc5 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmhnIIEACgkQE6szbY3K bnbi5g/9G3EMNL9LtU6tNPoLQdqINuKYtPtR/Wv3ZkPbLDlTdEilJSRelRpzYjaR aCxL3VgtIONX2uxOLOl3ODA9T4vnDBYCYdyPPeGzdMXA0YVRKZCmmg5REmFjhToK c0s5y6lTA6hXdWX0+DVvUODdnFgtVMeXgErzDqTxMZv3h/f1E5feuMaOZilJtlBl JElM5NsKsZSZCyDnq8pIowpvPA7WhH4HQeFLaK7HznFl7BFEUUt6ohhsieAiClMY 1gfUcV/FwXRL6a7KbqKrdE8dtO6nB3mezx/TTHH5tbzvuoqbq375NNwvq0L4Vr2G DaSEU73he5Q0xvVFMj2DCyqUKe6cwccIgs+CFpM9FBrl4SUdVyq4/dN9GYrdyI5L ufK7Jd+f8Ekjl8WcAcS3LPp9pI8KwmT6fTsoZqZVvi+bFPeIVBb/YVP9Rm12iS2m ia+jj3xsPfYwMzsI0Rj/gxb+KnggnKOnDMKhgw4Yz5H0M9i8Rls6VAc62ZZx9xmz oyXdGuJN8wk8uXyr4yjux7i0hacFNSkBHcfnkVNu90rlJ8qh07O2EoLncQdv7vG6 YpjwGR9XD0YIF8RfKlufkHhzVC6R7DUx0W7UCrEYhduRu+hEGhLrzL9vxlNVceWK 5SRDB7KwFotoECStDlWAQY3g8nTWRH1d2t8qBJDOSGG7SauCKro= =b+3P -----END PGP SIGNATURE----- Merge tag 'bcachefs-2025-07-03' of git://evilpiepirate.org/bcachefs Pull bcachefs fixes from Kent Overstreet: "The 'opts.casefold_disabled' patch is non critical, but would be a 6.15 backport; it's to address the casefolding + overlayfs incompatibility that was discovvered late. It's late because I was hoping that this would be addressed on the overlayfs side (and will be in 6.17), but user reports keep coming in on this one (lots of people are using docker these days)" * tag 'bcachefs-2025-07-03' of git://evilpiepirate.org/bcachefs: bcachefs: opts.casefold_disabled bcachefs: Work around deadlock to btree node rewrites in journal replay bcachefs: Fix incorrect transaction restart handling bcachefs: fix btree_trans_peek_prev_journal() bcachefs: mark invalid_btree_id autofix	2025-07-04 09:29:22 -07:00
Linus Torvalds	2eb7f03acf	vfs-6.16-rc5.fixes -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaGeHBAAKCRCRxhvAZXjc omJNAQCnHIDuiscCUFeevb5sMNqws6td2kexX8reLxbdzzTrFgEAwAKxy5BVhNlg NusCZ2taYmenAK+HjI3JEw6c/3IKqwE= =NxGx -----END PGP SIGNATURE----- Merge tag 'vfs-6.16-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix a regression caused by the anonymous inode rework. Making them regular files causes various places in the kernel to tip over starting with io_uring. Revert to the former status quo and port our assertion to be based on checking the inode so we don't lose the valuable VFS__ON_() assertions that have already helped discover weird behavior our outright bugs. - Fix the the upper bound calculation in fuse_fill_write_pages() - Fix priority inversion issues in the eventpoll code - Make secretmen use anon_inode_make_secure_inode() to avoid bypassing the LSM layer - Fix a netfs hang due to missing case in final DIO read result collection - Fix a double put of the netfs_io_request struct - Provide some helpers to abstract out NETFS_RREQ_IN_PROGRESS flag wrangling - Fix infinite looping in netfs_wait_for_pause/request() - Fix a netfs ref leak on an extra subrequest inserted into a request's list of subreqs - Fix various cifs RPC callbacks to set NETFS_SREQ_NEED_RETRY if a subrequest fails retriably - Fix a cifs warning in the workqueue code when reconnecting a channel - Fix the updating of i_size in netfs to avoid a race between testing if we should have extended the file with a DIO write and changing i_size - Merge the places in netfs that update i_size on write - Fix coredump socket selftests * tag 'vfs-6.16-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: anon_inode: rework assertions netfs: Update tracepoints in a number of ways netfs: Renumber the NETFS_RREQ_* flags to make traces easier to read netfs: Merge i_size update functions netfs: Fix i_size updating smb: client: set missing retry flag in cifs_writev_callback() smb: client: set missing retry flag in cifs_readv_callback() smb: client: set missing retry flag in smb2_writev_callback() netfs: Fix ref leak on inserted extra subreq in write retry netfs: Fix looping in wait functions netfs: Provide helpers to perform NETFS_RREQ_IN_PROGRESS flag wangling netfs: Fix double put of request netfs: Fix hang due to missing case in final DIO read result collection eventpoll: Fix priority inversion problem fuse: fix fuse_fill_write_pages() upper bound calculation fs: export anon_inode_make_secure_inode() and fix secretmem LSM bypass selftests/coredump: Fix "socket_detect_userspace_client" test failure	2025-07-04 09:06:49 -07:00
Paolo Abeni	6b9fd8857b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc5). No conflicts. No adjacent changes. Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-04 08:03:18 +02:00
Al Viro	b969f96148	fix proc_sys_compare() handling of in-lookup dentries There's one case where ->d_compare() can be called for an in-lookup dentry; usually that's nothing special from ->d_compare() point of view, but... proc_sys_compare() is weird. The thing is, /proc/sys subdirectories can look differently for different processes. Up to and including having the same name resolve to different dentries - all of them hashed. The way it's done is ->d_compare() refusing to admit a match unless this dentry is supposed to be visible to this caller. The information needed to discriminate between them is stored in inode; it is set during proc_sys_lookup() and until it's done d_splice_alias() we really can't tell who should that dentry be visible for. Normally there's no negative dentries in /proc/sys; we can run into a dying dentry in RCU dcache lookup, but those can be safely rejected. However, ->d_compare() is also called for in-lookup dentries, before they get positive - or hashed, for that matter. In case of match we will wait until dentry leaves in-lookup state and repeat ->d_compare() afterwards. In other words, the right behaviour is to treat the name match as sufficient for in-lookup dentries; if dentry is not for us, we'll see that when we recheck once proc_sys_lookup() is done with it. While we are at it, fix the misspelled READ_ONCE and WRITE_ONCE there. Fixes: `d9171b9345` ("parallel lookups machinery, part 4 (and last)") Reported-by: NeilBrown <neilb@brown.name> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-07-03 20:59:09 -04:00
Paulo Alcantara	3363da82e0	smb: client: fix native SMB symlink traversal We've seen customers having shares mounted in paths like /??/C:/ or /??/UNC/foo.example.com/share in order to get their native SMB symlinks successfully followed from different mounts. After commit `12b466eb52` ("cifs: Fix creating and resolving absolute NT-style symlinks"), the client would then convert absolute paths from "/??/C:/" to "/mnt/c/" by default. The absolute paths would vary depending on the value of symlinkroot= mount option. Fix this by restoring old behavior of not trying to convert absolute paths by default. Only do this if symlinkroot= was _explicitly_ set. Before patch: $ mount.cifs //w22-fs0/test2 /mnt/1 -o vers=3.1.1,username=xxx,password=yyy $ ls -l /mnt/1/symlink2 lrwxr-xr-x 1 root root 15 Jun 20 14:22 /mnt/1/symlink2 -> /mnt/c/testfile $ mkdir -p /??/C:; echo foo > //??/C:/testfile $ cat /mnt/1/symlink2 cat: /mnt/1/symlink2: No such file or directory After patch: $ mount.cifs //w22-fs0/test2 /mnt/1 -o vers=3.1.1,username=xxx,password=yyy $ ls -l /mnt/1/symlink2 lrwxr-xr-x 1 root root 15 Jun 20 14:22 /mnt/1/symlink2 -> '/??/C:/testfile' $ mkdir -p /??/C:; echo foo > //??/C:/testfile $ cat /mnt/1/symlink2 foo Cc: linux-cifs@vger.kernel.org Reported-by: Pierguido Lambri <plambri@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Stefan Metzmacher <metze@samba.org> Fixes: `12b466eb52` ("cifs: Fix creating and resolving absolute NT-style symlinks") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-07-03 18:43:04 -05:00
Wang Zhaolong	266b5d02e1	smb: client: fix race condition in negotiate timeout by using more precise timing When the SMB server reboots and the client immediately accesses the mount point, a race condition can occur that causes operations to fail with "Host is down" error. Reproduction steps: # Mount SMB share mount -t cifs //192.168.245.109/TEST /mnt/ -o xxxx ls /mnt # Reboot server ssh root@192.168.245.109 reboot ssh root@192.168.245.109 /path/to/cifs_server_setup.sh ssh root@192.168.245.109 systemctl stop firewalld # Immediate access fails ls /mnt ls: cannot access '/mnt': Host is down # But works if there is a delay The issue is caused by a race condition between negotiate and reconnect. The 20-second negotiate timeout mechanism can interfere with the normal recovery process when both are triggered simultaneously. ls cifsd --------------------------------------------------- cifs_getattr cifs_revalidate_dentry cifs_get_inode_info cifs_get_fattr smb2_query_path_info smb2_compound_op SMB2_open_init smb2_reconnect cifs_negotiate_protocol smb2_negotiate cifs_send_recv smb_send_rqst wait_for_response cifs_demultiplex_thread cifs_read_from_socket cifs_readv_from_socket server_unresponsive cifs_reconnect __cifs_reconnect cifs_abort_connection mid->mid_state = MID_RETRY_NEEDED cifs_wake_up_task cifs_sync_mid_result // case MID_RETRY_NEEDED rc = -EAGAIN; // In smb2_negotiate() rc = -EHOSTDOWN; The server_unresponsive() timeout triggers cifs_reconnect(), which aborts ongoing mid requests and causes the ls command to receive -EAGAIN, leading to -EHOSTDOWN. Fix this by introducing a dedicated `neg_start` field to precisely tracks when the negotiate process begins. The timeout check now uses this accurate timestamp instead of `lstrp`, ensuring that: 1. Timeout is only triggered after negotiate has actually run for 20s 2. The mechanism doesn't interfere with concurrent recovery processes 3. Uninitialized timestamps (value 0) don't trigger false timeouts Fixes: `7ccc146546` ("smb: client: fix hang in wait_for_response() for negproto") Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-07-03 18:41:49 -05:00
Linus Torvalds	4c06e63b92	for-6.16-rc4-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmhmwFQACgkQxWXV+ddt WDsVMA/+NuSth71V0AfiDnFyqjgDMqIlZL2+dqBiTYHXQQHKbqiUlKvYkWICCT6T 1YgDV+95XJYy4TDBoA49Ndd/l+CiDcMLbOYeneIfbJy13ts84jVANPkl4n03gPkF ktibCw15h0MENVctTCPc71dX2X0cV9WPf4iDmoxUZiukDA376akGTArZKwH4tVVg 4qVpzUtDdNOf848D+8DZKGd+ot/RWgEdLkFCZES27BMg/OFemxBK1MU6K8VjxiKF VoaSVJRDXuug8oVBAGNl86XpiSgd4gHyoNNA5b4mhdSWMSBMxUAaILsONT9pNQZA CFyHA1Jp2gLOIzQIzeXwWgXaAOQDtco8YWYaXhf0v0mySs89tweXjOibfj2mU9pS wPaJyeD+nyRDMwPa4VWEws64D3vXX6aKwiThUENuDmxBvrRXjrkGYH9tf0LNzDDe OKv/vOCfeyutxbjKhP+qElMhdh73BZnJ4UCxxYRRDq2v1Mg+k06swl+6uL6xenme a2KLJlwEoG6LAlkpZzV66ZEaIHDyGBZNdVYtuA/G3dDtmlt0aLXDdp1eq7NivS1j aV7cd0JMX89lAUtqKT932ZOw8RoDrUPPjsnXzCaZJ69mMVyEkxyCV+iYHTTJPDga W5Vg8Tq3d1gwxMebZHvyI6wwUhmGA0wUFG2eohYY/tcSrrUlrHQ= =Ke0p -----END PGP SIGNATURE----- Merge tag 'for-6.16-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - tree-log fixes: - fixes of log tracking of directories and subvolumes - fix iteration and error handling of inode references during log replay - fix free space tree rebuild (reported by syzbot) * tag 'for-6.16-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: use btrfs_record_snapshot_destroy() during rmdir btrfs: propagate last_unlink_trans earlier when doing a rmdir btrfs: record new subvolume in parent dir earlier to avoid dir logging races btrfs: fix inode lookup error handling during log replay btrfs: fix iteration of extrefs during log replay btrfs: fix missing error handling when searching for inode refs during log replay btrfs: fix failure to rebuild free space tree using multiple transactions	2025-07-03 13:29:56 -07:00
Linus Torvalds	d32e907d15	xfs: Fixes for 6.16-rc5 Signed-off-by: Carlos Maiolino <cem@kernel.org> -----BEGIN PGP SIGNATURE----- iJUEABMJAB0WIQSmtYVZ/MfVMGUq1GNcsMJ8RxYuYwUCaGKExAAKCRBcsMJ8RxYu Y4fTAXoCMZGmJKwTbcBk/9u2nD1ehULBDBQB+jDEjxQUile2fMvSMndxqAw0Dgt5 RAg055kBfiwXnK92j2dgayVabNDY3HAxcmGe4B3OBC58/7rNINtgdujfj/gtHZLG M0Cko5OICA== =QrM0 -----END PGP SIGNATURE----- Merge tag 'xfs-fixes-6.16-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Carlos Maiolino: - Fix umount hang with unflushable inodes (and add new tracepoint used for debugging this) - Fix ABBA deadlock in xfs_reclaim_inode() vs xfs_ifree_cluster() - Fix dquot buffer pin deadlock * tag 'xfs-fixes-6.16-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: add FALLOC_FL_ALLOCATE_RANGE to supported flags mask xfs: fix unmount hang with unflushable inodes stuck in the AIL xfs: factor out stale buffer item completion xfs: rearrange code in xfs_buf_item.c xfs: add tracepoints for stale pinned inode state debug xfs: avoid dquot buffer pin deadlock xfs: catch stale AGF/AGF metadata xfs: xfs_ifree_cluster vs xfs_iflush_shutdown_abort deadlock xfs: actually use the xfs_growfs_check_rtgeom tracepoint xfs: Improve error handling in xfs_mru_cache_create() xfs: move xfs_submit_zoned_bio a bit xfs: use xfs_readonly_buftarg in xfs_remount_rw xfs: remove NULL pointer checks in xfs_mru_cache_insert xfs: check for shutdown before going to sleep in xfs_select_zone	2025-07-03 09:00:04 -07:00
Carolina Jubran	42401c4238	netlink: introduce type-checking attribute iteration for nlmsg Add the nlmsg_for_each_attr_type() macro to simplify iteration over attributes of a specific type in a Netlink message. Convert existing users in vxlan and nfsd to use the new macro. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250629142138.361537-2-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 15:39:04 -07:00
Christian Brauner	1e7ab6f678	anon_inode: rework assertions Making anonymous inodes regular files comes with a lot of risk and regression potential as evidenced by a recent hickup in io_uring. We're better of continuing to not have them be regular files. Since we have S_ANON_INODE we can port all of our assertions easily. Link: https://lore.kernel.org/20250702-work-fixes-v1-1-ff76ea589e33@kernel.org Fixes: `cfd86ef7e8` ("anon_inode: use a proper mode internally") Acked-by: Jens Axboe <axboe@kernel.dk> Cc: stable@kernel.org Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-02 14:41:39 +02:00
Kent Overstreet	94426e4201	bcachefs: opts.casefold_disabled Add an option for completely disabling casefolding on a filesystem, as a workaround for overlayfs. This should only be needed as a temporary workaround, until the overlayfs fix arrives. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-07-01 19:33:46 -04:00
Kent Overstreet	c6e8d51b37	bcachefs: Work around deadlock to btree node rewrites in journal replay Don't mark btree nodes for rewrites, if they are or would be degraded, if journal replay hasn't finished, to avoid a deadlock. This is because btree node rewrites generate more updates for the interior updates (alloc, backpointers), and if those updates touch new nodes and generate more rewrites - we can only have so many interior btree updates in flight before we deadlock on open_buckets. The biggest cause is that we don't use the btree write buffer (for the backpointer updates - this needs some real thought on locking in order to fix. The problem with this workaround (not doing the rewrite for degraded nodes in journal replay) is that those degraded nodes persist, and we don't want that (this is a real bug when a btree node write completes with fewer replicas than we wanted and leaves a degraded node due to device _removal_, i.e. the device went away mid write). It's less of a bug here, but still a problem because we don't yet have a way of tracking degraded data - we another index (all extents/btree nodes, by replicas entry) in order to fix properly (re-replicate degraded data at the earliest possible time). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2025-07-01 19:33:46 -04:00
Linus Torvalds	ce95858aee	NFS Client Bugfixes for Linux 6.16 * Fix loop in GSS sequence number cache * Clean up /proc/net/rpc/nfs if nfs_fs_proc_net_init() fails * Fix a race to wake on NFS_LAYOUT_DRAIN * Fix handling of NFS level errors in I/O -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmhkRaUACgkQ18tUv7Cl QOskFxAAiyf6OIZ6M1mBfmbDDb+O5Gl6zofn0+OW2V9puJT/0U7pgNzepK9gFtO8 o3Bq0/GU3I2oxO7wEFrWFQXl3hkPvMqCN7ai1Vb2DjRGWhu97E0Mk3DWltSmuDFQ IaofuURjJdhgvjLb03mI6ReQNxONbMU3qD0JgK4/WIfvm44574Fah6jTnod32G23 EHj8cBw+iIvGh8MmPb4g01XivMGM36bA08NP4qkU/wgeLnkJzFYb5XZf16v821T6 ZxwwruclX2fbpLtsQsHfJpOgW/TFRJTyjBcZw581H8fpkgh1PlJ96OFwrbOU7RCp gVzDw3hvWoKFaMjVlkKk3wSWzwtMWLnB8a7TmgssuNU+DqmN3qMzkaRqrOxWSYMc t7SycQ+PReaR2gQdlJNrN5/Q75OLpqplwPi6O5cqOMQXC2aMK+nhXVW9QiC1SPFI ZcymKk4anzdgIgH+8TR3JpFVmPoEuuIeLV24+DQ0rlh7+4SI3TooTygfsl3/DErb 6Ic6nXgeSBWBPvuemnPbsq9DuAqGFbLrbdutVu4LUx/9XoGd8AfA9dVLMIb/0hgm C3Lwt1xeata8dz1v2jHHS1Tzs8ZphXnUCU7gzcf4TDs3UQUGzKnnNfdfb1r2cvxU LVz2guJ9xH4r3TsVNn2GQijbccxwPVFxszzPm0JobxiQYOna0Ss= =F4+b -----END PGP SIGNATURE----- Merge tag 'nfs-for-6.16-2' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client fixes from Anna Schumaker: - Fix loop in GSS sequence number cache - Clean up /proc/net/rpc/nfs if nfs_fs_proc_net_init() fails - Fix a race to wake on NFS_LAYOUT_DRAIN - Fix handling of NFS level errors in I/O * tag 'nfs-for-6.16-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4/flexfiles: Fix handling of NFS level errors in I/O NFSv4/pNFS: Fix a race to wake on NFS_LAYOUT_DRAIN nfs: Clean up /proc/net/rpc/nfs when nfs_fs_proc_net_init() fails. sunrpc: fix loop in gss seqno cache	2025-07-01 13:42:30 -07:00
David Howells	90b3ccf514	netfs: Update tracepoints in a number of ways Make a number of updates to the netfs tracepoints: (1) Remove a duplicate trace from netfs_unbuffered_write_iter_locked(). (2) Move the trace in netfs_wake_rreq_flag() to after the flag is cleared so that the change appears in the trace. (3) Differentiate the use of netfs_rreq_trace_wait/woke_queue symbols. (4) Don't do so many trace emissions in the wait functions as some of them are redundant. (5) In netfs_collect_read_results(), differentiate a subreq that's being abandoned vs one that has been consumed in a regular way. (6) Add a tracepoint to indicate the call to ->ki_complete(). (7) Don't double-increment the subreq_counter when retrying a write. (8) Move the netfs_sreq_trace_io_progress tracepoint within cifs code to just MID_RESPONSE_RECEIVED and add different tracepoints for other MID states and note check failure. Signed-off-by: David Howells <dhowells@redhat.com> Co-developed-by: Paulo Alcantara <pc@manguebit.org> Signed-off-by: Paulo Alcantara <pc@manguebit.org> Link: https://lore.kernel.org/20250701163852.2171681-14-dhowells@redhat.com cc: Steve French <sfrench@samba.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-cifs@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-01 22:37:14 +02:00
David Howells	4e32541076	netfs: Renumber the NETFS_RREQ_* flags to make traces easier to read Renumber the NETFS_RREQ_* flags to put the most useful status bits in the bottom nibble - and therefore the last hex digit in the trace output - making it easier to grasp the state at a glance. In particular, put the IN_PROGRESS flag in bit 0 and ALL_QUEUED at bit 1. Also make the flags field in /proc/fs/netfs/requests larger to accommodate all the flags. Also make the flags field in the netfs_sreq tracepoint larger to accommodate all the NETFS_SREQ_* flags. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250701163852.2171681-13-dhowells@redhat.com Reviewed-by: Paulo Alcantara <pc@manguebit.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-01 22:37:14 +02:00
David Howells	5e1e6ec2e3	netfs: Merge i_size update functions Netfslib has two functions for updating the i_size after a write: one for buffered writes into the pagecache and one for direct/unbuffered writes. However, what needs to be done is much the same in both cases, so merge them together. This does raise one question, though: should updating the i_size after a direct write do the same estimated update of i_blocks as is done for buffered writes. Also get rid of the cleanup function pointer from netfs_io_request as it's only used for direct write to update i_size; instead do the i_size setting directly from write collection. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250701163852.2171681-12-dhowells@redhat.com cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-01 22:37:14 +02:00
David Howells	2e0658940d	netfs: Fix i_size updating Fix the updating of i_size, particularly in regard to the completion of DIO writes and especially async DIO writes by using a lock. The bug is triggered occasionally by the generic/207 xfstest as it chucks a bunch of AIO DIO writes at the filesystem and then checks that fstat() returns a reasonable st_size as each completes. The problem is that netfs is trying to do "if new_size > inode->i_size, update inode->i_size" sort of thing but without a lock around it. This can be seen with cifs, but shouldn't be seen with kafs because kafs serialises modification ops on the client whereas cifs sends the requests to the server as they're generated and lets the server order them. Fixes: `153a9961b5` ("netfs: Implement unbuffered/DIO write support") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250701163852.2171681-11-dhowells@redhat.com Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-01 22:37:14 +02:00
Paulo Alcantara	74ee76bea4	smb: client: set missing retry flag in cifs_writev_callback() Set NETFS_SREQ_NEED_RETRY flag to tell netfslib that the subreq needs to be retried. Fixes: `ee4cdf7ba8` ("netfs: Speed up buffered reading") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250701163852.2171681-9-dhowells@redhat.com Tested-by: Steve French <sfrench@samba.org> Cc: linux-cifs@vger.kernel.org Cc: netfs@lists.linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-01 22:37:13 +02:00
Paulo Alcantara	0e60bae24a	smb: client: set missing retry flag in cifs_readv_callback() Set NETFS_SREQ_NEED_RETRY flag to tell netfslib that the subreq needs to be retried. Fixes: `ee4cdf7ba8` ("netfs: Speed up buffered reading") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250701163852.2171681-8-dhowells@redhat.com Tested-by: Steve French <sfrench@samba.org> Cc: linux-cifs@vger.kernel.org Cc: netfs@lists.linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-01 22:37:13 +02:00
Paulo Alcantara	e67e75edeb	smb: client: set missing retry flag in smb2_writev_callback() Set NETFS_SREQ_NEED_RETRY flag to tell netfslib that the subreq needs to be retried. Fixes: `ee4cdf7ba8` ("netfs: Speed up buffered reading") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250701163852.2171681-7-dhowells@redhat.com Tested-by: Steve French <sfrench@samba.org> Cc: linux-cifs@vger.kernel.org Cc: netfs@lists.linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-01 22:37:13 +02:00

1 2 3 4 5 ...

99719 Commits