linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00

Author	SHA1	Message	Date
Luca Ceresoli	178ac97535	drm/panel: simple: add Tianma P0700WXF1MBAA panel Add the Tianma P0700WXF1MBAA 7" 1280x800 LVDS RGB TFT LCD panel. Reuse the timings of the TM070JDHG34-00 as they are identical, even though they are described differently by the datasheet as noted in the comment. Power up/down timing are slightly different, so add a new struct panel_desc for that. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Link: https://lore.kernel.org/r/20250411-tianma-p0700wxf1mbaa-v3-3-acbefe9ea669@bootlin.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250411-tianma-p0700wxf1mbaa-v3-3-acbefe9ea669@bootlin.com	2025-04-17 17:38:18 +02:00
Luca Ceresoli	716c75afd8	drm/panel: simple: Tianma TM070JDHG34-00: add delays Add power on/off delays for the Tianma TM070JDHG34-00. Fixes: `bf6daaa281` ("drm/panel: simple: Add Tianma TM070JDHG34-00 panel support") Cc: stable@vger.kernel.org Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250411-tianma-p0700wxf1mbaa-v3-2-acbefe9ea669@bootlin.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250411-tianma-p0700wxf1mbaa-v3-2-acbefe9ea669@bootlin.com	2025-04-17 17:38:18 +02:00
Luca Ceresoli	12ad686ffd	dt-bindings: display: simple: Add Tianma P0700WXF1MBAA panel Add the Tianma Micro-electronics P0700WXF1MBAA 7.0" LVDS LCD TFT panel. Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Link: https://lore.kernel.org/r/20250411-tianma-p0700wxf1mbaa-v3-1-acbefe9ea669@bootlin.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250411-tianma-p0700wxf1mbaa-v3-1-acbefe9ea669@bootlin.com	2025-04-17 17:38:17 +02:00
Bo-Cun Chen	1b66124135	net: ethernet: mtk_eth_soc: revise QDMA packet scheduler settings The QDMA packet scheduler suffers from a performance issue. Fix this by picking up changes from MediaTek's SDK which change to use Token Bucket instead of Leaky Bucket and fix the SPEED_1000 configuration. Fixes: `160d3a9b19` ("net: ethernet: mtk_eth_soc: introduce MTK_NETSYS_V2 support") Signed-off-by: Bo-Cun Chen <bc-bocun.chen@mediatek.com> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/18040f60f9e2f5855036b75b28c4332a2d2ebdd8.1744764277.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-17 08:13:41 -07:00
Bo-Cun Chen	6b02eb372c	net: ethernet: mtk_eth_soc: correct the max weight of the queue limit for 100Mbps Without this patch, the maximum weight of the queue limit will be incorrect when linked at 100Mbps due to an apparent typo. Fixes: `f63959c7ee` ("net: ethernet: mtk_eth_soc: implement multi-queue support for per-port queues") Signed-off-by: Bo-Cun Chen <bc-bocun.chen@mediatek.com> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/74111ba0bdb13743313999ed467ce564e8189006.1744764277.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-17 08:13:41 -07:00
Bo-Cun Chen	6bc2b6c6f1	net: ethernet: mtk_eth_soc: reapply mdc divider on reset In the current method, the MDC divider was reset to the default setting of 2.5MHz after the NETSYS SER. Therefore, we need to reapply the MDC divider configuration function in mtk_hw_init() after reset. Fixes: `c0a440031d` ("net: ethernet: mtk_eth_soc: set MDIO bus clock frequency") Signed-off-by: Bo-Cun Chen <bc-bocun.chen@mediatek.com> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/8ab7381447e6cdcb317d5b5a6ddd90a1734efcb0.1744764277.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-17 08:13:40 -07:00
Lu Baolu	4f1492efb4	iommu/vt-d: Revert ATS timing change to fix boot failure Commit <5518f239aff1> ("iommu/vt-d: Move scalable mode ATS enablement to probe path") changed the PCI ATS enablement logic to run earlier, specifically before the default domain attachment. On some client platforms, this change resulted in boot failures, causing the kernel to panic with the following message and call trace: Kernel panic - not syncing: DMAR hardware is malfunctioning CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #175 Call Trace: <TASK> dump_stack_lvl+0x6f/0xb0 dump_stack+0x10/0x16 panic+0x10a/0x2b7 iommu_enable_translation.cold+0xc/0xc intel_iommu_init+0xe39/0xec0 ? trace_hardirqs_on+0x1e/0xd0 ? __pfx_pci_iommu_init+0x10/0x10 pci_iommu_init+0xd/0x40 do_one_initcall+0x5b/0x390 kernel_init_freeable+0x26d/0x2b0 ? __pfx_kernel_init+0x10/0x10 kernel_init+0x15/0x120 ret_from_fork+0x35/0x60 ? __pfx_kernel_init+0x10/0x10 ret_from_fork_asm+0x1a/0x30 RIP: 1f0f:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 0000:0000000000000000 EFLAGS: 841f0f2e66 ORIG_RAX: 1f0f2e6600000000 RAX: 0000000000000000 RBX: 1f0f2e6600000000 RCX: 2e66000000000084 RDX: 0000000000841f0f RSI: 000000841f0f2e66 RDI: 00841f0f2e660000 RBP: 00841f0f2e660000 R08: 00841f0f2e660000 R09: 000000841f0f2e66 R10: 0000000000841f0f R11: 2e66000000000084 R12: 000000841f0f2e66 R13: 0000000000841f0f R14: 2e66000000000084 R15: 1f0f2e6600000000 </TASK> ---[ end Kernel panic - not syncing: DMAR hardware is malfunctioning ]--- Fix this by reverting the timing change for ATS enablement introduced by the offending commit and restoring the previous behavior. Fixes: `5518f239af` ("iommu/vt-d: Move scalable mode ATS enablement to probe path") Reported-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Closes: https://lore.kernel.org/linux-iommu/01b9c72f-460d-4f77-b696-54c6825babc9@linux.intel.com/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250416073608.1799578-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:45:28 +02:00
Nicolin Chen	30a3f2f3e4	iommu: Fix two issues in iommu_copy_struct_from_user() In the review for iommu_copy_struct_to_user() helper, Matt pointed out that a NULL pointer should be rejected prior to dereferencing it: https://lore.kernel.org/all/86881827-8E2D-461C-BDA3-FA8FD14C343C@nvidia.com And Alok pointed out a typo at the same time: https://lore.kernel.org/all/480536af-6830-43ce-a327-adbd13dc3f1d@oracle.com Since both issues were copied from iommu_copy_struct_from_user(), fix them first in the current header. Fixes: `e9d36c07bb` ("iommu: Add iommu_copy_struct_from_user helper") Cc: stable@vger.kernel.org Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Matthew R. Ochs <mochs@nvidia.com> Link: https://lore.kernel.org/r/20250414191635.450472-1-nicolinc@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:44:27 +02:00
Pavel Paklov	8dee308e4c	iommu/amd: Fix potential buffer overflow in parse_ivrs_acpihid There is a string parsing logic error which can lead to an overflow of hid or uid buffers. Comparing ACPIID_LEN against a total string length doesn't take into account the lengths of individual hid and uid buffers so the check is insufficient in some cases. For example if the length of hid string is 4 and the length of the uid string is 260, the length of str will be equal to ACPIID_LEN + 1 but uid string will overflow uid buffer which size is 256. The same applies to the hid string with length 13 and uid string with length 250. Check the length of hid and uid strings separately to prevent buffer overflow. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `ca3bf5d47c` ("iommu/amd: Introduces ivrs_acpihid kernel parameter") Cc: stable@vger.kernel.org Signed-off-by: Pavel Paklov <Pavel.Paklov@cyberprotect.ru> Link: https://lore.kernel.org/r/20250325092259.392844-1-Pavel.Paklov@cyberprotect.ru Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:37:21 +02:00
Nitesh Shetty	80c7378f94	io_uring/rsrc: send exact nr_segs for fixed buffer Sending exact nr_segs, avoids bio split check and processing in block layer, which takes around 5%[1] of overall CPU utilization. In our setup, we see overall improvement of IOPS from 7.15M to 7.65M [2] and 5% less CPU utilization. [1] 3.52% io_uring [kernel.kallsyms] [k] bio_split_rw_at 1.42% io_uring [kernel.kallsyms] [k] bio_split_rw 0.62% io_uring [kernel.kallsyms] [k] bio_submit_split [2] sudo taskset -c 0,1 ./t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nvme0n1 /dev/nvme1n1 Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> [Pavel: fixed for kbuf, rebased and reworked on top of cleanups] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7a1a49a8d053bd617c244291d63dbfbc07afde36.1744882081.git.asml.silence@gmail.com [axboe: fold in fix factoring in buf reg offset] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-04-17 07:42:14 -06:00
Danilo Krummrich	7bd1710aac	rust: dma: require a bound device Require the Bound device context to be able to create new dma::CoherentAllocation instances. DMA memory allocations are only valid to be created for bound devices. Link: https://lore.kernel.org/r/20250413173758.12068-10-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-04-17 15:21:54 +02:00
Danilo Krummrich	f720efda2d	rust: devres: require a bound device Require the Bound device context to be able to a new Devres container. This ensures that we can't register devres callbacks for unbound devices. Link: https://lore.kernel.org/r/20250413173758.12068-9-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-04-17 15:21:52 +02:00
Danilo Krummrich	f2a399d7b6	rust: pci: move iomap_region() to impl Device<Bound> Require the Bound device context to be able to call iomap_region() and iomap_region_sized(). Creating I/O mapping requires the device to be bound. Reviewed-by: Benno Lossin <benno.lossin@proton.me> Link: https://lore.kernel.org/r/20250413173758.12068-8-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-04-17 15:21:50 +02:00
Danilo Krummrich	f933b7489f	rust: device: implement Bound device context The Bound device context indicates that a device is bound to a driver. It must be used for APIs that require the device to be bound, such as Devres or dma::CoherentAllocation. Implement Bound and add the corresponding Deref hierarchy, as well as the corresponding ARef conversion for this device context. Reviewed-by: Benno Lossin <benno.lossin@proton.me> Link: https://lore.kernel.org/r/20250413173758.12068-7-dakr@kernel.org [ Add missing `::` prefix in macros. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-04-17 15:21:48 +02:00
Danilo Krummrich	3edaefbf2b	rust: pci: preserve device context in AsRef Since device::Device has a generic over its context, preserve this device context in AsRef. For instance, when calling pci::Device<Core> the new AsRef implementation returns device::Device<Core>. Reviewed-by: Benno Lossin <benno.lossin@proton.me> Link: https://lore.kernel.org/r/20250413173758.12068-6-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-04-17 15:21:47 +02:00
Danilo Krummrich	da6c47c6cb	rust: platform: preserve device context in AsRef Since device::Device has a generic over its context, preserve this device context in AsRef. For instance, when calling platform::Device<Core> the new AsRef implementation returns device::Device<Core>. Reviewed-by: Benno Lossin <benno.lossin@proton.me> Link: https://lore.kernel.org/r/20250413173758.12068-5-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-04-17 15:21:45 +02:00
Danilo Krummrich	d32e4c24a7	rust: device: implement device context for Device Analogous to bus specific device, implement the DeviceContext generic for generic devices. This is used for APIs that work with generic devices (such as Devres) to evaluate the device's context. Reviewed-by: Benno Lossin <benno.lossin@proton.me> Link: https://lore.kernel.org/r/20250413173758.12068-4-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-04-17 15:21:39 +02:00
Danilo Krummrich	fbb92b6a53	rust: device: implement impl_device_context_into_aref! Implement a macro to implement all From conversions of a certain device to ARef<Device>. This avoids unnecessary boiler plate code for every device implementation. Reviewed-by: Benno Lossin <benno.lossin@proton.me> Link: https://lore.kernel.org/r/20250413173758.12068-3-dakr@kernel.org [ Add missing `::` prefix in macros. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-04-17 15:21:27 +02:00
Danilo Krummrich	cb271c2edf	rust: device: implement impl_device_context_deref! The Deref hierarchy for device context generics is the same for every (bus specific) device. Implement those with a generic macro to avoid duplicated boiler plate code and ensure the correct Deref hierarchy for every device implementation. Co-developed-by: Benno Lossin <benno.lossin@proton.me> Signed-off-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Christian Schrefl <chrisi.schrefl@gmail.com> Link: https://lore.kernel.org/r/20250413173758.12068-2-dakr@kernel.org [ Add missing `::` prefix in macros. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-04-17 15:21:11 +02:00
Paolo Abeni	8e57ce3c32	netfilter pull request 25-04-17 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEjF9xRqF1emXiQiqU1w0aZmrPKyEFAmgA1LAACgkQ1w0aZmrP KyHLThAAw/aRaQmwWB7yAYuCR+VzdB+SzafLVR4PlZpH0aGG1dNucIf0y3eI016M ZixJsOithlpqNEFMKi6WfuKxX2h835ck56VMhYuuCW2thaOeXiyut+P6jAaTWVKV u9tjfYICtI2KTGWGBekhKOSxmATWzDqBRCa1uO3Qsjz+J/ti4sW8YsSSs4BE91Uu zecK00lSkoNTfU5HuUOpTTeStpiQcXxfO1Arx3HLtdf6IqZLtbKk4hkgJw5DtkNB SS4nbvnhWx8CbngdMT6EJas47TK8sJPvgWTBHPeSlYjcPV9p2QanQ3RNMASf2AAE OL2pPKWICUVLKTpZ4UlsNDm+0a1D3Bz7ET4CtJAmpa6sennl0cDZYsDkRCXV/Dz7 putS+BuX3SSk/G1XqRh3VZOUf/G0Cyko2Viyj6gWP8A+TeGzAm0EtHWURP2TNodS 1DTr0BB3/tf+uJ2WCKBng30XtyPfQz0soDPJ0LrvFQwkWotVPyOWL37cVrEs/yyF FjiwJkeEUBZNc8HRM3KADm69Q5LK7YLN/lNrgOCNiJeQI52rhNd7rdEheuvSCiwc IEer5EXgj2dubh0MeEKbUvrwXqZGrdquec7/KftTrfqo+vec7+9ycNlnjhq/k1Qb Ni5k7Svor5+avW/H39rUO6b6DHJoG0zaAppgrRrmniDQ330Sjok= =VG/D -----END PGP SIGNATURE----- Merge tag 'nf-25-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fix for net The following batch contains one Netfilter fix for net: 1) conntrack offload bit is erroneously unset in a race scenario, from Florian Westphal. netfilter pull request 25-04-17 * tag 'nf-25-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: conntrack: fix erronous removal of offload bit ==================== Link: https://patch.msgid.link/20250417102847.16640-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-17 15:20:41 +02:00
Pavel Begunkov	59852ebad9	io_uring/rsrc: refactor io_import_fixed io_import_fixed is a mess. Even though we know the final len of the iterator, we still assign offset + len and do some magic after to correct for that. Do offset calculation first and finalise it with iov_iter_bvec at the end. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2d5107fed24f8b23245ef2ede9a5a7f7c426df61.1744882081.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-04-17 06:21:13 -06:00
Pavel Begunkov	50169d0754	io_uring/rsrc: separate kbuf offset adjustments Kernel registered buffers are special because segments are not uniform in size, and we have a bunch of optimisations based on that uniformity for normal buffers. Handle kbuf separately, it'll be cleaner this way. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4e9e5990b0ab5aee723c0be5cd9b5bcf810375f9.1744882081.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-04-17 06:21:13 -06:00
Pavel Begunkov	1ac5712888	io_uring/rsrc: don't skip offset calculation Don't optimise for requests with offset=0. Large registered buffers are the preference and hence the user is likely to pass an offset, and the adjustments are not expensive and will be made even cheaper in following patches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1c2beb20470ee3c886a363d4d8340d3790db19f3.1744882081.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-04-17 06:21:13 -06:00
Kan Liang	7950de14ff	perf/x86/intel: Add Panther Lake support From PMU's perspective, Panther Lake is similar to the previous generation Lunar Lake. Both are hybrid platforms, with e-core and p-core. The key differences are the ARCH PEBS feature and several new events. The ARCH PEBS is supported in the following patches. The new events will be supported later in perf tool. Share the code path with the Lunar Lake. Only update the name. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20250415114428.341182-2-dapeng1.mi@linux.intel.com	2025-04-17 14:19:59 +02:00
Dapeng Mi	71dcc11c2c	perf/x86/intel: Allow to update user space GPRs from PEBS records Currently when a user samples user space GPRs (--user-regs option) with PEBS, the user space GPRs actually always come from software PMI instead of from PEBS hardware. This leads to the sampled GPRs to possibly be inaccurate for single PEBS record case because of the skid between counter overflow and GPRs sampling on PMI. For the large PEBS case, it is even worse. If user sets the exclude_kernel attribute, large PEBS would be used to sample user space GPRs, but since PEBS GPRs group is not really enabled, it leads to all samples in the large PEBS record to share the same piece of user space GPRs, like this reproducer shows: $ perf record -e branches:pu --user-regs=ip,ax -c 100000 ./foo $ perf report -D \| grep "AX" .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead .... AX 0x000000003a0d4ead So enable GPRs group for user space GPRs sampling and prioritize reading GPRs from PEBS. If the PEBS sampled GPRs is not user space GPRs (single PEBS record case), perf_sample_regs_user() modifies them to user space GPRs. [ mingo: Clarified the changelog. ] Fixes: `c22497f583` ("perf/x86/intel: Support adaptive PEBS v4") Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250415104135.318169-2-dapeng1.mi@linux.intel.com	2025-04-17 14:19:38 +02:00
Dapeng Mi	a5f5e1238f	perf/x86/intel: Don't clear perf metrics overflow bit unconditionally The below code would always unconditionally clear other status bits like perf metrics overflow bit once PEBS buffer overflows: status &= intel_ctrl \| GLOBAL_STATUS_TRACE_TOPAPMI; This is incorrect. Perf metrics overflow bit should be cleared only when fixed counter 3 in PEBS counter group. Otherwise perf metrics overflow could be missed to handle. Closes: https://lore.kernel.org/all/20250225110012.GK31462@noisy.programming.kicks-ass.net/ Fixes: `7b2c05a15d` ("perf/x86/intel: Generic support for hardware TopDown metrics") Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250415104135.318169-1-dapeng1.mi@linux.intel.com	2025-04-17 14:19:07 +02:00
Jens Axboe	81dd1feb19	nvme fixes for Linux 6.15 - fix scan failure for non-ANA multipath controllers (Hannes Reinecke) - fix multipath sysfs links creation for some cases (Hannes Reinecke) - PCIe endpoint fixes (Damien Le Moal) - use NULL instead of 0 in the auth code (Damien Le Moal) -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmgAo4wLHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYNmfw/9EbS9qrtCo8n8kwkhez7mzx+d36fveTGdA+XApm/3 M2ZHwosFqw8TgBQTVxvqqPeUit8ZeD78IGvSOwK7p+bO8R55h1m2BxVKbY+1VJhE r+5Cdq5BYsXb3C0P1MWnW3pu4VTzC/+kH1SH4yNw00rW//+W5xBgpKfWglzzMsw5 FGqgLa2venfMRGuqXtoya3iTPhyPvIQMzPXwhkLd/iBVioWuJlgZNX0vspzyYmjo XES90XcASItOa9aR5tdPz0nQ+cmVz7yEGJkzUNX2is50shn3llR43Jr2mLEyMKS9 3ZGfsX7jHLPv8l7aPOs5chGJOSOGYU+oLV64dKM8X/RtfyY4XCT6ImE6Eqstvblu q8x0lMk5iYX/wdNNJh2LH8/6RC30TxGn8/PGpm2qK2RYBOwF+kBOdRvr2F9w1MNV SAzJv4vQQd6+JjwubvvJknky2sB7ogZSgY1m+g+VoAGDrwPvS/PQPkfYUrJ8LH2O FaKoS/U+KOEJhcqEmJ//LQNZh+YpvXXdEMAHzC8ifhBqEBY/1F4+AcKpubYNGrHU 2jlvH9tgxSkA450McDPmNO+62ZH0gPvCHZbbABJIGuhoiR+lcycuu1l3uhJbms4p X5ZSYmzXbBRCTviOL0Qs9wPcNOUh/FLVN+DUS7ticvTDg8DLm2SA/fBiXtcnWw3b x2Y= =ZSJ5 -----END PGP SIGNATURE----- Merge tag 'nvme-6.15-2025-04-17' of git://git.infradead.org/nvme into block-6.15 Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.15 - fix scan failure for non-ANA multipath controllers (Hannes Reinecke) - fix multipath sysfs links creation for some cases (Hannes Reinecke) - PCIe endpoint fixes (Damien Le Moal) - use NULL instead of 0 in the auth code (Damien Le Moal)" * tag 'nvme-6.15-2025-04-17' of git://git.infradead.org/nvme: nvmet: pci-epf: cleanup link state management nvmet: pci-epf: clear CC and CSTS when disabling the controller nvmet: pci-epf: always fully initialize completion entries nvmet: auth: use NULL to clear a pointer in nvmet_auth_sq_free() nvme-multipath: sysfs links may not be created for devices nvme: fixup scan failure for non-ANA multipath controllers	2025-04-17 06:18:49 -06:00
Robin Murphy	2d00c34d66	iommu/arm-smmu-v3: Fail aliasing StreamIDs more gracefully We've never supported StreamID aliasing between devices, and as such they will never have had functioning DMA, but this is not fatal to the SMMU itself. Although aliasing between hard-wired platform device StreamIDs would tend to raise questions about the whole system, in practice it's far more likely to occur relatively innocently due to legacy PCI bridges, where the underlying StreamID mappings are still perfectly reasonable. As such, return a more benign -ENODEV when failing probe for such an unsupported device (and log a more obvious error message), so that it doesn't break the entire SMMU probe now that bus_iommu_probe() runs in the right order and can propagate that error back. The end result is still that the device doesn't get an IOMMU group and probably won't work, same as before. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/39d54e49c8476efc4653e352150d44b185d6d50f.1744380554.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-04-17 12:42:02 +01:00
Nicolin Chen	b00d24997a	iommu/arm-smmu-v3: Fix iommu_device_probe bug due to duplicated stream ids ASPEED VGA card has two built-in devices: 0008:06:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 06) 0008:07:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52) Its toplogy looks like this: +-[0008:00]---00.0-[01-09]--+-00.0-[02-09]--+-00.0-[03]----00.0 Sandisk Corp Device 5017 \| +-01.0-[04]-- \| +-02.0-[05]----00.0 NVIDIA Corporation Device \| +-03.0-[06-07]----00.0-[07]----00.0 ASPEED Technology, Inc. ASPEED Graphics Family \| +-04.0-[08]----00.0 Renesas Technology Corp. uPD720201 USB 3.0 Host Controller \| \-05.0-[09]----00.0 Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller \-00.1 PMC-Sierra Inc. Device 4028 The IORT logic populaties two identical IDs into the fwspec->ids array via DMA aliasing in iort_pci_iommu_init() called by pci_for_each_dma_alias(). Though the SMMU driver had been able to handle this situation since commit `563b5cbe33` ("iommu/arm-smmu-v3: Cope with duplicated Stream IDs"), that got broken by the later commit `cdf315f907` ("iommu/arm-smmu-v3: Maintain a SID->device structure"), which ended up with allocating separate streams with the same stuffing. On a kernel prior to v6.15-rc1, there has been an overlooked warning: pci 0008:07:00.0: vgaarb: setting as boot VGA device pci 0008:07:00.0: vgaarb: bridge control possible pci 0008:07:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none pcieport 0008:06:00.0: Adding to iommu group 14 ast 0008:07:00.0: stream 67328 already in tree <===== WARNING ast 0008:07:00.0: enabling device (0002 -> 0003) ast 0008:07:00.0: Using default configuration ast 0008:07:00.0: AST 2600 detected ast 0008:07:00.0: [drm] Using analog VGA ast 0008:07:00.0: [drm] dram MCLK=396 Mhz type=1 bus_width=16 [drm] Initialized ast 0.1.0 for 0008:07:00.0 on minor 0 ast 0008:07:00.0: [drm] fb0: astdrmfb frame buffer device With v6.15-rc, since the commit `bcb81ac6ae` ("iommu: Get DT/ACPI parsing into the proper probe path"), the error returned with the warning is moved to the SMMU device probe flow: arm_smmu_probe_device+0x15c/0x4c0 __iommu_probe_device+0x150/0x4f8 probe_iommu_group+0x44/0x80 bus_for_each_dev+0x7c/0x100 bus_iommu_probe+0x48/0x1a8 iommu_device_register+0xb8/0x178 arm_smmu_device_probe+0x1350/0x1db0 which then fails the entire SMMU driver probe: pci 0008:06:00.0: Adding to iommu group 21 pci 0008:07:00.0: stream 67328 already in tree arm-smmu-v3 arm-smmu-v3.9.auto: Failed to register iommu arm-smmu-v3 arm-smmu-v3.9.auto: probe with driver arm-smmu-v3 failed with error -22 Since SMMU driver had been already expecting a potential duplicated Stream ID in arm_smmu_install_ste_for_dev(), change the arm_smmu_insert_master() routine to ignore a duplicated ID from the fwspec->sids array as well. Note: this has been failing the iommu_device_probe() since 2021, although a recent iommu commit in v6.15-rc1 that moves iommu_device_probe() started to fail the SMMU driver probe. Since nobody has cared about DMA Alias support, leave that as it was but fix the fundamental iommu_device_probe() breakage. Fixes: `cdf315f907` ("iommu/arm-smmu-v3: Maintain a SID->device structure") Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20250415185620.504299-1-nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>	2025-04-17 12:37:58 +01:00
Balbir Singh	12f7802197	iommu/arm-smmu-v3: Fix pgsize_bit for sva domains UBSan caught a bug with IOMMU SVA domains, where the reported exponent value in __arm_smmu_tlb_inv_range() was >= 64. __arm_smmu_tlb_inv_range() uses the domain's pgsize_bitmap to compute the number of pages to invalidate and the invalidation range. Currently arm_smmu_sva_domain_alloc() does not setup the iommu domain's pgsize_bitmap. This leads to __ffs() on the value returning 64 and that leads to undefined behaviour w.r.t. shift operations Fix this by initializing the iommu_domain's pgsize_bitmap to PAGE_SIZE. Effectively the code needs to use the smallest page size for invalidation Cc: stable@vger.kernel.org Fixes: `eb6c97647b` ("iommu/arm-smmu-v3: Avoid constructing invalid range commands") Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Balbir Singh <balbirs@nvidia.com> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250412002354.3071449-1-balbirs@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>	2025-04-17 12:28:42 +01:00
Aneesh Kumar K.V (Arm)	45e00e3671	iommu/arm-smmu-v3: Add missing S2FWB feature detection Commit `67e4fe3985` ("iommu/arm-smmu-v3: Use S2FWB for NESTED domains") introduced S2FWB usage but omitted the corresponding feature detection. As a result, vIOMMU allocation fails on FVP in arm_vsmmu_alloc(), due to the following check: if (!arm_smmu_master_canwbs(master) && !(smmu->features & ARM_SMMU_FEAT_S2FWB)) return ERR_PTR(-EOPNOTSUPP); This patch adds the missing detection logic to prevent allocation failure when S2FWB is supported. Fixes: `67e4fe3985` ("iommu/arm-smmu-v3: Use S2FWB for NESTED domains") Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Link: https://lore.kernel.org/r/20250408033351.1012411-1-aneesh.kumar@kernel.org Signed-off-by: Will Deacon <will@kernel.org>	2025-04-17 12:25:34 +01:00
Tamura Dai	951a04ab3a	spi: spi-imx: Add check for spi_imx_setupxfer() Add check for the return value of spi_imx_setupxfer(). spi_imx->rx and spi_imx->tx function pointer can be NULL when spi_imx_setupxfer() return error, and make NULL pointer dereference. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Call trace: 0x0 spi_imx_pio_transfer+0x50/0xd8 spi_imx_transfer_one+0x18c/0x858 spi_transfer_one_message+0x43c/0x790 __spi_pump_transfer_message+0x238/0x5d4 __spi_sync+0x2b0/0x454 spi_write_then_read+0x11c/0x200 Signed-off-by: Tamura Dai <kirinode0@gmail.com> Reviewed-by: Carlos Song <carlos.song@nxp.com> Link: https://patch.msgid.link/20250417011700.14436-1-kirinode0@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>	2025-04-17 12:25:12 +01:00
Kurt Borja	4a8e04e2bd	platform/x86: alienware-wmi-wmax: Fix uninitialized variable due to bad error handling wmax_thermal_information() may also return -ENOMSG, which would leave `id` uninitialized in thermal_profile_probe. Reorder and modify logic to catch all errors. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/Z_-KVqNbD9ygvE2X@stanley.mountain Fixes: `27e9e63398` ("platform/x86: alienware-wmi: Refactor thermal control methods") Signed-off-by: Kurt Borja <kuurtb@gmail.com> Link: https://lore.kernel.org/r/20250416-smatch-fix-v1-1-35491b462d8f@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>	2025-04-17 14:16:16 +03:00
Shouye Liu	8d6955ed76	platform/x86/intel-uncore-freq: Fix missing uncore sysfs during CPU hotplug In certain situations, the sysfs for uncore may not be present when all CPUs in a package are offlined and then brought back online after boot. This issue can occur if there is an error in adding the sysfs entry due to a memory allocation failure. Retrying to bring the CPUs online will not resolve the issue, as the uncore_cpu_mask is already set for the package before the failure condition occurs. This issue does not occur if the failure happens during module initialization, as the module will fail to load in the event of any error. To address this, ensure that the uncore_cpu_mask is not set until the successful return of uncore_freq_add_entry(). Fixes: `dbce412a77` ("platform/x86/intel-uncore-freq: Split common and enumeration part") Signed-off-by: Shouye Liu <shouyeliu@tencent.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250417032321.75580-1-shouyeliu@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>	2025-04-17 14:15:30 +03:00
Mario Limonciello	9f5595d5f0	platform/x86/amd: pmc: Require at least 2.5 seconds between HW sleep cycles When an APU exits HW sleep with no active wake sources the Linux kernel will rapidly assert that the APU can enter back into HW sleep. This happens in a few ms. Contrasting this to Windows, Windows can take 10s of seconds to enter back into the resiliency phase for Modern Standby. For some situations this can be problematic because it can cause leakage from VDDCR_SOC to VDD_MISC and force VDD_MISC outside of the electrical design guide specifications. On some designs this will trip the over voltage protection feature (OVP) of the voltage regulator module, but it could cause APU damage as well. To prevent this risk, add an explicit sleep call so that future attempts to enter into HW sleep will have enough time to settle. This will occur while the screen is dark and only on cases that the APU should enter HW sleep again, so it shouldn't be noticeable to any user. Cc: stable@vger.kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Link: https://lore.kernel.org/r/20250414162446.3853194-1-superm1@kernel.org Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>	2025-04-17 14:14:39 +03:00
Paolo Abeni	a43ae7cf55	bluetooth pull request for net: - l2cap: Process valid commands in too long frame - vhci: Avoid needless snprintf() calls -----BEGIN PGP SIGNATURE----- iQJMBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmgAGkwZHGx1aXoudm9u LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKZOmD/UbedvbIrN+fV9mUcTfWsOE UDo+L3GOaowjHDmRu/YR2XD65HWxJGM93u3rksZXfW+wuLWrm6gUfDGphEHT8Xee Oms+qWgBg/qqNX4gm0vmxGY9HeO9o+Ove8BN1cGudJAkZ5P4kz3vQ2Ytcb732tb5 zKgh+JaiUMa3eZnXmYKVpFiZ0V1c1iJwjsp2O+pXrBvM6uoheIZSnTKtFPFF199c I09l30nYeKsZsRNgyCgYU3mOvNtEcHPrhH1Y8sXh5Oy88ffM/G/29FHWb3fd/wCd Mo4G8Ciy70OY4xR1oeS/Mnt8mNIvja3AXlquRtKHvNgcY3e6DgKQZAWXLcMMRmOO hdjy9pIUJLG09OGv92+HEMlnXMRvK9OYJ5GjbBVFxSHNk2b8FaIW7NdMm4xjVClZ qcG5pl2a3Gji5sdsrtoP3jBO6ErHDwI0S5rrBYGiSqvAqBBRvveo9cl2GgPHviEF IT0E6FzzyaR6xu9qpObw+LN13d4egEA2mi9TSHdUn/Vs1TCdk1zjKGOljKpcQWrc It0B4cwDKJPiw5Be8AXhb+OgdYHmzjUsFz9FtXJ8X92nzJLC1ufWybvR2fPnQm47 NScQMRC3TjuWxcwuqZHYcaixO2efq4Kl58f2SF/SdBKlhlfpgOASML2PKp1AOweO lSZNdBpVOgQIpFw2gkS7 =8ull -----END PGP SIGNATURE----- Merge tag 'for-net-2025-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - l2cap: Process valid commands in too long frame - vhci: Avoid needless snprintf() calls * tag 'for-net-2025-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: vhci: Avoid needless snprintf() calls Bluetooth: l2cap: Process valid commands in too long frame ==================== Link: https://patch.msgid.link/20250416210126.2034212-1-luiz.dentz@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-17 13:08:42 +02:00
Kan Liang	506f981ab4	perf/x86/intel/uncore: Fix the scale of IIO free running counters on SPR The scale of IIO bandwidth in free running counters is inherited from the ICX. The counter increments for every 32 bytes rather than 4 bytes. The IIO bandwidth out free running counters don't increment with a consistent size. The increment depends on the requested size. It's impossible to find a fixed increment. Remove it from the event_descs. Fixes: `0378c93a92` ("perf/x86/intel/uncore: Support IIO free-running counters on Sapphire Rapids server") Reported-by: Tang Jun <dukang.tj@alibaba-inc.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250416142426.3933977-3-kan.liang@linux.intel.com	2025-04-17 12:57:32 +02:00
Kan Liang	32c7f11502	perf/x86/intel/uncore: Fix the scale of IIO free running counters on ICX There was a mistake in the ICX uncore spec too. The counter increments for every 32 bytes rather than 4 bytes. The same as SNR, there are 1 ioclk and 8 IIO bandwidth in free running counters. Reuse the snr_uncore_iio_freerunning_events(). Fixes: `2b3b76b5ec` ("perf/x86/intel/uncore: Add Ice Lake server uncore support") Reported-by: Tang Jun <dukang.tj@alibaba-inc.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250416142426.3933977-2-kan.liang@linux.intel.com	2025-04-17 12:57:29 +02:00
Kan Liang	96a720db59	perf/x86/intel/uncore: Fix the scale of IIO free running counters on SNR There was a mistake in the SNR uncore spec. The counter increments for every 32 bytes of data sent from the IO agent to the SOC, not 4 bytes which was documented in the spec. The event list has been updated: "EventName": "UNC_IIO_BANDWIDTH_IN.PART0_FREERUN", "BriefDescription": "Free running counter that increments for every 32 bytes of data sent from the IO agent to the SOC", Update the scale of the IIO bandwidth in free running counters as well. Fixes: `210cc5f9db` ("perf/x86/intel/uncore: Add uncore support for Snow Ridge server") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250416142426.3933977-1-kan.liang@linux.intel.com	2025-04-17 12:57:20 +02:00
Paolo Abeni	f49a372361	Merge branch 'bug-fixes-from-xdp-and-perout-series' Meghana Malladi says: ==================== Bug fixes from XDP and perout series This patch series consists of bug fixes from the XDP series: 1. Fixes a kernel warning that occurs when bringing down the network interface. 2. Resolves a potential NULL pointer dereference in the emac_xmit_xdp_frame() function. 3. Resolves a potential NULL pointer dereference in the icss_iep_perout_enable() function v3: https://lore.kernel.org/all/20250328102403.2626974-1-m-malladi@ti.com/ ==================== Link: https://patch.msgid.link/20250415090543.717991-1-m-malladi@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-17 12:11:26 +02:00
Meghana Malladi	7349c9e997	net: ti: icss-iep: Fix possible NULL pointer dereference for perout request The ICSS IEP driver tracks perout and pps enable state with flags. Currently when disabling pps and perout signals during icss_iep_exit(), results in NULL pointer dereference for perout. To fix the null pointer dereference issue, the icss_iep_perout_enable_hw function can be modified to directly clear the IEP CMP registers when disabling PPS or PEROUT, without referencing the ptp_perout_request structure, as its contents are irrelevant in this case. Fixes: `9b11536124` ("net: ti: icssg-prueth: Fix clearing of IEP_CMP_CFG registers during iep_init") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/7b1c7c36-363a-4085-b26c-4f210bee1df6@stanley.mountain/ Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250415090543.717991-4-m-malladi@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-17 12:11:24 +02:00
Meghana Malladi	8ed2fa6613	net: ti: icssg-prueth: Fix possible NULL pointer dereference inside emac_xmit_xdp_frame() There is an error check inside emac_xmit_xdp_frame() function which is called when the driver wants to transmit XDP frame, to check if the allocated tx descriptor is NULL, if true to exit and return ICSSG_XDP_CONSUMED implying failure in transmission. In this case trying to free a descriptor which is NULL will result in kernel crash due to NULL pointer dereference. Fix this error handling and increase netdev tx_dropped stats in the caller of this function if the function returns ICSSG_XDP_CONSUMED. Fixes: `62aa3246f4` ("net: ti: icssg-prueth: Add XDP support") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/70d8dd76-0c76-42fc-8611-9884937c82f5@stanley.mountain/ Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250415090543.717991-3-m-malladi@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-17 12:11:24 +02:00
Meghana Malladi	75bc744466	net: ti: icssg-prueth: Fix kernel warning while bringing down network interface During network interface initialization, the NIC driver needs to register its Rx queue with the XDP, to ensure the incoming XDP buffer carries a pointer reference to this info and is stored inside xdp_rxq_info. While this struct isn't tied to XDP prog, if there are any changes in Rx queue, the NIC driver needs to stop the Rx queue by unregistering with XDP before purging and reallocating memory. Drop page_pool destroy during Rx channel reset as this is already handled by XDP during xdp_rxq_info_unreg (Rx queue unregister), failing to do will cause the following warning: warning logs: https://gist.github.com/MeghanaMalladiTI/eb627e5dc8de24e42d7d46572c13e576 Fixes: `46eeb90f03` ("net: ti: icssg-prueth: Use page_pool API for RX buffer allocation") Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250415090543.717991-2-m-malladi@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-04-17 12:11:24 +02:00
Naohiro Aota	866bafae59	btrfs: zoned: skip reporting zone for new block group There is a potential deadlock if we do report zones in an IO context, detailed in below lockdep report. When one process do a report zones and another process freezes the block device, the report zones side cannot allocate a tag because the freeze is already started. This can thus result in new block group creation to hang forever, blocking the write path. Thankfully, a new block group should be created on empty zones. So, reporting the zones is not necessary and we can set the write pointer = 0 and load the zone capacity from the block layer using bdev_zone_capacity() helper. ====================================================== WARNING: possible circular locking dependency detected 6.14.0-rc1 #252 Not tainted ------------------------------------------------------ modprobe/1110 is trying to acquire lock: ffff888100ac83e0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x38f/0xb60 but task is already holding lock: ffff8881205b6f20 (&q->q_usage_counter(queue)#16){++++}-{0:0}, at: sd_remove+0x85/0x130 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&q->q_usage_counter(queue)#16){++++}-{0:0}: blk_queue_enter+0x3d9/0x500 blk_mq_alloc_request+0x47d/0x8e0 scsi_execute_cmd+0x14f/0xb80 sd_zbc_do_report_zones+0x1c1/0x470 sd_zbc_report_zones+0x362/0xd60 blkdev_report_zones+0x1b1/0x2e0 btrfs_get_dev_zones+0x215/0x7e0 [btrfs] btrfs_load_block_group_zone_info+0x6d2/0x2c10 [btrfs] btrfs_make_block_group+0x36b/0x870 [btrfs] btrfs_create_chunk+0x147d/0x2320 [btrfs] btrfs_chunk_alloc+0x2ce/0xcf0 [btrfs] start_transaction+0xce6/0x1620 [btrfs] btrfs_uuid_scan_kthread+0x4ee/0x5b0 [btrfs] kthread+0x39d/0x750 ret_from_fork+0x30/0x70 ret_from_fork_asm+0x1a/0x30 -> #2 (&fs_info->dev_replace.rwsem){++++}-{4:4}: down_read+0x9b/0x470 btrfs_map_block+0x2ce/0x2ce0 [btrfs] btrfs_submit_chunk+0x2d4/0x16c0 [btrfs] btrfs_submit_bbio+0x16/0x30 [btrfs] btree_write_cache_pages+0xb5a/0xf90 [btrfs] do_writepages+0x17f/0x7b0 __writeback_single_inode+0x114/0xb00 writeback_sb_inodes+0x52b/0xe00 wb_writeback+0x1a7/0x800 wb_workfn+0x12a/0xbd0 process_one_work+0x85a/0x1460 worker_thread+0x5e2/0xfc0 kthread+0x39d/0x750 ret_from_fork+0x30/0x70 ret_from_fork_asm+0x1a/0x30 -> #1 (&fs_info->zoned_meta_io_lock){+.+.}-{4:4}: __mutex_lock+0x1aa/0x1360 btree_write_cache_pages+0x252/0xf90 [btrfs] do_writepages+0x17f/0x7b0 __writeback_single_inode+0x114/0xb00 writeback_sb_inodes+0x52b/0xe00 wb_writeback+0x1a7/0x800 wb_workfn+0x12a/0xbd0 process_one_work+0x85a/0x1460 worker_thread+0x5e2/0xfc0 kthread+0x39d/0x750 ret_from_fork+0x30/0x70 ret_from_fork_asm+0x1a/0x30 -> #0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}: __lock_acquire+0x2f52/0x5ea0 lock_acquire+0x1b1/0x540 __flush_work+0x3ac/0xb60 wb_shutdown+0x15b/0x1f0 bdi_unregister+0x172/0x5b0 del_gendisk+0x841/0xa20 sd_remove+0x85/0x130 device_release_driver_internal+0x368/0x520 bus_remove_device+0x1f1/0x3f0 device_del+0x3bd/0x9c0 __scsi_remove_device+0x272/0x340 scsi_forget_host+0xf7/0x170 scsi_remove_host+0xd2/0x2a0 sdebug_driver_remove+0x52/0x2f0 [scsi_debug] device_release_driver_internal+0x368/0x520 bus_remove_device+0x1f1/0x3f0 device_del+0x3bd/0x9c0 device_unregister+0x13/0xa0 sdebug_do_remove_host+0x1fb/0x290 [scsi_debug] scsi_debug_exit+0x17/0x70 [scsi_debug] __do_sys_delete_module.isra.0+0x321/0x520 do_syscall_64+0x93/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e other info that might help us debug this: Chain exists of: (work_completion)(&(&wb->dwork)->work) --> &fs_info->dev_replace.rwsem --> &q->q_usage_counter(queue)#16 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&q->q_usage_counter(queue)#16); lock(&fs_info->dev_replace.rwsem); lock(&q->q_usage_counter(queue)#16); lock((work_completion)(&(&wb->dwork)->work)); * DEADLOCK * 5 locks held by modprobe/1110: #0: ffff88811f7bc108 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x8f/0x520 #1: ffff8881022ee0e0 (&shost->scan_mutex){+.+.}-{4:4}, at: scsi_remove_host+0x20/0x2a0 #2: ffff88811b4c4378 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x8f/0x520 #3: ffff8881205b6f20 (&q->q_usage_counter(queue)#16){++++}-{0:0}, at: sd_remove+0x85/0x130 #4: ffffffffa3284360 (rcu_read_lock){....}-{1:3}, at: __flush_work+0xda/0xb60 stack backtrace: CPU: 0 UID: 0 PID: 1110 Comm: modprobe Not tainted 6.14.0-rc1 #252 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x6a/0x90 print_circular_bug.cold+0x1e0/0x274 check_noncircular+0x306/0x3f0 ? __pfx_check_noncircular+0x10/0x10 ? mark_lock+0xf5/0x1650 ? __pfx_check_irq_usage+0x10/0x10 ? lockdep_lock+0xca/0x1c0 ? __pfx_lockdep_lock+0x10/0x10 __lock_acquire+0x2f52/0x5ea0 ? __pfx___lock_acquire+0x10/0x10 ? __pfx_mark_lock+0x10/0x10 lock_acquire+0x1b1/0x540 ? __flush_work+0x38f/0xb60 ? __pfx_lock_acquire+0x10/0x10 ? __pfx_lock_release+0x10/0x10 ? mark_held_locks+0x94/0xe0 ? __flush_work+0x38f/0xb60 __flush_work+0x3ac/0xb60 ? __flush_work+0x38f/0xb60 ? __pfx_mark_lock+0x10/0x10 ? __pfx___flush_work+0x10/0x10 ? __pfx_wq_barrier_func+0x10/0x10 ? __pfx___might_resched+0x10/0x10 ? mark_held_locks+0x94/0xe0 wb_shutdown+0x15b/0x1f0 bdi_unregister+0x172/0x5b0 ? __pfx_bdi_unregister+0x10/0x10 ? up_write+0x1ba/0x510 del_gendisk+0x841/0xa20 ? __pfx_del_gendisk+0x10/0x10 ? _raw_spin_unlock_irqrestore+0x35/0x60 ? __pm_runtime_resume+0x79/0x110 sd_remove+0x85/0x130 device_release_driver_internal+0x368/0x520 ? kobject_put+0x5d/0x4a0 bus_remove_device+0x1f1/0x3f0 device_del+0x3bd/0x9c0 ? __pfx_device_del+0x10/0x10 __scsi_remove_device+0x272/0x340 scsi_forget_host+0xf7/0x170 scsi_remove_host+0xd2/0x2a0 sdebug_driver_remove+0x52/0x2f0 [scsi_debug] ? kernfs_remove_by_name_ns+0xc0/0xf0 device_release_driver_internal+0x368/0x520 ? kobject_put+0x5d/0x4a0 bus_remove_device+0x1f1/0x3f0 device_del+0x3bd/0x9c0 ? __pfx_device_del+0x10/0x10 ? __pfx___mutex_unlock_slowpath+0x10/0x10 device_unregister+0x13/0xa0 sdebug_do_remove_host+0x1fb/0x290 [scsi_debug] scsi_debug_exit+0x17/0x70 [scsi_debug] __do_sys_delete_module.isra.0+0x321/0x520 ? __pfx___do_sys_delete_module.isra.0+0x10/0x10 ? __pfx_slab_free_after_rcu_debug+0x10/0x10 ? kasan_save_stack+0x2c/0x50 ? kasan_record_aux_stack+0xa3/0xb0 ? __call_rcu_common.constprop.0+0xc4/0xfb0 ? kmem_cache_free+0x3a0/0x590 ? __x64_sys_close+0x78/0xd0 do_syscall_64+0x93/0x180 ? lock_is_held_type+0xd5/0x130 ? __call_rcu_common.constprop.0+0x3c0/0xfb0 ? lockdep_hardirqs_on+0x78/0x100 ? __call_rcu_common.constprop.0+0x3c0/0xfb0 ? __pfx___call_rcu_common.constprop.0+0x10/0x10 ? kmem_cache_free+0x3a0/0x590 ? lockdep_hardirqs_on_prepare+0x16d/0x400 ? do_syscall_64+0x9f/0x180 ? lockdep_hardirqs_on+0x78/0x100 ? do_syscall_64+0x9f/0x180 ? __pfx___x64_sys_openat+0x10/0x10 ? lockdep_hardirqs_on_prepare+0x16d/0x400 ? do_syscall_64+0x9f/0x180 ? lockdep_hardirqs_on+0x78/0x100 ? do_syscall_64+0x9f/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f436712b68b RSP: 002b:00007ffe9f1a8658 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 00005559b367fd80 RCX: 00007f436712b68b RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005559b367fde8 RBP: 00007ffe9f1a8680 R08: 1999999999999999 R09: 0000000000000000 R10: 00007f43671a5fe0 R11: 0000000000000206 R12: 0000000000000000 R13: 00007ffe9f1a86b0 R14: 0000000000000000 R15: 0000000000000000 </TASK> Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> CC: <stable@vger.kernel.org> # 6.13+ Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-04-17 11:57:25 +02:00
Naohiro Aota	c1a79b1a58	block: introduce zone capacity helper {bdev,disk}_zone_capacity() takes block_device or gendisk and sector position and returns the zone capacity of the corresponding zone. With that, move disk_nr_zones() and blk_zone_plug_bio() to consolidate them in the same #ifdef block. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-04-17 11:57:20 +02:00
David Sterba	f1ab0171e9	btrfs: tree-checker: adjust error code for header level check The whole tree checker returns EUCLEAN, except the one check in btrfs_verify_level_key(). This was inherited from the function that was moved from disk-io.c in `2cac5af165` ("btrfs: move btrfs_verify_level_key into tree-checker.c") but this should be unified with the rest. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-04-17 11:56:53 +02:00
Filipe Manana	50fecb8cf0	btrfs: fix invalid inode pointer after failure to create reloc inode If we have a failure at create_reloc_inode(), under the 'out' label we assign an error pointer to the 'inode' variable and then return a weird pointer because we return the expression "&inode->vfs_inode": static noinline_for_stack struct inode create_reloc_inode( const struct btrfs_block_group group) { (...) out: (...) if (ret) { if (inode) iput(&inode->vfs_inode); inode = ERR_PTR(ret); } return &inode->vfs_inode; } This can make us return a pointer that is not an error pointer and make the caller proceed as if an error didn't happen and later result in an invalid memory access when dereferencing the inode pointer. Syzbot reported reported such a case with the following stack trace: R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 431bde82d7b634db R15: 00007ffc55de5790 </TASK> BTRFS info (device loop0): relocating block group 6881280 flags data\|metadata Oops: general protection fault, probably for non-canonical address 0xdffffc0000000045: 0000 [#1] SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000228-0x000000000000022f] CPU: 0 UID: 0 PID: 5332 Comm: syz-executor215 Not tainted 6.14.0-syzkaller-13423-ga8662bcd2ff1 #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:relocate_file_extent_cluster+0xe7/0x1750 fs/btrfs/relocation.c:2971 Code: 00 74 08 (...) RSP: 0018:ffffc9000d3375e0 EFLAGS: 00010203 RAX: 0000000000000045 RBX: 000000000000022c RCX: ffff888000562440 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880452db000 RBP: ffffc9000d337870 R08: ffffffff84089251 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000 R13: ffffffff9368a020 R14: 0000000000000394 R15: ffff8880452db000 FS: 000055558bc7b380(0000) GS:ffff88808c596000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055a7a192e740 CR3: 0000000036e2e000 CR4: 0000000000352ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> relocate_block_group+0xa1e/0xd50 fs/btrfs/relocation.c:3657 btrfs_relocate_block_group+0x777/0xd80 fs/btrfs/relocation.c:4011 btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3511 __btrfs_balance+0x1a93/0x25e0 fs/btrfs/volumes.c:4292 btrfs_balance+0xbde/0x10c0 fs/btrfs/volumes.c:4669 btrfs_ioctl_balance+0x3f5/0x660 fs/btrfs/ioctl.c:3586 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:906 [inline] __se_sys_ioctl+0xf1/0x160 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fb4ef537dd9 Code: 28 00 00 (...) RSP: 002b:00007ffc55de5728 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffc55de5750 RCX: 00007fb4ef537dd9 RDX: 0000200000000440 RSI: 00000000c4009420 RDI: 0000000000000003 RBP: 0000000000000002 R08: 00007ffc55de54c6 R09: 00007ffc55de5770 R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 431bde82d7b634db R15: 00007ffc55de5790 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:relocate_file_extent_cluster+0xe7/0x1750 fs/btrfs/relocation.c:2971 Code: 00 74 08 (...) RSP: 0018:ffffc9000d3375e0 EFLAGS: 00010203 RAX: 0000000000000045 RBX: 000000000000022c RCX: ffff888000562440 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880452db000 RBP: ffffc9000d337870 R08: ffffffff84089251 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000 R13: ffffffff9368a020 R14: 0000000000000394 R15: ffff8880452db000 FS: 000055558bc7b380(0000) GS:ffff88808c596000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055a7a192e740 CR3: 0000000036e2e000 CR4: 0000000000352ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 ---------------- Code disassembly (best guess): 0: 00 74 08 48 add %dh,0x48(%rax,%rcx,1) 4: 89 df mov %ebx,%edi 6: e8 f8 36 24 fe call 0xfe243703 b: 48 89 9c 24 30 01 00 mov %rbx,0x130(%rsp) 12: 00 13: 4c 89 74 24 28 mov %r14,0x28(%rsp) 18: 4d 8b 76 10 mov 0x10(%r14),%r14 1c: 49 8d 9e 98 fe ff ff lea -0x168(%r14),%rbx 23: 48 89 d8 mov %rbx,%rax 26: 48 c1 e8 03 shr $0x3,%rax * 2a: 42 80 3c 20 00 cmpb $0x0,(%rax,%r12,1) <-- trapping instruction 2f: 74 08 je 0x39 31: 48 89 df mov %rbx,%rdi 34: e8 ca 36 24 fe call 0xfe243703 39: 4c 8b 3b mov (%rbx),%r15 3c: 48 rex.W 3d: 8b .byte 0x8b 3e: 44 rex.R 3f: 24 .byte 0x24 So fix this by returning the error immediately. Reported-by: syzbot+7481815bb47ef3e702e2@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/67f14ee9.050a0220.0a13.023e.GAE@google.com/ Fixes: `b204e5c7d4` ("btrfs: make btrfs_iget() return a btrfs inode instead") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-04-17 11:56:36 +02:00
Johannes Thumshirn	b0c26f4799	btrfs: zoned: return EIO on RAID1 block group write pointer mismatch There was a bug report about a NULL pointer dereference in __btrfs_add_free_space_zoned() that ultimately happens because a conversion from the default metadata profile DUP to a RAID1 profile on two disks. The stack trace has the following signature: BTRFS error (device sdc): zoned: write pointer offset mismatch of zones in raid1 profile BUG: kernel NULL pointer dereference, address: 0000000000000058 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:__btrfs_add_free_space_zoned.isra.0+0x61/0x1a0 RSP: 0018:ffffa236b6f3f6d0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff96c8132f3400 RCX: 0000000000000001 RDX: 0000000010000000 RSI: 0000000000000000 RDI: ffff96c8132f3410 RBP: 0000000010000000 R08: 0000000000000003 R09: 0000000000000000 R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000000 R13: ffff96c758f65a40 R14: 0000000000000001 R15: 000011aac0000000 FS: 00007fdab1cb2900(0000) GS:ffff96e60ca00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000058 CR3: 00000001a05ae000 CR4: 0000000000350ef0 Call Trace: <TASK> ? __die_body.cold+0x19/0x27 ? page_fault_oops+0x15c/0x2f0 ? exc_page_fault+0x7e/0x180 ? asm_exc_page_fault+0x26/0x30 ? __btrfs_add_free_space_zoned.isra.0+0x61/0x1a0 btrfs_add_free_space_async_trimmed+0x34/0x40 btrfs_add_new_free_space+0x107/0x120 btrfs_make_block_group+0x104/0x2b0 btrfs_create_chunk+0x977/0xf20 btrfs_chunk_alloc+0x174/0x510 ? srso_return_thunk+0x5/0x5f btrfs_inc_block_group_ro+0x1b1/0x230 btrfs_relocate_block_group+0x9e/0x410 btrfs_relocate_chunk+0x3f/0x130 btrfs_balance+0x8ac/0x12b0 ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? __kmalloc_cache_noprof+0x14c/0x3e0 btrfs_ioctl+0x2686/0x2a80 ? srso_return_thunk+0x5/0x5f ? ioctl_has_perm.constprop.0.isra.0+0xd2/0x120 __x64_sys_ioctl+0x97/0xc0 do_syscall_64+0x82/0x160 ? srso_return_thunk+0x5/0x5f ? __memcg_slab_free_hook+0x11a/0x170 ? srso_return_thunk+0x5/0x5f ? kmem_cache_free+0x3f0/0x450 ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? syscall_exit_to_user_mode+0x10/0x210 ? srso_return_thunk+0x5/0x5f ? do_syscall_64+0x8e/0x160 ? sysfs_emit+0xaf/0xc0 ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? seq_read_iter+0x207/0x460 ? srso_return_thunk+0x5/0x5f ? vfs_read+0x29c/0x370 ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? syscall_exit_to_user_mode+0x10/0x210 ? srso_return_thunk+0x5/0x5f ? do_syscall_64+0x8e/0x160 ? srso_return_thunk+0x5/0x5f ? exc_page_fault+0x7e/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7fdab1e0ca6d RSP: 002b:00007ffeb2b60c80 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fdab1e0ca6d RDX: 00007ffeb2b60d80 RSI: 00000000c4009420 RDI: 0000000000000003 RBP: 00007ffeb2b60cd0 R08: 0000000000000000 R09: 0000000000000013 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffeb2b6343b R14: 00007ffeb2b60d80 R15: 0000000000000001 </TASK> CR2: 0000000000000058 ---[ end trace 0000000000000000 ]--- The 1st line is the most interesting here: BTRFS error (device sdc): zoned: write pointer offset mismatch of zones in raid1 profile When a RAID1 block-group is created and a write pointer mismatch between the disks in the RAID set is detected, btrfs sets the alloc_offset to the length of the block group marking it as full. Afterwards the code expects that a balance operation will evacuate the data in this block-group and repair the problems. But before this is possible, the new space of this block-group will be accounted in the free space cache. But in __btrfs_add_free_space_zoned() it is being checked if it is a initial creation of a block group and if not a reclaim decision will be made. But the decision if a block-group's free space accounting is done for an initial creation depends on if the size of the added free space is the whole length of the block-group and the allocation offset is 0. But as btrfs_load_block_group_zone_info() sets the allocation offset to the zone capacity (i.e. marking the block-group as full) this initial decision is not met, and the space_info pointer in the 'struct btrfs_block_group' has not yet been assigned. Fail creation of the block group and rely on manual user intervention to re-balance the filesystem. Afterwards the filesystem can be unmounted, mounted in degraded mode and the missing device can be removed after a full balance of the filesystem. Reported-by: 西木野羰基 <yanqiyu01@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CAB_b4sBhDe3tscz=duVyhc9hNE+gu=B8CrgLO152uMyanR8BEA@mail.gmail.com/ Fixes: `b1934cd606` ("btrfs: zoned: handle broken write pointer on zones") Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-04-17 11:56:19 +02:00
Qu Wenruo	7d82240c45	btrfs: fix the ASSERT() inside GET_SUBPAGE_BITMAP() After enabling large data folios for tests, I hit the ASSERT() inside GET_SUBPAGE_BITMAP() where blocks_per_folio matches BITS_PER_LONG. The ASSERT() itself is only based on the original subpage fs block size, where we have at most 16 blocks per page, thus "ASSERT(blocks_per_folio < BITS_PER_LONG)". However the experimental large data folio support will set the max folio order according to the BITS_PER_LONG, so we can have a case where a large folio contains exactly BITS_PER_LONG blocks. So the ASSERT() is too strict, change it to "ASSERT(blocks_per_folio <= BITS_PER_LONG)" to avoid the false alert. Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-04-17 11:55:56 +02:00
Qu Wenruo	bc2dbc4983	btrfs: avoid page_lockend underflow in btrfs_punch_hole_lock_range() [BUG] When running btrfs/004 with 4K fs block size and 64K page size, sometimes fsstress workload can take 100% CPU for a while, but not long enough to trigger a 120s hang warning. [CAUSE] When such 100% CPU usage happens, btrfs_punch_hole_lock_range() is always in the call trace. One example when this problem happens, the function btrfs_punch_hole_lock_range() got the following parameters: lock_start = 4096, lockend = 20469 Then we calculate @page_lockstart by rounding up lock_start to page boundary, which is 64K (page size is 64K). For @page_lockend, we round down the value towards page boundary, which result 0. Then since we need to pass an inclusive end to filemap_range_has_page(), we subtract 1 from the rounded down value, resulting in (u64)-1. In the above case, the range is inside the same page, and we do not even need to call filemap_range_has_page(), not to mention to call it with (u64)-1 at the end. This behavior will cause btrfs_punch_hole_lock_range() to busy loop waiting for irrelevant range to have its pages dropped. [FIX] Calculate @page_lockend by just rounding down @lockend, without decreasing the value by one. So @page_lockend will no longer overflow. Then exit early if @page_lockend is no larger than @page_lockstart. As it means either the range is inside the same page, or the two pages are adjacent already. Finally only decrease @page_lockend when calling filemap_range_has_page(). Fixes: `0528476b6a` ("btrfs: fix the filemap_range_has_page() call in btrfs_punch_hole_lock_range()") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-04-17 11:55:34 +02:00

... 26 27 28 29 30 ...

1354251 Commits