linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00

Author	SHA1	Message	Date
Kent Russell	e54c5de901	drm/amdgpu: Include sdma_4_4_4.bin This got missed during SDMA 4.4.4 support. Fixes: `968e3811c3` ("drm/amdgpu: add initial support for sdma444") Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `51526efe02`) Cc: stable@vger.kernel.org	2025-06-30 13:52:01 -04:00
David Yat Sin	62461367f4	amdkfd: MTYPE_UC for ext-coherent system memory Set memory mtype to UC host memory when ext-coherent flag is set and memory is registered as a SVM allocation. Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: David Yat Sin <David.YatSin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `5d14fdab47`)	2025-06-30 13:51:46 -04:00
Alex Deucher	905967e359	drm/amdgpu/sdma5.x: suspend KFD queues in ring reset SDMA 5.x only supports engine soft reset which resets all queues on the engine. As such, we need to suspend KFD queues around resets like we do for SDMA 4.x. Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `61feed0baa`)	2025-06-30 13:51:07 -04:00
Dmitry Baryshkov	e8537cad82	drm/bridge: aux-hpd-bridge: fix assignment of the of_node Perform fix similar to the one in the commit `85e444a681` ("drm/bridge: Fix assignment of the of_node of the parent to aux bridge"). The assignment of the of_node to the aux HPD bridge needs to mark the of_node as reused, otherwise driver core will attempt to bind resources like pinctrl, which is going to fail as corresponding pins are already marked as used by the parent device. Fix that by using the device_set_of_node_from_dev() helper instead of assigning it directly. Fixes: `e560518a6c` ("drm/bridge: implement generic DP HPD bridge") Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250608-fix-aud-hpd-bridge-v1-1-4641a6f8e381@oss.qualcomm.com	2025-06-30 17:43:17 +02:00
Dmitry Baryshkov	eb028cd884	drm/bridge: panel: move prepare_prev_first handling to drm_panel_bridge_add_typed The commit `5ea6b17027` ("drm/panel: Add prepare_prev_first flag to drm_panel") and commit `0974687a19` ("drm/bridge: panel: Set pre_enable_prev_first from drmm_panel_bridge_add") added handling of panel's prepare_prev_first to devm_panel_bridge_add() and drmm_panel_bridge_add(). However if the driver calls drm_panel_bridge_add_typed() directly, then the flag won't be handled and thus the drm_bridge.pre_enable_prev_first will not be set. Move prepare_prev_first handling to the drm_panel_bridge_add_typed() so that there is no way to miss the flag. Fixes: `5ea6b17027` ("drm/panel: Add prepare_prev_first flag to drm_panel") Fixes: `0974687a19` ("drm/bridge: panel: Set pre_enable_prev_first from drmm_panel_bridge_add") Reported-by: Svyatoslav Ryhel <clamor95@gmail.com> Closes: https://lore.kernel.org/dri-devel/CAPVz0n3YZass3Bns1m0XrFxtAC0DKbEPiW6vXimQx97G243sXw@mail.gmail.com/ Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250220-panel_prev_first-v1-1-b9e787825a1a@linaro.org	2025-06-30 17:42:32 +02:00
Christian König	97e000acf2	drm/ttm: fix error handling in ttm_buffer_object_transfer Unlocking the resv object was missing in the error path, additionally to that we should move over the resource only after the fence slot was reserved. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Fixes: `c8d4c18bfb` ("dma-buf/drivers: make reserving a shared slot mandatory v4") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20250616130726.22863-3-christian.koenig@amd.com	2025-06-30 13:26:28 +02:00
Hans de Goede	7da6c155a6	drm/i915/dsi: Fix NULL pointer deref in vlv_dphy_param_init() Commit `77ba0b8562` ("drm/i915/dsi: convert vlv_dsi.[ch] to struct intel_display") added a to_intel_display(connector) call to vlv_dphy_param_init() but when vlv_dphy_param_init() gets called the connector object has not been initialized yet, so this leads to a NULL pointer deref: BUG: kernel NULL pointer dereference, address: 000000000000000c ... Hardware name: ASUSTeK COMPUTER INC. T100TA/T100TA, BIOS T100TA.314 08/13/2015 RIP: 0010:vlv_dsi_init+0x4e6/0x1600 [i915] ... Call Trace: <TASK> ? intel_step_name+0x4be8/0x5c30 [i915] intel_setup_outputs+0x2d6/0xbd0 [i915] intel_display_driver_probe_nogem+0x13f/0x220 [i915] i915_driver_probe+0x3d9/0xaf0 [i915] Use to_intel_display(&intel_dsi->base) instead to fix this. Fixes: `77ba0b8562` ("drm/i915/dsi: convert vlv_dsi.[ch] to struct intel_display") Signed-off-by: Hans de Goede <hansg@kernel.org> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20250626143317.101706-1-hansg@kernel.org Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit `0dc6bfb50a`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2025-06-30 08:12:36 +03:00
Dan Carpenter	caa7c7a76b	drm/i915/selftests: Change mock_request() to return error pointers There was an error pointer vs NULL bug in __igt_breadcrumbs_smoketest(). The __mock_request_alloc() function implements the smoketest->request_alloc() function pointer. It was supposed to return error pointers, but it propogates the NULL return from mock_request() so in the event of a failure, it would lead to a NULL pointer dereference. To fix this, change the mock_request() function to return error pointers and update all the callers to expect that. Fixes: `52c0fdb25c` ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/685c1417.050a0220.696f5.5c05@mx.google.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `778fa8ad5f`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2025-06-30 08:12:33 +03:00
Thomas Weißschuh	2ab3ba3915	drm/bridge: samsung-dsim: Don't use %pK through printk In the past %pK was preferable to %p as it would not leak raw pointer values into the kernel log. Since commit `ad67b74d24` ("printk: hash addresses printed with %p") the regular %p has been improved to avoid this issue. Furthermore, restricted pointers ("%pK") were never meant to be used through printk(). They can still unintentionally leak raw pointers or acquire sleeping locks in atomic contexts. Switch to the regular pointer formatting which is safer and easier to reason about. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Inki Dae <inki.dae@samsung.com>	2025-06-29 18:05:51 +09:00
Marek Szyprowski	5d91394f23	drm/exynos: fimd: Guard display clock control with runtime PM calls Commit `c9b1150a68` ("drm/atomic-helper: Re-order bridge chain pre-enable and post-disable") changed the call sequence to the CRTC enable/disable and bridge pre_enable/post_disable methods, so those bridge methods are now called when CRTC is not yet enabled. This causes a lockup observed on Samsung Peach-Pit/Pi Chromebooks. The source of this lockup is a call to fimd_dp_clock_enable() function, when FIMD device is not yet runtime resumed. It worked before the mentioned commit only because the CRTC implemented by the FIMD driver was always enabled what guaranteed the FIMD device to be runtime resumed. This patch adds runtime PM guards to the fimd_dp_clock_enable() function to enable its proper operation also when the CRTC implemented by FIMD is not yet enabled. Fixes: `196e059a8a` ("drm/exynos: convert clock_enable crtc callback to pipeline clock") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>	2025-06-29 16:58:16 +09:00
Kaustabh Chakraborty	b846350aa2	drm/exynos: exynos7_drm_decon: add vblank check in IRQ handling If there's support for another console device (such as a TTY serial), the kernel occasionally panics during boot. The panic message and a relevant snippet of the call stack is as follows: Unable to handle kernel NULL pointer dereference at virtual address 000000000000000 Call trace: drm_crtc_handle_vblank+0x10/0x30 (P) decon_irq_handler+0x88/0xb4 [...] Otherwise, the panics don't happen. This indicates that it's some sort of race condition. Add a check to validate if the drm device can handle vblanks before calling drm_crtc_handle_vblank() to avoid this. Cc: stable@vger.kernel.org Fixes: `96976c3d9a` ("drm/exynos: Add DECON driver") Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org> Signed-off-by: Inki Dae <inki.dae@samsung.com>	2025-06-29 16:58:16 +09:00
Thomas Weißschuh	18665eaa2a	drm/exynos: Don't use %pK through printk In the past %pK was preferable to %p as it would not leak raw pointer values into the kernel log. Since commit `ad67b74d24` ("printk: hash addresses printed with %p") the regular %p has been improved to avoid this issue. Furthermore, restricted pointers ("%pK") were never meant to be used through printk(). They can still unintentionally leak raw pointers or acquire sleeping locks in atomic contexts. Switch to the regular pointer formatting which is safer and easier to reason about. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Inki Dae <inki.dae@samsung.com>	2025-06-29 16:58:13 +09:00
Dave Airlie	9fbceb37c9	Merge tag 'drm-misc-fixes-2025-06-26' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes drm-misc-fixes for v6.16-rc4: - Fix function signature of drm_writeback_connector_cleanup. - Use correct HDMI audio bridge in drm_connector_hdmi_audio_init. - Make HPD work on SN65DSI86. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://lore.kernel.org/r/3dd1d5e1-73b6-4b0c-a208-f7d6235cf530@linux.intel.com	2025-06-28 06:53:00 +10:00
Thomas Zimmermann	615cc4223f	drm/vesadrm: Avoid NULL-ptr deref in vesadrm_pmi_cmap_write() Only set PMI fields if the screen_info's Vesa PM segment has been set. Vesa PMI is the power-management interface. It also provides means to set the color palette. The interface is optional, so not all VESA graphics cards support it. Print vesafb's warning [1] if the hardware palette cannot be set at all. If unsupported the field PrimaryPalette in struct vesadrm.pmi is NULL, which results in a segmentation fault. Happens with qemu's Cirrus emulation. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: `814d270b31` ("drm/sysfb: vesadrm: Add gamma correction") Link: https://elixir.bootlin.com/linux/v6.15/source/drivers/video/fbdev/vesafb.c#L375 # 1 Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: dri-devel@lists.freedesktop.org Acked-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://lore.kernel.org/r/20250617140944.142392-1-tzimmermann@suse.de	2025-06-27 16:00:49 +02:00
Maxime Ripard	f6faebc11a	drm/panel: panel-simple: get rid of panel_dpi hack The empty panel_dpi struct was only ever used as a discriminant, but it's kind of a hack, and with the reworks done in the previous patches, we shouldn't need it anymore. Let's get rid of it. Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Tested-by: Francesco Dolcini <francesco.dolcini@toradex.com> # Toradex Colibri iMX6 Link: https://lore.kernel.org/r/20250626-drm-panel-simple-fixes-v2-5-5afcaa608bdc@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>	2025-06-27 11:22:48 +02:00
Maxime Ripard	47c08262f3	drm/panel: panel-simple: Add function to look panel data up Commit `de04bb0089` ("drm/panel/panel-simple: Use the new allocation in place of devm_kzalloc()") moved the call to drm_panel_init into the devm_drm_panel_alloc(), which needs a connector type to initialize properly. In the panel-dpi compatible case, the passed panel_desc structure is an empty one used as a discriminant, and the connector type it contains isn't actually initialized. It is initialized through a call to panel_dpi_probe() later in the function, which used to be before the call to drm_panel_init() that got merged into devm_drm_panel_alloc(). So, we do need a proper panel_desc pointer before the call to devm_drm_panel_alloc() now. All cases associate their panel_desc with the panel compatible and use of_device_get_match_data, except for the panel-dpi compatible. In that case, we're expected to call panel_dpi_probe, which will allocate and initialize the panel_desc for us. Let's create such a helper function that would be called first in the driver and will lookup the desc by compatible, or allocate one if relevant. Reported-by: Francesco Dolcini <francesco@dolcini.it> Closes: https://lore.kernel.org/all/20250612081834.GA248237@francesco-nb/ Fixes: `de04bb0089` ("drm/panel/panel-simple: Use the new allocation in place of devm_kzalloc()") Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Tested-by: Francesco Dolcini <francesco.dolcini@toradex.com> # Toradex Colibri iMX6 Link: https://lore.kernel.org/r/20250626-drm-panel-simple-fixes-v2-4-5afcaa608bdc@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>	2025-06-27 11:22:47 +02:00
Maxime Ripard	921c41e509	drm/panel: panel-simple: Make panel_simple_probe return its panel In order to fix the regession introduced by commit `de04bb0089` ("drm/panel/panel-simple: Use the new allocation in place of devm_kzalloc()"), we need to move the panel_desc lookup into the common panel_simple_probe() function. There's two callers for that function, the probe implementations of the platform and MIPI-DSI drivers panel-simple implements. The MIPI-DSI driver's probe will need to access the current panel_desc to initialize properly, which won't be possible anymore if we make that lookup in panel_simple_probe(). However, we can make panel_simple_probe() return the initialized panel_simple structure it allocated, which will contain a pointer to the associated panel_desc in its desc field. This doesn't fix `de04bb0089` ("drm/panel/panel-simple: Use the new allocation in place of devm_kzalloc()") still, but makes progress towards that goal. Fixes: `de04bb0089` ("drm/panel/panel-simple: Use the new allocation in place of devm_kzalloc()") Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Tested-by: Francesco Dolcini <francesco.dolcini@toradex.com> # Toradex Colibri iMX6 Link: https://lore.kernel.org/r/20250626-drm-panel-simple-fixes-v2-3-5afcaa608bdc@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>	2025-06-27 11:22:47 +02:00
Maxime Ripard	073667fce1	drm/panel: panel-simple: make panel_dpi_probe return a panel_desc If the panel-simple driver is probed from a panel-dpi compatible, the driver will use an empty panel_desc structure as a descriminant. It will then allocate and fill another panel_desc as part of its probe. However, that allocation needs to happen after the panel_simple structure has been allocated, since panel_dpi_probe(), the function doing the panel_desc allocation and initialization, takes a panel_simple pointer as an argument. This pointer is used to fill the panel_simple->desc pointer that is still initialized with the empty panel_desc when panel_dpi_probe() is called. Since commit `de04bb0089` ("drm/panel/panel-simple: Use the new allocation in place of devm_kzalloc()"), we will need the panel connector type found in panel_desc to allocate panel_simple. This creates a circular dependency where we need panel_desc to create panel_simple, and need panel_simple to create panel_desc. Let's break that dependency by making panel_dpi_probe simply return the panel_desc it initialized and move the panel_simple->desc assignment to the caller. This will not fix the breaking commit entirely, but will move us towards the right direction. Fixes: `de04bb0089` ("drm/panel/panel-simple: Use the new allocation in place of devm_kzalloc()") Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Tested-by: Francesco Dolcini <francesco.dolcini@toradex.com> # Toradex Colibri iMX6 Link: https://lore.kernel.org/r/20250626-drm-panel-simple-fixes-v2-2-5afcaa608bdc@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>	2025-06-27 11:22:46 +02:00
Maxime Ripard	2d22b63f3a	drm/mipi-dsi: Add dev_is_mipi_dsi function This will be especially useful for generic panels (like panel-simple) which can take different code path depending on if they are MIPI-DSI devices or platform devices. Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Tested-by: Francesco Dolcini <francesco.dolcini@toradex.com> # Toradex Colibri iMX6 Link: https://lore.kernel.org/r/20250626-drm-panel-simple-fixes-v2-1-5afcaa608bdc@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>	2025-06-27 11:22:39 +02:00
Dave Airlie	6daaa479ac	UAPI Changes: Driver Changes: - Missing error check (Haoxiang Li) - Fix xe_hwmon_power_max_write (Karthik) - Move flushes (Maarten and Matthew Auld) - Explicitly exit CT safe mode on unwind (Michal) - Process deferred GGTT node removals on device unwind (Michal) -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRskUM7w1oG5rx2IZO4FpNVCsYGvwUCaF1TdQAKCRC4FpNVCsYG v11OAP440S8EaxlNVmSoJOv1xB8szucsuakEEooS/WtqxsboZwEAimhfqQKKWSjC SpsGncCyt4R16qAZsdJykFPBG4Sirgg= =8QG7 -----END PGP SIGNATURE----- Merge tag 'drm-xe-fixes-2025-06-26' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes UAPI Changes: Driver Changes: - Missing error check (Haoxiang Li) - Fix xe_hwmon_power_max_write (Karthik) - Move flushes (Maarten and Matthew Auld) - Explicitly exit CT safe mode on unwind (Michal) - Process deferred GGTT node removals on device unwind (Michal) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/aF1T6EzzC3xj4K4H@fedora	2025-06-27 09:15:00 +10:00
Dave Airlie	b6211ab2eb	Merge tag 'drm-intel-fixes-2025-06-26' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes - Fix for SNPS PHY HDMI for 1080p@120Hz - Correct DP AUX DPCD probe address - Followup build fix for GCOV and AutoFDO enabled config Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://lore.kernel.org/r/aFzsHR9WLYsxg8jy@jlahtine-mobl	2025-06-27 09:09:02 +10:00
Jakub Kicinski	28aa52b618	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc4). Conflicts: Documentation/netlink/specs/mptcp_pm.yaml `9e6dd4c256` ("netlink: specs: mptcp: replace underscores with dashes in names") `ec362192aa` ("netlink: specs: fix up indentation errors") https://lore.kernel.org/20250626122205.389c2cd4@canb.auug.org.au Adjacent changes: Documentation/netlink/specs/fou.yaml `791a9ed0a4` ("netlink: specs: fou: replace underscores with dashes in names") `880d43ca9a` ("netlink: specs: clean up spaces in brackets") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-26 10:40:50 -07:00
Michal Wajdeczko	af2b588abe	drm/xe: Process deferred GGTT node removals on device unwind While we are indirectly draining our dedicated workqueue ggtt->wq that we use to complete asynchronous removal of some GGTT nodes, this happends as part of the managed-drm unwinding (ggtt_fini_early), which could be later then manage-device unwinding, where we could already unmap our MMIO/GMS mapping (mmio_fini). This was recently observed during unsuccessful VF initialization: [ ] xe 0000:00:02.1: probe with driver xe failed with error -62 [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747340 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747540 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747240 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747040 tiles_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e746840 mmio_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747f40 xe_bo_pinned_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e746b40 devm_drm_dev_init_release (16 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] drmres release begin [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef81640 __fini_relay (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80d40 guc_ct_fini (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80040 __drmm_mutex_release (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80140 ggtt_fini_early (8 bytes) and this was leading to: [ ] BUG: unable to handle page fault for address: ffffc900058162a0 [ ] #PF: supervisor write access in kernel mode [ ] #PF: error_code(0x0002) - not-present page [ ] Oops: Oops: 0002 [#1] SMP NOPTI [ ] Tainted: [W]=WARN [ ] Workqueue: xe-ggtt-wq ggtt_node_remove_work_func [xe] [ ] RIP: 0010:xe_ggtt_set_pte+0x6d/0x350 [xe] [ ] Call Trace: [ ] <TASK> [ ] xe_ggtt_clear+0xb0/0x270 [xe] [ ] ggtt_node_remove+0xbb/0x120 [xe] [ ] ggtt_node_remove_work_func+0x30/0x50 [xe] [ ] process_one_work+0x22b/0x6f0 [ ] worker_thread+0x1e8/0x3d Add managed-device action that will explicitly drain the workqueue with all pending node removals prior to releasing MMIO/GSM mapping. Fixes: `919bb54e98` ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250612220937.857-2-michal.wajdeczko@intel.com (cherry picked from commit `89d2835c36`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-26 15:21:45 +02:00
Michal Wajdeczko	ad40098da5	drm/xe/guc: Explicitly exit CT safe mode on unwind During driver probe we might be briefly using CT safe mode, which is based on a delayed work, but usually we are able to stop this once we have IRQ fully operational. However, if we abort the probe quite early then during unwind we might try to destroy the workqueue while there is still a pending delayed work that attempts to restart itself which triggers a WARN. This was recently observed during unsuccessful VF initialization: [ ] xe 0000:00:02.1: probe with driver xe failed with error -62 [ ] ------------[ cut here ]------------ [ ] workqueue: cannot queue safe_mode_worker_func [xe] on wq xe-g2h-wq [ ] WARNING: CPU: 9 PID: 0 at kernel/workqueue.c:2257 __queue_work+0x287/0x710 [ ] RIP: 0010:__queue_work+0x287/0x710 [ ] Call Trace: [ ] delayed_work_timer_fn+0x19/0x30 [ ] call_timer_fn+0xa1/0x2a0 Exit the CT safe mode on unwind to avoid that warning. Fixes: `09b286950f` ("drm/xe/guc: Allow CTB G2H processing without G2H IRQ") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250612220937.857-3-michal.wajdeczko@intel.com (cherry picked from commit `2ddbb73ec2`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-26 14:52:43 +02:00
Matthew Auld	f16873f42a	drm/xe: move DPT l2 flush to a more sensible place Only need the flush for DPT host updates here. Normal GGTT updates don't need special flush. Fixes: `01570b4469` ("drm/xe/bmg: implement Wa_16023588340") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250606104546.1996818-4-matthew.auld@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `35db1da40c`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-26 14:52:25 +02:00
Maarten Lankhorst	a4b1b51ae1	drm/xe: Move DSB l2 flush to a more sensible place Flushing l2 is only needed after all data has been written. Fixes: `01570b4469` ("drm/xe/bmg: implement Wa_16023588340") Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://lore.kernel.org/r/20250606104546.1996818-3-matthew.auld@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `0dd2dd0182`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-26 14:52:19 +02:00
Jayesh Choudhary	55e8ff8420	drm/bridge: ti-sn65dsi86: Add HPD for DisplayPort connector type By default, HPD was disabled on SN65DSI86 bridge. When the driver was added (commit "a095f15c00e27"), the HPD_DISABLE bit was set in pre-enable call which was moved to other function calls subsequently. Later on, commit "c312b0df3b13" added detect utility for DP mode. But with HPD_DISABLE bit set, all the HPD events are disabled[0] and the debounced state always return 1 (always connected state). Set HPD_DISABLE bit conditionally based on display sink's connector type. Since the HPD_STATE is reflected correctly only after waiting for debounce time (~100-400ms) and adding this delay in detect() is not feasible owing to the performace impact (glitches and frame drop), remove runtime calls in detect() and add hpd_enable()/disable() bridge hooks with runtime calls, to detect hpd properly without any delay. [0]: <https://www.ti.com/lit/gpn/SN65DSI86> (Pg. 32) Fixes: `c312b0df3b` ("drm/bridge: ti-sn65dsi86: Implement bridge connector operations for DP") Cc: Max Krummenacher <max.krummenacher@toradex.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Tested-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com> Signed-off-by: Jayesh Choudhary <j-choudhary@ti.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20250624044835.165708-1-j-choudhary@ti.com	2025-06-25 07:49:01 -07:00
Arnd Bergmann	d02b2103a0	drm/i915: fix build error some more An earlier patch fixed a build failure with clang, but I still see the same problem with some configurations using gcc: drivers/gpu/drm/i915/i915_pmu.c: In function 'config_mask': include/linux/compiler_types.h:568:38: error: call to '__compiletime_assert_462' declared with attribute error: BUILD_BUG_ON failed: bit > BITS_PER_TYPE(typeof_member(struct i915_pmu, enable)) - 1 drivers/gpu/drm/i915/i915_pmu.c:116:3: note: in expansion of macro 'BUILD_BUG_ON' 116 \| BUILD_BUG_ON(bit > As I understand it, the problem is that the function is not always fully inlined, but the __builtin_constant_p() can still evaluate the argument as being constant. Marking it as __always_inline so far works for me in all configurations. Fixes: `a7137b1825` ("drm/i915/pmu: Fix build error with GCOV and AutoFDO enabled") Fixes: `a644fde77f` ("drm/i915/pmu: Change bitmask of enabled events to u32") Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20250620111824.3395007-1-arnd@kernel.org Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `ef69f9dd1c`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2025-06-25 10:23:16 +03:00
Karthik Poosa	9127a69c71	drm/xe/hwmon: Fix xe_hwmon_power_max_write Prevent other bits of mailbox power limit from being overwritten with 0. This issue was due to a missing read and modify of current power limit, before setting a requested mailbox power limit, which is added in this patch. v2: - Improve commit message. (Anshuman) v3: - Rebase. - Rephrase commit message. (Riana) - Add read-modify-write variant of xe_hwmon_pcode_write_power_limit() i.e. xe_hwmon_pcode_rmw_power_limit(). (Badal) - Use xe_hwmon_pcode_rmw_power_limit() to set mailbox power limits. - Remove xe_hwmon_pcode_write_power_limit() as all mailbox power limits writes use xe_hwmon_pcode_rmw_power_limit() only. v4: - Use PWR_LIM in place of (PWR_LIM_EN \| PWR_LIM_VAL) wherever applicable. (Riana) Fixes: `25a2aa779f` ("drm/xe/hwmon: Add support to manage power limits though mailbox") Reviewed-by: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://lore.kernel.org/r/20250617120030.612819-1-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `8aa7306631`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-24 20:18:31 +02:00
Haoxiang Li	6220729347	drm/xe/display: Add check for alloc_ordered_workqueue() Add check for the return value of alloc_ordered_workqueue() in xe_display_create() to catch potential exception. Fixes: `44e694958b` ("drm/xe/display: Implement display support") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/4ee1b0e5d1626ce1dde2e82af05c2edaed50c3aa.1747397638.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit `5b62d63395`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-24 20:18:31 +02:00
Takashi Iwai	6847b3b6e8	drm/amd/display: Add sanity checks for drm_edid_raw() When EDID is retrieved via drm_edid_raw(), it doesn't guarantee to return proper EDID bytes the caller wants: it may be either NULL (that leads to an Oops) or with too long bytes over the fixed size raw_edid array (that may lead to memory corruption). The latter was reported actually when connected with a bad adapter. Add sanity checks for drm_edid_raw() to address the above corner cases, and return EDID_BAD_INPUT accordingly. Fixes: `48edb2a425` ("drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid") Link: https://bugzilla.suse.com/show_bug.cgi?id=1236415 Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `648d3f4d20`) Cc: stable@vger.kernel.org	2025-06-24 10:39:24 -04:00
Mario Limonciello	66abb99699	drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value [Why] commit `16dc8bc27c` ("drm/amd/display: Export full brightness range to userspace") adjusted the brightness range to scale to larger values, but missed updating AMDGPU_MAX_BL_LEVEL which is needed to make sure that scaling works properly with custom brightness curves. [How] As the change for max brightness of 0xFFFF only applies to devices supporting DC, use existing DC define MAX_BACKLIGHT_LEVEL. Fixes: `16dc8bc27c` ("drm/amd/display: Export full brightness range to userspace") Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250623171114.1156451-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `5b852044eb`) Cc: stable@vger.kernel.org	2025-06-24 10:39:13 -04:00
Alex Deucher	31135cc99c	drm/amdgpu/sdma7: add ucode version checks for userq support SDMA 7.0.0/1: 7836028 Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `8c011408ed`)	2025-06-24 10:38:05 -04:00
Alex Deucher	899dec4e88	drm/amdgpu/sdma6: add ucode version checks for userq support SDMA 6.0.0 version 24 SDMA 6.0.2 version 21 SDMA 6.0.3 version 25 Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `e8cca30d8b`)	2025-06-24 10:37:58 -04:00
Mario Limonciello	73eab78721	drm/amd: Adjust output for discovery error handling commit `017fbb6690` ("drm/amdgpu/discovery: check ip_discovery fw file available") added support for reading an amdgpu IP discovery bin file for some specific products. If it's not found then it will fallback to hardcoded values. However if it's not found there is also a lot of noise about missing files and errors. Adjust the error handling to decrease most messages to DEBUG and to show users less about missing files. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: Marcus Seyfarth <m.seyfarth@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4312 Tested-by: Marcus Seyfarth <m.seyfarth@gmail.com> Fixes: `017fbb6690` ("drm/amdgpu/discovery: check ip_discovery fw file available") Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250617183052.1692059-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `49f1f9f6c3`)	2025-06-24 10:37:32 -04:00
Alex Deucher	99579c55c3	drm/amdgpu/mes: add compatibility checks for set_hw_resource_1 Seems some older MES firmware versions do not properly support this packet. Add back some the compatibility checks. v2: switch to fw version check (Shaoyun) Fixes: `f81cd79311` ("drm/amd/amdgpu: Fix MES init sequence") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4295 Cc: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed-by: shaoyun.liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `0180e0a5dd`) Cc: stable@vger.kernel.org	2025-06-24 10:34:56 -04:00
Srinivasan Shanmugam	0043ec26d8	drm/amdgpu/gfx9: Add Cleaner Shader Support for GFX9.x GPUs Enable the cleaner shader for other GFX9.x series of GPUs to provide data isolation between GPU workloads. The cleaner shader is responsible for clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps prevent data leakage and ensures accurate computation results. This update extends cleaner shader support to GFX9.x GPUs, previously available for GFX9.4.2. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Manu Rastogi <manu.rastogi@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `99808926d0`)	2025-06-24 10:34:44 -04:00
Chaoyi Chen	1035782415	drm/bridge-connector: Fix bridge in drm_connector_hdmi_audio_init() The bridge used in drm_connector_hdmi_audio_init() does not correctly point to the required audio bridge, which lead to incorrect audio configuration input. Fixes: `231adeda9f` ("drm/bridge-connector: hook DisplayPort audio support") Signed-off-by: Chaoyi Chen <chaoyi.chen@rock-chips.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Tested-by: Stephan Gerhold <stephan.gerhold@linaro.org> Link: https://lore.kernel.org/r/20250620011616.118-1-kernel@airkyi.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>	2025-06-24 01:17:21 +03:00
Imre Deak	a3ef3c2da6	drm/dp: Change AUX DPCD probe address from DPCD_REV to LANE0_1_STATUS Reading DPCD registers has side-effects in general. In particular accessing registers outside of the link training register range (0x102-0x106, 0x202-0x207, 0x200c-0x200f, 0x2216) is explicitly forbidden by the DP v2.1 Standard, see 3.6.5.1 DPTX AUX Transaction Handling Mandates 3.6.7.4 128b/132b DP Link Layer LTTPR Link Training Mandates Based on my tests, accessing the DPCD_REV register during the link training of an UHBR TBT DP tunnel sink leads to link training failures. Solve the above by using the DP_LANE0_1_STATUS (0x202) register for the DPCD register access quirk. Cc: <stable@vger.kernel.org> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://lore.kernel.org/r/20250605082850.65136-2-imre.deak@intel.com (cherry picked from commit `a40c5d727b`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2025-06-23 15:53:42 +03:00
Ankit Nautiyal	9205999e9f	drm/i915/snps_hdmi_pll: Fix 64-bit divisor truncation by using div64_u64 DIV_ROUND_CLOSEST_ULL uses do_div(), which expects a 32-bit divisor. When passing a 64-bit constant like CURVE2_MULTIPLIER, the value is silently truncated to u32, potentially leading to incorrect results on large divisors. Replace DIV_ROUND_CLOSEST_ULL with DIV64_U64_ROUND_CLOSEST which correctly handles full 64-bit division. v2: Use DIV64_U64_ROUND_CLOSEST instead of div64_u64 macro. (Jani) Fixes: `5947642004` ("drm/i915/display: Add support for SNPS PHY HDMI PLL algorithm for DG2") Reported-by: Vas Novikov <vasya.novikov@gmail.com> Closes: https://lore.kernel.org/all/8d7c7958-9558-4c8a-a81a-e9310f2d8852@gmail.com/ Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Cc: Suraj Kandpal <suraj.kandpal@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Vas Novikov <vasya.novikov@gmail.com> Cc: stable@vger.kernel.org # v6.15+ Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Link: https://lore.kernel.org/r/20250618130951.1596587-2-ankit.k.nautiyal@intel.com (cherry picked from commit `b300a175a1`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2025-06-23 14:59:55 +03:00
Louis Chauvet	fb721b2c35	drm: writeback: Fix drm_writeback_connector_cleanup signature The drm_writeback_connector_cleanup have the signature: static void drm_writeback_connector_cleanup( struct drm_device dev, struct drm_writeback_connector wb_connector) But it is stored and used as a drmres_release_t typedef void (drmres_release_t)(struct drm_device dev, void *res); While the current code is valid and does not produce any warning, the CFI runtime check (CONFIG_CFI_CLANG) can fail because the function signature is not the same as drmres_release_t. In order to fix this, change the function signature to match what is expected by drmres_release_t. Fixes: `1914ba2b91` ("drm: writeback: Create drmm variants for drm_writeback_connector initialization") Suggested-by: Mark Yacoub <markyacoub@google.com> Reviewed-by: Maíra Canal <mcanal@igalia.com> Link: https://lore.kernel.org/r/20250429-drm-fix-writeback-cleanup-v2-1-548ff3a4e284@bootlin.com Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>	2025-06-23 10:12:44 +02:00
Jeff Layton	707bd05be7	ref_tracker: eliminate the ref_tracker_dir name field Now that we have dentries and the ability to create meaningful symlinks to them, don't keep a name string in each tracker. Switch the output format to print "class@address", and drop the name field. Also, add a kerneldoc header for ref_tracker_dir_init(). Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20250618-reftrack-dbgfs-v15-9-24fc37ead144@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-19 17:02:04 -07:00
Jeff Layton	aa7d26c3c3	ref_tracker: add a static classname string to each ref_tracker_dir A later patch in the series will be adding debugfs files for each ref_tracker that get created in ref_tracker_dir_init(). The format will be "class@%px". The current "name" string can vary between ref_tracker_dir objects of the same type, so it's not suitable for this purpose. Add a new "class" string to the ref_tracker dir that describes the the type of object (sans any individual info for that object). Also, in the i915 driver, gate the creation of debugfs files on whether the dentry pointer is still set to NULL. CI has shown that the ref_tracker_dir can be initialized more than once. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20250618-reftrack-dbgfs-v15-4-24fc37ead144@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-19 17:02:04 -07:00
Dave Airlie	b8de9b21e8	Driver Changes: - A workaround update (Vinay) - Fix memset on iomem (Lucas) - Fix early wedge on GuC Load failure (Daniele) -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRskUM7w1oG5rx2IZO4FpNVCsYGvwUCaFQ0RgAKCRC4FpNVCsYG v816AQDrIMgJ53fOLr8nEW6Ns8ZUct2zq+beuCbmi7na8IqBSQD/Rs6TSU5BTatq jQ9+J7M9xuVVV7mHmsvUKQNdKMxUYw4= =gD/V -----END PGP SIGNATURE----- Merge tag 'drm-xe-fixes-2025-06-19' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - A workaround update (Vinay) - Fix memset on iomem (Lucas) - Fix early wedge on GuC Load failure (Daniele) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/aFQ03kNzhbiNK7gW@fedora	2025-06-20 09:01:51 +10:00
Dave Airlie	97aadc161d	Merge tag 'drm-misc-fixes-2025-06-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes drm-misc-fixes for v6.16-rc3: - vivante scheduler fix. - v3d null pointer crash fix. - fix backlight, booting GSP-RM, and potential integer shift overflow in nouveau. - fix compiler warnings about unused linux/export.h - fix malidp unknown modifier spam. - fix for ssd130x. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://lore.kernel.org/r/d44bab7b-01f8-45a8-a7f4-5d3d563d2f9d@linux.intel.com	2025-06-20 08:57:11 +10:00
Daniele Ceraolo Spurio	a39d082c35	drm/xe: Fix early wedge on GuC load failure When the GuC fails to load we declare the device wedged. However, the very first GuC load attempt on GT0 (from xe_gt_init_hwconfig) is done before the GT1 GuC objects are initialized, so things go bad when the wedge code attempts to cleanup GT1. To fix this, check the initialization status in the functions called during wedge. Fixes: `7dbe8af13c` ("drm/xe: Wedge the entire device") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Zhanjun Dong <zhanjun.dong@intel.com> Cc: stable@vger.kernel.org # v6.12+: `1e1981b16b`: drm/xe: Fix taking invalid lock on wedge Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250611214453.1159846-2-daniele.ceraolospurio@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `0b93b7dcd9`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-19 17:24:30 +02:00
Lucas De Marchi	87a15c89d8	drm/xe: Fix memset on iomem It should rather use xe_map_memset() as the BO is created with XE_BO_FLAG_VRAM_IF_DGFX in xe_guc_pc_init(). Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250612-vmap-vaddr-v1-1-26238ed443eb@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `21cf47d89f`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-19 16:14:56 +02:00
Vinay Belgaumkar	16c1241b08	drm/xe/bmg: Update Wa_16023588340 This allows for additional L2 caching modes. Fixes: `01570b4469` ("drm/xe/bmg: implement Wa_16023588340") Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250612-wa-14022085890-v4-2-94ba5dcc1e30@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `6ab42fa03d`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-19 16:14:47 +02:00
Dave Airlie	453e6fd262	amd-drm-fixes-6.16-2025-06-18: amdgpu: - DP tunneling fix - LTTPR fix - DSC fix - DML2.x ABGR16161616 fix - RMCM fix - Backlight fixes - GFX11 kicker support - SDMA reset fixes - VCN 5.0.1 fix - Reset fix - Misc small fixes amdkfd: - SDMA reset fix - Fix race in GWS scheduling -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCaFMhQgAKCRC93/aFa7yZ 2Bx4AQCTVGyAbfrApEaOGsLR/NANvug9v3oR7xMPZA6/H9XHVQEAoXJ+gCqryvak 7DyDr4vNr3ZlLT4oY9AFeTljeYQs0Qc= =lEki -----END PGP SIGNATURE----- Merge tag 'amd-drm-fixes-6.16-2025-06-18' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.16-2025-06-18: amdgpu: - DP tunneling fix - LTTPR fix - DSC fix - DML2.x ABGR16161616 fix - RMCM fix - Backlight fixes - GFX11 kicker support - SDMA reset fixes - VCN 5.0.1 fix - Reset fix - Misc small fixes amdkfd: - SDMA reset fix - Fix race in GWS scheduling Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250618203115.1533451-1-alexander.deucher@amd.com	2025-06-19 17:29:14 +10:00
Dave Airlie	bec8ff17ff	Merge tag 'drm-intel-fixes-2025-06-18' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes - Fix MIPI vtotal programming off by one on Broxton - Fix PMU code for GCOV and AutoFDO enabled build Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://lore.kernel.org/r/aFJfykDpUwtmpilE@jlahtine-mobl	2025-06-19 14:18:54 +10:00
Alex Deucher	fe79ef3530	drm/amdgpu/sdma5.2: init engine reset mutex Missing the mutex init. Fixes: `47454f2dc0` ("drm/amdgpu: Register the new sdma function pointers for sdma_v5_2") Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `ea685ff30a`)	2025-06-18 13:17:49 -04:00
Jay Cornwall	cfb05257ae	drm/amdkfd: Fix race in GWS queue scheduling q->gws is not updated atomically with qpd->mapped_gws_queue. If a runlist is created between pqm_set_gws and update_queue it will contain a queue which uses GWS in a process with no GWS allocated. This will result in a scheduler hang. Use q->properties.is_gws which is changed while holding the DQM lock. Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `b98370220e`) Cc: stable@vger.kernel.org	2025-06-18 13:17:32 -04:00
Alex Deucher	49cc5beeab	drm/amdgpu/sdma5: init engine reset mutex Missing the mutex init. Fixes: `e56d4bf57f` ("drm/amdgpu/: drm/amdgpu: Register the new sdma function pointers for sdma_v5_0") Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `3f4caf092f`)	2025-06-18 13:17:00 -04:00
Alex Deucher	ebe4354270	drm/amdgpu: switch job hw_fence to amdgpu_fence Use the amdgpu fence container so we can store additional data in the fence. This also fixes the start_time handling for MCBP since we were casting the fence to an amdgpu_fence and it wasn't. Fixes: `3f4c175d62` ("drm/amdgpu: MCBP based on DRM scheduler (v9)") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `bf1cd14f9e`) Cc: stable@vger.kernel.org	2025-06-18 13:15:26 -04:00
Jesse Zhang	7f3b16f3f2	drm/amdgpu: Fix SDMA UTC_L1 handling during start/stop sequences This commit makes two key fixes to SDMA v4.4.2 handling: 1. disable UTC_L1 in sdma_cntl register when stopping SDMA engines by reading the current value before modifying UTC_L1_ENABLE bit. 2. Ensure UTC_L1_ENABLE is consistently managed by: - Adding the missing register write when enabling UTC_L1 during start - Keeping UTC_L1 enabled by default as per hardware requirements v2: Correct SDMA_CNTL setting (Philip) Suggested-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `375bf56465`) Cc: stable@vger.kernel.org	2025-06-18 13:14:40 -04:00
Lijo Lazar	785c536c31	drm/amdgpu: Release reset locks during failures Make sure to release reset domain lock in case of failures. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Fixes: `11bb33766f` ("drm/amdgpu: refactor amdgpu_device_gpu_recover") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `1ab11a8268`)	2025-06-18 13:14:10 -04:00
Alex Hung	b669507b63	drm/amd/display: Check dce_hwseq before dereferencing it [WHAT] hws was checked for null earlier in dce110_blank_stream, indicating hws can be null, and should be checked whenever it is used. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `79db43611f`) Cc: stable@vger.kernel.org	2025-06-18 13:13:43 -04:00
Sonny Jiang	46e15197b5	drm/amdgpu: VCN v5_0_1 to prevent FW checking RB during DPG pause Add a protection to ensure programming are all complete prior VCPU starting. This is a WA for an unintended VCPU running. Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c29521b529`) Cc: stable@vger.kernel.org	2025-06-18 13:13:13 -04:00
Jesse Zhang	caade9d69f	drm/amdgpu: Use logical instance ID for SDMA v4_4_2 queue operations Simplify SDMA v4_4_2 queue reset and stop operations by: 1. Removing GET_INST(SDMA0) conversion for ring->me 2. Using the logical instance ID (ring->me) directly 3. Maintaining consistent behavior with other SDMA queue operations This change aligns with the existing queue handling logic where ring->me already represents the correct instance identifier. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `3bab282dfe`) Cc: stable@vger.kernel.org	2025-06-18 13:11:15 -04:00
Jesse Zhang	09b585592f	drm/amdgpu: Fix SDMA engine reset with logical instance ID This commit makes the following improvements to SDMA engine reset handling: 1. Clarifies in the function documentation that instance_id refers to a logical ID 2. Adds conversion from logical to physical instance ID before performing reset using GET_INST(SDMA0, instance_id) macro 3. Improves error messaging to indicate when a logical instance reset fails 4. Adds better code organization with blank lines for readability The change ensures proper SDMA engine reset by using the correct physical instance ID while maintaining the logical ID interface for callers. V2: Remove harvest_config check and convert directly to physical instance (Lijo) Suggested-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `5efa6217c2`) Cc: stable@vger.kernel.org	2025-06-18 13:10:44 -04:00
Frank Min	854171405e	drm/amdgpu: add kicker fws loading for gfx11/smu13/psp13 1. Add kicker firmwares loading for gfx11/smu13/psp13 2. Register additional MODULE_FIRMWARE entries for kicker fws - gc_11_0_0_rlc_kicker.bin - gc_11_0_0_imu_kicker.bin - psp_13_0_0_sos_kicker.bin - psp_13_0_0_ta_kicker.bin - smu_13_0_0_kicker.bin Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `fb5ec2174d`) Cc: stable@vger.kernel.org	2025-06-18 13:09:41 -04:00
Frank Min	0bbf5fd86c	drm/amdgpu: Add kicker device detection 1. add kicker device list 2. add kicker device checking helper function Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `09aa2b408f`) Cc: stable@vger.kernel.org	2025-06-18 13:08:52 -04:00
Mario Limonciello	16dc8bc27c	drm/amd/display: Export full brightness range to userspace [WHY] Userspace currently is offered a range from 0-0xFF but the PWM is programmed from 0-0xFFFF. This can be limiting to some software that wants to apply greater granularity. [HOW] Convert internally to firmware values only when mapping custom brightness curves because these are in 0-0xFF range. Advertise full PWM range to userspace. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `8dbd72cb79`) Cc: stable@vger.kernel.org	2025-06-18 13:08:18 -04:00
Mario Limonciello	ffcaed1d7e	drm/amd/display: Only read ACPI backlight caps once [WHY] Backlight caps are read already in amdgpu_dm_update_backlight_caps(). They may be updated by update_connector_ext_caps(). Reading again when registering backlight device may cause wrong values to be used. [HOW] Use backlight caps already registered to the dm. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `148144f6d2`) Cc: stable@vger.kernel.org	2025-06-18 13:07:59 -04:00
Yihan Zhu	158f9944ac	drm/amd/display: Fix RMCM programming seq errors [WHY & HOW] Fix RMCM programming sequence errors and mapping issues to pass the RMCM test. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `11baa49750`) Cc: stable@vger.kernel.org	2025-06-18 13:06:31 -04:00
Alex Hung	8724a5380c	drm/amd/display: Fix mpv playback corruption on weston [WHAT] Severe video playback corruption is observed in the following setup: weston 14.0.90 (built from source) + mpv v0.40.0 with command: mpv bbb_sunflower_1080p_60fps_normal.mp4 --vo=gpu [HOW] ABGR16161616 needs to be included in dml2/2.1 translation. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Austin Zheng <austin.zheng@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `d023de809f`) Cc: stable@vger.kernel.org	2025-06-18 13:05:59 -04:00
Nicholas Kazlauskas	0d57dd1765	drm/amd/display: Add more checks for DSC / HUBP ONO guarantees [WHY] For non-zero DSC instances it's possible that the HUBP domain required to drive it for sequential ONO ASICs isn't met, potentially causing the logic to the tile to enter an undefined state leading to a system hang. [HOW] Add more checks to ensure that the HUBP domain matching the DSC instance is appropriately powered. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Duncan Ma <duncan.ma@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `da63df0711`) Cc: stable@vger.kernel.org	2025-06-18 13:05:12 -04:00
Michael Strauss	d358a51444	drm/amd/display: Get LTTPR IEEE OUI/Device ID From Closest LTTPR To Host [WHY] These fields are read for the explicit purpose of detecting embedded LTTPRs (i.e. between host ASIC and the user-facing port), and thus need to calculate the correct DPCD address offset based on LTTPR count to target the appropriate LTTPR's DPCD register space with these queries. [HOW] Cascaded LTTPRs in a link each snoop and increment LTTPR count when queried via DPCD read, so an LTTPR embedded in a source device (e.g. USB4 port on a laptop) will always be addressible using the max LTTPR count seen by the host. Therefore we simply need to use a recently added helper function to calculate the correct DPCD address to target potentially embedded LTTPRs based on the received LTTPR count. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `791897f5c7`) Cc: stable@vger.kernel.org	2025-06-18 13:03:29 -04:00
Peichen Huang	3251b69b7e	drm/amd/display: Add dc cap for dp tunneling [WHAT] 1. add dc cap for dp tunneling 2. add function to get index of host router Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Cruise Hung <cruise.hung@amd.com> Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `29e178d139`) Cc: stable@vger.kernel.org	2025-06-18 12:59:45 -04:00
Jesse.Zhang	a55737dab6	drm/amdkfd: move SDMA queue reset capability check to node_show Relocate the per-SDMA queue reset capability check from kfd_topology_set_capabilities() to node_show() to ensure we read the latest value of sdma.supported_reset after all IP blocks are initialized. Fixes: `ceb7114c96` ("drm/amdkfd: flag per-sdma queue reset supported to user space") Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `e17df7b086`)	2025-06-18 12:56:52 -04:00
Dave Airlie	49a5fdc06c	Merge tag 'drm-msm-fixes-2025-06-16' of https://gitlab.freedesktop.org/drm/msm into drm-fixes Fixes for v6.16-rc3 Display: - Fixed DP output on SDM845 - Fixed 10nm DSI PLL init GPU: - SUBMIT ioctl error path leak fixes - drm half of stall-on-fault fixes. Note there is a soft dependency, to get correct mmu fault devcoredumps, on arm-smmu changes which are not in this branch, but have already been merged by Linus. So by the time Linus merges this, everything should be peachy. - a7xx: Missing CP_RESET_CONTEXT_STATE - Skip GPU component bind if GPU is not in the device table. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <rob.clark@oss.qualcomm.com> Link: https://lore.kernel.org/r/CACSVV03=OH74ip8O1xqb8RJWGyM4HFuUnWuR=p3zJR+-ko_AJA@mail.gmail.com	2025-06-18 11:37:14 +10:00
Maíra Canal	61ee19dedb	drm/etnaviv: Protect the scheduler's pending list with its lock Commit `704d3d60fe` ("drm/etnaviv: don't block scheduler when GPU is still active") ensured that active jobs are returned to the pending list when extending the timeout. However, it didn't use the pending list's lock to manipulate the list, which causes a race condition as the scheduler's workqueues are running. Hold the lock while manipulating the scheduler's pending list to prevent a race. Cc: stable@vger.kernel.org Fixes: `704d3d60fe` ("drm/etnaviv: don't block scheduler when GPU is still active") Reported-by: Philipp Stanner <phasta@kernel.org> Closes: https://lore.kernel.org/dri-devel/964e59ba1539083ef29b06d3c78f5e2e9b138ab8.camel@mailbox.org/ Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250602132240.93314-1-mcanal@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>	2025-06-16 20:18:13 -03:00
Maíra Canal	e1bc3a13bd	drm/v3d: Avoid NULL pointer dereference in `v3d_job_update_stats()` The following kernel Oops was recently reported by Mesa CI: [ 800.139824] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000588 [ 800.148619] Mem abort info: [ 800.151402] ESR = 0x0000000096000005 [ 800.155141] EC = 0x25: DABT (current EL), IL = 32 bits [ 800.160444] SET = 0, FnV = 0 [ 800.163488] EA = 0, S1PTW = 0 [ 800.166619] FSC = 0x05: level 1 translation fault [ 800.171487] Data abort info: [ 800.174357] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 [ 800.179832] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 800.184873] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 800.190176] user pgtable: 4k pages, 39-bit VAs, pgdp=00000001014c2000 [ 800.196607] [0000000000000588] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 [ 800.205305] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP [ 800.211564] Modules linked in: vc4 snd_soc_hdmi_codec drm_display_helper v3d cec gpu_sched drm_dma_helper drm_shmem_helper drm_kms_helper drm drm_panel_orientation_quirks snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm i2c_brcmstb snd_timer snd backlight [ 800.234448] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.25+rpt-rpi-v8 #1 Debian 1:6.12.25-1+rpt1 [ 800.244182] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT) [ 800.250005] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 800.256959] pc : v3d_job_update_stats+0x60/0x130 [v3d] [ 800.262112] lr : v3d_job_update_stats+0x48/0x130 [v3d] [ 800.267251] sp : ffffffc080003e60 [ 800.270555] x29: ffffffc080003e60 x28: ffffffd842784980 x27: 0224012000000000 [ 800.277687] x26: ffffffd84277f630 x25: ffffff81012fd800 x24: 0000000000000020 [ 800.284818] x23: ffffff8040238b08 x22: 0000000000000570 x21: 0000000000000158 [ 800.291948] x20: 0000000000000000 x19: ffffff8040238000 x18: 0000000000000000 [ 800.299078] x17: ffffffa8c1bd2000 x16: ffffffc080000000 x15: 0000000000000000 [ 800.306208] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 800.313338] x11: 0000000000000040 x10: 0000000000001a40 x9 : ffffffd83b39757c [ 800.320468] x8 : ffffffd842786420 x7 : 7fffffffffffffff x6 : 0000000000ef32b0 [ 800.327598] x5 : 00ffffffffffffff x4 : 0000000000000015 x3 : ffffffd842784980 [ 800.334728] x2 : 0000000000000004 x1 : 0000000000010002 x0 : 000000ba4c0ca382 [ 800.341859] Call trace: [ 800.344294] v3d_job_update_stats+0x60/0x130 [v3d] [ 800.349086] v3d_irq+0x124/0x2e0 [v3d] [ 800.352835] __handle_irq_event_percpu+0x58/0x218 [ 800.357539] handle_irq_event+0x54/0xb8 [ 800.361369] handle_fasteoi_irq+0xac/0x240 [ 800.365458] handle_irq_desc+0x48/0x68 [ 800.369200] generic_handle_domain_irq+0x24/0x38 [ 800.373810] gic_handle_irq+0x48/0xd8 [ 800.377464] call_on_irq_stack+0x24/0x58 [ 800.381379] do_interrupt_handler+0x88/0x98 [ 800.385554] el1_interrupt+0x34/0x68 [ 800.389123] el1h_64_irq_handler+0x18/0x28 [ 800.393211] el1h_64_irq+0x64/0x68 [ 800.396603] default_idle_call+0x3c/0x168 [ 800.400606] do_idle+0x1fc/0x230 [ 800.403827] cpu_startup_entry+0x40/0x50 [ 800.407742] rest_init+0xe4/0xf0 [ 800.410962] start_kernel+0x5e8/0x790 [ 800.414616] __primary_switched+0x80/0x90 [ 800.418622] Code: 8b170277 8b160296 11000421 b9000861 (b9401ac1) [ 800.424707] ---[ end trace 0000000000000000 ]--- [ 800.457313] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]--- This issue happens when the file descriptor is closed before the jobs submitted by it are completed. When the job completes, we update the global GPU stats and the per-fd GPU stats, which are exposed through fdinfo. If the file descriptor was closed, then the struct `v3d_file_priv` and its stats were already freed and we can't update the per-fd stats. Therefore, if the file descriptor was already closed, don't update the per-fd GPU stats, only update the global ones. Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Link: https://lore.kernel.org/r/20250602151451.10161-1-mcanal@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>	2025-06-16 19:35:05 -03:00
Ville Syrjälä	c464ce6af3	drm/i915/dsi: Fix off by one in BXT_MIPI_TRANS_VTOTAL BXT_MIPI_TRANS_VTOTAL must be programmed with vtotal-1 instead of vtotal. Make it so. Cc: stable@vger.kernel.org Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250314150136.22564-1-ville.syrjala@linux.intel.com Reviewed-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit `7b3685c9b3`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2025-06-16 07:57:00 +03:00
Tzung-Bi Shih	a7137b1825	drm/i915/pmu: Fix build error with GCOV and AutoFDO enabled i915_pmu.c may fail to build with GCOV and AutoFDO enabled. ../drivers/gpu/drm/i915/i915_pmu.c:116:3: error: call to '__compiletime_assert_487' declared with 'error' attribute: BUILD_BUG_ON failed: bit > BITS_PER_TYPE(typeof_member(struct i915_pmu, enable)) - 1 116 \| BUILD_BUG_ON(bit > \| ^ Here is a way to reproduce the issue: $ git checkout v6.15 $ mkdir build $ ./scripts/kconfig/merge_config.sh -O build -n -m <(cat <<EOF CONFIG_DRM=y CONFIG_PCI=y CONFIG_DRM_I915=y CONFIG_PERF_EVENTS=y CONFIG_DEBUG_FS=y CONFIG_GCOV_KERNEL=y CONFIG_GCOV_PROFILE_ALL=y CONFIG_AUTOFDO_CLANG=y EOF ) $ PATH=${PATH}:${HOME}/llvm-20.1.5-x86_64/bin make LLVM=1 O=build \ olddefconfig $ PATH=${PATH}:${HOME}/llvm-20.1.5-x86_64/bin make LLVM=1 O=build \ CLANG_AUTOFDO_PROFILE=...PATH_TO_SOME_AFDO_PROFILE... \ drivers/gpu/drm/i915/i915_pmu.o Although not super sure what happened, by reviewing the code, it should depend on `__builtin_constant_p(bit)` directly instead of assuming `__builtin_constant_p(config)` makes `bit` a builtin constant. Also fix a nit, to reuse the `bit` local variable. Fixes: `a644fde77f` ("drm/i915/pmu: Change bitmask of enabled events to u32") Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org> Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://lore.kernel.org/r/20250612083023.562585-1-tzungbi@kernel.org (cherry picked from commit `686d773186`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2025-06-16 07:56:57 +03:00
Rob Clark	d3deabe4c6	drm/msm: Fix inverted WARN_ON() logic We want to WARN_ON() if info is NULL. Suggested-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Fixes: `0838fc3e67` ("drm/msm/adreno: Check for recognized GPU before bind") Tested-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/658631/	2025-06-14 08:10:44 -07:00
Jacob Keller	61b2b37374	drm/nouveau/bl: increase buffer size to avoid truncate warning The nouveau_get_backlight_name() function generates a unique name for the backlight interface, appending an id from 1 to 99 for all backlight devices after the first. GCC 15 (and likely other compilers) produce the following -Wformat-truncation warning: nouveau_backlight.c: In function ‘nouveau_backlight_init’: nouveau_backlight.c:56:69: error: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size 3 [-Werror=format-truncation=] 56 \| snprintf(backlight_name, BL_NAME_SIZE, "nv_backlight%d", nb); \| ^~ In function ‘nouveau_get_backlight_name’, inlined from ‘nouveau_backlight_init’ at nouveau_backlight.c:351:7: nouveau_backlight.c:56:56: note: directive argument in the range [1, 2147483647] 56 \| snprintf(backlight_name, BL_NAME_SIZE, "nv_backlight%d", nb); \| ^~~~~~~~~~~~~~~~ nouveau_backlight.c:56:17: note: ‘snprintf’ output between 14 and 23 bytes into a destination of size 15 56 \| snprintf(backlight_name, BL_NAME_SIZE, "nv_backlight%d", nb); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The warning started appearing after commit `ab244be47a` ("drm/nouveau: Fix a potential theorical leak in nouveau_get_backlight_name()") This fix for the ida usage removed the explicit value check for ids larger than 99. The compiler is unable to intuit that the ida_alloc_max() limits the returned value range between 0 and 99. Because the compiler can no longer infer that the number ranges from 0 to 99, it thinks that it could use as many as 11 digits (10 + the potential - sign for negative numbers). The warning has gone unfixed for some time, with at least one kernel test robot report. The code breaks W=1 builds, which is especially frustrating with the introduction of CONFIG_WERROR. The string is stored temporarily on the stack and then copied into the device name. Its not a big deal to use 11 more bytes of stack rounding out to an even 24 bytes. Increase BL_NAME_SIZE to 24 to avoid the truncation warning. This fixes the W=1 builds that include this driver. Compile tested only. Fixes: `ab244be47a` ("drm/nouveau: Fix a potential theorical leak in nouveau_get_backlight_name()") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312050324.0kv4PnfZ-lkp@intel.com/ Suggested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20250610-jk-nouveua-drm-bl-snprintf-fix-v2-1-7fdd4b84b48e@intel.com Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-06-13 16:41:43 +02:00
Zhi Wang	9802f0a63b	drm/nouveau: fix a use-after-free in r535_gsp_rpc_push() The RPC container is released after being passed to r535_gsp_rpc_send(). When sending the initial fragment of a large RPC and passing the caller's RPC container, the container will be freed prematurely. Subsequent attempts to send remaining fragments will therefore result in a use-after-free. Allocate a temporary RPC container for holding the initial fragment of a large RPC when sending. Free the caller's container when all fragments are successfully sent. Fixes: `176fdcbddf` ("drm/nouveau/gsp/r535: add support for booting GSP-RM") Signed-off-by: Zhi Wang <zhiw@nvidia.com> Link: https://lore.kernel.org/r/20250527163712.3444-1-zhiw@nvidia.com [ Rebase onto Blackwell changes. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-06-13 16:38:06 +02:00
Colin Ian King	80626ae6ff	drm/nouveau/gsp: Fix potential integer overflow on integer shifts The left shift int 32 bit integer constants 1 is evaluated using 32 bit arithmetic and then assigned to a 64 bit unsigned integer. In the case where the shift is 32 or more this can lead to an overflow. Avoid this by shifting using the BIT_ULL macro instead. Fixes: `6c3ac7bcfc` ("drm/nouveau/gsp: support deeper page tables in COPY_SERVER_RESERVED_PDES") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20250522131512.2768310-1-colin.i.king@gmail.com	2025-06-13 16:25:37 +02:00
Thomas Zimmermann	2e3395ab2a	drm/mgag200: Do not include <linux/export.h> Fix the compile-time warning drivers/gpu/drm/mgag200/mgag200_ddc.c: warning: EXPORT_SYMBOL() is not used, but #include <linux/export.h> is present Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20250612085308.203861-1-tzimmermann@suse.de	2025-06-13 08:54:31 +02:00
Thomas Zimmermann	d742f3ec1c	drm/ast: Do not include <linux/export.h> Fix the compile-time warning drivers/gpu/drm/ast/ast_mode.c: warning: EXPORT_SYMBOL() is not used, but #include <linux/export.h> is present Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20250612084257.200907-1-tzimmermann@suse.de	2025-06-13 08:54:18 +02:00
Dave Airlie	1364af9cb2	Merge tag 'drm-misc-fixes-2025-06-12' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes drm-misc-fixes for v6.16-rc2: - Fix infinite EPROBE_DEFER loop in vc4 probing. - Fix amdxdna firmware size. - mode fixes for meson. - Kconfig fix for st7171-i2c. - Fix -EBUSY WARN_ON_ONCE in dma-buf - Use dma_sync_sgtable_for_cpu in udmabuf. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://lore.kernel.org/r/62c06195-8bc1-4dae-8777-e86d94e4d9d9@linux.intel.com	2025-06-13 14:57:09 +10:00
Lucas De Marchi	9c7632faad	drm/xe/lrc: Use a temporary buffer for WA BB In case the BO is in iomem, we can't simply take the vaddr and write to it. Instead, prepare a separate buffer that is later copied into io memory. Right now it's just a few words that could be using xe_map_write32(), but the intention is to grow the WA BB for other uses. Fixes: `617d824c53` ("drm/xe: Add WA BB to capture active context utilization") Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250604-wa-bb-fix-v1-1-0dfc5dafcef0@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `ef48715b2d`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-06-12 18:09:50 +02:00
Alexander Stein	8ecbad4853	drm/arm/malidp: Silence informational message When checking for unsupported expect an error is printed every time. This spams the log for platforms where this is expected, e.g. ls1028a having a Vivante (etnaviv) GPU and Mali display processor. Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Link: https://lore.kernel.org/r/20250523064042.3275926-1-alexander.stein@ew.tq-group.com	2025-06-12 13:11:25 +01:00
John Keeping	e479da4054	drm/ssd130x: fix ssd132x_clear_screen() columns The number of columns relates to the width, not the height. Use the correct variable. Signed-off-by: John Keeping <jkeeping@inmusicbrands.com> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Fixes: `fdd591e00a` ("drm/ssd130x: Add support for the SSD132x OLED controller family") Link: https://lore.kernel.org/r/20250611111307.1814876-1-jkeeping@inmusicbrands.com Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>	2025-06-12 14:04:56 +02:00
Nathan Chancellor	b8d3291d31	drm/sitronix: st7571-i2c: Select VIDEOMODE_HELPERS This driver requires of_get_display_timing() from CONFIG_VIDEOMODE_HELPERS but does not select it. If no other driver selects it, there will be a failure from the linker if the driver is built in or modpost if it is a module. ERROR: modpost: "of_get_display_timing" [drivers/gpu/drm/sitronix/st7571-i2c.ko] undefined! Select CONFIG_VIDEOMODE_HELPERS to resolve the build failure. Fixes: `4b35f0f41e` ("drm/st7571-i2c: add support for Sitronix ST7571 LCD controller") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20250610-drm-st7571-i2c-select-videomode-helpers-v1-1-d30b50ff6e64@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>	2025-06-11 10:44:56 +02:00
Martin Blumenstingl	0cee6c4d35	drm/meson: fix more rounding issues with 59.94Hz modes Commit `1017560164` ("drm/meson: use unsigned long long / Hz for frequency types") attempts to resolve video playback using 59.94Hz. using YUV420 by changing the clock calculation to use Hz instead of kHz (thus yielding more precision). The basic calculation itself is correct, however the comparisions in meson_vclk_vic_supported_freq() and meson_vclk_setup() don't work anymore for 59.94Hz modes (using the freq * 1000 / 1001 logic). For example, drm/edid specifies a 593407kHz clock for 3840x2160@59.94Hz. With the mentioend commit we convert this to Hz. Then meson_vclk tries to find a matchig "params" entry (as the clock setup code currently only supports specific frequencies) by taking the venc_freq from the params and calculating the "alt frequency" (used for the 59.94Hz modes) from it, which is: (594000000Hz * 1000) / 1001 = 593406593Hz Similar calculation is applied to the phy_freq (TMDS clock), which is 10 times the pixel clock. Implement a new meson_vclk_freqs_are_matching_param() function whose purpose is to compare if the requested and calculated frequencies. They may not match exactly (for the reasons mentioned above). Allow the clocks to deviate slightly to make the 59.94Hz modes again. Fixes: `1017560164` ("drm/meson: use unsigned long long / Hz for frequency types") Reported-by: Christian Hewitt <christianshewitt@gmail.com> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250609202751.962208-1-martin.blumenstingl@googlemail.com	2025-06-10 14:15:42 +02:00
Martin Blumenstingl	faf2f83820	drm/meson: use vclk_freq instead of pixel_freq in debug print meson_vclk_vic_supported_freq() has a debug print which includes the pixel freq. However, within the whole function the pixel freq is irrelevant, other than checking the end of the params array. Switch to printing the vclk_freq which is being compared / matched against the inputs to the function to avoid confusion when analyzing error reports from users. Fixes: `e5fab2ec9c` ("drm/meson: vclk: add support for YUV420 setup") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250606221031.3419353-1-martin.blumenstingl@googlemail.com	2025-06-10 14:14:43 +02:00
Martin Blumenstingl	d17e61ab63	drm/meson: fix debug log statement when setting the HDMI clocks The "phy" and "vclk" frequency labels were swapped, making it more difficult to debug driver errors. Swap the label order to make them match with the actual frequencies printed to correct this. Fixes: `e5fab2ec9c` ("drm/meson: vclk: add support for YUV420 setup") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250606203729.3311592-1-martin.blumenstingl@googlemail.com	2025-06-10 14:14:36 +02:00
Gabriel Dalimonte	c0317ad44f	drm/vc4: fix infinite EPROBE_DEFER loop `vc4_hdmi_audio_init` calls `devm_snd_dmaengine_pcm_register` which may return EPROBE_DEFER. Calling `drm_connector_hdmi_audio_init` adds a child device. The driver model docs[1] state that adding a child device prior to returning EPROBE_DEFER may result in an infinite loop. [1] https://www.kernel.org/doc/html/v6.14/driver-api/driver-model/driver.html Fixes: `9640f1437a` ("drm/vc4: hdmi: switch to using generic HDMI Codec infrastructure") Signed-off-by: Gabriel Dalimonte <gabriel.dalimonte@gmail.com> Link: https://lore.kernel.org/r/20250601-vc4-audio-inf-probe-v2-1-9ad43c7b6147@gmail.com Signed-off-by: Maxime Ripard <mripard@kernel.org>	2025-06-10 10:47:04 +02:00
Rob Clark	0838fc3e67	drm/msm/adreno: Check for recognized GPU before bind If we have a newer dtb than kernel, we could end up in a situation where the GPU device is present in the dtb, but not in the drivers device table. We don't want this to prevent the display from probing. So check that we recognize the GPU before adding the GPU component. v2: use %pOF Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/657701/	2025-06-09 16:42:09 -07:00
Rob Clark	1efb73791c	drm/msm/adreno: Pass device_node to find_chipid() We are going to want to re-use this before the component is bound, when we don't yet have the device pointer (but we do have the of node). v2: use %pOF Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/657705/	2025-06-09 12:52:02 -07:00
Rob Clark	1453b532d1	drm/msm: Rename add_components_mdp() To better match add_gpu_components(). Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/657700/	2025-06-09 12:52:02 -07:00
Ryan Eatmon	ba64c6737f	drivers: gpu: drm: msm: registers: improve reproducibility The files generated by gen_header.py capture the source path to the input files and the date. While that can be informative, it varies based on where and when the kernel was built as the full path is captured. Since all of the files that this tool is run on is under the drivers directory, this modifies the application to strip all of the path before drivers. Additionally it prints <stripped> instead of the date. Signed-off-by: Ryan Eatmon <reatmon@ti.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com> Signed-off-by: Viswanath Kraleti <viswanath.kraleti@oss.qualcomm.com> Acked-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/655599/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>	2025-06-09 12:50:20 -07:00
Connor Abbott	2b520c6104	drm/msm/a7xx: Call CP_RESET_CONTEXT_STATE Calling this packet is necessary when we switch contexts because there are various pieces of state used by userspace to synchronize between BR and BV that are persistent across submits and we need to make sure that they are in a "safe" state when switching contexts. Otherwise a userspace submission in one context could cause another context to function incorrectly and hang, effectively a denial of service (although without leaking data). This was missed during initial a7xx bringup. Fixes: `af66706acc` ("drm/msm/a6xx: Add skeleton A7xx support") Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/654924/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>	2025-06-09 12:48:55 -07:00
Connor Abbott	b1c9e797ad	drm/msm: Fix CP_RESET_CONTEXT_STATE bitfield names Based on kgsl. Fixes: `af66706acc` ("drm/msm/a6xx: Add skeleton A7xx support") Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/654922/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>	2025-06-09 12:48:55 -07:00
Connor Abbott	b13044092c	drm/msm: Temporarily disable stall-on-fault after a page fault When things go wrong, the GPU is capable of quickly generating millions of faulting translation requests per second. When that happens, in the stall-on-fault model each access will stall until it wins the race to signal the fault and then the RESUME register is written. This slows processing page faults to a crawl as the GPU can generate faults much faster than the CPU can acknowledge them. It also means that all available resources in the SMMU are saturated waiting for the stalled transactions, so that other transactions such as transactions generated by the GMU, which shares translation resources with the GPU, cannot proceed. This causes a GMU watchdog timeout, which leads to a failed reset because GX cannot collapse when there is a transaction pending and a permanently hung GPU. On older platforms with qcom,smmu-v2, it seems that when one transaction is stalled subsequent faulting transactions are terminated, which avoids this problem, but the MMU-500 follows the spec here. To work around these problems, disable stall-on-fault as soon as we get a page fault until a cooldown period after pagefaults stop. This allows the GMU some guaranteed time to continue working. We only use stall-on-fault to halt the GPU while we collect a devcoredump and we always terminate the transaction afterward, so it's fine to miss some subsequent page faults. We also keep it disabled so long as the current devcoredump hasn't been deleted, because in that case we likely won't capture another one if there's a fault. After this commit HFI messages still occasionally time out, because the crashdump handler doesn't run fast enough to let the GMU resume, but the driver seems to recover from it. This will probably go away after the HFI timeout is increased. Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Rob Clark <robdclark@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/654891/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>	2025-06-09 11:37:34 -07:00
Connor Abbott	dedf404be8	drm/msm: Delete resume_translation() Unused since the previous commit. Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/654890/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>	2025-06-09 11:36:42 -07:00
Connor Abbott	0c5fea1eb0	drm/msm: Don't use a worker to capture fault devcoredump Now that we use a threaded IRQ, it should be safe to do this in the fault handler. We can also remove fault_info from struct msm_gpu and just pass it directly. Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/654889/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>	2025-06-09 11:36:21 -07:00
Rob Clark	f681c2aa86	drm/msm: Fix another leak in the submit error path put_unused_fd() doesn't free the installed file, if we've already done fd_install(). So we need to also free the sync_file. Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/653583/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>	2025-06-09 11:27:28 -07:00

1 2 3 4 5 ...

115282 Commits