linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00

Author	SHA1	Message	Date
Alexandre Demers	62e0b8f766	drm/amdgpu: use gmc_v7_0_is_idle() since it is available under GMC7 gmc_v7_0_is_idle() does exactly what we need, so use it. v2: fix parameter (Alex) Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-04-07 15:18:32 -04:00
Ce Sun	969fd18c8d	drm/amdgpu/vcn: during dpc recovery will corrupt VCPU buffer err_event_athub and dpc recovery will corrupt VCPU buffer, so we need to restore fw data and clear buffer in amdgpu_vcn_resume() Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-04-07 15:18:31 -04:00
Ce Sun	8ba904f541	drm/amdgpu: Multi-GPU DPC recovery support Add support for DPC recover based on refactored code Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-04-07 15:18:31 -04:00
Ce Sun	11bb33766f	drm/amdgpu: refactor amdgpu_device_gpu_recover Split amdgpu_device_gpu_recover into the following stages: halt activities,asic reset,schedule resume and amdgpu resume. The reason is that the subsequent addition of dpc recover code will have a high similarity with gpu reset Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-04-07 15:18:31 -04:00
Christian König	f5e7fabd1f	drm/amdgpu: allow pinning DMA-bufs into VRAM if all importers can do P2P Try pinning into VRAM to allow P2P with RDMA NICs without ODP support if all attachments can do P2P. If any attachment can't do P2P just pin into GTT instead. Acked-by: Simona Vetter <simona.vetter@ffwll.ch> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Tested-by: Pak Nin Lui <pak.lui@amd.com> Cc: Simona Vetter <simona.vetter@ffwll.ch> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-04-07 15:18:30 -04:00
Maarten Lankhorst	1b5447d773	drm/amdgpu: Add cgroups implementation Similar to xe, enable some simple management of VRAM only. Reviewed-by: Christian König <christian.koenig@amd.com> Co-developed-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-04-07 15:18:30 -04:00
Jay Cornwall	3666ed8218	drm/amdgpu: Increase KIQ invalidate_tlbs timeout KIQ invalidate_tlbs request has been seen to marginally exceed the configured 100 ms timeout on systems under load. All other KIQ requests in the driver use a 10 second timeout. Use a similar timeout implementation on the invalidate_tlbs path. v2: Poll once before msleep v3: Fix return value Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Cc: Kent Russell <kent.russell@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-04-07 15:18:30 -04:00
Flora Cui	2f6dd741cd	drm/amdgpu/ip_discovery: add missing ip_discovery fw Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-04-07 14:32:20 -04:00
Matthew Auld	c0dd8a9253	drm/amdgpu/dma_buf: fix page_link check The page_link lower bits of the first sg could contain something like SG_END, if we are mapping a single VRAM page or contiguous blob which fits into one sg entry. Rather pull out the struct page, and use that in our check to know if we mapped struct pages vs VRAM. Fixes: `f44ffd677f` ("drm/amdgpu: add support for exporting VRAM using DMA-buf v3") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Christian König <christian.koenig@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v5.8+ Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-04-07 14:31:45 -04:00
Christian König	a755906fb2	drm/amdgpu: immediately use GTT for new allocations Only use GTT as a fallback if we already have a backing store. This prevents evictions when an application constantly allocates and frees new memory. Partially fixes https://gitlab.freedesktop.org/drm/amd/-/issues/3844#note_2833985. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `216c1282dd` ("drm/amdgpu: use GTT only as fallback for VRAM\|GTT") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2025-04-07 14:30:57 -04:00
Alex Deucher	b71a2bb0ce	drm/amdgpu/mes11: optimize MES pipe FW version fetching Don't fetch it again if we already have it. It seems the registers don't reliably have the value at resume in some cases. Fixes: `028c3fb37e` ("drm/amdgpu/mes11: initiate mes v11 support") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4083 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2025-04-07 14:30:10 -04:00
Thomas Zimmermann	1afba39f93	Merge drm/drm-next into drm-misc-next Backmerging to get v6.15-rc1 into drm-misc-next. Also fixes a build issue when enabling CONFIG_DRM_SCHED_KUNIT_TEST. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2025-04-07 14:35:48 +02:00
Linus Torvalds	16cd1c2657	A set of final cleanups for the timer subsystem: 1) Convert all del_timer[_sync]() instances over to the new timer_delete[_sync]() API and remove the legacy wrappers. Conversion was done with coccinelle plus some manual fixups as coccinelle chokes on scoped_guard(). 2) The final cleanup of the hrtimer_init() to hrtimer_setup() conversion. This has been delayed to the end of the merge window, so that all patches which have been merged through other trees are in mainline and all new users are catched. Doing this right before rc1 ensures that new code which is merged post rc1 is not introducing new instances of the original functionality. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmfyXi0THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoYzlD/4ykDZbUzgTreYOxEQpBJ9elPwBhxfL 1v8OwDjRWlNrmLup8RiUfKrlbmztGl1J/u9ld0qhjcqkywCCBC1N5S+DhCjYetyP MPWLbi2Dc35cFA+M7i8fMgxI2K9MLz2Zj1UKxz1MdsSuNHm07N3mul/3T11Ye4Rz nPlzeQBTBDFCKTEGKjr8zjuoD15Wl48sObM0AjV35BPuQR1jfY4CE6VXo2h78+0c jYwpJpDmcd+o1bDrfFhWUME2DzABEkHhn4wNSETnM4E5RXZRMUbi4UiigzInibQr JOUTKwPJXTMX/Erd0XyXErrYf2qy1X9BQy6NlyDDOv+8kLEVRsC9Efplx9uoEtfi QvVT/UmgmhZFJBfIT3/B8OvasrfwOropaYoG4L0zbDpp1b09VY47N5lCLlNr/mZf jb2TwIln8Szy2EfIT2RSd0ZNupyU8V4aH/mYNpSlbUJ6mfvfIAttBSS/YH+Zeqku 7zOJkoCusaySOCZCOQkeikL3ZBN+FHtNteXxmGnp34ed/tsfgGZj1lsbmkM2rrWo f2mQsYAclUA4KQeY9z/Xf7/c5wJUkME69PxOaaN23dOpBR7GA58Cvb0PQTnPlAiT KnH/JRweBHtcv4KEHMi2f5no4cxcmXyKTj7/TLyYNjc8LATL9Eo/nxG36PLxy4lN QPOWz11zEBLjQQ== =8Ftq -----END PGP SIGNATURE----- Merge tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "A set of final cleanups for the timer subsystem: - Convert all del_timer[_sync]() instances over to the new timer_delete[_sync]() API and remove the legacy wrappers. Conversion was done with coccinelle plus some manual fixups as coccinelle chokes on scoped_guard(). - The final cleanup of the hrtimer_init() to hrtimer_setup() conversion. This has been delayed to the end of the merge window, so that all patches which have been merged through other trees are in mainline and all new users are catched. Doing this right before rc1 ensures that new code which is merged post rc1 is not introducing new instances of the original functionality" * tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tracing/timers: Rename the hrtimer_init event to hrtimer_setup hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack() hrtimers: Rename debug_init() to debug_setup() hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper() hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns() hrtimers: Make callback function pointer private hrtimers: Merge __hrtimer_init() into __hrtimer_setup() hrtimers: Switch to use __htimer_setup() hrtimers: Delete hrtimer_init() treewide: Convert new and leftover hrtimer_init() users treewide: Switch/rename to timer_delete[_sync]()	2025-04-06 08:35:37 -07:00
Linus Torvalds	758e4c86a1	drm fixes for 6.15-rc1 bridge: - tda998x: Select CONFIG_DRM_KMS_HELPER amdgpu: - Guard against potential division by 0 in fan code - Zero RPM support for SMU 14.0.2 - Properly handle SI and CIK support being disabled - PSR fixes - DML2 fixes - DP Link training fix - Vblank fixes - RAS fixes - Partitioning fix - SDMA fix - SMU 13.0.x fixes - Rom fetching fix - MES fixes - Queue reset fix xe: - Fix NULL pointer dereference on error path - Add missing HW workaround for BMG - Fix survivability mode not triggering - Fix build warning when DRM_FBDEV_EMULATION is not set i915: - Bounds check for scalers in DSC prefill latency computation - Fix build by adding a missing include adp: - Fix error handling in plane setup -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmfwQ5wACgkQDHTzWXnE hr7OIxAAn8ATKOYsyxaq0t8+gj2jhmGC1L3X6tQUzSh3uAwZpWzYlP8RVMMS9WrL Cwj0kBLlq71GasuG/wjEnAxuoahEWebrwTm73it9yWXWW+GsQYhOcWttvmhNSVET aTUdB2VM32S0QHf8t6rP0yG4XqJD95sSNnXCTh6/ksk5rhK6NpqyJRkdVBIRxI5O IUXxfxmREIrAq5ZfpoPuVUEA2US2W9p+L616RnILewa3iZYzK9XahfcErFU8xlS+ 796ZaYFGEWUmdJKTyCejRHz5z1afXAgKO3vRyGKbLC4MAsxk6MsWzTXsH5EMukxF lCtJQuODFDed2gXxLUOeXAUjWoHrgYpJBWRk23Pq83Kx2sf85MD2SF2d8pGJqMa+ B0JYr/mFtZCi5xraHIH6rwx7k2+KqPVAl+P8f/Fye/AIxBUE3KoI43M69TQ3B2VD dG8Jz53DUVQ1TUGJjfZajninFfBdDlr9PgPLmp1gESjbGkI1azGXngwlaVK4kti+ iZmM/4pbB0A/+PG7Wq1DFEhr0fvQW2RjtgXoh+CujRLgK7ZYwVbCcFpmGQgjOiXb IyphOc7tPiOH1EcXGSsrV5JlkyhVL0Ghu6JCKQCzF5MXbrg1fwjjUwVuGtJyqtso kKBbx3bx7rv577VXRhM13s0MPwOyhgi56a4kBtX7YRJqY5cVkdM= =P/sW -----END PGP SIGNATURE----- Merge tag 'drm-next-2025-04-05' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "Weekly fixes, mostly from the end of last week, this week was very quiet, maybe you scared everyone away. It's mostly amdgpu, and xe, with some i915, adp and bridge bits, since I think this is overly quiet I'd expect rc2 to be a bit more lively. bridge: - tda998x: Select CONFIG_DRM_KMS_HELPER amdgpu: - Guard against potential division by 0 in fan code - Zero RPM support for SMU 14.0.2 - Properly handle SI and CIK support being disabled - PSR fixes - DML2 fixes - DP Link training fix - Vblank fixes - RAS fixes - Partitioning fix - SDMA fix - SMU 13.0.x fixes - Rom fetching fix - MES fixes - Queue reset fix xe: - Fix NULL pointer dereference on error path - Add missing HW workaround for BMG - Fix survivability mode not triggering - Fix build warning when DRM_FBDEV_EMULATION is not set i915: - Bounds check for scalers in DSC prefill latency computation - Fix build by adding a missing include adp: - Fix error handling in plane setup" # -----BEGIN PGP SIGNATURE----- * tag 'drm-next-2025-04-05' of https://gitlab.freedesktop.org/drm/kernel: (34 commits) drm/i2c: tda998x: select CONFIG_DRM_KMS_HELPER drm/amdgpu/gfx12: fix num_mec drm/amdgpu/gfx11: fix num_mec drm/amd/pm: Add gpu_metrics_v1_8 drm/amdgpu: Prefer shadow rom when available drm/amd/pm: Update smu metrics table for smu_v13_0_6 drm/amd/pm: Remove host limit metrics support Remove unnecessary firmware version check for gc v9_4_2 drm/amdgpu: stop unmapping MQD for kernel queues v3 Revert "drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA" drm/amdgpu: Parse all deferred errors with UMC aca handle drm/amdgpu: Update ta ras block drm/amdgpu: Add NPS2 to DPX compatible mode drm/amdgpu: Use correct gfx deferred error count drm/amd/display: Actually do immediate vblank disable drm/amd/display: prevent hang on link training fail Revert "drm/amd/display: dml2 soc dscclk use DPM table clk setting" drm/amd/display: Increase vblank offdelay for PSR panels drm/amd: Handle being compiled without SI or CIK support better drm/amd/pm: Add zero RPM enabled OD setting support for SMU14.0.2 ...	2025-04-05 15:35:11 -07:00
Thomas Gleixner	8fa7292fee	treewide: Switch/rename to timer_delete[_sync]() timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-04-05 10:30:12 +02:00
Linus Torvalds	2cd5769fb0	Driver core updates for 6.15-rc1 Here is the big set of driver core updates for 6.15-rc1. Lots of stuff happened this development cycle, including: - kernfs scaling changes to make it even faster thanks to rcu - bin_attribute constify work in many subsystems - faux bus minor tweaks for the rust bindings - rust binding updates for driver core, pci, and platform busses, making more functionaliy available to rust drivers. These are all due to people actually trying to use the bindings that were in 6.14. - make Rafael and Danilo full co-maintainers of the driver core codebase - other minor fixes and updates. This has been in linux-next for a while now, with the only reported issue being some merge conflicts with the rust tree. Depending on which tree you pull first, you will have conflicts in one of them. The merge resolution has been in linux-next as an example of what to do, or can be found here: https://lore.kernel.org/r/CANiq72n3Xe8JcnEjirDhCwQgvWoE65dddWecXnfdnbrmuah-RQ@mail.gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZ+mMrg8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ylRgwCdH58OE3BgL0uoFY5vFImStpmPtqUAoL5HpVWI jtbJ+UuXGsnmO+JVNBEv =gy6W -----END PGP SIGNATURE----- Merge tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updatesk from Greg KH: "Here is the big set of driver core updates for 6.15-rc1. Lots of stuff happened this development cycle, including: - kernfs scaling changes to make it even faster thanks to rcu - bin_attribute constify work in many subsystems - faux bus minor tweaks for the rust bindings - rust binding updates for driver core, pci, and platform busses, making more functionaliy available to rust drivers. These are all due to people actually trying to use the bindings that were in 6.14. - make Rafael and Danilo full co-maintainers of the driver core codebase - other minor fixes and updates" * tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (52 commits) rust: platform: require Send for Driver trait implementers rust: pci: require Send for Driver trait implementers rust: platform: impl Send + Sync for platform::Device rust: pci: impl Send + Sync for pci::Device rust: platform: fix unrestricted &mut platform::Device rust: pci: fix unrestricted &mut pci::Device rust: device: implement device context marker rust: pci: use to_result() in enable_device_mem() MAINTAINERS: driver core: mark Rafael and Danilo as co-maintainers rust/kernel/faux: mark Registration methods inline driver core: faux: only create the device if probe() succeeds rust/faux: Add missing parent argument to Registration::new() rust/faux: Drop #[repr(transparent)] from faux::Registration rust: io: fix devres test with new io accessor functions rust: io: rename `io::Io` accessors kernfs: Move dput() outside of the RCU section. efi: rci2: mark bin_attribute as __ro_after_init rapidio: constify 'struct bin_attribute' firmware: qemu_fw_cfg: constify 'struct bin_attribute' powerpc/perf/hv-24x7: Constify 'struct bin_attribute' ...	2025-04-01 11:02:03 -07:00
Linus Torvalds	0c86b42439	drm for 6.15-rc1 uapi: - add mediatek tiled fourcc - add support for notifying userspace on device wedged new driver: - appletbdrm: support for Apple Touchbar displays on m1/m2 - nova-core: skeleton rust driver to develop nova inside off firmware: - add some rust firmware pieces rust: - add 'LocalModule' type alias component: - add helper to query bound status fbdev: - fbtft: remove access to page->index media: - cec: tda998x: import driver from drm dma-buf: - add fast path for single fence merging tests: - fix lockdep warnings atomic: - allow full modeset on connector changes - clarify semantics of allow_modeset and drm_atomic_helper_check - async-flip: support on arbitary planes - writeback: fix UAF - Document atomic-state history format-helper: - support ARGB8888 to ARGB4444 conversions buddy: - fix multi-root cleanup ci: - update IGT dp: - support extended wake timeout - mst: fix RAD to string conversion - increase DPCD eDP control CAP size to 5 bytes - add DPCD eDP v1.5 definition - add helpers for LTTPR transparent mode panic: - encode QR code according to Fido 2.2 scheduler: - add parameter struct for init - improve job peek/pop operations - optimise drm_sched_job struct layout ttm: - refactor pool allocation - add helpers for TTM shrinker panel-orientation: - add a bunch of new quirks panel: - convert panels to multi-style functions - edp: Add support for B140UAN04.4, BOE NV140FHM-NZ, CSW MNB601LS1-3, LG LP079QX1-SP0V, MNE007QS3-7, STA 116QHD024002, Starry 116KHD024006, Lenovo T14s Gen6 Snapdragon - himax-hx83102: Add support for CSOT PNA957QT1-1, Kingdisplay kd110n11-51ie, Starry 2082109qfh040022-50e - visionox-r66451: use multi-style MIPI-DSI functions - raydium-rm67200: Add driver for Raydium RM67200 - simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17 - sony-td4353-jdi: Use MIPI-DSI multi-func interface - summit: Add driver for Apple Summit display panel - visionox-rm692e5: Add driver for Visionox RM692E5 bridge: - pass full atomic state to various callbacks - adv7511: Report correct capabilities - it6505: Fix HDCP V compare - snd65dsi86: fix device IDs - nwl-dsi: set bridge type - ti-sn65si83: add error recovery and set bridge type - synopsys: add HDMI audio support xe: - support device-wedged event - add mmap support for PCI memory barrier - perf pmu integration and expose per-engien activity - add EU stall sampling support - GPU SVM and Xe SVM implementation - use TTM shrinker - add survivability mode to allow the driver to do firmware updates in critical failure states - PXP HWDRM support for MTL and LNL - expose package/vram temps over hwmon - enable DP tunneling - drop mmio_ext abstraction - Reject BO evcition if BO is bound to current VM - Xe suballocator improvements - re-use display vmas when possible - add GuC Buffer Cache abstraction - PCI ID update for Panther Lake and Battlemage - Enable SRIOV for Panther Lake - Refactor VRAM manager location i915: - enable extends wake timeout - support device-wedged event - Enable DP 128b/132b SST DSC - FBC dirty rectangle support for display version 30+ - convert i915/xe to drm client setup - Compute HDMI PLLS for rates not in fixed tables - Allow DSB usage when PSR is enabled on LNL+ - Enable panel replay without full modeset - Enable async flips with compressed buffers on ICL+ - support luminance based brightness via DPCD for eDP - enable VRR enable/disable without full modeset - allow GuC SLPC default strategies on MTL+ for performance - lots of display refactoring in move to struct intel_display amdgpu: - add device wedged event - support async page flips on overlay planes - enable broadcast RGB drm property - add info ioctl for virt mode - OEM i2c support for RGB lights - GC 11.5.2 + 11.5.3 support - SDMA 6.1.3 support - NBIO 7.9.1 + 7.11.2 support - MMHUB 1.8.1 + 3.3.2 support - DCN 3.6.0 support - Add dynamic workload profile switching for GC 10-12 - support larger VBIOS sizes - Mark gttsize parameters as deprecated - Initial JPEG queue resset support amdkfd: - add KFD per process flags for setting precision - sync pasid values between KGD and KFD - improve GTT/VRAM handling for APUs - fix user queue validation on GC7/8 - SDMA queue reset support raedeon: - rs400 hyperz fix i2c: - td998x: drop platform_data, split driver into media and bridge ast: - transmitter chip detection refactoring - vbios display mode refactoring - astdp: fix connection status and filter unsupported modes - cursor handling refactoring imagination: - check job dependencies with sched helper ivpu: - improve command queue handling - use workqueue for IRQ handling - add support HW fault injection - locking fixes mgag200: - add support for G200eH5 msm: - dpu: add concurrent writeback support for DPU 10.x+ - use LTTPR helpers - GPU: - Fix obscure GMU suspend failure - Expose syncobj timeline support - Extend GPU devcoredump with pagetable info - a623 support - Fix a6xx gen1/gen2 indexed-register blocks in gpu snapshot / devcoredump - Display: - Add cpu-cfg interconnect paths on SM8560 and SM8650 - Introduce KMS OMMU fault handler, causing devcoredump snapshot - Fixed error pointer dereference in msm_kms_init_aspace() - DPU: - Fix mode_changing handling - Add writeback support on SM6150 (QCS615) - Fix DSC programming in 1:1:1 topology - Reworked hardware resource allocation, moving it to the CRTC code - Enabled support for Concurrent WriteBack (CWB) on SM8650 - Enabled CDM blocks on all relevant platforms - Reworked debugfs interface for BW/clocks debugging - Clear perf params before calculating bw - Support YUV formats on writeback - Fixed double inclusion - Fixed writeback in YUV formats when using cloned output, Dropped wb2_formats_rgb - Corrected dpu_crtc_check_mode_changed and struct dpu_encoder_virt kerneldocs - Fixed uninitialized variable in dpu_crtc_kickoff_clone_mode() - DSI: - DSC-related fixes - Rework clock programming - DSI PHY: - Fix 7nm (and lower) PHY programming - Add proper DT schema definitions for DSI PHY clocks - HDMI: - Rework the driver, enabling the use of the HDMI Connector framework - Bindings: - Added eDP PHY on SA8775P nouveau: - move drm_slave_encoder interface into driver - nvkm: refactor GSP RPC - use LTTPR helpers mediatek: - HDMI fixup and refinement - add MT8188 dsc compatible - MT8365 SoC support panthor: - Expose sizes of intenral BOs via fdinfo - Fix race between reset and suspend - Improve locking qaic: - Add support for AIC200 renesas: - Fix limits in DT bindings rockchip: - support rk3562-mali - rk3576: Add HDMI support - vop2: Add new display modes on RK3588 HDMI0 up to 4K - Don't change HDMI reference clock rate - Fix DT bindings - analogix_dp: add eDP support - fix shutodnw solomon: - Set SPI device table to silence warnings - Fix pixel and scanline encoding v3d: - handle clock vc4: - Use drm_exec - Use dma-resv for wait-BO ioctl - Remove seqno infrastructure virtgpu: - Support partial mappings of GEM objects - Reserve VGA resources during initialization - Fix UAF in virtgpu_dma_buf_free_obj() - Add panic support vkms: - Switch to a managed modesetting pipeline - Add support for ARGB8888 - fix UAf xlnx: - Set correct DMA segment size - use mutex guards - Fix error handling - Fix docs -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmfmA2cACgkQDHTzWXnE hr7lKQ/+I7gmln3ka8FyKnpwG5KusDpxz3OzgUKHpzOTkXL1+vPMt0vjCKRJKE3D zfpUTWNbwlVN0krqmUGyFIeCt8wmevBF6HvQ+GsWbiWltUj3xIlnkV0TVH84XTUo 0evrXNG9K8sLjMKjrf7yrGL53Ayoaq9IO9wtOws+FCgtykAsMR/IWLrYLpj21ZQ6 Oclhq5Cz21WRoQpzySR23s3DLi4LHri26RGKbGNh2PzxYwyP/euGW6O+ncEduNmg vQLgUfptaM/EubJFG6jxDWZJ2ChIAUUxQuhZwt7DKqRsYIcJKcfDSUzqL95t6SYU zewlYmeslvusoesCeTJzHaUj34yqJGsvjFPsHFUEvAy8BVncsqS40D6mhrRMo5nD chnSJu+IpDOEEqcdbb1J73zKLw6X3GROM8qUQEThJBD2yTsOdw9d0HXekMDMBXSi NayKvXfsBV19rI74FYnRCzHt8BVVANh5qJNnR5RcnPZ2KzHQbV0JFOA9YhxE8vFU GWkFWKlpAQUQ+yoTy1kuO9dcUxLIC7QseMeS5BYhcJBMEV78xQuRYRxgsL8YS4Yg rIhcb3mZwMFj7jBAqfpLKWiI+GTup+P9vcz7Bvm5iIf8gEhZLUTwqBeAYXQkcWd4 i1AqDuFR0//sAgODoeU2sg1Q3yTL/i/DhcwvCIPgcDZ/x4Eg868= =oK/N -----END PGP SIGNATURE----- Merge tag 'drm-next-2025-03-28' of https://gitlab.freedesktop.org/drm/kernel Pull drm updates from Dave Airlie: "Outside of drm there are some rust patches from Danilo who maintains that area in here, and some pieces for drm header check tests. The major things in here are a new driver supporting the touchbar displays on M1/M2, the nova-core stub driver which is just the vehicle for adding rust abstractions and start developing a real driver inside of. xe adds support for SVM with a non-driver specific SVM core abstraction that will hopefully be useful for other drivers, along with support for shrinking for TTM devices. I'm sure xe and AMD support new devices, but the pipeline depth on these things is hard to know what they end up being in the marketplace! uapi: - add mediatek tiled fourcc - add support for notifying userspace on device wedged new driver: - appletbdrm: support for Apple Touchbar displays on m1/m2 - nova-core: skeleton rust driver to develop nova inside off firmware: - add some rust firmware pieces rust: - add 'LocalModule' type alias component: - add helper to query bound status fbdev: - fbtft: remove access to page->index media: - cec: tda998x: import driver from drm dma-buf: - add fast path for single fence merging tests: - fix lockdep warnings atomic: - allow full modeset on connector changes - clarify semantics of allow_modeset and drm_atomic_helper_check - async-flip: support on arbitary planes - writeback: fix UAF - Document atomic-state history format-helper: - support ARGB8888 to ARGB4444 conversions buddy: - fix multi-root cleanup ci: - update IGT dp: - support extended wake timeout - mst: fix RAD to string conversion - increase DPCD eDP control CAP size to 5 bytes - add DPCD eDP v1.5 definition - add helpers for LTTPR transparent mode panic: - encode QR code according to Fido 2.2 scheduler: - add parameter struct for init - improve job peek/pop operations - optimise drm_sched_job struct layout ttm: - refactor pool allocation - add helpers for TTM shrinker panel-orientation: - add a bunch of new quirks panel: - convert panels to multi-style functions - edp: Add support for B140UAN04.4, BOE NV140FHM-NZ, CSW MNB601LS1-3, LG LP079QX1-SP0V, MNE007QS3-7, STA 116QHD024002, Starry 116KHD024006, Lenovo T14s Gen6 Snapdragon - himax-hx83102: Add support for CSOT PNA957QT1-1, Kingdisplay kd110n11-51ie, Starry 2082109qfh040022-50e - visionox-r66451: use multi-style MIPI-DSI functions - raydium-rm67200: Add driver for Raydium RM67200 - simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17 - sony-td4353-jdi: Use MIPI-DSI multi-func interface - summit: Add driver for Apple Summit display panel - visionox-rm692e5: Add driver for Visionox RM692E5 bridge: - pass full atomic state to various callbacks - adv7511: Report correct capabilities - it6505: Fix HDCP V compare - snd65dsi86: fix device IDs - nwl-dsi: set bridge type - ti-sn65si83: add error recovery and set bridge type - synopsys: add HDMI audio support xe: - support device-wedged event - add mmap support for PCI memory barrier - perf pmu integration and expose per-engien activity - add EU stall sampling support - GPU SVM and Xe SVM implementation - use TTM shrinker - add survivability mode to allow the driver to do firmware updates in critical failure states - PXP HWDRM support for MTL and LNL - expose package/vram temps over hwmon - enable DP tunneling - drop mmio_ext abstraction - Reject BO evcition if BO is bound to current VM - Xe suballocator improvements - re-use display vmas when possible - add GuC Buffer Cache abstraction - PCI ID update for Panther Lake and Battlemage - Enable SRIOV for Panther Lake - Refactor VRAM manager location i915: - enable extends wake timeout - support device-wedged event - Enable DP 128b/132b SST DSC - FBC dirty rectangle support for display version 30+ - convert i915/xe to drm client setup - Compute HDMI PLLS for rates not in fixed tables - Allow DSB usage when PSR is enabled on LNL+ - Enable panel replay without full modeset - Enable async flips with compressed buffers on ICL+ - support luminance based brightness via DPCD for eDP - enable VRR enable/disable without full modeset - allow GuC SLPC default strategies on MTL+ for performance - lots of display refactoring in move to struct intel_display amdgpu: - add device wedged event - support async page flips on overlay planes - enable broadcast RGB drm property - add info ioctl for virt mode - OEM i2c support for RGB lights - GC 11.5.2 + 11.5.3 support - SDMA 6.1.3 support - NBIO 7.9.1 + 7.11.2 support - MMHUB 1.8.1 + 3.3.2 support - DCN 3.6.0 support - Add dynamic workload profile switching for GC 10-12 - support larger VBIOS sizes - Mark gttsize parameters as deprecated - Initial JPEG queue resset support amdkfd: - add KFD per process flags for setting precision - sync pasid values between KGD and KFD - improve GTT/VRAM handling for APUs - fix user queue validation on GC7/8 - SDMA queue reset support raedeon: - rs400 hyperz fix i2c: - td998x: drop platform_data, split driver into media and bridge ast: - transmitter chip detection refactoring - vbios display mode refactoring - astdp: fix connection status and filter unsupported modes - cursor handling refactoring imagination: - check job dependencies with sched helper ivpu: - improve command queue handling - use workqueue for IRQ handling - add support HW fault injection - locking fixes mgag200: - add support for G200eH5 msm: - dpu: add concurrent writeback support for DPU 10.x+ - use LTTPR helpers - GPU: - Fix obscure GMU suspend failure - Expose syncobj timeline support - Extend GPU devcoredump with pagetable info - a623 support - Fix a6xx gen1/gen2 indexed-register blocks in gpu snapshot / devcoredump - Display: - Add cpu-cfg interconnect paths on SM8560 and SM8650 - Introduce KMS OMMU fault handler, causing devcoredump snapshot - Fixed error pointer dereference in msm_kms_init_aspace() - DPU: - Fix mode_changing handling - Add writeback support on SM6150 (QCS615) - Fix DSC programming in 1:1:1 topology - Reworked hardware resource allocation, moving it to the CRTC code - Enabled support for Concurrent WriteBack (CWB) on SM8650 - Enabled CDM blocks on all relevant platforms - Reworked debugfs interface for BW/clocks debugging - Clear perf params before calculating bw - Support YUV formats on writeback - Fixed double inclusion - Fixed writeback in YUV formats when using cloned output, Dropped wb2_formats_rgb - Corrected dpu_crtc_check_mode_changed and struct dpu_encoder_virt kerneldocs - Fixed uninitialized variable in dpu_crtc_kickoff_clone_mode() - DSI: - DSC-related fixes - Rework clock programming - DSI PHY: - Fix 7nm (and lower) PHY programming - Add proper DT schema definitions for DSI PHY clocks - HDMI: - Rework the driver, enabling the use of the HDMI Connector framework - Bindings: - Added eDP PHY on SA8775P nouveau: - move drm_slave_encoder interface into driver - nvkm: refactor GSP RPC - use LTTPR helpers mediatek: - HDMI fixup and refinement - add MT8188 dsc compatible - MT8365 SoC support panthor: - Expose sizes of intenral BOs via fdinfo - Fix race between reset and suspend - Improve locking qaic: - Add support for AIC200 renesas: - Fix limits in DT bindings rockchip: - support rk3562-mali - rk3576: Add HDMI support - vop2: Add new display modes on RK3588 HDMI0 up to 4K - Don't change HDMI reference clock rate - Fix DT bindings - analogix_dp: add eDP support - fix shutodnw solomon: - Set SPI device table to silence warnings - Fix pixel and scanline encoding v3d: - handle clock vc4: - Use drm_exec - Use dma-resv for wait-BO ioctl - Remove seqno infrastructure virtgpu: - Support partial mappings of GEM objects - Reserve VGA resources during initialization - Fix UAF in virtgpu_dma_buf_free_obj() - Add panic support vkms: - Switch to a managed modesetting pipeline - Add support for ARGB8888 - fix UAf xlnx: - Set correct DMA segment size - use mutex guards - Fix error handling - Fix docs" * tag 'drm-next-2025-03-28' of https://gitlab.freedesktop.org/drm/kernel: (1762 commits) drm/amd/pm: Update feature list for smu_v13_0_6 drm/amdgpu: Add parameter documentation for amdgpu_sync_fence drm/amdgpu/discovery: optionally use fw based ip discovery drm/amdgpu/discovery: use specific ip_discovery.bin for legacy asics drm/amdgpu/discovery: check ip_discovery fw file available drm/amd/pm: Remove unnecessay UQ10 to UINT conversion drm/amd/pm: Remove unnecessay UQ10 to UINT conversion drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush drm/amd/amdgpu: Increase max rings to enable SDMA page ring drm/amdgpu: Decode deferred error type in gfx aca bank parser drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.5 GPUs drm/amdgpu/mes: clean up SDMA HQD loop drm/amdgpu/mes: enable compute pipes across all MEC drm/amdgpu/mes: drop MES 10.x leftovers drm/amdgpu/mes: optimize compute loop handling drm/amdgpu/sdma: guilty tracking is per instance drm/amdgpu/sdma: fix engine reset handling drm/amdgpu: remove invalid usage of sched.ready drm/amdgpu: add cleaner shader trace point ...	2025-03-28 17:44:52 -07:00
Alex Deucher	dce8bd9137	drm/amdgpu/gfx12: fix num_mec GC12 only has 1 mec. Fixes: `52cb80c12e` ("drm/amdgpu: Add gfx v12_0 ip block support (v6)") Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-26 17:47:18 -04:00
Alex Deucher	4161050d47	drm/amdgpu/gfx11: fix num_mec GC11 only has 1 mec. Fixes: `3d879e81f0` ("drm/amdgpu: add init support for GFX11 (v2)") Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-26 17:47:14 -04:00
Lijo Lazar	27145f78f5	drm/amdgpu: Prefer shadow rom when available Fetch VBIOS from shadow ROM when available before trying other methods like EFI method. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Fixes: `9c081c11c6` ("drm/amdgpu: Reorder to read EFI exported ROM first") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4066 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2025-03-26 17:46:33 -04:00
Candice Li	5b3c08ae9e	Remove unnecessary firmware version check for gc v9_4_2 GC v9_4_2 uses a new versioning scheme for CP firmware, making the warning ("CP firmware version too old, please update!") irrelevant. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2025-03-26 17:45:53 -04:00
Christian König	1f86f4125e	drm/amdgpu: stop unmapping MQD for kernel queues v3 This looks unnecessary and actually extremely harmful since using kmap() is not possible while inside the ring reset. Remove all the extra mapping and unmapping of the MQDs. v2: also fix debugfs v3: fix coding style typo Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-26 17:45:42 -04:00
Jesse.zhang@amd.com	510a16d995	Revert "drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA" this temporarily reverts commit `6ec04e38b2` ("drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA") it cause a regression. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-26 17:44:47 -04:00
Xiang Liu	aedc92be96	drm/amdgpu: Parse all deferred errors with UMC aca handle We should only increase the deferred errors in UMC block. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-26 17:44:41 -04:00
Stanley.Yang	cc11dffc14	drm/amdgpu: Update ta ras block Update ta ra block to keep sync with RAS TA. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-26 17:44:34 -04:00
Lijo Lazar	ee97326fb9	drm/amdgpu: Add NPS2 to DPX compatible mode Compute partition DPX is possible in NPS2 mode. Update the compatible modes for DPX. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-26 17:44:32 -04:00
Xiang Liu	f3f05a0ec5	drm/amdgpu: Use correct gfx deferred error count In the case of parsing GFX deferred error from SMU corrected error channel, the error count should be set to 1 instead of parsing from MISC0 register, which is 0. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-26 17:44:27 -04:00
Mario Limonciello	5f054ddead	drm/amd: Handle being compiled without SI or CIK support better If compiled without SI or CIK support but amdgpu tries to load it will run into failures with uninitialized callbacks. Show a nicer message in this case and fail probe instead. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4050 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2025-03-26 17:41:55 -04:00
Linus Torvalds	a50b4fe095	A treewide hrtimer timer cleanup hrtimers are initialized with hrtimer_init() and a subsequent store to the callback pointer. This turned out to be suboptimal for the upcoming Rust integration and is obviously a silly implementation to begin with. This cleanup replaces the hrtimer_init(T); T->function = cb; sequence with hrtimer_setup(T, cb); The conversion was done with Coccinelle and a few manual fixups. Once the conversion has completely landed in mainline, hrtimer_init() will be removed and the hrtimer::function becomes a private member. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmff5jQTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoVvRD/wKtuwmiA66NJFgXC0qVq82A6fO3bY8 GBdbfysDJIbqGu5PTcULTbJ8qkqv3jeLUv6CcXvS4sZ7y/uJQl2lzf8yrD/0bbwc rLI6sHiPSZmK93kNVN4X5H7kvt7cE/DYC9nnEOgK3BY5FgKc4n9887d4aVBhL8Lv ODwVXvZ+xi351YCj7qRyPU24zt/p4tkkT1o2k4a0HBluqLI0D+V20fke9IERUL8r d1uWKlcn0TqYDesE8HXKIhbst3gx52rMJrXBJDHwFmG6v8Pj1fkTXCVpPo8QcBz8 OTVkpomN9f/Tx4+GZwhZOF86LhLL3OhxD6pT7JhFCXdmSGv+Ez8uyk1YZysM/XpV Juy/1yAcBpDIDkmhMFGdAAn48Nn9Fotty0r4je60zSEp1d/4QMXcFme29qr2JTUE iWnQ/HD6DxUjVHqy7CYvvo26Xegg1C7qgyOVt4PYZwAM1VKF5P3kzYTb4SAdxtop Tpji1sfW9QV08jqMNo6XntD32DSP9S2HqjO9LwBw700jnx2jjJ35fcJs6iodMOUn gckIZLMn3L0OoglPdyA5O7SNTbKE7aFiRKdnT/cJtR3Fa39Qu27CwC5gfiyuie9I Q+LG8GLuYSBHXAR+PBK4GWlzJ7Dn8k3eqmbnLeKpRMsU6ZzcttgA64xhaviN2wN0 iJbvLJeisXr3GA== =bYAX -----END PGP SIGNATURE----- Merge tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "A treewide hrtimer timer cleanup hrtimers are initialized with hrtimer_init() and a subsequent store to the callback pointer. This turned out to be suboptimal for the upcoming Rust integration and is obviously a silly implementation to begin with. This cleanup replaces the hrtimer_init(T); T->function = cb; sequence with hrtimer_setup(T, cb); The conversion was done with Coccinelle and a few manual fixups. Once the conversion has completely landed in mainline, hrtimer_init() will be removed and the hrtimer::function becomes a private member" * tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits) wifi: rt2x00: Switch to use hrtimer_update_function() io_uring: Use helper function hrtimer_update_function() serial: xilinx_uartps: Use helper function hrtimer_update_function() ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup() RDMA: Switch to use hrtimer_setup() virtio: mem: Switch to use hrtimer_setup() drm/vmwgfx: Switch to use hrtimer_setup() drm/xe/oa: Switch to use hrtimer_setup() drm/vkms: Switch to use hrtimer_setup() drm/msm: Switch to use hrtimer_setup() drm/i915/request: Switch to use hrtimer_setup() drm/i915/uncore: Switch to use hrtimer_setup() drm/i915/pmu: Switch to use hrtimer_setup() drm/i915/perf: Switch to use hrtimer_setup() drm/i915/gvt: Switch to use hrtimer_setup() drm/i915/huc: Switch to use hrtimer_setup() drm/amdgpu: Switch to use hrtimer_setup() stm class: heartbeat: Switch to use hrtimer_setup() i2c: Switch to use hrtimer_setup() iio: Switch to use hrtimer_setup() ...	2025-03-25 10:54:15 -07:00
Dmitry Baryshkov	fcbb93f1e4	drm/display: dp: change drm_dp_dpcd_read_link_status() return value drm_dp_dpcd_read_link_status() follows the "return error code or number of bytes read" protocol, with the code returning less bytes than requested in case of some errors. However most of the drivers interpreted that as "return error code in case of any error". Switch drm_dp_dpcd_read_link_status() to drm_dp_dpcd_read_data() and make it follow that protocol too. Acked-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20250324-drm-rework-dpcd-access-v4-2-e80ff89593df@oss.qualcomm.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>	2025-03-25 16:20:38 +02:00
Srinivasan Shanmugam	af23d3c9ca	drm/amdgpu: Add parameter documentation for amdgpu_sync_fence The 'flags' parameter, which specifies memory allocation behavior while creating a sync entry, Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c:162: warning: Function parameter or struct member 'flags' not described in 'amdgpu_sync_fence' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:36 -04:00
Alex Deucher	80a0e82829	drm/amdgpu/discovery: optionally use fw based ip discovery On chips without native IP discovery support, use the fw binary if available, otherwise we can continue without it. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Flora Cui <flora.cui@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:36 -04:00
Flora Cui	25f602fbbc	drm/amdgpu/discovery: use specific ip_discovery.bin for legacy asics vega10/vega12/vega20/raven/raven2/picasso/arcturus/aldebaran Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:35 -04:00
Flora Cui	017fbb6690	drm/amdgpu/discovery: check ip_discovery fw file available Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:35 -04:00
Jesse.zhang@amd.com	6ec04e38b2	drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA This commit updates the VM flush implementation for the SDMA engine. - Added a new function `sdma_v4_4_2_get_invalidate_req` to construct the VM_INVALIDATE_ENG0_REQ register value for the specified VMID and flush type. This function ensures that all relevant page table cache levels (L1 PTEs, L2 PTEs, and L2 PDEs) are invalidated. - Modified the `sdma_v4_4_2_ring_emit_vm_flush` function to use the new `sdma_v4_4_2_get_invalidate_req` function. The updated function emits the necessary register writes and waits to perform a VM flush for the specified VMID. It updates the PTB address registers and issues a VM invalidation request using the specified VM invalidation engine. - Included the necessary header file `gc/gc_9_0_sh_mask.h` to provide access to the required register definitions. v2: vm flush by the vm inalidation packet (Lijo) v3: code stle and define thh macro for the vm invalidation packet (Christian) v4: Format definition sdma vm invalidate packet (Lijo) Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:35 -04:00
Jesse.zhang@amd.com	b09cdeb4d3	drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush - Modify the VM invalidation engine allocation logic to handle SDMA page rings. SDMA page rings now share the VM invalidation engine with SDMA gfx rings instead of allocating a separate engine. This change ensures efficient resource management and avoids the issue of insufficient VM invalidation engines. - Add synchronization for GPU TLB flush operations in gmc_v9_0.c. Use spin_lock and spin_unlock to ensure thread safety and prevent race conditions during TLB flush operations. This improves the stability and reliability of the driver, especially in multi-threaded environments. v2: replace the sdma ring check with a function `amdgpu_sdma_is_page_queue` to check if a ring is an SDMA page queue.(Lijo) v3: Add GC version check, only enabled on GC9.4.3/9.4.4/9.5.0 v4: Fix code style and add more detailed description (Christian) v5: Remove dependency on vm_inv_eng loop order, explicitly lookup shared inv_eng(Christian/Lijo) v6: Added search shared ring function amdgpu_sdma_get_shared_ring (Lijo) Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:35 -04:00
Jesse.zhang@amd.com	ea6dd40caf	drm/amd/amdgpu: Increase max rings to enable SDMA page ring Increase the maximum number of rings supported by the AMDGPU driver from 133 to 149. This change is necessary to enable support for the SDMA page ring. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:35 -04:00
Xiang Liu	338f7412c7	drm/amdgpu: Decode deferred error type in gfx aca bank parser In the case of injecting uncorrected error with background workload, the deferred error among uncorrected errors need to be specified by checking the deferred and poison bits of status register. v2: refine checking for deferred error v2: log possiable DEs among CEs v2: generate CPER records for DEs among UEs Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:35 -04:00
Srinivasan Shanmugam	2ec0a7c337	drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.5 GPUs Enable the cleaner shader for GFX11.5.0/11.5.1 GPUs to provide data isolation between GPU workloads. The cleaner shader is responsible for clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps prevent data leakage and ensures accurate computation results. This update extends cleaner shader support to GFX11.5.0/11.5.1 GPUs, previously available for GFX11.0.3. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Mario Sopena-Novales <mario.novales@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:35 -04:00
Alex Deucher	5e93d0e335	drm/amdgpu/mes: clean up SDMA HQD loop Follow the same logic as the other IP types. Reviewed-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:35 -04:00
Alex Deucher	a52077b6b6	drm/amdgpu/mes: enable compute pipes across all MEC Enable pipes on both MECs for MES. Fixes: `745f46b6a9` ("drm/amdgpu: enable mes v12 self test") Acked-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:35 -04:00
Alex Deucher	652a06f74a	drm/amdgpu/mes: drop MES 10.x leftovers Leftover from MES bring up. There is no production MES support for MES 10.x. The rest of the MES 10.x code has already been removed so drop this. Acked-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:35 -04:00
Alex Deucher	5608ddf6e9	drm/amdgpu/mes: optimize compute loop handling Break when we get to the end of the supported pipes rather than continuing the loop. Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:34 -04:00
Alex Deucher	3bae7916e7	drm/amdgpu/sdma: guilty tracking is per instance The gfx and page queues are per instance, so track them per instance. v2: drop extra parameter (Lijo) Fixes: `fdbfaaaae0` ("drm/amdgpu: Improve SDMA reset logic with guilty queue tracking") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:34 -04:00
Alex Deucher	e02fcf7308	drm/amdgpu/sdma: fix engine reset handling Move the kfd suspend/resume code into the caller. That is where the KFD is likely to detect a reset so on the KFD side there is no need to call them. Also add a mutex to lock the actual reset sequence. v2: make the locking per instance Fixes: `bac38ca8c4` ("drm/amdkfd: implement per queue sdma reset for gfx 9.4+") Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:34 -04:00
Christian König	fc70d1ea1b	drm/amdgpu: remove invalid usage of sched.ready I can't count how often I had to remove this nonsense. Probably doesn't need an explanation any more. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:34 -04:00
Christian König	02ba7543f2	drm/amdgpu: add cleaner shader trace point Note when the cleaner shader is executed. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:34 -04:00
Christian König	1bb1314d0b	drm/amdgpu: add isolation trace point Note when we switch from one isolation owner to another. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:34 -04:00
Christian König	db1e58ec86	drm/amdgpu: stop reserving VMIDs to enforce isolation That was quite troublesome for gang submit. Completely drop this approach and enforce the isolation separately. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:34 -04:00
Christian König	b7fbcd77bb	drm/amdgpu: rework how the cleaner shader is emitted v3 Instead of emitting the cleaner shader for every job which has the enforce_isolation flag set only emit it for the first submission from every client. v2: add missing NULL check v3: fix another NULL pointer deref Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:34 -04:00
Christian König	bd22e44ad4	drm/amdgpu: rework how isolation is enforced v2 Limiting the number of available VMIDs to enforce isolation causes some issues with gang submit and applying certain HW workarounds which require multiple VMIDs to work correctly. So instead start to track all submissions to the relevant engines in a per partition data structure and use the dma_fences of the submissions to enforce isolation similar to what a VMID limit does. v2: use ~0l for jobs without isolation to distinct it from kernel submissions which uses NULL for the owner. Add some warning when we are OOM. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:34 -04:00
Christian König	7f11c59e07	drm/amdgpu: overwrite signaled fence in amdgpu_sync This allows using amdgpu_sync even without peeking into the fences for a long time. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:15:08 -04:00
Christian König	16590745b5	drm/amdgpu: use GFP_NOWAIT for memory allocations In the critical submission path memory allocations can't wait for reclaim since that can potentially wait for submissions to finish. Finally clean that up and mark most memory allocations in the critical path with GFP_NOWAIT. The only exception left is the dma_fence_array() used when no VMID is available, but that will be cleaned up later on. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:15:08 -04:00
Kenneth Feng	a67f0094c9	drm/amd/amdgpu: Revert "drm/amd/amdgpu: shorten the gfx idle worker timeout" This reverts commit `55ff973fe1`. Reason for revert: this causes some tests fail with call trace. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:15:08 -04:00
Ahmad Rehman	cfdf8b34b9	drm/amdgpu/sdam: Skip SDMA queue reset for SRIOV For SRIOV, skip the SDMA queue reset and return error. The engine/queue reset failure will trigger FLR in the sequence. v2: do not add queue reset support mask for sriov Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:15:08 -04:00
Ahmad Rehman	0a59fbd5d9	drm/amdgpu: Add support to load PSP TA v13.0.12 for SRIOV Add case for 13.0.12. Signed-off-by: Ahmad Rehman <ahrehman@amd.com> Reviewed-by: Vignesh.Chander@amd.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:15:08 -04:00
Ellen Pan	d7f5c13e45	drm/amdgpu: Enable amdgpu_ras_resume for gfx 9.5.0 This enables ras to be resumed after gpu recovery on mi350 sriov. Signed-off-by: Ellen Pan <yunru.pan@amd.com> Reviewed-by: Ahmad Rehman <Ahmad.Rehman@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:15:08 -04:00
Victor Skvortsov	9c05636ca7	drm/amdgpu: Skip pcie_replay_count sysfs creation for VF VFs cannot read the NAK_COUNTER register. This information is only available through PMFW metrics. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:56:27 -04:00
Candice Li	a5f7e90fe0	drm/amdgpu: Add active_umc_mask to ras init_flags Add active_umc_mask to ras init_flags. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:56:23 -04:00
Lijo Lazar	a7818b15cf	Documentation/amdgpu: Add debug_mask documentation Add description for debug_mask bit options. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:56:19 -04:00
Lijo Lazar	ab6893402a	drm/amd/pm: Add debug bit for smu pool allocation In certain cases, it's desirable to avoid PMFW log transactions to system memory. Add a mask bit to decide whether to allocate smu pool in device memory or system memory. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:56:13 -04:00
Alex Deucher	3b669df92c	drm/amdgpu/vcn: adjust workload profile handling No need to make the workload profile setup dependent on the results of cancelling the delayed work thread. We have all of the necessary checking in place for the workload profile reference counting, so separate the two. As it is now, we can theoretically end up with the call from begin_use happening while the worker thread is executing which would result in the profile not getting set for that submission. It should not affect the reference counting. v2: bail early if the the profile is already active (Lijo) Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:56:09 -04:00
Alex Deucher	9e34d8d1a1	drm/amdgpu/gfx: adjust workload profile handling No need to make the workload profile setup dependent on the results of cancelling the delayed work thread. We have all of the necessary checking in place for the workload profile reference counting, so separate the two. As it is now, we can theoretically end up with the call from begin_use happening while the worker thread is executing which would result in the profile not getting set for that submission. It should not affect the reference counting. v2: bail early if the the profile is already active (Lijo) Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:56:04 -04:00
Candice Li	5762f9dcf7	drm/amdgpu: Add EEPROM I2C address support for smu v13_0_12 Add EEPROM I2C address support for smu v13_0_12. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:56:00 -04:00
Alex Deucher	ca6575a32a	drm/amdgpu/vcn: fix ref counting for ring based profile handling We need to make sure the workload profile ref counts are balanced. This isn't currently the case because we can increment the count on submissions, but the decrement may be delayed as work comes in. Track when we enable the workload profile so the references are balanced. v2: switch to a mutex and active flag v3: fix mutex init Fixes: `1443dd3c67` ("drm/amd/pm: fix and simplify workload handling") Cc: Yang Wang <kevinyang.wang@amd.com> Cc: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:54:36 -04:00
Alex Deucher	553673a3e1	drm/amdgpu/gfx: fix ref counting for ring based profile handling We need to make sure the workload profile ref counts are balanced. This isn't currently the case because we can increment the count on submissions, but the decrement may be delayed as work comes in. Track when we enable the workload profile so the references are balanced. v2: switch to a mutex and active flag v3: fix mutex init Fixes: `8fdb3958e3` ("drm/amdgpu/gfx: add ring helpers for setting workload profile") Cc: Yang Wang <kevinyang.wang@amd.com> Cc: Kenneth Feng <kenneth.feng@amd.com> Tested-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:52:56 -04:00
Christian König	0d9a95099d	drm/amdgpu: grab an additional reference on the gang fence v2 We keep the gang submission fence around in adev, make sure that it stays alive. v2: fix memory leak on retry Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:51:31 -04:00
David Belanger	35b6162bb7	drm/amdgpu: Restore uncached behaviour on GFX12 Always use MTYPE_UC if UNCACHED flag is specified. This makes kernarg region uncached and it restores usermode cache disable debug flag functionality. Do not set MTYPE_UC for COHERENT flag, on GFX12 coherence is handled by shader code. Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `eb6cdfb807`) Cc: stable@vger.kernel.org # 6.12.x	2025-03-18 16:29:52 -04:00
Wentao Liang	86730b5261	drm/amdgpu/gfx12: correct cleanup of 'me' field with gfx_v12_0_me_fini() In gfx_v12_0_cp_gfx_load_me_microcode_rs64(), gfx_v12_0_pfp_fini() is incorrectly used to free 'me' field of 'gfx', since gfx_v12_0_pfp_fini() can only release 'pfp' field of 'gfx'. The release function of 'me' field should be gfx_v12_0_me_fini(). Fixes: `52cb80c12e` ("drm/amdgpu: Add gfx v12_0 ip block support (v6)") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `ebdc52607a`) Cc: stable@vger.kernel.org # 6.12.x	2025-03-18 16:29:16 -04:00
David Rosca	7fc0765208	drm/amdgpu: Remove JPEG from vega and carrizo video caps JPEG is only supported for VCN1+. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `0a6e7b06bd`) Cc: stable@vger.kernel.org	2025-03-18 16:26:12 -04:00
David Rosca	ec33964d9d	drm/amdgpu: Fix JPEG video caps max size for navi1x and raven 8192x8192 is the maximum supported resolution. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `6e0d2fde3a`) Cc: stable@vger.kernel.org	2025-03-18 16:25:46 -04:00
David Rosca	f0105e1731	drm/amdgpu: Fix MPEG2, MPEG4 and VC1 video caps max size 1920x1088 is the maximum supported resolution. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `1a0807feb9`) Cc: stable@vger.kernel.org	2025-03-18 16:25:16 -04:00
Lijo Lazar	6c11d4a87d	drm/amdgpu: Use wafl version for xgmi XGMI and WAFL share the same versions. Use WAFL version if XGMI version is not present in discovery. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-18 14:03:47 -04:00
Jesse.zhang@amd.com	cc63bcfd14	drm/amdgpu: Fix SDMA engine reset logic The scheduler should restart only if the reset operation succeeds This ensures that new tasks are only submitted to the queues after a successful reset. Fixes: `4c02f73016` ("drm/amdgpu: Introduce conditional user queue suspension for SDMA resets") Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse.Zhang <Jesse.zhang@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-18 14:03:47 -04:00
Flora Cui	b5aaa82e2b	drm/amdgpu: release xcp_mgr on exit Free on driver cleanup. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Flora Cui <flora.cui@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-18 14:03:47 -04:00
Xiang Liu	d6f9bbce18	drm/amdgpu: Fix computation for remain size of CPER ring The mistake of computation for remain size of CPER ring will cause unbreakable while cycle when CPER ring overflow. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-18 14:03:46 -04:00
Kenneth Feng	55ff973fe1	drm/amd/amdgpu: shorten the gfx idle worker timeout Shorten the gfx idle worker timeout. This is to sync with DAL when there is no activity on the screen. Original 1 second can not sync with DAL, so DAL can not apply MALL when the workload type is not bootup default. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-18 14:03:38 -04:00
Tao Zhou	05d50ea3ea	drm/amdgpu: format old RAS eeprom data into V3 version Clear old data and save it in V3 format. v2: only format eeprom data for new ASICs. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-18 14:03:38 -04:00
Alex Deucher	9deacd6c55	drm/amdgpu: don't free conflicting apertures for non-display devices PCI_CLASS_ACCELERATOR_PROCESSING devices won't ever be the sysfb, so there is no need to free conflicting apertures. Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-18 14:03:38 -04:00
Alex Deucher	e00e5c2238	drm/amdgpu: adjust drm_firmware_drivers_only() handling Move to probe so we can check the PCI device type and only apply the drm_firmware_drivers_only() check for PCI DISPLAY classes. Also add a module parameter to override the nomodeset kernel parameter as a workaround for platforms that have this hardcoded on their kernel command lines. Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-18 14:03:38 -04:00
Alex Deucher	4db4c82d4d	drm/amdgpu: drop drm_firmware_drivers_only() There are a number of systems and cloud providers out there that have nomodeset hardcoded in their kernel parameters to block nouveau for the nvidia driver. This prevents the amdgpu driver from loading. Unfortunately the end user cannot easily change this. The preferred way to block modules from loading is to use modprobe.blacklist=<driver>. That is what providers should be using to block specific drivers. Drop the check to allow the driver to load even when nomodeset is specified on the kernel command line. Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-18 14:03:37 -04:00
David Belanger	eb6cdfb807	drm/amdgpu: Restore uncached behaviour on GFX12 Always use MTYPE_UC if UNCACHED flag is specified. This makes kernarg region uncached and it restores usermode cache disable debug flag functionality. Do not set MTYPE_UC for COHERENT flag, on GFX12 coherence is handled by shader code. Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:18:02 -04:00
Wentao Liang	ebdc52607a	drm/amdgpu/gfx12: correct cleanup of 'me' field with gfx_v12_0_me_fini() In gfx_v12_0_cp_gfx_load_me_microcode_rs64(), gfx_v12_0_pfp_fini() is incorrectly used to free 'me' field of 'gfx', since gfx_v12_0_pfp_fini() can only release 'pfp' field of 'gfx'. The release function of 'me' field should be gfx_v12_0_me_fini(). Fixes: `52cb80c12e` ("drm/amdgpu: Add gfx v12_0 ip block support (v6)") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:17:24 -04:00
Shaoyun Liu	f81cd79311	drm/amd/amdgpu: Fix MES init sequence When MES is been used , the set_hw_resource_1 API is required to initialize MES internal context correctly Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:16:08 -04:00
Xiang Liu	13c13bdd1b	drm/amdgpu: Enable ACA by default for psp v13_0_6/v13_0_14 Enable ACA by default for psp v13_0_6/v13_0_14. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:15:58 -04:00
ganglxie	a4b6e990d7	drm/amdgpu: Save PA of bad pages for old asics for old asics that do not support mca translating, we just save PA for them Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:13:08 -04:00
Emily Deng	2da3af5f0b	drm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 for sriov multiple vf. In sriov multiple vf, Set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 to read WPTR from MQD. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:13:02 -04:00
Emily Deng	8d5e70ba5d	drm/amdgpu: Add amdgpu_sriov_multi_vf_mode function Use amdgpu_sriov_multi_vf_mode to replace amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev). Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:12:52 -04:00
Lijo Lazar	357506799b	drm/amdgpu: Calculate IP specific xgmi bandwidth Use IP version specific xgmi speed/width for bandwidth calculation. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:11:46 -04:00
Harish Kasiviswanathan	8a7820c072	drm/amdgpu: Reduce dequeue retry timeout for gfx9 family Dequeue retry timeout controls the interval between checks for unmet conditions. On MI series, reduce this from 0x40 to 0x1 (~ 1 uS). The cost of additional bandwidth consumed by CP when polling memory shouldn't be substantial. Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:10:38 -04:00
Alex Deucher	fc3c139cf0	drm/amdgpu/gfx12: don't read registers in mqd init Just use the default values. There's not need to get the value from hardware and it could cause problems if we do that at runtime and gfxoff is active. Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:10:30 -04:00
Alex Deucher	e27b36ea6b	drm/amdgpu/gfx11: don't read registers in mqd init Just use the default values. There's not need to get the value from hardware and it could cause problems if we do that at runtime and gfxoff is active. Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:10:25 -04:00
Lijo Lazar	b9e75bcb2b	drm/amdgpu: Remove unsupported xgmi versions XGMI v4.8.0 is not used in any SOCs. Remove the associated functions. Also, ensure get_xgmi_info callback pointer is not NULL before calling the function. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:10:08 -04:00
David Rosca	19478f2011	drm/amdgpu: Update SRIOV video codec caps There have been multiple fixes to the video caps that are missing for SRIOV. Update the SRIOV caps with correct values. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:08:51 -04:00
David Rosca	0a6e7b06bd	drm/amdgpu: Remove JPEG from vega and carrizo video caps JPEG is only supported for VCN1+. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:08:25 -04:00
David Rosca	6e0d2fde3a	drm/amdgpu: Fix JPEG video caps max size for navi1x and raven 8192x8192 is the maximum supported resolution. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:08:14 -04:00
David Rosca	1a0807feb9	drm/amdgpu: Fix MPEG2, MPEG4 and VC1 video caps max size 1920x1088 is the maximum supported resolution. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:07:52 -04:00
Natalie Vock	6cc30748e1	drm/amdgpu: NULL-check BO's backing store when determining GFX12 PTE flags PRT BOs may not have any backing store, so bo->tbo.resource will be NULL. Check for that before dereferencing. Fixes: `0cce5f285d` ("drm/amdkfd: Check correct memory types for is_system variable") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Natalie Vock <natalie.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `3e3fcd29b5`) Cc: stable@vger.kernel.org # 6.12.x	2025-03-12 14:59:21 -04:00
Natalie Vock	3e3fcd29b5	drm/amdgpu: NULL-check BO's backing store when determining GFX12 PTE flags PRT BOs may not have any backing store, so bo->tbo.resource will be NULL. Check for that before dereferencing. Fixes: `0cce5f285d` ("drm/amdkfd: Check correct memory types for is_system variable") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Natalie Vock <natalie.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-11 12:37:10 -04:00
Alexandre Demers	37c890d831	drm/amdgpu: finish wiring up sid.h in DCE6 For coherence with DCE8 et DCE10, add or move some values under sid.h and remove duplicated from si_enums.h. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-11 12:37:07 -04:00
Alexandre Demers	760632fa2e	drm/amdgpu: fix SI's GB_ADDR_CONFIG_GOLDEN values and wire up sid.h in GFX6 By wiring up sid.h in GFX6, we end up with a few duplicated defines such as the golden registers. Let's clean this up. [TAHITI,VERDE, HAINAN]_GB_ADDR_CONFIG_GOLDEN were defined both in sid.h and under si_enums.h, with different values. Keep the values used under radeon and move them under gfx_v6_0.c where they are used (as it is done under cik) Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-11 12:36:48 -04:00
Alexandre Demers	20fb56dfd8	drm/amdgpu: prepare DCE6 uniformisation with DCE8 and DCE10 Let's begin the cleanup in sid.h to prevent warnings and errors when wiring sid.h into dce_v6_0.c. This is a bigger cleanup. Many defines found under sid.h have already been properly moved into the different "_d.h" and "_sh_mask.h", so they should have been already removed from sid.h and properly linked in where needed. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-11 12:36:42 -04:00
Dan Carpenter	0ee560d71f	drm/amdgpu/gfx: delete stray tabs These lines are indented one tab too far. Delete the extra tabs. Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-11 12:36:29 -04:00
André Almeida	099f273eff	drm/amdgpu: Trigger a wedged event for ring reset Instead of only triggering a wedged event for complete GPU resets, trigger for ring resets. Regardless of the reset, it's useful for userspace to know that it happened because the kernel will reject further submissions from that app. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-10 14:56:25 -04:00
Alex Deucher	ded6ad4c6e	drm/amdgpu/vce2: fix ip block reference Need to use the correct IP block type. VCE vs VCN. Fixes mclk issues on Hawaii. Suggested by selendym. Fixes: `82ae6619a4` ("drm/amdgpu: update the handle ptr in wait_for_idle") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3997 Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Cc: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `02438acd25`) Cc: stable@vger.kernel.org	2025-03-10 14:18:04 -04:00
Mario Limonciello	4afacc9948	drm/amd: Keep display off while going into S4 When userspace invokes S4 the flow is: 1) amdgpu_pmops_prepare() 2) amdgpu_pmops_freeze() 3) Create hibernation image 4) amdgpu_pmops_thaw() 5) Write out image to disk 6) Turn off system Then on resume amdgpu_pmops_restore() is called. This flow has a problem that because amdgpu_pmops_thaw() is called it will call amdgpu_device_resume() which will resume all of the GPU. This includes turning the display hardware back on and discovering connectors again. This is an unexpected experience for the display to turn back on. Adjust the flow so that during the S4 sequence display hardware is not turned back on. Reported-by: Xaver Hugl <xaver.hugl@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2038 Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Harry Wentland <harry.wentland@amd.com> Link: https://lore.kernel.org/r/20250306185124.44780-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `68bfdc8dc0`)	2025-03-10 14:09:09 -04:00
Alex Deucher	02438acd25	drm/amdgpu/vce2: fix ip block reference Need to use the correct IP block type. VCE vs VCN. Fixes mclk issues on Hawaii. Suggested by selendym. Fixes: `82ae6619a4` ("drm/amdgpu: update the handle ptr in wait_for_idle") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3997 Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Cc: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-10 13:36:21 -04:00
Alex Deucher	2c01befe4a	drm/amdgpu/vcn: fix idle work handler for VCN 2.5 VCN 2.5 uses the PG callback to enable VCN DPM which is a global state. As such, we need to make sure all instances are in the same state. v2: switch to a ref count (Lijo) v3: switch to its own idle work handler v4: fix logic in DPG handling Fixes: `4ce4fe2720` ("drm/amdgpu/vcn: use per instance callbacks for idle work handler") Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-10 13:22:05 -04:00
Marek Olšák	3855f1d925	drm/amd/display: allow 256B DCC max compressed block sizes on gfx12 The hw supports it. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-10 13:22:01 -04:00
Greg Kroah-Hartman	993a47bd7b	Linux 6.14-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmfOKBUeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG1aQH/iC+Oyij4VxAjBek BOXIT/p6CwlIXb8ObiWWcRjDPizlcxb3RaV8J2RO+IqaQ2wltxpFANq2G7Re2FPm SNcEpIURAOVcxHGedcfFA91srO5F4FzNTO8LVp7MIbcgMYy3pdk+dbZmi6A691R+ t9pb74m+MAnF1o/MUx7pUlhAT/4ymuuR0F7WCSg4h0Xwe5m0nlJY89kJBC7PCjyd n3mdhsz3rDSLmt/z/T7HGD89r8sYSvm9cOKtL3ELgGTrm7boQV8ii9Y9w04DI8PQ JmIernugcCxmhH36mVUAHgJf2+/T388xFUh/D5+skeUOUZpaJZG866rnb32WpsHc eWLFUeg= =Wypt -----END PGP SIGNATURE----- Merge 6.14-rc6 into driver-core-next We need the driver core fix in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-03-10 17:37:25 +01:00
Dave Airlie	236f475d29	amdgpu: - Fix spelling typos - RAS updates - VCN 5.0.1 updates - SubVP fixes - DCN 4.0.1 fixes - MSO DPCD fixes - DIO encoder refactor - PCON fixes - Misc cleanups - DMCUB fixes - USB4 DP fixes - DM cleanups - Backlight cleanups and fixes - Support platform backlight curves - Misc code cleanups - SMU 14 fixes - JPEG 4.0.3 reset updates - SR-IOV fixes - SVM fixes - GC 12 DCC fixes - DC DCE 6.x fix - Hiberation fix amdkfd: - Fix possible NULL pointer in queue validation - Remove unnecessary CP domain validation - SDMA queue reset support - Add per process flags radeon: - Fix spelling typos - RS400 hyperZ fix UAPI: - Add KFD per process flags for setting precision Proposed user space: `2a64fa5e06` -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZ8tfmgAKCRC93/aFa7yZ 2CmkAP4xoy09aHecz6WOdNN9VBJiSMSZ3UOyBL9Ll2i5nI3YegEA7Co+AmQpXOPn HKeD6Vy1tt1XE/Liuur5dbLUKsVYSAI= =IQ45 -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.15-2025-03-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amdgpu: - Fix spelling typos - RAS updates - VCN 5.0.1 updates - SubVP fixes - DCN 4.0.1 fixes - MSO DPCD fixes - DIO encoder refactor - PCON fixes - Misc cleanups - DMCUB fixes - USB4 DP fixes - DM cleanups - Backlight cleanups and fixes - Support platform backlight curves - Misc code cleanups - SMU 14 fixes - JPEG 4.0.3 reset updates - SR-IOV fixes - SVM fixes - GC 12 DCC fixes - DC DCE 6.x fix - Hiberation fix amdkfd: - Fix possible NULL pointer in queue validation - Remove unnecessary CP domain validation - SDMA queue reset support - Add per process flags radeon: - Fix spelling typos - RS400 hyperZ fix UAPI: - Add KFD per process flags for setting precision Proposed user space: `2a64fa5e06` Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250307211051.1880472-1-alexander.deucher@amd.com	2025-03-10 09:04:52 +10:00
Mario Limonciello	68bfdc8dc0	drm/amd: Keep display off while going into S4 When userspace invokes S4 the flow is: 1) amdgpu_pmops_prepare() 2) amdgpu_pmops_freeze() 3) Create hibernation image 4) amdgpu_pmops_thaw() 5) Write out image to disk 6) Turn off system Then on resume amdgpu_pmops_restore() is called. This flow has a problem that because amdgpu_pmops_thaw() is called it will call amdgpu_device_resume() which will resume all of the GPU. This includes turning the display hardware back on and discovering connectors again. This is an unexpected experience for the display to turn back on. Adjust the flow so that during the S4 sequence display hardware is not turned back on. Reported-by: Xaver Hugl <xaver.hugl@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2038 Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Harry Wentland <harry.wentland@amd.com> Link: https://lore.kernel.org/r/20250306185124.44780-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 15:33:49 -05:00
Alexandre Demers	092da9fb25	drm/amdgpu: add defines for pin_offsets in DCE8 Define pin_offsets values in the same way it is done in DCE8 Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:56:08 -05:00
Srinivasan Shanmugam	9c551ca3db	drm/amdgpu: Fix annotation for dce_v6_0_line_buffer_adjust function Updated description for the 'other_mode' parameter. This parameter is used to determine the display mode of another display controller that may be sharing the line buffer. Cc: Ken Wang <Qingqing.Wang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:56:04 -05:00
Xiang Liu	148084bbb1	drm/amdgpu: Use unique CPER record id across devices Encode socket id to CPER record id to be unique across devices. v2: add pointer check for adev->smuio.funcs->get_socket_id v2: set 0 if adev->smuio.funcs->get_socket_id is NULL Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:54:08 -05:00
Shiwu Zhang	216be476f1	drm/amdgpu: fix the gb_addr_config_fields init value mismatch For gfx_v9_4_3 specifically, before regGB_ADDR_CONFIG is overwritten in gfx hw_init it is read out to popluate the gb_addr_config_fields in the sw_init stage, which causes mismatch. Fix it by using the golden value in sw_init as well. v2: This is a driver-set golden reg and keep as it is (Lijo) Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:53:58 -05:00
Shiwu Zhang	3bc7bc73af	drm/amdgpu: retire ip init code specific for A0 rev For aqua_vanjaram, A0 HW is retired so remove the code specific for it in gfx ip init. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:53:43 -05:00
Tao Zhou	334dc5fcc3	drm/amdgpu: increase RAS bad page threshold For default policy, driver will issue an RMA event when the number of bad pages is greater than 8 physical rows, rather than reaches 8 physical rows, don't rely on threshold configurable parameters in default mode. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:53:39 -05:00
Emily Deng	fe2fa3be3d	drm/amdgpu: Fix missing drain retry fault the last entry While the entry get in svm_range_unmap_from_cpu is the last entry, and the entry is page fault, it also need to be dropped. So for equal case, it also need to be dropped. v2: Only modify the svm_range_restore_pages. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Xiaogang Chen<xiaogang.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:53:30 -05:00
Victor Lu	94b0908b85	drm/amdgpu: Do not set power brake sequence for Aldebaran SRIOV Aldebaran SRIOV VF cannot access the power brake feature regs. The accesses can be skipped to avoid a dmesg warning. v2: Remove redundant asic type check Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:53:24 -05:00
Charles Han	571d36837c	drm/amdgpu: fix inconsistent indenting warning Fix below inconsistent indenting smatch warning. smatch warnings: drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c:582 amdgpu_sdma_reset_engine() warn: inconsistent indenting Signed-off-by: Charles Han <hanchunchao@inspur.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:53:06 -05:00
Victor Lu	3646cc65e2	drm/amdgpu: Do not write to GRBM_CNTL if Aldebaran SRIOV Aldebaran SRIOV VF does not have write permissions to GRBM_CTNL. This access can be skipped to avoid a dmesg warning. v2: Use GC IP version check instead of asic check Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:52:29 -05:00
Sathishkumar S	a29936bcd2	drm/amdgpu: Fix core reset sequence for JPEG5_0_1 For cores 1 through 9 repair the core reset sequence by adjusting offsets to access the expected registers. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:47:36 -05:00
Jonathan Kim	bac38ca8c4	drm/amdkfd: implement per queue sdma reset for gfx 9.4+ To reset hung SDMA queues on GFX 9.4+ for the GFX9 family, a soft reset must be issued through SMU. Since soft resets will reset an entire SDMA engine, use a common KGD call to do the reset as the KGD will handle avoiding a reset of in flight GFX and paging queues on that engine. In addition, create a common call for all reset types to simplify the handling of module parameter settings that block gpu resets. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:47:26 -05:00
Victor Lu	057fef20b8	drm/amdgpu: Do not program AGP BAR regs under SRIOV in gfxhub_v1_0.c SRIOV VF does not have write access to AGP BAR regs. Skip the writes to avoid a dmesg warning. Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:47:21 -05:00
Sathishkumar S	20c34e5c4a	drm/amdgpu: Fix core reset sequence for JPEG4_0_3 For cores 1 through 7 repair the core reset sequence by adjusting offsets to access the expected registers. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:47:07 -05:00
Tony Yi	a91d91b600	drm/amdgpu: Add support for CPERs on virtualization Add support for CPERs on VFs. VFs do not receive PMFW messages directly; as such, they need to query them from the host. To avoid hitting host event guard, CPER queries need to be rate limited. CPER queries share the same RAS telemetry buffer as error count query, so a mutex protecting the shared buffer was added as well. For readability, the amdgpu_detect_virtualization was refactored into multiple individual functions. Signed-off-by: Tony Yi <Tony.Yi@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:47:03 -05:00
James Zhu	ca17c8e149	drm/amdkfd: remove unnecessary cpu domain validation before move to GTT domain. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:46:55 -05:00
Tony Yi	d4c60219ac	drm/amdgpu: Update headers for CPER support on SRIOV Update amdgv_sriovmsg.h and mxgpu_nv.h to add new definitions for CPER support on VFs. PMFW ACA messages are not available on VFs, and VFs must query CPERs from host. Signed-off-by: Tony Yi <Tony.Yi@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:46:40 -05:00
Lijo Lazar	6ef5ccaad7	drm/amdgpu: Reinit FW shared flags on VCN v5.0.1 After a full device reset, shared memory region will clear out and it's not possible to reliably save the region in case of RAS errors. Reinitialize the flags if required. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:46:33 -05:00
Lijo Lazar	6e09402098	drm/amdgpu: Use the right struct for VCN v5.0.1 VCN IP versions >= 5.0 uses VCN5 fw shared struct. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:46:26 -05:00
Alexandre Demers	ab23db6d08	drm/amdgpu: add dce_v6_0_soft_reset() to DCE6 DCE6 was missing soft reset, but it was easily identifiable under radeon. This should be it, pretty much as it is done under DCE8 and DCE10. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:45:30 -05:00
Alexandre Demers	5f6021d52b	drm/amdgpu: fix style in DCE6 Whitespace cleanups. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:45:27 -05:00
Alexandre Demers	029ab8cabd	drm/amdgpu: add some comments in DCE6 Add some comments. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:45:23 -05:00
Asad Kamal	8df5f03be5	drm/amdgpu: Set PG state to gating for vcn_v_5_0_1 For vcn_v_5_0_1, set power state to gating during hw fini. Also there may be scenario where VCN engine hangs during a job execution, then it's not safe to assume that set_pg_state works fine during hw_fini to put the state to gated. After a reset, we can assume that it's in the default state, therefore reset the driver maintained state. Put the default state as gated during reset as per this assumption. Signed-off-by: Asad Kamal <asad.kamal@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:44:56 -05:00
Mario Limonciello	f729e63743	drm/amd: Pass luminance data to amdgpu_dm_backlight_caps The ATIF method on some systems will provide a backlight curve. Pass this curve into amdgpu_dm add it to the structures. Reviewed-by: Alex Hung <alex.hung@amd.com> Link: https://lore.kernel.org/r/20250228185145.186319-3-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:42:42 -05:00
Mario Limonciello	1c79b5fcdf	drm/amd: Copy entire structure in amdgpu_acpi_get_backlight_caps() As new members are introduced to the structure copying the entire structure will help avoid missing them. Reviewed-by: Alex Hung <alex.hung@amd.com> Link: https://lore.kernel.org/r/20250228185145.186319-2-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:42:37 -05:00
Lijo Lazar	a734a717dc	drm/amdgpu: Avoid HDP flush on JPEG v5.0.1 Similar to JPEG v4.0.3, HDP flush shouldn't be performed by JPEG engine. Keep it empty. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:38:01 -05:00
Lijo Lazar	6fcfaac604	drm/amdgpu: Initialize RRMT status on JPEG v5.0.1 Initialize RRMT enablement status from register. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:37:56 -05:00
Jesse.zhang@amd.com	77bd621d14	drm/amdgpu: Update SDMA scheduler mask handling to include page queue This patch updates the SDMA scheduler mask handling to include the page queue if it exists. The scheduler mask is calculated based on the number of SDMA instances and the presence of the page queue. The mask is updated to reflect the state of both the SDMA gfx ring and the page queue. Changes: - Add handling for the SDMA page queue in `amdgpu_debugfs_sdma_sched_mask_set`. - Update scheduler mask calculations to include the page queue. - Modify `amdgpu_debugfs_sdma_sched_mask_get` to return the correct mask value. This change is necessary to verify multiple queues (SDMA gfx queue + page queue) and ensure proper scheduling and state management for SDMA instances. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:37:42 -05:00
Lijo Lazar	0b9647d40e	drm/amdgpu: Add offset normalization in VCN v5.0.1 VCN v5.0.1 also will need register offset normalization. Reuse the logic from VCN v4.0.3. Also, avoid HDP flush similar to VCN v4.0.3 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:37:35 -05:00
Lijo Lazar	b5a3fc54e8	drm/amdgpu: Initialize RRMT status on VCN v5.0.1 Initialize RRMT status from register. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:37:31 -05:00
Xiang Liu	677ae51f49	drm/amdgpu: Free CPER entry after committing to ring Free CPER entry when it's committed to CPER ring to avoid memory leak. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:37:27 -05:00
Alexandre Demers	899634a57a	drm/amdgpu: fix spelling typos in SI Fix typos Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:37:23 -05:00
Alexandre Demers	ce43abd7ec	drm/amdgpu: fix spelling typos Found some typos while exploring amdgpu code. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:37:13 -05:00
Dave Airlie	e21cba7047	drm-misc-next for v6.15: Cross-subsystem Changes: bus: - mhi: Avoid access to uninitialized field Core Changes: - Fix docmentation dp: - Add helpers for LTTPR transparent mode sched: - Improve job peek/pop operations - Optimize layout of struct drm_sched_job Driver Changes: arc: - Convert to devm_platform_ioremap_resource() aspeed: - Convert to devm_platform_ioremap_resource() bridge: - ti-sn65dsi86: Support CONFIG_PWM tristate i915: - dp: Use helpers for LTTPR transparent mode mediatek: - Convert to devm_platform_ioremap_resource() msm: - dp: Use helpers for LTTPR transparent mode nouveau: - dp: Use helpers for LTTPR transparent mode panel: - raydium-rm67200: Add driver for Raydium RM67200 - simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17 - sony-td4353-jdi: Use MIPI-DSI multi-func interface - summit: Add driver for Apple Summit display panel - visionox-rm692e5: Add driver for Visionox RM692E5 repaper: - Fix integer overflows stm: - Convert to devm_platform_ioremap_resource() vc4: - Convert to devm_platform_ioremap_resource() -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmfAMnwACgkQaA3BHVML eiOszgf+NCaYdCCKXurnKy2DVHZ61ar+Fg/FHYs1oThFVqknRT9CBGY93g3I7VGs gYUATrH2fYt4daIF94ap4YLYuMrZSKXSd0duzr8lQzCVAb/UgvUap9yjGwJsCOZA bE2jxlKvo6cUJbq19+ATgrAQ7y31/w30PCPyBjMGR0t4C3bBbbCHpxFyKzDWCtFT JS6pNbCPblZCCvtFPmQRqQqLSE0K5x4SE49ZpHM6hhA4MavlwoN4ZkWincliSzF2 XVCo7QHpVy3SItshhOB7ch7pIdkbm6nUvW2poR3UdXe+B+OSKRIz0C+2E98bYAra vwkyWoUP2A9nKZBsNy2d5026DeUDwg== =KURF -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2025-02-27' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.15: Cross-subsystem Changes: bus: - mhi: Avoid access to uninitialized field Core Changes: - Fix docmentation dp: - Add helpers for LTTPR transparent mode sched: - Improve job peek/pop operations - Optimize layout of struct drm_sched_job Driver Changes: arc: - Convert to devm_platform_ioremap_resource() aspeed: - Convert to devm_platform_ioremap_resource() bridge: - ti-sn65dsi86: Support CONFIG_PWM tristate i915: - dp: Use helpers for LTTPR transparent mode mediatek: - Convert to devm_platform_ioremap_resource() msm: - dp: Use helpers for LTTPR transparent mode nouveau: - dp: Use helpers for LTTPR transparent mode panel: - raydium-rm67200: Add driver for Raydium RM67200 - simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17 - sony-td4353-jdi: Use MIPI-DSI multi-func interface - summit: Add driver for Apple Summit display panel - visionox-rm692e5: Add driver for Visionox RM692E5 repaper: - Fix integer overflows stm: - Convert to devm_platform_ioremap_resource() vc4: - Convert to devm_platform_ioremap_resource() Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250227094041.GA114623@linux.fritz.box	2025-02-28 12:36:01 +10:00
Srinivasan Shanmugam	7d83c129a8	drm/amdgpu: Fix parameter annotation in vcn_v5_0_0_is_idle Update parameter description in the vcn_v5_0_0_is_idle function Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1231: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_0_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1231: warning: Excess function parameter 'handle' description in 'vcn_v5_0_0_is_idle' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:05 -05:00
Srinivasan Shanmugam	0f3fda3117	drm/amdgpu: Fix parameter annotations for VCN clock gating functions The previous references to a non-existent `adev` parameter have been removed & corrected to reflect the use of the `vinst` pointer, which points to the VCN instance structure, in the below files: - vcn_v1_0.c - vcn_v2_0.c - vcn_v3_0.c Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:624: warning: Function parameter or struct member 'vinst' not described in 'vcn_v1_0_enable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:624: warning: Excess function parameter 'adev' description in 'vcn_v1_0_enable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:376: warning: Function parameter or struct member 'vinst' not described in 'vcn_v2_0_mc_resume' drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:376: warning: Excess function parameter 'adev' description in 'vcn_v2_0_mc_resume' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Function parameter or struct member 'vinst' not described in 'vcn_v3_0_disable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Excess function parameter 'adev' description in 'vcn_v3_0_disable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Excess function parameter 'inst' description in 'vcn_v3_0_disable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Function parameter or struct member 'vinst' not described in 'vcn_v3_0_enable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Excess function parameter 'adev' description in 'vcn_v3_0_enable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Excess function parameter 'inst' description in 'vcn_v3_0_enable_clock_gating' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:05 -05:00
Asad Kamal	f9234217d0	drm/amd/amdgpu: Add support for xgmi_v6_4_1 Add support for xgmi_v6_4_1 and use it appropriate places Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:04 -05:00
Lijo Lazar	485993e2f1	drm/amdgpu: Add xgmi speed/width related info Add APIs to initialize XGMI speed, width details and get to max bandwidth supported. It is assumed that a device only supports same generation of XGMI links with uniform width. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:04 -05:00
Lijo Lazar	6f16d101da	drm/amdgpu: Move xgmi definitions to xgmi header Move definitions related to xgmi to amdgpu_xgmi header Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:04 -05:00
André Almeida	9c696cc57c	drm/amdgpu: Create a debug option to disable ring reset Prior to the addition of ring reset, the debug option `debug_disable_soft_recovery` could be used to force a full device reset. Now that we have ring reset, create a debug option to disable them in amdgpu, forcing the driver to go with the full device reset path again when both options are combined. This option is useful for testing and debugging purposes when one wants to test the full reset from userspace. Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:04 -05:00
Alex Deucher	1d72fc2e9e	drm/amdgpu/mes11: drop amdgpu_mes_suspend()/amdgpu_mes_resume() calls They are noops on GFX11 for most firmware versions. KFD already handles its own queues and they should already be unmapped at this point so even if this runs, it's not doing anything. Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:03 -05:00
Colin Ian King	eaa3feb16d	drm/amdgpu: Fix spelling mistake "initiailize" -> "initialize" and grammar There is a spelling mistake and a grammatical error in a dev_err message. Fix it. Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:03 -05:00
Xiang Liu	00f85667fa	drm/amdgpu: Decode deferred error type in aca bank parser In the case of poison inband log, the error type need to be specified by checking the deferred or poison bit of status register. v2: check both deferred and poison bit Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:03 -05:00
Le Ma	5b5f01eff7	drm/amdgpu: add sdma page queue irq processing for sdma442 Add the trap irq processing for page queue of sdma442 Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by and Tested-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:03 -05:00
Xiang Liu	d4bd7a50ca	drm/amdgpu: Report generic instead of unknown boot time errors Change the DMESG reporting of unknown errors to "Boot Controller Generic Error" to align with the RAS SPEC and provide more clarity to customers. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:03 -05:00
Lijo Lazar	b965e42530	drm/amdgpu: Fix logic to fetch supported NPS modes Correct the logic to find supported NPS modes from firmware. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: Ava Zhang <niandong.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Fixes: `30eb41f5d1` ("drm/amdgpu: Use firmware supported NPS modes") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:03 -05:00
Xiang Liu	906d2859e1	drm/amdgpu: Disable fru_id field in CPER section The fru_id field is disabled cause of mis-matching defination between CPER spec and driver. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:03 -05:00
Benjamin Chan	dce1b82398	drm/amdgpu: Add amdisp pinctrl MFD resource AMDISP GPIO control uses a dedicated pinctrl driver, and requires MFD hotadd GPIO resources. Co-developed-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Benjamin Chan <benjamin.chan@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:49 -05:00
Alex Deucher	4343f814e5	drm/amdgpu/mes12: drop amdgpu_mes_suspend()/amdgpu_mes_resume() calls They are noops on GFX12. There is no suspend/resume all support in firmware so the function doesn't do anything. KFD already handles its own queues and they should already be unmapped at this point so even if this runs, it's not doing anything. Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:43 -05:00
Pratap Nirujogi	a67e75beff	drm/amdgpu: Replace DRM_ERROR() with drm_err() DRM_ERROR() is no longer preferred. Replace DRM_ERROR() usage with drm_err() in isp driver. Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:36 -05:00
Alex Deucher	4d1b653571	drm/amdgpu/vcn: use dev_info() for firmware information To properly handle multiple GPUs. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:32 -05:00
Alex Deucher	c51aa7923e	drm/amdgpu/vcn: optimize firmware storage If each instance uses the same fw image, only store one copy in the driver. Acked-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:32 -05:00
Alex Deucher	31a37dfc8f	drm/amdgpu/vcn5.0.1: use generic set_power_gating_state helper No need for an IP specific version. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:32 -05:00
Alex Deucher	9b648fa54c	drm/amdgpu/vcn5.0.0: use generic set_power_gating_state helper No need for an IP specific version. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:32 -05:00
Alex Deucher	4bb5879322	drm/amdgpu/vcn4.0.5: use generic set_power_gating_state helper No need for an IP specific version. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	1ee6b2bff2	drm/amdgpu/vcn4.0.3: use generic set_power_gating_state helper No need for an IP specific version. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	8bdfa5756b	drm/amdgpu/vcn4.0: use generic set_power_gating_state helper No need for an IP specific version. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	38c0d9882a	drm/amdgpu/vcn3.0: use generic set_power_gating_state helper No need for an IP specific version. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	bd32af6faa	drm/amdgpu/vcn2.5: use generic set_power_gating_state helper No need for an IP specific version. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	3389dd059f	drm/amdgpu/vcn2.0: use generic set_power_gating_state helper No need for an IP specific version. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	cac3dc89f2	drm/amdgpu/vcn1.0: use generic set_power_gating_state helper No need for an IP specific version. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	a2cf2a883c	drm/amdgpu/vcn: add a generic helper for set_power_gating_state It's common for all VCN variants. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	4ce4fe2720	drm/amdgpu/vcn: use per instance callbacks for idle work handler Use the vcn instance power gating callbacks rather than the IP powergating callback. This limits power gating to only the instance in use rather than all of the instances. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	592846e3fe	drm/amdgpu/vcn5.0.1: add set_pg_state callback Rework the code as a vcn instance callback. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	f2eb0a66ca	drm/amdgpu/vcn5.0.0: add set_pg_state callback Rework the code as a vcn instance callback. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	f9993efed7	drm/amdgpu/vcn4.0.5: add set_pg_state callback Rework the code as a vcn instance callback. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	39fb77a8d3	drm/amdgpu/vcn4.0.3: add set_pg_state callback Rework the code as a vcn instance callback. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	8b18f03142	drm/amdgpu/vcn4.0: add set_pg_state callback Rework the code as a vcn instance callback. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	bda37b68f6	drm/amdgpu/vcn3.0: add set_pg_state callback Rework the code as a vcn instance callback. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	307ce8bdc6	drm/amdgpu/vcn2.5: add set_pg_state callback Rework the code as a vcn instance callback. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	40c6d55806	drm/amdgpu/vcn2.0: add set_pg_state callback Rework the code as a vcn instance callback. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	c5ed3655cd	drm/amdgpu/vcn1.0: add set_pg_state callback Rework the code as a vcn instance callback. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	55945f08d9	drm/amdgpu/vcn: add new per instance callback for powergating This is per instance so add a new function pointer for it. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	64303b72de	drm/amdgpu/vcn: adjust pause_dpg_mode function signature Change it to take a vcn instance rather than adev to align with the vcn instance changes. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	0a3fb7338f	drm/amdgpu/vcn5.0.1: convert internal functions to use vcn_inst Pass the vcn instance structure to these functions rather than adev and the instance number. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	e3eb71cd69	drm/amdgpu/vcn5.0.0: convert internal functions to use vcn_inst Pass the vcn instance structure to these functions rather than adev and the instance number. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	c07c0c0df9	drm/amdgpu/vcn4.0.5: convert internal functions to use vcn_inst Pass the vcn instance structure to these functions rather than adev and the instance number. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	4a23b9c670	drm/amdgpu/vcn4.0.3: convert internal functions to use vcn_inst Pass the vcn instance structure to these functions rather than adev and the instance number. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	259873561f	drm/amdgpu/vcn4.0: convert internal functions to use vcn_inst Pass the vcn instance structure to these functions rather than adev and the instance number. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	f1ab687040	drm/amdgpu/vcn2.5: convert internal functions to use vcn_inst Pass the vcn instance structure to these functions rather than adev and the instance number. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	38a404f8af	drm/amdgpu/vcn2.0: convert internal functions to use vcn_inst Pass the vcn instance structure to these functions rather than adev and the instance number. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. v2: index instances directly on vcn1.0 and 2.0 to make it clear that they only support a single instance (Lijo) Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	201fee333d	drm/amdgpu/vcn1.0: convert internal functions to use vcn_inst Pass the vcn instance structure to these functions rather than adev and the instance number. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	710151263c	drm/amdgpu/vcn3.0: convert internal functions to use vcn_inst Pass the vcn instance structure to these functions rather than adev and the instance number. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	f98675638f	drm/amdgpu/vcn: switch vcn helpers to be instance based Pass the instance to the helpers. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	cb10727168	drm/amdgpu/vcn: move more instanced data to vcn_instance Move more per instance data into the per instance structure. v2: index instances directly on vcn1.0 and 2.0 to make it clear that they only support a single instance (Lijo) v3: fix typo on vcn 2.5 Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> (v2) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	9bf9442051	drm/amdgpu/vcn: make powergating status per instance Store it per instance so we can track it per instance. v2: index instances directly on vcn1.0 and 2.0 to make it clear that they only support a single instance (Lijo) Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	bee48570cf	drm/amdgpu/vcn: switch work handler to be per instance Have a separate work handler for each VCN instance. This paves the way for per instance VCN power gating at runtime. v2: index instances directly on vcn1.0 and 2.0 to make it clear that they only support a single instance (Lijo) Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	94629182f3	drm/amdgpu/vcn5.0.1: split code along instances Split the code on a per instance basis. This will allow us to use the per instance functions in the future to handle more things per instance. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:28 -05:00
Alex Deucher	0797c54502	drm/amdgpu/vcn5.0.0: split code along instances Split the code on a per instance basis. This will allow us to use the per instance functions in the future to handle more things per instance. v2: squash in fix for stop() from Boyuan Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:28 -05:00
Alex Deucher	ecc9ab4e92	drm/amdgpu/vcn4.0.5: split code along instances Split the code on a per instance basis. This will allow us to use the per instance functions in the future to handle more things per instance. v2: squash in fix for stop() from Boyuan Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:28 -05:00
Alex Deucher	5826d5a5d5	drm/amdgpu/vcn4.0.3: split code along instances Split the code on a per instance basis. This will allow us to use the per instance functions in the future to handle more things per instance. v2: squash in fix for stop() from Boyuan Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:28 -05:00
Alex Deucher	f4cd7a85db	drm/amdgpu/vcn4.0: split code along instances Split the code on a per instance basis. This will allow us to use the per instance functions in the future to handle more things per instance. v2: squash in fix for stop() from Boyuan Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:28 -05:00
Alex Deucher	d39f1bb577	drm/amdgpu/vcn3.0: split code along instances Split the code on a per instance basis. This will allow us to use the per instance functions in the future to handle more things per instance. v2: squash in fix for stop() from Boyuan Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:28 -05:00
Alex Deucher	dae8700198	drm/amdgpu/vcn2.5: fix VCN stop logic Need to make sure we call amdgpu_dpm_enable_vcn() in vcn_v2_5_stop() at the end if there are errors or DPG is enabled. Fixes: `ebc25499de` ("drm/amdgpu/vcn2.5: split code along instances") Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Suggested-by: Boyuan Zhang <boyuan.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:11 -05:00
Dave Airlie	425b848175	amd-drm-next-6.15-2025-02-21: amdgpu: - Add OEM i2c support for RGB lights, etc. - Add support for GC 11.5.3 - Add support for GC 11.5.2 - Add support for SDMA 6.1.3 - Add support for NBIO 7.11.2 - Add support for NBIO 7.9.1 - Add support for MMHUB 3.3.2 - Add support for MMHUB 1.8.1 - Add support for SMU 14.0.5 - Add support for SMUIO 13.0.11 - Add support for PSP 14.0.5 - Add support for UMC 12.5.0 - Add support for DCN 3.6.0 - JPEG 4.0.3 updates - Add dynamic workload profile switching for GC 10-12 - support larger vbios sizes - GC 9.5.0 updates - SMU 13.0.12 updates - SMU 13.0.6 updates - IP discovery updates - GC 10 queue reset updates - DCN 4.0.1 updates - UHBR link rate fixes - Aborted suspend fix - Mark gttsize parameter as deprecated - GC 10 cleaner shader updates - PSR-SU fixes - Clean up PM4 headers - Cursor fixes - Enable devcoredump for JPEG - Misc cleanups - Runpm cleanups - MES updates - GC 9 gfxoff fixes - Vbios fetching cleanups - Documentation updates - Update secondary plane handling - DML2 updates - SDMA fixes for MI - Cleaner shader fixes for GC 11/12 - ACA updates - Initial JPEG queue reset support - RAS updates - Initial RAS CPER support - DCN/DCE panic screen handling cleanup - BT2020 fixes - SR-IOV fixes amdkfd: - synchronize pasid values between KGD and KFD - Misc cleanups - Improve GTT/VRAM handling for APUs - Topology updates - Fix user queue validation on GC 7/8 UAPI: - Enable "Broadcast RGB" drm property - Add INFO IOCTL query for virtualization mode Proposed userspace: `e663bed7d6` -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZ7jwVwAKCRC93/aFa7yZ 2EBmAP42gCX0ESuLJlwLTrobhs9mALp5g6AClLTSBuxNcgfOJwEAvqq12R2db8F9 hFF3rq07C8YaL3bMIi3IhqIXVsnUBgo= =uvFs -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.15-2025-02-21' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.15-2025-02-21: amdgpu: - Add OEM i2c support for RGB lights, etc. - Add support for GC 11.5.3 - Add support for GC 11.5.2 - Add support for SDMA 6.1.3 - Add support for NBIO 7.11.2 - Add support for NBIO 7.9.1 - Add support for MMHUB 3.3.2 - Add support for MMHUB 1.8.1 - Add support for SMU 14.0.5 - Add support for SMUIO 13.0.11 - Add support for PSP 14.0.5 - Add support for UMC 12.5.0 - Add support for DCN 3.6.0 - JPEG 4.0.3 updates - Add dynamic workload profile switching for GC 10-12 - support larger vbios sizes - GC 9.5.0 updates - SMU 13.0.12 updates - SMU 13.0.6 updates - IP discovery updates - GC 10 queue reset updates - DCN 4.0.1 updates - UHBR link rate fixes - Aborted suspend fix - Mark gttsize parameter as deprecated - GC 10 cleaner shader updates - PSR-SU fixes - Clean up PM4 headers - Cursor fixes - Enable devcoredump for JPEG - Misc cleanups - Runpm cleanups - MES updates - GC 9 gfxoff fixes - Vbios fetching cleanups - Documentation updates - Update secondary plane handling - DML2 updates - SDMA fixes for MI - Cleaner shader fixes for GC 11/12 - ACA updates - Initial JPEG queue reset support - RAS updates - Initial RAS CPER support - DCN/DCE panic screen handling cleanup - BT2020 fixes - SR-IOV fixes amdkfd: - synchronize pasid values between KGD and KFD - Misc cleanups - Improve GTT/VRAM handling for APUs - Topology updates - Fix user queue validation on GC 7/8 UAPI: - Enable "Broadcast RGB" drm property - Add INFO IOCTL query for virtualization mode Proposed userspace: `e663bed7d6` From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250221213651.4176031-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-02-26 16:41:44 +10:00
Pierre-Eric Pelloux-Prayer	d3c7059b6a	drm/amdgpu: init return value in amdgpu_ttm_clear_buffer Otherwise an uninitialized value can be returned if amdgpu_res_cleared returns true for all regions. Possibly closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3812 Fixes: `a68c7eaa7a` ("drm/amdgpu: Enable clear page functionality") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `7c62aacc3b`) Cc: stable@vger.kernel.org	2025-02-25 12:29:12 -05:00
Alex Deucher	748a1f51bb	drm/amdgpu/mes: keep enforce isolation up to date Re-send the mes message on resume to make sure the mes state is up to date. Fixes: `8521e3c5f0` ("drm/amd/amdgpu: limit single process inside MES") Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Shaoyun Liu <shaoyun.liu@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `27b7915147`)	2025-02-25 12:23:48 -05:00
Alex Deucher	e7ea88207c	drm/amdgpu/gfx: only call mes for enforce isolation if supported This should not be called on chips without MES so check if MES is enabled and if the cleaner shader is supported. Fixes: `8521e3c5f0` ("drm/amd/amdgpu: limit single process inside MES") Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Shaoyun Liu <shaoyun.liu@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> (cherry picked from commit `80513e3897`)	2025-02-25 12:22:56 -05:00
Alex Deucher	099bffc7ca	drm/amdgpu: disable BAR resize on Dell G5 SE There was a quirk added to add a workaround for a Sapphire RX 5600 XT Pulse that didn't allow BAR resizing. However, the quirk caused a regression with runtime pm on Dell laptops using those chips, rather than narrowing the scope of the resizing quirk, add a quirk to prevent amdgpu from resizing the BAR on those Dell platforms unless runtime pm is disabled. v2: update commit message, add runpm check Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707 Fixes: `907830b0fc` ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `5235053f44`) Cc: stable@vger.kernel.org	2025-02-25 12:20:27 -05:00
Tao Zhou	dab993bf15	drm/amdgpu: increase AMDGPU_MAX_RINGS Increase it since a cper ring is introduced. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:13 -05:00
Srinivasan Shanmugam	59f9c2c9f6	drm/amdgpu: Fix correct parameter desc for VCN idle check functions Fixes the kdoc for the following VCN idle check functions by updating the parameter description from 'handle' to 'ip_block': - vcn_v4_0_is_idle - vcn_v4_0_3_is_idle - vcn_v4_0_5_is_idle - vcn_v5_0_1_is_idle Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c:935: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_1_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c:935: warning: Excess function parameter 'handle' description in 'vcn_v5_0_1_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:1972: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:1972: warning: Excess function parameter 'handle' description in 'vcn_v4_0_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:1583: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_3_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:1583: warning: Excess function parameter 'handle' description in 'vcn_v4_0_3_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1200: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_0_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1200: warning: Excess function parameter 'handle' description in 'vcn_v5_0_0_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1460: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_5_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1460: warning: Excess function parameter 'handle' description in 'vcn_v4_0_5_is_idle' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:13 -05:00
Pierre-Eric Pelloux-Prayer	7c62aacc3b	drm/amdgpu: init return value in amdgpu_ttm_clear_buffer Otherwise an uninitialized value can be returned if amdgpu_res_cleared returns true for all regions. Possibly closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3812 Fixes: `a68c7eaa7a` ("drm/amdgpu: Enable clear page functionality") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:12 -05:00
ganglxie	a8f921a10a	drm/amdgpu: Change page/record number calculation based on nps save only one record to save eeprom space,and bad_page_num = pa_rec_num + mca_rec_num*16 Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:12 -05:00
ganglxie	0153d27673	drm/amdgpu: Refine bad page adding bad page adding can be simpler with nps info Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:12 -05:00
Jesse.zhang@amd.com	e4e6ae41cc	drm/amdgpu: update SDMA sysfs reset mask in late_init - Added `sdma_v4_4_2_update_reset_mask` function to update the reset mask. - update the sysfs reset mask to the `late_init` stage to ensure that the SMU initialization and capability setup are completed before checking the SDMA reset capability. - For IP versions 9.4.3 and 9.4.4, enable per-queue reset if the MEC firmware version is at least 0xb0 and PMFW supports queue reset. - Add a TODO comment for future support of per-queue reset for IP version 9.5.0. This change ensures that per-queue reset is only enabled when the MEC and PMFW support it. v2: fix ip version (9.5.4 -> 9.5.0)(Lijo) Suggested-by: Jonathan Kim <Jonathan.Kim@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:12 -05:00
Xiang Liu	ff930483af	drm/amdgpu: Set CPER enabled flag after ring initiailized Setting cper.enabled to be true only after cper ring is successfully created. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:12 -05:00
ganglxie	f2510355fb	drm/amdgpu: Save nps to eeprom nps info saved together with bad page makes bad page parsing more efficient Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:12 -05:00
Xiang Liu	ce615fe328	drm/amdgpu: Check if CPER enabled when generating CPER In the case of CPER disabled, generating CPER will cause kernel NULL pointer dereference without checking. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:12 -05:00
Jonathan Kim	9424a5bf08	drm/amdgpu: simplify xgmi peer info calls Deprecate KFD XGMI peer info calls in favour of calling directly from simplified XGMI peer info functions. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:12 -05:00
Dr. David Alan Gilbert	9d8af72fe7	drm/amdgpu: Remove unused nbif_v6_3_1_sriov_funcs The nbif_v6_3_1_sriov_funcs instance of amdgpu_nbio_funcs was added in commit `894c6d3522` ("drm/amdgpu: Add nbif v6_3_1 ip block support") but has remained unused. Alex has confirmed it wasn't needed. Remove it, together with the four unused stub functions: nbif_v6_3_1_sriov_ih_doorbell_range nbif_v6_3_1_sriov_gc_doorbell_init nbif_v6_3_1_sriov_vcn_doorbell_range nbif_v6_3_1_sriov_sdma_doorbell_range Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:44:01 -05:00
Sathishkumar S	62431979dd	drm/amdgpu: Add ring reset callback for JPEG5_0_1 Add ring reset function callback for JPEG5_0_1 to recover from job timeouts without a full gpu reset. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:44:00 -05:00
André Almeida	b7fd6528b5	drm/amdgpu: Log after a successful ring reset When a ring reset happens, the kernel log shows only "amdgpu: Starting <ring name> ring reset", but when it finishes nothing appears in the log. Explicitly write in the log that the reset has finished correctly. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:44:00 -05:00
André Almeida	28d05f0836	drm/amdgpu: Log the creation of a coredump file After a GPU reset happens, the driver creates a coredump file. However, the user might not be aware of it. Log the file creation the user can find more information about the device and add the file to bug reports. This is similar to what the xe driver does. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:44:00 -05:00
Alex Deucher	27b7915147	drm/amdgpu/mes: keep enforce isolation up to date Re-send the mes message on resume to make sure the mes state is up to date. Fixes: `8521e3c5f0` ("drm/amd/amdgpu: limit single process inside MES") Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Shaoyun Liu <shaoyun.liu@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:44:00 -05:00
Sathishkumar S	9b71be8785	drm/amdgpu: Add core reset registers for JPEG5_0_1 Add core reset control register definitions and align all prior register definitions to end at 100 column length for uniformity. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:44:00 -05:00
Sathishkumar S	da120ed561	drm/amdgpu: Per-instance init func for JPEG5_0_1 Add helper functions to handle per-instance and per-core initialization and deinitialization in JPEG5_0_1. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:44:00 -05:00
Alex Deucher	5235053f44	drm/amdgpu: disable BAR resize on Dell G5 SE There was a quirk added to add a workaround for a Sapphire RX 5600 XT Pulse that didn't allow BAR resizing. However, the quirk caused a regression with runtime pm on Dell laptops using those chips, rather than narrowing the scope of the resizing quirk, add a quirk to prevent amdgpu from resizing the BAR on those Dell platforms unless runtime pm is disabled. v2: update commit message, add runpm check Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707 Fixes: `907830b0fc` ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:44:00 -05:00
Asad Kamal	25907304cf	drm/amd/pm: Fetch fru product info for smu_v13_0_12 Fetch fru product info for smu_v13_0_12 from static metrics table v2: Field by field copy for fru info(Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:44:00 -05:00
Jesse.zhang@amd.com	c94943b086	drm/amdgpu: Update amdgpu_job_timedout to check if the ring is guilty This patch updates the `amdgpu_job_timedout` function to check if the ring is actually guilty of causing the timeout. If not, it skips error handling and fence completion. v2: move the is_guilty check down into the queue reset area (Alex) v3: need to call is_guilty before reset (Alex) v4: squash in is_guilty logic fixes (Alex) Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:59 -05:00
Jesse.zhang@amd.com	8225254492	drm/amdgpu: Add reset function pointer for SDMA v4.4.2 page ring This patch adds a reset function pointer to the SDMA v4.4.2 page ring functionality. The new function pointer `reset` is set to `sdma_v4_4_2_reset_queue`, which is responsible for resetting the SDMA queue. Changes: - Add `reset` function pointer to `sdma_v4_4_2_page_ring_funcs`. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:59 -05:00
Jesse.zhang@amd.com	fdbfaaaae0	drm/amdgpu: Improve SDMA reset logic with guilty queue tracking This patch includes the remaining improvements to the SDMA reset logic: - Added `gfx_guilty` and `page_guilty` flags to track guilty queues. - Updated the reset and resume functions to handle the guilty state. - Cached the `rptr` before reset. v2: 1.replace the caller with a guilty bool. If the queue is the guilty one, set the rptr and wptr to the saved wptr value, else, set the rptr and wptr to the saved rptr value. (Alex) 2. cache the rptr before the reset. (Alex) v3: Keeping intermediate variables like u64 rwptr simplifies resotre rptr/wptr.(Lijo) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:59 -05:00
Jesse.zhang@amd.com	0ad649321a	drm/amdgpu/sdma: Introduce is_guilty callbacks for sdma GFX and PAGE rings This patch introduces the `is_guilty` callbacks for the GFX and PAGE rings. These callbacks check if a ring is guilty of causing a timeout or error. Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:59 -05:00
Jesse.zhang@amd.com	4d3c4f4f7f	drm/amdgpu: Introduce cached_rptr and is_guilty callback in amdgpu_ring This patch introduces the following changes: - Add `cached_rptr` to the `amdgpu_ring` structure to store the read pointer before a reset. - Add `is_guilty` callback to the `amdgpu_ring_funcs` structure to check if a ring is guilty of causing a timeout. Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:59 -05:00
Jesse.zhang@amd.com	4c02f73016	drm/amdgpu: Introduce conditional user queue suspension for SDMA resets - Modify the `amdgpu_sdma_reset_engine` function to accept a `suspend_user_queues` parameter. - This parameter allows the function to conditionally suspend and resume user queues during SDMA resets. - Ensure that user queues are suspended only when necessary to avoid unnecessary overhead and potential deadlocks. - Restart the scheduler's work queue for the GFX and page rings after the reset to allow new tasks to be submitted. This change improves synchronization between the KGD and the KFD during SDMA resets, ensuring proper handling of user queues and avoiding race conditions. V2: replace the ring_lock with the existed the scheduler locks for the queues (ring->sched) on the sdma engine.(Alex) v3: call drm_sched_wqueue_stop() rather than job_list_lock. If a GPU ring reset was already initiated for one ring at amdgpu_job_timedout, skip resetting that ring and call drm_sched_wqueue_stop() for the other rings (Alex) replace the common lock (sdma_reset_lock) with DQM lock to to resolve reset races between the two driver sections during KFD eviction.(Jon) Rename the caller to Reset_src and Change AMDGPU_RESET_SRC_SDMA_KGD/KFD to AMDGPU_RESET_SRC_SDMA_HWS/RING (Jon) v4: restart the wqueue if the reset was successful, or fall back to a full adapter reset. (Alex) move definition of reset source to enumeration AMDGPU_RESET_SRCS, and check reset src in amdgpu_sdma_reset_instance (Jon) v5: Call amdgpu_amdkfd_suspend/resume at the start/end of reset function respectively under !SRC_HWS conditions only (Jon) v6: replace the paramter src with a bool suspend_user_queues, remove the paramter src in pre/post func. (Jon) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Suggested-by: Jonathan Kim <Jonathan.Kim@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Acked-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:59 -05:00
Lijo Lazar	0ca5751560	drm/amdgpu: Remove redundant logic in GC v9.4.3 GFXOFF check is not needed for GC v9.4.3. Also, save/restore list is available by default. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:59 -05:00
Sathishkumar S	793ee232ee	drm/amdgpu: Do not poweroff UVDJ in JPEG4_0_3 Update power gate setting to not poweroff UVDJ in JPEG4_0_3. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:58 -05:00
Jesse.zhang@amd.com	d6e6ea5efb	drm/amdgpu/sdma: Refactor SDMA reset functionality and add callback support This patch refactors the SDMA reset functionality in the `sdma_v4_4_2` driver to improve modularity and support shared usage between AMDGPU and KFD. The changes include: 1. Refactored SDMA Reset Logic: - Split the `sdma_v4_4_2_reset_queue` function into two separate functions: - `sdma_v4_4_2_stop_queue`: Stops the SDMA queue before reset. - `sdma_v4_4_2_restore_queue`: Restores the SDMA queue after reset. - These functions are now used as callbacks for the shared reset mechanism. 2. Added Callback Support: - Introduced a new structure `sdma_v4_4_2_reset_funcs` to hold the stop and restore callbacks. - Added `sdma_v4_4_2_set_reset_funcs` to register these callbacks with the shared reset mechanism using `amdgpu_set_on_reset_callbacks`. 3. Fixed Reset Queue Function: - Modified `sdma_v4_4_2_reset_queue` to use the shared `amdgpu_sdma_reset_queue` function, ensuring consistency across the driver. This patch ensures that SDMA reset functionality is more modular, reusable, and aligned with the shared reset mechanism between AMDGPU and KFD. v2: Renamed sdma_v4_4_2_set_reset_funcs to sdma_v4_4_2_set_engine_reset_funcs. Renamed sdma_v4_4_2_reset_funcs to sdma_v4_4_2_engine_reset_funcs.(Alex) Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:58 -05:00
Jesse.zhang@amd.com	f33044952c	drm/amdgpu/kfd: Add shared SDMA reset functionality with callback support This patch introduces shared SDMA reset functionality between AMDGPU and KFD. The implementation includes the following key changes: 1. Added `amdgpu_sdma_reset_queue`: - Resets a specific SDMA queue by instance ID. - Invokes registered pre-reset and post-reset callbacks to allow KFD and AMDGPU to save/restore their state during the reset process. 2. Added `amdgpu_set_on_reset_callbacks`: - Allows KFD and AMDGPU to register callback functions for pre-reset and post-reset operations. - Callbacks are stored in a global linked list and invoked in the correct order during SDMA reset. This patch ensures that both AMDGPU and KFD can handle SDMA reset events gracefully, with proper state saving and restoration. It also provides a flexible callback mechanism for future extensions. v2: fix CamelCase and put the SDMA helper into amdgpu_sdma.c (Alex) v3: rename the `amdgpu_register_on_reset_callbacks` function to `amdgpu_sdma_register_on_reset_callbacks` move global reset_callback_list to struct amdgpu_sdma (Alex) v4: Update the reset callback function description and rename the reset function to amdgpu_sdma_reset_engine (Alex) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:58 -05:00
Likun Gao	71209c9663	drm/amdgpu: correct the name of mes_pipe structure Correct the structure name admgpu_mes_pipe to amdgpu_mes_pipe. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:58 -05:00
Sunil Khatri	7dc3405403	drm/amdgpu: update the handle ptr in is_idle Update the *handle to amdgpu_ip_block ptr for all functions pointers of is_idle. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:58 -05:00
Thomas Zimmermann	130377304e	Merge drm/drm-next into drm-misc-next Backmerging to get fixes from v6.14-rc4. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2025-02-25 11:43:10 +01:00
Dave Airlie	fb51bf0255	Linux 6.14-rc4 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAme7hfkeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGU+IH/1bk6zIvAwXXS5yu KNsQ8dEkC3Xme6HqLtPsAhRLF+5YJf6MaGm1ip5dDMyIvasa2gwvCQQQoOpeMbKj 79VKT+m9t3szMHZaQYjlOuYHBmNSJ4cMCD2Qh6ktXHGPfTTWDFGf7fBwBOkVNeJU 1Ask+bxeop21aJMhfYXrUta3OYyerLBUR6jCiCM82A/GLtdv6oNGXBu3ygDt9Tjx ZHSl+CYjKpmGUP8JnMKwCBHVguEfqgzZ//dY1H16AvOLed9k2jkMFn8O5Vi3vjnx TWMMXoiJimuamGzbjxtCCqzxNlFFDT4gRpDqeJxb16W/gDTFmbRr9LDjNehCZe33 AigLZ6M= =Y/7F -----END PGP SIGNATURE----- Merge tag 'v6.14-rc4' into drm-next Backmerge Linux 6.14-rc4 at the request of tzimmermann so misc-next can base on rc4. Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-02-25 17:36:09 +10:00
Tvrtko Ursulin	80b6ef8ae2	drm/amdgpu: Pop jobs from the queue more robustly Replace a copy of DRM scheduler's to_drm_sched_job with a copy of a newly added drm_sched_entity_queue_pop. This allows breaking the hidden dependency that queue_node has to be the first element in struct drm_sched_job. A comment is also added with a reference to the mailing list discussion explaining the copied helper will be removed when the whole broken amdgpu_job_stop_all_jobs_on_sched is removed. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <phasta@kernel.org> Cc: Zhang, Hawking <Hawking.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250221105038.79665-3-tvrtko.ursulin@igalia.com	2025-02-24 10:17:40 +01:00
Christian König	cb0de06d1b	drm/amdgpu: remove all KFD fences from the BO on release Remove all KFD BOs from the private dma_resv object. This prevents the KFD from being evict unecessarily when an exported BO is released. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-and-tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-21 10:41:49 -05:00
Thomas Weißschuh	2d0f5001b6	drm/amdgpu: Constify 'struct bin_attribute' The sysfs core now allows instances of 'struct bin_attribute' to be moved into read-only memory. Make use of that to protect them against accidental or malicious modifications. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20241216-sysfs-const-bin_attr-drm-v1-4-210f2b36b9bf@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-21 09:20:31 +01:00
Sunil Khatri	3521276ad1	drm/amdgpu: update the handle ptr in get_clockgating_state Update the *handle to amdgpu_ip_block ptr for all functions pointers of get_clockgating_state. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-19 15:19:05 -05:00
Xiang Liu	2f94469cc0	drm/amdgpu: Remove redundant check of adev There is no need to check adev for sure. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-19 15:16:37 -05:00
Xiang Liu	663a87763b	drm/amdgpu: Check aca enabled inside cper init/fini func Move code about checking aca enabled to the cper init/fini function to make code clean. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-19 15:16:33 -05:00
Lijo Lazar	30eb41f5d1	drm/amdgpu: Use firmware supported NPS modes If firmware supported NPS modes are available through CAP register, use those values for supported NPS modes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-19 15:16:29 -05:00
Sathishkumar S	c4c3808feb	drm/amdgpu: Add ring reset callback for JPEG4_0_3 Add ring reset function callback for JPEG4_0_3 to recover from job timeouts without a full gpu reset. V2: - sched->ready flag shouldn't be modified by HW backend (Christian) V3: - Dont modifying sched/job-submission state from HW backend (Christian) - Implement per-core reset sequence V4: - Dont create reset_mask sysfs and return -EOPNOTSUPP on VFs (Lijo) Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-19 15:16:11 -05:00
Srinivasan Shanmugam	dc0297f319	drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV RLCG Register Access is a way for virtual functions to safely access GPU registers in a virtualized environment., including TLB flushes and register reads. When multiple threads or VFs try to access the same registers simultaneously, it can lead to race conditions. By using the RLCG interface, the driver can serialize access to the registers. This means that only one thread can access the registers at a time, preventing conflicts and ensuring that operations are performed correctly. Additionally, when a low-priority task holds a mutex that a high-priority task needs, ie., If a thread holding a spinlock tries to acquire a mutex, it can lead to priority inversion. register access in amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical. The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being called, which attempts to acquire the mutex. This function is invoked from amdgpu_sriov_wreg, which in turn is called from gmc_v11_0_flush_gpu_tlb. The [ BUG: Invalid wait context ] indicates that a thread is trying to acquire a mutex while it is in a context that does not allow it to sleep (like holding a spinlock). Fixes the below: [ 253.013423] ============================= [ 253.013434] [ BUG: Invalid wait context ] [ 253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U OE [ 253.013464] ----------------------------- [ 253.013475] kworker/0:1/10 is trying to lock: [ 253.013487] ffff9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.013815] other info that might help us debug this: [ 253.013827] context-{4:4} [ 253.013835] 3 locks held by kworker/0:1/10: [ 253.013847] #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 253.013877] #1: ffffb789c008be40 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 253.013905] #2: ffff9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu] [ 253.014154] stack backtrace: [ 253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G U OE 6.12.0-amdstaging-drm-next-lol-050225 #14 [ 253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024 [ 253.014224] Workqueue: events work_for_cpu_fn [ 253.014241] Call Trace: [ 253.014250] <TASK> [ 253.014260] dump_stack_lvl+0x9b/0xf0 [ 253.014275] dump_stack+0x10/0x20 [ 253.014287] __lock_acquire+0xa47/0x2810 [ 253.014303] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.014321] lock_acquire+0xd1/0x300 [ 253.014333] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.014562] ? __lock_acquire+0xa6b/0x2810 [ 253.014578] __mutex_lock+0x85/0xe20 [ 253.014591] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.014782] ? sched_clock_noinstr+0x9/0x10 [ 253.014795] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.014808] ? local_clock_noinstr+0xe/0xc0 [ 253.014822] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.015012] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.015029] mutex_lock_nested+0x1b/0x30 [ 253.015044] ? mutex_lock_nested+0x1b/0x30 [ 253.015057] amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.015249] amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu] [ 253.015435] gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu] [ 253.015667] gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu] [ 253.015901] ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu] [ 253.016159] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.016173] ? smu_hw_init+0x18d/0x300 [amdgpu] [ 253.016403] amdgpu_device_init+0x29ad/0x36a0 [amdgpu] [ 253.016614] amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu] [ 253.017057] amdgpu_pci_probe+0x1c2/0x660 [amdgpu] [ 253.017493] local_pci_probe+0x4b/0xb0 [ 253.017746] work_for_cpu_fn+0x1a/0x30 [ 253.017995] process_one_work+0x21e/0x680 [ 253.018248] worker_thread+0x190/0x330 [ 253.018500] ? __pfx_worker_thread+0x10/0x10 [ 253.018746] kthread+0xe7/0x120 [ 253.018988] ? __pfx_kthread+0x10/0x10 [ 253.019231] ret_from_fork+0x3c/0x60 [ 253.019468] ? __pfx_kthread+0x10/0x10 [ 253.019701] ret_from_fork_asm+0x1a/0x30 [ 253.019939] </TASK> v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian). Fixes: `e864180ee4` ("drm/amdgpu: Add lock around VF RLCG interface") Cc: lin cao <lin.cao@amd.com> Cc: Jingwen Chen <Jingwen.Chen2@amd.com> Cc: Victor Skvortsov <victor.skvortsov@amd.com> Cc: Zhigang Luo <zhigang.luo@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-19 15:14:38 -05:00
Nam Cao	690d59fee8	drm/amdgpu: Switch to use hrtimer_setup() hrtimer_setup() takes the callback function pointer as argument and initializes the timer completely. Replace hrtimer_init() and the open coded initialization of hrtimer::function with the new setup mechanism. Patch was created by using Coccinelle. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Zack Rusin <zack.rusin@broadcom.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/all/2e3664eebb00d3a8c786ee7cc1fba8096bababc9.1738746904.git.namcao@linutronix.de	2025-02-18 11:19:05 +01:00
Thomas Zimmermann	8bd1a8e757	Merge drm/drm-next into drm-misc-next Backmerging to get bugfixes from v6.14-rc2. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2025-02-18 07:43:43 +01:00
Xiang Liu	f9d35b945c	drm/amdgpu: Generate bad page threshold cper records Generate CPER record when bad page threshold exceed and commit to CPER ring. v2: return -ENOMEM instead of false v2: check return value of fill section function Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:30 -05:00
Xiang Liu	4058e7cbfd	drm/amdgpu: Commit CPER entry Commit the CPER entry to the ring buffer. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:30 -05:00
Tao Zhou	8652920d2c	drm/amdgpu: add mutex lock for cper ring Avoid the confliction between read and write of ring buffer. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:30 -05:00
Tao Zhou	a6d9d19290	drm/amdgpu: add data write function for CPER ring Old CPER data will be overwritten if ring buffer is full, and read pointer always points to CPER header. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:30 -05:00
Tao Zhou	5a14282429	drm/amdgpu: read CPER ring via debugfs We read CPER data from read pointer to write pointer without changing the pointers. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Tao Zhou	4d614ce8ff	drm/amdgpu: add RAS CPER ring buffer And initialize it, this is a pure software ring to store RAS CPER data. v2: change ring size to 0x100000 v2: update the initialization of count_dw of cper ring, it's dword variable v3: skip VM inv eng for cper v3: init/fini when aca enabled Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Xiang Liu	b3060f5bea	drm/amdgpu: Get timestamp from system time Get system local time and encode it to timestamp for CPER. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Alex Deucher	f3e10e1a0c	drm/amdgpu/mes12: allocate hw_resource_1 buffer once Allocate the buffer at sw init time so we don't alloc and free it for every suspend/resume or reset cycle. Reviewed-by: Shaoyun.liu <shaouyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Alex Deucher	13d68ae651	drm/amdgpu/mes11: allocate hw_resource_1 buffer once Allocate the buffer at sw init time so we don't alloc and free it for every suspend/resume or reset cycle. Reviewed-by: Shaoyun.liu <shaouyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Hawking Zhang	652e090230	drm/amdgpu: Generate cper records Encode the error information in CPER format and commit to the cper ring Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <keivnyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Hawking Zhang	ad97840f95	drm/amdgpu: Introduce funcs for generating cper record Introduce new functions that are used to generate cper ue or ce records. v2: return -ENOMEM instead of false v2: check return value of fill section function Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Yang Wang <keivnyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Hawking Zhang	56316ee91b	drm/amdgpu: Include ACA error type in aca bank ACA error types managed by driver a direct 1:1 correspondence with those managed by firmware. To address this, for each ACA bank, include both the ACA error type and the ACA SMU type. This addition is useful for creating CPER records. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <keivnyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Candice Li	76b1f8b32d	drm/amdgpu: Optimize the enablement of GECC Enable GECC only when the default memory ECC mode or the module parameter amdgpu_ras_enable is activated. v2: Add kernel message to remind users explicitly set amdgpu_ras_enable=1 before driver loading to enable GECC and set amdgpu_ras_enable=0 to disable GECC when GECC is currently enabled if needed. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Hawking Zhang	92d5d2a09d	drm/amdgpu: Introduce funcs for populating CPER Introduce utility functions designed to assist in populating CPER records. v2: call cper_init/fini in device_ip_init/fini. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Srinivasan Shanmugam	2012aff981	drm/amdgpu: Rename VCN clock gating function for consistency Change the function name from vcn_v2_5_enable_clock_gating_inst to vcn_v2_5_enable_clock_gating to ensure consistency in naming. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c:781: warning: expecting prototype for vcn_v2_5_enable_clock_gating_inst(). Prototype was for vcn_v2_5_enable_clock_gating() instead Cc: Leo Liu <leo.liu@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:28 -05:00
Alex Deucher	eda80f1c2a	drm/amdgpu/vcn4.0.3: drop dpm power helpers VCN 4.0.3 doesn't support powergating so there is no need to call these. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:25 -05:00
Alex Deucher	7780239809	drm/amdgpu/vcn5.0.1: drop dpm power helpers VCN 5.0.1 doesn't support powergating so there is no need to call these. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:18 -05:00
Alex Deucher	0487f50310	drm/amdgpu/vcn5.0.1: use correct dpm helper The VCN and UVD helpers were split in commit `ff69bba05f` ("drm/amd/pm: add inst to dpm_set_powergating_by_smu") However, this happened in parallel to the vcn 5.0.1 development so it was missed there. Fixes: `346492f30c` ("drm/amdgpu: Add VCN_5_0_1 support") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Sonny Jiang <sonjiang@amd.com> Cc: Boyuan Zhang <boyuan.zhang@amd.com>	2025-02-17 14:09:11 -05:00
Alex Deucher	56763be400	drm/amdgpu/umsch: tidy up the ucode name string handling Make the constant parts of the name part of the string we pass to amdgpu_ucode_request(). Only the version number varies from IP to IP. Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Lang Yu <Lang.Yu@amd.com>	2025-02-17 14:09:02 -05:00
Alex Deucher	c917e39cbd	drm/amdgpu/umsch: fix ucode check Return an error if the IP version doesn't match otherwise we end up passing a NULL string to amdgpu_ucode_request. We should never hit this in practice today since we only enable the umsch code on the supported IP versions, but add a check to be safe. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502130406.iWQ0eBug-lkp@intel.com/ Fixes: `020620424b` ("drm/amd: Use a constant format string for amdgpu_ucode_request") Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Lang Yu <Lang.Yu@amd.com>	2025-02-17 14:08:53 -05:00
Amber Lin	5183e69090	drm/amdgpu: Remove extra checks for CPX As far as the number of XCCs, the number of compute partitions, and the number of memory partitions qualify, CPX is valid. Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:08:49 -05:00
Alex Deucher	fe652becdb	drm/amdgpu/umsch: declare umsch firmware Needed to be properly picked up for the initrd, etc. Fixes: `3488c79bea` ("drm/amdgpu: add initial support for UMSCH") Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Lang Yu <Lang.Yu@amd.com>	2025-02-17 14:08:37 -05:00
Alex Deucher	80513e3897	drm/amdgpu/gfx: only call mes for enforce isolation if supported This should not be called on chips without MES so check if MES is enabled and if the cleaner shader is supported. Fixes: `8521e3c5f0` ("drm/amd/amdgpu: limit single process inside MES") Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Shaoyun Liu <shaoyun.liu@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>	2025-02-17 14:08:28 -05:00
Sathishkumar S	500c04d2a7	drm/amdgpu: Add ring reset callback for JPEG2_0_0 Add ring reset function callback for JPEG2_0_0 to recover from job timeouts without a full gpu reset. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:08:28 -05:00
Sathishkumar S	09e24a0b52	drm/amdgpu: Add ring reset callback for JPEG2_5_0 Add ring reset function callback for JPEG2_5_0 to recover from job timeouts without a full gpu reset. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:08:28 -05:00
Sathishkumar S	cb493aee4d	drm/amdgpu: Per-instance init func for JPEG2_5_0 Add helper functions to handle per-instance initialization and deinitialization in JPEG2_5_0. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:08:28 -05:00
Sathishkumar S	03399d0bff	drm/amdgpu: Add ring reset callback for JPEG3_0_0 Add ring reset function callback for JPEG3_0_0 to recover from job timeouts without a full gpu reset. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:08:27 -05:00
Sathishkumar S	74894ffc7d	drm/amdgpu: Add ring reset callback for JPEG4_0_0 Add ring reset function callback for JPEG4_0_0 to recover from job timeouts without a full gpu reset. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:08:27 -05:00
Sathishkumar S	abce7b4fc7	drm/amdgpu: Per-instance init func for JPEG4_0_3 Add helper functions to handle per-instance and per-core initialization and deinitialization in JPEG4_0_3. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:08:27 -05:00
Saleemkhan Jamadar	b0bebbe4ea	drm/amdgpu/umsch: remove vpe test from umsch current test is more intrusive for user queue test Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Suggested-by: Christian Koenig <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:08:27 -05:00
Candice Li	59af05d6a3	drm/amdgpu: Enable ACA by default for psp v13_0_12 Enable ACA by default for psp v13_0_12. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:08:27 -05:00
Dave Airlie	0ed1356af8	drm-misc-next for v6.15: UAPI Changes: fourcc: - Add modifiers for MediaTek tiled formats Cross-subsystem Changes: bus: - mhi: Enable image transfer via BHIe in PBL dma-buf: - Add fast-path for single-fence merging Core Changes: atomic helper: - Allow full modeset on connector changes - Clarify semantics of allow_modeset - Clarify semantics of drm_atomic_helper_check() buddy allocator: - Fix multi-root cleanup ci: - Update IGT display: - dp: Support Extendeds Wake Timeout - dp_mst: Fix RAD-to-string conversion panic: - Encode QR code according to Fido 2.2 probe helper: - Cleanups scheduler: - Cleanups ttm: - Refactor pool-allocation code - Cleanups Driver Changes: amdxdma: - Fix error handling - Cleanups ast: - Refactor detection of transmitter chips - Refactor support of VBIOS display-mode handling - astdp: Fix connection status; Filter unsupported display modes bridge: - adv7511: Report correct capabilities - it6505: Fix HDCP V compare - sn65dsi86: Fix device IDs - Cleanups i915: - Enable Extendeds Wake Timeout imagination: - Check job dependencies with DRM-sched helper ivpu: - Improve command-queue handling - Use workqueue for IRQ handling - Add suport for HW fault injection - Locking fixes - Cleanups mgag200: - Add support for G200eH5 chips msm: - dpu: Add concurrent writeback support for DPU 10.x+ nouveau: - Move drm_slave_encoder interface into driver - nvkm: Refactor GSP RPC omapdrm: - Cleanups panel: - Convert several panels to multi-style functions to improve error handling - edp: Add support for B140UAN04.4, BOE NV140FHM-NZ, CSW MNB601LS1-3, LG LP079QX1-SP0V, MNE007QS3-7, STA 116QHD024002, Starry 116KHD024006, Lenovo T14s Gen6 Snapdragon - himax-hx83102: Add support for CSOT PNA957QT1-1, Kingdisplay kd110n11-51ie, Starry 2082109qfh040022-50e panthor: - Expose sizes of intenral BOs via fdinfo - Fix race between reset and suspend - Cleanups qaic: - Add support for AIC200 - Cleanups renesas: - Fix limits in DT bindings rockchip: - rk3576: Add HDMI support - vop2: Add new display modes on RK3588 HDMI0 up to 4K - Don't change HDMI reference clock rate - Fix DT bindings solomon: - Set SPI device table to silence warnings - Fix pixel and scanline encoding v3d: - Cleanups vc4: - Use drm_exec - Use dma-resv for wait-BO ioctl - Remove seqno infrastructure virtgpu: - Support partial mappings of GEM objects - Reserve VGA resources during initialization - Fix UAF in virtgpu_dma_buf_free_obj() - Add panic support vkms: - Switch to a managed modesetting pipeline - Add support for ARGB8888 xlnx: - Set correct DMA segment size - Fix error handling - Fix docs -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmesYp4ACgkQaA3BHVML eiP0agf/ZsUXdVdJ3HMI3vAMbiUnW8ThQZx0M3Zkpf3EOCKgdibbjNkhgTzsp3b/ GqcP/pmqwmjCUgVADVDVJvRTD1rSadcxCYYTwI9fsjnXWmib9l4sjI3jPRJkIekj a5eeCFnMEKzfaxhFPJ/nu+esMZ19ZRE1iSO7WC7YC0SSR6zYP/QaNkGti8QA5TpL oBHd4jctoxXw9sw/ACEwzsSRdxL/opXQ5KbFJFmr2Q9/osaTUX7QtslDhnh+2lVK ZGxGdFIvFYKB+QrXpfjJU51Q01gqp/PWS+YIDrpr7bLC5YeFQVe3FDjqYIXSoE2t dFYtU2mSS/rwijcRc64sghXCdbKS1w== =iJGz -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2025-02-12' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.15: UAPI Changes: fourcc: - Add modifiers for MediaTek tiled formats Cross-subsystem Changes: bus: - mhi: Enable image transfer via BHIe in PBL dma-buf: - Add fast-path for single-fence merging Core Changes: atomic helper: - Allow full modeset on connector changes - Clarify semantics of allow_modeset - Clarify semantics of drm_atomic_helper_check() buddy allocator: - Fix multi-root cleanup ci: - Update IGT display: - dp: Support Extendeds Wake Timeout - dp_mst: Fix RAD-to-string conversion panic: - Encode QR code according to Fido 2.2 probe helper: - Cleanups scheduler: - Cleanups ttm: - Refactor pool-allocation code - Cleanups Driver Changes: amdxdma: - Fix error handling - Cleanups ast: - Refactor detection of transmitter chips - Refactor support of VBIOS display-mode handling - astdp: Fix connection status; Filter unsupported display modes bridge: - adv7511: Report correct capabilities - it6505: Fix HDCP V compare - sn65dsi86: Fix device IDs - Cleanups i915: - Enable Extendeds Wake Timeout imagination: - Check job dependencies with DRM-sched helper ivpu: - Improve command-queue handling - Use workqueue for IRQ handling - Add suport for HW fault injection - Locking fixes - Cleanups mgag200: - Add support for G200eH5 chips msm: - dpu: Add concurrent writeback support for DPU 10.x+ nouveau: - Move drm_slave_encoder interface into driver - nvkm: Refactor GSP RPC omapdrm: - Cleanups panel: - Convert several panels to multi-style functions to improve error handling - edp: Add support for B140UAN04.4, BOE NV140FHM-NZ, CSW MNB601LS1-3, LG LP079QX1-SP0V, MNE007QS3-7, STA 116QHD024002, Starry 116KHD024006, Lenovo T14s Gen6 Snapdragon - himax-hx83102: Add support for CSOT PNA957QT1-1, Kingdisplay kd110n11-51ie, Starry 2082109qfh040022-50e panthor: - Expose sizes of intenral BOs via fdinfo - Fix race between reset and suspend - Cleanups qaic: - Add support for AIC200 - Cleanups renesas: - Fix limits in DT bindings rockchip: - rk3576: Add HDMI support - vop2: Add new display modes on RK3588 HDMI0 up to 4K - Don't change HDMI reference clock rate - Fix DT bindings solomon: - Set SPI device table to silence warnings - Fix pixel and scanline encoding v3d: - Cleanups vc4: - Use drm_exec - Use dma-resv for wait-BO ioctl - Remove seqno infrastructure virtgpu: - Support partial mappings of GEM objects - Reserve VGA resources during initialization - Fix UAF in virtgpu_dma_buf_free_obj() - Add panic support vkms: - Switch to a managed modesetting pipeline - Add support for ARGB8888 xlnx: - Set correct DMA segment size - Fix error handling - Fix docs Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250212090625.GA24865@linux.fritz.box	2025-02-14 10:24:02 +10:00
André Almeida	6fe52b63f5	drm/amdgpu: Use device wedged event Use DRM's device wedged event to notify userspace that a reset had happened. For now, only use `none` method meant for telemetry capture. In the future we might want to report a recovery method if the reset didn't succeed. Acked-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250204070528.1919158-6-raag.jadav@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-02-13 12:15:44 -05:00
Alex Deucher	7845438718	drm/amdgpu/mes: Add cleaner shader fence address handling in MES for GFX12 This commit introduces enhancements to the handling of the cleaner shader fence in the AMDGPU MES driver: - The MES (Microcode Execution Scheduler) now sends a PM4 packet to the KIQ (Kernel Interface Queue) to request the cleaner shader, ensuring that requests are handled in a controlled manner and avoiding the race conditions. - The CP (Compute Processor) firmware has been updated to use a private bus for accessing specific registers, avoiding unnecessary operations that could lead to issues in VF (Virtual Function) mode. - The cleaner shader fence memory address is now set correctly in the `mes_set_hw_res_pkt` structure, allowing for proper synchronization of the cleaner shader execution. Cc: Christian König <christian.koenig@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:06:37 -05:00
Xiaogang Chen	10e08943ca	drm/amdkfd: Fix pasid value leak Curret kfd does not allocate pasid values, instead uses pasid value for each vm from graphic driver. So should not prevent graphic driver from releasing pasid values since the values are allocated by graphic driver, not kfd driver anymore. This patch does not stop graphic driver release pasid values. Fixes: `8544374c0f` ("drm/amdkfd: Have kfd driver use same PASID values from graphic driver") Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:05:50 -05:00
Saleemkhan Jamadar	4b9a3117bb	drm/amdgpu/vcn: enable TMZ support for vcn 4_0_5 TMZ support is enabled for vcn on GC IP 11_5_0 Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:05:50 -05:00
Srinivasan Shanmugam	be2560e4b8	drm/amdgpu/mes: Add cleaner shader fence address handling in MES for GFX11 This commit introduces enhancements to the handling of the cleaner shader fence in the AMDGPU MES driver: - The MES (Microcode Execution Scheduler) now sends a PM4 packet to the KIQ (Kernel Interface Queue) to request the cleaner shader, ensuring that requests are handled in a controlled manner and avoiding the race conditions. - The CP (Compute Processor) firmware has been updated to use a private bus for accessing specific registers, avoiding unnecessary operations that could lead to issues in VF (Virtual Function) mode. - The cleaner shader fence memory address is now set correctly in the `mes_set_hw_res_pkt` structure, allowing for proper synchronization of the cleaner shader execution. Cc: lin cao <lin.cao@amd.com> Cc: Jingwen Chen <Jingwen.Chen2@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed by: Shaoyun.liu <Shaoyun.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:05:49 -05:00
Ying Li	16a5a8fe6f	drm/amd/amdgpu: add support for IP version 11.5.2 This initializes drm/amd/amdgpu version 11.5.2 Signed-off-by: YING LI <yingli12@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:05:49 -05:00
Philip Yang	23b645231e	drm/amdgpu: Unlocked unmap only clear page table leaves SVM migration unmap pages from GPU and then update mapping to GPU to recover page fault. Currently unmap clears the PDE entry for range length >= huge page and free PTB bo, update mapping to alloc new PT bo. There is race bug that the freed entry bo maybe still on the pt_free list, reused when updating mapping and then freed, leave invalid PDE entry and cause GPU page fault. By setting the update to clear only one PDE entry or clear PTB, to avoid unmap to free PTE bo. This fixes the race bug and improve the unmap and map to GPU performance. Update mapping to huge page will still free the PTB bo. With this change, the vm->pt_freed list and work is not needed. Add WARN_ON(unlocked) in amdgpu_vm_pt_free_dfs to catch if unmap to free the PTB. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:05:49 -05:00
Alex Deucher	1350dd3691	drm/amdgpu/mes11: fix set_hw_resources_1 calculation It's GPU page size not CPU page size. In most cases they are the same, but not always. This can lead to overallocation on systems with larger pages. Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Cc: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:05:49 -05:00
Alex Deucher	ebc25499de	drm/amdgpu/vcn2.5: split code along instances Split the code on a per instance basis. This will allow us to use the per instance functions in the future to handle more things per instance. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:05:49 -05:00
Harish Kasiviswanathan	3394b1f76d	drm/amdgpu: Set snoop bit for SDMA for MI series SDMA writes has to probe invalidate RW lines. Set snoop bit in mmhub for this to happen. v2: Missed a few mmhub_v9_4. Added now. v3: Calculate hub offset once since it doesn't change inside the loop Modified function names based on review comments. Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:05:49 -05:00
Tim Huang	76e0410fe0	drm/amdgpu: add discovery support for DCN IP version 3.6.0 Add discovery entry for DCN IP version 3.6.0. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:09 -05:00
Jiang Liu	e92f3f94ca	drm/amdgpu: reset psp->cmd to NULL after releasing the buffer Reset psp->cmd to NULL after releasing the buffer in function psp_sw_fini(). Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Asad Kamal	aafe181f7d	drm/amdgpu: Add flags to distinguish vf/pf/pt mode Add extra flag definition for ids_flag field to distinguish between vf/pf/pt modes v2: Updated kms driver minor version & removed pf check as default is 0 v3: Fix up version (Alex) v4: rebase (Alex) Proposed userspace: `e663bed7d6` Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Alex Deucher	759e764f7d	drm/amdkfd: use GTT for VRAM on APUs only if GTT is larger If the user has configured a large carveout on a small APU, only use GTT for VRAM allocations if GTT is larger than VRAM. v2: fix reversed check (Philip) Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Alex Deucher	8b0d068e7d	drm/amdkfd: add a new flag to manage where VRAM allocations go On big and small APUs we send KFD VRAM allocations to GTT since the carve out is either non-existent or relatively small. However, if someone sets the carve out size to be relatively large, we may end up using GTT rather than VRAM. No change of logic with this patch, but it allows the driver to determine which logic to use based on the carve out size in the future. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Lijo Lazar	cc0e91a755	drm/amdgpu: Make VBIOS image read optional Keep VBIOS image read optional for select SOCs in passthrough mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Lijo Lazar	6e8ca38ebc	drm/amdgpu: Add flag to make VBIOS read optional Certain SOCs may not need much data from VBIOS. Some data like VBIOS version used will be missed but it doesn't affect functionality. Add a flag to make VBIOS image optional. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Lijo Lazar	7e0aa70681	drm/amdgpu: Add VBIOS flags Instead of read_bios, use get_bios_flags to get various options around reading VBIOS. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Lijo Lazar	e986e89659	drm/amdgpu: Add wrapper for freeing vbios memory Use bios_release wrapper to release memory allocated for vbios image and reset the variables. v2: Use the same wrapper for clean up in sw_fini (Alex Deucher) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Alex Deucher	15f00b073c	drm/amdgpu/gfx9: use amdgpu_gfx_off_ctrl_immediate() for PG Use amdgpu_gfx_off_ctrl_immediate() when powergating. There's no need for the delay in gfx off allow. The powergating is dynamically disabled/enabled as for RV/PCO on compute queues and allowing gfx off again as soon the job is submitted improves power savings. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Suggested-by: Błażej Szczygieł <mumei6102@gmail.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3861 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:07 -05:00
Alex Deucher	250d9769ee	drm/amdgpu/gfx: add amdgpu_gfx_off_ctrl_immediate() Same as amdgpu_gfx_off_ctrl(), but without the delay for gfxoff disallow. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Suggested-by: Błażej Szczygieł <mumei6102@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:07 -05:00
Lijo Lazar	a5219b41dd	drm/amdgpu: Clean up atom header file inclusion atom bios header files are not required in these files. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:06 -05:00
Alex Deucher	abab978127	drm/amdgpu/sdma4: drop gfxoff calls in dump ip state SDMA 4.x is not part of the GFX power domain so this is not necessary. Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:06 -05:00
Sathishkumar S	64dc2f0029	drm/amdgpu: Enable devcoredump for JPEG5_0_0 Add register list and enable devcoredump for JPEG5_0_0 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	8ecd4ec6a5	drm/amdgpu: Enable devcoredump for JPEG2_5_0 Add register list and enable devcoredump for JPEG2_5_0 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	63d5f8db53	drm/amdgpu: Enable devcoredump for JPEG2_0_0 Add register list and enable devcoredump for JPEG2_0_0 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	d949e91b42	drm/amdgpu: Enable devcoredump for JPEG3_0_0 Add register list and enable devcoredump for JPEG3_0_0 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	2b0ccf3923	drm/amdgpu: Enable devcoredump for JPEG4_0_5 Add register list and enable devcoredump for JPEG4_0_5 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	c3dddd6029	drm/amdgpu: Enable devcoredump for JPEG4_0_0 Add register list and enable devcoredump for JPEG4_0_0 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	358b3774a0	drm/amdgpu: Enable devcoredump for JPEG5_0_1 Add register list and enable devcoredump for JPEG5_0_1 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	08527cb534	drm/amdgpu: Enable devcoredump for JPEG4_0_3 Add register list and enable devcoredump for JPEG4_0_3 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Sathishkumar S	df996b5eff	drm/amdgpu: Add helper funcs for jpeg devcoredump Add devcoredump helper functions that can be reused for all jpeg versions. V2: (Lijo) - add amdgpu_jpeg_reg_dump_init() and amdgpu_jpeg_reg_dump_fini() - use reg_list and reg_count from init() to dump and print registers - memory allocation and freeing is moved to the init() and fini() V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Shiwu Zhang	b3dd2903b0	drm/amdgpu: Enable IFWI update support with PSPv13.0.12 Make psp_vbflash_status and psp_vbflash available in sysfs Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Mangesh Gadre	5caea7a589	drm/amdgpu: Add support for smuio 13.0.11 Add new IP version support Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Mangesh Gadre	37971df806	drm/amdgpu: Add support for nbio 7.9.1 Add new IP version support Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Mangesh Gadre	a03f5f8d56	drm/amdgpu: Add support for smu 13.0.12 Add new IP version support Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Mangesh Gadre	05fd502e04	drm/amdgpu: Add support for umc 12.5.0/mmhub 1.8.1 Add new IP version support Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Sathishkumar S	580dac7437	drm/amdgpu: Add a func for core specific reg offset Add an inline function to calculate core specific register offsets for JPEG v4.0.3 and reuse it, makes code more readable and easier to align. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Lijo Lazar	2f9a32b589	drm/amdgpu: Clean up IP version checks in gmcv9.0 Clean up some IP version checks in gmcv9.0 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Lijo Lazar	a52e6cb06b	drm/amdgpu: Clean up GFX v9.4.3 IP version checks Remove unnecessary IP version checks for GFX 9.4.3 and similar variants. Wrap checks inside meaningful function. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Lijo Lazar	a01e934242	drm/amdgpu: Use version to figure out harvest info IP tables with version <=2 may use harvest bit. For version 3 and above, harvest bit is not applicable, instead uses harvest table. Fix the logic accordingly. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Lijo Lazar	31f9ed5882	drm/amdgpu: Pass IP instance/hwid as parameters Use IP instance number and hwid as function args for validation checks. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Srinivasan Shanmugam	17585c07c2	drm/amdgpu/gfx10: Enable cleaner shader for GFX10.1.1/10.1.2 GPUs Enable the cleaner shader for GFX10.1.1/10.1.2 GPUs to provide data isolation between GPU workloads. The cleaner shader is responsible for clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps prevent data leakage and ensures accurate computation results. This update extends cleaner shader support to GFX10.1.1/10.1.2 GPUs, previously available for GFX10.1.10. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Alex Deucher	e818635a31	drm/amdgpu: update and cleanup PM4 headers Consolidate PM4 definitions. Most of these were previously only defined in UMDs. Add them here as well and sync with latest packets. Also no need to include soc15d.h on gfx10+. Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Suggested-by: Saurabh Verma <saurabh.verma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Srinivasan Shanmugam	25961bad92	drm/amdgpu/gfx10: Add cleaner shader for GFX10.1.10 This commit adds the cleaner shader microcode for GFX10.1.0 GPUs. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. Without the cleaner shader, residual data from a previous workload could potentially be accessed by a subsequent workload, leading to data leaks and incorrect computation results. The cleaner shader microcode is represented as an array of 32-bit words (`gfx_10_1_0_cleaner_shader_hex`). This array is the binary representation of the cleaner shader code, which is written in a low-level GPU instruction set. When the cleaner shader feature is enabled, the AMDGPU driver loads this array into a specific location in the GPU memory. The GPU then reads this memory location to fetch and execute the cleaner shader instructions. The cleaner shader is executed automatically by the GPU at the end of each workload, before the next workload starts. This ensures that all GPU resources are in a clean state before the start of each workload. This addition is part of the cleaner shader feature implementation. The cleaner shader feature helps resource utilization by cleaning up GPU resources after they are used. It also enhances security and reliability by preventing data leaks between workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:59 -05:00
Victor Skvortsov	0489339776	drm/amdgpu: Skip err_count sysfs creation on VF unsupported RAS blocks VFs are not able to query error counts for all RAS blocks. Rather than returning error for queries on these blocks, skip sysfs the creation all together. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:59 -05:00
Hawking Zhang	16b85a0942	drm/amdgpu: Update usage for bad page threshold The driver's behavior varies based on the configuration of amdgpu_bad_page_threshold setting Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:59 -05:00
Mario Limonciello	50e30e3a0e	drm/amd: Mark amdgpu.gttsize parameter as deprecated and show warnings on use When not set `gttsize` module parameter by default will get the value to use for the GTT pool from the TTM page limit, which is set by a separate module parameter. This inevitably leads to people not sure which one to set when they want more addressable memory for the GPU, and you'll end up seeing instructions online saying to set both. Add some messages to try to guide people both who are using or misusing the parameters and mark the parameter as deprecated with the plan to drop it after the next LTS kernel release. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:58 -05:00
Jiang Liu	38e8ca3e4b	amdgpu/soc15: enable asic reset for dGPU in case of suspend abort When GPU suspend is aborted, do the same for dGPU as APU to reset soc15 asic. Otherwise it may cause following errors: [ 547.229463] amdgpu 0001:81:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] ERROR ring kiq_0.2.1.0 test failed (-110) [ 555.126827] amdgpu 0000:0a:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] ERROR ring kiq_0.2.1.0 test failed (-110) [ 555.126901] [drm:amdgpu_gfx_enable_kcq [amdgpu]] ERROR KCQ enable failed [ 555.126957] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] ERROR resume of IP block <gfx_v9_4_3> failed -110 [ 555.126959] amdgpu 0000:0a:00.0: amdgpu: amdgpu_device_ip_resume failed (-110). [ 555.126965] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -110 [ 555.126966] PM: Device 0000:0a:00.0 failed to resume async: error -110 This fix has been tested on Mi308X. Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Tested-by: Shuo Liu <shuox.liu@linux.alibaba.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/2462b4b12eb9d025e82525178d568cbaa4c223ff.1736739303.git.gerry@linux.alibaba.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:58 -05:00
Jesse.zhang@amd.com	30f7f53a5b	drm/amdgpu/gfx10: implement gfx queue reset via MMIO Using mmio to do queue reset v2: Alignment the function with gfx9/gfx9.4.3. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:57 -05:00
Jesse.zhang@amd.com	ffdd7a7b28	drm/amdgpu/gfx10: implement queue reset via MMIO Using mmio to do queue reset. v2: Alignment this function with gfx9/gfx9.4.3. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:57 -05:00
Lijo Lazar	f7a594e405	drm/amdgpu: Use active umc info from discovery There could be configs where some UMC instances are harvested. This information is obtained through discovery data and populated in umc.active_mask. Avoid reassigning this as AID mask, instead use the mask directly while iterating through umc instances. This is to avoid accesses to harvested UMC instances. v2: fix warning (Alex) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Amber Lin	46d0436a3e	drm/amdgpu: Set noretry default for GC 9.5.0 Set GC 9.5.0 noretry default as 1 for better performance. It can be changed by the administrator using amdgpu.noretry=0 or by the user using HSA_XNACK=1 environment variable. Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviwanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Le Ma	23cb207751	drm/amdgpu: read harvest info from harvest table for gfx950 Harvest table is applied for gfx950. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Shiwu Zhang	667b96134c	drm/amdgpu: enlarge the VBIOS binary size limit Some chips have a larger VBIOS file so raise the size limit to support the flashing tool. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Alex Deucher	5f95a15495	drm/amdgpu: add dynamic workload profile switching for gfx12 Enable dynamic workload profile switching for gfx12. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Alex Deucher	963537ca23	drm/amdgpu: add dynamic workload profile switching for gfx11 Enable dynamic workload profile switching for gfx11. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Alex Deucher	b9467983b7	drm/amdgpu: add dynamic workload profile switching for gfx10 Enable dynamic workload profile switching for gfx10. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Alex Deucher	8fdb3958e3	drm/amdgpu/gfx: add ring helpers for setting workload profile Add helpers to switch the workload profile dynamically when commands are submitted. This allows us to switch to the FULLSCREEN3D or COMPUTE profile when work is submitted. Add a delayed work handler to delay switching out of the selected profile if additional work comes in. This works the same as the VIDEO profile for VCN. This lets dynamically enable workload profiles on the fly and then move back to the default when there is no work. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Xiaogang Chen	8544374c0f	drm/amdkfd: Have kfd driver use same PASID values from graphic driver Current kfd driver has its own PASID value for a kfd process and uses it to locate vm at interrupt handler or mapping between kfd process and vm. That design is not working when a physical gpu device has multiple spatial partitions, ex: adev in CPX mode. This patch has kfd driver use same pasid values that graphic driver generated which is per vm per pasid. These pasid values are passed to fw/hardware. We do not need change interrupt handler though more pasid values are used. Also, pasid values at log are replaced by user process pid; pasid values are not exposed to user. Users see their process pids that have meaning in user space. Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Lijo Lazar	ca44922107	drm/amdgpu: Check RRMT status for JPEG v4.0.3 RRMT could get dynamically enabled/disabled by PSP firmware. Read the status from register for reading RRMT status. For VFs, this is not accessible, hence assume that it's always disabled for now. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Lijo Lazar	485380f7fe	drm/amdgpu: Check RRMT status for VCN v4.0.3 RRMT could get dynamically enabled/disabled by PSP firmware. Read the status from register for reading RRMT status. For VFs, this is not accessible, hence assume that it's always disabled for now. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	e55565f880	drm/amdgpu: add support for PSP IP version 14.0.5 This initializes PSP IP version 14.0.5. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	e7704d7c72	drm/amdgpu: add support for SMU IP version 14.0.5 This initializes SMU IP version 14.0.5. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	6d437d5203	drm/amdgpu: enable VCN/JPEG CGPG for GC IP version 11.5.3 Enable VCN/JPEG CGPG for ASIC with GFX version 11.5.3. Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	6bde08d317	drm/amdgpu: add support for MMHUB IP version 3.3.2 This initializes MMHUB IP version 3.3.2. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	e659c9eb87	drm/amdgpu: add support for NBIO IP version 7.11.2 This initializes NBIO IP version 7.11.2. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	b2e5a04147	drm/amdgpu: add support for SDMA IP version 6.1.3 This initializes SDMA IP version 6.1.3. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	b784faeba2	drm/amdgpu: add support for GC IP version 11.5.3 This initializes GC IP version 11.5.3. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Alex Deucher	20f48be63d	drm/amdgpu: add OEM i2c bus for polaris chips It uses the VGADCC bus. DC doesn't use this bus, so it is safe to add it here. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:54 -05:00
Alex Deucher	1c0b144bf7	drm/amdgpu: rework i2c init and fini No functional change. Rework the code to allow for adding some additional i2c buses in conjunction with DC in the future. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:54 -05:00
Alex Deucher	ba7f8eb7e4	drm/amdgpu/atombios: drop empty function This was leftover from when amdgpu was forked from radeon. The function is empty so drop it. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:54 -05:00
Alex Deucher	b217105acb	drm/amd/display/dm: handle OEM i2c buses in i2c functions Allow the creation of an OEM i2c bus and use the proper DC helpers for that case. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:54 -05:00
Sathishkumar S	8064ca6e93	drm/amdgpu: increase amdgpu max rings limit increase max rings to 132 to support all JPEG5_0_1 cores, else ring_init fails due to ring count exceeding maximum limit. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:54 -05:00
Jiang Liu	a0a455b4bc	drm/amdgpu: bail out when failed to load fw in psp_init_cap_microcode() In function psp_init_cap_microcode(), it should bail out when failed to load firmware, otherwise it may cause invalid memory access. Fixes: `07dbfc6b10` ("drm/amd: Use `amdgpu_ucode_*` helpers for PSP") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 19:47:15 -05:00
Alex Deucher	55ed2b1b50	drm/amdgpu: bump version for RV/PCO compute fix Bump the driver version for RV/PCO compute stability fix so mesa can use this check to enable compute queues on RV/PCO. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x	2025-02-12 19:47:15 -05:00
Alex Deucher	b35eb9128e	drm/amdgpu/gfx9: manually control gfxoff for CS on RV When mesa started using compute queues more often we started seeing additional hangs with compute queues. Disabling gfxoff seems to mitigate that. Manually control gfxoff and gfx pg with command submissions to avoid any issues related to gfxoff. KFD already does the same thing for these chips. v2: limit to compute v3: limit to APUs v4: limit to Raven/PCO v5: only update the compute ring_funcs v6: Disable GFX PG v7: adjust order Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Suggested-by: Błażej Szczygieł <mumei6102@gmail.com> Suggested-by: Sergey Kovalenko <seryoga.engineering@gmail.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3861 Link: https://lists.freedesktop.org/archives/amd-gfx/2025-January/119116.html Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x	2025-02-12 19:47:01 -05:00
Philipp Stanner	796a9f55a8	drm/sched: Use struct for drm_sched_init() params drm_sched_init() has a great many parameters and upcoming new functionality for the scheduler might add even more. Generally, the great number of parameters reduces readability and has already caused one missnaming, addressed in: commit `6f1cacf4eb` ("drm/nouveau: Improve variable name in nouveau_sched_init()"). Introduce a new struct for the scheduler init parameters and port all users. Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Acked-by: Matthew Brost <matthew.brost@intel.com> # for Xe Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> # for Panfrost and Panthor Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> # for Etnaviv Reviewed-by: Frank Binns <frank.binns@imgtec.com> # for Imagination Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> # for Sched Reviewed-by: Maíra Canal <mcanal@igalia.com> # for v3d Reviewed-by: Danilo Krummrich <dakr@kernel.org> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> # for amdxdna Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250211111422.21235-2-phasta@kernel.org	2025-02-12 11:59:52 +01:00
Maxime Ripard	93c7dd1b39	Merge drm/drm-next into drm-misc-next Bring rc1 to start the new release dev. Signed-off-by: Maxime Ripard <mripard@kernel.org>	2025-02-06 13:47:32 +01:00
Marek Olšák	2255b40cac	drm/amdgpu: add a BO metadata flag to disable write compression for Vulkan Vulkan can't support DCC and Z/S compression on GFX12 without WRITE_COMPRESS_DISABLE in this commit or a completely different DCC interface. AMDGPU_TILING_GFX12_SCANOUT is added because it's already used by userspace. Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-03 12:11:36 -05:00
Kenneth Feng	5cda56bd86	drm/amd/amdgpu: change the config of cgcg on gfx12 change the config of cgcg on gfx12 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x	2025-01-28 16:22:39 -05:00
Shaoyun Liu	335acfb64e	drm/amd/amdgpu: Enable scratch data dump for mes 12 MES internal will check CP_MES_MSCRATCH_LO/HI register to set scratch data location during ucode start, driver side need to start the MES one by one with different setting for each pipe Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:56:13 -05:00
Mario Limonciello	7e4cb7dea2	drm/amd: Clarify kdoc for amdgpu.gttsize Effectively amdgpu.gttsize gets set to ~1/2 of RAM, but that's controlled by what the TTM page limit is set to. Clarify the kdoc. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:56:08 -05:00
Srinivasan Shanmugam	dc915275ea	drm/amd/amdgpu: Prevent null pointer dereference in GPU bandwidth calculation If the parent is NULL, adev->pdev is used to retrieve the PCIe speed and width, ensuring that the function can still determine these capabilities from the device itself. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:6193 amdgpu_device_gpu_bandwidth() error: we previously assumed 'parent' could be null (see line 6180) drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 6170 static void amdgpu_device_gpu_bandwidth(struct amdgpu_device adev, 6171 enum pci_bus_speed speed, 6172 enum pcie_link_width width) 6173 { 6174 struct pci_dev parent = adev->pdev; 6175 6176 if (!speed \|\| !width) 6177 return; 6178 6179 parent = pci_upstream_bridge(parent); 6180 if (parent && parent->vendor == PCI_VENDOR_ID_ATI) { ^^^^^^ If parent is NULL 6181 /* use the upstream/downstream switches internal to dGPU / 6182 speed = pcie_get_speed_cap(parent); 6183 width = pcie_get_width_cap(parent); 6184 while ((parent = pci_upstream_bridge(parent))) { 6185 if (parent->vendor == PCI_VENDOR_ID_ATI) { 6186 / use the upstream/downstream switches internal to dGPU / 6187 speed = pcie_get_speed_cap(parent); 6188 width = pcie_get_width_cap(parent); 6189 } 6190 } 6191 } else { 6192 / use the device itself / --> 6193 speed = pcie_get_speed_cap(parent); ^^^^^^ Then we are toasted here. 6194 *width = pcie_get_width_cap(parent); 6195 } 6196 } Fixes: `757e8b951c` ("drm/amdgpu: cache gpu pcie link width") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:55:26 -05:00
Lin.Cao	b529093999	drm/amdgpu: fix ring timeout issue in gfx10 sr-iov environment commit `26c95e838e` ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare") set job->vm as NULL if there is no fence. It will cause emit switch buffer be skippen if job->vm set as NULL. Check job rather than vm could solve this problem. Fixes: `26c95e838e` ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare") Signed-off-by: Lin.Cao <lincao12@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:55:04 -05:00
Alex Deucher	64314e3f9c	drm/amdgpu: fix the PCIe lanes reporting in the INFO IOCTL Combine the platform and GPU caps like we do for PCIe Gen. This aligns properly with expectations and documentation for the interface. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3820 Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:53:30 -05:00
Alex Deucher	757e8b951c	drm/amdgpu: cache gpu pcie link width Get the PCIe link with of the device itself (or it's integrated upstream bridge) and cache that. v2: fix typo Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3820 Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:53:24 -05:00
Lijo Lazar	a0db1ea0dd	drm/amdgpu: Refine ip detection log message 'add ip block' causes a confusion if the blocks are disabled later with ip_block_mask. Instead change to 'detected' and also add device context. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:52:58 -05:00
Lijo Lazar	b1df8050e7	drm/amdgpu: Add handler for SDMA context empty Context empty interrupt is enabled for SDMA 4.4.2. Add a handler for context empty interrupt so that it is disposed of fast, and not propagated to KFD layer. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:52:43 -05:00
Srinivasan Shanmugam	19b7f7c721	drm/amdgpu/gfx12: Add Cleaner Shader Support for GFX12.0 GPUs This commit enables the cleaner shader feature for GFX12.0 and GFX12.0.1 GPUs. The cleaner shader is important for clearing GPU resources such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs) between workloads. - This feature ensures that GPU resources are reset between workloads, preventing data leaks and ensuring accurate computation. By enabling the cleaner shader, this update enhances the security and reliability of GPU operations on GFX12.0 hardware. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Kent Russell	f2935a3019	drm/amdgpu: Mark debug KFD module params as unsafe Mark options only meant to be used for debugging as unsafe so that the kernel is tainted when they are used. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Gui Chengming	62952a38d9	drm/amdgpu: fix fw attestation for MP0_14_0_{2/3} FW attestation was disabled on MP0_14_0_{2/3}. V2: Move check into is_fw_attestation_support func. (Frank) Remove DRM_WARN log info. (Alex) Fix format. (Christian) Signed-off-by: Gui Chengming <Jack.Gui@amd.com> Reviewed-by: Frank.Min <Frank.Min@amd.com> Reviewed-by: Christian König <Christian.Koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Christian König	def59436fb	drm/amdgpu: always sync the GFX pipe on ctx switch That is needed to enforce isolation between contexts. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Christian König	177b76a8d8	drm/amdgpu: mark a bunch of module parameters unsafe We sometimes have people trying to use debugging options in production environments. Mark options only meant to be used for debugging as unsafe so that the kernel is tainted when they are used. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Simona Vetter <simona.vetter@ffwll.ch> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Tvrtko Ursulin	e996127ec1	drm/amdgpu: Use DRM scheduler API in amdgpu_xcp_release_sched Lets use the existing helper instead of peeking into the structure directly. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Kenneth Feng	2affe2bbc9	drm/amdgpu: disable gfxoff with the compute workload on gfx12 Disable gfxoff with the compute workload on gfx12. This is a workaround for the opencl test failure. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Srinivasan Shanmugam	0b6b2dd383	drm/amdgpu: Fix Circular Locking Dependency in AMDGPU GFX Isolation This commit addresses a circular locking dependency issue within the GFX isolation mechanism. The problem was identified by a warning indicating a potential deadlock due to inconsistent lock acquisition order. - The `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` functions previously acquired `enforce_isolation_mutex` and called `amdgpu_gfx_kfd_sch_ctrl`, leading to potential deadlocks. ie., If `amdgpu_gfx_kfd_sch_ctrl` is called while `enforce_isolation_mutex` is held, and `amdgpu_gfx_enforce_isolation_handler` is called while `kfd_sch_mutex` is held, it can create a circular dependency. By ensuring consistent lock usage, this fix resolves the issue: [ 606.297333] ====================================================== [ 606.297343] WARNING: possible circular locking dependency detected [ 606.297353] 6.10.0-amd-mlkd-610-311224-lof #19 Tainted: G OE [ 606.297365] ------------------------------------------------------ [ 606.297375] kworker/u96:3/3825 is trying to acquire lock: [ 606.297385] ffff9aa64e431cb8 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}, at: __flush_work+0x232/0x610 [ 606.297413] but task is already holding lock: [ 606.297423] ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.297725] which lock already depends on the new lock. [ 606.297738] the existing dependency chain (in reverse order) is: [ 606.297749] -> #2 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}: [ 606.297765] __mutex_lock+0x85/0x930 [ 606.297776] mutex_lock_nested+0x1b/0x30 [ 606.297786] amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.298007] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.298225] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.298412] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.298603] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.298866] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.298880] process_one_work+0x21e/0x680 [ 606.298890] worker_thread+0x190/0x350 [ 606.298899] kthread+0xe7/0x120 [ 606.298908] ret_from_fork+0x3c/0x60 [ 606.298919] ret_from_fork_asm+0x1a/0x30 [ 606.298929] -> #1 (&adev->enforce_isolation_mutex){+.+.}-{3:3}: [ 606.298947] __mutex_lock+0x85/0x930 [ 606.298956] mutex_lock_nested+0x1b/0x30 [ 606.298966] amdgpu_gfx_enforce_isolation_handler+0x87/0x370 [amdgpu] [ 606.299190] process_one_work+0x21e/0x680 [ 606.299199] worker_thread+0x190/0x350 [ 606.299208] kthread+0xe7/0x120 [ 606.299217] ret_from_fork+0x3c/0x60 [ 606.299227] ret_from_fork_asm+0x1a/0x30 [ 606.299236] -> #0 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}: [ 606.299257] __lock_acquire+0x16f9/0x2810 [ 606.299267] lock_acquire+0xd1/0x300 [ 606.299276] __flush_work+0x250/0x610 [ 606.299286] cancel_delayed_work_sync+0x71/0x80 [ 606.299296] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.299509] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.299723] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.299909] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.300101] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.300355] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.300369] process_one_work+0x21e/0x680 [ 606.300378] worker_thread+0x190/0x350 [ 606.300387] kthread+0xe7/0x120 [ 606.300396] ret_from_fork+0x3c/0x60 [ 606.300406] ret_from_fork_asm+0x1a/0x30 [ 606.300416] other info that might help us debug this: [ 606.300428] Chain exists of: (work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work) --> &adev->enforce_isolation_mutex --> &adev->gfx.kfd_sch_mutex [ 606.300458] Possible unsafe locking scenario: [ 606.300468] CPU0 CPU1 [ 606.300476] ---- ---- [ 606.300484] lock(&adev->gfx.kfd_sch_mutex); [ 606.300494] lock(&adev->enforce_isolation_mutex); [ 606.300508] lock(&adev->gfx.kfd_sch_mutex); [ 606.300521] lock((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)); [ 606.300536] * DEADLOCK * [ 606.300546] 5 locks held by kworker/u96:3/3825: [ 606.300555] #0: ffff9aa5aa1f5d58 ((wq_completion)comp_1.1.0){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 606.300577] #1: ffffaa53c3c97e40 ((work_completion)(&sched->work_run_job)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 606.300600] #2: ffff9aa64e463c98 (&adev->enforce_isolation_mutex){+.+.}-{3:3}, at: amdgpu_gfx_enforce_isolation_ring_begin_use+0x1c3/0x5d0 [amdgpu] [ 606.300837] #3: ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.301062] #4: ffffffff8c1a5660 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x70/0x610 [ 606.301083] stack backtrace: [ 606.301092] CPU: 14 PID: 3825 Comm: kworker/u96:3 Tainted: G OE 6.10.0-amd-mlkd-610-311224-lof #19 [ 606.301109] Hardware name: Gigabyte Technology Co., Ltd. X570S GAMING X/X570S GAMING X, BIOS F7 03/22/2024 [ 606.301124] Workqueue: comp_1.1.0 drm_sched_run_job_work [gpu_sched] [ 606.301140] Call Trace: [ 606.301146] <TASK> [ 606.301154] dump_stack_lvl+0x9b/0xf0 [ 606.301166] dump_stack+0x10/0x20 [ 606.301175] print_circular_bug+0x26c/0x340 [ 606.301187] check_noncircular+0x157/0x170 [ 606.301197] ? register_lock_class+0x48/0x490 [ 606.301213] __lock_acquire+0x16f9/0x2810 [ 606.301230] lock_acquire+0xd1/0x300 [ 606.301239] ? __flush_work+0x232/0x610 [ 606.301250] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301261] ? mark_held_locks+0x54/0x90 [ 606.301274] ? __flush_work+0x232/0x610 [ 606.301284] __flush_work+0x250/0x610 [ 606.301293] ? __flush_work+0x232/0x610 [ 606.301305] ? __pfx_wq_barrier_func+0x10/0x10 [ 606.301318] ? mark_held_locks+0x54/0x90 [ 606.301331] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301345] cancel_delayed_work_sync+0x71/0x80 [ 606.301356] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.301661] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.302050] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.302069] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.302452] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.302862] ? drm_sched_entity_error+0x82/0x190 [gpu_sched] [ 606.302890] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.303366] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.303388] process_one_work+0x21e/0x680 [ 606.303409] worker_thread+0x190/0x350 [ 606.303424] ? __pfx_worker_thread+0x10/0x10 [ 606.303437] kthread+0xe7/0x120 [ 606.303449] ? __pfx_kthread+0x10/0x10 [ 606.303463] ret_from_fork+0x3c/0x60 [ 606.303476] ? __pfx_kthread+0x10/0x10 [ 606.303489] ret_from_fork_asm+0x1a/0x30 [ 606.303512] </TASK> v2: Refactor lock handling to resolve circular dependency (Alex) - Introduced a `sched_work` flag to defer the call to `amdgpu_gfx_kfd_sch_ctrl` until after releasing `enforce_isolation_mutex`. - This change ensures that `amdgpu_gfx_kfd_sch_ctrl` is called outside the critical section, preventing the circular dependency and deadlock. - The `sched_work` flag is set within the mutex-protected section if conditions are met, and the actual function call is made afterward. - This approach ensures consistent lock acquisition order. Fixes: `afefd6f245` ("drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 10:38:33 -05:00
Dave Airlie	c3d590f8ba	amd-drm-next-6.14-2025-01-10: amdgpu: - Fix max surface handling in DC - clang fixes - DCN 3.5 fixes - DCN 4.0.1 fixes - DC CRC fixes - DML updates - DSC fixes - PSR fixes - DC add some divide by 0 checks - SMU13 updates - SR-IOV fixes - RAS fixes - Cleaner shader support for gfx10.3 dGPUs - fix drm buddy trim handling - SDMA engine reset updates _ Fix RB bitmap setup - Fix doorbell ttm cleanup - Add CEC notifier support - DPIA updates - MST fixes amdkfd: - Shader debugger fixes - Trap handler cleanup - Cleanup includes - Eviction fence wq fix -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZ4FXWgAKCRC93/aFa7yZ 2MudAQCzmzUNF9W29JOcset09IcS24Xe5liXrJWzHIPaHhQ25QD/ZU4JHb1947/8 EnS3P7vraGPuCCet2aKmiWgtay7zggE= =5nDZ -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.14-2025-01-10' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.14-2025-01-10: amdgpu: - Fix max surface handling in DC - clang fixes - DCN 3.5 fixes - DCN 4.0.1 fixes - DC CRC fixes - DML updates - DSC fixes - PSR fixes - DC add some divide by 0 checks - SMU13 updates - SR-IOV fixes - RAS fixes - Cleaner shader support for gfx10.3 dGPUs - fix drm buddy trim handling - SDMA engine reset updates _ Fix RB bitmap setup - Fix doorbell ttm cleanup - Add CEC notifier support - DPIA updates - MST fixes amdkfd: - Shader debugger fixes - Trap handler cleanup - Cleanup includes - Eviction fence wq fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250110172731.2960668-1-alexander.deucher@amd.com	2025-01-13 11:13:13 +10:00
Victor Zhao	85b73415fd	drm/amdgpu: fill the ucode bo during psp resume for SRIOV refill the ucode bo during psp resume for SRIOV, otherwise ucode load will fail after VM hibernation and fb clean. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:57 -05:00
Srinivasan Shanmugam	9814626751	drm/amdgpu/gfx10: Enable cleaner shader for GFX10.3.2/10.3.4/10.3.5 GPUs Enable the cleaner shader for GFX10.3.2/10.3.4/10.3.5 GPUs to provide data isolation between GPU workloads. The cleaner shader is responsible for clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps prevent data leakage and ensures accurate computation results. This update extends cleaner shader support to GFX10.3.2/10.3.4/10.3.5 GPUs, previously available for GFX10.3.0. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:57 -05:00
Jonathan Kim	86bde64cb7	drm/amdgpu: fix gpu recovery disable with per queue reset Per queue reset should be bypassed when gpu recovery is disabled with module parameter. Fixes: `ee0a469cf9` ("drm/amdkfd: support per-queue reset on gfx9") Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:57 -05:00
Jiang Liu	edec9b0690	drm/amdgpu: wrong array index to get ip block for PSP The adev->ip_blocks array is not indexed by AMD_IP_BLOCK_TYPE_xxx, instead we should call amdgpu_device_ip_get_ip_block() to get the corresponding IP block oject. Fix some checkpatch issues (Alex) Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:56 -05:00
Jiang Liu	60a2c0c12b	drm/amdgpu: tear down ttm range manager for doorbell in amdgpu_ttm_fini() Tear down ttm range manager for doorbell in function amdgpu_ttm_fini(), to avoid memory leakage. Fixes: `792b84fb90` ("drm/amdgpu: initialize ttm for doorbells") Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:56 -05:00
Tim Huang	6b34d0328b	drm/amdgpu: fix incorrect number of active RBs for gfx12 The RB bitmap should be global active RB bitmap & active RB bitmap based on active SA. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:56 -05:00
Tim Huang	4a60c55b3b	drm/amdgpu: fix incorrect active RB bitmap in setup RBs The RB bitmap width per SA may be 0x1 for some ASICs. Use the actual bitmap of SA instead of 0x3 to determine the active RB bitmap. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:56 -05:00
Dan Carpenter	6ec6cd9acb	drm/amdgpu: Fix shift type in amdgpu_debugfs_sdma_sched_mask_set() The "mask" and "val" variables are type u64. The problem is that the BIT() macros are type unsigned long which is just 32 bits on 32bit systems. It's unlikely that people will be using this driver on 32bit kernels and even if they did we only use the lower AMDGPU_MAX_SDMA_INSTANCES (16) bits. So this bug does not affect anything in real life. Still, for correctness sake, u64 bit masks should use BIT_ULL(). Fixes: `d2e3961ae3` ("drm/amdgpu: add amdgpu_sdma_sched_mask debugfs") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/d39a9325-87a4-4543-b6ec-1c61fca3a6fc@stanley.mountain Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:56 -05:00
Jesse Zhang	f7e672e6f8	drm/amdgpu: enable gfx12 queue reset flag Enable the kgq and kcq queue reset flag Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:47 -05:00
Jesse Zhang	39b0fa29f6	drm/amdgpu/sdma4.4.2: add apu support in sdma queue reset Remove apu check in sdma queue reset. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:01:29 -05:00
Dave Airlie	0739b8ba82	drm-misc-next for 6.14: UAPI Changes: - Clarify drm memory stats documentation Cross-subsystem Changes: Core Changes: - sched: Documentation fixes, Driver Changes: - amdgpu: Track BO memory stats at runtime - amdxdna: Various fixes - hisilicon: New HIBMC driver - bridges: - Provide default implementation of atomic_check for HDMI bridges - it605: HDCP improvements, MCCS Support -----BEGIN PGP SIGNATURE----- iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCZ3uZUQAKCRAnX84Zoj2+ dgM8AX4ur9y2eXLVQPS2IhCouWpFsYgSRnCysdVG43vszZ2kcObvlj4UV8nrrv7j W6x9FZYBfRLm6ctAUnu05ppm3zSbSdmsocadu1mfoDbShy31Pc5xklnB1u6M3Asw 3mOtO5dxkA== =QZ6P -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2025-01-06' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for 6.14: UAPI Changes: - Clarify drm memory stats documentation Cross-subsystem Changes: Core Changes: - sched: Documentation fixes, Driver Changes: - amdgpu: Track BO memory stats at runtime - amdxdna: Various fixes - hisilicon: New HIBMC driver - bridges: - Provide default implementation of atomic_check for HDMI bridges - it605: HDCP improvements, MCCS Support Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250106-augmented-kakapo-of-action-0cf000@houat	2025-01-09 15:48:50 +10:00
Dmitry Baryshkov	26d6fd8191	drm/connector: make mode_valid take a const struct drm_display_mode The mode_valid() callbacks of drm_encoder, drm_crtc and drm_bridge take a const struct drm_display_mode argument. Change the mode_valid callback of drm_connector to also take a const argument. Acked-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Raphael Gallais-Pou <rgallaispou@gmail.com> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Maxime Ripard <mripard@kernel.org> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241214-drm-connector-mode-valid-const-v2-5-4f9498a4c822@linaro.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>	2025-01-07 12:45:19 +02:00
Kent Russell	6c9c97387b	drm/amdgpu: Remove unnecessary NULL check container_of cannot return NULL, so it is unnecessary to check for NULL after gem_to_amdgpu_bo, which is just a container_of call Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-06 14:44:29 -05:00
Arunpravin Paneer Selvam	3318ba94e5	drm/amdgpu: Add a lock when accessing the buddy trim function When running YouTube videos and Steam games simultaneously, the tester found a system hang / race condition issue with the multi-display configuration setting. Adding a lock to the buddy allocator's trim function would be the solution. <log snip> [ 7197.250436] general protection fault, probably for non-canonical address 0xdead000000000108 [ 7197.250447] RIP: 0010:__alloc_range+0x8b/0x340 [amddrm_buddy] [ 7197.250470] Call Trace: [ 7197.250472] <TASK> [ 7197.250475] ? show_regs+0x6d/0x80 [ 7197.250481] ? die_addr+0x37/0xa0 [ 7197.250483] ? exc_general_protection+0x1db/0x480 [ 7197.250488] ? drm_suballoc_new+0x13c/0x93d [drm_suballoc_helper] [ 7197.250493] ? asm_exc_general_protection+0x27/0x30 [ 7197.250498] ? __alloc_range+0x8b/0x340 [amddrm_buddy] [ 7197.250501] ? __alloc_range+0x109/0x340 [amddrm_buddy] [ 7197.250506] amddrm_buddy_block_trim+0x1b5/0x260 [amddrm_buddy] [ 7197.250511] amdgpu_vram_mgr_new+0x4f5/0x590 [amdgpu] [ 7197.250682] amdttm_resource_alloc+0x46/0xb0 [amdttm] [ 7197.250689] ttm_bo_alloc_resource+0xe4/0x370 [amdttm] [ 7197.250696] amdttm_bo_validate+0x9d/0x180 [amdttm] [ 7197.250701] amdgpu_bo_pin+0x15a/0x2f0 [amdgpu] [ 7197.250831] amdgpu_dm_plane_helper_prepare_fb+0xb2/0x360 [amdgpu] [ 7197.251025] ? try_wait_for_completion+0x59/0x70 [ 7197.251030] drm_atomic_helper_prepare_planes.part.0+0x2f/0x1e0 [ 7197.251035] drm_atomic_helper_prepare_planes+0x5d/0x70 [ 7197.251037] drm_atomic_helper_commit+0x84/0x160 [ 7197.251040] drm_atomic_nonblocking_commit+0x59/0x70 [ 7197.251043] drm_mode_atomic_ioctl+0x720/0x850 [ 7197.251047] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10 [ 7197.251049] drm_ioctl_kernel+0xb9/0x120 [ 7197.251053] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7197.251056] drm_ioctl+0x2d4/0x550 [ 7197.251058] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10 [ 7197.251063] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu] [ 7197.251186] __x64_sys_ioctl+0xa0/0xf0 [ 7197.251190] x64_sys_call+0x143b/0x25c0 [ 7197.251193] do_syscall_64+0x7f/0x180 [ 7197.251197] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7197.251199] ? amdgpu_display_user_framebuffer_create+0x215/0x320 [amdgpu] [ 7197.251329] ? drm_internal_framebuffer_create+0xb7/0x1a0 [ 7197.251332] ? srso_alias_return_thunk+0x5/0xfbef5 Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Fixes: `4a5ad08f53` ("drm/amdgpu: Add address alignment support to DCC buffers") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-06 14:44:29 -05:00
Prike Liang	2b11179e18	drm/amdgpu: reduce RLC safe mode request for gfx clock gating The driver can only request one time for the power safe mode instead of polling and disabling the power feature each time prior to program the GFX clock gating control registers. This update will reduce the latency on the GFX clock gating entry. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-06 14:44:29 -05:00

... 6 7 8 9 10 ...

15900 Commits