linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00

Author	SHA1	Message	Date
fanhuang	2f0268ca1c	drm/amdgpu/jpeg: sriov support for jpeg_v5_0_1 initialization table handshake with mmsch Signed-off-by: fanhuang <FangSheng.Huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:39:14 -04:00
fanhuang	56fc141a5c	drm/amdgpu/vcn: sriov support for vcn_v5_0_1 initialization table handshake with mmsch Signed-off-by: fanhuang <FangSheng.Huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:39:10 -04:00
Taimur Hassan	7ce316620d	drm/amd/display: Promote DAL to 3.2.334 This version brings along following update: -Support external tunneling feature -Modify DCN401 DMUB reset & halt sequence -Fix the typo in dcn401 Hubp block -Skip backend validation for virtual monitors Acked-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:39:07 -04:00
Taimur Hassan	81fc9ca25f	drm/amd/display: [FW Promotion] Release 0.1.11.0 Refactoring some DMUB related structs and enum. Acked-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:39:04 -04:00
Ovidiu Bunea	29e82a2716	drm/amd/display: Add GPINT retries to ips_query_residency_info [why & how] GPINTs can timeout without returning any data. Since this path is only for testing purposes, it should retry several times to ensure data is collected. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:38:59 -04:00
Dillon Varone	6d8f73885e	drm/amd/display: Modify DCN401 DMUB reset & halt sequence [WHY&HOW] If DMCUB is already disabled or reset, no need to send the halt command again. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:38:56 -04:00
Samson Tam	4a16285aa1	drm/amd/display: add support for 2nd sharpening range [Why & How] Add support for 2nd sharpening range for cases where we want override existing DCN sharpening range. Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:38:51 -04:00
Nevenko Stupar	5dd63a0bfc	drm/amd/display: Fix the typo in dcn401 Hubp block [Why & How] Fix the typo for hubp_clear_tiling, currently calls hubp2_clear_tiling for dcn401 instead of intended hubp401_clear_tiling. Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Nevenko Stupar <Nevenko.Stupar@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:38:47 -04:00
Chiawen Huang	eee5e5b358	drm/amd/display: Skip backend validation for virtual monitors [Why&How] Virtual monitors are now being validated during set_mode. Virtual monitors should not undergo backend validation, as the backend is intended only for physical monitors. Virtual sinks have no real backend part information and should be excluded from this validation. Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Chiawen Huang <chiawen.huang@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:38:44 -04:00
Karthi Kandasamy	40bae1aea0	drm/amd/display: Move mcache allocation programming from DML to resource [Why] mcache allocation programming is not part of DML's core responsibilities. Keeping this logic in DML leads to poor separation of concerns and complicates maintenance. [How] Refactored code to move mcache parameter preparation and mcache ID assignment into the resource file. Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Karthi Kandasamy <karthi.kandasamy@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:38:36 -04:00
Cruise Hung	17accf4f22	drm/amd/display: Support external tunneling feature [Why & How] The original code only supports the tunneling for embedded one. To support external tunneling feature, it needs to check Tunneling_Support bit register. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Reviewed-by: Jun Lei <jun.lei@amd.com> Signed-off-by: Cruise Hung <Cruise.Hung@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:38:32 -04:00
Yihan Zhu	fe1903bc95	drm/amd/display: init local variable to fix format errors [WHY & HOW] Uninitialized local variables will cause format checker complain about them. Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:38:29 -04:00
Tomasz Siemek	c9b4fa034c	drm/amd/display: Extend dc_plane_get_status with flags [WHY] dc_plane_get_status may be used for reading other plane properties in the future. [HOW] Provide API for choosing plane properties to read. Reviewed-by: Charlene Liu <charlene.liu@amd.com> Reviewed-by: Aric Cyr <aric.cyr@amd.com> Reviewed-by: Swapnil Patel <swapnil.patel@amd.com> Signed-off-by: Tomasz Siemek <Tomasz.Siemek@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:38:25 -04:00
Arvind Yadav	13d0724f0f	drm/amdgpu: fix use-after-unlock in eviction fence destroy The eviction fence destroy path incorrectly calls dma_fence_put() on evf_mgr->ev_fence after releasing the ev_fence_lock. This introduces a potential use-after-unlock or race because another thread concurrently modifies evf_mgr->ev_fence. Fix this by grabbing a local reference to evf_mgr->ev_fence under the lock and using that for dma_fence_put() after waiting. Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:38:10 -04:00
Lijo Lazar	cc473057bb	drm/amdgpu: Allow NPS2-CPX combination for VFs CPX partition mode is compatible with NPS2 on aquavanjaram VFs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:38:05 -04:00
fanhuang	67cc7f9096	drm/amdgpu/mmsch: Add MMSCH v5_0 support for sriov These structures are basically ported from MMSCH v4_0 The structures are the same as v4_0 except for the init header Signed-off-by: fanhuang <FangSheng.Huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:37:57 -04:00
Lijo Lazar	3aa37922c6	drm/amdgpu: Use compatible NPS mode info Compatible NPS modes for a partition mode are exposed through xcp_config interface. To determine if a compute partition mode is valid, check if the current NPS mode is part of compatible NPS modes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:37:50 -04:00
Lijo Lazar	9a343a64fa	drm/amd/pm: Move SMUv13.0.12 function declarations Move them to SMUv13.0.6 header file as they are used only in SMU v13.0.6. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:37:44 -04:00
Asad Kamal	58c397890f	drm/amdgpu: Add pldm version reporting Add pldm version reporting through sysfs node Signed-off-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:37:38 -04:00
Amber Lin	e3d0870a90	drm/amdkfd: Support chain runlists of XNACK+/XNACK- If the MEC firmware supports chaining runlists of XNACK+/XNACK- processes, set SQ_CONFIG1 chicken bit and SET_RESOURCES bit 28. When the MEC/HWS supports it, KFD checks the XNACK+/XNACK- processes mix happens or not. If it does, enter over-subscription. Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-16 13:37:29 -04:00
Shiwu Zhang	72ea78335e	drm/amdgpu: add debugfs for spirom IFWI dump Expose the debugfs file node for user space to dump the IFWI image on spirom. For one transaction between PSP and host, it will read out the images on both active and inactive partitions so a buffer with two times the size of maximum IFWI image (currently 16MByte) is needed. v2: move the vbios gfl macros to the common header and rename the bo triplet struct to spirom_bo for this specific usage (Hawking) v3: return directly the result of last command execution (Lijo) Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-14 11:30:15 -04:00
Prike Liang	64db767013	drm/amdgpu: fix userq resource double freed As the userq resource was already freed at the drm_release early phase, it should avoid freeing userq resource again at the later kms postclose callback. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-14 11:30:03 -04:00
Jesse.Zhang	96a86dcb5b	drm/amdgpu: Fix circular locking in userq creation A circular locking dependency was detected between the global `adev->userq_mutex` and per-file `userq_mgr->userq_mutex` when creating user queues. The issue occurs because: 1. `amdgpu_userq_suspend()` and `amdgpu_userq_resume` take `adev->userq_mutex` first, then `userq_mgr->userq_mutex` 2. While `amdgpu_userq_create()` takes them in reverse order This patch resolves the issue by: 1. Moving the `adev->userq_mutex` lock earlier in `amdgpu_userq_create()` to cover the `amdgpu_userq_ensure_ev_fence()` call 2. Releasing it after we're done with both queue creation and the scheduling halt check v2: remove unused adev->userq_mutex lock (Prike) Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-14 11:29:38 -04:00
David (Ming Qiang) Wu	07c9db090b	drm/amdgpu: read back register after written for VCN v4.0.5 On VCN v4.0.5 there is a race condition where the WPTR is not updated after starting from idle when doorbell is used. Adding register read-back after written at function end is to ensure all register writes are done before they can be used. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528 Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-14 11:28:39 -04:00
Melissa Wen	f6a305d474	Revert "drm/amd/display: Hardware cursor changes color when switched to software cursor" This reverts commit `272e6aab14`. Applying degamma curve to the cursor by default breaks Linux userspace expectation. On Linux, AMD display manager enables cursor degamma ROM just for implict sRGB on HW versions where degamma is split into two blocks: degamma ROM for pre-defined TFs and `gamma correction` for user/custom curves, and degamma ROM settings doesn't apply to cursor plane. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1513 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2803 Reported-by: Michel Dänzer <michel.daenzer@mailbox.org> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4144 Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-14 11:27:09 -04:00
Arunpravin Paneer Selvam	553ad6fc2b	drm/amdgpu/userq: Fix DEBUG_LOCKS_WARN_ON(lock->magic != lock) Fix DEBUG_LOCKS_WARN_ON(lock->magic != lock) warning logs. Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 13:39:52 -04:00
Arunpravin Paneer Selvam	bc5bab82d3	drm/amdgpu: Fix userq ttm_bo_pin and ttm_bo_unpin lockdep warnings The ttm_bo_pin and ttm_bo_unpin warnings are resolved by moving the doorbell bo reserve up before pin/unpin. WARNING: CPU: 11 PID: 1818 at drivers/gpu/drm/ttm/ttm_bo.c:592 ttm_bo_pin+0x1f6/0x270 [ttm] [ +0.000277] CPU: 11 UID: 1000 PID: 1818 Comm: Xwayland Tainted: G W 6.12.0+ #15 [ +0.000006] Tainted: [W]=WARN [ +0.000004] Hardware name: ASUS System Product Name/TUF GAMING B650-PLUS, BIOS 3072 12/20/2024 [ +0.000004] RIP: 0010:ttm_bo_pin+0x1f6/0x270 [ttm] [ +0.000005] RSP: 0018:ffff88846ca879d0 EFLAGS: 00010246 [ +0.000007] RAX: 0000000000000000 RBX: ffff88810b7ca848 RCX: 0000000000000000 [ +0.000004] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ +0.000005] RBP: ffff88846ca879e8 R08: 0000000000000000 R09: 0000000000000000 [ +0.000004] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810b7ca848 [ +0.000004] R13: ffff88846c666250 R14: 1ffff1108d950f44 R15: ffff88846ca87aa0 [ +0.000005] FS: 00007c45ff436d00(0000) GS:ffff888409580000(0000) knlGS:0000000000000000 [ +0.000004] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000005] CR2: 00005b0c142a60e0 CR3: 000000012ce5a000 CR4: 0000000000f50ef0 [ +0.000004] PKRU: 55555554 [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000005] ? show_regs+0x6c/0x80 [ +0.000007] ? __warn+0xd2/0x2d0 [ +0.000007] ? ttm_bo_pin+0x1f6/0x270 [ttm] [ +0.000031] ? report_bug+0x282/0x2f0 [ +0.000012] ? handle_bug+0x6e/0xc0 [ +0.000007] ? exc_invalid_op+0x18/0x50 [ +0.000007] ? asm_exc_invalid_op+0x1b/0x20 [ +0.000017] ? ttm_bo_pin+0x1f6/0x270 [ttm] [ +0.000014] amdgpu_bo_pin+0x365/0x9d0 [amdgpu] [ +0.000191] ? __pfx_amdgpu_bo_pin+0x10/0x10 [amdgpu] [ +0.000185] ? drm_gem_object_lookup+0x81/0xc0 [ +0.000008] ? kasan_save_alloc_info+0x37/0x60 [ +0.000007] ? __kasan_kmalloc+0xc3/0xd0 [ +0.000013] amdgpu_userqueue_get_doorbell_index+0xee/0x5f0 [amdgpu] [ +0.000209] amdgpu_userq_ioctl+0x6b4/0xd40 [amdgpu] [ +0.000193] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000211] ? lock_acquire+0x7c/0xc0 [ +0.000006] ? drm_dev_enter+0x51/0x190 [ +0.000015] drm_ioctl_kernel+0x18b/0x330 [ +0.000007] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000190] ? __pfx_drm_ioctl_kernel+0x10/0x10 [ +0.000005] ? lock_acquire+0x7c/0xc0 [ +0.000009] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? __kasan_check_write+0x14/0x30 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000011] drm_ioctl+0x589/0xd00 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000006] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000194] ? __pfx_drm_ioctl+0x10/0x10 [ +0.000006] ? __pm_runtime_resume+0x80/0x110 [ +0.000021] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? trace_hardirqs_on+0x53/0x60 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? _raw_spin_unlock_irqrestore+0x51/0x80 [ +0.000013] amdgpu_drm_ioctl+0xd2/0x1c0 [amdgpu] [ +0.000185] __x64_sys_ioctl+0x13a/0x1c0 [ +0.000010] x64_sys_call+0x11ad/0x25f0 [ +0.000007] do_syscall_64+0x91/0x180 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? irqentry_exit+0x77/0xb0 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? exc_page_fault+0x93/0x150 [ +0.000009] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000005] RIP: 0033:0x7c45ff924ded [ +0.000005] RSP: 002b:00007ffff7167810 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.000008] RAX: ffffffffffffffda RBX: 00000000c0486456 RCX: 00007c45ff924ded [ +0.000004] RDX: 00007ffff7167870 RSI: 00000000c0486456 RDI: 000000000000000b [ +0.000004] RBP: 00007ffff7167860 R08: ffff800100000000 R09: 0000000000010000 [ +0.000005] R10: 00007ffff7167950 R11: 0000000000000246 R12: 00005b0c2a51bc48 [ +0.000004] R13: 000000000000000b R14: 0000000000000000 R15: 00007ffff7167950 [ +0.000022] </TASK> [ +0.000004] irq event stamp: 80693 [ +0.000004] hardirqs last enabled at (80699): [<ffffffff86a693a9>] __up_console_sem+0x79/0xa0 [ +0.000005] hardirqs last disabled at (80704): [<ffffffff86a6938e>] __up_console_sem+0x5e/0xa0 [ +0.000005] softirqs last enabled at (80390): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000005] softirqs last disabled at (80385): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000006] ---[ end trace 0000000000000000 ]--- ------------------------------------------------------------------------------------------------------ [ +0.000006] WARNING: CPU: 10 PID: 1818 at drivers/gpu/drm/ttm/ttm_bo.c:611 ttm_bo_unpin+0x21f/0x2c0 [ttm] [ +0.000280] CPU: 10 UID: 1000 PID: 1818 Comm: Xwayland Tainted: G W 6.12.0+ #15 [ +0.000006] Tainted: [W]=WARN [ +0.000004] Hardware name: ASUS System Product Name/TUF GAMING B650-PLUS, BIOS 3072 12/20/2024 [ +0.000004] RIP: 0010:ttm_bo_unpin+0x21f/0x2c0 [ttm] [ +0.000005] RSP: 0018:ffff88846ca87888 EFLAGS: 00010246 [ +0.000007] RAX: 0000000000000000 RBX: ffff88810b7ca848 RCX: 0000000000000000 [ +0.000005] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ +0.000004] RBP: ffff88846ca878a0 R08: 0000000000000000 R09: 0000000000000000 [ +0.000004] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888164e90050 [ +0.000005] R13: ffff88846c666200 R14: 0000000000000001 R15: ffff888168402d28 [ +0.000004] FS: 00007c45ff436d00(0000) GS:ffff888409500000(0000) knlGS:0000000000000000 [ +0.000005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000004] CR2: 00007c45f7373b20 CR3: 000000012ce5a000 CR4: 0000000000f50ef0 [ +0.000005] PKRU: 55555554 [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000005] ? show_regs+0x6c/0x80 [ +0.000008] ? __warn+0xd2/0x2d0 [ +0.000007] ? ttm_bo_unpin+0x21f/0x2c0 [ttm] [ +0.000012] ? report_bug+0x282/0x2f0 [ +0.000013] ? handle_bug+0x6e/0xc0 [ +0.000006] ? exc_invalid_op+0x18/0x50 [ +0.000008] ? asm_exc_invalid_op+0x1b/0x20 [ +0.000017] ? ttm_bo_unpin+0x21f/0x2c0 [ttm] [ +0.000011] ? ttm_bo_unpin+0x217/0x2c0 [ttm] [ +0.000011] amdgpu_bo_unpin+0x45/0x250 [amdgpu] [ +0.000216] amdgpu_userq_ioctl+0x2c3/0xd40 [amdgpu] [ +0.000226] ? drm_dev_exit+0x2d/0x60 [ +0.000010] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000201] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? lock_acquire+0x7c/0xc0 [ +0.000006] ? drm_dev_enter+0x51/0x190 [ +0.000015] drm_ioctl_kernel+0x18b/0x330 [ +0.000007] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000188] ? __pfx_drm_ioctl_kernel+0x10/0x10 [ +0.000006] ? lock_acquire+0x7c/0xc0 [ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? __kasan_check_write+0x14/0x30 [ +0.000006] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000010] drm_ioctl+0x589/0xd00 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000006] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000211] ? __pfx_drm_ioctl+0x10/0x10 [ +0.000006] ? __pm_runtime_resume+0x80/0x110 [ +0.000020] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000006] ? trace_hardirqs_on+0x53/0x60 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? _raw_spin_unlock_irqrestore+0x51/0x80 [ +0.000013] amdgpu_drm_ioctl+0xd2/0x1c0 [amdgpu] [ +0.000186] __x64_sys_ioctl+0x13a/0x1c0 [ +0.000010] x64_sys_call+0x11ad/0x25f0 [ +0.000007] do_syscall_64+0x91/0x180 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? do_syscall_64+0x9d/0x180 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000010] ? __pfx___rseq_handle_notify_resume+0x10/0x10 [ +0.000005] ? __pfx_blkcg_maybe_throttle_current+0x10/0x10 [ +0.000013] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000009] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? syscall_exit_to_user_mode+0x95/0x260 [ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? do_syscall_64+0x9d/0x180 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? do_syscall_64+0x9d/0x180 [ +0.000011] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000010] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000009] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? irqentry_exit_to_user_mode+0x8b/0x260 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000006] ? irqentry_exit+0x77/0xb0 [ +0.000004] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? exc_page_fault+0x93/0x150 [ +0.000010] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000005] RIP: 0033:0x7c45ff924ded [ +0.000005] RSP: 002b:00007ffff7168790 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.000008] RAX: ffffffffffffffda RBX: 00000000c0486456 RCX: 00007c45ff924ded [ +0.000005] RDX: 00007ffff71687f0 RSI: 00000000c0486456 RDI: 000000000000000b [ +0.000004] RBP: 00007ffff71687e0 R08: 00005b0c2a49b010 R09: 0000000000000007 [ +0.000004] R10: 00005b0c2a4d7140 R11: 0000000000000246 R12: 000000000000000b [ +0.000004] R13: 00007c45ff19e5cc R14: 00005b0c2a51c538 R15: 00005b0c2a51bbd8 [ +0.000022] </TASK> [ +0.000005] irq event stamp: 87419 [ +0.000004] hardirqs last enabled at (87425): [<ffffffff86a693a9>] __up_console_sem+0x79/0xa0 [ +0.000005] hardirqs last disabled at (87430): [<ffffffff86a6938e>] __up_console_sem+0x5e/0xa0 [ +0.000005] softirqs last enabled at (87058): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000006] softirqs last disabled at (87053): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000005] ---[ end trace 0000000000000000 ]--- Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 13:39:46 -04:00
Arunpravin Paneer Selvam	218caca4ba	drm/amdgpu/userq: Fix lock contention in userq fence Fix lockdep warnings. [ +0.000637] ================================ [ +0.000004] WARNING: inconsistent lock state [ +0.000004] 6.12.0+ #18 Tainted: G W OE [ +0.000004] -------------------------------- [ +0.000004] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. [ +0.000004] Xwayland/1952 [HC0[0]:SC0[0]:HE1:SE1] takes: [ +0.000005] ffff8884636f4740 (&fence_drv->fence_list_lock){?...}-{2:2}, at: amdgpu_userq_fence_driver_destroy+0xb8/0x540 [amdgpu] [ +0.000208] {IN-HARDIRQ-W} state was registered at: [ +0.000004] lock_acquire.part.0+0x116/0x360 [ +0.000005] lock_acquire+0x7c/0xc0 [ +0.000005] _raw_spin_lock+0x2f/0x60 [ +0.000005] amdgpu_userq_fence_driver_process+0x75/0x400 [amdgpu] [ +0.000185] gfx_v12_0_eop_irq+0x29f/0x420 [amdgpu] [ +0.000210] amdgpu_irq_dispatch+0x2a4/0x7b0 [amdgpu] [ +0.000191] amdgpu_ih_process+0x1e1/0x3d0 [amdgpu] [ +0.000185] amdgpu_irq_handler+0x28/0xc0 [amdgpu] [ +0.000183] __handle_irq_event_percpu+0x1bb/0x590 [ +0.000005] handle_irq_event+0xab/0x1d0 [ +0.000005] handle_edge_irq+0x1fd/0xc10 [ +0.000005] __common_interrupt+0x83/0x190 [ +0.000004] common_interrupt+0xb1/0xe0 [ +0.000005] asm_common_interrupt+0x27/0x40 [ +0.000004] cpuidle_enter_state+0x2ba/0x530 [ +0.000005] cpuidle_enter+0x4f/0xb0 [ +0.000006] call_cpuidle+0x46/0xd0 [ +0.000005] do_idle+0x367/0x430 [ +0.000004] cpu_startup_entry+0x58/0x70 [ +0.000005] start_secondary+0x224/0x2b0 [ +0.000005] common_startup_64+0x13e/0x141 [ +0.000005] irq event stamp: 88271 [ +0.000004] hardirqs last enabled at (88271): [<ffffffffad9ca7a1>] _raw_spin_unlock_irqrestore+0x51/0x80 [ +0.000005] hardirqs last disabled at (88270): [<ffffffffad9ca424>] _raw_spin_lock_irqsave+0x74/0x80 [ +0.000005] softirqs last enabled at (87858): [<ffffffffaa67377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000005] softirqs last disabled at (87849): [<ffffffffaa67377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000005] other info that might help us debug this: [ +0.000004] Possible unsafe locking scenario: [ +0.000003] CPU0 [ +0.000004] ---- [ +0.000003] lock(&fence_drv->fence_list_lock); v2: Drop fence_list_flags and use xa_lock_irqsave() flags parameter (Christian) Fix merge conflicts. Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 13:39:20 -04:00
Wayne Lin	9a9c3e1fe5	drm/amd/display: Avoid flooding unnecessary info messages It's expected that we'll encounter temporary exceptions during aux transactions. Adjust logging from drm_info to drm_dbg_dp to prevent flooding with unnecessary log messages. Fixes: `3637e457eb` ("drm/amd/display: Fix wrong handling for AUX_DEFER case") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250513032026.838036-1-Wayne.Lin@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 13:38:20 -04:00
Jesse.Zhang	3b63602614	drm/amdgpu: Add GFX 9.5.0 support for per-queue/pipe reset This patch enables per-queue and per-pipe reset functionality for GFX IP v9.5.0 when using MEC firmware version 21 (0x15) or later. This change: 1. Refactors the pipe reset support check in gfx_v9_4_3_pipe_reset_support() to use the compute_supported_reset flags instead of hardcoding version checks. 2. Adds support for GFX9.5.0 (IP 9.5.0) with MEC firmware version >= 21 to enable per-queue and per-pipe reset capabilities. v2: Replaced mec version check with !!(adev->gfx.compute_supported_reset & AMDGPU_RESET_TYPE_PER_PIPE) (Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:37:32 -04:00
Tao Zhou	5d6fddac55	drm/amdgpu: set vram type for GC 9.5.0 Set vram type so we can take different actions according to the type. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:37:28 -04:00
Tao Zhou	dc111f8fb1	drm/amdgpu: set flip bits for RAS bad pages Make the code more general, user doesn't need to pay attention to the detail of flip bits setting. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:37:19 -04:00
Sebastian Aguilera Novoa	7e340d3cea	drm/amd/display/dc/irq: Remove duplications of hpd_ack function from IRQ The major of dcn and dce irqs share a copy-pasted collection of copy-pasted function, which is: hpd_ack. This patch removes the multiple copy-pasted by moving them to the irq_service.c and make the irq_service's calls the functions implemented by the irq_service.c instead. The hpd_ack function is replaced by hpd0_ack and hpd1_ack, the required constants are also added. The changes were not tested on actual hardware. I am only able to verify that the changes keep the code compileable and do my best to look repeatedly if I am not actually changing any code. Signed-off-by: Sebastian Aguilera Novoa <saguileran@ime.usp.br> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:37:09 -04:00
Melissa Wen	d8d47f7397	drm/amd/display: Fix null check of pipe_ctx->plane_state for update_dchubp_dpp Similar to commit `6a057072dd` ("drm/amd/display: Fix null check for pipe_ctx->plane_state in dcn20_program_pipe") that addresses a null pointer dereference on dcn20_update_dchubp_dpp. This is the same function hooked for update_dchubp_dpp in dcn401, with the same issue. Fix possible null pointer deference on dcn401_program_pipe too. Fixes: `63ab80d9ac` ("drm/amd/display: DML2.1 Post-Si Cleanup") Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:35:17 -04:00
Ce Sun	533aa8bdbe	drm/amdgpu: Modify the count method of defer error The number of newly added de counts and the number of newly added error addresses remain consistent Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:35:12 -04:00
Alex Deucher	085c997d44	drm/amdkfd: drop warning in event_interrupt_isr_v1*() Commit ded8b3c36f17 ("drm/amdgpu: properly handle GC vs MM in amdgpu_vmid_mgr_init()") enables all 16 vmids for MMHUB on GC 10 and newer for KGD since there are no KFD resources using MMHUB. With this change, KFD starts seeing MMHUB vmids in it's range with no pasid set. As such there is no need to warn, we can just ignore those interrupts. Fixes: `aded8b3c36` ("drm/amdgpu: properly handle GC vs MM in amdgpu_vmid_mgr_init()") Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:34:09 -04:00
Lijo Lazar	937467b7d5	drm/amdgpu: Log RAS errors during load During driver load, RAS event manager may not be initialized. This will cause any ATHUB event during driver load to be skipped in dmesg log. Log the error in dmesg log for easier diagnosis. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:34:02 -04:00
Jesse.Zhang	648a0dc0d7	drm/amdgpu: Fix user queue deadlock by reordering mutex locking This resolves a deadlock between user queue management and GPU reset paths by enforcing consistent lock ordering. The deadlock occurred when: 1. Process exit path (amdgpu_userq_mgr_fini) would: - Take uqm->userq_mutex - Then try to take adev->userq_mutex for list operations 2. GPU reset path (amdgpu_userq_pre_reset) would: - Take adev->userq_mutex first (for list traversal) - Then take uqm->userq_mutex The solution establishes a strict top-down locking order: 1. Always take adev->userq_mutex before any uqm->userq_mutex 2. Maintain this order consistently across all code paths Changes made: - Reordered locking in amdgpu_userq_mgr_fini() to take device lock first - Kept existing proper order in amdgpu_userq_pre_reset() - Simplified the fini flow by removing redundant operations This prevents circular dependencies while maintaining thread safety during both normal operation and GPU reset scenarios. Fixes: `4ce60dbada` ("drm/amdgpu: store userq_managers in a list in adev") Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:32:25 -04:00
Ce Sun	f71509fdd0	drm/amdgpu: Fix the kernel panic caused by RAS records exceed threshold kernel panic caused by RAS records exceeding the threshold when load driver specifying RMA(bad_page_threshold=128) 1.Fix the warnings caused by disabling the interrupt source before it was enabled 2.Fix kernel panic when xcp sysfs is not initialized,null pointer appears during fini 3.Fix the memory leak caused by the device's early exit due to rma The first reason: [ 2744.246650] ------------[ cut here ]------------ [ 2744.246651] WARNING: CPU: 0 PID: 289 at /tmp/amd.BkfTLqYV/amd/amdgpu/amdgpu_irq.c:635 amdgpu_irq_put.cold+0x42/0x6e [amdgpu] [ 2744.247108] Modules linked in: amdgpu(OE+) amddrm_ttm_helper(OE) amdttm(OE) amdxcp(OE) amddrm_buddy(OE) amddrm_exec(OE) amd_sched(OE) amdkcl(OE) xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay binfmt_misc intel_rapl_msr intel_rapl_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_ssif kvm_intel nls_iso8859_1 kvm rapl isst_if_mbox_pci isst_if_mmio pmt_telemetry pmt_crashlog isst_if_common pmt_class mei_me mei acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath [ 2744.247167] linear mlx5_ib ib_uverbs ib_core ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm drm_kms_helper crct10dif_pclmul syscopyarea crc32_pclmul ghash_clmulni_intel mlx5_core sysfillrect sysimgblt aesni_intel mlxfw fb_sys_fops psample cec crypto_simd cryptd rc_core i2c_i801 nvme xhci_pci tls intel_pmt drm pci_hyperv_intf nvme_core i2c_smbus i2c_ismt xhci_pci_renesas wmi pinctrl_emmitsburg [ 2744.247194] CPU: 0 PID: 289 Comm: kworker/0:1 Tainted: G OE 5.15.0-70-generic #77-Ubuntu [ 2744.247197] Hardware name: Microsoft C278A/C278A, BIOS C2789.5.BS.1C23.AG.2 11/21/2024 [ 2744.247198] Workqueue: events work_for_cpu_fn [ 2744.247206] RIP: 0010:amdgpu_irq_put.cold+0x42/0x6e [amdgpu] [ 2744.247634] Code: 79 7f ff 44 89 ee 48 c7 c7 4d 5a 42 c2 89 55 d4 e8 90 09 bc bf 8b 55 d4 4c 89 e6 4c 89 ff e8 3c 76 7f ff 8b 55 d4 84 c0 75 07 <0f> 0b e9 95 79 7f ff 49 03 5c 24 08 f0 ff 0b 75 13 4c 89 e6 4c 89 [ 2744.247636] RSP: 0018:ffa0000019e27cb0 EFLAGS: 00010246 [ 2744.247639] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ff11000150fa87c0 [ 2744.247641] RDX: 0000000000000000 RSI: ffffffffc2222430 RDI: ff1100019f200000 [ 2744.247642] RBP: ffa0000019e27ce0 R08: 0000000000000003 R09: ffffffffffe41a08 [ 2744.247643] R10: 0000000000ffff0a R11: 0000000000000001 R12: ff1100019f22ce60 [ 2744.247644] R13: 0000000000000000 R14: 00000000ffffffea R15: ff1100019f200000 [ 2744.247645] FS: 0000000000000000(0000) GS:ff11007e7e400000(0000) knlGS:0000000000000000 [ 2744.247647] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2744.247649] CR2: 00007f3d2002819c CR3: 0000000006810003 CR4: 0000000000771ef0 [ 2744.247650] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2744.247651] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 2744.247652] PKRU: 55555554 [ 2744.247653] Call Trace: [ 2744.247654] <TASK> [ 2744.247656] sdma_v4_4_2_hw_fini+0x7a/0xc0 [amdgpu] [ 2744.247997] ? vcn_v4_0_3_hw_fini+0x5f/0xa0 [amdgpu] [ 2744.248336] amdgpu_ip_block_hw_fini+0x31/0x61 [amdgpu] [ 2744.248776] amdgpu_device_fini_hw+0x3bb/0x47b [amdgpu] [ 2744.249197] ? blocking_notifier_chain_unregister+0x56/0xb0 [ 2744.249202] amdgpu_driver_unload_kms+0x51/0x60 [amdgpu] [ 2744.249482] amdgpu_driver_load_kms.cold+0x18/0x2e [amdgpu] [ 2744.249913] amdgpu_pci_probe+0x23e/0x590 [amdgpu] [ 2744.250187] local_pci_probe+0x48/0x90 [ 2744.250191] work_for_cpu_fn+0x17/0x30 [ 2744.250196] process_one_work+0x228/0x3d0 [ 2744.250198] worker_thread+0x223/0x420 [ 2744.250200] ? process_one_work+0x3d0/0x3d0 [ 2744.250201] kthread+0x127/0x150 [ 2744.250204] ? set_kthread_struct+0x50/0x50 [ 2744.250207] ret_from_fork+0x1f/0x30 [ 2744.250212] </TASK> [ 2744.250213] ---[ end trace 488c997a88508bc3 ]--- The second reason: [ 5139.303446] Memory manager not clean during takedown. [ 5139.303509] WARNING: CPU: 145 PID: 117699 at drivers/gpu/drm/drm_mm.c:998 drm_mm_takedown+0x27/0x30 [drm] [ 5139.303542] Modules linked in: amdgpu(OE+) amddrm_ttm_helper(OE) amdttm(OE) amdxcp(OE) amddrm_buddy(OE) amddrm_exec(OE) amd_sched(OE) amdkcl(OE) xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay intel_rapl_msr intel_rapl_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_ssif kvm_intel binfmt_misc kvm nls_iso8859_1 rapl isst_if_mbox_pci pmt_telemetry pmt_crashlog isst_if_mmio pmt_class isst_if_common mei_me mei acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath [ 5139.303572] linear mlx5_ib ib_uverbs ib_core crct10dif_pclmul ast crc32_pclmul i2c_algo_bit ghash_clmulni_intel aesni_intel crypto_simd drm_vram_helper cryptd drm_ttm_helper mlx5_core ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core mlxfw psample intel_pmt nvme xhci_pci drm tls i2c_i801 pci_hyperv_intf nvme_core i2c_smbus i2c_ismt xhci_pci_renesas wmi pinctrl_emmitsburg [last unloaded: amdkcl] [ 5139.303588] CPU: 145 PID: 117699 Comm: modprobe Tainted: G U OE 5.15.0-70-generic #77-Ubuntu [ 5139.303590] Hardware name: Microsoft C278A/C278A, BIOS C2789.5.BS.1C23.AG.2 11/21/2024 [ 5139.303591] RIP: 0010:drm_mm_takedown+0x27/0x30 [drm] [ 5139.303605] Code: cc 66 90 0f 1f 44 00 00 48 8b 47 38 48 83 c7 38 48 39 f8 75 05 c3 cc cc cc cc 55 48 c7 c7 18 d0 10 c0 48 89 e5 e8 5a bc c3 c1 <0f> 0b 5d c3 cc cc cc cc 90 0f 1f 44 00 00 55 b9 15 00 00 00 48 89 [ 5139.303607] RSP: 0018:ffa00000325c3940 EFLAGS: 00010286 [ 5139.303608] RAX: 0000000000000000 RBX: ff1100012f5cecb0 RCX: 0000000000000027 [ 5139.303609] RDX: ff11007e7fa60588 RSI: 0000000000000001 RDI: ff11007e7fa60580 [ 5139.303610] RBP: ffa00000325c3940 R08: 0000000000000003 R09: fffffffff00c2b78 [ 5139.303610] R10: 000000000000002b R11: 0000000000000001 R12: ff1100012f5cec00 [ 5139.303611] R13: ff1100012138f068 R14: 0000000000000000 R15: ff1100012f5cec90 [ 5139.303611] FS: 00007f42ffca0000(0000) GS:ff11007e7fa40000(0000) knlGS:0000000000000000 [ 5139.303612] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5139.303613] CR2: 00007f23d945ab68 CR3: 00000001212ce005 CR4: 0000000000771ee0 [ 5139.303614] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 5139.303615] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 5139.303615] PKRU: 55555554 [ 5139.303616] Call Trace: [ 5139.303617] <TASK> [ 5139.303619] amdttm_range_man_fini_nocheck+0xfe/0x1c0 [amdttm] [ 5139.303625] amdgpu_ttm_fini+0x2ed/0x390 [amdgpu] [ 5139.303800] amdgpu_bo_fini+0x27/0xc0 [amdgpu] [ 5139.303959] gmc_v9_0_sw_fini+0x63/0x90 [amdgpu] [ 5139.304144] amdgpu_device_fini_sw+0x125/0x6a0 [amdgpu] [ 5139.304302] amdgpu_driver_release_kms+0x16/0x30 [amdgpu] [ 5139.304455] devm_drm_dev_init_release+0x4a/0x80 [drm] [ 5139.304472] devm_action_release+0x12/0x20 [ 5139.304476] release_nodes+0x3d/0xb0 [ 5139.304478] devres_release_all+0x9b/0xd0 [ 5139.304480] really_probe+0x11d/0x420 [ 5139.304483] __driver_probe_device+0x119/0x190 [ 5139.304485] driver_probe_device+0x23/0xc0 [ 5139.304487] __driver_attach+0xf7/0x1f0 [ 5139.304489] ? __device_attach_driver+0x140/0x140 [ 5139.304491] bus_for_each_dev+0x7c/0xd0 [ 5139.304493] driver_attach+0x1e/0x30 [ 5139.304494] bus_add_driver+0x148/0x220 [ 5139.304496] driver_register+0x95/0x100 [ 5139.304498] __pci_register_driver+0x68/0x70 [ 5139.304500] amdgpu_init+0xbc/0x1000 [amdgpu] [ 5139.304655] ? 0xffffffffc0b8f000 [ 5139.304657] do_one_initcall+0x46/0x1e0 [ 5139.304659] ? kmem_cache_alloc_trace+0x19e/0x2e0 [ 5139.304663] do_init_module+0x52/0x260 [ 5139.304665] load_module+0xb2b/0xbc0 [ 5139.304667] __do_sys_finit_module+0xbf/0x120 [ 5139.304669] __x64_sys_finit_module+0x18/0x20 [ 5139.304670] do_syscall_64+0x59/0xc0 [ 5139.304673] ? exit_to_user_mode_prepare+0x37/0xb0 [ 5139.304676] ? syscall_exit_to_user_mode+0x27/0x50 [ 5139.304678] ? __x64_sys_mmap+0x33/0x50 [ 5139.304680] ? do_syscall_64+0x69/0xc0 [ 5139.304681] entry_SYSCALL_64_after_hwframe+0x61/0xcb [ 5139.304684] RIP: 0033:0x7f42ffdbf88d [ 5139.304686] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48 [ 5139.304687] RSP: 002b:00007ffcb7427158 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 5139.304688] RAX: ffffffffffffffda RBX: 000055ce8b8f3150 RCX: 00007f42ffdbf88d [ 5139.304689] RDX: 0000000000000000 RSI: 000055ce8b8f9a70 RDI: 000000000000000a [ 5139.304690] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000011 [ 5139.304690] R10: 000000000000000a R11: 0000000000000246 R12: 000055ce8b8f9a70 [ 5139.304691] R13: 000055ce8b8f2ec0 R14: 000055ce8b8f2ab0 R15: 000055ce8b8f9aa0 [ 5139.304692] </TASK> [ 5139.304693] ---[ end trace 8536b052f7883003 ]--- Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:32:11 -04:00
Tao Zhou	b7674ae75b	drm/amdgu: get RAS retire flip bits for new type of HBM Get RAS retire flip bits for HBM with different types in various NPS modes. Also set flip row bit and MCA R13 bit in PA in different NPS modes. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:32:08 -04:00
Tao Zhou	9b5b71895b	drm/amdgpu: implement get_retire_flip_bits for UMC v12 The RAS bad page retire flip bits can be set per vram type, vram vendor and NPS mode. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:32:05 -04:00
Tao Zhou	699bff37a5	drm/amdgpu: add get_retire_flip_bits for UMC Add the general interface to get flip bits for RAS bad page retirement. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:32:01 -04:00
fanhuang	80f66ca7a4	drm/amdgpu: add vcn v5_0_0 ip headers Add vcn v5_0_0 register offset and shift masks header files Only include the registers required for MMSCH initialization Signed-off-by: fanhuang <FangSheng.Huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:31:51 -04:00
Tao Zhou	4ce5b99128	drm/amdgpu: adjust high bits for RAS retired page Per UMC address conversion algorithm, the high row bits of UMC MCA address are changed when they're converted into normalized address on specific ASICs. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:31:46 -04:00
Tao Zhou	1df57411a6	drm/amd: add definition for new memory type Support new version of HBM. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:31:40 -04:00
ganglxie	f5db59067c	Refine RAS bad page records counting and parsing in eeprom V3 there is only MCA records in V3, no need to care about PA records. recalculate the value of ras_num_bad_pages when parsing failed and go on with the left records instead of quit. Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:31:33 -04:00
Taimur Hassan	3ab3d680ff	drm/amd/display: Promote DC to 3.2.333 Summary * Refactor DMI quirks * Fix link-off issue triggered by quick unplug/replug * Fix race condition when submitting DMUB commands * Correct reply value when AUX Write incomplete * Backup / restore plane config only on update Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:31:28 -04:00
Michael Strauss	8989cb919b	drm/amd/display: Add early 8b/10b channel equalization test pattern sequence [WHY] Early EQ pattern sequence is required for some LTTPR + old dongle combinations. [HOW] If DP_EARLY_8B10B_TPS2 chip cap is set, this new sequence programs phy to output TPS2 before initiating link training and writes TPS1 to LTTPR training pattern register as instructed by vendor. Add function to get embedded LTTPR target address offset. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: TungYu Lu <tungyu.lu@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:31:21 -04:00
Sung Lee	90af999835	drm/amd/display: Program triplebuffer on all pipes [WHY] Triplebuffer should be programmed on all pipes. Some code assumed it only needed to be called on top pipe, but as the HWSS function does not account for that, it must be called on every pipe. [HOW] Remove condition to not program triplebuffer on non-top/next pipe. Call the function unconditionally on all pipes. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Sung Lee <Sung.Lee@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:31:15 -04:00
Taimur Hassan	e91c91e506	drm/amd/display: [FW Promotion] Release 0.1.10.0 Refactoring some IPS and panel replay structs Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:31:12 -04:00

1 2 3 4 5 ...

33509 Commits