2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

33509 Commits

Author SHA1 Message Date
Lijo Lazar
3580440308 drm/amd/pm: Fix comment style
Fix code comment style

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202504271422.D6cqMlZ0-lkp@intel.com/
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:15:03 -04:00
Lijo Lazar
cf1fcdeec4 drm/amdgpu: Print bootloader status for long waits
If it needs a long wait for completion of bootloader execution, report
the status in between. That helps to know if there is some issue during
bootloader execution.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:14:58 -04:00
Yifan Zha
161949dd71 drm/amdgpu: refine MES register print for devices of hive
[Why]
Register access print missed device info.

[How]
Using dev_xxx instead of DRM_xxx to indicate which device
of a hive is the message for.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:14:47 -04:00
Lijo Lazar
3805e6959c drm/amdgpu: Fix query order of XGMI v6.4.1 status
Keep the register offsets as per link order for querying XGMI v6.4.1
link status.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Fixes: 6dee64e765 ("drm/amdgpu: Fix xgmi v6.4.1 link status reporting")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:13:44 -04:00
Asad Kamal
96ac487c12 drm/amd/pm: Add board voltage node to hwmon
Add and expose board voltage node as vddboard to hwmon for smu_v13_0_6

v2: Replace ip check with supported sensor attribute(Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:13:40 -04:00
Jesse.Zhang
ad7c088e31 drm/amdgpu: Fix API status offset for MES queue reset
The mes_v11_0_reset_hw_queue and mes_v12_0_reset_hw_queue functions were
using the wrong union type (MESAPI__REMOVE_QUEUE) when getting the offset
for api_status. Since these functions handle queue reset operations, they
should use MESAPI__RESET union instead.

This fixes the polling of API status during hardware queue reset operations
in the MES for both v11 and v12 versions.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-By: Shaoyun.liu <Shaoyun.liu@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:10:06 -04:00
Asad Kamal
3a2191efe4 drm/amd/pm: Add voltage caps for smu_v13_0_6
Add & enable board voltage caps for smu_v13_0_6

v3: Update version check for board voltage support

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:10:02 -04:00
Asad Kamal
ad3d93230d drm/amd/pm: Fill static metrics data
Fill static metrics data for smu_v13_0_6

v2: Proceed with driver load just with warning even if board
voltage reads invalid value

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:09:54 -04:00
Asad Kamal
0ee5847849 drm/amd/pm: Use common function to fetch static metrics table
Use common function to fetch static metrics table for smu_v13_0_12

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:09:50 -04:00
Asad Kamal
9eddfcbef4 drm/amd/pn: Fetch static metrics table
Fetch static metrics table for smu_v13_0_6

v2: Add static metrics caps check to fetch static metrics table

v3: Update version having all fixes for static metrics support

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:09:45 -04:00
Asad Kamal
a2344a9827 drm/amd/pm: Update pmfw headers for smu_v_13_0_6
Update pmfw headers for smu_v_13_0_6 to include static metrics table

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:09:42 -04:00
Alex Deucher
482d485332 drm/amdgpu/userq: take the userq_mgr lock in enforce isolation
Add the missing locking.

Fixes: 94976e7e5e ("drm/amdgpu/userq: add helpers to start/stop scheduling")
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:09:26 -04:00
Alex Deucher
c5e02d6588 drm/amdgpu/userq: take the userq_mgr lock in suspend/resume
Add the missing locking.

Fixes: 73e12e98ec ("drm/amdgpu/userq: add suspend and resume helpers")
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:08:24 -04:00
Sonny Jiang
3e5f86c14c drm/amdgpu: Add DPG pause for VCN v5.0.1
For vcn5.0.1 only, enable DPG PAUSE to avoid DPG resets.

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:08:21 -04:00
Asad Kamal
b02a284cc8 drm/amd/pm: Add ip version check for smu_v13_0_12 functions
Add ip version check to use smu_v13_0_12 specific functions

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:08:16 -04:00
Aurabindo Pillai
d85212e1ce drm/amd/display: downgrade HDMI infoframe error to one time warning
In certain config, a modeprobe test triggers too many instances of the
error related to infoframe. Make it print only once, and also make it a
warning.

Fixes: 6027cbee19 ("drm/amd/display: Add error check for avi and vendor infoframe setup function")
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Chengjun Yao <Chengjun.Yao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:06:19 -04:00
Eric Huang
7295e00df0 drm/amdkfd: add pasid debugfs entries
the entries will be appearing at
/sys/kernel/debug/kfd/proc/<pid>/pasid_<gpuid>.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:06:14 -04:00
Arvind Yadav
56801cb83c drm/amdgpu: remove DRM_AMDGPU_NAVI3X_USERQ config for UQ
DRM_AMDGPU_NAVI3X_USERQ config support is not required for
usermode queue.

v2: rebase.

Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:06:00 -04:00
Srinivasan Shanmugam
716ad3c28f drm/amd/display: Fix NULL pointer dereference for program_lut_mode in dcn401_populate_mcm_luts
This commit introduces a NULL pointer check for
mpc->funcs->program_lut_mode in the dcn401_populate_mcm_luts function.
The previous implementation directly called program_lut_mode without
validating its existence, which could lead to a NULL pointer
dereference.

With this change, the function is now only invoked if
mpc->funcs->program_lut_mode is not NULL

Fixes the below:

drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn401/dcn401_hwseq.c:720 dcn401_populate_mcm_luts()
error: we previously assumed 'mpc->funcs->program_lut_mode' could be null (see line 701)

drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn401/dcn401_hwseq.c
    642 void dcn401_populate_mcm_luts(struct dc *dc,
    643                 struct pipe_ctx *pipe_ctx,
    644                 struct dc_cm2_func_luts mcm_luts,
    645                 bool lut_bank_a)
    646 {
	...
    716                 }
    717                 if (m_lut_params.pwl) {
    718                         if (mpc->funcs->mcm.populate_lut)
    719                                 mpc->funcs->mcm.populate_lut(mpc, m_lut_params, lut_bank_a, mpcc_id);
--> 720                         mpc->funcs->program_lut_mode(mpc, MCM_LUT_SHAPER, MCM_LUT_ENABLE, lut_bank_a, mpcc_id);

Cc: Yihan Zhu <yihanzhu@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Yihan Zhu <yihanzhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:05:52 -04:00
Amber Lin
ab9fcc6362 drm/amdkfd: Set SDMA_RLCx_IB_CNTL/SWITCH_INSIDE_IB
When submitting MQD to CP, set SDMA_RLCx_IB_CNTL/SWITCH_INSIDE_IB bit so
it'll allow SDMA preemption if there is a massive command buffer of
long-running SDMA commands.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:05:46 -04:00
Rodrigo Siqueira
ffc7e11c10 drm/amdgpu: Add documentation associated with CSB
Add a description for the get_csb_buffer callback, update the glossary,
and add some extra information about RB, which is associated with CSB
configuration.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:05:41 -04:00
Rodrigo Siqueira
e7164c7ade drm/amdgpu/gfx: Use CSB helpers in gfx_v6_0_get_csb_buffer
Remove duplications from gfx_v6_0_get_csb_buffer by using CSB helpers.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:05:38 -04:00
Rodrigo Siqueira
aff78a6172 drm/amdgpu/gfx: Fix gfx_v7_0_get_csb_buffer to use rb_config
Instead of having the hardcoded values for the CSB buffer in
gfx_v7_0_get_csb_buffer, use the values calculated in previous steps by
accessing raster_config and raster_config_1.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:05:34 -04:00
Prike Liang
e125a6e8ce drm/amdgpu: set the evf name to identify the userq case
The evf fence name can clearly identify the userq usage.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:05:27 -04:00
Lijo Lazar
d8116a32cd drm/amdgpu: Fix offset for HDP remap in nbio v7.11
APUs in passthrough mode use HDP flush. 0x7F000 offset used for
remapping HDP flush is mapped to VPE space which could get power gated.
Use another unused offset in BIF space.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:05:14 -04:00
Lijo Lazar
923406e74e drm/amd/pm: Reset SMU v13.0.x custom settings
On SMU v13.0.2 and SMU v13.0.6 variants user may choose custom min/max
clocks in manual perf mode. Those custom min/max values need to be
reset once user switches to auto or restores default settings.
Otherwise, they may get used inadvertently during the next operation.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:05:11 -04:00
Prike Liang
9d40b05d6d drm/amdgpu: add the evf attached gem obj resv dump
This debug dump will help on debugging the evf attached gem obj fence
related issue.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:05:07 -04:00
Felix Kuehling
dbe4c63689 drm/amdgpu: Fail DMABUF map of XGMI-accessible memory
If peer memory is XGMI-accessible, we should never access it through PCIe
P2P DMA mappings. PCIe P2P is slower, has different coherence behaviour,
limited or no support for atomics, or may not work at all. Fail with a
warning if DMABUF mappings of such memory are attempted.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:05:02 -04:00
Rodrigo Siqueira
9bd5a47ee2 drm/amdgpu/gfx: Use CSB helpers in gfx_v7_0_get_csb_buffer
Use CSB helpers to remove code duplication from gfx_v7_0_get_csb_buffer.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:04:48 -04:00
Rodrigo Siqueira
b990cb5234 drm/amdgpu/gfx: Use CSB helpers in gfx_v8_0_get_csb_buffer
Remove code duplication from gfx_v8_0_get_csb_buffer by using CSB
helpers.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:04:45 -04:00
Rodrigo Siqueira
9fec2e92fa drm/amdgpu/gfx: Use CSB helpers in gfx_v9_0_get_csb_buffer
Eliminate code duplication in gfx_v9_0_get_csb_buffer by using CSB
helpers.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:04:42 -04:00
Rodrigo Siqueira
0eef0e36ba drm/amdgpu/gfx: Use CSB helpers in gfx_v10_0_get_csb_buffer
Remove duplicate code by using CSB helpers.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:04:38 -04:00
Rodrigo Siqueira
106172df6e drm/amdgpu/gfx: Use CSB helpers in gfx_v11_0_get_csb_buffer
Part of the code in gfx_v11_0_get_csb_buffer can be removed in favor of
some GFX CSB helpers. This commit removes the duplicated part for the
GFX 11 CSB function.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:04:34 -04:00
Rodrigo Siqueira
9718f7457d drm/amdgpu/gfx: Introduce helpers handling CSB manipulation
From GFX6 to GFX11, there is a function for getting the CSB buffer to be
put into the hardware. Three common parts are duplicated in all of these
GFX functions:

1. Prepare the CSB preamble.
2. Parser the CS data.
3. End the CSB preamble.

This commit creates helpers to be used from GFX6 to GFX11.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:04:29 -04:00
Colin Ian King
5a2658fda4 drm/amdgpu: Fix spelling mistake "rounter" -> "router"
There is a spelling mistake with the array utcl2_rounter_str, it
appears it should be utcl2_router_str. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:04:23 -04:00
Kees Cook
e5f7e4e0a4 drm/amdgpu/atom: Work around vbios NULL offset false positive
GCC really does not want to consider NULL (or near-NULL) addresses as
valid, so calculations based off of NULL end up getting range-tracked into
being an offset wthin a 0 byte array. It gets especially mad about this:

                if (vbios_str == NULL)
                        vbios_str += sizeof(BIOS_ATOM_PREFIX) - 1;
	...
        if (vbios_str != NULL && *vbios_str == 0)
                vbios_str++;

It sees this as being "sizeof(BIOS_ATOM_PREFIX) - 1" byte offset from
NULL, when building with -Warray-bounds (and the coming
-fdiagnostic-details flag):

In function 'atom_get_vbios_pn',
    inlined from 'amdgpu_atom_parse' at drivers/gpu/drm/amd/amdgpu/atom.c:1553:2:
drivers/gpu/drm/amd/amdgpu/atom.c:1447:34: error: array subscript 0 is outside array bounds of 'unsigned char[0]' [-Werror=array-bounds=]
 1447 |         if (vbios_str != NULL && *vbios_str == 0)
      |                                  ^~~~~~~~~~
  'amdgpu_atom_parse': events 1-2
 1444 |                 if (vbios_str == NULL)
      |                    ^
      |                    |
      |                    (1) when the condition is evaluated to true
......
 1447 |         if (vbios_str != NULL && *vbios_str == 0)
      |                                  ~~~~~~~~~~
      |                                  |
      |                                  (2) out of array bounds here
In function 'amdgpu_atom_parse':
cc1: note: source object is likely at address zero

As there isn't a sane way to convince it otherwise, hide vbios_str from
GCC's optimizer to avoid the warning so we can get closer to enabling
-Warray-bounds globally.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:04:12 -04:00
Chris Bainbridge
d4673f3c3b drm/amd/display: Fix slab-use-after-free in hdcp
The HDCP code in amdgpu_dm_hdcp.c copies pointers to amdgpu_dm_connector
objects without incrementing the kref reference counts. When using a
USB-C dock, and the dock is unplugged, the corresponding
amdgpu_dm_connector objects are freed, creating dangling pointers in the
HDCP code. When the dock is plugged back, the dangling pointers are
dereferenced, resulting in a slab-use-after-free:

[   66.775837] BUG: KASAN: slab-use-after-free in event_property_validate+0x42f/0x6c0 [amdgpu]
[   66.776171] Read of size 4 at addr ffff888127804120 by task kworker/0:1/10

[   66.776179] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Not tainted 6.14.0-rc7-00180-g54505f727a38-dirty #233
[   66.776183] Hardware name: HP HP Pavilion Aero Laptop 13-be0xxx/8916, BIOS F.17 12/18/2024
[   66.776186] Workqueue: events event_property_validate [amdgpu]
[   66.776494] Call Trace:
[   66.776496]  <TASK>
[   66.776497]  dump_stack_lvl+0x70/0xa0
[   66.776504]  print_report+0x175/0x555
[   66.776507]  ? __virt_addr_valid+0x243/0x450
[   66.776510]  ? kasan_complete_mode_report_info+0x66/0x1c0
[   66.776515]  kasan_report+0xeb/0x1c0
[   66.776518]  ? event_property_validate+0x42f/0x6c0 [amdgpu]
[   66.776819]  ? event_property_validate+0x42f/0x6c0 [amdgpu]
[   66.777121]  __asan_report_load4_noabort+0x14/0x20
[   66.777124]  event_property_validate+0x42f/0x6c0 [amdgpu]
[   66.777342]  ? __lock_acquire+0x6b40/0x6b40
[   66.777347]  ? enable_assr+0x250/0x250 [amdgpu]
[   66.777571]  process_one_work+0x86b/0x1510
[   66.777575]  ? pwq_dec_nr_in_flight+0xcf0/0xcf0
[   66.777578]  ? assign_work+0x16b/0x280
[   66.777580]  ? lock_is_held_type+0xa3/0x130
[   66.777583]  worker_thread+0x5c0/0xfa0
[   66.777587]  ? process_one_work+0x1510/0x1510
[   66.777588]  kthread+0x3a2/0x840
[   66.777591]  ? kthread_is_per_cpu+0xd0/0xd0
[   66.777594]  ? trace_hardirqs_on+0x4f/0x60
[   66.777597]  ? _raw_spin_unlock_irq+0x27/0x60
[   66.777599]  ? calculate_sigpending+0x77/0xa0
[   66.777602]  ? kthread_is_per_cpu+0xd0/0xd0
[   66.777605]  ret_from_fork+0x40/0x90
[   66.777607]  ? kthread_is_per_cpu+0xd0/0xd0
[   66.777609]  ret_from_fork_asm+0x11/0x20
[   66.777614]  </TASK>

[   66.777643] Allocated by task 10:
[   66.777646]  kasan_save_stack+0x39/0x60
[   66.777649]  kasan_save_track+0x14/0x40
[   66.777652]  kasan_save_alloc_info+0x37/0x50
[   66.777655]  __kasan_kmalloc+0xbb/0xc0
[   66.777658]  __kmalloc_cache_noprof+0x1c8/0x4b0
[   66.777661]  dm_dp_add_mst_connector+0xdd/0x5c0 [amdgpu]
[   66.777880]  drm_dp_mst_port_add_connector+0x47e/0x770 [drm_display_helper]
[   66.777892]  drm_dp_send_link_address+0x1554/0x2bf0 [drm_display_helper]
[   66.777901]  drm_dp_check_and_send_link_address+0x187/0x1f0 [drm_display_helper]
[   66.777909]  drm_dp_mst_link_probe_work+0x2b8/0x410 [drm_display_helper]
[   66.777917]  process_one_work+0x86b/0x1510
[   66.777919]  worker_thread+0x5c0/0xfa0
[   66.777922]  kthread+0x3a2/0x840
[   66.777925]  ret_from_fork+0x40/0x90
[   66.777927]  ret_from_fork_asm+0x11/0x20

[   66.777932] Freed by task 1713:
[   66.777935]  kasan_save_stack+0x39/0x60
[   66.777938]  kasan_save_track+0x14/0x40
[   66.777940]  kasan_save_free_info+0x3b/0x60
[   66.777944]  __kasan_slab_free+0x52/0x70
[   66.777946]  kfree+0x13f/0x4b0
[   66.777949]  dm_dp_mst_connector_destroy+0xfa/0x150 [amdgpu]
[   66.778179]  drm_connector_free+0x7d/0xb0
[   66.778184]  drm_mode_object_put.part.0+0xee/0x160
[   66.778188]  drm_mode_object_put+0x37/0x50
[   66.778191]  drm_atomic_state_default_clear+0x220/0xd60
[   66.778194]  __drm_atomic_state_free+0x16e/0x2a0
[   66.778197]  drm_mode_atomic_ioctl+0x15ed/0x2ba0
[   66.778200]  drm_ioctl_kernel+0x17a/0x310
[   66.778203]  drm_ioctl+0x584/0xd10
[   66.778206]  amdgpu_drm_ioctl+0xd2/0x1c0 [amdgpu]
[   66.778375]  __x64_sys_ioctl+0x139/0x1a0
[   66.778378]  x64_sys_call+0xee7/0xfb0
[   66.778381]  do_syscall_64+0x87/0x140
[   66.778385]  entry_SYSCALL_64_after_hwframe+0x4b/0x53

Fix this by properly incrementing and decrementing the reference counts
when making and deleting copies of the amdgpu_dm_connector pointers.

(Mario: rebase on current code and update fixes tag)

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4006
Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Fixes: da3fd7ac0b ("drm/amd/display: Update CP property based on HW query")
Reviewed-by: Alex Hung <alex.hung@amd.com>
Link: https://lore.kernel.org/r/20250417215005.37964-1-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:03:08 -04:00
Lijo Lazar
75f138db48 drm/amdgpu: Disallow partition query during reset
Reject queries to get current partition modes during reset. Also, don't
accept sysfs interface requests to switch compute partition mode while
in reset.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:03:02 -04:00
Srinivasan Shanmugam
af0819755c drm/amd/display: Fix NULL pointer dereferences in dm_update_crtc_state() v2
Added checks for NULL values after retrieving drm_new_conn_state
to prevent dereferencing NULL pointers.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:10751 dm_update_crtc_state()
	warn: 'drm_new_conn_state' can also be NULL

drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c
    10672 static int dm_update_crtc_state(struct amdgpu_display_manager *dm,
    10673                          struct drm_atomic_state *state,
    10674                          struct drm_crtc *crtc,
    10675                          struct drm_crtc_state *old_crtc_state,
    10676                          struct drm_crtc_state *new_crtc_state,
    10677                          bool enable,
    10678                          bool *lock_and_validation_needed)
    10679 {
    10680         struct dm_atomic_state *dm_state = NULL;
    10681         struct dm_crtc_state *dm_old_crtc_state, *dm_new_crtc_state;
    10682         struct dc_stream_state *new_stream;
    10683         int ret = 0;
    10684
    ...
    10703
    10704         /* TODO This hack should go away */
    10705         if (connector && enable) {
    10706                 /* Make sure fake sink is created in plug-in scenario */
    10707                 drm_new_conn_state = drm_atomic_get_new_connector_state(state,
    10708                                                                         connector);

drm_atomic_get_new_connector_state() can't return error pointers, only NULL.

    10709                 drm_old_conn_state = drm_atomic_get_old_connector_state(state,
    10710                                                                         connector);
    10711
    10712                 if (IS_ERR(drm_new_conn_state)) {
                                     ^^^^^^^^^^^^^^^^^^

    10713                         ret = PTR_ERR_OR_ZERO(drm_new_conn_state);

Calling PTR_ERR_OR_ZERO() doesn't make sense.  It can't be success.

    10714                         goto fail;
    10715                 }
    10716
    ...
    10748
    10749                 dm_new_crtc_state->abm_level = dm_new_conn_state->abm_level;
    10750
--> 10751                 ret = fill_hdr_info_packet(drm_new_conn_state,
                                                     ^^^^^^^^^^^^^^^^^^ Unchecked dereference

    10752                                            &new_stream->hdr_static_metadata);
    10753                 if (ret)
    10754                         goto fail;
    10755

v2: Modified the NULL pointer check for drm_new_conn_state in the
    dm_update_crtc_state function to  include a warning via WARN_ON and
    return -EINVAL to indicate an invalid state when the pointer is NULL.

Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:02:52 -04:00
Gergo Koteles
b316727a27 drm/amd/display: do not copy invalid CRTC timing info
Since b255ce4388, it is possible that the CRTC timing
information for the preferred mode has not yet been
calculated while amdgpu_dm_connector_mode_valid() is running.

In this case use the CRTC timing information of the actual mode.

Fixes: b255ce4388 ("drm/amdgpu: don't change mode in amdgpu_dm_connector_mode_valid()")
Closes: https://lore.kernel.org/all/ed09edb167e74167a694f4854102a3de6d2f1433.camel@irl.hu/
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4085
Signed-off-by: Gergo Koteles <soyer@irl.hu>
Reviewed-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 20232192a5)
Cc: stable@vger.kernel.org
2025-04-22 16:51:17 -04:00
Leo Li
6ed0dc3fd3 drm/amd/display: Default IPS to RCG_IN_ACTIVE_IPS2_IN_OFF
[Why]

Recent findings show negligible power savings between IPS2 and RCG
during static desktop. In fact, DCN related clocks are higher
when IPS2 is enabled vs RCG.

RCG_IN_ACTIVE is also the default policy for another OS supported by
DC, and it has faster entry/exit.

[How]

Remove previous logic that checked for IPS2 support, and just default
to `DMUB_IPS_RCG_IN_ACTIVE_IPS2_IN_OFF`.

Fixes: 199888aa25 ("drm/amd/display: Update IPS default mode for DCN35/DCN351")
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8f772d79ef)
Cc: stable@vger.kernel.org
2025-04-22 16:50:45 -04:00
George Shen
d59bddce49 drm/amd/display: Use 16ms AUX read interval for LTTPR with old sinks
[Why/How]
LTTPR are required to program DPCD 0000Eh to 0x4 (16ms) upon AUX read
reply to this register. Since old Sinks witih DPCD rev 1.1 and earlier
may not support this register, assume the mandatory value is programmed
by the LTTPR to avoid AUX timeout issues.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1594b60d74)
2025-04-22 16:49:15 -04:00
Mario Limonciello
870bea21fd drm/amd/display: Fix ACPI edid parsing on some Lenovo systems
[Why]
The ACPI EDID in the BIOS of a Lenovo laptop includes 3 blocks, but
dm_helpers_probe_acpi_edid() has a start that is 'char'.  The 3rd
block index starts after 255, so it can't be indexed properly.
This leads to problems with the display when the EDID is parsed.

[How]
Change the variable type to 'short' so that larger values can be indexed.

Cc: Renjith Pananchikkal <renjith.pananchikkal@amd.com>
Reported-by: Mark Pearson <mpearson@lenovo.com>
Suggested-by: David Ober <dober@lenovo.com>
Fixes: c6a837088b ("drm/amd/display: Fetch the EDID from _DDC if available for eDP")
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit a918bb4a90)
Cc: stable@vger.kernel.org
2025-04-22 16:48:44 -04:00
Felix Kuehling
a92741e72f drm/amdgpu: Allow P2P access through XGMI
If peer memory is accessible through XGMI, allow leaving it in VRAM
rather than forcing its migration to GTT on DMABuf attachment.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Tested-by: Hao (Claire) Zhou <hao.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 372c8d72c3)
2025-04-22 16:47:12 -04:00
Nicholas Susanto
756c85e4d0 drm/amd/display: Enable urgent latency adjustment on DCN35
[Why]

Urgent latency adjustment was disabled on DCN35 due to issues with P0
enablement on some platforms. Without urgent latency, underflows occur
when doing certain high timing configurations. After testing, we found
that reenabling urgent latency didn't reintroduce p0 support on multiple
platforms.

[How]

renable urgent latency on DCN35 and setting it to 3000 Mhz.

This reverts commit 3412860cc4.

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Nicholas Susanto <nsusanto@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit cd74ce1f0c)
2025-04-22 16:46:34 -04:00
Roman Li
67fe574651 drm/amd/display: Force full update in gpu reset
[Why]
While system undergoing gpu reset always do full update
to sync the dc state before and after reset.

[How]
Return true in should_reset_plane() if gpu reset detected

Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2ba8619b9a)
Cc: stable@vger.kernel.org
2025-04-22 16:45:50 -04:00
Roman Li
7eb287beeb drm/amd/display: Fix gpu reset in multidisplay config
[Why]
The indexing of stream_status in dm_gpureset_commit_state() is incorrect.
That leads to asserts in multi-display configuration after gpu reset.

[How]
Adjust the indexing logic to align stream_status with surface_updates.

Fixes: cdaae8371a ("drm/amd/display: Handle GPU reset for DC block")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3808
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d91bc90139)
Cc: stable@vger.kernel.org
2025-04-22 16:45:10 -04:00
Felix Kuehling
5e56935b51 drm/amdgpu: Don't pin VRAM without DMABUF_MOVE_NOTIFY
Pinning of VRAM is for peer devices that don't support dynamic attachment
and move notifiers. But it requires that all such peer devices are able to
access VRAM via PCIe P2P. Any device without P2P access requires migration
to GTT, which fails if the memory is already pinned for another peer
device.

Sharing between GPUs should not require pinning in VRAM. However, if
DMABUF_MOVE_NOTIFY is disabled in the kernel build, even DMABufs shared
between GPUs must be pinned, which can lead to failures and functional
regressions on systems where some peer GPUs are not P2P accessible.

Disable VRAM pinning if move notifiers are disabled in the kernel build
to fix regressions when sharing BOs between GPUs.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Tested-by: Hao (Claire) Zhou <hao.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 05185812ae)
2025-04-22 16:44:28 -04:00
Felix Kuehling
5cf3c602df drm/amdgpu: Use allowed_domains for pinning dmabufs
When determining the domains for pinning DMABufs, filter allowed_domains
and fail with a warning if VRAM is forbidden and GTT is not an allowed
domain.

Fixes: f5e7fabd1f ("drm/amdgpu: allow pinning DMA-bufs into VRAM if all importers can do P2P")
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 3940796a6e)
2025-04-22 16:44:02 -04:00
Sunil Khatri
127e612bf1 drm/amdgpu: update fence ptr with context:seqno
log context:seqno of the fence during timeout rather
than logging fence pointer.

Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:46 -04:00
Arvind Yadav
cade59abaa drm/amdgpu/gfx12: Add fw minimum version check for usermode queue
This patch is load usermode queue based on FW support for gfx12.
CP Ucode FW Vesion: [PFP = 2840, ME = 2780, MEC = 3050, MES = 123]

v2: Addressed review comments from Alex
   - Just check the firmware versions directly.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:46 -04:00
Arvind Yadav
61ca97e959 drm/amdgpu/gfx11: Add fw minimum version check for usermode queue
This patch is load usermode queue based on FW support for gfx11.
CP Ucode FW version: [PFP = 2530, ME = 2390, MEC = 2600, MES = 120]

v2: Addressed review comments from Alex.
    - Just check the firmware versions directly.
v3: Firmware version checks only for Navi3x(by Alex).

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:46 -04:00
Srinivasan Shanmugam
3f397cd203 drm/amd/display: Add NULL pointer checks in dm_force_atomic_commit()
This commit updates the dm_force_atomic_commit function to replace the
usage of PTR_ERR_OR_ZERO with IS_ERR for checking error states after
retrieving the Connector (drm_atomic_get_connector_state), CRTC
(drm_atomic_get_crtc_state), and Plane (drm_atomic_get_plane_state)
states.

The function utilized PTR_ERR_OR_ZERO for error checking. However, this
approach is inappropriate in this context because the respective
functions do not return NULL; they return pointers that encode errors.

This change ensures that error pointers are properly checked using
IS_ERR before attempting to dereference.

Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:46 -04:00
Alex Deucher
42a6667780 drm/amdgpu/userq: use consistent function naming
s/userqueue/userq/

1. remove the mix of amdgpu_userqueue and amdgpu_userq
2. to be consistent with other amdgpu_userq_fence.c
3. it's shorter

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:46 -04:00
Alex Deucher
4fdbe3a623 drm/amdgpu/userq: rename eviction helpers
suspend/resume -> evict/restore

Rename to avoid confusion with the system suspend
and resume helpers.

v2: update error messages

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:46 -04:00
Alex Deucher
d13e95967e drm/amdgpu/userq: move waiting for last fence before umap
Need to wait for the last fence before unmapping.  This
also fixes a memory leak in amdgpu_userqueue_cleanup()
when the fence isn't signalled.

Fixes: b0db33c8c5 ("drm/amdgpu/userq: rework front end call sequence")
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:45 -04:00
Alex Deucher
36b0bc1731 drm/amdgpu/userq: unmap queues amdgpu_userq_mgr_fini()
This was missed when the map and unmap were split out
of the mqd create and destroy functions.

Fixes: b0db33c8c5 ("drm/amdgpu/userq: rework front end call sequence")
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:45 -04:00
Alex Deucher
e67b95f0cd drm/amdgpu: switch from queue_active to queue state
Track the state of the queue rather than simple active vs
not.  This is needed for other states (hung, preempted, etc.).
While we are at it, move the state tracking into the user
queue front end code.

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:45 -04:00
Alex Deucher
ba324ffb25 drm/amdgpu/userq: optimize enforce isolation and s/r
If user queues are disabled for all IPs in the case
of suspend and resume and for gfx/compute in the case
of enforce isolation, we can return early.

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:45 -04:00
Dr. David Alan Gilbert
d61724056a drm/amd/display: Remove unused *vbios_smu_set_dprefclk
rn_vbios_smu_set_dprefclk() was added in 2019 by
commit 4edb6fc918 ("drm/amd/display: Add Renoir clock manager")
rv1_vbios_smu_set_dprefclk() was also added in 2019 by
commit dc88b4a684 ("drm/amd/display: make clk mgr soc specific")

neither have been used.

Remove them.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:45 -04:00
Xiang Liu
696e8fa354 drm/amdgpu: Print kernel message when error logged by scrub
Print a kernel message when the scrub bit of status register is set to
indicate that errors are being logged by the scrub.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:45 -04:00
Gergo Koteles
20232192a5 drm/amd/display: do not copy invalid CRTC timing info
Since b255ce4388, it is possible that the CRTC timing
information for the preferred mode has not yet been
calculated while amdgpu_dm_connector_mode_valid() is running.

In this case use the CRTC timing information of the actual mode.

Fixes: b255ce4388 ("drm/amdgpu: don't change mode in amdgpu_dm_connector_mode_valid()")
Closes: https://lore.kernel.org/all/ed09edb167e74167a694f4854102a3de6d2f1433.camel@irl.hu/
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4085
Signed-off-by: Gergo Koteles <soyer@irl.hu>
Reviewed-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:45 -04:00
TungYu Lu
33bc89949b drm/amd/display: Correct prefetch calculation
[Why]
The minimum value of the dst_y_prefetch_equ was not correct
in prefetch calculation whice causes OPTC underflow.

[How]
Add the min operation of dst_y_prefetch_equ in prefetch calculation
for legacy DML.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: TungYu Lu <tungyu.lu@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:45 -04:00
Dillon Varone
19e743f0fb drm/amd/display: Refactor SubVP cursor limiting logic
[WHY]
There are several gaps that can result in SubVP being enabled with
incompatible HW cursor sizes, and unjust restrictions to cursor size due
to wrong predictions on future usage of SubVP

[HOW]
- remove "prediction" logic in favor of tagging based on previous SubVP
  usage
- block SubVP if current HW cursor settings are incompatible
- provide interface for DM to determine if HW cursor should be disabled
  due to an attempt to enable SubVP

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:45 -04:00
Alex Deucher
11772eb73b drm/amdgpu/userq: add a helper to check which IPs are enabled
Add a helper to get a mask of IPs which support user queues.
Use this in the INFO IOCTL to get the IP mask to replace
the current code.

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:45 -04:00
Arunpravin Paneer Selvam
4b27406380 drm/amdgpu: Add queue id support to the user queue wait IOCTL
Add queue id support to the user queue wait IOCTL
drm_amdgpu_userq_wait structure.

This is required to retrieve the wait user queue and maintain
the fence driver references in it so that the user queue in
the same context releases their reference to the fence drivers
at some point before queue destruction.

Otherwise, we would gather those references until we
don't have any more space left and crash.

v2: Modify the UAPI comment as per the mesa and libdrm UAPI comment.

Libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/408
Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34493

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:44 -04:00
Alex Deucher
4ec2141d23 drm/amdgpu/userq: enable support for secure queues
Enable users to create secure GFX/compute queues.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Tested-by: Jesse.Zhang <Jesse.zhang@amd.com>
Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:44 -04:00
Alex Deucher
87ceff6136 drm/amdgpu/userq/mes: pass the secure flag to mqd init
So that we initialize the MQD as a secure queue.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:44 -04:00
Meenakshikumar Somasundaram
8aaeb25327 drm/amd/display: Fix pixel rate divider policy for 1 pixel per cycle config
[Why]
Pixel rate dividor was not programmed correctly for 1 pixel per cycle
configuration for empty tu case.

[How]
Included check for empty tu when pixel rate dividor values were selected.

Reviewed-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:44 -04:00
Leo Li
8f772d79ef drm/amd/display: Default IPS to RCG_IN_ACTIVE_IPS2_IN_OFF
[Why]

Recent findings show negligible power savings between IPS2 and RCG
during static desktop. In fact, DCN related clocks are higher
when IPS2 is enabled vs RCG.

RCG_IN_ACTIVE is also the default policy for another OS supported by
DC, and it has faster entry/exit.

[How]

Remove previous logic that checked for IPS2 support, and just default
to `DMUB_IPS_RCG_IN_ACTIVE_IPS2_IN_OFF`.

Fixes: 199888aa25 ("drm/amd/display: Update IPS default mode for DCN35/DCN351")
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:44 -04:00
Charlene Liu
8e40dd9320 drm/amd/display: Revert "not disable dtb as dto src at dpms off"
[why]
not all the asic using the same code path.
need to revisit and limit the impact.

This reverts commit 32be4e39f4.

Reviewed-by: Gabe Teeger <gabe.teeger@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:44 -04:00
George Shen
1594b60d74 drm/amd/display: Use 16ms AUX read interval for LTTPR with old sinks
[Why/How]
LTTPR are required to program DPCD 0000Eh to 0x4 (16ms) upon AUX read
reply to this register. Since old Sinks witih DPCD rev 1.1 and earlier
may not support this register, assume the mandatory value is programmed
by the LTTPR to avoid AUX timeout issues.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:44 -04:00
Mario Limonciello
a918bb4a90 drm/amd/display: Fix ACPI edid parsing on some Lenovo systems
[Why]
The ACPI EDID in the BIOS of a Lenovo laptop includes 3 blocks, but
dm_helpers_probe_acpi_edid() has a start that is 'char'.  The 3rd
block index starts after 255, so it can't be indexed properly.
This leads to problems with the display when the EDID is parsed.

[How]
Change the variable type to 'short' so that larger values can be indexed.

Cc: Renjith Pananchikkal <renjith.pananchikkal@amd.com>
Reported-by: Mark Pearson <mpearson@lenovo.com>
Suggested-by: David Ober <dober@lenovo.com>
Fixes: c6a837088b ("drm/amd/display: Fetch the EDID from _DDC if available for eDP")
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:44 -04:00
Taimur Hassan
ce2e117bfb drm/amd/display: Promote DC to 3.2.329
Summary:

* Implement HDMI Read request
* RMCM and MCM 3DLUT support
* Enable urgent latency adjustment on DCN35
* Enable phy-ssc reduction by default

Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:35 -04:00
Felix Kuehling
372c8d72c3 drm/amdgpu: Allow P2P access through XGMI
If peer memory is accessible through XGMI, allow leaving it in VRAM
rather than forcing its migration to GTT on DMABuf attachment.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Tested-by: Hao (Claire) Zhou <hao.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 11:28:36 -04:00
Roman Li
e15d09f510 drm/amd/display: enable phy-ssc reduction by default
[Why]
Reduction of phy-ssc is needed to support DP2 high pixel clock on dcn35x/36.
There's a special flag to enable it in dmub hw params.

[How]
Set hbr3_phy_ssc to true for dcn35, dcn351 and dcn36.

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 11:28:30 -04:00
Nicholas Susanto
cd74ce1f0c drm/amd/display: Enable urgent latency adjustment on DCN35
[Why]

Urgent latency adjustment was disabled on DCN35 due to issues with P0
enablement on some platforms. Without urgent latency, underflows occur
when doing certain high timing configurations. After testing, we found
that reenabling urgent latency didn't reintroduce p0 support on multiple
platforms.

[How]

renable urgent latency on DCN35 and setting it to 3000 Mhz.

This reverts commit 3412860cc4.

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Nicholas Susanto <nsusanto@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 11:28:09 -04:00
Yihan Zhu
652968d996 drm/amd/display: DCN42 RMCM and MCM 3DLUT support
[WHY & HOW]
Providing hardware programming for the RMCM and MCM IPs for 3DLUT in DCN42.

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 11:28:05 -04:00
Yihan Zhu
c9646e5a7e drm/amd/display: DCN32 null data check
[WHY & HOW]
Avoid null curve data structure used in the cm block for the potential issue.

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 11:28:01 -04:00
Roman Li
2ba8619b9a drm/amd/display: Force full update in gpu reset
[Why]
While system undergoing gpu reset always do full update
to sync the dc state before and after reset.

[How]
Return true in should_reset_plane() if gpu reset detected

Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 11:27:56 -04:00
Roman Li
d91bc90139 drm/amd/display: Fix gpu reset in multidisplay config
[Why]
The indexing of stream_status in dm_gpureset_commit_state() is incorrect.
That leads to asserts in multi-display configuration after gpu reset.

[How]
Adjust the indexing logic to align stream_status with surface_updates.

Fixes: cdaae8371a ("drm/amd/display: Handle GPU reset for DC block")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3808
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 11:27:41 -04:00
Austin Zheng
8fc3959cd4 drm/amd/display: Move Mode Support Prefetch Checks To Its Own Function
[Why]
Large stack size observed in DCN4 mode support when compiling with clang.
Additional instrumentation added by compiler adds to stack size.
dml_core_mode_support ends up going over the stack size limit
due to the size of the function.

[How]
Move checks and calculations for prefetch to its own function.

Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 11:27:39 -04:00
Felix Kuehling
05185812ae drm/amdgpu: Don't pin VRAM without DMABUF_MOVE_NOTIFY
Pinning of VRAM is for peer devices that don't support dynamic attachment
and move notifiers. But it requires that all such peer devices are able to
access VRAM via PCIe P2P. Any device without P2P access requires migration
to GTT, which fails if the memory is already pinned for another peer
device.

Sharing between GPUs should not require pinning in VRAM. However, if
DMABUF_MOVE_NOTIFY is disabled in the kernel build, even DMABufs shared
between GPUs must be pinned, which can lead to failures and functional
regressions on systems where some peer GPUs are not P2P accessible.

Disable VRAM pinning if move notifiers are disabled in the kernel build
to fix regressions when sharing BOs between GPUs.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Tested-by: Hao (Claire) Zhou <hao.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 11:27:34 -04:00
Jack Chang
6df7175263 drm/amd/display: Move desync error counter operation up.
[Why & How]
Move desync error counter operation up to prevent
it from being skipped by force disable desync
error.

Reviewed-by: Robin Chen <robin.chen@amd.com>
Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Jack Chang <jack.chang@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 11:27:32 -04:00
Mario Limonciello
7e40f64896 drm/amd/display: Avoid divide by zero by initializing dummy pitch to 1
[Why]
If the dummy values in `populate_dummy_dml_surface_cfg()` aren't updated
then they can lead to a divide by zero in downstream callers like
CalculateVMAndRowBytes()

[How]
Initialize dummy value to a value to avoid divide by zero.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 11:27:29 -04:00
Chris Park
724a4b400b drm/amd/display: Implement HDMI Read Request
[Why]
Read Request provides alterative method to polling to
the HDMI sinks that support it.

[How]
Implement Read Request where interrupt can be generated
by the sink.

Reviewed-by: Joshua Aberback <joshua.aberback@amd.com>
Signed-off-by: Chris Park <chris.park@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 11:27:25 -04:00
Yiling Chen
f53d0f48a8 drm/amd/display: To apply the adjusted DP ref clock for DP devices
[Why]
For some pixel clock margin sensitive external monitor,
we could not keep original DP ref clock for the ASICs
supported SSC DP ref clock.

[How]
From slicon design team's comment,
we have to apply the adjusted DP ref clock for
DP devices.
DP 128b (DP2) signals uses the DTBCLK not DP ref.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Yiling Chen <yi-ling.chen2@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 11:26:59 -04:00
Alex Deucher
eec6444923 drm/amdgpu/gfx12: add support for TMZ queues to mqd_init
Set up TMZ for queues.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 11:00:43 -04:00
Alex Deucher
9486875408 drm/amdgpu/gfx11: add support for TMZ queues to mqd_init
Set up TMZ for queues.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 11:00:38 -04:00
Felix Kuehling
3940796a6e drm/amdgpu: Use allowed_domains for pinning dmabufs
When determining the domains for pinning DMABufs, filter allowed_domains
and fail with a warning if VRAM is forbidden and GTT is not an allowed
domain.

Fixes: f5e7fabd1f ("drm/amdgpu: allow pinning DMA-bufs into VRAM if all importers can do P2P")
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:59:05 -04:00
Alex Deucher
cb808ab833 drm/amdgpu: add tmz queue parameter to mqd props
Use this to track the whether we want TMZ for queues.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:59:01 -04:00
Srinivasan Shanmugam
d30f610762 drm/amdgpu: Refine Cleaner Shader MEC firmware version for GFX10.1.x GPUs
Update the minimum firmware version for the Cleaner Shader in the
gfx_v10_0_sw_init function.

This change adjusts the minimum required firmware version for the MEC
firmware from 152 to 151, allowing for broader compatibility with
GFX10.1 GPUs.

Fixes: 25961bad92 ("drm/amdgpu/gfx10: Add cleaner shader for GFX10.1.10")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:57:26 -04:00
Jesse.zhang@amd.com
2200b41428 drm/amdgpu:remove old sdma reset callback mechanism
This patch removes the deprecated SDMA reset callback mechanism, which was previously used to register pre-reset and post-reset callbacks for SDMA engine resets.
 The callback mechanism has been replaced with a more direct and efficient approach using `stop_queue` and `start_queue` functions in the ring's function table.

The SDMA reset callback mechanism allowed KFD and AMDGPU to register pre-reset and post-reset functions for handling SDMA engine resets.
However, this approach added unnecessary complexity and was no longer needed after the introduction of the `stop_queue` and `start_queue` functions in the ring's function table.

1. **Remove Callback Mechanism**:
   - Removed the `amdgpu_sdma_register_on_reset_callbacks` function and its associated data structures (`sdma_on_reset_funcs`).
   - Removed the callback registration logic from the SDMA v4.4.2 initialization code.

2. **Clean Up Related Code**:
   - Removed the `sdma_v4_4_2_set_engine_reset_funcs` function, which was used to register the callbacks.
   - Removed the `sdma_v4_4_2_engine_reset_funcs` structure, which contained the pre-reset and post-reset callback functions.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:57:22 -04:00
Sunil Khatri
9018c7fe68 drm/amdgpu/userq: add context and seqno of the fence
Add context and seqno of the fence in error logging
rather than printing fence ptr.

Reviewed-by: Christian König <christian.koenig@amd.com>
Suggested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:57:09 -04:00
Jesse.zhang@amd.com
6a07ac702f drm/amdgpu: optimize queue reset and stop logic for sdma_v5_2
This patch refactors the SDMA v5.2 queue reset and stop logic to improve
code readability, maintainability, and performance. The key changes include:

1. **Generalized `sdma_v5_2_gfx_stop` Function**:
   - Added an `inst_mask` parameter to allow stopping specific SDMA instances
     instead of all instances. This is useful for resetting individual queues.

2. **Simplified `sdma_v5_2_reset_queue` Function**:
   - Removed redundant loops and checks by directly using the `ring->me` field
     to identify the SDMA instance.
   - Reused the `sdma_v5_2_gfx_stop` function to stop the queue, reducing code
     duplication.

v1: The general coding style is to declare variables like "i" or "r" last. E.g. longest lines first and short lasts. (Chritian)

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:57:07 -04:00
Jesse.zhang@amd.com
574f4b5562 drm/amdgpu: optimize queue reset and stop logic for sdma_v5_0
This patch refactors the SDMA v5.0 queue reset and stop logic to improve
code readability, maintainability, and performance. The key changes include:

1. **Generalized `sdma_v5_0_gfx_stop` Function**:
   - Added an `inst_mask` parameter to allow stopping specific SDMA instances
     instead of all instances. This is useful for resetting individual queues.

2. **Simplified `sdma_v5_0_reset_queue` Function**:
   - Removed redundant loops and checks by directly using the `ring->me` field
     to identify the SDMA instance.
   - Reused the `sdma_v5_0_gfx_stop` function to stop the queue, reducing code
     duplication.

v1: The general coding style is to declare variables like "i" or "r" last. E.g. longest lines first and short lasts. (Chritian)

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:57:04 -04:00
Jesse.zhang@amd.com
47454f2dc0 drm/amdgpu: Register the new sdma function pointers for sdma_v5_2
Register stop/start/soft_reset queue functions for SDMA IP versions v5.2.

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:57:01 -04:00
Jesse.zhang@amd.com
e56d4bf57f drm/amdgpu/: drm/amdgpu: Register the new sdma function pointers for sdma_v5_0
Register stop/start/soft_reset queue functions for SDMA IP versions v5.0.

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:56:58 -04:00
Jesse.zhang@amd.com
5c3e7c4953 drm/amdgpu: Implement SDMA soft reset directly for v5.x
This patch introduces a new function `amdgpu_sdma_soft_reset` to handle SDMA soft resets directly,
rather than relying on the DPM interface.

1. **New `amdgpu_sdma_soft_reset` Function**:
   - Implements a soft reset for SDMA engines by directly writing to the hardware registers.
   - Handles SDMA versions 4.x and 5.x separately:
     - For SDMA 4.x, the existing `amdgpu_dpm_reset_sdma` function is used for backward compatibility.
     - For SDMA 5.x, the driver directly manipulates the `GRBM_SOFT_RESET` register to reset the specified SDMA instance.

2. **Integration into `amdgpu_sdma_reset_engine`**:
   - The `amdgpu_sdma_soft_reset` function is called during the SDMA reset process, replacing the previous call to `amdgpu_dpm_reset_sdma`.

v2: r should default to an error (Alex)

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:56:54 -04:00
Jesse.zhang@amd.com
b22659d5d3 drm/amdgpu: switch amdgpu_sdma_reset_engine to use the new sdma function pointers
Replace old callback mechanism with direct calls to stop/start functions.

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:56:49 -04:00
Alex Deucher
a5c34299d8 drm/amdgpu/userq: enable support for queue priorities
Enable users to create queues at different priority levels.
The highest level is restricted to drm master.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:56:46 -04:00
Alex Deucher
23a650bb9f drm/amdgpu/userq/mes: handle user queue priority
Handle the queue priority set by the user.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:56:42 -04:00
Alex Deucher
9546c05628 drm/amdgpu/userq: add priorty to user queue structure
So we can track this when we create user queues.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:56:39 -04:00
Alex Deucher
a83be6e479 drm/amdgpu/mes12: add conversion for priority levels
Convert driver priority levels to MES11 priority levels.
At the moment they are the same, but they may not always
be.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:56:36 -04:00
Alex Deucher
3d0a402e7c drm/amdgpu/mes11: add conversion for priority levels
Convert driver priority levels to MES11 priority levels.
At the moment they are the same, but they may not always
be.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:56:33 -04:00
Alex Deucher
fced8e7d2d drm/amdgpu: convert userq UAPI _pad to flags
Reuse the _pad field for flags.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:56:18 -04:00
Wentao Liang
6027cbee19 drm/amd/display: Add error check for avi and vendor infoframe setup function
The function fill_stream_properties_from_drm_display_mode() calls the
function drm_hdmi_avi_infoframe_from_display_mode() and the
function drm_hdmi_vendor_infoframe_from_display_mode(), but does
not check its return value. Log the error messages to prevent silent
failure if either function fails.

Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:56:13 -04:00
Alex Deucher
8f23a97907 drm/amdgpu/userq: integrate with enforce isolation
Enforce isolation serializes access to the GFX IP.  User
queues are isolated in the MES scheduler, but we still
need to serialize between kernel queues and user queues.
For enforce isolation, group KGD user queues with KFD user
queues.

v2: split out variable renaming, add config guards
v3: use new function names

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:56:11 -04:00
Alex Deucher
28fc3172e4 drm/amdgpu: rename enforce isolation variables
Since they will be used for both KFD and KGD user queues,
rename them from kfd to userq.  No intended functional
change.

Acked-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:56:06 -04:00
Alex Deucher
94976e7e5e drm/amdgpu/userq: add helpers to start/stop scheduling
This will be used to stop/start user queue scheduling for
example when switching between kernel and user queues when
enforce isolation is enabled.

v2: use idx
v3: only stop compute/gfx queues

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:55:59 -04:00
Alex Deucher
56a0a80af0 drm/amdgpu/userq: track the xcp_id associated with the queue
Track this to align with KFD for enforce isolation
handling.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:55:55 -04:00
Emily Deng
5ae4591f4e drm/amdgpu: Clear overflow for SRIOV
For VF, it doesn't have the permission to clear overflow, clear the bit
by reset.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:55:51 -04:00
Alex Deucher
fb20954c97 drm/amdgpu/userq: rework driver parameter
Replace disable_kq parameter with user_queue parameter.
The parameter has the following logic:
 -1 = auto (ASIC specific default)
  0 = user queues disabled
  1 = user queues enabled and kernel queues enabled (if supported)
  2 = user queues enabled and kernel queues disabled

The default behavior (-1) is currently the same as 0 for current
ASICs.  To enable user queues (in addition to kernel queues) set
user_queue=1. To enable user queues and disable kernel queues
(to make all resources available to user queues), set user_queue=2.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:55:47 -04:00
Asad Kamal
172494c4e9 drm/amd/pm: Enable host limit metrics support
Enable host limit metrics support for smuv_13_0_12

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:55:43 -04:00
Alex Deucher
0ed032dc7d drm/amdgpu/sdma7: properly reference trap interrupts for userqs
We need to take a reference to the interrupts to make
sure they stay enabled even if the kernel queues have
disabled them.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:55:39 -04:00
Alex Deucher
1197cfb730 drm/amdgpu/sdma6: properly reference trap interrupts for userqs
We need to take a reference to the interrupts to make
sure they stay enabled even if the kernel queues have
disabled them.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:55:36 -04:00
Asad Kamal
2c8b0d628a drm/amd/pm: Enable host limit metrics support
Enable host limit metrics support for smuv_13_0_6

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:55:31 -04:00
Sathishkumar S
b574729ff0 drm/amdgpu: Enable doorbell for JPEG5_0_1
Enable doorbell for JPEG5_0_1 and adjust index for VCN5_0_1.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:55:25 -04:00
Shiwu Zhang
8ae634f10e drm/amdgpu: Update vcn doorbell range in NBIO 7.9
Increase vcn doorbell range for gfx950 to 11.

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:55:10 -04:00
Alex Deucher
e10414cf2e drm/amdgpu/gfx12: properly reference EOP interrupts for userqs
Regardless of whether we disable kernel queues, we need
to take an extra reference to the pipe interrupts for
user queues to make sure they stay enabled in case we
disable them for kernel queues.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:55:04 -04:00
Alex Deucher
ac9984cee7 drm/amdgpu/gfx11: properly reference EOP interrupts for userqs
Regardless of whether we disable kernel queues, we need
to take an extra reference to the pipe interrupts for
user queues to make sure they stay enabled in case we
disable them for kernel queues.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:54:56 -04:00
Eric Huang
6b9d26089f drm/amdkfd: fix a bug of smi event for superuser
rocm-smi with superuser permission doesn't show some
of smi events, i.e. page fault/migration, because the
condition of "(events & all)" is false. Superuser
should be able to detect all events, the condiiton of
"(events & all)" seems redundant, so removing it will
fix the issue.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:54:44 -04:00
Alexandre Demers
00ec6732a9 drm/amdgpu: add missing DCE6 to dce_version_to_string()
Missing DCE 6.0 6.1 and 6.4 are identified as UNKNOWN. Fix this.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:54:41 -04:00
Alexandre Demers
85207abb40 drm/amdgpu: fix typo in bios_parser.c
Probably a cut and paste error from using get_integrated_info_v8's comment.
This has to be get_integrated_info_v9

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:54:36 -04:00
Alexandre Demers
f82e7cf5f5 drm/amdgpu: fix duplicated value setting in dce100_resource_construct()
i2c_speed_in_khz was set twice with the same values. Looking at other DCE
versions, we probably wanted to set the value for i2c_speed_in_khz_hdcp.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:54:30 -04:00
Alexandre Demers
3d5d0d35a7 drm/amdgpu: fix typo in atombios.h
"aligned" not "aligend"

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:54:24 -04:00
Alexandre Demers
66f6ea421a drm/amdgpu: add missing parameter name in dce110_clk_src_construct() declaration
While not needed per speaking, all the other parameters have names but
this one.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:54:22 -04:00
Alexandre Demers
34c86a0f44 drm/amdgpu: rename function to follow naming convention in dce110
The prefix dce110 is used on all functions, but init_pipes() and
init_hw(). Under DCN, these sames functions are prefixed.

Let's keep thing coherent.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:54:18 -04:00
Dan Carpenter
0e023c327b drm/amdgpu: Clean up error handling in amdgpu_userq_fence_driver_alloc()
1) Checkpatch complains if we print an error message for kzalloc()
   failure.  The kzalloc() failure already has it's own error messages
   built in.  Also this allocation is small enough that it is guaranteed
   to succeed.
2) Return directly instead of doing a goto free_fence_drv.  The
   "fence_drv" is already NULL so no cleanup is necessary.

Reviewed-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:54:10 -04:00
Dan Carpenter
8ff7c78bae drm/amdgpu: Fix double free in amdgpu_userq_fence_driver_alloc()
The goto frees "fence_drv" so this is a double free bug.  There is no
need to call amdgpu_seq64_free(adev, fence_drv->va) since the seq64
allocation failed so change the goto to goto free_fence_drv.  Also
propagate the error code from amdgpu_seq64_alloc() instead of hard coding
it to -ENOMEM.

Fixes: e7cf21fbb2 ("drm/amdgpu: Few optimization and fixes for userq fence driver")
Reviewed-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:52:53 -04:00
Alex Deucher
987718c559 drm/amdgpu/userq: move runpm handling into core userq code
Pull it out of the MES code and into the generic code.
It's not MES specific and needs to be applied to all user
queues regardless of the backend.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:52:49 -04:00
Eric Huang
9315860d05 drm/amdkfd: fix NULL check mistake for process smi event
The mistake will lead to NULL kernel oops, so fix it.

Fixes: 4172b556fd ("drm/amdkfd: add smi events for process start and end")
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:49:45 -04:00
Jesse.zhang@amd.com
ce1d40196d drm/amdgpu/sdma_v4: Register the new sdma function pointers
Register stop/start/soft_reset queue functions for sdma v4_4_2.

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:49:42 -04:00
Jesse.zhang@amd.com
2989184215 drm/amdgpu: Add the new sdma function pointers for amdgpu_sdma.h
This patch introduces new function pointers in the amdgpu_sdma structure
to handle queue stop, start and soft reset operations. These will replace
the older callback mechanism.

The new functions are:
- stop_kernel_queue: Stops a specific SDMA queue
- start_kernel_queue: Starts/Restores a specific SDMA queue
- soft_reset_kernel_queue: Performs soft reset on a specific SDMA queue

v2: Update stop_queue/start_queue function paramters to use ring pointer instead of device/instance(Chritian)
v3: move stop_queue/start_queue to struct amdgpu_sdma_instance and rename them. (Alex)
v4: rework the ordering a bit (Alex)

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:49:36 -04:00
Alex Deucher
94fc88f680 drm/amdgpu: don't swallow errors in amdgpu_userqueue_resume_all()
since we loop through the queues |= the errors.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:49:32 -04:00
Alex Deucher
c2c722217a drm/amdgpu/userq: handle system suspend and resume
Unmap user queues on suspend and map them on resume.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:49:29 -04:00
Alex Deucher
73e12e98ec drm/amdgpu/userq: add suspend and resume helpers
Add helpers to unmap and map user queues on suspend and
resume.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:49:26 -04:00
Alex Deucher
c0bbf64870 drm/amdgpu/userq: properly clean up userq fence driver on failure
If userq creation fails, we need to properly unwind and free the
user queue fence driver.

v2: free idr as well (Sunil)

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:49:22 -04:00
Alex Deucher
edc762a51c drm/amdgpu/userq: move some code around
Move some userq fence handling code into amdgpu_userq_fence.c.
This matches the other code in that file.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:49:20 -04:00
Alex Deucher
b0db33c8c5 drm/amdgpu/userq: rework front end call sequence
Split out the queue map from the mqd create call and split
out the queue unmap from the mqd destroy call.  This splits
the queue setup and teardown with the actual enablement
in the firmware.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:49:16 -04:00
Alex Deucher
51a9ea4551 drm/amdgpu/userq: rename suspend/resume callbacks
Rename to map and umap to better align with what is happening
at the firmware level and remove the extra level of indirection
in the MES userq code.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:49:10 -04:00
Alex Deucher
38feab2dea drm/amdgpu/userq/mes: remove unused header
This is unused so remove it.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:47:26 -04:00
Lijo Lazar
c235a71322 drm/amdgpu: Use the right function for hdp flush
There are a few prechecks made before HDP flush like a flush is not
required on APU bare metal. Using hdp callback directly bypasses those
checks. Use amdgpu_device_flush_hdp which takes care of prechecks.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1d9bff4cf8)
2025-04-16 15:57:46 -04:00
Alex Deucher
cd9e6d6fdd drm/amd/display/dml2: use vzalloc rather than kzalloc
The structures are large and they do not require contiguous
memory so use vzalloc.

Fixes: 70839da636 ("drm/amd/display: Add new DCN401 sources")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4126
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 20c50a9a79)
Cc: stable@vger.kernel.org
2025-04-16 15:55:37 -04:00
David Rosca
2036be3174 drm/amdgpu: Add back JPEG to video caps for carrizo and newer
JPEG is not supported on Vega only.

Fixes: 0a6e7b06bd ("drm/amdgpu: Remove JPEG from vega and carrizo video caps")
Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0f4dfe86fe)
Cc: stable@vger.kernel.org
2025-04-16 15:55:00 -04:00
ZhenGuo Yin
e7afa85a0d drm/amdgpu: fix warning of drm_mm_clean
Kernel doorbell BOs needs to be freed before ttm_fini.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4145
Fixes: 54c30d2a8d ("drm/amdgpu: create kernel doorbell pages")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 39938a8ed9)
Cc: stable@vger.kernel.org
2025-04-16 15:52:49 -04:00
Mario Limonciello
1657793def drm/amd: Forbid suspending into non-default suspend states
On systems that default to 'deep' some userspace software likes
to try to suspend in 'deep' first.  If there is a failure for any
reason (such as -ENOMEM) the failure is ignored and then it will
try to use 's2idle' as a fallback. This fails, but more importantly
it leads to graphical problems.

Forbid this behavior and only allow suspending in the last state
supported by the system.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4093
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250408180957.4027643-1-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2aabd44aa8)
2025-04-16 15:52:20 -04:00
Christian König
447fab3095 drm/amdgpu: use a dummy owner for sysfs triggered cleaner shaders v4
Otherwise triggering sysfs multiple times without other submissions in
between only runs the shader once.

v2: add some comment
v3: re-add missing cast
v4: squash in semicolon fix

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8b2ae7d492)
2025-04-16 15:51:47 -04:00
Dave Airlie
683058df13 Merge tag 'drm-misc-next-2025-04-09' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.16-rc1:

UAPI Changes:
- Add ASAHI uapi header!
- Add apple fourcc modifiers.
- Add capset virtio definitions to UAPI.
- Extend EXPORT_SYNC_FILE for timeline syncobjs.

Cross-subsystem Changes:
- Adjust DMA-BUF sg handling to not cache map on attach.
- Update drm/ci, hlcdc, virtio, maintainers.
- Update fbdev todo.
- Allow setting dma-device for dma-buf import.
- Export efi_mem_desc_lookup to make efidrm build as a module.

Core Changes:
- Update drm scheduler docs.
- Use the correct resv object in TTM delayed destroy.
- Fix compiler warning with panic qr code, and other small fixes.
- drm/ci updates.
- Add debugfs file for listing all bridges.
- Small fixes to drm/client, ttm tests.
- Add documentation to display/hdmi.
- Add kunit tests for bridges.
- Dont fail managed device probing if connector polling fails.
- Create Kconfig.debug for drm core.
- Add tests for the drm scheduler.
- Add and use new access helpers for DPCPD.
- Add generic and optimized conversions for format-helper.
- Begin refcounting panel for improving lifetime handling.
- Unify simpledrm and ofdrm sysfb, and add extra features.
- Split hdmi audio in bridge to make DP audio work.

Driver Changes:
- Convert drivers to use devm_platform_ioremap_resource().
- Assorted small fixes to imx/legacy-bridg, gma500, pl111, nouveau, vc4,
  vmwgfx, ast, mxsfb, xlnx, accel/qaic, v3d, bridge/imx8qxp-ldb, ofdrm,
  bridge/fsl-ldb, udl, bridge/ti-sn65dsi86, bridge/anx7625, cirrus-qemu,
  bridge/cdns-dsi, panel/sharp, panel/himax, bridge/sil902x, renesas,
  imagination, various panels.
- Allow attaching more display to vkms.
- Add Powertip PH128800T004-ZZA01 panel.
- Add rotation quirk for ZOTAC panel.
- Convert bridge/tc358775 to atomic.
- Remove deprecated panel calls from synaptics, novatek, samsung panels.
- Refactor shmem helper page pinning and accel drivers using it.
- Add dmabuf support to accel/amdxdna.
- Use 4k page table format for panfrost/mediatek.
- Add common powerup/down dp link helper and use it.
- Assorted compiler warning fixes.
- Support dma-buf import for renesas

Signed-off-by: Dave Airlie <airlied@redhat.com>

# Conflicts:
#	include/drm/drm_kunit_helpers.h
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/e147ff95-697b-4067-9e2e-7cbd424e162a@linux.intel.com
2025-04-14 15:29:49 +10:00
Shane Xiao
c3abed53ca drm/amdkfd: Add rec SDMA engines support with limited XGMI
This patch adds recommended SDMA engines with limited XGMI SDMA engines.
It will help improve overall performance for device to device copies
with this optimization.

v2: Update the formatting issues and data type

Signed-off-by: Shane Xiao <shane.xiao@amd.com>
Suggested-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-13 09:56:32 -04:00
Srinivasan Shanmugam
083a0c8d17 drm/amdgpu: Enhance Cleaner Shader Handling in GFX v9.0 Architecture v2
This commit modifies the gfx_v9_0_ring_emit_cleaner_shader function
to use a switch statement for cleaner shader emission based on the
specific GFX IP version.

The function now distinguishes between different IP versions, using
PACKET3_RUN_CLEANER_SHADER_9_0 for the versions 9.0.1, 9.1.0,
9.2.1, 9.2.2, 9.3.0, and 9.4.0, while retaining
PACKET3_RUN_CLEANER_SHADER for version 9.4.2.

v2: Simplify logic (Alex).

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 17:03:06 -04:00
Srinivasan Shanmugam
8896abcfdd drm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER_9_0 for Cleaner Shader execution
This commit introduces the PACKET3_RUN_CLEANER_SHADER_9_0 definition,
which is a command packet utilized to instruct the GPU to execute the
cleaner shader for the GFX9.0 graphics architecture.

The cleaner shader is a piece of GPU code that is responsible for
clearing or initializing essential GPU resources, such as Local Data
Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar
General Purpose Registers (SGPRs). Properly clearing these resources  is
vital for ensuring data isolation and security between different
workloads executed on the GPU.

When the GPU receives this packet, it fetches and runs the cleaner
shader instructions from the specified location in the packet.  Thus by
preventing data leaks and ensuring that previous job states do not
interfere with subsequent workloads.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 17:03:02 -04:00
Jesse Zhang
cf93f10101 drm/amd/amdgpu: Fix out of bounds warning in amdgpu_hw_ip_info
Fix an array index out of bounds warning in the DMA IP case of
amdgpu_hw_ip_info() where it was incorrectly checking
adev->gfx.gfx_ring[i].no_user_submission instead of
adev->sdma.instance[i].ring.no_user_submission.

The mismatch caused UBSAN to report an array bounds violation since
it was accessing the GFX ring array with SDMA instance indices.

Fixes: 4310acd446 ("drm/amdgpu: add ring flag for no user submissions")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 17:02:17 -04:00
Eric Huang
4172b556fd drm/amdkfd: add smi events for process start and end
rocm-smi will be able to show the events for KFD process
start/end, it is the implementation of this feature.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 17:01:25 -04:00
Lijo Lazar
1d9bff4cf8 drm/amdgpu: Use the right function for hdp flush
There are a few prechecks made before HDP flush like a flush is not
required on APU bare metal. Using hdp callback directly bypasses those
checks. Use amdgpu_device_flush_hdp which takes care of prechecks.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 17:01:06 -04:00
Ellen Pan
5045c6c698 drm/amdgpu: Direct ret in ras_reset_err_cnt on VF
With adding sriov_vf check, we directly return EOPNOTSUPP in
ras_reset_error_count as we should not do anything on VF to reset RAS error
count.

This also fixes the issue that loading guest driver causes register
violations.

Reviewed-by: Ahmad Rehman <Ahmad.Rehman@amd.com>
Signed-off-by: Ellen Pan <yunru.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 17:01:00 -04:00
Lijo Lazar
18a878fd8a drm/amdgpu: Use generic hdp flush function
Except HDP v5.2 all use a common logic for HDP flush. Use a generic
function. HDP v5.2 forces NO_KIQ logic, revisit it later.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 17:00:56 -04:00
Candice Li
d6b22b1dff drm/amdgpu: Set RAS EEPROM table version to v3 for umc v12_5
Set RAS EEPROM table version to v3 for umc v12_5.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 17:00:50 -04:00
Jesse.zhang@amd.com
e21e1e8bb8 drm/amdgpu: Enable per-queue reset for SDMA v4.4.2 on IP v9.5.0
Add support for per-queue reset on SDMA v4.4.2 when running with:
1. MEC firmware version 17 or later
2. DPM indicates SDMA reset is supported

v2: Fixed supported firmware versions  (Lijo)

Signed-off-by: Jesse.Zhang <Jesse.zhang@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 17:00:44 -04:00
Srinivasan Shanmugam
e62a8bc5d6 drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.5.2/11.5.3 GPUs
Enable the cleaner shader for additional GFX11.5.2/11.5.3 series GPUs to
ensure data isolation among GPU tasks. The cleaner shader is tasked with
clearing the Local Data Store (LDS), Vector General Purpose Registers
(VGPRs), and Scalar General Purpose Registers (SGPRs), which helps avoid
data leakage and guarantees the accuracy of computational results.

This update extends cleaner shader support to GFX11.5.2/11.5.3 GPUs,
previously available for GFX11.0.3. It enhances security by clearing GPU
memory between processes and maintains a consistent GPU state across KGD
and KFD workloads.

Cc: Mario Sopena-Novales <mario.novales@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 17:00:39 -04:00
Alex Deucher
20c50a9a79 drm/amd/display/dml2: use vzalloc rather than kzalloc
The structures are large and they do not require contiguous
memory so use vzalloc.

Fixes: 70839da636 ("drm/amd/display: Add new DCN401 sources")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4126
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 17:00:02 -04:00
Roman Li
309d11b4bb drm/amd/display: Add htmldocs description for fused_io interface
[Why]
htmldocs build warning: "Function parameter or struct member 'fused_io'
not described in 'amdgpu_display_manager'".

[How]
Add missing description.

Fixes: ce801e5d6c ("drm/amd/display: HDCP Locality check using DMUB Fused IO")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 16:58:19 -04:00
Alex Deucher
2e0454b730 drm/amdgpu: adjust enforce_isolation handling
Switch from a bool to an enum and allow more options
for enforce isolation.  There are now 3 modes of operation:
- Disabled (0)
- Enabled (serialization and cleaner shader) (1)
- Enabled in legacy mode (no serialization or cleaner shader) (2)
This provides better flexibility for more use cases.

Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 16:58:15 -04:00
Alex Deucher
b86fd212f3 drm/amdgpu/mes12: use the device value for enforce isolation
Use the local setting rather than the global parameter.

Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 16:58:12 -04:00
Alex Deucher
a7bb01337f drm/amdgpu/mes11: use the device value for enforce isolation
Use the local setting rather than the global parameter.

Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 16:58:04 -04:00
David Rosca
0f4dfe86fe drm/amdgpu: Add back JPEG to video caps for carrizo and newer
JPEG is not supported on Vega only.

Fixes: 0a6e7b06bd ("drm/amdgpu: Remove JPEG from vega and carrizo video caps")
Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 16:56:44 -04:00
Prike Liang
9a218d6f47 drm/amdgpu/gfx12: Implement the GFX12 KCQ pipe reset
Implement the GFX12 KCQ pipe reset, and disable the GFX12
kernel compute queue until the CPFW fully supports it.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 16:56:07 -04:00
Ce Sun
732c6cefc1 drm/amdgpu: Replace tmp_adev with hive in amdgpu_pci_slot_reset
Checking hive is more readable.

The following smatch warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:6820 amdgpu_pci_slot_reset()
warn: iterator used outside loop: 'tmp_adev'

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 16:55:55 -04:00
ZhenGuo Yin
39938a8ed9 drm/amdgpu: fix warning of drm_mm_clean
Kernel doorbell BOs needs to be freed before ttm_fini.

Fixes: 54c30d2a8d ("drm/amdgpu: create kernel doorbell pages")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 16:54:57 -04:00
Kenneth Feng
c770ef1967 drm/amd/amdgpu: disable ASPM in some situations
disable ASPM with some ASICs on some specific platforms.
required from PCIe controller owner.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 16:54:48 -04:00
Prike Liang
0ec7535f5b drm/amdgpu: remove the duplicated mes queue active state setting
The MES queue deactivation and active status are already set in
mes_userq_unmap|map(), so the caller needn't set the queue_active
bit again.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 16:54:39 -04:00
Ruili Ji
f0ec5926da amd/amdgpu: Implement VCN queue reset for vcn 4.0.3
Add function for vcn queue reset to make driver to
do fine-grained reset instead of the whole gpu reset.

Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Ruili Ji <ruiliji2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 16:54:27 -04:00
Masha Grinman
8ef4e99674 drm/amdgpu: Move read of snoop register from guest to host
Guest is reading/writing to snoop register which is a security violation
We moved the code to the host driver
And also added a validation on the guest side to check if it's guest

Signed-off-by: Masha Grinman <Masha.Grinman@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 16:54:02 -04:00
Mario Limonciello
2aabd44aa8 drm/amd: Forbid suspending into non-default suspend states
On systems that default to 'deep' some userspace software likes
to try to suspend in 'deep' first.  If there is a failure for any
reason (such as -ENOMEM) the failure is ignored and then it will
try to use 's2idle' as a fallback. This fails, but more importantly
it leads to graphical problems.

Forbid this behavior and only allow suspending in the last state
supported by the system.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4093
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250408180957.4027643-1-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 16:53:08 -04:00
Christian König
8b2ae7d492 drm/amdgpu: use a dummy owner for sysfs triggered cleaner shaders v4
Otherwise triggering sysfs multiple times without other submissions in
between only runs the shader once.

v2: add some comment
v3: re-add missing cast
v4: squash in semicolon fix

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 16:52:33 -04:00
Dave Airlie
47271a0cae amd-drm-fixes-6.15-2025-04-09:
amdgpu:
 - MES FW version caching fixes
 - Only use GTT as a fallback if we already have a backing store
 - dma_buf fix
 - IP discovery fix
 - Replay and PSR with VRR fix
 - DC FP fixes
 - eDP fixes
 - KIQ TLB invalidate fix
 - Enable dmem groups support
 - Allow pinning VRAM dma bufs if imports can do P2P
 - Workload profile fixes
 - Prevent possible division by 0 in fan handling
 
 amdkfd:
 - Queue reset fixes
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZ/akUAAKCRC93/aFa7yZ
 2HtpAP9ptJj5RKxOcdh8lbiOZN3RGjxsqw6wXk7LUnw3A2npVAD5AXoaTis4S84e
 0NIY7bd1Q1njP2akA4O5AKykkpgVtgw=
 =Ha4H
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-fixes-6.15-2025-04-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.15-2025-04-09:

amdgpu:
- MES FW version caching fixes
- Only use GTT as a fallback if we already have a backing store
- dma_buf fix
- IP discovery fix
- Replay and PSR with VRR fix
- DC FP fixes
- eDP fixes
- KIQ TLB invalidate fix
- Enable dmem groups support
- Allow pinning VRAM dma bufs if imports can do P2P
- Workload profile fixes
- Prevent possible division by 0 in fan handling

amdkfd:
- Queue reset fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250409165238.1180153-1-alexander.deucher@amd.com
2025-04-10 17:14:02 +10:00
Alex Deucher
34779e1446 drm/amdgpu/mes12: optimize MES pipe FW version fetching
Don't fetch it again if we already have it.  It seems the
registers don't reliably have the value at resume in some
cases.

Fixes: 785f0f9fe7 ("drm/amdgpu: Add mes v12_0 ip block support (v4)")
Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9e7b08d239)
Cc: stable@vger.kernel.org # 6.12.x
2025-04-09 10:53:11 -04:00
Denis Arefev
7ba88b5ccc drm/amd/pm/smu11: Prevent division by zero
The user can set any speed value.
If speed is greater than UINT_MAX/8, division by zero is possible.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 1e866f1fe5 ("drm/amd/pm: Prevent divide by zero")
Signed-off-by: Denis Arefev <arefev@swemel.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit da7dc714a8)
Cc: stable@vger.kernel.org
2025-04-09 10:53:11 -04:00
Alex Deucher
35a5440832 drm/amdgpu: cancel gfx idle work in device suspend for s0ix
This is normally handled in the gfx IP suspend callbacks, but
for S0ix, those are skipped because we don't want to touch
gfx.  So handle it in device suspend.

Fixes: b9467983b7 ("drm/amdgpu: add dynamic workload profile switching for gfx10")
Fixes: 963537ca23 ("drm/amdgpu: add dynamic workload profile switching for gfx11")
Fixes: 5f95a15495 ("drm/amdgpu: add dynamic workload profile switching for gfx12")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 906ad45167)
Cc: stable@vger.kernel.org
2025-04-09 10:53:11 -04:00
Kenneth Feng
50f29ead1f drm/amd/display: pause the workload setting in dm
Pause the workload setting in dm when doing idle optimization

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b23f81c442)
2025-04-09 10:53:11 -04:00
Alex Deucher
c81a3ceedb drm/amdgpu/pm/swsmu: implement pause workload profile
Add the callback for implementation for swsmu.

Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 92e511d1ce)
2025-04-09 10:53:11 -04:00
Alex Deucher
7f991dd364 drm/amdgpu/pm: add workload profile pause helper
To be used for display idle optimizations when
we want to pause non-default profiles.

Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6dafb5d4c7)
2025-04-09 10:53:11 -04:00
Alex Deucher
72801504fd drm/amdgpu/sdma7: add support for disable_kq
When the parameter is set, disable user submissions
to kernel queues.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:23 -04:00
Alex Deucher
fcf5eb979a drm/amdgpu/sdma6: add support for disable_kq
When the parameter is set, disable user submissions
to kernel queues.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:23 -04:00
Alex Deucher
1d65006fc1 drm/amdgpu/sdma: add flag for tracking disable_kq
For SDMA, we still need kernel queues for paging so
they need to be initialized, but we no not want to
accept submissions from userspace when disable_kq
is set.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:23 -04:00
Alex Deucher
0981e0ef18 drm/amdgpu/gfx12: add support for disable_kq
Plumb in support for disabling kernel queues.

v2: use ring counts per Felix' suggestion
v3: fix stream fault handler, enable EOP interrupts
v4: fix MEC interrupt offset (Sunil)
v5: clean up after removing extra sched.ready settings

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:23 -04:00
Alex Deucher
1e63ebc0d4 drm/amdgpu/gfx11: add support for disable_kq
Plumb in support for disabling kernel queues in
GFX11.  We have to bring up a GFX queue briefly in
order to initialize the clear state.  After that
we can disable it.

v2: use ring counts per Felix' suggestion
v3: fix stream fault handler, enable EOP interrupts
v4: fix MEC interrupt offset (Sunil)

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:22 -04:00
Alex Deucher
1f61fc28b9 drm/amdgpu/mes: make more vmids available when disable_kq=1
If we don't have kernel queues, the vmids can be used by
the MES for user queues.

Acked-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:22 -04:00
Alex Deucher
acdc43f270 drm/amdgpu/mes: update hqd masks when disable_kq is set
Make all resources available to user queues.

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Suggested-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:22 -04:00
Alex Deucher
f091fa777b drm/amdgpu/gfx: add generic handling for disable_kq
Add proper checks for disable_kq functionality in
gfx helper functions.  Add special logic for families
that require the clear state setup.

v2: use ring count as per Felix suggestion
v3: fix num_gfx_rings handling in amdgpu_gfx_graphics_queue_acquire()
v4: fix error code (Alex)

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:22 -04:00
Alex Deucher
4310acd446 drm/amdgpu: add ring flag for no user submissions
This would be set by IPs which only accept submissions
from the kernel, not userspace, such as when kernel
queues are disabled. Don't expose the rings to userspace
and reject any submissions in the CS IOCTL.

v2: fix error code (Alex)

Reviewed-by: Sunil Khatri<sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:22 -04:00
Alex Deucher
a96a787d6d drm/amdgpu: add parameter to disable kernel queues
On chips that support user queues, setting this option
will disable kernel queues to be used to validate
user queues without kernel queues.

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:22 -04:00
Alex Deucher
cf97de5b54 drm/amdgpu/userq: prevent runtime pm when userqs are active
Similar to KFD, prevent runtime pm while user queues are active.

Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:22 -04:00
Alex Deucher
4ce60dbada drm/amdgpu: store userq_managers in a list in adev
So we can iterate across them when we need to manage
all user queues.

v2: add uq_mgr to adev list in amdgpu_userq_mgr_init

Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:22 -04:00
Alex Deucher
100b6010d7 drm/amdgpu: bump version for user queue IP support query
Add the user queue IP support query to the drm_amdgpu_info_device
query.

Cc: marek.olsak@amd.com
Cc: prike.liang@amd.com
Cc: sunil.khatri@amd.com
Cc: yogesh.mohanmarimuthu@amd.com
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:22 -04:00
Alex Deucher
1af6881263 drm/amdgpu: add UAPI to query if user queues are supported
Add an INFO query to check if user queues are supported.

v2: switch to a mask of IPs (Marek)
v3: move to drm_amdgpu_info_device (Marek)

Cc: marek.olsak@amd.com
Cc: prike.liang@amd.com
Cc: sunil.khatri@amd.com
Cc: yogesh.mohanmarimuthu@amd.com
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:22 -04:00
Alex Deucher
a4a3373da2 drm/amdgpu/gfx12: split userq setup to a separate switch
Add a separate switch statement for the userq callback
assignment so that we can assign the callbacks for each
asic as the firmware becomes available.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:22 -04:00
Alex Deucher
9983ed9693 drm/amdgpu/gfx11: clean up and consolidate sw_init
With the ME details fixed, we can now consolidate
this state.  Also split out the userq setup into a separate
switch statement so that we can set them per IP version
when the firmwares are ready.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:22 -04:00
Arvind Yadav
32bd8b3ea7 drm/amdgpu: Fix display freezing issue when resizing apps
The display is freezing because the amdgpu_userq_wait_ioctl()
is waiting for a non-user queue fence(specifically, the PT update fence).

RootCause:
The resume_work is initiated by both amdgpu_userq_suspend and
amdgpu_userqueue_ensure_ev_fence at same time. The amdgpu_userq_suspend
signals a dma-fence and subsequently triggers the resume_work, which is
intended to replace the existing fence by creating new dma-fence. However,
following this, the amdgpu_userqueue_ensure_ev_fence schedules another
resume_work that generates a new dma-fence, thereby replacing the one
created by amdgpu_userq_suspend. Consequently, the original fence will
never be signaled.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:21 -04:00
Alex Deucher
b6f190e623 drm/amdgpu/mes: warn on unexpected pipe numbers
Warn if the number of pipes exceeds what the MES supports.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:21 -04:00