2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

15900 Commits

Author SHA1 Message Date
Alexandre Demers
9101b84f8c drm/amdgpu: use "irq" in place of "interrupt" in DCE6/8 as in DCE10/11
"interrupt" becomes "irq" in:
dce_vX_0_set_hpd_interrupt_state()
dce_vX_0_set_crtc_interrupt_state()
dce_vX_0_set_pageflip_interrupt_state()

It is easier when going through the code to just change the DCE number in
the functions' name to find and compare them across DCE versions.

Also, it standardizes function mapping inside a given structure where .set
and .process are both set to functions with a "_irq" suffix.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:13 -04:00
Alexandre Demers
160e6f5108 drm/amdgpu: fix typos in DCEs
In DCE6, DCE8, DCE10, DCE11, "hdp" is replaced by "hpd" and
replace "type" by "hpd" for a uniform parameter naming usage across DCEs.

In link_factory.c, there is a missing "p" to "types"

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:13 -04:00
Alex Deucher
9e7b08d239 drm/amdgpu/mes12: optimize MES pipe FW version fetching
Don't fetch it again if we already have it.  It seems the
registers don't reliably have the value at resume in some
cases.

Fixes: 785f0f9fe7 ("drm/amdgpu: Add mes v12_0 ip block support (v4)")
Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:13 -04:00
Alex Deucher
906ad45167 drm/amdgpu: cancel gfx idle work in device suspend for s0ix
This is normally handled in the gfx IP suspend callbacks, but
for S0ix, those are skipped because we don't want to touch
gfx.  So handle it in device suspend.

Fixes: b9467983b7 ("drm/amdgpu: add dynamic workload profile switching for gfx10")
Fixes: 963537ca23 ("drm/amdgpu: add dynamic workload profile switching for gfx11")
Fixes: 5f95a15495 ("drm/amdgpu: add dynamic workload profile switching for gfx12")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:05:35 -04:00
Alex Deucher
0e2ebfe276 drm/amdgpu/gfx12: dump full CP packet header FIFOs
In dev core dump, dump the full header fifo for
each queue. Each FIFO has 8 entries.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:05:14 -04:00
Alex Deucher
eb15a5d1ae drm/amdgpu/gfx11: dump full CP packet header FIFOs
In dev core dump, dump the full header fifo for
each queue. Each FIFO has 8 entries.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:05:11 -04:00
Alex Deucher
867cf768cb drm/amdgpu/gfx10: dump full CP packet header FIFOs
In dev core dump, dump the full header fifo for
each queue. Each FIFO has 8 entries.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:05:07 -04:00
Alex Deucher
fd4948494d drm/amdgpu/gfx9.4.3: dump full CP packet header FIFOs
In dev core dump, dump the full header fifo for
each queue. Each FIFO has 8 entries.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 18:01:08 -04:00
Alex Deucher
a267d1686c drm/amdgpu/gfx9: dump full CP packet header FIFOs
In dev core dump, dump the full header fifo for
each queue. Each FIFO has 8 entries.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 18:01:08 -04:00
Victor Skvortsov
f9fbc33881 drm/amdgpu: Fix CPER error handling on VFs
CPER read will loop infinitely if an error is encountered and
the more bit is set. Add error checks to break upon failure.

v2: added function pointer checks

Suggested-by: Tony Yi <Tony.Yi@amd.com>
Signed-off-by: Victor Skvortsov <Victor.Skvortsov@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 18:00:40 -04:00
Sunil Khatri
89dab189a2 drm/amdgpu: Fix the comment to avoid warning
Fix the below comment warning
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c:541:
warning: Function parameter or struct member 'adev'
not described in 'amdgpu_sdma_register_on_reset_callbacks'

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:59 -04:00
Lijo Lazar
6dee64e765 drm/amdgpu: Fix xgmi v6.4.1 link status reporting
Use the right register offsets for getting link status.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:59 -04:00
Lijo Lazar
5df0d6addb drm/amdgpu: Add basic validation for RAS header
If RAS header read from EEPROM is corrupted, it could result in trying
to allocate huge memory for reading the records. Add some validation to
header fields.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:59 -04:00
Apurv Mishra
daafa303d1 drm/amdkfd: Drop workaround for GC v9.4.3 revID 0
Remove workaround code for the early engineering
samples GC v9.4.3 SOCs with revID 0

Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:59 -04:00
Alexandre Demers
060708d1fa drm/amdgpu: huge sid.h cleanup, drop substituted defines.
Now that we are using the proper defines, cleanup useless old "substituted" defines.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:59 -04:00
Alexandre Demers
d3cd9565c6 drm/amdgpu: move si.c away from sid.h
Replace defines by the ones added earlier to GFX6, SMU6 and DCE6

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:59 -04:00
Boyuan Zhang
e66c07864e drm/amdgpu: enable FW workaround for VCN 4_0_5
Enabling VCN FW workaround for drm key injection through shared
memory for vcn 4_0_5

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:59 -04:00
Victor Skvortsov
0c6e39ce6d drm/amdgpu: Add indirect L1_TLB_CNTL reg programming for VFs
VFs on some IP versions are unable to access this register directly.

This register must be programmed before PSP ring is setup,
so use PSP VF mailbox directly. PSP will broadcast the register
value to all VF assigned instances.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Zhigang Luo <Zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:58 -04:00
Prike Liang
4aa8de3d03 drm/amdgpu/gfx12: Implement the gfx12 kgq pipe reset
Implement the GFX12 kgq pipe reset, and temporarily disable
the GFX12 pipe reset until the CPFW fully support it.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:58 -04:00
Prike Liang
d69248cf4c drm/amdgpu/gfx11: Implement the GFX11 KCQ pipe reset
Implement the GFX11 compute pipe reset. As the GFX11 CPFW
still hasn't fully supported pipe reset yet, therefore
disable the KCQ pipe reset temporarily.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:58 -04:00
Prike Liang
dcc8e148e0 drm/amdgpu/gfx11: Implement the GFX11 KGQ pipe reset
Implement the kernel graphics queue pipe reset,and the driver
will fallback to pipe reset when the queue reset fails. However,
the ME FW hasn't fully supported pipe reset yet so disable the
KGQ pipe reset temporarily.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:58 -04:00
Alex Deucher
a9a8bccaa3 drm/amdgpu/gfx11: fix CSIB handling
We shouldn't return after the last section.
We need to update the rest of the CSIB.

Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:35 -04:00
Alex Deucher
683308af03 drm/amdgpu/gfx10: fix CSIB handling
We shouldn't return after the last section.
We need to update the rest of the CSIB.

Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:35 -04:00
Alex Deucher
a4a4c0ae67 drm/amdgpu/gfx9: fix CSIB handling
We shouldn't return after the last section.
We need to update the rest of the CSIB.

Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:34 -04:00
Alex Deucher
c8b8d7a4f1 drm/amdgpu/gfx8: fix CSIB handling
We shouldn't return after the last section.
We need to update the rest of the CSIB.

Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:34 -04:00
Alex Deucher
be7652c23d drm/amdgpu/gfx7: fix CSIB handling
We shouldn't return after the last section.
We need to update the rest of the CSIB.

Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:34 -04:00
Alex Deucher
8307ebc15c drm/amdgpu/gfx6: fix CSIB handling
We shouldn't return after the last section.
We need to update the rest of the CSIB.

Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:34 -04:00
Alex Deucher
9cffd67e80 drm/amdgpu/gfx: assign the actual me0 queues per pipe
Set the actual number of queues per pipe for ME0 (gfx).
This way we will dump all of the queues properly in
dev core dumps.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:34 -04:00
Alex Deucher
8f970c46b5 drm/amdgpu/gfx: decouple the number of kgqs from the hw
The driver currently sets up one kgq per pipe.  As such
adev->gfx.me.num_queue_per_pipe is hardcoded to 1 everywhere.
This is fine for kernel queues, but when we enable user queues
we need to know that actual number of queues per pipe.  Decouple
the kgq setup from the actual hardware count.  For dev core
dumps and user queues, we want to know the actual number
of queues per pipe.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:34 -04:00
Alex Deucher
0d47bb77b5 drm/amdgpu/gfx: make amdgpu_gfx_me_queue_to_bit() static
It's not used outside of amdgpu_gfx.c.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:34 -04:00
Srinivasan Shanmugam
9eab245326 drm/amdgpu/gfx10: Add Cleaner Shader Support for GFX10.3.x GPUs
Enable the cleaner shader for other GFX10.3.x series of GPUs to provide
data isolation between GPU workloads. The cleaner shader is responsible
for clearing the Local Data Store (LDS), Vector General Purpose
Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which
helps prevent data leakage and ensures accurate computation results.

This update extends cleaner shader support to GFX10.3.x GPUs, previously
available for GFX10.3.0. It enhances security by clearing GPU memory
between processes and maintains a consistent GPU state across KGD and
KFD workloads.

Cc: Mario Sopena-Novales <mario.novales@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:34 -04:00
Alex Deucher
60d4952d89 drm/amdgpu: drop some dead code
Drop the cgs smu firmware code for SI, it's not used.
The smu firmware fetching for SI is done in si_dpm.c.

Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:34 -04:00
Alexandre Demers
160d3d39f6 drm/amdgpu: continue cleaning up sid.h and si_enums.h
Remove more duplicated defines and move some in sid.h for coherence with
CIK.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:34 -04:00
Ananta Srikar
3470f80bd3 drm/amd/amdgpu: Fix typo
Fixes a typo in the word "version" in an error message.

Signed-off-by: Ananta Srikar <srikarananta01@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:33 -04:00
Andres Urian Florez
c6ae8d587e drm/amdgpu: Replace deprecated function strcpy() with strscpy()
Instead of using the strcpy() deprecated function to populate the
fw_name, use the strscpy() function

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy
Signed-off-by: Andres Urian Florez <andres.emb.sys@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:33 -04:00
Alex Deucher
48b733d99b drm/amdgpu: add rebar parameter
Add a new parameter to disable BAR resizing.  Note that this
only disables the driver from attempting to resize the BAR,
The BIOS may have resized the BAR at boot.

Some teams have found this useful in debugging P2P DMA
issues on systems where the available MMIO space did not allow
for all of the GPUs present to resize their BARs.

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:33 -04:00
Alexandre Demers
b71b7cd91c drm/amdgpu: cleanup DCE6 a bit more
Use shifts already available in DCE6's defines, masks and shifts.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:33 -04:00
Alexandre Demers
d35a412910 drm/amdgpu: keep removing sid.h dependency from si_dma.c
Move and rename DMA_SEM_INCOMPLETE_TIMER_CNTL and DMA_SEM_WAIT_FAIL_TIMER_CNTL
in oss_1_0_d.h

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:33 -04:00
Alexandre Demers
14f15aa054 drm/amdgpu: move si_dma.c away from sid.h and si_enums.h
Replace defines for the ones in oss_1_0_d.h and oss_1_0_sh_mask.h

Taking the opportunity to add some comments taken from cik_sdma.c

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:33 -04:00
Alexandre Demers
230a4b0528 drm/amdgpu: make GFX6 easier to read
Just fix the style and add a comment for reading easiness

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:33 -04:00
Alexandre Demers
6168cb7a31 drm/amdgpu: move DCE6 away from sid.h and si_enums.h defines
This cleans up DCE6.

I added some minor tweaks taken from CIK to exit early

v2: minor fixes (Alex)

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:33 -04:00
Alexandre Demers
76eb396db3 drm/amdgpu: use GRPH_SECONDARY_SURFACE_ADDRESS_MASK with GRPH_SECONDARY_SURFACE_ADDRESS in DCE6
It seems a copy-paste error: since we are working with
mmGRPH_SECONDARY_SURFACE_ADDRESS,
GRPH_SECONDARY_SURFACE_ADDRESS__GRPH_SECONDARY_SURFACE_ADDRESS_MASK
should be used.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:32 -04:00
Alexandre Demers
c82d915fe1 drm/amdgpu: move si_ih.c away from sid.h defines
They are properly defined under oss_1_0_d.h

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:32 -04:00
Alexandre Demers
cbd8207e23 drm/amdgpu: remove PACKET3 duplicated defines from si_enums.h
PACKET3 is already in sid.h, as it is done under cikd.h for CIK

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:32 -04:00
Alexandre Demers
193e088015 drm/amdgpu: use proper defines, shifts and masks in DCE6 code
By replacing VGA_VSTATUS_CNTL by VGA_RENDER_CONTROL__VGA_VSTATUS_CNTL_MASK,
we also need to fix its usage in GMC6.

Note: VGA_VSTATUS_CNTL's binary value was inverted in dce_6_0_sh_mask.h,
so we need to invert its value where it was used.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:32 -04:00
Alexandre Demers
de81b86e96 drm/amdgpu: wire up defines, shifts and masks through SI code
To be able to remove as much duplicated defines, the different files
containing definitions, shifts and masks must be properly included.

Once done, the code will be migrated where needed to shifts and masks and
proper defines, before removing useless defines in the end.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:32 -04:00
Alexandre Demers
8e46cabf8e drm/amdgpu: move GFX6 defines into gfx_v6_0.c
Send a few GFX6 defines where it's used in GFX6.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:32 -04:00
Alexandre Demers
1be0ae9e12 drm/amdgpu: move X_GB_ADDR_CONFIG_GOLDEN in GFX7
[BONAIRE|HAWAII]_GB_ADDR_CONFIG_GOLDEN are only used by GFX7. So keep them
where they are needed.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:32 -04:00
Alexandre Demers
e319f9ec36 drm/amdgpu: small cleanup to CIK SDMA
Tidy cik_sdma_hw_init() by returning directly cik_sdma_start()'s result.

Keep amdgpu_cik_gpu_check_soft_reset() early declaration with others.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:32 -04:00
Alexandre Demers
60c53fe7bc drm/amdgpu: use cik_sdma_is_idle() in CIK SDMA
cik_sdma_is_idle() does exactly what we need, so use it.

V2: fix parameter (Alex)

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:32 -04:00
Alexandre Demers
62e0b8f766 drm/amdgpu: use gmc_v7_0_is_idle() since it is available under GMC7
gmc_v7_0_is_idle() does exactly what we need, so use it.

v2: fix parameter (Alex)

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:32 -04:00
Ce Sun
969fd18c8d drm/amdgpu/vcn: during dpc recovery will corrupt VCPU buffer
err_event_athub and dpc recovery will corrupt VCPU buffer,
so we need to restore fw data and clear buffer in amdgpu_vcn_resume()

Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:31 -04:00
Ce Sun
8ba904f541 drm/amdgpu: Multi-GPU DPC recovery support
Add support for DPC recover based on refactored code

Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:31 -04:00
Ce Sun
11bb33766f drm/amdgpu: refactor amdgpu_device_gpu_recover
Split amdgpu_device_gpu_recover into the following stages:
halt activities,asic reset,schedule resume and amdgpu resume.
The reason is that the subsequent addition of dpc recover
code will have a high similarity with gpu reset

Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:31 -04:00
Christian König
f5e7fabd1f drm/amdgpu: allow pinning DMA-bufs into VRAM if all importers can do P2P
Try pinning into VRAM to allow P2P with RDMA NICs without ODP
support if all attachments can do P2P. If any attachment can't do
P2P just pin into GTT instead.

Acked-by: Simona Vetter <simona.vetter@ffwll.ch>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Tested-by: Pak Nin Lui <pak.lui@amd.com>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:30 -04:00
Maarten Lankhorst
1b5447d773 drm/amdgpu: Add cgroups implementation
Similar to xe, enable some simple management of VRAM only.

Reviewed-by: Christian König <christian.koenig@amd.com>
Co-developed-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:30 -04:00
Jay Cornwall
3666ed8218 drm/amdgpu: Increase KIQ invalidate_tlbs timeout
KIQ invalidate_tlbs request has been seen to marginally exceed the
configured 100 ms timeout on systems under load.

All other KIQ requests in the driver use a 10 second timeout. Use a
similar timeout implementation on the invalidate_tlbs path.

v2: Poll once before msleep
v3: Fix return value

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Cc: Kent Russell <kent.russell@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 15:18:30 -04:00
Flora Cui
2f6dd741cd drm/amdgpu/ip_discovery: add missing ip_discovery fw
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 14:32:20 -04:00
Matthew Auld
c0dd8a9253 drm/amdgpu/dma_buf: fix page_link check
The page_link lower bits of the first sg could contain something like
SG_END, if we are mapping a single VRAM page or contiguous blob which
fits into one sg entry. Rather pull out the struct page, and use that in
our check to know if we mapped struct pages vs VRAM.

Fixes: f44ffd677f ("drm/amdgpu: add support for exporting VRAM using DMA-buf v3")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.8+
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07 14:31:45 -04:00
Christian König
a755906fb2 drm/amdgpu: immediately use GTT for new allocations
Only use GTT as a fallback if we already have a backing store. This
prevents evictions when an application constantly allocates and frees new
memory.

Partially fixes
https://gitlab.freedesktop.org/drm/amd/-/issues/3844#note_2833985.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 216c1282dd ("drm/amdgpu: use GTT only as fallback for VRAM|GTT")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2025-04-07 14:30:57 -04:00
Alex Deucher
b71a2bb0ce drm/amdgpu/mes11: optimize MES pipe FW version fetching
Don't fetch it again if we already have it.  It seems the
registers don't reliably have the value at resume in some
cases.

Fixes: 028c3fb37e ("drm/amdgpu/mes11: initiate mes v11 support")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4083
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2025-04-07 14:30:10 -04:00
Thomas Zimmermann
1afba39f93 Merge drm/drm-next into drm-misc-next
Backmerging to get v6.15-rc1 into drm-misc-next. Also fixes a
build issue when enabling CONFIG_DRM_SCHED_KUNIT_TEST.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2025-04-07 14:35:48 +02:00
Linus Torvalds
16cd1c2657 A set of final cleanups for the timer subsystem:
1) Convert all del_timer[_sync]() instances over to the new
      timer_delete[_sync]() API and remove the legacy wrappers.
 
      Conversion was done with coccinelle plus some manual fixups as
      coccinelle chokes on scoped_guard().
 
   2) The final cleanup of the hrtimer_init() to hrtimer_setup() conversion.
 
      This has been delayed to the end of the merge window, so that all
      patches which have been merged through other trees are in mainline and
      all new users are catched.
 
 Doing this right before rc1 ensures that new code which is merged post rc1
 is not introducing new instances of the original functionality.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmfyXi0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYzlD/4ykDZbUzgTreYOxEQpBJ9elPwBhxfL
 1v8OwDjRWlNrmLup8RiUfKrlbmztGl1J/u9ld0qhjcqkywCCBC1N5S+DhCjYetyP
 MPWLbi2Dc35cFA+M7i8fMgxI2K9MLz2Zj1UKxz1MdsSuNHm07N3mul/3T11Ye4Rz
 nPlzeQBTBDFCKTEGKjr8zjuoD15Wl48sObM0AjV35BPuQR1jfY4CE6VXo2h78+0c
 jYwpJpDmcd+o1bDrfFhWUME2DzABEkHhn4wNSETnM4E5RXZRMUbi4UiigzInibQr
 JOUTKwPJXTMX/Erd0XyXErrYf2qy1X9BQy6NlyDDOv+8kLEVRsC9Efplx9uoEtfi
 QvVT/UmgmhZFJBfIT3/B8OvasrfwOropaYoG4L0zbDpp1b09VY47N5lCLlNr/mZf
 jb2TwIln8Szy2EfIT2RSd0ZNupyU8V4aH/mYNpSlbUJ6mfvfIAttBSS/YH+Zeqku
 7zOJkoCusaySOCZCOQkeikL3ZBN+FHtNteXxmGnp34ed/tsfgGZj1lsbmkM2rrWo
 f2mQsYAclUA4KQeY9z/Xf7/c5wJUkME69PxOaaN23dOpBR7GA58Cvb0PQTnPlAiT
 KnH/JRweBHtcv4KEHMi2f5no4cxcmXyKTj7/TLyYNjc8LATL9Eo/nxG36PLxy4lN
 QPOWz11zEBLjQQ==
 =8Ftq
 -----END PGP SIGNATURE-----

Merge tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer cleanups from Thomas Gleixner:
 "A set of final cleanups for the timer subsystem:

   - Convert all del_timer[_sync]() instances over to the new
     timer_delete[_sync]() API and remove the legacy wrappers.

     Conversion was done with coccinelle plus some manual fixups as
     coccinelle chokes on scoped_guard().

   - The final cleanup of the hrtimer_init() to hrtimer_setup()
     conversion.

     This has been delayed to the end of the merge window, so that all
     patches which have been merged through other trees are in mainline
     and all new users are catched.

  Doing this right before rc1 ensures that new code which is merged post
  rc1 is not introducing new instances of the original functionality"

* tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing/timers: Rename the hrtimer_init event to hrtimer_setup
  hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack()
  hrtimers: Rename debug_init() to debug_setup()
  hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper()
  hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns()
  hrtimers: Make callback function pointer private
  hrtimers: Merge __hrtimer_init() into __hrtimer_setup()
  hrtimers: Switch to use __htimer_setup()
  hrtimers: Delete hrtimer_init()
  treewide: Convert new and leftover hrtimer_init() users
  treewide: Switch/rename to timer_delete[_sync]()
2025-04-06 08:35:37 -07:00
Linus Torvalds
758e4c86a1 drm fixes for 6.15-rc1
bridge:
 - tda998x: Select CONFIG_DRM_KMS_HELPER
 
 amdgpu:
 - Guard against potential division by 0 in fan code
 - Zero RPM support for SMU 14.0.2
 - Properly handle SI and CIK support being disabled
 - PSR fixes
 - DML2 fixes
 - DP Link training fix
 - Vblank fixes
 - RAS fixes
 - Partitioning fix
 - SDMA fix
 - SMU 13.0.x fixes
 - Rom fetching fix
 - MES fixes
 - Queue reset fix
 
 xe:
 - Fix NULL pointer dereference on error path
 - Add missing HW workaround for BMG
 - Fix survivability mode not triggering
 - Fix build warning when DRM_FBDEV_EMULATION is not set
 
 i915:
 - Bounds check for scalers in DSC prefill latency computation
 - Fix build by adding a missing include
 
 adp:
 - Fix error handling in plane setup
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmfwQ5wACgkQDHTzWXnE
 hr7OIxAAn8ATKOYsyxaq0t8+gj2jhmGC1L3X6tQUzSh3uAwZpWzYlP8RVMMS9WrL
 Cwj0kBLlq71GasuG/wjEnAxuoahEWebrwTm73it9yWXWW+GsQYhOcWttvmhNSVET
 aTUdB2VM32S0QHf8t6rP0yG4XqJD95sSNnXCTh6/ksk5rhK6NpqyJRkdVBIRxI5O
 IUXxfxmREIrAq5ZfpoPuVUEA2US2W9p+L616RnILewa3iZYzK9XahfcErFU8xlS+
 796ZaYFGEWUmdJKTyCejRHz5z1afXAgKO3vRyGKbLC4MAsxk6MsWzTXsH5EMukxF
 lCtJQuODFDed2gXxLUOeXAUjWoHrgYpJBWRk23Pq83Kx2sf85MD2SF2d8pGJqMa+
 B0JYr/mFtZCi5xraHIH6rwx7k2+KqPVAl+P8f/Fye/AIxBUE3KoI43M69TQ3B2VD
 dG8Jz53DUVQ1TUGJjfZajninFfBdDlr9PgPLmp1gESjbGkI1azGXngwlaVK4kti+
 iZmM/4pbB0A/+PG7Wq1DFEhr0fvQW2RjtgXoh+CujRLgK7ZYwVbCcFpmGQgjOiXb
 IyphOc7tPiOH1EcXGSsrV5JlkyhVL0Ghu6JCKQCzF5MXbrg1fwjjUwVuGtJyqtso
 kKBbx3bx7rv577VXRhM13s0MPwOyhgi56a4kBtX7YRJqY5cVkdM=
 =P/sW
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2025-04-05' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Weekly fixes, mostly from the end of last week, this week was very
  quiet, maybe you scared everyone away. It's mostly amdgpu, and xe,
  with some i915, adp and bridge bits, since I think this is overly
  quiet I'd expect rc2 to be a bit more lively.

  bridge:
   - tda998x: Select CONFIG_DRM_KMS_HELPER

  amdgpu:
   - Guard against potential division by 0 in fan code
   - Zero RPM support for SMU 14.0.2
   - Properly handle SI and CIK support being disabled
   - PSR fixes
   - DML2 fixes
   - DP Link training fix
   - Vblank fixes
   - RAS fixes
   - Partitioning fix
   - SDMA fix
   - SMU 13.0.x fixes
   - Rom fetching fix
   - MES fixes
   - Queue reset fix

  xe:
   - Fix NULL pointer dereference on error path
   - Add missing HW workaround for BMG
   - Fix survivability mode not triggering
   - Fix build warning when DRM_FBDEV_EMULATION is not set

  i915:
   - Bounds check for scalers in DSC prefill latency computation
   - Fix build by adding a missing include

  adp:
   - Fix error handling in plane setup"

  # -----BEGIN PGP SIGNATURE-----

* tag 'drm-next-2025-04-05' of https://gitlab.freedesktop.org/drm/kernel: (34 commits)
  drm/i2c: tda998x: select CONFIG_DRM_KMS_HELPER
  drm/amdgpu/gfx12: fix num_mec
  drm/amdgpu/gfx11: fix num_mec
  drm/amd/pm: Add gpu_metrics_v1_8
  drm/amdgpu: Prefer shadow rom when available
  drm/amd/pm: Update smu metrics table for smu_v13_0_6
  drm/amd/pm: Remove host limit metrics support
  Remove unnecessary firmware version check for gc v9_4_2
  drm/amdgpu: stop unmapping MQD for kernel queues v3
  Revert "drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA"
  drm/amdgpu: Parse all deferred errors with UMC aca handle
  drm/amdgpu: Update ta ras block
  drm/amdgpu: Add NPS2 to DPX compatible mode
  drm/amdgpu: Use correct gfx deferred error count
  drm/amd/display: Actually do immediate vblank disable
  drm/amd/display: prevent hang on link training fail
  Revert "drm/amd/display: dml2 soc dscclk use DPM table clk setting"
  drm/amd/display: Increase vblank offdelay for PSR panels
  drm/amd: Handle being compiled without SI or CIK support better
  drm/amd/pm: Add zero RPM enabled OD setting support for SMU14.0.2
  ...
2025-04-05 15:35:11 -07:00
Thomas Gleixner
8fa7292fee treewide: Switch/rename to timer_delete[_sync]()
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.

Conversion was done with coccinelle plus manual fixups where necessary.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-05 10:30:12 +02:00
Linus Torvalds
2cd5769fb0 Driver core updates for 6.15-rc1
Here is the big set of driver core updates for 6.15-rc1.  Lots of stuff
 happened this development cycle, including:
   - kernfs scaling changes to make it even faster thanks to rcu
   - bin_attribute constify work in many subsystems
   - faux bus minor tweaks for the rust bindings
   - rust binding updates for driver core, pci, and platform busses,
     making more functionaliy available to rust drivers.  These are all
     due to people actually trying to use the bindings that were in 6.14.
   - make Rafael and Danilo full co-maintainers of the driver core
     codebase
   - other minor fixes and updates.
 
 This has been in linux-next for a while now, with the only reported
 issue being some merge conflicts with the rust tree.  Depending on which
 tree you pull first, you will have conflicts in one of them.  The merge
 resolution has been in linux-next as an example of what to do, or can be
 found here:
 	https://lore.kernel.org/r/CANiq72n3Xe8JcnEjirDhCwQgvWoE65dddWecXnfdnbrmuah-RQ@mail.gmail.com
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZ+mMrg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylRgwCdH58OE3BgL0uoFY5vFImStpmPtqUAoL5HpVWI
 jtbJ+UuXGsnmO+JVNBEv
 =gy6W
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updatesk from Greg KH:
 "Here is the big set of driver core updates for 6.15-rc1. Lots of stuff
  happened this development cycle, including:

   - kernfs scaling changes to make it even faster thanks to rcu

   - bin_attribute constify work in many subsystems

   - faux bus minor tweaks for the rust bindings

   - rust binding updates for driver core, pci, and platform busses,
     making more functionaliy available to rust drivers. These are all
     due to people actually trying to use the bindings that were in
     6.14.

   - make Rafael and Danilo full co-maintainers of the driver core
     codebase

   - other minor fixes and updates"

* tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (52 commits)
  rust: platform: require Send for Driver trait implementers
  rust: pci: require Send for Driver trait implementers
  rust: platform: impl Send + Sync for platform::Device
  rust: pci: impl Send + Sync for pci::Device
  rust: platform: fix unrestricted &mut platform::Device
  rust: pci: fix unrestricted &mut pci::Device
  rust: device: implement device context marker
  rust: pci: use to_result() in enable_device_mem()
  MAINTAINERS: driver core: mark Rafael and Danilo as co-maintainers
  rust/kernel/faux: mark Registration methods inline
  driver core: faux: only create the device if probe() succeeds
  rust/faux: Add missing parent argument to Registration::new()
  rust/faux: Drop #[repr(transparent)] from faux::Registration
  rust: io: fix devres test with new io accessor functions
  rust: io: rename `io::Io` accessors
  kernfs: Move dput() outside of the RCU section.
  efi: rci2: mark bin_attribute as __ro_after_init
  rapidio: constify 'struct bin_attribute'
  firmware: qemu_fw_cfg: constify 'struct bin_attribute'
  powerpc/perf/hv-24x7: Constify 'struct bin_attribute'
  ...
2025-04-01 11:02:03 -07:00
Linus Torvalds
0c86b42439 drm for 6.15-rc1
uapi:
 - add mediatek tiled fourcc
 - add support for notifying userspace on device wedged
 
 new driver:
 - appletbdrm: support for Apple Touchbar displays on m1/m2
 - nova-core: skeleton rust driver to develop nova inside off
 
 firmware:
 - add some rust firmware pieces
 
 rust:
 - add 'LocalModule' type alias
 
 component:
 - add helper to query bound status
 
 fbdev:
 - fbtft: remove access to page->index
 
 media:
 - cec: tda998x: import driver from drm
 
 dma-buf:
 - add fast path for single fence merging
 
 tests:
 - fix lockdep warnings
 
 atomic:
 - allow full modeset on connector changes
 - clarify semantics of allow_modeset and drm_atomic_helper_check
 - async-flip: support on arbitary planes
 - writeback: fix UAF
 - Document atomic-state history
 
 format-helper:
 - support ARGB8888 to ARGB4444 conversions
 
 buddy:
 - fix multi-root cleanup
 
 ci:
 - update IGT
 
 dp:
 - support extended wake timeout
 - mst: fix RAD to string conversion
 - increase DPCD eDP control CAP size to 5 bytes
 - add DPCD eDP v1.5 definition
 - add helpers for LTTPR transparent mode
 
 panic:
 - encode QR code according to Fido 2.2
 
 scheduler:
 - add parameter struct for init
 - improve job peek/pop operations
 - optimise drm_sched_job struct layout
 
 ttm:
 - refactor pool allocation
 - add helpers for TTM shrinker
 
 panel-orientation:
 - add a bunch of new quirks
 
 panel:
 - convert panels to multi-style functions
 - edp: Add support for B140UAN04.4, BOE NV140FHM-NZ, CSW MNB601LS1-3,
   LG LP079QX1-SP0V, MNE007QS3-7, STA 116QHD024002, Starry 116KHD024006,
   Lenovo T14s Gen6 Snapdragon
 - himax-hx83102: Add support for CSOT PNA957QT1-1, Kingdisplay
   kd110n11-51ie, Starry 2082109qfh040022-50e
 - visionox-r66451: use multi-style MIPI-DSI functions
 - raydium-rm67200: Add driver for Raydium RM67200
 - simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17
 - sony-td4353-jdi: Use MIPI-DSI multi-func interface
 - summit: Add driver for Apple Summit display panel
 - visionox-rm692e5: Add driver for Visionox RM692E5
 
 bridge:
 - pass full atomic state to various callbacks
 - adv7511: Report correct capabilities
 - it6505: Fix HDCP V compare
 - snd65dsi86: fix device IDs
 - nwl-dsi: set bridge type
 - ti-sn65si83: add error recovery and set bridge type
 - synopsys: add HDMI audio support
 
 xe:
 - support device-wedged event
 - add mmap support for PCI memory barrier
 - perf pmu integration and expose per-engien activity
 - add EU stall sampling support
 - GPU SVM and Xe SVM implementation
 - use TTM shrinker
 - add survivability mode to allow the driver to do
   firmware updates in critical failure states
 - PXP HWDRM support for MTL and LNL
 - expose package/vram temps over hwmon
 - enable DP tunneling
 - drop mmio_ext abstraction
 - Reject BO evcition if BO is bound to current VM
 - Xe suballocator improvements
 - re-use display vmas when possible
 - add GuC Buffer Cache abstraction
 - PCI ID update for Panther Lake and Battlemage
 - Enable SRIOV for Panther Lake
 - Refactor VRAM manager location
 
 i915:
 - enable extends wake timeout
 - support device-wedged event
 - Enable DP 128b/132b SST DSC
 - FBC dirty rectangle support for display version 30+
 - convert i915/xe to drm client setup
 - Compute HDMI PLLS for rates not in fixed tables
 - Allow DSB usage when PSR is enabled on LNL+
 - Enable panel replay without full modeset
 - Enable async flips with compressed buffers on ICL+
 - support luminance based brightness via DPCD for eDP
 - enable VRR enable/disable without full modeset
 - allow GuC SLPC default strategies on MTL+ for performance
 - lots of display refactoring in move to struct intel_display
 
 amdgpu:
 - add device wedged event
 - support async page flips on overlay planes
 - enable broadcast RGB drm property
 - add info ioctl for virt mode
 - OEM i2c support for RGB lights
 - GC 11.5.2 + 11.5.3 support
 - SDMA 6.1.3 support
 - NBIO 7.9.1 + 7.11.2 support
 - MMHUB 1.8.1 + 3.3.2 support
 - DCN 3.6.0 support
 - Add dynamic workload profile switching for GC 10-12
 - support larger VBIOS sizes
 - Mark gttsize parameters as deprecated
 - Initial JPEG queue resset support
 
 amdkfd:
 - add KFD per process flags for setting precision
 - sync pasid values between KGD and KFD
 - improve GTT/VRAM handling for APUs
 - fix user queue validation on GC7/8
 - SDMA queue reset support
 
 raedeon:
 - rs400 hyperz fix
 
 i2c:
 - td998x: drop platform_data, split driver into media and bridge
 
 ast:
 - transmitter chip detection refactoring
 - vbios display mode refactoring
 - astdp: fix connection status and filter unsupported modes
 - cursor handling refactoring
 
 imagination:
 - check job dependencies with sched helper
 
 ivpu:
 - improve command queue handling
 - use workqueue for IRQ handling
 - add support HW fault injection
 - locking fixes
 
 mgag200:
 - add support for G200eH5
 
 msm:
 - dpu: add concurrent writeback support for DPU 10.x+
 - use LTTPR helpers
 - GPU:
   - Fix obscure GMU suspend failure
   - Expose syncobj timeline support
   - Extend GPU devcoredump with pagetable info
   - a623 support
   - Fix a6xx gen1/gen2 indexed-register blocks in gpu snapshot / devcoredump
 - Display:
   - Add cpu-cfg interconnect paths on SM8560 and SM8650
   - Introduce KMS OMMU fault handler, causing devcoredump snapshot
   - Fixed error pointer dereference in msm_kms_init_aspace()
 - DPU:
   - Fix mode_changing handling
   - Add writeback support on SM6150 (QCS615)
   - Fix DSC programming in 1:1:1 topology
   - Reworked hardware resource allocation, moving it to the CRTC code
   - Enabled support for Concurrent WriteBack (CWB) on SM8650
   - Enabled CDM blocks on all relevant platforms
   - Reworked debugfs interface for BW/clocks debugging
   - Clear perf params before calculating bw
   - Support YUV formats on writeback
   - Fixed double inclusion
   - Fixed writeback in YUV formats when using cloned output, Dropped
     wb2_formats_rgb
   - Corrected dpu_crtc_check_mode_changed and struct dpu_encoder_virt
     kerneldocs
   - Fixed uninitialized variable in dpu_crtc_kickoff_clone_mode()
 - DSI:
   - DSC-related fixes
   - Rework clock programming
 - DSI PHY:
   - Fix 7nm (and lower) PHY programming
   - Add proper DT schema definitions for DSI PHY clocks
 - HDMI:
   - Rework the driver, enabling the use of the HDMI Connector framework
 - Bindings:
   - Added eDP PHY on SA8775P
 
 nouveau:
 - move drm_slave_encoder interface into driver
 - nvkm: refactor GSP RPC
 - use LTTPR helpers
 
 mediatek:
 - HDMI fixup and refinement
 - add MT8188 dsc compatible
 - MT8365 SoC support
 
 panthor:
 - Expose sizes of intenral BOs via fdinfo
 - Fix race between reset and suspend
 - Improve locking
 
 qaic:
 - Add support for AIC200
 
 renesas:
 - Fix limits in DT bindings
 
 rockchip:
 - support rk3562-mali
 - rk3576: Add HDMI support
 - vop2: Add new display modes on RK3588 HDMI0 up to 4K
 - Don't change HDMI reference clock rate
 - Fix DT bindings
 - analogix_dp: add eDP support
 - fix shutodnw
 
 solomon:
 - Set SPI device table to silence warnings
 - Fix pixel and scanline encoding
 
 v3d:
 - handle clock
 
 vc4:
 - Use drm_exec
 - Use dma-resv for wait-BO ioctl
 - Remove seqno infrastructure
 
 virtgpu:
 - Support partial mappings of GEM objects
 - Reserve VGA resources during initialization
 - Fix UAF in virtgpu_dma_buf_free_obj()
 - Add panic support
 
 vkms:
 - Switch to a managed modesetting pipeline
 - Add support for ARGB8888
 - fix UAf
 
 xlnx:
 - Set correct DMA segment size
 - use mutex guards
 - Fix error handling
 - Fix docs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmfmA2cACgkQDHTzWXnE
 hr7lKQ/+I7gmln3ka8FyKnpwG5KusDpxz3OzgUKHpzOTkXL1+vPMt0vjCKRJKE3D
 zfpUTWNbwlVN0krqmUGyFIeCt8wmevBF6HvQ+GsWbiWltUj3xIlnkV0TVH84XTUo
 0evrXNG9K8sLjMKjrf7yrGL53Ayoaq9IO9wtOws+FCgtykAsMR/IWLrYLpj21ZQ6
 Oclhq5Cz21WRoQpzySR23s3DLi4LHri26RGKbGNh2PzxYwyP/euGW6O+ncEduNmg
 vQLgUfptaM/EubJFG6jxDWZJ2ChIAUUxQuhZwt7DKqRsYIcJKcfDSUzqL95t6SYU
 zewlYmeslvusoesCeTJzHaUj34yqJGsvjFPsHFUEvAy8BVncsqS40D6mhrRMo5nD
 chnSJu+IpDOEEqcdbb1J73zKLw6X3GROM8qUQEThJBD2yTsOdw9d0HXekMDMBXSi
 NayKvXfsBV19rI74FYnRCzHt8BVVANh5qJNnR5RcnPZ2KzHQbV0JFOA9YhxE8vFU
 GWkFWKlpAQUQ+yoTy1kuO9dcUxLIC7QseMeS5BYhcJBMEV78xQuRYRxgsL8YS4Yg
 rIhcb3mZwMFj7jBAqfpLKWiI+GTup+P9vcz7Bvm5iIf8gEhZLUTwqBeAYXQkcWd4
 i1AqDuFR0//sAgODoeU2sg1Q3yTL/i/DhcwvCIPgcDZ/x4Eg868=
 =oK/N
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2025-03-28' of https://gitlab.freedesktop.org/drm/kernel

Pull drm updates from Dave Airlie:
 "Outside of drm there are some rust patches from Danilo who maintains
  that area in here, and some pieces for drm header check tests.

  The major things in here are a new driver supporting the touchbar
  displays on M1/M2, the nova-core stub driver which is just the vehicle
  for adding rust abstractions and start developing a real driver inside
  of.

  xe adds support for SVM with a non-driver specific SVM core
  abstraction that will hopefully be useful for other drivers, along
  with support for shrinking for TTM devices. I'm sure xe and AMD
  support new devices, but the pipeline depth on these things is hard to
  know what they end up being in the marketplace!

  uapi:
   - add mediatek tiled fourcc
   - add support for notifying userspace on device wedged

  new driver:
   - appletbdrm: support for Apple Touchbar displays on m1/m2
   - nova-core: skeleton rust driver to develop nova inside off

  firmware:
   - add some rust firmware pieces

  rust:
   - add 'LocalModule' type alias

  component:
   - add helper to query bound status

  fbdev:
   - fbtft: remove access to page->index

  media:
   - cec: tda998x: import driver from drm

  dma-buf:
   - add fast path for single fence merging

  tests:
   - fix lockdep warnings

  atomic:
   - allow full modeset on connector changes
   - clarify semantics of allow_modeset and drm_atomic_helper_check
   - async-flip: support on arbitary planes
   - writeback: fix UAF
   - Document atomic-state history

  format-helper:
   - support ARGB8888 to ARGB4444 conversions

  buddy:
   - fix multi-root cleanup

  ci:
   - update IGT

  dp:
   - support extended wake timeout
   - mst: fix RAD to string conversion
   - increase DPCD eDP control CAP size to 5 bytes
   - add DPCD eDP v1.5 definition
   - add helpers for LTTPR transparent mode

  panic:
   - encode QR code according to Fido 2.2

  scheduler:
   - add parameter struct for init
   - improve job peek/pop operations
   - optimise drm_sched_job struct layout

  ttm:
   - refactor pool allocation
   - add helpers for TTM shrinker

  panel-orientation:
   - add a bunch of new quirks

  panel:
   - convert panels to multi-style functions
   - edp: Add support for B140UAN04.4, BOE NV140FHM-NZ, CSW MNB601LS1-3,
     LG LP079QX1-SP0V, MNE007QS3-7, STA 116QHD024002, Starry
     116KHD024006, Lenovo T14s Gen6 Snapdragon
   - himax-hx83102: Add support for CSOT PNA957QT1-1, Kingdisplay
     kd110n11-51ie, Starry 2082109qfh040022-50e
   - visionox-r66451: use multi-style MIPI-DSI functions
   - raydium-rm67200: Add driver for Raydium RM67200
   - simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17
   - sony-td4353-jdi: Use MIPI-DSI multi-func interface
   - summit: Add driver for Apple Summit display panel
   - visionox-rm692e5: Add driver for Visionox RM692E5

  bridge:
   - pass full atomic state to various callbacks
   - adv7511: Report correct capabilities
   - it6505: Fix HDCP V compare
   - snd65dsi86: fix device IDs
   - nwl-dsi: set bridge type
   - ti-sn65si83: add error recovery and set bridge type
   - synopsys: add HDMI audio support

  xe:
   - support device-wedged event
   - add mmap support for PCI memory barrier
   - perf pmu integration and expose per-engien activity
   - add EU stall sampling support
   - GPU SVM and Xe SVM implementation
   - use TTM shrinker
   - add survivability mode to allow the driver to do firmware updates
     in critical failure states
   - PXP HWDRM support for MTL and LNL
   - expose package/vram temps over hwmon
   - enable DP tunneling
   - drop mmio_ext abstraction
   - Reject BO evcition if BO is bound to current VM
   - Xe suballocator improvements
   - re-use display vmas when possible
   - add GuC Buffer Cache abstraction
   - PCI ID update for Panther Lake and Battlemage
   - Enable SRIOV for Panther Lake
   - Refactor VRAM manager location

  i915:
   - enable extends wake timeout
   - support device-wedged event
   - Enable DP 128b/132b SST DSC
   - FBC dirty rectangle support for display version 30+
   - convert i915/xe to drm client setup
   - Compute HDMI PLLS for rates not in fixed tables
   - Allow DSB usage when PSR is enabled on LNL+
   - Enable panel replay without full modeset
   - Enable async flips with compressed buffers on ICL+
   - support luminance based brightness via DPCD for eDP
   - enable VRR enable/disable without full modeset
   - allow GuC SLPC default strategies on MTL+ for performance
   - lots of display refactoring in move to struct intel_display

  amdgpu:
   - add device wedged event
   - support async page flips on overlay planes
   - enable broadcast RGB drm property
   - add info ioctl for virt mode
   - OEM i2c support for RGB lights
   - GC 11.5.2 + 11.5.3 support
   - SDMA 6.1.3 support
   - NBIO 7.9.1 + 7.11.2 support
   - MMHUB 1.8.1 + 3.3.2 support
   - DCN 3.6.0 support
   - Add dynamic workload profile switching for GC 10-12
   - support larger VBIOS sizes
   - Mark gttsize parameters as deprecated
   - Initial JPEG queue resset support

  amdkfd:
   - add KFD per process flags for setting precision
   - sync pasid values between KGD and KFD
   - improve GTT/VRAM handling for APUs
   - fix user queue validation on GC7/8
   - SDMA queue reset support

  raedeon:
   - rs400 hyperz fix

  i2c:
   - td998x: drop platform_data, split driver into media and bridge

  ast:
   - transmitter chip detection refactoring
   - vbios display mode refactoring
   - astdp: fix connection status and filter unsupported modes
   - cursor handling refactoring

  imagination:
   - check job dependencies with sched helper

  ivpu:
   - improve command queue handling
   - use workqueue for IRQ handling
   - add support HW fault injection
   - locking fixes

  mgag200:
   - add support for G200eH5

  msm:
   - dpu: add concurrent writeback support for DPU 10.x+
   - use LTTPR helpers
   - GPU:
     - Fix obscure GMU suspend failure
     - Expose syncobj timeline support
     - Extend GPU devcoredump with pagetable info
     - a623 support
     - Fix a6xx gen1/gen2 indexed-register blocks in gpu snapshot /
       devcoredump
   - Display:
     - Add cpu-cfg interconnect paths on SM8560 and SM8650
     - Introduce KMS OMMU fault handler, causing devcoredump snapshot
     - Fixed error pointer dereference in msm_kms_init_aspace()
   - DPU:
     - Fix mode_changing handling
     - Add writeback support on SM6150 (QCS615)
     - Fix DSC programming in 1:1:1 topology
     - Reworked hardware resource allocation, moving it to the CRTC code
     - Enabled support for Concurrent WriteBack (CWB) on SM8650
     - Enabled CDM blocks on all relevant platforms
     - Reworked debugfs interface for BW/clocks debugging
     - Clear perf params before calculating bw
     - Support YUV formats on writeback
     - Fixed double inclusion
     - Fixed writeback in YUV formats when using cloned output, Dropped
       wb2_formats_rgb
     - Corrected dpu_crtc_check_mode_changed and struct dpu_encoder_virt
       kerneldocs
     - Fixed uninitialized variable in dpu_crtc_kickoff_clone_mode()
   - DSI:
     - DSC-related fixes
     - Rework clock programming
   - DSI PHY:
     - Fix 7nm (and lower) PHY programming
     - Add proper DT schema definitions for DSI PHY clocks
   - HDMI:
     - Rework the driver, enabling the use of the HDMI Connector
       framework
   - Bindings:
     - Added eDP PHY on SA8775P

  nouveau:
   - move drm_slave_encoder interface into driver
   - nvkm: refactor GSP RPC
   - use LTTPR helpers

  mediatek:
   - HDMI fixup and refinement
   - add MT8188 dsc compatible
   - MT8365 SoC support

  panthor:
   - Expose sizes of intenral BOs via fdinfo
   - Fix race between reset and suspend
   - Improve locking

  qaic:
   - Add support for AIC200

  renesas:
   - Fix limits in DT bindings

  rockchip:
   - support rk3562-mali
   - rk3576: Add HDMI support
   - vop2: Add new display modes on RK3588 HDMI0 up to 4K
   - Don't change HDMI reference clock rate
   - Fix DT bindings
   - analogix_dp: add eDP support
   - fix shutodnw

  solomon:
   - Set SPI device table to silence warnings
   - Fix pixel and scanline encoding

  v3d:
   - handle clock

  vc4:
   - Use drm_exec
   - Use dma-resv for wait-BO ioctl
   - Remove seqno infrastructure

  virtgpu:
   - Support partial mappings of GEM objects
   - Reserve VGA resources during initialization
   - Fix UAF in virtgpu_dma_buf_free_obj()
   - Add panic support

  vkms:
   - Switch to a managed modesetting pipeline
   - Add support for ARGB8888
   - fix UAf

  xlnx:
   - Set correct DMA segment size
   - use mutex guards
   - Fix error handling
   - Fix docs"

* tag 'drm-next-2025-03-28' of https://gitlab.freedesktop.org/drm/kernel: (1762 commits)
  drm/amd/pm: Update feature list for smu_v13_0_6
  drm/amdgpu: Add parameter documentation for amdgpu_sync_fence
  drm/amdgpu/discovery: optionally use fw based ip discovery
  drm/amdgpu/discovery: use specific ip_discovery.bin for legacy asics
  drm/amdgpu/discovery: check ip_discovery fw file available
  drm/amd/pm: Remove unnecessay UQ10 to UINT conversion
  drm/amd/pm: Remove unnecessay UQ10 to UINT conversion
  drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA
  drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush
  drm/amd/amdgpu: Increase max rings to enable SDMA page ring
  drm/amdgpu: Decode deferred error type in gfx aca bank parser
  drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.5 GPUs
  drm/amdgpu/mes: clean up SDMA HQD loop
  drm/amdgpu/mes: enable compute pipes across all MEC
  drm/amdgpu/mes: drop MES 10.x leftovers
  drm/amdgpu/mes: optimize compute loop handling
  drm/amdgpu/sdma: guilty tracking is per instance
  drm/amdgpu/sdma: fix engine reset handling
  drm/amdgpu: remove invalid usage of sched.ready
  drm/amdgpu: add cleaner shader trace point
  ...
2025-03-28 17:44:52 -07:00
Alex Deucher
dce8bd9137 drm/amdgpu/gfx12: fix num_mec
GC12 only has 1 mec.

Fixes: 52cb80c12e ("drm/amdgpu: Add gfx v12_0 ip block support (v6)")
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26 17:47:18 -04:00
Alex Deucher
4161050d47 drm/amdgpu/gfx11: fix num_mec
GC11 only has 1 mec.

Fixes: 3d879e81f0 ("drm/amdgpu: add init support for GFX11 (v2)")
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26 17:47:14 -04:00
Lijo Lazar
27145f78f5 drm/amdgpu: Prefer shadow rom when available
Fetch VBIOS from shadow ROM when available before trying other methods
like EFI method.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: 9c081c11c6 ("drm/amdgpu: Reorder to read EFI exported ROM first")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4066
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2025-03-26 17:46:33 -04:00
Candice Li
5b3c08ae9e Remove unnecessary firmware version check for gc v9_4_2
GC v9_4_2 uses a new versioning scheme for CP firmware, making
the warning ("CP firmware version too old, please update!") irrelevant.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2025-03-26 17:45:53 -04:00
Christian König
1f86f4125e drm/amdgpu: stop unmapping MQD for kernel queues v3
This looks unnecessary and actually extremely harmful since using kmap()
is not possible while inside the ring reset.

Remove all the extra mapping and unmapping of the MQDs.

v2: also fix debugfs
v3: fix coding style typo

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26 17:45:42 -04:00
Jesse.zhang@amd.com
510a16d995 Revert "drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA"
this temporarily reverts
commit 6ec04e38b2 ("drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA")
it cause a regression.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26 17:44:47 -04:00
Xiang Liu
aedc92be96 drm/amdgpu: Parse all deferred errors with UMC aca handle
We should only increase the deferred errors in UMC block.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26 17:44:41 -04:00
Stanley.Yang
cc11dffc14 drm/amdgpu: Update ta ras block
Update ta ra block to keep sync with RAS TA.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26 17:44:34 -04:00
Lijo Lazar
ee97326fb9 drm/amdgpu: Add NPS2 to DPX compatible mode
Compute partition DPX is possible in NPS2 mode. Update the compatible
modes for DPX.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26 17:44:32 -04:00
Xiang Liu
f3f05a0ec5 drm/amdgpu: Use correct gfx deferred error count
In the case of parsing GFX deferred error from SMU corrected error
channel, the error count should be set to 1 instead of parsing from
MISC0 register, which is 0.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26 17:44:27 -04:00
Mario Limonciello
5f054ddead drm/amd: Handle being compiled without SI or CIK support better
If compiled without SI or CIK support but amdgpu tries to load it
will run into failures with uninitialized callbacks.

Show a nicer message in this case and fail probe instead.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4050
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2025-03-26 17:41:55 -04:00
Linus Torvalds
a50b4fe095 A treewide hrtimer timer cleanup
hrtimers are initialized with hrtimer_init() and a subsequent store to
   the callback pointer. This turned out to be suboptimal for the upcoming
   Rust integration and is obviously a silly implementation to begin with.
 
   This cleanup replaces the hrtimer_init(T); T->function = cb; sequence
   with hrtimer_setup(T, cb);
 
   The conversion was done with Coccinelle and a few manual fixups.
 
   Once the conversion has completely landed in mainline, hrtimer_init()
   will be removed and the hrtimer::function becomes a private member.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmff5jQTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVvRD/wKtuwmiA66NJFgXC0qVq82A6fO3bY8
 GBdbfysDJIbqGu5PTcULTbJ8qkqv3jeLUv6CcXvS4sZ7y/uJQl2lzf8yrD/0bbwc
 rLI6sHiPSZmK93kNVN4X5H7kvt7cE/DYC9nnEOgK3BY5FgKc4n9887d4aVBhL8Lv
 ODwVXvZ+xi351YCj7qRyPU24zt/p4tkkT1o2k4a0HBluqLI0D+V20fke9IERUL8r
 d1uWKlcn0TqYDesE8HXKIhbst3gx52rMJrXBJDHwFmG6v8Pj1fkTXCVpPo8QcBz8
 OTVkpomN9f/Tx4+GZwhZOF86LhLL3OhxD6pT7JhFCXdmSGv+Ez8uyk1YZysM/XpV
 Juy/1yAcBpDIDkmhMFGdAAn48Nn9Fotty0r4je60zSEp1d/4QMXcFme29qr2JTUE
 iWnQ/HD6DxUjVHqy7CYvvo26Xegg1C7qgyOVt4PYZwAM1VKF5P3kzYTb4SAdxtop
 Tpji1sfW9QV08jqMNo6XntD32DSP9S2HqjO9LwBw700jnx2jjJ35fcJs6iodMOUn
 gckIZLMn3L0OoglPdyA5O7SNTbKE7aFiRKdnT/cJtR3Fa39Qu27CwC5gfiyuie9I
 Q+LG8GLuYSBHXAR+PBK4GWlzJ7Dn8k3eqmbnLeKpRMsU6ZzcttgA64xhaviN2wN0
 iJbvLJeisXr3GA==
 =bYAX
 -----END PGP SIGNATURE-----

Merge tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer cleanups from Thomas Gleixner:
 "A treewide hrtimer timer cleanup

  hrtimers are initialized with hrtimer_init() and a subsequent store to
  the callback pointer. This turned out to be suboptimal for the
  upcoming Rust integration and is obviously a silly implementation to
  begin with.

  This cleanup replaces the hrtimer_init(T); T->function = cb; sequence
  with hrtimer_setup(T, cb);

  The conversion was done with Coccinelle and a few manual fixups.

  Once the conversion has completely landed in mainline, hrtimer_init()
  will be removed and the hrtimer::function becomes a private member"

* tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits)
  wifi: rt2x00: Switch to use hrtimer_update_function()
  io_uring: Use helper function hrtimer_update_function()
  serial: xilinx_uartps: Use helper function hrtimer_update_function()
  ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup()
  RDMA: Switch to use hrtimer_setup()
  virtio: mem: Switch to use hrtimer_setup()
  drm/vmwgfx: Switch to use hrtimer_setup()
  drm/xe/oa: Switch to use hrtimer_setup()
  drm/vkms: Switch to use hrtimer_setup()
  drm/msm: Switch to use hrtimer_setup()
  drm/i915/request: Switch to use hrtimer_setup()
  drm/i915/uncore: Switch to use hrtimer_setup()
  drm/i915/pmu: Switch to use hrtimer_setup()
  drm/i915/perf: Switch to use hrtimer_setup()
  drm/i915/gvt: Switch to use hrtimer_setup()
  drm/i915/huc: Switch to use hrtimer_setup()
  drm/amdgpu: Switch to use hrtimer_setup()
  stm class: heartbeat: Switch to use hrtimer_setup()
  i2c: Switch to use hrtimer_setup()
  iio: Switch to use hrtimer_setup()
  ...
2025-03-25 10:54:15 -07:00
Dmitry Baryshkov
fcbb93f1e4 drm/display: dp: change drm_dp_dpcd_read_link_status() return value
drm_dp_dpcd_read_link_status() follows the "return error code or number
of bytes read" protocol, with the code returning less bytes than
requested in case of some errors. However most of the drivers
interpreted that as "return error code in case of any error". Switch
drm_dp_dpcd_read_link_status() to drm_dp_dpcd_read_data() and make it
follow that protocol too.

Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20250324-drm-rework-dpcd-access-v4-2-e80ff89593df@oss.qualcomm.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
2025-03-25 16:20:38 +02:00
Srinivasan Shanmugam
af23d3c9ca drm/amdgpu: Add parameter documentation for amdgpu_sync_fence
The 'flags' parameter, which specifies memory allocation behavior while
creating a sync entry,

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c:162: warning: Function parameter or struct member 'flags' not described in 'amdgpu_sync_fence'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:36 -04:00
Alex Deucher
80a0e82829 drm/amdgpu/discovery: optionally use fw based ip discovery
On chips without native IP discovery support, use the fw binary
if available, otherwise we can continue without it.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:36 -04:00
Flora Cui
25f602fbbc drm/amdgpu/discovery: use specific ip_discovery.bin for legacy asics
vega10/vega12/vega20/raven/raven2/picasso/arcturus/aldebaran

Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:35 -04:00
Flora Cui
017fbb6690 drm/amdgpu/discovery: check ip_discovery fw file available
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:35 -04:00
Jesse.zhang@amd.com
6ec04e38b2 drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA
This commit updates the VM flush implementation for the SDMA engine.

- Added a new function `sdma_v4_4_2_get_invalidate_req` to construct the VM_INVALIDATE_ENG0_REQ
  register value for the specified VMID and flush type. This function ensures that all relevant
  page table cache levels (L1 PTEs, L2 PTEs, and L2 PDEs) are invalidated.

- Modified the `sdma_v4_4_2_ring_emit_vm_flush` function to use the new `sdma_v4_4_2_get_invalidate_req`
  function. The updated function emits the necessary register writes and waits to perform a VM flush
  for the specified VMID. It updates the PTB address registers and issues a VM invalidation request
  using the specified VM invalidation engine.

- Included the necessary header file `gc/gc_9_0_sh_mask.h` to provide access to the required register
  definitions.

v2: vm flush by the vm inalidation packet (Lijo)
v3: code stle and define thh macro for the vm invalidation packet (Christian)
v4: Format definition sdma vm invalidate packet (Lijo)

Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:35 -04:00
Jesse.zhang@amd.com
b09cdeb4d3 drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush
- Modify the VM invalidation engine allocation logic to handle SDMA page rings.
  SDMA page rings now share the VM invalidation engine with SDMA gfx rings instead of
  allocating a separate engine. This change ensures efficient resource management and
  avoids the issue of insufficient VM invalidation engines.

- Add synchronization for GPU TLB flush operations in gmc_v9_0.c.
  Use spin_lock and spin_unlock to ensure thread safety and prevent race conditions
  during TLB flush operations. This improves the stability and reliability of the driver,
  especially in multi-threaded environments.

 v2: replace the sdma ring check with a function `amdgpu_sdma_is_page_queue`
 to check if a ring is an SDMA page queue.(Lijo)

 v3: Add GC version check, only enabled on GC9.4.3/9.4.4/9.5.0
 v4: Fix code style and add more detailed description (Christian)
 v5: Remove dependency on vm_inv_eng loop order, explicitly lookup shared inv_eng(Christian/Lijo)
 v6: Added search shared ring function amdgpu_sdma_get_shared_ring (Lijo)

Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:35 -04:00
Jesse.zhang@amd.com
ea6dd40caf drm/amd/amdgpu: Increase max rings to enable SDMA page ring
Increase the maximum number of rings supported by the AMDGPU driver from 133 to 149.
This change is necessary to enable support for the SDMA page ring.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:35 -04:00
Xiang Liu
338f7412c7 drm/amdgpu: Decode deferred error type in gfx aca bank parser
In the case of injecting uncorrected error with background workload,
the deferred error among uncorrected errors need to be specified
by checking the deferred and poison bits of status register.

v2: refine checking for deferred error
v2: log possiable DEs among CEs
v2: generate CPER records for DEs among UEs

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:35 -04:00
Srinivasan Shanmugam
2ec0a7c337 drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.5 GPUs
Enable the cleaner shader for GFX11.5.0/11.5.1 GPUs to provide data
isolation between GPU workloads. The cleaner shader is responsible for
clearing the Local Data Store (LDS), Vector General Purpose Registers
(VGPRs), and Scalar General Purpose Registers (SGPRs), which helps
prevent data leakage and ensures accurate computation results.

This update extends cleaner shader support to GFX11.5.0/11.5.1 GPUs,
previously available for GFX11.0.3. It enhances security by clearing GPU
memory between processes and maintains a consistent GPU state across KGD
and KFD workloads.

Cc: Mario Sopena-Novales <mario.novales@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:35 -04:00
Alex Deucher
5e93d0e335 drm/amdgpu/mes: clean up SDMA HQD loop
Follow the same logic as the other IP types.

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:35 -04:00
Alex Deucher
a52077b6b6 drm/amdgpu/mes: enable compute pipes across all MEC
Enable pipes on both MECs for MES.

Fixes: 745f46b6a9 ("drm/amdgpu: enable mes v12 self test")
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:35 -04:00
Alex Deucher
652a06f74a drm/amdgpu/mes: drop MES 10.x leftovers
Leftover from MES bring up.  There is no production
MES support for MES 10.x.  The rest of the MES 10.x
code has already been removed so drop this.

Acked-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:35 -04:00
Alex Deucher
5608ddf6e9 drm/amdgpu/mes: optimize compute loop handling
Break when we get to the end of the supported pipes
rather than continuing the loop.

Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:34 -04:00
Alex Deucher
3bae7916e7 drm/amdgpu/sdma: guilty tracking is per instance
The gfx and page queues are per instance, so track them
per instance.

v2: drop extra parameter (Lijo)

Fixes: fdbfaaaae0 ("drm/amdgpu: Improve SDMA reset logic with guilty queue tracking")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:34 -04:00
Alex Deucher
e02fcf7308 drm/amdgpu/sdma: fix engine reset handling
Move the kfd suspend/resume code into the caller.  That
is where the KFD is likely to detect a reset so on the KFD
side there is no need to call them.  Also add a mutex to
lock the actual reset sequence.

v2: make the locking per instance

Fixes: bac38ca8c4 ("drm/amdkfd: implement per queue sdma reset for gfx 9.4+")
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:34 -04:00
Christian König
fc70d1ea1b drm/amdgpu: remove invalid usage of sched.ready
I can't count how often I had to remove this nonsense.

Probably doesn't need an explanation any more.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:34 -04:00
Christian König
02ba7543f2 drm/amdgpu: add cleaner shader trace point
Note when the cleaner shader is executed.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:34 -04:00
Christian König
1bb1314d0b drm/amdgpu: add isolation trace point
Note when we switch from one isolation owner to another.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:34 -04:00
Christian König
db1e58ec86 drm/amdgpu: stop reserving VMIDs to enforce isolation
That was quite troublesome for gang submit. Completely drop this
approach and enforce the isolation separately.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:34 -04:00
Christian König
b7fbcd77bb drm/amdgpu: rework how the cleaner shader is emitted v3
Instead of emitting the cleaner shader for every job which has the
enforce_isolation flag set only emit it for the first submission from
every client.

v2: add missing NULL check
v3: fix another NULL pointer deref

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:34 -04:00
Christian König
bd22e44ad4 drm/amdgpu: rework how isolation is enforced v2
Limiting the number of available VMIDs to enforce isolation causes some
issues with gang submit and applying certain HW workarounds which
require multiple VMIDs to work correctly.

So instead start to track all submissions to the relevant engines in a
per partition data structure and use the dma_fences of the submissions
to enforce isolation similar to what a VMID limit does.

v2: use ~0l for jobs without isolation to distinct it from kernel
    submissions which uses NULL for the owner. Add some warning when we
    are OOM.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:16:34 -04:00
Christian König
7f11c59e07 drm/amdgpu: overwrite signaled fence in amdgpu_sync
This allows using amdgpu_sync even without peeking into the fences for a
long time.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:15:08 -04:00
Christian König
16590745b5 drm/amdgpu: use GFP_NOWAIT for memory allocations
In the critical submission path memory allocations can't wait for
reclaim since that can potentially wait for submissions to finish.

Finally clean that up and mark most memory allocations in the critical
path with GFP_NOWAIT. The only exception left is the dma_fence_array()
used when no VMID is available, but that will be cleaned up later on.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:15:08 -04:00
Kenneth Feng
a67f0094c9 drm/amd/amdgpu: Revert "drm/amd/amdgpu: shorten the gfx idle worker timeout"
This reverts commit 55ff973fe1.

Reason for revert: this causes some tests fail with call trace.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:15:08 -04:00
Ahmad Rehman
cfdf8b34b9 drm/amdgpu/sdam: Skip SDMA queue reset for SRIOV
For SRIOV, skip the SDMA queue reset and return
error. The engine/queue reset failure will trigger
FLR in the sequence.

v2: do not add queue reset support mask for sriov

Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:15:08 -04:00
Ahmad Rehman
0a59fbd5d9 drm/amdgpu: Add support to load PSP TA v13.0.12 for SRIOV
Add case for 13.0.12.

Signed-off-by: Ahmad Rehman <ahrehman@amd.com>
Reviewed-by: Vignesh.Chander@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:15:08 -04:00
Ellen Pan
d7f5c13e45 drm/amdgpu: Enable amdgpu_ras_resume for gfx 9.5.0
This enables ras to be resumed after gpu recovery on mi350 sriov.

Signed-off-by: Ellen Pan <yunru.pan@amd.com>
Reviewed-by: Ahmad Rehman <Ahmad.Rehman@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21 12:15:08 -04:00
Victor Skvortsov
9c05636ca7 drm/amdgpu: Skip pcie_replay_count sysfs creation for VF
VFs cannot read the NAK_COUNTER register. This information is only
available through PMFW metrics.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-19 15:56:27 -04:00
Candice Li
a5f7e90fe0 drm/amdgpu: Add active_umc_mask to ras init_flags
Add active_umc_mask to ras init_flags.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-19 15:56:23 -04:00
Lijo Lazar
a7818b15cf Documentation/amdgpu: Add debug_mask documentation
Add description for debug_mask bit options.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-19 15:56:19 -04:00
Lijo Lazar
ab6893402a drm/amd/pm: Add debug bit for smu pool allocation
In certain cases, it's desirable to avoid PMFW log transactions to
system memory. Add a mask bit to decide whether to allocate smu pool in
device memory or system memory.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-19 15:56:13 -04:00
Alex Deucher
3b669df92c drm/amdgpu/vcn: adjust workload profile handling
No need to make the workload profile setup dependent
on the results of cancelling the delayed work thread.
We have all of the necessary checking in place for the
workload profile reference counting, so separate the
two.  As it is now, we can theoretically end up with
the call from begin_use happening while the worker
thread is executing which would result in the profile
not getting set for that submission.  It should not
affect the reference counting.

v2: bail early if the the profile is already active (Lijo)

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-19 15:56:09 -04:00
Alex Deucher
9e34d8d1a1 drm/amdgpu/gfx: adjust workload profile handling
No need to make the workload profile setup dependent
on the results of cancelling the delayed work thread.
We have all of the necessary checking in place for the
workload profile reference counting, so separate the
two.  As it is now, we can theoretically end up with
the call from begin_use happening while the worker
thread is executing which would result in the profile
not getting set for that submission.  It should not
affect the reference counting.

v2: bail early if the the profile is already active (Lijo)

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-19 15:56:04 -04:00
Candice Li
5762f9dcf7 drm/amdgpu: Add EEPROM I2C address support for smu v13_0_12
Add EEPROM I2C address support for smu v13_0_12.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-19 15:56:00 -04:00
Alex Deucher
ca6575a32a drm/amdgpu/vcn: fix ref counting for ring based profile handling
We need to make sure the workload profile ref counts are
balanced.  This isn't currently the case because we can
increment the count on submissions, but the decrement may
be delayed as work comes in.  Track when we enable the
workload profile so the references are balanced.

v2: switch to a mutex and active flag
v3: fix mutex init

Fixes: 1443dd3c67 ("drm/amd/pm: fix and simplify workload handling")
Cc: Yang Wang <kevinyang.wang@amd.com>
Cc: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-19 15:54:36 -04:00
Alex Deucher
553673a3e1 drm/amdgpu/gfx: fix ref counting for ring based profile handling
We need to make sure the workload profile ref counts are
balanced.  This isn't currently the case because we can
increment the count on submissions, but the decrement may
be delayed as work comes in.  Track when we enable the
workload profile so the references are balanced.

v2: switch to a mutex and active flag
v3: fix mutex init

Fixes: 8fdb3958e3 ("drm/amdgpu/gfx: add ring helpers for setting workload profile")
Cc: Yang Wang <kevinyang.wang@amd.com>
Cc: Kenneth Feng <kenneth.feng@amd.com>
Tested-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-19 15:52:56 -04:00
Christian König
0d9a95099d drm/amdgpu: grab an additional reference on the gang fence v2
We keep the gang submission fence around in adev, make sure that it
stays alive.

v2: fix memory leak on retry

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-19 15:51:31 -04:00
David Belanger
35b6162bb7 drm/amdgpu: Restore uncached behaviour on GFX12
Always use MTYPE_UC if UNCACHED flag is specified.

This makes kernarg region uncached and it restores
usermode cache disable debug flag functionality.

Do not set MTYPE_UC for COHERENT flag, on GFX12 coherence is handled by
shader code.

Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit eb6cdfb807)
Cc: stable@vger.kernel.org # 6.12.x
2025-03-18 16:29:52 -04:00
Wentao Liang
86730b5261 drm/amdgpu/gfx12: correct cleanup of 'me' field with gfx_v12_0_me_fini()
In gfx_v12_0_cp_gfx_load_me_microcode_rs64(), gfx_v12_0_pfp_fini() is
incorrectly used to free 'me' field of 'gfx', since gfx_v12_0_pfp_fini()
can only release 'pfp' field of 'gfx'. The release function of 'me' field
should be gfx_v12_0_me_fini().

Fixes: 52cb80c12e ("drm/amdgpu: Add gfx v12_0 ip block support (v6)")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ebdc52607a)
Cc: stable@vger.kernel.org # 6.12.x
2025-03-18 16:29:16 -04:00
David Rosca
7fc0765208 drm/amdgpu: Remove JPEG from vega and carrizo video caps
JPEG is only supported for VCN1+.

Signed-off-by: David Rosca <david.rosca@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0a6e7b06bd)
Cc: stable@vger.kernel.org
2025-03-18 16:26:12 -04:00
David Rosca
ec33964d9d drm/amdgpu: Fix JPEG video caps max size for navi1x and raven
8192x8192 is the maximum supported resolution.

Signed-off-by: David Rosca <david.rosca@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6e0d2fde3a)
Cc: stable@vger.kernel.org
2025-03-18 16:25:46 -04:00
David Rosca
f0105e1731 drm/amdgpu: Fix MPEG2, MPEG4 and VC1 video caps max size
1920x1088 is the maximum supported resolution.

Signed-off-by: David Rosca <david.rosca@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1a0807feb9)
Cc: stable@vger.kernel.org
2025-03-18 16:25:16 -04:00
Lijo Lazar
6c11d4a87d drm/amdgpu: Use wafl version for xgmi
XGMI and WAFL share the same versions. Use WAFL version if XGMI version
is not present in discovery.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-18 14:03:47 -04:00
Jesse.zhang@amd.com
cc63bcfd14 drm/amdgpu: Fix SDMA engine reset logic
The scheduler should restart only if the reset operation
succeeds  This ensures that new tasks are only submitted
to the queues after a successful reset.

Fixes: 4c02f73016 ("drm/amdgpu: Introduce conditional user queue suspension for SDMA resets")
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse.Zhang <Jesse.zhang@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-18 14:03:47 -04:00
Flora Cui
b5aaa82e2b drm/amdgpu: release xcp_mgr on exit
Free on driver cleanup.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-18 14:03:47 -04:00
Xiang Liu
d6f9bbce18 drm/amdgpu: Fix computation for remain size of CPER ring
The mistake of computation for remain size of CPER ring will cause
unbreakable while cycle when CPER ring overflow.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-18 14:03:46 -04:00
Kenneth Feng
55ff973fe1 drm/amd/amdgpu: shorten the gfx idle worker timeout
Shorten the gfx idle worker timeout. This is to sync with
DAL when there is no activity on the screen. Original 1
second can not sync with DAL, so DAL can not apply MALL
when the workload type is not bootup default.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-18 14:03:38 -04:00
Tao Zhou
05d50ea3ea drm/amdgpu: format old RAS eeprom data into V3 version
Clear old data and save it in V3 format.

v2: only format eeprom data for new ASICs.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-18 14:03:38 -04:00
Alex Deucher
9deacd6c55 drm/amdgpu: don't free conflicting apertures for non-display devices
PCI_CLASS_ACCELERATOR_PROCESSING devices won't ever be
the sysfb, so there is no need to free conflicting
apertures.

Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-18 14:03:38 -04:00
Alex Deucher
e00e5c2238 drm/amdgpu: adjust drm_firmware_drivers_only() handling
Move to probe so we can check the PCI device type and
only apply the drm_firmware_drivers_only() check for
PCI DISPLAY classes.  Also add a module parameter to
override the nomodeset kernel parameter as a workaround
for platforms that have this hardcoded on their kernel
command lines.

Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-18 14:03:38 -04:00
Alex Deucher
4db4c82d4d drm/amdgpu: drop drm_firmware_drivers_only()
There are a number of systems and cloud providers out there
that have nomodeset hardcoded in their kernel parameters
to block nouveau for the nvidia driver.  This prevents the
amdgpu driver from loading. Unfortunately the end user cannot
easily change this.  The preferred way to block modules from
loading is to use modprobe.blacklist=<driver>.  That is what
providers should be using to block specific drivers.

Drop the check to allow the driver to load even when nomodeset
is specified on the kernel command line.

Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-18 14:03:37 -04:00
David Belanger
eb6cdfb807 drm/amdgpu: Restore uncached behaviour on GFX12
Always use MTYPE_UC if UNCACHED flag is specified.

This makes kernarg region uncached and it restores
usermode cache disable debug flag functionality.

Do not set MTYPE_UC for COHERENT flag, on GFX12 coherence is handled by
shader code.

Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-13 23:18:02 -04:00
Wentao Liang
ebdc52607a drm/amdgpu/gfx12: correct cleanup of 'me' field with gfx_v12_0_me_fini()
In gfx_v12_0_cp_gfx_load_me_microcode_rs64(), gfx_v12_0_pfp_fini() is
incorrectly used to free 'me' field of 'gfx', since gfx_v12_0_pfp_fini()
can only release 'pfp' field of 'gfx'. The release function of 'me' field
should be gfx_v12_0_me_fini().

Fixes: 52cb80c12e ("drm/amdgpu: Add gfx v12_0 ip block support (v6)")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-13 23:17:24 -04:00
Shaoyun Liu
f81cd79311 drm/amd/amdgpu: Fix MES init sequence
When MES is been used , the set_hw_resource_1 API is required to
initialize MES internal context correctly

Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-13 23:16:08 -04:00
Xiang Liu
13c13bdd1b drm/amdgpu: Enable ACA by default for psp v13_0_6/v13_0_14
Enable ACA by default for psp v13_0_6/v13_0_14.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-13 23:15:58 -04:00
ganglxie
a4b6e990d7 drm/amdgpu: Save PA of bad pages for old asics
for old asics that do not support mca translating, we
just save PA for them

Signed-off-by: ganglxie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-13 23:13:08 -04:00
Emily Deng
2da3af5f0b drm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 for sriov multiple vf.
In sriov multiple vf, Set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 to read WPTR from MQD.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-13 23:13:02 -04:00
Emily Deng
8d5e70ba5d drm/amdgpu: Add amdgpu_sriov_multi_vf_mode function
Use amdgpu_sriov_multi_vf_mode to replace amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev).

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-13 23:12:52 -04:00
Lijo Lazar
357506799b drm/amdgpu: Calculate IP specific xgmi bandwidth
Use IP version specific xgmi speed/width for bandwidth calculation.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-13 23:11:46 -04:00
Harish Kasiviswanathan
8a7820c072 drm/amdgpu: Reduce dequeue retry timeout for gfx9 family
Dequeue retry timeout controls the interval between checks for unmet
conditions. On MI series, reduce this from 0x40 to 0x1 (~ 1 uS). The
cost of additional bandwidth consumed by CP when polling memory
shouldn't be substantial.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-13 23:10:38 -04:00
Alex Deucher
fc3c139cf0 drm/amdgpu/gfx12: don't read registers in mqd init
Just use the default values.  There's not need to
get the value from hardware and it could cause problems
if we do that at runtime and gfxoff is active.

Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-13 23:10:30 -04:00
Alex Deucher
e27b36ea6b drm/amdgpu/gfx11: don't read registers in mqd init
Just use the default values.  There's not need to
get the value from hardware and it could cause problems
if we do that at runtime and gfxoff is active.

Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-13 23:10:25 -04:00
Lijo Lazar
b9e75bcb2b drm/amdgpu: Remove unsupported xgmi versions
XGMI v4.8.0 is not used in any SOCs. Remove the associated functions.
Also, ensure get_xgmi_info callback pointer is not NULL before calling
the function.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-13 23:10:08 -04:00
David Rosca
19478f2011 drm/amdgpu: Update SRIOV video codec caps
There have been multiple fixes to the video caps that are missing for
SRIOV. Update the SRIOV caps with correct values.

Signed-off-by: David Rosca <david.rosca@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-13 23:08:51 -04:00
David Rosca
0a6e7b06bd drm/amdgpu: Remove JPEG from vega and carrizo video caps
JPEG is only supported for VCN1+.

Signed-off-by: David Rosca <david.rosca@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-13 23:08:25 -04:00
David Rosca
6e0d2fde3a drm/amdgpu: Fix JPEG video caps max size for navi1x and raven
8192x8192 is the maximum supported resolution.

Signed-off-by: David Rosca <david.rosca@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-13 23:08:14 -04:00
David Rosca
1a0807feb9 drm/amdgpu: Fix MPEG2, MPEG4 and VC1 video caps max size
1920x1088 is the maximum supported resolution.

Signed-off-by: David Rosca <david.rosca@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-13 23:07:52 -04:00
Natalie Vock
6cc30748e1 drm/amdgpu: NULL-check BO's backing store when determining GFX12 PTE flags
PRT BOs may not have any backing store, so bo->tbo.resource will be
NULL. Check for that before dereferencing.

Fixes: 0cce5f285d ("drm/amdkfd: Check correct memory types for is_system variable")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 3e3fcd29b5)
Cc: stable@vger.kernel.org # 6.12.x
2025-03-12 14:59:21 -04:00
Natalie Vock
3e3fcd29b5 drm/amdgpu: NULL-check BO's backing store when determining GFX12 PTE flags
PRT BOs may not have any backing store, so bo->tbo.resource will be
NULL. Check for that before dereferencing.

Fixes: 0cce5f285d ("drm/amdkfd: Check correct memory types for is_system variable")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-11 12:37:10 -04:00
Alexandre Demers
37c890d831 drm/amdgpu: finish wiring up sid.h in DCE6
For coherence with DCE8 et DCE10, add or move some values under sid.h
and remove duplicated from si_enums.h.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-11 12:37:07 -04:00
Alexandre Demers
760632fa2e drm/amdgpu: fix SI's GB_ADDR_CONFIG_GOLDEN values and wire up sid.h in GFX6
By wiring up sid.h in GFX6, we end up with a few duplicated defines such as
the golden registers. Let's clean this up.

[TAHITI,VERDE, HAINAN]_GB_ADDR_CONFIG_GOLDEN were defined both in sid.h
and under si_enums.h, with different values. Keep the values used under radeon
and move them under gfx_v6_0.c where they are used (as it is done under cik)

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-11 12:36:48 -04:00
Alexandre Demers
20fb56dfd8 drm/amdgpu: prepare DCE6 uniformisation with DCE8 and DCE10
Let's begin the cleanup in sid.h to prevent warnings and errors when wiring
sid.h into dce_v6_0.c.

This is a bigger cleanup.
Many defines found under sid.h have already been properly moved
into the different "_d.h" and "_sh_mask.h", so they should have been
already removed from sid.h and properly linked in where needed.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-11 12:36:42 -04:00
Dan Carpenter
0ee560d71f drm/amdgpu/gfx: delete stray tabs
These lines are indented one tab too far.  Delete the extra tabs.

Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-11 12:36:29 -04:00
André Almeida
099f273eff drm/amdgpu: Trigger a wedged event for ring reset
Instead of only triggering a wedged event for complete GPU resets,
trigger for ring resets. Regardless of the reset, it's useful for
userspace to know that it happened because the kernel will reject
further submissions from that app.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-10 14:56:25 -04:00
Alex Deucher
ded6ad4c6e drm/amdgpu/vce2: fix ip block reference
Need to use the correct IP block type.  VCE vs VCN.
Fixes mclk issues on Hawaii.

Suggested by selendym.

Fixes: 82ae6619a4 ("drm/amdgpu: update the handle ptr in wait_for_idle")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3997
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Cc: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 02438acd25)
Cc: stable@vger.kernel.org
2025-03-10 14:18:04 -04:00
Mario Limonciello
4afacc9948 drm/amd: Keep display off while going into S4
When userspace invokes S4 the flow is:

1) amdgpu_pmops_prepare()
2) amdgpu_pmops_freeze()
3) Create hibernation image
4) amdgpu_pmops_thaw()
5) Write out image to disk
6) Turn off system

Then on resume amdgpu_pmops_restore() is called.

This flow has a problem that because amdgpu_pmops_thaw() is called
it will call amdgpu_device_resume() which will resume all of the GPU.

This includes turning the display hardware back on and discovering
connectors again.

This is an unexpected experience for the display to turn back on.
Adjust the flow so that during the S4 sequence display hardware is
not turned back on.

Reported-by: Xaver Hugl <xaver.hugl@gmail.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2038
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Link: https://lore.kernel.org/r/20250306185124.44780-1-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 68bfdc8dc0)
2025-03-10 14:09:09 -04:00
Alex Deucher
02438acd25 drm/amdgpu/vce2: fix ip block reference
Need to use the correct IP block type.  VCE vs VCN.
Fixes mclk issues on Hawaii.

Suggested by selendym.

Fixes: 82ae6619a4 ("drm/amdgpu: update the handle ptr in wait_for_idle")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3997
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Cc: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-10 13:36:21 -04:00
Alex Deucher
2c01befe4a drm/amdgpu/vcn: fix idle work handler for VCN 2.5
VCN 2.5 uses the PG callback to enable VCN DPM which is
a global state.  As such, we need to make sure all instances
are in the same state.

v2: switch to a ref count (Lijo)
v3: switch to its own idle work handler
v4: fix logic in DPG handling

Fixes: 4ce4fe2720 ("drm/amdgpu/vcn: use per instance callbacks for idle work handler")
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-10 13:22:05 -04:00
Marek Olšák
3855f1d925 drm/amd/display: allow 256B DCC max compressed block sizes on gfx12
The hw supports it.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-10 13:22:01 -04:00
Greg Kroah-Hartman
993a47bd7b Linux 6.14-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmfOKBUeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG1aQH/iC+Oyij4VxAjBek
 BOXIT/p6CwlIXb8ObiWWcRjDPizlcxb3RaV8J2RO+IqaQ2wltxpFANq2G7Re2FPm
 SNcEpIURAOVcxHGedcfFA91srO5F4FzNTO8LVp7MIbcgMYy3pdk+dbZmi6A691R+
 t9pb74m+MAnF1o/MUx7pUlhAT/4ymuuR0F7WCSg4h0Xwe5m0nlJY89kJBC7PCjyd
 n3mdhsz3rDSLmt/z/T7HGD89r8sYSvm9cOKtL3ELgGTrm7boQV8ii9Y9w04DI8PQ
 JmIernugcCxmhH36mVUAHgJf2+/T388xFUh/D5+skeUOUZpaJZG866rnb32WpsHc
 eWLFUeg=
 =Wypt
 -----END PGP SIGNATURE-----

Merge 6.14-rc6 into driver-core-next

We need the driver core fix in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-10 17:37:25 +01:00
Dave Airlie
236f475d29 amdgpu:
- Fix spelling typos
 - RAS updates
 - VCN 5.0.1 updates
 - SubVP fixes
 - DCN 4.0.1 fixes
 - MSO DPCD fixes
 - DIO encoder refactor
 - PCON fixes
 - Misc cleanups
 - DMCUB fixes
 - USB4 DP fixes
 - DM cleanups
 - Backlight cleanups and fixes
 - Support platform backlight curves
 - Misc code cleanups
 - SMU 14 fixes
 - JPEG 4.0.3 reset updates
 - SR-IOV fixes
 - SVM fixes
 - GC 12 DCC fixes
 - DC DCE 6.x fix
 - Hiberation fix
 
 amdkfd:
 - Fix possible NULL pointer in queue validation
 - Remove unnecessary CP domain validation
 - SDMA queue reset support
 - Add per process flags
 
 radeon:
 - Fix spelling typos
 - RS400 hyperZ fix
 
 UAPI:
 - Add KFD per process flags for setting precision
   Proposed user space: 2a64fa5e06
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZ8tfmgAKCRC93/aFa7yZ
 2CmkAP4xoy09aHecz6WOdNN9VBJiSMSZ3UOyBL9Ll2i5nI3YegEA7Co+AmQpXOPn
 HKeD6Vy1tt1XE/Liuur5dbLUKsVYSAI=
 =IQ45
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.15-2025-03-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amdgpu:
- Fix spelling typos
- RAS updates
- VCN 5.0.1 updates
- SubVP fixes
- DCN 4.0.1 fixes
- MSO DPCD fixes
- DIO encoder refactor
- PCON fixes
- Misc cleanups
- DMCUB fixes
- USB4 DP fixes
- DM cleanups
- Backlight cleanups and fixes
- Support platform backlight curves
- Misc code cleanups
- SMU 14 fixes
- JPEG 4.0.3 reset updates
- SR-IOV fixes
- SVM fixes
- GC 12 DCC fixes
- DC DCE 6.x fix
- Hiberation fix

amdkfd:
- Fix possible NULL pointer in queue validation
- Remove unnecessary CP domain validation
- SDMA queue reset support
- Add per process flags

radeon:
- Fix spelling typos
- RS400 hyperZ fix

UAPI:
- Add KFD per process flags for setting precision
  Proposed user space: 2a64fa5e06

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250307211051.1880472-1-alexander.deucher@amd.com
2025-03-10 09:04:52 +10:00
Mario Limonciello
68bfdc8dc0 drm/amd: Keep display off while going into S4
When userspace invokes S4 the flow is:

1) amdgpu_pmops_prepare()
2) amdgpu_pmops_freeze()
3) Create hibernation image
4) amdgpu_pmops_thaw()
5) Write out image to disk
6) Turn off system

Then on resume amdgpu_pmops_restore() is called.

This flow has a problem that because amdgpu_pmops_thaw() is called
it will call amdgpu_device_resume() which will resume all of the GPU.

This includes turning the display hardware back on and discovering
connectors again.

This is an unexpected experience for the display to turn back on.
Adjust the flow so that during the S4 sequence display hardware is
not turned back on.

Reported-by: Xaver Hugl <xaver.hugl@gmail.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2038
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Link: https://lore.kernel.org/r/20250306185124.44780-1-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07 15:33:49 -05:00
Alexandre Demers
092da9fb25 drm/amdgpu: add defines for pin_offsets in DCE8
Define pin_offsets values in the same way it is done in DCE8

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07 12:56:08 -05:00
Srinivasan Shanmugam
9c551ca3db drm/amdgpu: Fix annotation for dce_v6_0_line_buffer_adjust function
Updated description for the 'other_mode' parameter. This parameter is
used to determine the display mode of another display controller that
may be sharing the line buffer.

Cc: Ken Wang <Qingqing.Wang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07 12:56:04 -05:00
Xiang Liu
148084bbb1 drm/amdgpu: Use unique CPER record id across devices
Encode socket id to CPER record id to be unique across devices.

v2: add pointer check for adev->smuio.funcs->get_socket_id
v2: set 0 if adev->smuio.funcs->get_socket_id is NULL

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07 12:54:08 -05:00
Shiwu Zhang
216be476f1 drm/amdgpu: fix the gb_addr_config_fields init value mismatch
For gfx_v9_4_3 specifically, before regGB_ADDR_CONFIG is overwritten
in gfx hw_init it is read out to popluate the gb_addr_config_fields
in the sw_init stage, which causes mismatch.

Fix it by using the golden value in sw_init as well.

v2: This is a driver-set golden reg and keep as it is (Lijo)

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07 12:53:58 -05:00
Shiwu Zhang
3bc7bc73af drm/amdgpu: retire ip init code specific for A0 rev
For aqua_vanjaram, A0 HW is retired so remove the code
specific for it in gfx ip init.

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07 12:53:43 -05:00
Tao Zhou
334dc5fcc3 drm/amdgpu: increase RAS bad page threshold
For default policy, driver will issue an RMA event when the number of
bad pages is greater than 8 physical rows, rather than reaches 8
physical rows, don't rely on threshold configurable parameters in
default mode.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07 12:53:39 -05:00
Emily Deng
fe2fa3be3d drm/amdgpu: Fix missing drain retry fault the last entry
While the entry get in svm_range_unmap_from_cpu is the last entry, and
the entry is page fault, it also need to be dropped. So for equal case,
it also need to be dropped.

v2:
Only modify the svm_range_restore_pages.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Xiaogang Chen<xiaogang.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07 12:53:30 -05:00
Victor Lu
94b0908b85 drm/amdgpu: Do not set power brake sequence for Aldebaran SRIOV
Aldebaran SRIOV VF cannot access the power brake feature regs.
The accesses can be skipped to avoid a dmesg warning.

v2: Remove redundant asic type check

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07 12:53:24 -05:00
Charles Han
571d36837c drm/amdgpu: fix inconsistent indenting warning
Fix below inconsistent indenting smatch warning.
smatch warnings:
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c:582 amdgpu_sdma_reset_engine() warn: inconsistent indenting

Signed-off-by: Charles Han <hanchunchao@inspur.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07 12:53:06 -05:00
Victor Lu
3646cc65e2 drm/amdgpu: Do not write to GRBM_CNTL if Aldebaran SRIOV
Aldebaran SRIOV VF does not have write permissions to GRBM_CTNL.
This access can be skipped to avoid a dmesg warning.

v2: Use GC IP version check instead of asic check

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07 12:52:29 -05:00
Sathishkumar S
a29936bcd2 drm/amdgpu: Fix core reset sequence for JPEG5_0_1
For cores 1 through 9 repair the core reset sequence by
adjusting offsets to access the expected registers.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:47:36 -05:00
Jonathan Kim
bac38ca8c4 drm/amdkfd: implement per queue sdma reset for gfx 9.4+
To reset hung SDMA queues on GFX 9.4+ for the GFX9 family, a soft reset
must be issued through SMU.  Since soft resets will reset an entire SDMA
engine, use a common KGD call to do the reset as the KGD will handle
avoiding a reset of in flight GFX and paging queues on that engine.

In addition, create a common call for all reset types to simplify
the handling of module parameter settings that block gpu resets.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:47:26 -05:00
Victor Lu
057fef20b8 drm/amdgpu: Do not program AGP BAR regs under SRIOV in gfxhub_v1_0.c
SRIOV VF does not have write access to AGP BAR regs.
Skip the writes to avoid a dmesg warning.

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:47:21 -05:00
Sathishkumar S
20c34e5c4a drm/amdgpu: Fix core reset sequence for JPEG4_0_3
For cores 1 through 7 repair the core reset sequence by
adjusting offsets to access the expected registers.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:47:07 -05:00
Tony Yi
a91d91b600 drm/amdgpu: Add support for CPERs on virtualization
Add support for CPERs on VFs.

VFs do not receive PMFW messages directly; as such, they need to
query them from the host. To avoid hitting host event guard,
CPER queries need to be rate limited. CPER queries share the same
RAS telemetry buffer as error count query, so a mutex protecting
the shared buffer was added as well.

For readability, the amdgpu_detect_virtualization was refactored
into multiple individual functions.

Signed-off-by: Tony Yi <Tony.Yi@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:47:03 -05:00
James Zhu
ca17c8e149 drm/amdkfd: remove unnecessary cpu domain validation
before move to GTT domain.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:46:55 -05:00
Tony Yi
d4c60219ac drm/amdgpu: Update headers for CPER support on SRIOV
Update amdgv_sriovmsg.h and mxgpu_nv.h to add new definitions for
CPER support on VFs. PMFW ACA messages are not available on VFs,
and VFs must query CPERs from host.

Signed-off-by: Tony Yi <Tony.Yi@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:46:40 -05:00
Lijo Lazar
6ef5ccaad7 drm/amdgpu: Reinit FW shared flags on VCN v5.0.1
After a full device reset, shared memory region will clear out and it's
not possible to reliably save the region in case of RAS errors.
Reinitialize the flags if required.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:46:33 -05:00
Lijo Lazar
6e09402098 drm/amdgpu: Use the right struct for VCN v5.0.1
VCN IP versions >= 5.0 uses VCN5 fw shared struct.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:46:26 -05:00
Alexandre Demers
ab23db6d08 drm/amdgpu: add dce_v6_0_soft_reset() to DCE6
DCE6 was missing soft reset, but it was easily identifiable under radeon.
This should be it, pretty much as it is done under DCE8 and DCE10.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:45:30 -05:00
Alexandre Demers
5f6021d52b drm/amdgpu: fix style in DCE6
Whitespace cleanups.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:45:27 -05:00
Alexandre Demers
029ab8cabd drm/amdgpu: add some comments in DCE6
Add some comments.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:45:23 -05:00
Asad Kamal
8df5f03be5 drm/amdgpu: Set PG state to gating for vcn_v_5_0_1
For vcn_v_5_0_1, set power state to gating during hw fini. Also there may
be scenario where VCN engine hangs during a job execution, then it's not
safe to assume that set_pg_state works fine during hw_fini to put the state
to gated. After a reset, we can assume that it's in the default state,
therefore reset the driver maintained state. Put the default state as gated
during reset as per this assumption.

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:44:56 -05:00
Mario Limonciello
f729e63743 drm/amd: Pass luminance data to amdgpu_dm_backlight_caps
The ATIF method on some systems will provide a backlight curve. Pass
this curve into amdgpu_dm add it to the structures.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Link: https://lore.kernel.org/r/20250228185145.186319-3-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:42:42 -05:00
Mario Limonciello
1c79b5fcdf drm/amd: Copy entire structure in amdgpu_acpi_get_backlight_caps()
As new members are introduced to the structure copying the entire
structure will help avoid missing them.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Link: https://lore.kernel.org/r/20250228185145.186319-2-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:42:37 -05:00
Lijo Lazar
a734a717dc drm/amdgpu: Avoid HDP flush on JPEG v5.0.1
Similar to JPEG v4.0.3, HDP flush shouldn't be performed by JPEG engine.
Keep it empty.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:38:01 -05:00
Lijo Lazar
6fcfaac604 drm/amdgpu: Initialize RRMT status on JPEG v5.0.1
Initialize RRMT enablement status from register.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:37:56 -05:00
Jesse.zhang@amd.com
77bd621d14 drm/amdgpu: Update SDMA scheduler mask handling to include page queue
This patch updates the SDMA scheduler mask handling to include the page queue
if it exists. The scheduler mask is calculated based on the number of SDMA
instances and the presence of the page queue. The mask is updated to reflect
the state of both the SDMA gfx ring and the page queue.

Changes:
- Add handling for the SDMA page queue in `amdgpu_debugfs_sdma_sched_mask_set`.
- Update scheduler mask calculations to include the page queue.
- Modify `amdgpu_debugfs_sdma_sched_mask_get` to return the correct mask value.

This change is necessary to verify multiple queues (SDMA gfx queue + page queue)
and ensure proper scheduling and state management for SDMA instances.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:37:42 -05:00
Lijo Lazar
0b9647d40e drm/amdgpu: Add offset normalization in VCN v5.0.1
VCN v5.0.1 also will need register offset normalization. Reuse the logic
from VCN v4.0.3. Also, avoid HDP flush similar to VCN v4.0.3

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:37:35 -05:00
Lijo Lazar
b5a3fc54e8 drm/amdgpu: Initialize RRMT status on VCN v5.0.1
Initialize RRMT status from register.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:37:31 -05:00
Xiang Liu
677ae51f49 drm/amdgpu: Free CPER entry after committing to ring
Free CPER entry when it's committed to CPER ring to avoid memory leak.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:37:27 -05:00
Alexandre Demers
899634a57a drm/amdgpu: fix spelling typos in SI
Fix typos

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:37:23 -05:00
Alexandre Demers
ce43abd7ec drm/amdgpu: fix spelling typos
Found some typos while exploring amdgpu code.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:37:13 -05:00
Dave Airlie
e21cba7047 drm-misc-next for v6.15:
Cross-subsystem Changes:
 
 bus:
 - mhi: Avoid access to uninitialized field
 
 Core Changes:
 
 - Fix docmentation
 
 dp:
 - Add helpers for LTTPR transparent mode
 
 sched:
 - Improve job peek/pop operations
 - Optimize layout of struct drm_sched_job
 
 Driver Changes:
 
 arc:
 - Convert to devm_platform_ioremap_resource()
 
 aspeed:
 - Convert to devm_platform_ioremap_resource()
 
 bridge:
 - ti-sn65dsi86: Support CONFIG_PWM tristate
 
 i915:
 - dp: Use helpers for LTTPR transparent mode
 
 mediatek:
 - Convert to devm_platform_ioremap_resource()
 
 msm:
 - dp: Use helpers for LTTPR transparent mode
 
 nouveau:
 - dp: Use helpers for LTTPR transparent mode
 
 panel:
 - raydium-rm67200: Add driver for Raydium RM67200
 - simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17
 - sony-td4353-jdi: Use MIPI-DSI multi-func interface
 - summit: Add driver for Apple Summit display panel
 - visionox-rm692e5: Add driver for Visionox RM692E5
 
 repaper:
 - Fix integer overflows
 
 stm:
 - Convert to devm_platform_ioremap_resource()
 
 vc4:
 - Convert to devm_platform_ioremap_resource()
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmfAMnwACgkQaA3BHVML
 eiOszgf+NCaYdCCKXurnKy2DVHZ61ar+Fg/FHYs1oThFVqknRT9CBGY93g3I7VGs
 gYUATrH2fYt4daIF94ap4YLYuMrZSKXSd0duzr8lQzCVAb/UgvUap9yjGwJsCOZA
 bE2jxlKvo6cUJbq19+ATgrAQ7y31/w30PCPyBjMGR0t4C3bBbbCHpxFyKzDWCtFT
 JS6pNbCPblZCCvtFPmQRqQqLSE0K5x4SE49ZpHM6hhA4MavlwoN4ZkWincliSzF2
 XVCo7QHpVy3SItshhOB7ch7pIdkbm6nUvW2poR3UdXe+B+OSKRIz0C+2E98bYAra
 vwkyWoUP2A9nKZBsNy2d5026DeUDwg==
 =KURF
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2025-02-27' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for v6.15:

Cross-subsystem Changes:

bus:
- mhi: Avoid access to uninitialized field

Core Changes:

- Fix docmentation

dp:
- Add helpers for LTTPR transparent mode

sched:
- Improve job peek/pop operations
- Optimize layout of struct drm_sched_job

Driver Changes:

arc:
- Convert to devm_platform_ioremap_resource()

aspeed:
- Convert to devm_platform_ioremap_resource()

bridge:
- ti-sn65dsi86: Support CONFIG_PWM tristate

i915:
- dp: Use helpers for LTTPR transparent mode

mediatek:
- Convert to devm_platform_ioremap_resource()

msm:
- dp: Use helpers for LTTPR transparent mode

nouveau:
- dp: Use helpers for LTTPR transparent mode

panel:
- raydium-rm67200: Add driver for Raydium RM67200
- simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17
- sony-td4353-jdi: Use MIPI-DSI multi-func interface
- summit: Add driver for Apple Summit display panel
- visionox-rm692e5: Add driver for Visionox RM692E5

repaper:
- Fix integer overflows

stm:
- Convert to devm_platform_ioremap_resource()

vc4:
- Convert to devm_platform_ioremap_resource()

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20250227094041.GA114623@linux.fritz.box
2025-02-28 12:36:01 +10:00
Srinivasan Shanmugam
7d83c129a8 drm/amdgpu: Fix parameter annotation in vcn_v5_0_0_is_idle
Update parameter description in the vcn_v5_0_0_is_idle function

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1231: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1231: warning: Excess function parameter 'handle' description in 'vcn_v5_0_0_is_idle'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 16:50:05 -05:00
Srinivasan Shanmugam
0f3fda3117 drm/amdgpu: Fix parameter annotations for VCN clock gating functions
The previous references to a non-existent `adev` parameter have been
removed & corrected to reflect the use of the `vinst` pointer, which
points to the VCN instance structure, in the below files:

- vcn_v1_0.c
- vcn_v2_0.c
- vcn_v3_0.c

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:624: warning: Function parameter or struct member 'vinst' not described in 'vcn_v1_0_enable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:624: warning: Excess function parameter 'adev' description in 'vcn_v1_0_enable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:376: warning: Function parameter or struct member 'vinst' not described in 'vcn_v2_0_mc_resume'
drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:376: warning: Excess function parameter 'adev' description in 'vcn_v2_0_mc_resume'
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Function parameter or struct member 'vinst' not described in 'vcn_v3_0_disable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Excess function parameter 'adev' description in 'vcn_v3_0_disable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Excess function parameter 'inst' description in 'vcn_v3_0_disable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Function parameter or struct member 'vinst' not described in 'vcn_v3_0_enable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Excess function parameter 'adev' description in 'vcn_v3_0_enable_clock_gating'
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Excess function parameter 'inst' description in 'vcn_v3_0_enable_clock_gating'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 16:50:05 -05:00
Asad Kamal
f9234217d0 drm/amd/amdgpu: Add support for xgmi_v6_4_1
Add support for xgmi_v6_4_1 and use it appropriate places

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 16:50:04 -05:00
Lijo Lazar
485993e2f1 drm/amdgpu: Add xgmi speed/width related info
Add APIs to initialize XGMI speed, width details and get to max
bandwidth supported. It is assumed that a device only supports same
generation of XGMI links with uniform width.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 16:50:04 -05:00
Lijo Lazar
6f16d101da drm/amdgpu: Move xgmi definitions to xgmi header
Move definitions related to xgmi to amdgpu_xgmi header

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 16:50:04 -05:00
André Almeida
9c696cc57c drm/amdgpu: Create a debug option to disable ring reset
Prior to the addition of ring reset, the debug option
`debug_disable_soft_recovery` could be used to force a full device
reset. Now that we have ring reset, create a debug option to disable
them in amdgpu, forcing the driver to go with the full device
reset path again when both options are combined.

This option is useful for testing and debugging purposes when one wants
to test the full reset from userspace.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 16:50:04 -05:00
Alex Deucher
1d72fc2e9e drm/amdgpu/mes11: drop amdgpu_mes_suspend()/amdgpu_mes_resume() calls
They are noops on GFX11 for most firmware versions. KFD already
handles its own queues and they should already be unmapped at this
point so even if this runs, it's not doing anything.

Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 16:50:03 -05:00
Colin Ian King
eaa3feb16d drm/amdgpu: Fix spelling mistake "initiailize" -> "initialize" and grammar
There is a spelling mistake and a grammatical error in a dev_err
message. Fix it.

Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 16:50:03 -05:00
Xiang Liu
00f85667fa drm/amdgpu: Decode deferred error type in aca bank parser
In the case of poison inband log, the error type need to be specified
by checking the deferred or poison bit of status register.

v2: check both deferred and poison bit

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 16:50:03 -05:00
Le Ma
5b5f01eff7 drm/amdgpu: add sdma page queue irq processing for sdma442
Add the trap irq processing for page queue of sdma442

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by and Tested-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 16:50:03 -05:00
Xiang Liu
d4bd7a50ca drm/amdgpu: Report generic instead of unknown boot time errors
Change the DMESG reporting of unknown errors to "Boot Controller
Generic Error" to align with the RAS SPEC and provide more clarity
to customers.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 16:50:03 -05:00
Lijo Lazar
b965e42530 drm/amdgpu: Fix logic to fetch supported NPS modes
Correct the logic to find supported NPS modes from firmware.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reported-by: Ava Zhang <niandong.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Fixes: 30eb41f5d1 ("drm/amdgpu: Use firmware supported NPS modes")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 16:50:03 -05:00
Xiang Liu
906d2859e1 drm/amdgpu: Disable fru_id field in CPER section
The fru_id field is disabled cause of mis-matching defination
between CPER spec and driver.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 16:50:03 -05:00
Benjamin Chan
dce1b82398 drm/amdgpu: Add amdisp pinctrl MFD resource
AMDISP GPIO control uses a dedicated pinctrl driver,
and requires MFD hotadd GPIO resources.

Co-developed-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Benjamin Chan <benjamin.chan@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:49 -05:00
Alex Deucher
4343f814e5 drm/amdgpu/mes12: drop amdgpu_mes_suspend()/amdgpu_mes_resume() calls
They are noops on GFX12.  There is no suspend/resume all support
in firmware so the function doesn't do anything.  KFD already
handles its own queues and they should already be unmapped at this
point so even if this runs, it's not doing anything.

Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:43 -05:00
Pratap Nirujogi
a67e75beff drm/amdgpu: Replace DRM_ERROR() with drm_err()
DRM_ERROR() is no longer preferred. Replace DRM_ERROR() usage
with drm_err() in isp driver.

Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:36 -05:00
Alex Deucher
4d1b653571 drm/amdgpu/vcn: use dev_info() for firmware information
To properly handle multiple GPUs.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:32 -05:00
Alex Deucher
c51aa7923e drm/amdgpu/vcn: optimize firmware storage
If each instance uses the same fw image, only store one
copy in the driver.

Acked-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:32 -05:00
Alex Deucher
31a37dfc8f drm/amdgpu/vcn5.0.1: use generic set_power_gating_state helper
No need for an IP specific version.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:32 -05:00
Alex Deucher
9b648fa54c drm/amdgpu/vcn5.0.0: use generic set_power_gating_state helper
No need for an IP specific version.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:32 -05:00
Alex Deucher
4bb5879322 drm/amdgpu/vcn4.0.5: use generic set_power_gating_state helper
No need for an IP specific version.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:31 -05:00
Alex Deucher
1ee6b2bff2 drm/amdgpu/vcn4.0.3: use generic set_power_gating_state helper
No need for an IP specific version.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:31 -05:00
Alex Deucher
8bdfa5756b drm/amdgpu/vcn4.0: use generic set_power_gating_state helper
No need for an IP specific version.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:31 -05:00
Alex Deucher
38c0d9882a drm/amdgpu/vcn3.0: use generic set_power_gating_state helper
No need for an IP specific version.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:31 -05:00
Alex Deucher
bd32af6faa drm/amdgpu/vcn2.5: use generic set_power_gating_state helper
No need for an IP specific version.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:31 -05:00
Alex Deucher
3389dd059f drm/amdgpu/vcn2.0: use generic set_power_gating_state helper
No need for an IP specific version.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:31 -05:00
Alex Deucher
cac3dc89f2 drm/amdgpu/vcn1.0: use generic set_power_gating_state helper
No need for an IP specific version.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:31 -05:00
Alex Deucher
a2cf2a883c drm/amdgpu/vcn: add a generic helper for set_power_gating_state
It's common for all VCN variants.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:31 -05:00
Alex Deucher
4ce4fe2720 drm/amdgpu/vcn: use per instance callbacks for idle work handler
Use the vcn instance power gating callbacks rather than
the IP powergating callback.  This limits power gating to
only the instance in use rather than all of the instances.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:31 -05:00
Alex Deucher
592846e3fe drm/amdgpu/vcn5.0.1: add set_pg_state callback
Rework the code as a vcn instance callback.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:31 -05:00
Alex Deucher
f2eb0a66ca drm/amdgpu/vcn5.0.0: add set_pg_state callback
Rework the code as a vcn instance callback.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:31 -05:00
Alex Deucher
f9993efed7 drm/amdgpu/vcn4.0.5: add set_pg_state callback
Rework the code as a vcn instance callback.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:31 -05:00
Alex Deucher
39fb77a8d3 drm/amdgpu/vcn4.0.3: add set_pg_state callback
Rework the code as a vcn instance callback.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:31 -05:00
Alex Deucher
8b18f03142 drm/amdgpu/vcn4.0: add set_pg_state callback
Rework the code as a vcn instance callback.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:30 -05:00
Alex Deucher
bda37b68f6 drm/amdgpu/vcn3.0: add set_pg_state callback
Rework the code as a vcn instance callback.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:30 -05:00
Alex Deucher
307ce8bdc6 drm/amdgpu/vcn2.5: add set_pg_state callback
Rework the code as a vcn instance callback.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:30 -05:00
Alex Deucher
40c6d55806 drm/amdgpu/vcn2.0: add set_pg_state callback
Rework the code as a vcn instance callback.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:30 -05:00
Alex Deucher
c5ed3655cd drm/amdgpu/vcn1.0: add set_pg_state callback
Rework the code as a vcn instance callback.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:30 -05:00
Alex Deucher
55945f08d9 drm/amdgpu/vcn: add new per instance callback for powergating
This is per instance so add a new function pointer for it.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:30 -05:00
Alex Deucher
64303b72de drm/amdgpu/vcn: adjust pause_dpg_mode function signature
Change it to take a vcn instance rather than adev to align
with the vcn instance changes.

TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:30 -05:00
Alex Deucher
0a3fb7338f drm/amdgpu/vcn5.0.1: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.

TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:30 -05:00
Alex Deucher
e3eb71cd69 drm/amdgpu/vcn5.0.0: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.

TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:30 -05:00
Alex Deucher
c07c0c0df9 drm/amdgpu/vcn4.0.5: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.

TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:30 -05:00
Alex Deucher
4a23b9c670 drm/amdgpu/vcn4.0.3: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.

TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:29 -05:00
Alex Deucher
259873561f drm/amdgpu/vcn4.0: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.

TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:29 -05:00
Alex Deucher
f1ab687040 drm/amdgpu/vcn2.5: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.

TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:29 -05:00
Alex Deucher
38a404f8af drm/amdgpu/vcn2.0: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.

TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].

v2: index instances directly on vcn1.0 and 2.0 to make
it clear that they only support a single instance (Lijo)

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:29 -05:00
Alex Deucher
201fee333d drm/amdgpu/vcn1.0: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.

TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:29 -05:00
Alex Deucher
710151263c drm/amdgpu/vcn3.0: convert internal functions to use vcn_inst
Pass the vcn instance structure to these functions rather
than adev and the instance number.

TODO: clean up the function internals to use the vinst state
directly rather than accessing it indirectly via adev->vcn.inst[].

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:29 -05:00
Alex Deucher
f98675638f drm/amdgpu/vcn: switch vcn helpers to be instance based
Pass the instance to the helpers.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:29 -05:00
Alex Deucher
cb10727168 drm/amdgpu/vcn: move more instanced data to vcn_instance
Move more per instance data into the per instance structure.

v2: index instances directly on vcn1.0 and 2.0 to make
it clear that they only support a single instance (Lijo)
v3: fix typo on vcn 2.5

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> (v2)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:29 -05:00
Alex Deucher
9bf9442051 drm/amdgpu/vcn: make powergating status per instance
Store it per instance so we can track it per instance.

v2: index instances directly on vcn1.0 and 2.0 to make
it clear that they only support a single instance (Lijo)

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:29 -05:00
Alex Deucher
bee48570cf drm/amdgpu/vcn: switch work handler to be per instance
Have a separate work handler for each VCN instance. This
paves the way for per instance VCN power gating at runtime.

v2: index instances directly on vcn1.0 and 2.0 to make
it clear that they only support a single instance (Lijo)

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:29 -05:00
Alex Deucher
94629182f3 drm/amdgpu/vcn5.0.1: split code along instances
Split the code on a per instance basis.  This will allow
us to use the per instance functions in the future to
handle more things per instance.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:28 -05:00
Alex Deucher
0797c54502 drm/amdgpu/vcn5.0.0: split code along instances
Split the code on a per instance basis.  This will allow
us to use the per instance functions in the future to
handle more things per instance.

v2: squash in fix for stop() from Boyuan

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:28 -05:00
Alex Deucher
ecc9ab4e92 drm/amdgpu/vcn4.0.5: split code along instances
Split the code on a per instance basis.  This will allow
us to use the per instance functions in the future to
handle more things per instance.

v2: squash in fix for stop() from Boyuan

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:28 -05:00
Alex Deucher
5826d5a5d5 drm/amdgpu/vcn4.0.3: split code along instances
Split the code on a per instance basis.  This will allow
us to use the per instance functions in the future to
handle more things per instance.

v2: squash in fix for stop() from Boyuan

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:28 -05:00
Alex Deucher
f4cd7a85db drm/amdgpu/vcn4.0: split code along instances
Split the code on a per instance basis.  This will allow
us to use the per instance functions in the future to
handle more things per instance.

v2: squash in fix for stop() from Boyuan

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:28 -05:00
Alex Deucher
d39f1bb577 drm/amdgpu/vcn3.0: split code along instances
Split the code on a per instance basis.  This will allow
us to use the per instance functions in the future to
handle more things per instance.

v2: squash in fix for stop() from Boyuan

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:28 -05:00
Alex Deucher
dae8700198 drm/amdgpu/vcn2.5: fix VCN stop logic
Need to make sure we call amdgpu_dpm_enable_vcn()
in vcn_v2_5_stop() at the end if there are errors
or DPG is enabled.

Fixes: ebc25499de ("drm/amdgpu/vcn2.5: split code along instances")
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Suggested-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:11 -05:00
Dave Airlie
425b848175 amd-drm-next-6.15-2025-02-21:
amdgpu:
 - Add OEM i2c support for RGB lights, etc.
 - Add support for GC 11.5.3
 - Add support for GC 11.5.2
 - Add support for SDMA 6.1.3
 - Add support for NBIO 7.11.2
 - Add support for NBIO 7.9.1
 - Add support for MMHUB 3.3.2
 - Add support for MMHUB 1.8.1
 - Add support for SMU 14.0.5
 - Add support for SMUIO 13.0.11
 - Add support for PSP 14.0.5
 - Add support for UMC 12.5.0
 - Add support for DCN 3.6.0
 - JPEG 4.0.3 updates
 - Add dynamic workload profile switching for GC 10-12
 - support larger vbios sizes
 - GC 9.5.0 updates
 - SMU 13.0.12 updates
 - SMU 13.0.6 updates
 - IP discovery updates
 - GC 10 queue reset updates
 - DCN 4.0.1 updates
 - UHBR link rate fixes
 - Aborted suspend fix
 - Mark gttsize parameter as deprecated
 - GC 10 cleaner shader updates
 - PSR-SU fixes
 - Clean up PM4 headers
 - Cursor fixes
 - Enable devcoredump for JPEG
 - Misc cleanups
 - Runpm cleanups
 - MES updates
 - GC 9 gfxoff fixes
 - Vbios fetching cleanups
 - Documentation updates
 - Update secondary plane handling
 - DML2 updates
 - SDMA fixes for MI
 - Cleaner shader fixes for GC 11/12
 - ACA updates
 - Initial JPEG queue reset support
 - RAS updates
 - Initial RAS CPER support
 - DCN/DCE panic screen handling cleanup
 - BT2020 fixes
 - SR-IOV fixes
 
 amdkfd:
 - synchronize pasid values between KGD and KFD
 - Misc cleanups
 - Improve GTT/VRAM handling for APUs
 - Topology updates
 - Fix user queue validation on GC 7/8
 
 UAPI:
 - Enable "Broadcast RGB" drm property
 - Add INFO IOCTL query for virtualization mode
   Proposed userspace:
   e663bed7d6
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZ7jwVwAKCRC93/aFa7yZ
 2EBmAP42gCX0ESuLJlwLTrobhs9mALp5g6AClLTSBuxNcgfOJwEAvqq12R2db8F9
 hFF3rq07C8YaL3bMIi3IhqIXVsnUBgo=
 =uvFs
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.15-2025-02-21' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.15-2025-02-21:

amdgpu:
- Add OEM i2c support for RGB lights, etc.
- Add support for GC 11.5.3
- Add support for GC 11.5.2
- Add support for SDMA 6.1.3
- Add support for NBIO 7.11.2
- Add support for NBIO 7.9.1
- Add support for MMHUB 3.3.2
- Add support for MMHUB 1.8.1
- Add support for SMU 14.0.5
- Add support for SMUIO 13.0.11
- Add support for PSP 14.0.5
- Add support for UMC 12.5.0
- Add support for DCN 3.6.0
- JPEG 4.0.3 updates
- Add dynamic workload profile switching for GC 10-12
- support larger vbios sizes
- GC 9.5.0 updates
- SMU 13.0.12 updates
- SMU 13.0.6 updates
- IP discovery updates
- GC 10 queue reset updates
- DCN 4.0.1 updates
- UHBR link rate fixes
- Aborted suspend fix
- Mark gttsize parameter as deprecated
- GC 10 cleaner shader updates
- PSR-SU fixes
- Clean up PM4 headers
- Cursor fixes
- Enable devcoredump for JPEG
- Misc cleanups
- Runpm cleanups
- MES updates
- GC 9 gfxoff fixes
- Vbios fetching cleanups
- Documentation updates
- Update secondary plane handling
- DML2 updates
- SDMA fixes for MI
- Cleaner shader fixes for GC 11/12
- ACA updates
- Initial JPEG queue reset support
- RAS updates
- Initial RAS CPER support
- DCN/DCE panic screen handling cleanup
- BT2020 fixes
- SR-IOV fixes

amdkfd:
- synchronize pasid values between KGD and KFD
- Misc cleanups
- Improve GTT/VRAM handling for APUs
- Topology updates
- Fix user queue validation on GC 7/8

UAPI:
- Enable "Broadcast RGB" drm property
- Add INFO IOCTL query for virtualization mode
  Proposed userspace:
  e663bed7d6

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221213651.4176031-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-02-26 16:41:44 +10:00
Pierre-Eric Pelloux-Prayer
d3c7059b6a drm/amdgpu: init return value in amdgpu_ttm_clear_buffer
Otherwise an uninitialized value can be returned if
amdgpu_res_cleared returns true for all regions.

Possibly closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3812

Fixes: a68c7eaa7a ("drm/amdgpu: Enable clear page functionality")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7c62aacc3b)
Cc: stable@vger.kernel.org
2025-02-25 12:29:12 -05:00
Alex Deucher
748a1f51bb drm/amdgpu/mes: keep enforce isolation up to date
Re-send the mes message on resume to make sure the
mes state is up to date.

Fixes: 8521e3c5f0 ("drm/amd/amdgpu: limit single process inside MES")
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Shaoyun Liu <shaoyun.liu@amd.com>
Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 27b7915147)
2025-02-25 12:23:48 -05:00
Alex Deucher
e7ea88207c drm/amdgpu/gfx: only call mes for enforce isolation if supported
This should not be called on chips without MES so check if
MES is enabled and if the cleaner shader is supported.

Fixes: 8521e3c5f0 ("drm/amd/amdgpu: limit single process inside MES")
Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Shaoyun Liu <shaoyun.liu@amd.com>
Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
(cherry picked from commit 80513e3897)
2025-02-25 12:22:56 -05:00
Alex Deucher
099bffc7ca drm/amdgpu: disable BAR resize on Dell G5 SE
There was a quirk added to add a workaround for a Sapphire
RX 5600 XT Pulse that didn't allow BAR resizing.  However,
the quirk caused a regression with runtime pm on Dell laptops
using those chips, rather than narrowing the scope of the
resizing quirk, add a quirk to prevent amdgpu from resizing
the BAR on those Dell platforms unless runtime pm is disabled.

v2: update commit message, add runpm check

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707
Fixes: 907830b0fc ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5235053f44)
Cc: stable@vger.kernel.org
2025-02-25 12:20:27 -05:00
Tao Zhou
dab993bf15 drm/amdgpu: increase AMDGPU_MAX_RINGS
Increase it since a cper ring is introduced.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:45:13 -05:00
Srinivasan Shanmugam
59f9c2c9f6 drm/amdgpu: Fix correct parameter desc for VCN idle check functions
Fixes the kdoc for the following VCN idle check functions by updating
the parameter description from 'handle' to 'ip_block':

- vcn_v4_0_is_idle
- vcn_v4_0_3_is_idle
- vcn_v4_0_5_is_idle
- vcn_v5_0_1_is_idle

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c:935: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_1_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c:935: warning: Excess function parameter 'handle' description in 'vcn_v5_0_1_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:1972: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:1972: warning: Excess function parameter 'handle' description in 'vcn_v4_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:1583: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_3_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:1583: warning: Excess function parameter 'handle' description in 'vcn_v4_0_3_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1200: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1200: warning: Excess function parameter 'handle' description in 'vcn_v5_0_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1460: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_5_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1460: warning: Excess function parameter 'handle' description in 'vcn_v4_0_5_is_idle'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:45:13 -05:00
Pierre-Eric Pelloux-Prayer
7c62aacc3b drm/amdgpu: init return value in amdgpu_ttm_clear_buffer
Otherwise an uninitialized value can be returned if
amdgpu_res_cleared returns true for all regions.

Possibly closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3812

Fixes: a68c7eaa7a ("drm/amdgpu: Enable clear page functionality")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:45:12 -05:00
ganglxie
a8f921a10a drm/amdgpu: Change page/record number calculation based on nps
save only one record to save eeprom space,and
bad_page_num = pa_rec_num + mca_rec_num*16

Signed-off-by: ganglxie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:45:12 -05:00
ganglxie
0153d27673 drm/amdgpu: Refine bad page adding
bad page adding can be simpler with nps info

Signed-off-by: ganglxie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:45:12 -05:00
Jesse.zhang@amd.com
e4e6ae41cc drm/amdgpu: update SDMA sysfs reset mask in late_init
- Added `sdma_v4_4_2_update_reset_mask` function to update the reset mask.
- update the sysfs reset mask to the `late_init` stage to ensure that the SMU  initialization
     and capability setup are completed before checking the SDMA reset capability.
- For IP versions 9.4.3 and 9.4.4, enable per-queue reset if the MEC firmware version is at least 0xb0 and PMFW supports queue reset.
- Add a TODO comment for future support of per-queue reset for IP version 9.5.0.

This change ensures that per-queue reset is only enabled when the MEC and PMFW support it.

v2: fix ip version (9.5.4 -> 9.5.0)(Lijo)

Suggested-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:45:12 -05:00
Xiang Liu
ff930483af drm/amdgpu: Set CPER enabled flag after ring initiailized
Setting cper.enabled to be true only after cper ring is successfully
created.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:45:12 -05:00
ganglxie
f2510355fb drm/amdgpu: Save nps to eeprom
nps info saved together with bad page makes bad page parsing more efficient

Signed-off-by: ganglxie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:45:12 -05:00
Xiang Liu
ce615fe328 drm/amdgpu: Check if CPER enabled when generating CPER
In the case of CPER disabled, generating CPER will cause kernel NULL
pointer dereference without checking.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:45:12 -05:00
Jonathan Kim
9424a5bf08 drm/amdgpu: simplify xgmi peer info calls
Deprecate KFD XGMI peer info calls in favour of calling directly from
simplified XGMI peer info functions.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:45:12 -05:00
Dr. David Alan Gilbert
9d8af72fe7 drm/amdgpu: Remove unused nbif_v6_3_1_sriov_funcs
The nbif_v6_3_1_sriov_funcs instance of amdgpu_nbio_funcs was added in
commit 894c6d3522 ("drm/amdgpu: Add nbif v6_3_1 ip block support")
but has remained unused.

Alex has confirmed it wasn't needed.

Remove it, together with the four unused stub functions:
  nbif_v6_3_1_sriov_ih_doorbell_range
  nbif_v6_3_1_sriov_gc_doorbell_init
  nbif_v6_3_1_sriov_vcn_doorbell_range
  nbif_v6_3_1_sriov_sdma_doorbell_range

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:44:01 -05:00
Sathishkumar S
62431979dd drm/amdgpu: Add ring reset callback for JPEG5_0_1
Add ring reset function callback for JPEG5_0_1 to
recover from job timeouts without a full gpu reset.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:44:00 -05:00
André Almeida
b7fd6528b5 drm/amdgpu: Log after a successful ring reset
When a ring reset happens, the kernel log shows only "amdgpu: Starting
<ring name> ring reset", but when it finishes nothing appears in the
log. Explicitly write in the log that the reset has finished correctly.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:44:00 -05:00
André Almeida
28d05f0836 drm/amdgpu: Log the creation of a coredump file
After a GPU reset happens, the driver creates a coredump file. However,
the user might not be aware of it. Log the file creation the user can
find more information about the device and add the file to bug reports.
This is similar to what the xe driver does.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:44:00 -05:00
Alex Deucher
27b7915147 drm/amdgpu/mes: keep enforce isolation up to date
Re-send the mes message on resume to make sure the
mes state is up to date.

Fixes: 8521e3c5f0 ("drm/amd/amdgpu: limit single process inside MES")
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Shaoyun Liu <shaoyun.liu@amd.com>
Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:44:00 -05:00
Sathishkumar S
9b71be8785 drm/amdgpu: Add core reset registers for JPEG5_0_1
Add core reset control register definitions and align
all prior register definitions to end at 100 column
length for uniformity.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:44:00 -05:00
Sathishkumar S
da120ed561 drm/amdgpu: Per-instance init func for JPEG5_0_1
Add helper functions to handle per-instance and per-core
initialization and deinitialization in JPEG5_0_1.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:44:00 -05:00
Alex Deucher
5235053f44 drm/amdgpu: disable BAR resize on Dell G5 SE
There was a quirk added to add a workaround for a Sapphire
RX 5600 XT Pulse that didn't allow BAR resizing.  However,
the quirk caused a regression with runtime pm on Dell laptops
using those chips, rather than narrowing the scope of the
resizing quirk, add a quirk to prevent amdgpu from resizing
the BAR on those Dell platforms unless runtime pm is disabled.

v2: update commit message, add runpm check

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707
Fixes: 907830b0fc ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:44:00 -05:00
Asad Kamal
25907304cf drm/amd/pm: Fetch fru product info for smu_v13_0_12
Fetch fru product info for smu_v13_0_12 from static metrics table

v2: Field by field copy for fru info(Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:44:00 -05:00
Jesse.zhang@amd.com
c94943b086 drm/amdgpu: Update amdgpu_job_timedout to check if the ring is guilty
This patch updates the `amdgpu_job_timedout` function to check if
the ring is actually guilty of causing the timeout. If not, it
skips error handling and fence completion.

v2: move the is_guilty check down into the queue reset area (Alex)
v3: need to call is_guilty before reset (Alex)
v4: squash in is_guilty logic fixes (Alex)

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:43:59 -05:00
Jesse.zhang@amd.com
8225254492 drm/amdgpu: Add reset function pointer for SDMA v4.4.2 page ring
This patch adds a reset function pointer to the SDMA v4.4.2 page ring
functionality. The new function pointer `reset` is set to
`sdma_v4_4_2_reset_queue`, which is responsible for resetting the SDMA queue.

Changes:
- Add `reset` function pointer to `sdma_v4_4_2_page_ring_funcs`.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:43:59 -05:00
Jesse.zhang@amd.com
fdbfaaaae0 drm/amdgpu: Improve SDMA reset logic with guilty queue tracking
This patch includes the remaining improvements to the SDMA reset logic:
- Added `gfx_guilty` and `page_guilty` flags to track guilty queues.
- Updated the reset and resume functions to handle the guilty state.
- Cached the `rptr` before reset.

v2:
   1.replace the caller with a guilty bool.
   If the queue is the guilty one, set the rptr and wptr  to the saved wptr value,
   else, set the rptr and wptr to the saved rptr value. (Alex)
   2. cache the rptr before the reset. (Alex)

v3: Keeping intermediate variables like u64 rwptr simplifies resotre rptr/wptr.(Lijo)

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:43:59 -05:00
Jesse.zhang@amd.com
0ad649321a drm/amdgpu/sdma: Introduce is_guilty callbacks for sdma GFX and PAGE rings
This patch introduces the `is_guilty` callbacks for the GFX and PAGE rings.
These callbacks check if a ring is guilty of causing a timeout or error.

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:43:59 -05:00
Jesse.zhang@amd.com
4d3c4f4f7f drm/amdgpu: Introduce cached_rptr and is_guilty callback in amdgpu_ring
This patch introduces the following changes:
- Add `cached_rptr` to the `amdgpu_ring` structure to store the read pointer before a reset.
- Add `is_guilty` callback to the `amdgpu_ring_funcs` structure to check if a ring is guilty of causing a timeout.

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:43:59 -05:00
Jesse.zhang@amd.com
4c02f73016 drm/amdgpu: Introduce conditional user queue suspension for SDMA resets
- Modify the `amdgpu_sdma_reset_engine` function to accept a `suspend_user_queues` parameter.
- This parameter allows the function to conditionally suspend and resume user queues during SDMA resets.
- Ensure that user queues are suspended only when necessary to avoid unnecessary overhead and potential deadlocks.
- Restart the scheduler's work queue for the GFX and page rings after the reset to allow new tasks to be submitted.

This change improves synchronization between the KGD and the KFD during SDMA resets,
ensuring proper handling of user queues and avoiding race conditions.

V2: replace the ring_lock with the existed the scheduler
    locks for the queues (ring->sched) on the sdma engine.(Alex)

v3: call drm_sched_wqueue_stop() rather than job_list_lock.
    If a GPU ring reset was already initiated for one ring at amdgpu_job_timedout,
    skip resetting that ring and call drm_sched_wqueue_stop()
    for the other rings (Alex)

   replace  the common lock (sdma_reset_lock) with DQM lock to
   to resolve reset races between the two driver sections during KFD eviction.(Jon)

   Rename the caller to Reset_src and
   Change AMDGPU_RESET_SRC_SDMA_KGD/KFD to AMDGPU_RESET_SRC_SDMA_HWS/RING (Jon)

v4: restart the wqueue if the reset was successful,
    or fall back to a full adapter reset. (Alex)

   move definition of reset source to enumeration AMDGPU_RESET_SRCS, and
   check reset src in amdgpu_sdma_reset_instance (Jon)

v5: Call amdgpu_amdkfd_suspend/resume at the start/end of reset function respectively under !SRC_HWS
    conditions only (Jon)

v6: replace the paramter src with a bool suspend_user_queues,
    remove the paramter src in pre/post func. (Jon)

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Suggested-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Acked-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:43:59 -05:00
Lijo Lazar
0ca5751560 drm/amdgpu: Remove redundant logic in GC v9.4.3
GFXOFF check is not needed for GC v9.4.3. Also, save/restore list is
available by default.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:43:59 -05:00
Sathishkumar S
793ee232ee drm/amdgpu: Do not poweroff UVDJ in JPEG4_0_3
Update power gate setting to not poweroff UVDJ in JPEG4_0_3.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:43:58 -05:00
Jesse.zhang@amd.com
d6e6ea5efb drm/amdgpu/sdma: Refactor SDMA reset functionality and add callback support
This patch refactors the SDMA reset functionality in the `sdma_v4_4_2` driver
to improve modularity and support shared usage between AMDGPU and KFD. The
changes include:

1. **Refactored SDMA Reset Logic**:
   - Split the `sdma_v4_4_2_reset_queue` function into two separate functions:
     - `sdma_v4_4_2_stop_queue`: Stops the SDMA queue before reset.
     - `sdma_v4_4_2_restore_queue`: Restores the SDMA queue after reset.
   - These functions are now used as callbacks for the shared reset mechanism.

2. **Added Callback Support**:
   - Introduced a new structure `sdma_v4_4_2_reset_funcs` to hold the stop and
     restore callbacks.
   - Added `sdma_v4_4_2_set_reset_funcs` to register these callbacks with the
     shared reset mechanism using `amdgpu_set_on_reset_callbacks`.

3. **Fixed Reset Queue Function**:
   - Modified `sdma_v4_4_2_reset_queue` to use the shared `amdgpu_sdma_reset_queue`
     function, ensuring consistency across the driver.

This patch ensures that SDMA reset functionality is more modular, reusable, and
aligned with the shared reset mechanism between AMDGPU and KFD.

v2: Renamed sdma_v4_4_2_set_reset_funcs to sdma_v4_4_2_set_engine_reset_funcs.
    Renamed sdma_v4_4_2_reset_funcs to sdma_v4_4_2_engine_reset_funcs.(Alex)

Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:43:58 -05:00
Jesse.zhang@amd.com
f33044952c drm/amdgpu/kfd: Add shared SDMA reset functionality with callback support
This patch introduces shared SDMA reset functionality between AMDGPU and KFD.
The implementation includes the following key changes:

1. Added `amdgpu_sdma_reset_queue`:
   - Resets a specific SDMA queue by instance ID.
   - Invokes registered pre-reset and post-reset callbacks to allow KFD and AMDGPU
     to save/restore their state during the reset process.

2. Added `amdgpu_set_on_reset_callbacks`:
   - Allows KFD and AMDGPU to register callback functions for pre-reset and
     post-reset operations.
   - Callbacks are stored in a global linked list and invoked in the correct order
     during SDMA reset.

This patch ensures that both AMDGPU and KFD can handle SDMA reset events
gracefully, with proper state saving and restoration. It also provides a flexible
callback mechanism for future extensions.

v2: fix CamelCase and put the SDMA helper into amdgpu_sdma.c (Alex)

v3: rename the `amdgpu_register_on_reset_callbacks` function to
      `amdgpu_sdma_register_on_reset_callbacks`
    move global reset_callback_list to struct amdgpu_sdma (Alex)

v4: Update the reset callback function description and
   rename the reset function to amdgpu_sdma_reset_engine (Alex)

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:43:58 -05:00
Likun Gao
71209c9663 drm/amdgpu: correct the name of mes_pipe structure
Correct the structure name admgpu_mes_pipe to amdgpu_mes_pipe.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:43:58 -05:00
Sunil Khatri
7dc3405403 drm/amdgpu: update the handle ptr in is_idle
Update the *handle to amdgpu_ip_block ptr for all
functions pointers of is_idle.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:43:58 -05:00
Thomas Zimmermann
130377304e Merge drm/drm-next into drm-misc-next
Backmerging to get fixes from v6.14-rc4.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2025-02-25 11:43:10 +01:00
Dave Airlie
fb51bf0255 Linux 6.14-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAme7hfkeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGU+IH/1bk6zIvAwXXS5yu
 KNsQ8dEkC3Xme6HqLtPsAhRLF+5YJf6MaGm1ip5dDMyIvasa2gwvCQQQoOpeMbKj
 79VKT+m9t3szMHZaQYjlOuYHBmNSJ4cMCD2Qh6ktXHGPfTTWDFGf7fBwBOkVNeJU
 1Ask+bxeop21aJMhfYXrUta3OYyerLBUR6jCiCM82A/GLtdv6oNGXBu3ygDt9Tjx
 ZHSl+CYjKpmGUP8JnMKwCBHVguEfqgzZ//dY1H16AvOLed9k2jkMFn8O5Vi3vjnx
 TWMMXoiJimuamGzbjxtCCqzxNlFFDT4gRpDqeJxb16W/gDTFmbRr9LDjNehCZe33
 AigLZ6M=
 =Y/7F
 -----END PGP SIGNATURE-----

Merge tag 'v6.14-rc4' into drm-next

Backmerge Linux 6.14-rc4 at the request of tzimmermann so misc-next
can base on rc4.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-02-25 17:36:09 +10:00
Tvrtko Ursulin
80b6ef8ae2 drm/amdgpu: Pop jobs from the queue more robustly
Replace a copy of DRM scheduler's to_drm_sched_job with a copy of a newly
added drm_sched_entity_queue_pop.

This allows breaking the hidden dependency that queue_node has to be the
first element in struct drm_sched_job.

A comment is also added with a reference to the mailing list discussion
explaining the copied helper will be removed when the whole broken
amdgpu_job_stop_all_jobs_on_sched is removed.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Philipp Stanner <phasta@kernel.org>
Cc: Zhang, Hawking <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221105038.79665-3-tvrtko.ursulin@igalia.com
2025-02-24 10:17:40 +01:00
Christian König
cb0de06d1b drm/amdgpu: remove all KFD fences from the BO on release
Remove all KFD BOs from the private dma_resv object.

This prevents the KFD from being evict unecessarily when an exported BO
is released.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-and-tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-21 10:41:49 -05:00
Thomas Weißschuh
2d0f5001b6 drm/amdgpu: Constify 'struct bin_attribute'
The sysfs core now allows instances of 'struct bin_attribute' to be
moved into read-only memory. Make use of that to protect them against
accidental or malicious modifications.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20241216-sysfs-const-bin_attr-drm-v1-4-210f2b36b9bf@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-21 09:20:31 +01:00
Sunil Khatri
3521276ad1 drm/amdgpu: update the handle ptr in get_clockgating_state
Update the *handle to amdgpu_ip_block ptr for all
functions pointers of get_clockgating_state.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19 15:19:05 -05:00
Xiang Liu
2f94469cc0 drm/amdgpu: Remove redundant check of adev
There is no need to check adev for sure.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19 15:16:37 -05:00
Xiang Liu
663a87763b drm/amdgpu: Check aca enabled inside cper init/fini func
Move code about checking aca enabled to the cper init/fini function
to make code clean.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19 15:16:33 -05:00
Lijo Lazar
30eb41f5d1 drm/amdgpu: Use firmware supported NPS modes
If firmware supported NPS modes are available through CAP register, use
those values for supported NPS modes.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19 15:16:29 -05:00
Sathishkumar S
c4c3808feb drm/amdgpu: Add ring reset callback for JPEG4_0_3
Add ring reset function callback for JPEG4_0_3 to
recover from job timeouts without a full gpu reset.

V2:
 - sched->ready flag shouldn't be modified by HW backend (Christian)

V3:
 - Dont modifying sched/job-submission state from HW backend (Christian)
 - Implement per-core reset sequence

V4:
 -  Dont create reset_mask sysfs and return -EOPNOTSUPP on VFs (Lijo)

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19 15:16:11 -05:00
Srinivasan Shanmugam
dc0297f319 drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV
RLCG Register Access is a way for virtual functions to safely access GPU
registers in a virtualized environment., including TLB flushes and
register reads. When multiple threads or VFs try to access the same
registers simultaneously, it can lead to race conditions. By using the
RLCG interface, the driver can serialize access to the registers. This
means that only one thread can access the registers at a time,
preventing conflicts and ensuring that operations are performed
correctly. Additionally, when a low-priority task holds a mutex that a
high-priority task needs, ie., If a thread holding a spinlock tries to
acquire a mutex, it can lead to priority inversion. register access in
amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical.

The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being
called, which attempts to acquire the mutex. This function is invoked
from amdgpu_sriov_wreg, which in turn is called from
gmc_v11_0_flush_gpu_tlb.

The [ BUG: Invalid wait context ] indicates that a thread is trying to
acquire a mutex while it is in a context that does not allow it to sleep
(like holding a spinlock).

Fixes the below:

[  253.013423] =============================
[  253.013434] [ BUG: Invalid wait context ]
[  253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G     U     OE
[  253.013464] -----------------------------
[  253.013475] kworker/0:1/10 is trying to lock:
[  253.013487] ffff9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.013815] other info that might help us debug this:
[  253.013827] context-{4:4}
[  253.013835] 3 locks held by kworker/0:1/10:
[  253.013847]  #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[  253.013877]  #1: ffffb789c008be40 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[  253.013905]  #2: ffff9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu]
[  253.014154] stack backtrace:
[  253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G     U     OE      6.12.0-amdstaging-drm-next-lol-050225 #14
[  253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024
[  253.014224] Workqueue: events work_for_cpu_fn
[  253.014241] Call Trace:
[  253.014250]  <TASK>
[  253.014260]  dump_stack_lvl+0x9b/0xf0
[  253.014275]  dump_stack+0x10/0x20
[  253.014287]  __lock_acquire+0xa47/0x2810
[  253.014303]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014321]  lock_acquire+0xd1/0x300
[  253.014333]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014562]  ? __lock_acquire+0xa6b/0x2810
[  253.014578]  __mutex_lock+0x85/0xe20
[  253.014591]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014782]  ? sched_clock_noinstr+0x9/0x10
[  253.014795]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014808]  ? local_clock_noinstr+0xe/0xc0
[  253.014822]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015012]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.015029]  mutex_lock_nested+0x1b/0x30
[  253.015044]  ? mutex_lock_nested+0x1b/0x30
[  253.015057]  amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015249]  amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu]
[  253.015435]  gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu]
[  253.015667]  gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu]
[  253.015901]  ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu]
[  253.016159]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.016173]  ? smu_hw_init+0x18d/0x300 [amdgpu]
[  253.016403]  amdgpu_device_init+0x29ad/0x36a0 [amdgpu]
[  253.016614]  amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu]
[  253.017057]  amdgpu_pci_probe+0x1c2/0x660 [amdgpu]
[  253.017493]  local_pci_probe+0x4b/0xb0
[  253.017746]  work_for_cpu_fn+0x1a/0x30
[  253.017995]  process_one_work+0x21e/0x680
[  253.018248]  worker_thread+0x190/0x330
[  253.018500]  ? __pfx_worker_thread+0x10/0x10
[  253.018746]  kthread+0xe7/0x120
[  253.018988]  ? __pfx_kthread+0x10/0x10
[  253.019231]  ret_from_fork+0x3c/0x60
[  253.019468]  ? __pfx_kthread+0x10/0x10
[  253.019701]  ret_from_fork_asm+0x1a/0x30
[  253.019939]  </TASK>

v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian).

Fixes: e864180ee4 ("drm/amdgpu: Add lock around VF RLCG interface")
Cc: lin cao <lin.cao@amd.com>
Cc: Jingwen Chen <Jingwen.Chen2@amd.com>
Cc: Victor Skvortsov <victor.skvortsov@amd.com>
Cc: Zhigang Luo <zhigang.luo@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19 15:14:38 -05:00
Nam Cao
690d59fee8 drm/amdgpu: Switch to use hrtimer_setup()
hrtimer_setup() takes the callback function pointer as argument and
initializes the timer completely.

Replace hrtimer_init() and the open coded initialization of
hrtimer::function with the new setup mechanism.

Patch was created by using Coccinelle.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Zack Rusin <zack.rusin@broadcom.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/all/2e3664eebb00d3a8c786ee7cc1fba8096bababc9.1738746904.git.namcao@linutronix.de
2025-02-18 11:19:05 +01:00
Thomas Zimmermann
8bd1a8e757 Merge drm/drm-next into drm-misc-next
Backmerging to get bugfixes from v6.14-rc2.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2025-02-18 07:43:43 +01:00
Xiang Liu
f9d35b945c drm/amdgpu: Generate bad page threshold cper records
Generate CPER record when bad page threshold exceed and
commit to CPER ring.

v2: return -ENOMEM instead of false
v2: check return value of fill section function

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:30 -05:00
Xiang Liu
4058e7cbfd drm/amdgpu: Commit CPER entry
Commit the CPER entry to the ring buffer.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:30 -05:00
Tao Zhou
8652920d2c drm/amdgpu: add mutex lock for cper ring
Avoid the confliction between read and write of ring buffer.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:30 -05:00
Tao Zhou
a6d9d19290 drm/amdgpu: add data write function for CPER ring
Old CPER data will be overwritten if ring buffer is full, and read
pointer always points to CPER header.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:30 -05:00
Tao Zhou
5a14282429 drm/amdgpu: read CPER ring via debugfs
We read CPER data from read pointer to write pointer without changing
the pointers.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:29 -05:00
Tao Zhou
4d614ce8ff drm/amdgpu: add RAS CPER ring buffer
And initialize it, this is a pure software ring to store RAS CPER data.

v2: change ring size to 0x100000
v2: update the initialization of count_dw of cper ring, it's dword
variable
v3: skip VM inv eng for cper
v3: init/fini when aca enabled

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:29 -05:00
Xiang Liu
b3060f5bea drm/amdgpu: Get timestamp from system time
Get system local time and encode it to timestamp for CPER.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:29 -05:00
Alex Deucher
f3e10e1a0c drm/amdgpu/mes12: allocate hw_resource_1 buffer once
Allocate the buffer at sw init time so we don't alloc
and free it for every suspend/resume or reset cycle.

Reviewed-by: Shaoyun.liu <shaouyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:29 -05:00
Alex Deucher
13d68ae651 drm/amdgpu/mes11: allocate hw_resource_1 buffer once
Allocate the buffer at sw init time so we don't alloc
and free it for every suspend/resume or reset cycle.

Reviewed-by: Shaoyun.liu <shaouyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:29 -05:00
Hawking Zhang
652e090230 drm/amdgpu: Generate cper records
Encode the error information in CPER format and commit
to the cper ring

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <keivnyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:29 -05:00
Hawking Zhang
ad97840f95 drm/amdgpu: Introduce funcs for generating cper record
Introduce new functions that are used to generate
cper ue or ce records.

v2: return -ENOMEM instead of false
v2: check return value of fill section function

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Yang Wang <keivnyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:29 -05:00
Hawking Zhang
56316ee91b drm/amdgpu: Include ACA error type in aca bank
ACA error types managed by driver a direct 1:1
correspondence with those managed by firmware.

To address this, for each ACA bank, include
both the ACA error type and the ACA SMU type.

This addition is useful for creating CPER records.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <keivnyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:29 -05:00
Candice Li
76b1f8b32d drm/amdgpu: Optimize the enablement of GECC
Enable GECC only when the default memory ECC mode or
the module parameter amdgpu_ras_enable is activated.

v2: Add kernel message to remind users explicitly set
    amdgpu_ras_enable=1 before driver loading to enable GECC
    and set amdgpu_ras_enable=0 to disable GECC when GECC is
    currently enabled if needed.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:29 -05:00
Hawking Zhang
92d5d2a09d drm/amdgpu: Introduce funcs for populating CPER
Introduce utility functions designed to assist
in populating CPER records.

v2: call cper_init/fini in device_ip_init/fini.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:29 -05:00
Srinivasan Shanmugam
2012aff981 drm/amdgpu: Rename VCN clock gating function for consistency
Change the function name from vcn_v2_5_enable_clock_gating_inst
to vcn_v2_5_enable_clock_gating to ensure consistency in naming.

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c:781: warning: expecting prototype for vcn_v2_5_enable_clock_gating_inst(). Prototype was for vcn_v2_5_enable_clock_gating() instead

Cc: Leo Liu <leo.liu@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:28 -05:00
Alex Deucher
eda80f1c2a drm/amdgpu/vcn4.0.3: drop dpm power helpers
VCN 4.0.3 doesn't support powergating so there is
no need to call these.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:25 -05:00
Alex Deucher
7780239809 drm/amdgpu/vcn5.0.1: drop dpm power helpers
VCN 5.0.1 doesn't support powergating so there is
no need to call these.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:18 -05:00
Alex Deucher
0487f50310 drm/amdgpu/vcn5.0.1: use correct dpm helper
The VCN and UVD helpers were split in
commit ff69bba05f ("drm/amd/pm: add inst to dpm_set_powergating_by_smu")
However, this happened in parallel to the vcn 5.0.1
development so it was missed there.

Fixes: 346492f30c ("drm/amdgpu: Add VCN_5_0_1 support")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Sonny Jiang <sonjiang@amd.com>
Cc: Boyuan Zhang <boyuan.zhang@amd.com>
2025-02-17 14:09:11 -05:00
Alex Deucher
56763be400 drm/amdgpu/umsch: tidy up the ucode name string handling
Make the constant parts of the name part of the string
we pass to amdgpu_ucode_request().  Only the version
number varies from IP to IP.

Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Lang Yu <Lang.Yu@amd.com>
2025-02-17 14:09:02 -05:00
Alex Deucher
c917e39cbd drm/amdgpu/umsch: fix ucode check
Return an error if the IP version doesn't match otherwise
we end up passing a NULL string to amdgpu_ucode_request.
We should never hit this in practice today since we only
enable the umsch code on the supported IP versions, but
add a check to be safe.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202502130406.iWQ0eBug-lkp@intel.com/
Fixes: 020620424b ("drm/amd: Use a constant format string for amdgpu_ucode_request")
Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Lang Yu <Lang.Yu@amd.com>
2025-02-17 14:08:53 -05:00
Amber Lin
5183e69090 drm/amdgpu: Remove extra checks for CPX
As far as the number of XCCs, the number of compute partitions, and the
number of memory partitions qualify, CPX is valid.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:08:49 -05:00
Alex Deucher
fe652becdb drm/amdgpu/umsch: declare umsch firmware
Needed to be properly picked up for the initrd, etc.

Fixes: 3488c79bea ("drm/amdgpu: add initial support for UMSCH")
Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Lang Yu <Lang.Yu@amd.com>
2025-02-17 14:08:37 -05:00
Alex Deucher
80513e3897 drm/amdgpu/gfx: only call mes for enforce isolation if supported
This should not be called on chips without MES so check if
MES is enabled and if the cleaner shader is supported.

Fixes: 8521e3c5f0 ("drm/amd/amdgpu: limit single process inside MES")
Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Shaoyun Liu <shaoyun.liu@amd.com>
Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
2025-02-17 14:08:28 -05:00
Sathishkumar S
500c04d2a7 drm/amdgpu: Add ring reset callback for JPEG2_0_0
Add ring reset function callback for JPEG2_0_0 to
recover from job timeouts without a full gpu reset.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:08:28 -05:00
Sathishkumar S
09e24a0b52 drm/amdgpu: Add ring reset callback for JPEG2_5_0
Add ring reset function callback for JPEG2_5_0 to
recover from job timeouts without a full gpu reset.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:08:28 -05:00
Sathishkumar S
cb493aee4d drm/amdgpu: Per-instance init func for JPEG2_5_0
Add helper functions to handle per-instance initialization
and deinitialization in JPEG2_5_0.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:08:28 -05:00
Sathishkumar S
03399d0bff drm/amdgpu: Add ring reset callback for JPEG3_0_0
Add ring reset function callback for JPEG3_0_0 to
recover from job timeouts without a full gpu reset.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:08:27 -05:00
Sathishkumar S
74894ffc7d drm/amdgpu: Add ring reset callback for JPEG4_0_0
Add ring reset function callback for JPEG4_0_0 to
recover from job timeouts without a full gpu reset.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:08:27 -05:00
Sathishkumar S
abce7b4fc7 drm/amdgpu: Per-instance init func for JPEG4_0_3
Add helper functions to handle per-instance and per-core
initialization and deinitialization in JPEG4_0_3.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:08:27 -05:00
Saleemkhan Jamadar
b0bebbe4ea drm/amdgpu/umsch: remove vpe test from umsch
current test is more intrusive for user queue test

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Suggested-by: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:08:27 -05:00
Candice Li
59af05d6a3 drm/amdgpu: Enable ACA by default for psp v13_0_12
Enable ACA by default for psp v13_0_12.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:08:27 -05:00
Dave Airlie
0ed1356af8 drm-misc-next for v6.15:
UAPI Changes:
 
 fourcc:
 - Add modifiers for MediaTek tiled formats
 
 Cross-subsystem Changes:
 
 bus:
 - mhi: Enable image transfer via BHIe in PBL
 
 dma-buf:
 - Add fast-path for single-fence merging
 
 Core Changes:
 
 atomic helper:
 - Allow full modeset on connector changes
 - Clarify semantics of allow_modeset
 - Clarify semantics of drm_atomic_helper_check()
 
 buddy allocator:
 - Fix multi-root cleanup
 
 ci:
 - Update IGT
 
 display:
 - dp: Support Extendeds Wake Timeout
 - dp_mst: Fix RAD-to-string conversion
 
 panic:
 - Encode QR code according to Fido 2.2
 
 probe helper:
 - Cleanups
 
 scheduler:
 - Cleanups
 
 ttm:
 - Refactor pool-allocation code
 - Cleanups
 
 Driver Changes:
 
 amdxdma:
 - Fix error handling
 - Cleanups
 
 ast:
 - Refactor detection of transmitter chips
 - Refactor support of VBIOS display-mode handling
 - astdp: Fix connection status; Filter unsupported display modes
 
 bridge:
 - adv7511: Report correct capabilities
 - it6505: Fix HDCP V compare
 - sn65dsi86: Fix device IDs
 - Cleanups
 
 i915:
 - Enable Extendeds Wake Timeout
 
 imagination:
 - Check job dependencies with DRM-sched helper
 
 ivpu:
 - Improve command-queue handling
 - Use workqueue for IRQ handling
 - Add suport for HW fault injection
 - Locking fixes
 - Cleanups
 
 mgag200:
 - Add support for G200eH5 chips
 
 msm:
 - dpu: Add concurrent writeback support for DPU 10.x+
 
 nouveau:
 - Move drm_slave_encoder interface into driver
 - nvkm: Refactor GSP RPC
 
 omapdrm:
 - Cleanups
 
 panel:
 - Convert several panels to multi-style functions to improve error
   handling
 - edp: Add support for B140UAN04.4, BOE NV140FHM-NZ, CSW MNB601LS1-3,
   LG LP079QX1-SP0V, MNE007QS3-7, STA 116QHD024002, Starry 116KHD024006,
   Lenovo T14s Gen6 Snapdragon
 - himax-hx83102: Add support for CSOT PNA957QT1-1, Kingdisplay
   kd110n11-51ie, Starry 2082109qfh040022-50e
 
 panthor:
 - Expose sizes of intenral BOs via fdinfo
 - Fix race between reset and suspend
 - Cleanups
 
 qaic:
 - Add support for AIC200
 - Cleanups
 
 renesas:
 - Fix limits in DT bindings
 
 rockchip:
 - rk3576: Add HDMI support
 - vop2: Add new display modes on RK3588 HDMI0 up to 4K
 - Don't change HDMI reference clock rate
 - Fix DT bindings
 
 solomon:
 - Set SPI device table to silence warnings
 - Fix pixel and scanline encoding
 
 v3d:
 - Cleanups
 
 vc4:
 - Use drm_exec
 - Use dma-resv for wait-BO ioctl
 - Remove seqno infrastructure
 
 virtgpu:
 - Support partial mappings of GEM objects
 - Reserve VGA resources during initialization
 - Fix UAF in virtgpu_dma_buf_free_obj()
 - Add panic support
 
 vkms:
 - Switch to a managed modesetting pipeline
 - Add support for ARGB8888
 
 xlnx:
 - Set correct DMA segment size
 - Fix error handling
 - Fix docs
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmesYp4ACgkQaA3BHVML
 eiP0agf/ZsUXdVdJ3HMI3vAMbiUnW8ThQZx0M3Zkpf3EOCKgdibbjNkhgTzsp3b/
 GqcP/pmqwmjCUgVADVDVJvRTD1rSadcxCYYTwI9fsjnXWmib9l4sjI3jPRJkIekj
 a5eeCFnMEKzfaxhFPJ/nu+esMZ19ZRE1iSO7WC7YC0SSR6zYP/QaNkGti8QA5TpL
 oBHd4jctoxXw9sw/ACEwzsSRdxL/opXQ5KbFJFmr2Q9/osaTUX7QtslDhnh+2lVK
 ZGxGdFIvFYKB+QrXpfjJU51Q01gqp/PWS+YIDrpr7bLC5YeFQVe3FDjqYIXSoE2t
 dFYtU2mSS/rwijcRc64sghXCdbKS1w==
 =iJGz
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2025-02-12' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for v6.15:

UAPI Changes:

fourcc:
- Add modifiers for MediaTek tiled formats

Cross-subsystem Changes:

bus:
- mhi: Enable image transfer via BHIe in PBL

dma-buf:
- Add fast-path for single-fence merging

Core Changes:

atomic helper:
- Allow full modeset on connector changes
- Clarify semantics of allow_modeset
- Clarify semantics of drm_atomic_helper_check()

buddy allocator:
- Fix multi-root cleanup

ci:
- Update IGT

display:
- dp: Support Extendeds Wake Timeout
- dp_mst: Fix RAD-to-string conversion

panic:
- Encode QR code according to Fido 2.2

probe helper:
- Cleanups

scheduler:
- Cleanups

ttm:
- Refactor pool-allocation code
- Cleanups

Driver Changes:

amdxdma:
- Fix error handling
- Cleanups

ast:
- Refactor detection of transmitter chips
- Refactor support of VBIOS display-mode handling
- astdp: Fix connection status; Filter unsupported display modes

bridge:
- adv7511: Report correct capabilities
- it6505: Fix HDCP V compare
- sn65dsi86: Fix device IDs
- Cleanups

i915:
- Enable Extendeds Wake Timeout

imagination:
- Check job dependencies with DRM-sched helper

ivpu:
- Improve command-queue handling
- Use workqueue for IRQ handling
- Add suport for HW fault injection
- Locking fixes
- Cleanups

mgag200:
- Add support for G200eH5 chips

msm:
- dpu: Add concurrent writeback support for DPU 10.x+

nouveau:
- Move drm_slave_encoder interface into driver
- nvkm: Refactor GSP RPC

omapdrm:
- Cleanups

panel:
- Convert several panels to multi-style functions to improve error
  handling
- edp: Add support for B140UAN04.4, BOE NV140FHM-NZ, CSW MNB601LS1-3,
  LG LP079QX1-SP0V, MNE007QS3-7, STA 116QHD024002, Starry 116KHD024006,
  Lenovo T14s Gen6 Snapdragon
- himax-hx83102: Add support for CSOT PNA957QT1-1, Kingdisplay
  kd110n11-51ie, Starry 2082109qfh040022-50e

panthor:
- Expose sizes of intenral BOs via fdinfo
- Fix race between reset and suspend
- Cleanups

qaic:
- Add support for AIC200
- Cleanups

renesas:
- Fix limits in DT bindings

rockchip:
- rk3576: Add HDMI support
- vop2: Add new display modes on RK3588 HDMI0 up to 4K
- Don't change HDMI reference clock rate
- Fix DT bindings

solomon:
- Set SPI device table to silence warnings
- Fix pixel and scanline encoding

v3d:
- Cleanups

vc4:
- Use drm_exec
- Use dma-resv for wait-BO ioctl
- Remove seqno infrastructure

virtgpu:
- Support partial mappings of GEM objects
- Reserve VGA resources during initialization
- Fix UAF in virtgpu_dma_buf_free_obj()
- Add panic support

vkms:
- Switch to a managed modesetting pipeline
- Add support for ARGB8888

xlnx:
- Set correct DMA segment size
- Fix error handling
- Fix docs

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20250212090625.GA24865@linux.fritz.box
2025-02-14 10:24:02 +10:00
André Almeida
6fe52b63f5
drm/amdgpu: Use device wedged event
Use DRM's device wedged event to notify userspace that a reset had
happened. For now, only use `none` method meant for telemetry
capture.

In the future we might want to report a recovery method if the reset didn't
succeed.

Acked-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250204070528.1919158-6-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-02-13 12:15:44 -05:00
Alex Deucher
7845438718 drm/amdgpu/mes: Add cleaner shader fence address handling in MES for GFX12
This commit introduces enhancements to the handling of the cleaner
shader fence in the AMDGPU MES driver:

- The MES (Microcode Execution Scheduler) now sends a PM4 packet to the
  KIQ (Kernel Interface Queue) to request the cleaner shader, ensuring
  that requests are handled in a controlled manner and avoiding the
  race conditions.
- The CP (Compute Processor) firmware has been updated to use a private
  bus for accessing specific registers, avoiding unnecessary operations
  that could lead to issues in VF (Virtual Function) mode.
- The cleaner shader fence memory address is now set correctly in the
  `mes_set_hw_res_pkt` structure, allowing for proper synchronization of
  the cleaner shader execution.

Cc: Christian König <christian.koenig@amd.com>
Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-12 21:06:37 -05:00
Xiaogang Chen
10e08943ca drm/amdkfd: Fix pasid value leak
Curret kfd does not allocate pasid values, instead uses pasid value for each
vm from graphic driver. So should not prevent graphic driver from releasing
pasid values since the values are allocated by graphic driver, not kfd driver
anymore. This patch does not stop graphic driver release pasid values.

Fixes: 8544374c0f ("drm/amdkfd: Have kfd driver use same PASID values from graphic driver")
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-12 21:05:50 -05:00
Saleemkhan Jamadar
4b9a3117bb drm/amdgpu/vcn: enable TMZ support for vcn 4_0_5
TMZ support is enabled for vcn on GC IP 11_5_0

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-12 21:05:50 -05:00
Srinivasan Shanmugam
be2560e4b8 drm/amdgpu/mes: Add cleaner shader fence address handling in MES for GFX11
This commit introduces enhancements to the handling of the cleaner
shader fence in the AMDGPU MES driver:

- The MES (Microcode Execution Scheduler) now sends a PM4 packet to the
  KIQ (Kernel Interface Queue) to request the cleaner shader, ensuring
  that requests are handled in a controlled manner and avoiding the
  race conditions.
- The CP (Compute Processor) firmware has been updated to use a private
  bus for accessing specific registers, avoiding unnecessary operations
  that could lead to issues in VF (Virtual Function) mode.
- The cleaner shader fence memory address is now set correctly in the
  `mes_set_hw_res_pkt` structure, allowing for proper synchronization of
  the cleaner shader execution.

Cc: lin cao <lin.cao@amd.com>
Cc: Jingwen Chen <Jingwen.Chen2@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed by: Shaoyun.liu  <Shaoyun.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-12 21:05:49 -05:00
Ying Li
16a5a8fe6f drm/amd/amdgpu: add support for IP version 11.5.2
This initializes drm/amd/amdgpu version 11.5.2

Signed-off-by: YING LI <yingli12@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-12 21:05:49 -05:00
Philip Yang
23b645231e drm/amdgpu: Unlocked unmap only clear page table leaves
SVM migration unmap pages from GPU and then update mapping to GPU to
recover page fault. Currently unmap clears the PDE entry for range
length >= huge page and free PTB bo, update mapping to alloc new PT bo.
There is race bug that the freed entry bo maybe still on the pt_free
list, reused when updating mapping and then freed, leave invalid PDE
entry and cause GPU page fault.

By setting the update to clear only one PDE entry or clear PTB, to
avoid unmap to free PTE bo. This fixes the race bug and improve the
unmap and map to GPU performance. Update mapping to huge page will
still free the PTB bo.

With this change, the vm->pt_freed list and work is not needed. Add
WARN_ON(unlocked) in amdgpu_vm_pt_free_dfs to catch if unmap to free the
PTB.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-12 21:05:49 -05:00
Alex Deucher
1350dd3691 drm/amdgpu/mes11: fix set_hw_resources_1 calculation
It's GPU page size not CPU page size.  In most cases they
are the same, but not always.  This can lead to overallocation
on systems with larger pages.

Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-12 21:05:49 -05:00
Alex Deucher
ebc25499de drm/amdgpu/vcn2.5: split code along instances
Split the code on a per instance basis.  This will allow
us to use the per instance functions in the future to
handle more things per instance.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-12 21:05:49 -05:00
Harish Kasiviswanathan
3394b1f76d drm/amdgpu: Set snoop bit for SDMA for MI series
SDMA writes has to probe invalidate RW lines. Set snoop bit in mmhub for
this to happen.

v2: Missed a few mmhub_v9_4. Added now.
v3: Calculate hub offset once since it doesn't change inside the loop
    Modified function names based on review comments.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-12 21:05:49 -05:00
Tim Huang
76e0410fe0 drm/amdgpu: add discovery support for DCN IP version 3.6.0
Add discovery entry for DCN IP version 3.6.0.

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-12 21:04:09 -05:00
Jiang Liu
e92f3f94ca drm/amdgpu: reset psp->cmd to NULL after releasing the buffer
Reset psp->cmd to NULL after releasing the buffer in function psp_sw_fini().

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-12 21:04:08 -05:00
Asad Kamal
aafe181f7d drm/amdgpu: Add flags to distinguish vf/pf/pt mode
Add extra flag definition for ids_flag field to distinguish
between vf/pf/pt modes

v2: Updated kms driver minor version & removed pf check as default is 0
v3: Fix up version (Alex)
v4: rebase (Alex)

Proposed userspace:
e663bed7d6

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-12 21:04:08 -05:00