2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

15190 Commits

Author SHA1 Message Date
Alex Deucher
1741281a15 drm/amdgpu/gfx10: add ring reset callbacks
Add ring reset callbacks for gfx and compute.

v2: fix gfx handling
v3: wait for KIQ to complete

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-02 11:40:38 -04:00
Jiadong Zhu
a10c93931b drm/amdgpu/gfx11: wait for reset done before remap
There is a racing condition that cp firmware modifies
MQD in reset sequence after driver updates it for
remapping. We have to wait till CP_HQD_ACTIVE becoming
false then remap the queue.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-02 11:40:34 -04:00
Alex Deucher
7d8e9e65f2 drm/amdgpu/gfx11: rename gfx_v11_0_gfx_init_queue()
Rename to gfx_v11_0_kgq_init_queue() to better align with
the other naming in the file.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-02 11:40:31 -04:00
Prike Liang
072b441478 drm/amdgpu/gfx11: fallback to driver reset compute queue directly (v2)
Since the MES FW resets kernel compute queue always failed, this
may caused by the KIQ failed to process unmap KCQ. So, before MES
FW work properly that will fallback to driver executes dequeue and
resets SPI directly. Besides, rework the ring reset function and make
the busy ring type reset in each function respectively.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-02 11:40:21 -04:00
Alex Deucher
b3e9bfd866 drm/amdgpu/gfx11: add ring reset callbacks
Add ring reset callbacks for gfx and compute.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-02 11:35:12 -04:00
Prike Liang
ad17b124c3 drm/amdgpu/gfx9.4.3: Implement compute pipe reset
Implement the compute pipe reset, and the driver will
fallback to pipe reset when queue reset fails.
The pipe reset only deactivates the queue which is
scheduled in the pipe, and meanwhile the MEC pipe
will be reset to the firmware _start pointer. So,
it seems pipe reset will cost more cycles than the
queue reset; therefore, the driver tries to recover
by doing queue reset first.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-02 11:32:32 -04:00
Christian Brauner
641bb4394f fs: move FMODE_UNSIGNED_OFFSET to fop_flags
This is another flag that is statically set and doesn't need to use up
an FMODE_* bit. Move it to ->fop_flags and free up another FMODE_* bit.

(1) mem_open() used from proc_mem_operations
(2) adi_open() used from adi_fops
(3) drm_open_helper():
    (3.1) accel_open() used from DRM_ACCEL_FOPS
    (3.2) drm_open() used from
    (3.2.1) amdgpu_driver_kms_fops
    (3.2.2) psb_gem_fops
    (3.2.3) i915_driver_fops
    (3.2.4) nouveau_driver_fops
    (3.2.5) panthor_drm_driver_fops
    (3.2.6) radeon_driver_kms_fops
    (3.2.7) tegra_drm_fops
    (3.2.8) vmwgfx_driver_fops
    (3.2.9) xe_driver_fops
    (3.2.10) DRM_GEM_FOPS
    (3.2.11) DEFINE_DRM_GEM_DMA_FOPS
(4) struct memdev sets fmode flags based on type of device opened. For
    devices using struct mem_fops unsigned offset is used.

Mark all these file operations as FOP_UNSIGNED_OFFSET and add asserts
into the open helper to ensure that the flag is always set.

Link: https://lore.kernel.org/r/20240809-work-fop_unsigned-v1-1-658e054d893e@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30 08:22:36 +02:00
Alex Deucher
6c0a7c3c69 drm/amdgpu: always allocate cleared VRAM for GEM allocations
This adds allocation latency, but aligns better with user
expectations.  The latency should improve with the drm buddy
clearing patches that Arun has been working on.

In addition this fixes the high CPU spikes seen when doing
wipe on release.

v2: always set AMDGPU_GEM_CREATE_VRAM_CLEARED (Christian)

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3528
Fixes: a68c7eaa7a ("drm/amdgpu: Enable clear page functionality")
Acked-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> (v1)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Cc: Christian König <christian.koenig@amd.com>
2024-08-29 14:56:37 -04:00
Jack Xiao
52491d97aa drm/amdgpu/mes: add mes mapping legacy queue switch
For mes11 old firmware has issue to map legacy queue,
add a flag to switch mes to map legacy queue.

Fixes: f9d8c5c785 ("drm/amdgpu/gfx: enable mes to map legacy queue support")
Reported-by: Andrew Worsley <amworsley@gmail.com>
Link: https://lists.freedesktop.org/archives/amd-gfx/2024-August/112773.html
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-29 13:41:02 -04:00
Alex Deucher
1125f95cd2 drm/amdgpu/gfx12: return early in preempt_ib()
When MES is enabled KIQ is not available.  Return an error
when someone uses the debugfs preempt test interface in
that case.

Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-29 13:39:57 -04:00
Alex Deucher
1e487c9173 drm/amdgpu/gfx11: return early in preempt_ib()
When MES is enabled KIQ is not available.  Return an error
when someone uses the debugfs preempt test interface in
that case.

Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-29 13:39:48 -04:00
Sunil Khatri
30e8f4c2bd drm/amdgpu: Move the dumping log out of for loop
log message "Dumping IP State Completed" needs to
be logged only once when state dumping is complete.

Hence moving it out of the for loop.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Trigger Huang <Trigger.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-29 13:39:19 -04:00
Victor Zhao
af76ca8e18 drm/amd/amdgpu: move drain_workqueue before shutdown is set
[background] when unloading amdgpu driver right after running a
workload, drain_workqueue is causing "Fence fallback timer
expired on ring sdma0.0". Under sriov, this issue will cause sriov
full access timeout and a reset happening.

move drain_workqueue before shutdown is set to allow ih process and
before enter full access under sriov to avoid full access time cost.

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-29 13:39:07 -04:00
Trigger Huang
c67db6a6a6 drm/amdgpu: Do core dump immediately when job tmo
Do the coredump immediately after a job timeout to get a closer
representation of GPU's error status.

V2: This will skip printing vram_lost as the GPU reset is not
happened yet (Alex)

V3: Unconditionally call the core dump as we care about all the reset
functions(soft-recovery and queue reset and full adapter reset, Alex)

V4: Do the dump after adev->job_hang = true (Sunil)

Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Acked-by:  Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-29 13:39:00 -04:00
Trigger Huang
6122f5c72e drm/amdgpu: skip printing vram_lost if needed
The vm lost status can only be obtained after a GPU reset occurs, but
sometimes a dev core dump can be happened before GPU reset. So a new
argument is added to tell the dev core dump implementation whether to
skip printing the vram_lost status in the dump.
And this patch is also trying to decouple the core dump function from
the GPU reset function, by replacing the argument amdgpu_reset_context
with amdgpu_job to specify the context for core dump.

V2: Inform user if VRAM lost check is skipped so users don't assume
VRAM wasn't lost (Alex)

Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-29 13:38:53 -04:00
Alex Deucher
7c1a2d8aba drm/amdgpu/gfx9: put queue resets behind a debug option
Pending extended validation.

Reviewed-and-tested-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-29 13:38:48 -04:00
Alex Deucher
a9b67c036c drm/amdgpu: add experimental resets debug flag
Add this flag to enable experimental resets for testing before they
are fully validated.

Reviewed-and-tested-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-29 13:38:36 -04:00
Likun Gao
6d5064c379 drm/amdgpu: support for gc_info table v1.3
Add gc_info table v1.3 for IP discovery.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 875ff9a7ee)
2024-08-28 10:05:54 -04:00
Alex Deucher
959fc102ff drm/amdgpu/gfx12: set UNORD_DISPATCH in compute MQDs
This needs to be set to 1 to avoid a potential deadlock in
the GC 10.x and newer.  On GC 9.x and older, this needs
to be set to 0. This can lead to hangs in some mixed
graphics and compute workloads.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3575
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 40318a2406)
2024-08-28 10:04:53 -04:00
Daniel Vetter
e55ef65510 amd-drm-next-6.12-2024-08-26:
amdgpu:
 - SDMA devcoredump support
 - DCN 4.0.1 updates
 - DC SUBVP fixes
 - Refactor OPP in DC
 - Refactor MMHUBBUB in DC
 - DC DML 2.1 updates
 - DC FAMS2 updates
 - RAS updates
 - GFX12 updates
 - VCN 4.0.3 updates
 - JPEG 4.0.3 updates
 - Enable wave kill (soft recovery) for compute queues
 - Clean up CP error interrupt handling
 - Enable CP bad opcode interrupts
 - VCN 4.x fixes
 - VCN 5.x fixes
 - GPU reset fixes
 - Fix vbios embedded EDID size handling
 - SMU 14.x updates
 - Misc code cleanups and spelling fixes
 - VCN devcoredump support
 - ISP MFD i2c support
 - DC vblank fixes
 - GFX 12 fixes
 - PSR fixes
 - Convert vbios embedded EDID to drm_edid
 - DCN 3.5 updates
 - DMCUB updates
 - Cursor fixes
 - Overdrive support for SMU 14.x
 - GFX CP padding optimizations
 - DCC fixes
 - DSC fixes
 - Preliminary per queue reset infrastructure
 - Initial per queue reset support for GFX 9
 - Initial per queue reset support for GFX 7, 8
 - DCN 3.2 fixes
 - DP MST fixes
 - SR-IOV fixes
 - GFX 9.4.3/4 devcoredump support
 - Add process isolation framework
 - Enable process isolation support for GFX 9.4.3/4
 - Take IOMMU remapping into account for P2P DMA checks
 
 amdkfd:
 - CRIU fixes
 - Improved input validation for user queues
 - HMM fix
 - Enable process isolation support for GFX 9.4.3/4
 - Initial per queue reset support for GFX 9
 - Allow users to target recommended SDMA engines
 
 radeon:
 - remove .load and drm_dev_alloc
 - Fix vbios embedded EDID size handling
 - Convert vbios embedded EDID to drm_edid
 - Use GEM references instead of TTM
 - r100 cp init cleanup
 - Fix potential overflows in evergreen CS offset tracking
 
 UAPI:
 - KFD support for targetting queues on recommended SDMA engines
   Proposed userspace:
   2f588a2406
   eb30a5bbc7
 
 drm/buddy:
 - Add start address support for trim function
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZszhcQAKCRC93/aFa7yZ
 2M4ZAQD+xgIJkQ9HISQeqER5GblnfrorARd32yP/BH0c+JbGUAD9H/BIB41teZ80
 vw2WTx+4TyB39awgvtpDH8iEQdkcSAE=
 =717w
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.12-2024-08-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.12-2024-08-26:

amdgpu:
- SDMA devcoredump support
- DCN 4.0.1 updates
- DC SUBVP fixes
- Refactor OPP in DC
- Refactor MMHUBBUB in DC
- DC DML 2.1 updates
- DC FAMS2 updates
- RAS updates
- GFX12 updates
- VCN 4.0.3 updates
- JPEG 4.0.3 updates
- Enable wave kill (soft recovery) for compute queues
- Clean up CP error interrupt handling
- Enable CP bad opcode interrupts
- VCN 4.x fixes
- VCN 5.x fixes
- GPU reset fixes
- Fix vbios embedded EDID size handling
- SMU 14.x updates
- Misc code cleanups and spelling fixes
- VCN devcoredump support
- ISP MFD i2c support
- DC vblank fixes
- GFX 12 fixes
- PSR fixes
- Convert vbios embedded EDID to drm_edid
- DCN 3.5 updates
- DMCUB updates
- Cursor fixes
- Overdrive support for SMU 14.x
- GFX CP padding optimizations
- DCC fixes
- DSC fixes
- Preliminary per queue reset infrastructure
- Initial per queue reset support for GFX 9
- Initial per queue reset support for GFX 7, 8
- DCN 3.2 fixes
- DP MST fixes
- SR-IOV fixes
- GFX 9.4.3/4 devcoredump support
- Add process isolation framework
- Enable process isolation support for GFX 9.4.3/4
- Take IOMMU remapping into account for P2P DMA checks

amdkfd:
- CRIU fixes
- Improved input validation for user queues
- HMM fix
- Enable process isolation support for GFX 9.4.3/4
- Initial per queue reset support for GFX 9
- Allow users to target recommended SDMA engines

radeon:
- remove .load and drm_dev_alloc
- Fix vbios embedded EDID size handling
- Convert vbios embedded EDID to drm_edid
- Use GEM references instead of TTM
- r100 cp init cleanup
- Fix potential overflows in evergreen CS offset tracking

UAPI:
- KFD support for targetting queues on recommended SDMA engines
  Proposed userspace:
  2f588a2406
  eb30a5bbc7

drm/buddy:
- Add start address support for trim function

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240826201528.55307-1-alexander.deucher@amd.com
2024-08-27 14:33:12 +02:00
Daniel Vetter
4461e9e5c3 Linux 6.11-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmbK2B8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGFwkH/10QpUgzIfbFKbF+
 5hwcvaqS5myxWwJ4PjN0eR1qGE6RzVO0Tb24+TVql+7pxu+iWm1kYgC3+/T5xJsP
 ECAszdmPWSco1xaHrh2y3PyCJjaBiqFbIxdjPp7odjDpG9qarbcty8YpWs44u/gd
 RDXzHUuScEShBhEt0ZhvE1pIDL8jJ8JL3yqOMZ+XaDxtJbjaHw4GHp8efxlBWc8N
 jZKIVJi22q5NWG5T0tGtPWwzCm0ewA/JNMTEvE9leoSoAgO85NZ0ivxMC76q/tbj
 BrYk5KnzfhJs4b/n/KtIwWaLTgLyXKGqHMaMq8sbXtp410aUdgnRJO2cl3fI+1vc
 vxQfAfk=
 =RemI
 -----END PGP SIGNATURE-----

Merge v6.11-rc5 into drm-next

amdgpu pr conconflicts due to patches cherry-picked to -fixes, I might
as well catch up with a backmerge and handle them all. Plus both misc
and intel maintainers asked for a backmerge anyway.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2024-08-27 14:09:45 +02:00
Xiaogang Chen
6ef29715ac drm/amdkfd: Change kfd/svm page fault drain handling
When app unmap vm ranges(munmap) kfd/svm starts drain pending page fault and
not handle any incoming pages fault of this process until a deferred work item
got executed by default system wq. The time period of "not handle page fault"
can be long and is unpredicable. That is advese to kfd performance on page
faults recovery.

This patch uses time stamp of incoming page fault to decide to drop or recover
page fault. When app unmap vm ranges kfd records each gpu device's ih ring
current time stamp. These time stamps are used at kfd page fault recovery
routine.

Any page fault happened on unmapped ranges after unmap events is application
bug that accesses vm range after unmap. It is not driver work to cover that.

By using time stamp of page fault do not need drain page faults at deferred
work. So, the time period that kfd does not handle page faults is reduced
and can be controlled.

Signed-off-by: Xiaogang.Chen <Xiaogang.Chen@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-23 10:55:13 -04:00
Likun Gao
875ff9a7ee drm/amdgpu: support for gc_info table v1.3
Add gc_info table v1.3 for IP discovery.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-23 10:54:57 -04:00
Yang Wang
4416377ae1 drm/amdgpu: add list empty check to avoid null pointer issue
Add list empty check to avoid null pointer issues in some corner cases.
- list_for_each_entry_safe()

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-23 10:53:45 -04:00
Alex Deucher
40318a2406 drm/amdgpu/gfx12: set UNORD_DISPATCH in compute MQDs
This needs to be set to 1 to avoid a potential deadlock in
the GC 10.x and newer.  On GC 9.x and older, this needs
to be set to 0. This can lead to hangs in some mixed
graphics and compute workloads.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3575
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-23 10:53:25 -04:00
Hawking Zhang
b05d6476ae drm/amdgpu: Retire query_utcl2_poison_status callback
Driver switches to interrupt source id to identify
utcl2 poison event. polling interface is not needed.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-23 10:53:16 -04:00
Rahul Jain
75f0efbc4b drm/amdgpu: Take IOMMU remapping into account for p2p checks
when trying to enable p2p the amdgpu_device_is_peer_accessible()
checks the condition where address_mask overlaps the aper_base
and hence returns 0, due to which the p2p disables for this platform

IOMMU should remap the BAR addresses so the device can access
them. Hence check if peer_adev is remapping DMA

v5: (Felix, Alex)
- fixing comment as per Alex feedback
- refactor code as per Felix

v4: (Alex)
- fix the comment and description

v3:
- remove iommu_remap variable

v2: (Alex)
- Fix as per review comments
- add new function amdgpu_device_check_iommu_remap to check if iommu
  remap

Signed-off-by: Rahul Jain <Rahul.Jain@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-23 10:53:05 -04:00
Alex Deucher
9cead81eff drm/amdgpu: fix eGPU hotplug regression
The driver needs to wait for the on board firmware
to finish its initialization before probing the card.
Commit 959056982a ("drm/amdgpu: Fix discovery initialization failure during pci rescan")
switched from using msleep() to using usleep_range() which
seems to have caused init failures on some navi1x boards. Switch
back to msleep().

Fixes: 959056982a ("drm/amdgpu: Fix discovery initialization failure during pci rescan")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3559
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3500
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Ma Jun <Jun.Ma2@amd.com>
(cherry picked from commit c69b07f7bb)
Cc: stable@vger.kernel.org # 6.10.x
2024-08-20 23:07:11 -04:00
Candice Li
c99769bcea drm/amdgpu: Validate TA binary size
Add TA binary size validation to avoid OOB write.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c0a04e3570)
Cc: stable@vger.kernel.org
2024-08-20 23:04:17 -04:00
Alex Deucher
e3e4bf58ba drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1
The workaround seems to cause stability issues on other
SDMA 5.2.x IPs.

Fixes: a03ebf1163 ("drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3556
Acked-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2dc3851ef7)
Cc: stable@vger.kernel.org
2024-08-20 22:51:37 -04:00
Yang Wang
0b43312902 drm/amdgpu: fixing rlc firmware loading failure issue
Skip rlc firmware validation to ignore firmware header size mismatch issues.
This restores the workaround added in
commit 849e133c97 ("drm/amdgpu: Fix the null pointer when load rlc firmware")

Fixes: 3af2c80ae2 ("drm/amdgpu: refine gfx10 firmware loading")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3551
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 89ec85d16e)
2024-08-20 22:51:31 -04:00
Alex Deucher
88c511dea1 drm/amd/gfx11: move the gfx mutex into the caller
Otherwise we can fail to drop the software mutex when
we fail to take the hardware mutex.

Fixes: 76acba7b7f ("drm/amdgpu/gfx11: add a mutex for the gfx semaphore")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-20 22:14:14 -04:00
Victor Zhao
bf2bc61638 drm/amd/amdgpu: allow use kiq to do hdp flush under sriov
when use cpu to do page table update under sriov runtime, since mmio
access is blocked, kiq has to be used to flush hdp.

change WREG32_NO_KIQ to WREG32 to allow kiq.

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-20 22:14:14 -04:00
Alex Deucher
c69b07f7bb drm/amdgpu: fix eGPU hotplug regression
The driver needs to wait for the on board firmware
to finish its initialization before probing the card.
Commit 959056982a ("drm/amdgpu: Fix discovery initialization failure during pci rescan")
switched from using msleep() to using usleep_range() which
seems to have caused init failures on some navi1x boards. Switch
back to msleep().

Fixes: 959056982a ("drm/amdgpu: Fix discovery initialization failure during pci rescan")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3559
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3500
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Ma Jun <Jun.Ma2@amd.com>
2024-08-20 22:14:14 -04:00
Candice Li
c0a04e3570 drm/amdgpu: Validate TA binary size
Add TA binary size validation to avoid OOB write.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-20 22:14:13 -04:00
Mukul Joshi
ccf8ef6b75 drm/amdgpu: Implement MES Suspend and Resume APIs for GFX11
Add implementation for MES Suspend and Resume APIs to unmap/map
all queues for GFX11. Support for GFX12 will be added when the
corresponding firmware support is in place.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-20 22:14:04 -04:00
Srinivasan Shanmugam
f846250b8a drm/amdgpu/gfx_v9_4_3: Apply Isolation Enforcement to GFX & Compute rings
This commit applies isolation enforcement to the GFX and Compute rings
in the gfx_v9_4_3 module.

The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and
`amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be
called when a ring begins and ends its use, respectively.

`amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring
begins its use. This function cancels any scheduled
`enforce_isolation_work` and, if necessary, signals the Kernel Fusion
Driver (KFD) to stop the runqueue.

`amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends
its use. This function schedules `enforce_isolation_work` to be run
after a delay.

These functions are part of the Enforce Isolation Handler, which
enforces shader isolation on AMD GPUs to prevent data leakage between
different processes.

The commit also includes a check for the type of the ring. If the type
of the ring is `AMDGPU_RING_TYPE_COMPUTE`, the `xcp_id` of the
`enforce_isolation` structure in the `gfx` structure of the
`amdgpu_device` is set to the `xcp_id` of the ring. This ensures that
the correct `xcp_id` is used when enforcing isolation on compute rings.
The `xcp_id` is an identifier for an XCP partition, and different rings
can be associated with different XCP partitions.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
2024-08-20 22:08:02 -04:00
Srinivasan Shanmugam
b710dbe55d drm/amdgpu/gfx9: Apply Isolation Enforcement to GFX & Compute rings
This commit applies isolation enforcement to the GFX and Compute rings
in the gfx_v9_0 module.

The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and
`amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be
called when a ring begins and ends its use, respectively.

`amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring
begins its use. This function cancels any scheduled
`enforce_isolation_work` and, if necessary, signals the Kernel Fusion
Driver (KFD) to stop the runqueue.

`amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends
its use. This function schedules `enforce_isolation_work` to be run
after a delay.

These functions are part of the Enforce Isolation Handler, which
enforces shader isolation on AMD GPUs to prevent data leakage between
different processes.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
2024-08-20 22:07:58 -04:00
Srinivasan Shanmugam
afefd6f245 drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization
This commit introduces the Enforce Isolation Handler designed to enforce
shader isolation on AMD GPUs, which helps to prevent data leakage
between different processes.

The handler counts the number of emitted fences for each GFX and compute
ring. If there are any fences, it schedules the `enforce_isolation_work`
to be run after a delay of `GFX_SLICE_PERIOD`. If there are no fences,
it signals the Kernel Fusion Driver (KFD) to resume the runqueue.

The function is synchronized using the `enforce_isolation_mutex`.

This commit also introduces a reference count mechanism
(kfd_sch_req_count) to keep track of the number of requests to enable
the KFD scheduler. When a request to enable the KFD scheduler is made,
the reference count is decremented. When the reference count reaches
zero, a delayed work is scheduled to enforce isolation after a delay of
GFX_SLICE_PERIOD.

When a request to disable the KFD scheduler is made, the function first
checks if the reference count is zero. If it is, it cancels the delayed
work for enforcing isolation and checks if the KFD scheduler is active.
If the KFD scheduler is active, it sends a request to stop the KFD
scheduler and sets the KFD scheduler state to inactive. Then, it
increments the reference count.

The function is synchronized using the kfd_sch_mutex to ensure that the
KFD scheduler state and reference count are updated atomically.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-20 22:07:35 -04:00
Amber Lin
234eebe161 drm/amdkfd: APIs to stop/start KFD scheduling
Provide amdgpu_amdkfd_stop_sched() for amdgpu to stop KFD scheduling
compute work on HIQ. amdgpu_amdkfd_start_sched() resumes the scheduling.
When amdgpu_amdkfd_stop_sched is called, KFD will unmap queues from
runlist. If users send ioctls to KFD to create queues, they'll be added
but those queues won't be mapped to runlist (so not scheduled) until
amdgpu_amdkfd_start_sched is called.

v2: fix build (Alex)

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-20 22:07:28 -04:00
Srinivasan Shanmugam
b1f49ff9cb drm/amdgpu/gfx9: Add cleaner shader support for GFX9.4.4 hardware
This commit extends the cleaner shader feature to support GFX9.4.4
hardware.

The cleaner shader feature is used to clear or initialize certain GPU
resources, such as Local Data Share (LDS), Vector General Purpose
Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). This
operation needs to be performed in isolation, while no other tasks
should be running on the GPU at the same time.

Previously, the cleaner shader feature was implemented for GFX9.4.3
hardware. This commit adds support for GFX9.4.4 hardware by allowing the
cleaner shader to be used with this hardware version.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-20 22:07:23 -04:00
Srinivasan Shanmugam
335288315a drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.3
This commit adds the cleaner shader microcode for GFX9.4.3 GPUs. The
cleaner shader is a piece of GPU code that is used to clear or
initialize certain GPU resources, such as Local Data Share (LDS), Vector
General Purpose Registers (VGPRs), and Scalar General Purpose Registers
(SGPRs).

Clearing these resources is important for ensuring data isolation
between different workloads running on the GPU. Without the cleaner
shader, residual data from a previous workload could potentially be
accessed by a subsequent workload, leading to data leaks and incorrect
computation results.

The cleaner shader microcode is represented as an array of 32-bit words
(`gfx_9_4_3_cleaner_shader_hex`). This array is the binary
representation of the cleaner shader code, which is written in a
low-level GPU instruction set.

When the cleaner shader feature is enabled, the AMDGPU driver loads this
array into a specific location in the GPU memory. The GPU then reads
this memory location to fetch and execute the cleaner shader
instructions.

The cleaner shader is executed automatically by the GPU at the end of
each workload, before the next workload starts. This ensures that all
GPU resources are in a clean state before the start of each workload.

This addition is part of the cleaner shader feature implementation. The
cleaner shader feature helps improve GPU performance and resource
utilization by cleaning up GPU resources after they are used. It also
enhances security and reliability by preventing data leaks between
workloads.

v2: fix copyright date (Alex)

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-20 22:07:16 -04:00
Srinivasan Shanmugam
d4c3815495 drm/amdgpu/gfx9: Implement cleaner shader support for GFX9.4.3 hardware
The patch modifies the gfx_v9_4_3_kiq_set_resources function to write
the cleaner shader's memory controller address to the ring buffer. It
also adds a new function, gfx_v9_4_3_ring_emit_cleaner_shader, which
emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer.

This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the
gfx_v9_4_3 module. This packet is used to emit the cleaner shader, which
is used to clear GPU memory before it's reused, helping to prevent data
leakage between different processes.

Finally, the patch updates the ring function structures to include the
new gfx_v9_4_3_ring_emit_cleaner_shader function. This allows the
cleaner shader to be emitted as part of the ring's operations.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-20 22:07:11 -04:00
Srinivasan Shanmugam
c2e70d307f drm/amdgpu/gfx9: Implement cleaner shader support for GFX9 hardware
The patch modifies the gfx_v9_0_kiq_set_resources function to write
the cleaner shader's memory controller address to the ring buffer. It
also adds a new function, gfx_v9_0_ring_emit_cleaner_shader, which
emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer.

This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the
gfx_v9_0 module. This packet is used to emit the cleaner shader, which
is used to clear GPU memory before it's reused, helping to prevent data
leakage between different processes.

Finally, the patch updates the ring function structures to include the
new gfx_v9_0_ring_emit_cleaner_shader function. This allows the
cleaner shader to be emitted as part of the ring's operations.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-20 22:07:07 -04:00
Srinivasan Shanmugam
22ff907d4f drm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER for cleaner shader execution
This commit adds the PACKET3_RUN_CLEANER_SHADER definition. This packet
is a command packet used to instruct the GPU to execute the cleaner
shader.

The cleaner shader is a piece of GPU code that is used to clear or
initialize certain GPU resources, such as Local Data Share (LDS), Vector
General Purpose Registers (VGPRs), and Scalar General Purpose Registers
(SGPRs). Clearing these resources is important for ensuring data
isolation between different workloads running on the GPU.

The PACKET3_RUN_CLEANER_SHADER packet is used to trigger the execution
of the cleaner shader on the GPU. The packet consists of a header
followed by a RESERVED field, which is programmed to zero. When the GPU
receives this packet, it fetches and executes the cleaner shader
instructions from the location specified in the packet.

The cleaner shader feature helps to enhances security and reliability by
preventing data leaks between workloads.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-20 22:07:02 -04:00
Srinivasan Shanmugam
d361ad5d2f drm/amdgpu: Add sysfs interface for running cleaner shader
This patch adds a new sysfs interface for running the cleaner shader on
AMD GPUs. The cleaner shader is used to clear GPU memory before it's
reused, which can help prevent data leakage between different processes.

The new sysfs file is write-only and is named `run_cleaner_shader`.
Write the number of the partition to this file to trigger the cleaner shader
on that partition. There is only one partition on GPUs which do not
support partitioning.

Changes made in this patch:

- Added `amdgpu_set_run_cleaner_shader` function to handle writes to the
  `run_cleaner_shader` sysfs file.
- Added `run_cleaner_shader` to the list of device attributes in
  `amdgpu_device_attrs`.
- Updated `default_attr_update` to handle `run_cleaner_shader`.
- Added `AMDGPU_DEVICE_ATTR_WO` macro to create write-only device
  attributes.

v2: fix error handling (Alex)

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
2024-08-20 22:06:56 -04:00
Srinivasan Shanmugam
e189be9b2e drm/amdgpu: Add enforce_isolation sysfs attribute
This commit adds a new sysfs attribute 'enforce_isolation' to control
the 'enforce_isolation' setting per GPU. The attribute can be read and
written, and accepts values 0 (disabled) and 1 (enabled).

When 'enforce_isolation' is enabled, reserved VMIDs are allocated for
each ring. When it's disabled, the reserved VMIDs are freed.

The set function locks a mutex before changing the 'enforce_isolation'
flag and the VMIDs, and unlocks it afterwards. This ensures that these
operations are atomic and prevents race conditions and other concurrency
issues.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-20 22:06:52 -04:00
Srinivasan Shanmugam
dba1a6cfc3 drm/amdgpu: Enforce isolation as part of the job
This patch adds a new parameter 'enforce_isolation' to the amdgpu_job
structure. This parameter is used to determine whether shader isolation
should be enforced for a job. The enforce_isolation parameter is then
stored in the amdgpu_job structure and used when flushing the VM.

The enforce_isolation field of the amdgpu_job structure is set directly
after the job is allocated

This change allows more fine-grained control over shader isolation,
making it possible to enforce isolation on a per-job basis rather than
globally. This can be useful in scenarios where only certain jobs
require isolation.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
2024-08-20 22:06:43 -04:00
Victor Skvortsov
19cff16559 drm/amdgpu: abort KIQ waits when there is a pending reset
Stop waiting for the KIQ to return back when there is a reset pending.
It's quite likely that the KIQ will never response.

Signed-off-by: Koenig Christian <Christian.Koenig@amd.com>
Suggested-by: Lazar Lijo <Lijo.Lazar@amd.com>
Tested-by: Victor Skvortsov <victor.skvortsov@amd.com>
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:27:50 -04:00
Srinivasan Shanmugam
9659520419 drm/amdgpu: Make enforce_isolation setting per GPU
This commit makes enforce_isolation setting to be per GPU and per
partition by adding the enforce_isolation array to the adev structure.
The adev variable is set based on the global enforce_isolation module
parameter during device initialization.

In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU
is used to determine whether to enforce isolation between graphics and
compute processes on that GPU.

In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU
and partition is used to determine whether to enforce isolation between
graphics and compute processes on that GPU and partition.

This allows the enforce_isolation setting to be controlled individually
for each GPU and each partition, which is useful in a system with
multiple GPUs and partitions where different isolation settings might be
desired for different GPUs and partitions.

v2: fix loop in amdgpu_vmid_mgr_init() (Alex)

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
2024-08-16 14:27:45 -04:00
Alex Deucher
ee7a846ea2 drm/amdgpu: Emit cleaner shader at end of IB submission
This commit introduces the emission of a cleaner shader at the end of
the IB submission process. This is achieved by adding a new function
pointer, `emit_cleaner_shader`, to the `amdgpu_ring_funcs` structure. If
the `emit_cleaner_shader` function is set in the ring functions, it is
called during the VM flush process.

The cleaner shader is only emitted if the `enable_cleaner_shader` flag
is set in the `amdgpu_device` structure. This allows the cleaner shader
emission to be controlled on a per-device basis.

By emitting a cleaner shader at the end of the IB submission, we can
ensure that the VM state is properly cleaned up after each submission.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
2024-08-16 14:27:42 -04:00
Srinivasan Shanmugam
aec773a1fb drm/amdgpu: Add infrastructure for Cleaner Shader feature
The cleaner shader is used by the CP firmware to clean LDS and GPRs
between processes on the CUs.

This adds an internal API for GFX IP code to allocate and initialize the
cleaner shader.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
2024-08-16 14:27:34 -04:00
Alex Deucher
f49280ffd2 drm/amdgpu: handle enforce isolation on non-0 gfxhub
Some chips have more than one gfxhub so check if we
are a gfxhub rather than just gfxhub 0.

Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:27:28 -04:00
Alex Deucher
2dc3851ef7 drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1
The workaround seems to cause stability issues on other
SDMA 5.2.x IPs.

Fixes: a03ebf1163 ("drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3556
Acked-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:27:04 -04:00
Sunil Khatri
1a2103d685 drm/amdgpu: add vcn ip dump support for vcn_v2_6
Add support for logging the registers in devcoredump
buffer for vcn_v2_6.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:26:57 -04:00
Sunil Khatri
bc62abe1b9 drm/amdgpu: add print support for vcn_v2_5 ip dump
Add support for logging the registers in devcoredump
buffer for vcn_v2_5.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:26:52 -04:00
Sunil Khatri
0eea81ee2e drm/amdgpu: add vcn_v2_5 ip dump support
Add support of vcn ip dump in the devcoredump
for vcn_v2_5.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:26:47 -04:00
Sunil Khatri
b910cacb4e drm/amdgpu: add print support for vcn_v2_0 ip dump
Add support for logging the registers in devcoredump
buffer for vcn_v2_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:26:41 -04:00
Sunil Khatri
2239aaa204 drm/amdgpu: add vcn_v2_0 ip dump support
Add support of vcn ip dump in the devcoredump
for vcn_v2_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:26:36 -04:00
Sunil Khatri
ef9f3b5fd9 drm/amdgpu: add print support for vcn_v1_0 ip dump
Add support for logging the registers in devcoredump
buffer for vcn_v1_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:26:31 -04:00
Sunil Khatri
837cc7f1bf drm/amdgpu: add vcn_v1_0 ip dump support
Add support of vcn ip dump in the devcoredump
for vcn_v1_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:26:25 -04:00
Sunil Khatri
439c3b124e drm/amdgpu: add print support for vcn_v4_0_5 ip dump
Add support for logging the registers in devcoredump
buffer for vcn_v4_0_5.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:26:19 -04:00
Sunil Khatri
3a50a51d04 drm/amdgpu: add print support for vcn_v4_0 ip dump
Add support for logging the registers in devcoredump
buffer for vcn_v4_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:26:12 -04:00
Sunil Khatri
dc57edda81 drm/amdgpu: add print support for vcn_v4_0_3 ip dump
Add support for logging the registers in devcoredump
buffer for vcn_v4_0_3.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:26:06 -04:00
Sunil Khatri
46553db49c drm/amdgpu: add vcn_v4_0_5 ip dump support
Add support of vcn ip dump in the devcoredump
for vcn_v4_0_5.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:25:59 -04:00
Sunil Khatri
9d87dac3f9 drm/amdgpu: add vcn_v4_0 ip dump support
Add support of vcn ip dump in the devcoredump
for vcn_v4_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:25:53 -04:00
Sunil Khatri
8962915044 drm/amdgpu: add vcn_v4_0_3 ip dump support
Add support of vcn ip dump in the devcoredump
for vcn_v4_0_3.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:25:30 -04:00
Sunil Khatri
f3c958ab85 drm/amdgpu: add print support for vcn_v5_0 ip dump
Add support for logging the registers in devcoredump
buffer for vcn_v5_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:25:12 -04:00
Alex Deucher
32aada4d0a drm/amdgpu/mes12: add API for user queue reset
Add API for resetting user queues.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:25:09 -04:00
Alex Deucher
d4f1fde734 drm/amdgpu/mes11: add API for user queue reset
Add API for resetting user queues.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:25:06 -04:00
Alex Deucher
5b7a59de48 drm/amdgpu/mes: add API for user queue reset
Add API for resetting user queues.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:25:02 -04:00
Alex Deucher
478efcb90b drm/amdgpu/gfx11: export gfx_v11_0_request_gfx_index_mutex()
It will be used by the queue reset code.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:24:56 -04:00
Alex Deucher
76acba7b7f drm/amdgpu/gfx11: add a mutex for the gfx semaphore
This will be used in more places in the future so
add a mutex.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:24:33 -04:00
Alex Deucher
b5be054c58 drm/amdgpu/gfx11: enter safe mode before touching CP_INT_CNTL
Need to enter safe mode before touching GC MMIO.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:24:23 -04:00
Alex Deucher
d479158f65 drm/amdgpu/gfx7: add ring reset callback for gfx
Add ring reset callback for gfx.

v2: fix operator precedence (kernel test robot)

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:24:18 -04:00
Alex Deucher
4af8071b65 drm/amdgpu/gfx8: add ring reset callback for gfx
Add ring reset callback for gfx.

v2: fix operator precedence (kernel test robot)

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:24:09 -04:00
Sunil Khatri
f685b38455 drm/amdgpu: add vcn_v5_0 ip dump support
Add support of vcn ip dump in the devcoredump
for vcn_v5_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:24:04 -04:00
Sunil Khatri
6d88c0f94a drm/amdgpu: add print support for vcn_v3_0 ip dump
Add support for logging the registers in devcoredump
buffer for vcn_v3_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:23:59 -04:00
Sunil Khatri
ab10f77487 drm/amdgpu: add vcn_v3_0 ip dump support
Add support of vcn ip dump in the devcoredump
for vcn_v3_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:23:53 -04:00
Sunil Khatri
27a74c125d drm/amdgpu: add vcn ip dump ptr in vcn global struct
Add pointer to the vcn ip dump in the vcn global structure
to be accessible for all vcn version via global adev.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:23:45 -04:00
Zhang Zekun
20588d5afc drm/amd: Remove unused declarations
amdgpu_gart_table_vram_pin() and amdgpu_gart_table_vram_unpin() has
been removed since commit 575e55ee4f ("drm/amdgpu: recover gart table
at resume") remain the declarations untouched in the header files.

Besides, amdgpu_dm_display_resume() has also beed removed since
commit a80aa93de1 ("drm/amd/display: Unify dm resume sequence into a
single call"). So, let's remove this unused declarations.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:23:16 -04:00
Yang Wang
89ec85d16e drm/amdgpu: fixing rlc firmware loading failure issue
Skip rlc firmware validation to ignore firmware header size mismatch issues.
This restores the workaround added in
commit 849e133c97 ("drm/amdgpu: Fix the null pointer when load rlc firmware")

Fixes: 3af2c80ae2 ("drm/amdgpu: refine gfx10 firmware loading")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3551
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:19:16 -04:00
Sunil Khatri
0f2c243dbf drm/amdgpu: remove ME0 registers from mi300 dump
Remove ME0 registers from MI300 gfx_9_4_3 ipdump
MI300 does not have  gfx ME and hence those register
are just empty one and could be dropped.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:18:54 -04:00
Alex Deucher
3ec2ad7c34 drm/amdgpu/gfx9: use rlc safe mode for soft recovery
Protect the MMIO access with safe mode.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:18:51 -04:00
Alex Deucher
d082e5cde4 drm/amdgpu/gfx9.4.3: use rlc safe mode for soft recovery
Protect the MMIO access with safe mode.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:18:46 -04:00
Alex Deucher
a48f31fb78 drm/amdgpu/gfx9.4.3: use proper rlc safe mode helpers
Rather than open coding it for the queue reset.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:18:44 -04:00
Alex Deucher
27ef61f961 drm/amdgpu/gfx9: use proper rlc safe mode helpers
Rather than open coding it for the queue reset.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:18:41 -04:00
Alex Deucher
c4f503551f drm/amdgpu/gfx9: add ring reset callback for gfx
Add ring reset callback for gfx.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:18:38 -04:00
Alex Deucher
31ef969301 drm/amdgpu/gfx9: per queue reset only on bare metal
It's not supported under SR-IOV at the moment.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:18:35 -04:00
Jiadong Zhu
4dc4422f11 drm/amdgpu/gfx9.4.3: implement reset_hw_queue for gfx9.4.3
Using mmio to do queue reset. Enter safe mode
before writing mmio registers.

v2: set register instance offset according to xcc id.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:18:30 -04:00
Jiadong Zhu
2e9bbdd7b7 drm/amdgpu/gfx9: implement reset_hw_queue for gfx9
Using mmio to do queue reset. Enter safe mode
when writing registers.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:18:27 -04:00
Jiadong Zhu
186020c166 drm/amdgpu/gfx: add a new kiq_pm4_funcs callback for reset_hw_queue
Add reset_hw_queue in kiq_pm4_funcs callbacks.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:18:25 -04:00
Jiadong Zhu
4c953e53cc drm/amdgpu/gfx_9.4.3: wait for reset done before remap
There is a racing condition that cp firmware modifies
MQD in reset sequence after driver updates it for
remapping. We have to wait till CP_HQD_ACTIVE becoming
false then remap the queue.

v2: fix KIQ locking (Alex)
v3: fix KIQ locking harder

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:18:22 -04:00
Jiadong Zhu
6f38589e17 drm/amdgpu/gfx9.4.3: remap queue after reset successfully
Kiq command unmap_queues only does the dequeueing action.
We have to map the queue back with clean mqd.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:18:19 -04:00
Alex Deucher
5d0112f777 drm/amdgpu/gfx9.4.3: add ring reset callback
Add ring reset callback for compute.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:18:14 -04:00
Jiadong Zhu
fdbd69486b drm/amdgpu/gfx9: wait for reset done before remap
There is a racing condition that cp firmware modifies
MQD in reset sequence after driver updates it for
remapping. We have to wait till CP_HQD_ACTIVE becoming
false then remap the queue.

v2: fix KIQ locking (Alex)
v3: fix KIQ locking harder

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:18:06 -04:00
Jiadong Zhu
b5e1a3874f drm/amdgpu/gfx9: remap queue after reset successfully
Kiq command unmap_queues only does the dequeueing action.
We have to map the queue back with clean mqd.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:18:03 -04:00
Alex Deucher
5fb4d2a771 drm/amdgpu/gfx9: add ring reset callback
Add ring reset callback for compute.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:18:00 -04:00
Prike Liang
fb0a5834a3 drm/amdgpu: increase the reset counter for the queue reset
Update the reset counter for the amdgpu_cs_query_reset_state()

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:17:56 -04:00
Alex Deucher
15789fa0f0 drm/amdgpu: add per ring reset support (v5)
If a specific job is hung, try and reset just the
ring associated with the job.

v2: move to amdgpu_job.c
v3: fix drm_sched_stop() handling when ring reset fails
v4: drop unnecessary amdgpu_fence_driver_clear_job_fences() and
    drm_sched_increase_karma()
v5: rework sched_stop handling

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:17:52 -04:00
Alex Deucher
57a372f676 drm/amdgpu: add new ring reset callback
Use this to reset just a single ring.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:17:40 -04:00
Soham Dandapat
406792dc2a drm/amdgpu: Return earlier in amdgpu_sw_ring_ib_end if mcbp is off
As we don't trigger preemption is sw ring muxer when mcbp is
disabled,so return earlier in amdgpu_sw_ring_ib_end function
if mcbp is disabled ,not required to call amdgpu_ring_mux_end_ib

Signed-off-by: Soham Dandapat <sdandapa@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:17:31 -04:00
Sunil Khatri
37ee145623 drm/amdgpu: add cp queue registers print for gfx9_4_3
Add gfx9_4_3 print support of CP queue registers
for all queues to be used by devcoredump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:17:26 -04:00
Sunil Khatri
f9e491c863 drm/amdgpu: add cp queue registers for gfx9_4_3 ipdump
Add gfx9 support of CP queue registers for all queues
to be used by devcoredump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:17:14 -04:00
Thomas Zimmermann
ddda6542c8 drm/amdgpu: Use backlight power constants
Replace FB_BLANK_ constants with their counterparts from the
backlight subsystem. The values are identical, so there's no
change in functionality or semantics.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Xinhui Pan <Xinhui.Pan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240731122311.1143153-2-tzimmermann@suse.de
2024-08-16 09:27:15 +02:00
Kenneth Feng
23acd1f344 drm/amd/amdgpu: add HDP_SD support on gc 12.0.0/1
add HDP_SD support on gc 12.0.0/1

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 61cffacb3a)
2024-08-13 13:20:43 -04:00
Yinjie Yao
507a2286c0 drm/amdgpu: Update kmd_fw_shared for VCN5
kmd_fw_shared changed in VCN5

Signed-off-by: Yinjie Yao <yinjie.yao@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit aa02486fb1)
2024-08-13 13:20:36 -04:00
David (Ming Qiang) Wu
470516c292 drm/amd/amdgpu: command submission parser for JPEG
Add JPEG IB command parser to ensure registers
in the command are within the JPEG IP block.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit a7f670d5d8)
Cc: stable@vger.kernel.org
2024-08-13 13:17:36 -04:00
Jack Xiao
4246b1077f drm/amdgpu/mes12: fix suspend issue
Use mes pipe to unmap kcq and kgq.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f7fb9d677f)
2024-08-13 13:17:36 -04:00
Jack Xiao
af401543df drm/amdgpu/mes12: sw/hw fini for unified mes
Free memory for two pipes and unmap pipe0 via pipe1.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 98cae695a8)
2024-08-13 13:17:36 -04:00
Jack Xiao
7254027e1e drm/amdgpu/mes12: configure two pipes hardware resources
Configure two pipes with different hardware resources.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ea5d6db17a)
2024-08-13 13:17:36 -04:00
Jack Xiao
1097727d6d drm/amdgpu/mes12: adjust mes12 sw/hw init for multiple pipes
Adjust mes12 sw/hw initiailization for both pipe0 and pipe1
enablement. The two pipes are almost identical pipe. Pipe0
behaves like schq and pipe1 like kiq, pipe0 was mapped by pipe1.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit aa539da8af)
2024-08-13 13:17:36 -04:00
Jack Xiao
3738a7f0dd drm/amdgpu/mes12: add mes pipe switch support
Add mes pipe switch to let caller choose pipe
to submit packet.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b2dee0837a)
2024-08-13 13:17:36 -04:00
Jack Xiao
a13d91bf3c drm/amdgpu/mes12: load unified mes fw on pipe0 and pipe1
Enable unified mes firmware to load on pipe0 and pipe1.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e69c2dd753)
2024-08-13 13:17:36 -04:00
Jack Xiao
2029b3d7e1 drm/amdgpu/mes: add multiple mes ring instances support
Add multiple mes ring instances in mes structure to support
multiple mes pipes.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c7d4355648)
2024-08-13 13:04:48 -04:00
Bas Nieuwenhuizen
0573a1e2ea drm/amdgpu: Actually check flags for all context ops.
Missing validation ...

Checked libdrm and it clears all the structs, so we should be
safe to just check everything.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c6b86421f1)
Cc: stable@vger.kernel.org
2024-08-13 13:03:57 -04:00
Alex Deucher
e6c6bd6253 drm/amdgpu/jpeg4: properly set atomics vmid field
This needs to be set as well if the IB uses atomics.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c6c2e8b6a4)
Cc: stable@vger.kernel.org
2024-08-13 13:03:11 -04:00
Alex Deucher
e414a304f2 drm/amdgpu/jpeg2: properly set atomics vmid field
This needs to be set as well if the IB uses atomics.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 35c628774e)
Cc: stable@vger.kernel.org
2024-08-13 13:02:48 -04:00
Jack Xiao
11752c013f drm/amdgpu/mes: fix mes ring buffer overflow
wait memory room until enough before writing mes packets
to avoid ring buffer overflow.

v2: squash in sched_hw_submission fix

Fixes: de32462541 ("drm/amdgpu: cleanup MES11 command submission")
Fixes: fffe347e14 ("drm/amdgpu: cleanup MES12 command submission")
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 34e087e892)
Cc: stable@vger.kernel.org
2024-08-13 12:50:01 -04:00
Sunil Khatri
b232c4a63a drm/amdgpu: add print support for gfx9_4_3 ipdump
Add support of gfx9_4_3 ipdump print so devcoredump
could trigger it to dump the captured registers
in devcoredump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 12:13:03 -04:00
Sunil Khatri
1091796fb1 drm/amdgpu: add gfx9_4_3 register support in ipdump
Add general registers of gfx9_4_3 in ipdump for
devcoredump support.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 12:13:03 -04:00
David (Ming Qiang) Wu
6a28a072d9 drm/amd/amdgpu: cleanup parse_cs callbacks
Because gpu_addr is updated in the calling routine
(amdgpu_cs_patch_ibs()),it is removed in the callback.

Use .patch_cs_in_place instead of .parse_cs for
amdgpu_vce_ring_parse_cs_vm() as there is no need for keeping
a temporary IB, therefore ib->sa_bo is NULL and amdgpu_ib_free()
is removed.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 12:13:03 -04:00
David (Ming Qiang) Wu
a7f670d5d8 drm/amd/amdgpu: command submission parser for JPEG
Add JPEG IB command parser to ensure registers
in the command are within the JPEG IP block.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 12:13:03 -04:00
Jack Xiao
f7fb9d677f drm/amdgpu/mes12: fix suspend issue
Use mes pipe to unmap kcq and kgq.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 12:13:03 -04:00
Jack Xiao
98cae695a8 drm/amdgpu/mes12: sw/hw fini for unified mes
Free memory for two pipes and unmap pipe0 via pipe1.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 12:13:03 -04:00
Jack Xiao
ea5d6db17a drm/amdgpu/mes12: configure two pipes hardware resources
Configure two pipes with different hardware resources.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 12:12:52 -04:00
Jack Xiao
aa539da8af drm/amdgpu/mes12: adjust mes12 sw/hw init for multiple pipes
Adjust mes12 sw/hw initiailization for both pipe0 and pipe1
enablement. The two pipes are almost identical pipe. Pipe0
behaves like schq and pipe1 like kiq, pipe0 was mapped by pipe1.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 12:12:52 -04:00
Jack Xiao
b2dee0837a drm/amdgpu/mes12: add mes pipe switch support
Add mes pipe switch to let caller choose pipe
to submit packet.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 12:12:52 -04:00
Victor Skvortsov
9e823f3070 drm/amdgpu: Block MMR_READ IOCTL in reset
Register access from userspace should be blocked until
reset is complete.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 12:12:52 -04:00
Jonathan Kim
a85c3db6b3 drm/amdkfd: fallback to pipe reset on queue reset fail for gfx9
If queue reset fails, tell the CP to reset the pipe.
Since queues multiplex context per pipe and we've issued a device wide
preemption prior to the hang, we can assume the hung pipe only has one
queue to reset on pipe reset.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 12:12:52 -04:00
Lijo Lazar
9c081c11c6 drm/amdgpu: Reorder to read EFI exported ROM first
On EFI BIOSes, PCI ROM may be exported through EFI_PCI_IO_PROTOCOL and
expansion ROM BARs may not be enabled. Choose to read from EFI exported
ROM data before reading PCI Expansion ROM BAR.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 12:12:52 -04:00
Jack Xiao
e69c2dd753 drm/amdgpu/mes12: load unified mes fw on pipe0 and pipe1
Enable unified mes firmware to load on pipe0 and pipe1.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 12:12:52 -04:00
Victor Skvortsov
f83cec3b3a drm/amdgpu: Disable dpm_enabled flag while VF is in reset
VFs do not perform HW fini/suspend in FLR, so the dpm_enabled
is incorrectly kept enabled. Add interface to disable it in
virt_pre_reset call.

v2: Made implementation generic for all asics
v3: Re-order conditionals so PP_MP1_STATE_FLR is only evaluated on VF

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 12:12:52 -04:00
Victor Skvortsov
35c7152202 Revert "drm/amdgpu: Extend KIQ reg polling wait for VF"
KIQ timeouts no longer seen.

This reverts commit 3a19a8af64.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 12:12:52 -04:00
Yinjie Yao
aa02486fb1 drm/amdgpu: Update kmd_fw_shared for VCN5
kmd_fw_shared changed in VCN5

Signed-off-by: Yinjie Yao <yinjie.yao@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 12:12:52 -04:00
Kenneth Feng
61cffacb3a drm/amd/amdgpu: add HDP_SD support on gc 12.0.0/1
add HDP_SD support on gc 12.0.0/1

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 12:12:51 -04:00
Victor Zhao
ef6c2cb349 drm/amd/sriov: extend NV_MAILBOX_POLL_MSG_TIMEDOUT
on MI300/MI308 UBB products, when doing mode1 reset, since 1 gpu need to
wait all 8 gpus finish mode1 reset and then do re-init. As observed,
sometimes the gpu which triggered the reset need to wait 15s for all
gpus to finish.

If poll msg timeout, guest driver will send the reset message again, and
may mess up the following reinit sequence on other gpus.

So extend the time to cover the maximum time needed to recover.

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 12:12:51 -04:00
Sunil Khatri
9b7e697839 drm/amdgpu: fix ptr check warning in gfx12 ip_dump
Change condition, if (ptr == NULL) to if (!ptr)
for a better format and fix the warning.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:34:51 -04:00
Sunil Khatri
bd15f805cd drm/amdgpu: fix ptr check warning in gfx11 ip_dump
Change condition, if (ptr == NULL) to if (!ptr)
for a better format and fix the warning.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:34:31 -04:00
Sunil Khatri
98df5a7732 drm/amdgpu: fix ptr check warning in gfx10 ip_dump
Change condition, if (ptr == NULL) to if (!ptr)
for a better format and fix the warning.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:34:23 -04:00
Sunil Khatri
07f4f9c00e drm/amdgpu: fix ptr check warning in gfx9 ip_dump
Change if (ptr == NULL) to if (!ptr) for a better
format and fix the warning.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:33:07 -04:00
Sunil Khatri
311f2b5874 Revert "drm/amdgpu: add vcn ip dump ptr in vcn global struct"
This reverts commit f3392e662e.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:30:49 -04:00
Sunil Khatri
434b3554d6 Revert "drm/amdgpu: add vcn_v3_0 ip dump support"
This reverts commit 58d283801d.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:30:24 -04:00
Sunil Khatri
2f93ec07ab Revert "drm/amdgpu: add print support for vcn_v3_0 ip dump"
This reverts commit cd162ae9bc.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:29:32 -04:00
Jack Xiao
c7d4355648 drm/amdgpu/mes: add multiple mes ring instances support
Add multiple mes ring instances in mes structure to support
multiple mes pipes.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:29:25 -04:00
Sunil Khatri
3df3433414 Revert "drm/amdgpu: add vcn_v5_0 ip dump support"
This reverts commit a46a7bef7d.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:28:51 -04:00
Sunil Khatri
a46a7bef7d drm/amdgpu: add vcn_v5_0 ip dump support
Add support of vcn ip dump in the devcoredump
for vcn_v5_0.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:28:13 -04:00
Alex Deucher
947c080869 drm/amdgpu/mes12: add API for legacy queue reset
Add API for resetting kernel queues.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:28:11 -04:00
Alex Deucher
45a2a45143 drm/amdgpu/mes11: add API for legacy queue reset
Add API for resetting kernel queues.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:28:07 -04:00
Alex Deucher
c30fb344a2 drm/amdgpu/mes: add API for legacy queue reset
Add API for resetting kernel queues.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:27:59 -04:00
Bas Nieuwenhuizen
c6b86421f1 drm/amdgpu: Actually check flags for all context ops.
Missing validation ...

Checked libdrm and it clears all the structs, so we should be
safe to just check everything.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:27:22 -04:00
Arnd Bergmann
020620424b drm/amd: Use a constant format string for amdgpu_ucode_request
Multiple files in amdgpu call amdgpu_ucode_request() with a fw_name
variable that the compiler cannot check for being a valid format string,
as seen by enabling the (default-disabled) -Wformat-security option:

drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c: In function 'amdgpu_mes_init_microcode':
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1517:61: error: format not a string literal and no format arguments [-Werror=format-security]
 1517 |         r = amdgpu_ucode_request(adev, &adev->mes.fw[pipe], fw_name);
      |                                                             ^~~~~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c: In function 'amdgpu_uvd_sw_init':
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:263:9: error: format not a string literal and no format arguments [-Werror=format-security]
  263 |         r = amdgpu_ucode_request(adev, &adev->uvd.fw, fw_name);
      |         ^
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c: In function 'amdgpu_vce_sw_init':
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:161:9: error: format not a string literal and no format arguments [-Werror=format-security]
  161 |         r = amdgpu_ucode_request(adev, &adev->vce.fw, fw_name);
      |         ^
drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c: In function 'amdgpu_umsch_mm_init_microcode':
drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c:590:9: error: format not a string literal and no format arguments [-Werror=format-security]
  590 |         r = amdgpu_ucode_request(adev, &adev->umsch_mm.fw, fw_name);
      |         ^
drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c: In function 'amdgpu_cgs_get_firmware_info':
drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c:417:72: error: format not a string literal and no format arguments [-Werror=format-security]
  417 |                         err = amdgpu_ucode_request(adev, &adev->pm.fw, fw_name);
      |                                                                        ^~~~~~~
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c: In function 'load_dmcu_fw':
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:2221:9: error: format not a string literal and no format arguments [-Werror=format-security]
 2221 |         r = amdgpu_ucode_request(adev, &adev->dm.fw_dmcu, fw_name_dmcu);
      |         ^
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c: In function 'dm_init_microcode':
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:5147:9: error: format not a string literal and no format arguments [-Werror=format-security]
 5147 |         r = amdgpu_ucode_request(adev, &adev->dm.dmub_fw, fw_name_dmub);
      |         ^

Change these all to use a "%s" format with the actual name as an argument,
to let the compiler prove this to be correct.

Fixes: e5a7d047f4 ("drm/amd: Use `amdgpu_ucode_*` helpers for CGS")
Fixes: 52215e2a5d ("drm/amd: Use `amdgpu_ucode_*` helpers for VCE")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:27:03 -04:00
Tobias Jakobi
8641b81739 drm/amd: Make amd_ip_funcs static for SDMA v5.2
The struct can be static, as it is only used in this
translation unit.

Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:26:59 -04:00
Tobias Jakobi
9a12b1c7a0 drm/amd: Make amd_ip_funcs static for SDMA v5.0
The struct can be static, as it is only used in this
translation unit.

Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:26:53 -04:00
WangYuli
0cee47cde4 drm/amd/amdgpu: Properly tune the size of struct
The struct assertion is failed because sparse cannot parse
`#pragma pack(push, 1)` and `#pragma pack(pop)` correctly.
GCC's output is still 1-byte-aligned. No harm to memory layout.

The error can be filtered out by sparse-diff, but sometimes
multiple lines queezed into one, making the sparse-diff thinks
its a new error. I'm trying to aviod this by fixing errors.

Link: https://lore.kernel.org/all/20230620045919.492128-1-suhui@nfschina.com/
Link: https://lore.kernel.org/all/93d10611-9fbb-4242-87b8-5860b2606042@suswa.mountain/
Fixes: 1721bc1b2a ("drm/amdgpu: Update VF2PF interface")
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: wenlunpeng <wenlunpeng@uniontech.com>
Reported-by: Su Hui <suhui@nfschina.com>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:25:49 -04:00
Alex Deucher
c6c2e8b6a4 drm/amdgpu/jpeg4: properly set atomics vmid field
This needs to be set as well if the IB uses atomics.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:25:45 -04:00
Alex Deucher
35c628774e drm/amdgpu/jpeg2: properly set atomics vmid field
This needs to be set as well if the IB uses atomics.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 10:25:16 -04:00
Thomas Zimmermann
7a26f18119 drm/amdgpu: Do not set struct drm_driver.lastclose
Remove the implementation of struct drm_driver.lastclose. The hook
was only necessary before in-kernel DRM clients existed, but is now
obsolete. The code in amdgpu_driver_lastclose_kms() is performed by
drm_lastclose().

v2:
- update commit message

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240812083000.337744-3-tzimmermann@suse.de
2024-08-13 16:21:08 +02:00
Jack Xiao
34e087e892 drm/amdgpu/mes: fix mes ring buffer overflow
wait memory room until enough before writing mes packets
to avoid ring buffer overflow.

v2: squash in sched_hw_submission fix

Fixes: de32462541 ("drm/amdgpu: cleanup MES11 command submission")
Fixes: fffe347e14 ("drm/amdgpu: cleanup MES12 command submission")
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13 09:57:52 -04:00
Al Viro
1da91ea87a introduce fd_file(), convert all accessors to it.
For any changes of struct fd representation we need to
turn existing accesses to fields into calls of wrappers.
Accesses to struct fd::flags are very few (3 in linux/file.h,
1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in
explicit initializers).
	Those can be dealt with in the commit converting to
new layout; accesses to struct fd::file are too many for that.
	This commit converts (almost) all of f.file to
fd_file(f).  It's not entirely mechanical ('file' is used as
a member name more than just in struct fd) and it does not
even attempt to distinguish the uses in pointer context from
those in boolean context; the latter will be eventually turned
into a separate helper (fd_empty()).

	NOTE: mass conversion to fd_empty(), tempting as it
might be, is a bad idea; better do that piecewise in commit
that convert from fdget...() to CLASS(...).

[conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c
caught by git; fs/stat.c one got caught by git grep]
[fs/xattr.c conflict]

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-08-12 22:00:43 -04:00
Daniel Vetter
4e996697a4 drm-misc-next for v6.12:
UAPI Changes:
 
 - remove Power Saving Policy property
 
 Core Changes:
 
 - update connector documentation
 
 CI:
 - add tests for mediatek, meson, rockchip
 
 Driver Changes:
 
 amdgpu:
 - revert support for Power Saving Policy property
 
 bridge:
 - lt9611uxc: require DRM_BRIDGE_ATTACH_NO_CONNECTOR
 
 mgag200:
 - transparently support BMC outputs
 
 omapdrm:
 - use common helper for_each_endpoint_of_node()
 
 panel:
 - panel-edp: fix name for HKC MB116AN01
 
 vkms:
 - clean up endianess warnings
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAma1wMsACgkQaA3BHVML
 eiM+/Af9Fuq5tLo9MlRMDS5kF41UJhEiqd805tlz0mofoIbsbynKIxbTD+sCrXxZ
 mzWX5lVO6tPPudtYbHX9tkD4eV8neiI9QTuTPiBgP4oRpPlJbnJzuG77/qG3gPIe
 L+ByiEx1e88A7BY0rY/xjBAKirYaQfVkGouD6m3NgZasiop0Bie3hTDtfPoRYh5u
 odsEHvrOf4tFOTjEZI9/KlEmW2lBziydTtm2yIiFyQIAy0O+FZIfDtMRjpo3ivKR
 wfPm15SleKGuLvDL8QsBNl+HshFywqn489yvbp0g56aRfIeQgNFF5g9Aj1+DXniE
 C/Dm7GDQhK98Wso3yECoK0/AuMHBmA==
 =JEjA
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2024-08-09' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for v6.12:

UAPI Changes:

- remove Power Saving Policy property

Core Changes:

- update connector documentation

CI:
- add tests for mediatek, meson, rockchip

Driver Changes:

amdgpu:
- revert support for Power Saving Policy property

bridge:
- lt9611uxc: require DRM_BRIDGE_ATTACH_NO_CONNECTOR

mgag200:
- transparently support BMC outputs

omapdrm:
- use common helper for_each_endpoint_of_node()

panel:
- panel-edp: fix name for HKC MB116AN01

vkms:
- clean up endianess warnings

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240809071241.GA222501@localhost.localdomain
2024-08-09 10:41:59 +02:00
Daniel Vetter
91dae758bd drm-misc-next for v6.12:
UAPI Changes:
 
 virtio:
 - Define DRM capset
 
 Cross-subsystem Changes:
 
 dma-buf:
 - heaps: Clean up documentation
 
 printk:
 - Pass description to kmsg_dump()
 
 Core Changes:
 
 CI:
 - Update IGT tests
 - Point upstream repo to GitLab instance
 
 modesetting:
 - Introduce Power Saving Policy property for connectors
 - Add might_fault() to drm_modeset_lock priming
 - Add dynamic per-crtc vblank configuration support
 
 panic:
 - Avoid build-time interference with framebuffer console
 
 docs:
 - Document Colorspace property
 
 scheduler:
 - Remove full_recover from drm_sched_start
 
 TTM:
 - Make LRU walk restartable after dropping locks
 - Allow direct reclaim to allocate local memory
 
 Driver Changes:
 
 amdgpu:
 - Support Power Saving Policy connector property
 
 ast:
 - astdp: Support AST2600 with VGA; Clean up HPD
 
 bridge:
 - Silence error message on -EPROBE_DEFER
 - analogix: Clean aup
 - bridge-connector: Fix double free
 - lt6505: Disable interrupt when powered off
 - tc358767: Make default DP port preemphasis configurable
 
 gma500:
 - Update i2c terminology
 
 ivpu:
 - Add MODULE_FIRMWARE()
 
 lcdif:
 - Fix pixel clock
 
 loongson:
 - Use GEM refcount over TTM's
 
 mgag200:
 - Improve BMC handling
 - Support VBLANK intterupts
 
 nouveau:
 - Refactor and clean up internals
 - Use GEM refcount over TTM's
 
 panel:
 - Shutdown fixes plus documentation
 - Refactor several drivers for better code sharing
 - boe-th101mb31ig002: Support for starry-er88577 MIPI-DSI panel plus
   DT; Fix porch parameter
 - edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1,
   BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2,
   CMN N116BCP-EA2, CSW MNB601LS1-4
 - himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT
 - ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT
 - jd9365da: Support Melfas lmfbx101117480 MIPI-DSI panel plus DT; Refactor
   for code sharing
 
 sti:
 - Fix module owner
 
 stm:
 - Avoid UAF wih managed plane and CRTC helpers
 - Fix module owner
 - Fix error handling in probe
 - Depend on COMMON_CLK
 - ltdc: Fix transparency after disabling plane; Remove unused interrupt
 
 tegra:
 - Call drm_atomic_helper_shutdown()
 
 v3d:
 - Clean up perfmon
 
 vkms:
 - Clean up
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmareygACgkQaA3BHVML
 eiO2vwf9FirbMiq4lfHzgcbNIU1dTUtjRAZjrlwGmqk5cb9lUshAMCMBMOEQBDdg
 XMQQj/RMBvRUuxzsPGk78ObSz5FBaBLgKwFprer0V6uslQaJxj4YRsnkp0l2n+0k
 +ebhfo2rUgZOdgNOkXH326w9UhqiydIa7GaA2aq1vUzXKFDfvGXtSN75BMlEWlKP
 rTft56AiwjwcKu7zYFHGlFUMSNpKAQy7lnV3+dBXAfFNHu4zVNoI/yWGEOdR7eVo
 WhiEcpvismsOh+BfUvMNPP3RKwjXHdwMlJYb+v9XGgH27hqc50lSceWydHtoJTto
 DTXF9WQhJ+/GQR9ZGmBjos9GVbECDA==
 =L/1W
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2024-08-01' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for v6.12:

UAPI Changes:

virtio:
- Define DRM capset

Cross-subsystem Changes:

dma-buf:
- heaps: Clean up documentation

printk:
- Pass description to kmsg_dump()

Core Changes:

CI:
- Update IGT tests
- Point upstream repo to GitLab instance

modesetting:
- Introduce Power Saving Policy property for connectors
- Add might_fault() to drm_modeset_lock priming
- Add dynamic per-crtc vblank configuration support

panic:
- Avoid build-time interference with framebuffer console

docs:
- Document Colorspace property

scheduler:
- Remove full_recover from drm_sched_start

TTM:
- Make LRU walk restartable after dropping locks
- Allow direct reclaim to allocate local memory

Driver Changes:

amdgpu:
- Support Power Saving Policy connector property

ast:
- astdp: Support AST2600 with VGA; Clean up HPD

bridge:
- Silence error message on -EPROBE_DEFER
- analogix: Clean aup
- bridge-connector: Fix double free
- lt6505: Disable interrupt when powered off
- tc358767: Make default DP port preemphasis configurable

gma500:
- Update i2c terminology

ivpu:
- Add MODULE_FIRMWARE()

lcdif:
- Fix pixel clock

loongson:
- Use GEM refcount over TTM's

mgag200:
- Improve BMC handling
- Support VBLANK intterupts

nouveau:
- Refactor and clean up internals
- Use GEM refcount over TTM's

panel:
- Shutdown fixes plus documentation
- Refactor several drivers for better code sharing
- boe-th101mb31ig002: Support for starry-er88577 MIPI-DSI panel plus
  DT; Fix porch parameter
- edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1,
  BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2,
  CMN N116BCP-EA2, CSW MNB601LS1-4
- himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT
- ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT
- jd9365da: Support Melfas lmfbx101117480 MIPI-DSI panel plus DT; Refactor
  for code sharing

sti:
- Fix module owner

stm:
- Avoid UAF wih managed plane and CRTC helpers
- Fix module owner
- Fix error handling in probe
- Depend on COMMON_CLK
- ltdc: Fix transparency after disabling plane; Remove unused interrupt

tegra:
- Call drm_atomic_helper_shutdown()

v3d:
- Clean up perfmon

vkms:
- Clean up

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240801121406.GA102996@linux.fritz.box
2024-08-08 18:58:46 +02:00
Arunpravin Paneer Selvam
6ad9dafba1 drm/amdgpu: Add DCC GFX12 flag to enable address alignment
We require this flag AMDGPU_GEM_CREATE_GFX12_DCC or any other
kernel level GFX12 DCC flag to differentiate the DCC buffers and other
pinned display buffers(which has TTM_PL_FLAG_CONTIGUOUS enabled).

If we use the TTM_PL_FLAG_CONTIGUOUS flag for DCC buffers, we may over
allocate for all the pinned display buffers unnecessarily that leads to
memory allocation failure.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 46142cc1b9)
2024-08-07 18:23:59 -04:00
Frank Min
7fc5f252c0 drm/amdgpu: correct sdma7 max dw
correct sdma7 max dw into 8

Signed-off-by: Frank Min <Frank.Min@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 86598c3819)
2024-08-07 18:23:49 -04:00
Arunpravin Paneer Selvam
4a5ad08f53 drm/amdgpu: Add address alignment support to DCC buffers
Add address alignment support to the DCC VRAM buffers.

v2:
  - adjust size based on the max_texture_channel_caches values
    only for GFX12 DCC buffers.
  - used AMDGPU_GEM_CREATE_GFX12_DCC flag to apply change only
    for DCC buffers.
  - roundup non power of two DCC buffer adjusted size to nearest
    power of two number as the buddy allocator does not support non
    power of two alignments. This applies only to the contiguous
    DCC buffers.

v3:(Alex)
  - rewrite the max texture channel caches comparison code in an
    algorithmic way to determine the alignment size.

v4:(Alex)
  - Move the logic from amdgpu_vram_mgr_dcc_alignment() to gmc_v12_0.c
    and add a new gmc func callback for dcc alignment. If the callback
    is non-NULL, call it to get the alignment, otherwise, use the default.

v5:(Alex)
  - Set the Alignment to a default value if the callback doesn't exist.
  - Add the callback to amdgpu_gmc_funcs.

v6:
  - Fix checkpatch warning reported by Intel CI.

v7:(Christian)
  - remove the AMDGPU_GEM_CREATE_GFX12_DCC flag and keep a flag that
    checks the BO pinning and for a specific hw generation.

v8:(Christian)
  - move this check into gmc_v12_0_get_dcc_alignment.

v9:
  - Fix 32bit build errors

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit aa94b623cb)
2024-08-07 18:23:42 -04:00
Frank Min
5d687a67fd drm/amdgpu: change non-dcc buffer copy configuration
Without setting cpv bit and 7th ib dw, non-dcc buffer copy will have
random corruption

So set the cpv bit and clear the 7th ib dw for copy non-dcc buffers

Signed-off-by: Frank Min <Frank.Min@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5aacf8917f)
2024-08-07 18:21:05 -04:00
Joshua Ashton
829798c789 drm/amdgpu: Forward soft recovery errors to userspace
As we discussed before[1], soft recovery should be
forwarded to userspace, or we can get into a really
bad state where apps will keep submitting hanging
command buffers cascading us to a hard reset.

1: https://lore.kernel.org/all/bf23d5ed-9a6b-43e7-84ee-8cbfd0d60f18@froggi.es/
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 434967aadb)
Cc: stable@vger.kernel.org
2024-08-07 18:20:07 -04:00
Likun Gao
8ff3bb44cc drm/amdgpu: add golden setting for gc v12
Adding Manual GDB golden setting for gc v12
revision 0 ASIC.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c9875d0a78)
2024-08-07 18:19:38 -04:00
Likun Gao
aa5c9701eb drm/amdgpu: force to use legacy inv in mmhub
MMHUB v4.1.0 only support fixed cache mode, so
only use legacy invalidation accordingly.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9192c7613c)
2024-08-07 18:16:38 -04:00
Arunpravin Paneer Selvam
46142cc1b9 drm/amdgpu: Add DCC GFX12 flag to enable address alignment
We require this flag AMDGPU_GEM_CREATE_GFX12_DCC or any other
kernel level GFX12 DCC flag to differentiate the DCC buffers and other
pinned display buffers(which has TTM_PL_FLAG_CONTIGUOUS enabled).

If we use the TTM_PL_FLAG_CONTIGUOUS flag for DCC buffers, we may over
allocate for all the pinned display buffers unnecessarily that leads to
memory allocation failure.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 11:11:03 -04:00
Tim Huang
92549780e3 drm/amdgpu: fix unchecked return value warning for amdgpu_atombios
This resolves the unchecded return value warning reported by Coverity.

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 11:11:03 -04:00
Tim Huang
c0277b9d7c drm/amdgpu: fix unchecked return value warning for amdgpu_gfx
This resolves the unchecded return value warning reported by Coverity.

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 11:11:03 -04:00
Frank Min
86598c3819 drm/amdgpu: correct sdma7 max dw
correct sdma7 max dw into 8

Signed-off-by: Frank Min <Frank.Min@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 11:11:03 -04:00
Arunpravin Paneer Selvam
aa94b623cb drm/amdgpu: Add address alignment support to DCC buffers
Add address alignment support to the DCC VRAM buffers.

v2:
  - adjust size based on the max_texture_channel_caches values
    only for GFX12 DCC buffers.
  - used AMDGPU_GEM_CREATE_GFX12_DCC flag to apply change only
    for DCC buffers.
  - roundup non power of two DCC buffer adjusted size to nearest
    power of two number as the buddy allocator does not support non
    power of two alignments. This applies only to the contiguous
    DCC buffers.

v3:(Alex)
  - rewrite the max texture channel caches comparison code in an
    algorithmic way to determine the alignment size.

v4:(Alex)
  - Move the logic from amdgpu_vram_mgr_dcc_alignment() to gmc_v12_0.c
    and add a new gmc func callback for dcc alignment. If the callback
    is non-NULL, call it to get the alignment, otherwise, use the default.

v5:(Alex)
  - Set the Alignment to a default value if the callback doesn't exist.
  - Add the callback to amdgpu_gmc_funcs.

v6:
  - Fix checkpatch warning reported by Intel CI.

v7:(Christian)
  - remove the AMDGPU_GEM_CREATE_GFX12_DCC flag and keep a flag that
    checks the BO pinning and for a specific hw generation.

v8:(Christian)
  - move this check into gmc_v12_0_get_dcc_alignment.

v9:
  - Fix 32bit build errors

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 11:11:03 -04:00
Frank Min
5aacf8917f drm/amdgpu: change non-dcc buffer copy configuration
Without setting cpv bit and 7th ib dw, non-dcc buffer copy will have
random corruption

So set the cpv bit and clear the 7th ib dw for copy non-dcc buffers

Signed-off-by: Frank Min <Frank.Min@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 11:11:02 -04:00
Tao Zhou
b3a3c9a6b2 drm/amdgpu: report bad status in GPU recovery
Instead of printing GPU reset failed.

v2: add check for reset_context->src.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 11:11:02 -04:00
Tao Zhou
dd3e296289 drm/amdgpu: update bad state check in GPU recovery
Return RMA status without message print.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 11:11:02 -04:00
Joshua Ashton
434967aadb drm/amdgpu: Forward soft recovery errors to userspace
As we discussed before[1], soft recovery should be
forwarded to userspace, or we can get into a really
bad state where apps will keep submitting hanging
command buffers cascading us to a hard reset.

1: https://lore.kernel.org/all/bf23d5ed-9a6b-43e7-84ee-8cbfd0d60f18@froggi.es/
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 11:11:01 -04:00
Yang Wang
671af06690 drm/amdgpu: remove RAS unused paramter 'err_addr'
- amdgpu_ras_error_statistic_ue_count()
- amdgpu_ras_error_statistic_ce_count()
- amdgpu_ras_error_statistic_de_count()

The parameter 'err_addr' is no longer used since following patch.

Fixes: a7e8467fbe ("drm/amdgpu: Remove unused code")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 11:11:01 -04:00
Likun Gao
c9875d0a78 drm/amdgpu: add golden setting for gc v12
Adding Manual GDB golden setting for gc v12
revision 0 ASIC.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 11:11:01 -04:00
Tao Zhou
792be2e23a drm/amdgpu: create function to check RAS RMA status
In the convenience of calling it globally.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 11:11:01 -04:00
Sunil Khatri
62341f7bc2 drm/amdgpu: optimize the padding for gfx_v9_4_3
Adding NOP packets one by one in the ring
does not use the CP efficiently.

Solution:
Use CP optimization while adding NOP packet's so PFP
can discard NOP packets based on information of count
from the Header instead of fetching all NOP packets
one by one.

Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Tvrtko Ursulin <tursulin@igalia.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 11:11:00 -04:00
Sunil Khatri
dee44a7cb5 drm/amdgpu: optimize the padding for gfx9
Adding NOP packets one by one in the ring
does not use the CP efficiently.

Solution:
Use CP optimization while adding NOP packet's so PFP
can discard NOP packets based on information of count
from the Header instead of fetching all NOP packets
one by one.

Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Tvrtko Ursulin <tursulin@igalia.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 11:11:00 -04:00
Tvrtko Ursulin
bb670c31e1 drm/amdpgu: Micro-optimise amdgpu_ring_commit
For some value of optimisation we can replace the division with an
bitwise and. And it even shrinks the code. Before:

     6c9:       53                      push   %rbx
     6ca:       4c 8b 47 08             mov    0x8(%rdi),%r8
     6ce:       31 d2                   xor    %edx,%edx
     6d0:       48 89 fb                mov    %rdi,%rbx
     6d3:       8b 87 c8 05 00 00       mov    0x5c8(%rdi),%eax
     6d9:       41 8b 48 04             mov    0x4(%r8),%ecx
     6dd:       f7 d0                   not    %eax
     6df:       21 c8                   and    %ecx,%eax
     6e1:       83 c1 01                add    $0x1,%ecx
     6e4:       83 c0 01                add    $0x1,%eax
     6e7:       f7 f1                   div    %ecx
     6e9:       89 d6                   mov    %edx,%esi
     6eb:       41 ff 90 88 00 00 00    call   *0x88(%r8)

After:

     6c9:       53                      push   %rbx
     6ca:       48 8b 57 08             mov    0x8(%rdi),%rdx
     6ce:       48 89 fb                mov    %rdi,%rbx
     6d1:       8b 87 c8 05 00 00       mov    0x5c8(%rdi),%eax
     6d7:       8b 72 04                mov    0x4(%rdx),%esi
     6da:       f7 d0                   not    %eax
     6dc:       21 f0                   and    %esi,%eax
     6de:       83 c0 01                add    $0x1,%eax
     6e1:       21 c6                   and    %eax,%esi
     6e3:       ff 92 88 00 00 00       call   *0x88(%rdx)

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 11:10:40 -04:00
Hawking Zhang
dfe9d047b1 drm/amdgpu: Add more types for boot time error reporting
Data abort exception and unknown errors are supported.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 11:10:17 -04:00
Likun Gao
9192c7613c drm/amdgpu: force to use legacy inv in mmhub
MMHUB v4.1.0 only support fixed cache mode, so
only use legacy invalidation accordingly.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 11:10:08 -04:00
Sunil Khatri
f59902ffcc drm/amdgpu: optimize the padding for gfx12
Adding NOP packets one by one in the ring
does not use the CP efficiently.

Solution:
Use CP optimization while adding NOP packet's so PFP
can discard NOP packets based on information of count
from the Header instead of fetching all NOP packets
one by one.

Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Tvrtko Ursulin <tursulin@igalia.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 10:44:37 -04:00
Yifan Zhang
62eefd10ac drm/amdgpu: use CPU for page table update if SDMA is unavailable
avoid using SDMA if it is unavailable.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 10:43:57 -04:00
Sunil Khatri
847e387e00 drm/amdgpu: optimize the padding for gfx11
Adding NOP packets one by one in the ring
does not use the CP efficiently.

Solution:
Use CP optimization while adding NOP packet's so PFP
can discard NOP packets based on information of count
from the Header instead of fetching all NOP packets
one by one.

Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Tvrtko Ursulin <tursulin@igalia.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 10:43:49 -04:00
Sunil Khatri
67c4ca9f79 drm/amdgpu: do not call insert_nop fn for zero count
Do not make a function call for zero size NOP as it
does not add anything in the ring and is unnecessary
function call.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 10:43:28 -04:00
Jonathan Kim
ee0a469cf9 drm/amdkfd: support per-queue reset on gfx9
Support per-queue reset for GFX9.  The recommendation is for the driver
to target reset the HW queue via a SPI MMIO register write.

Since this requires pipe and HW queue info and MEC FW is limited to
doorbell reports of hung queues after an unmap failure, scan the HW
queue slots defined by SET_RESOURCES first to identify the user queue
candidates to reset.

Only signal reset events to processes that have had a queue reset.

If queue reset fails, fall back to GPU reset.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 10:43:18 -04:00
Sunil Khatri
e89d2fec4c drm/amdgpu: optimize the padding for gfx10
Adding NOP packets one by one in the ring
does not use the CP efficiently.

Solution:
Use CP optimization while adding NOP packet's so PFP
can discard NOP packets based on information of count
from the Header instead of fetching all NOP packets
one by one.

Cc: Christian König <christian.koenig@amd.com>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Tvrtko Ursulin <tursulin@igalia.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 10:43:01 -04:00
Sunil Khatri
836af5be1b drm/amdgpu: Clean up the register dump via debugfs list
debugfs register list for dump is cleaned as it have
some issues related to proper power state of the IP
before register read.

Since the above mentioned is removed we no longer want
this to be dumped part of the devcoredump and hence
removed.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 10:42:22 -04:00
Sunil Khatri
17277da266 drm/amdgpu: Remove debugfs amdgpu_reset_dump_register_list
There are some problem with existing amdgpu_reset_dump_register_list
debugfs node. It is supposed to read a list of registers but there
could be cases when the IP is not in correct power state. Register
read in such cases could lead to more problems.

We are taking care of all such power states in devcoredump and
dumping the registers of need for debugging. So cleaning this code
and we dont need this functionality via debugfs anymore.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06 10:42:09 -04:00
Hamza Mahfooz
717b432b6d
Revert "drm/amd: Add power_saving_policy drm property to eDP connectors"
This reverts commit 9d8c094dda.

It was merged without meeting userspace requirements.

Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240802145946.48073-2-hamza.mahfooz@amd.com
2024-08-02 11:29:17 -04:00
Thomas Zimmermann
0e8655b4e8 Merge drm/drm-next into drm-misc-next
Backmerging to get a late RC of v6.10 before moving into v6.11.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2024-07-29 09:35:54 +02:00
Michael Chen
9038e25c80 drm/amdgpu: increase mes log buffer size for gfx12
MES firmware requires larger log buffer for gfx12. Allocate
proper buffer respectively for gfx11 and gfx12.

Signed-off-by: Michael Chen <michael.chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 739d0f3e1f)
2024-07-27 18:10:12 -04:00
Christian König
f3572db3c0 drm/amdgpu: fix contiguous handling for IB parsing v2
Otherwise we won't get correct access to the IB.

v2: keep setting AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS to avoid problems in
    the VRAM backend.

Signed-off-by: Christian König <christian.koenig@amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3501
Fixes: e362b7c8f8 ("drm/amdgpu: Modify the contiguous flags behaviour")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Tested-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit fbfb5f0342)
2024-07-27 18:09:38 -04:00
Thomas Weißschuh
aeb81b62c7 drm/amdgpu: convert bios_hardcoded_edid to drm_edid
Instead of manually passing around 'struct edid *' and its size,
use 'struct drm_edid', which encapsulates a validated combination of
both.

As the drm_edid_ can handle NULL gracefully, the explicit checks can be
dropped.

Also save a few characters by transforming '&array[0]' to the equivalent
'array' and using 'max_t(int, ...)' instead of manual casts.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-27 17:35:05 -04:00
Sunil Khatri
13d8850a33 drm/amdgpu: trigger ip dump before suspend of IP's
Problem:
IP dump right now is done post suspend of all
IP's which for some IP's could change power
state and software state too which we do not want
to reflect in the dump as it might not be same at
the time of hang.

Solution:
IP should be dumped as close to the HW state when
the GPU was in hung state without trying to reinitialize
any resource.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-27 17:34:26 -04:00
Michael Chen
739d0f3e1f drm/amdgpu: increase mes log buffer size for gfx12
MES firmware requires larger log buffer for gfx12. Allocate
proper buffer respectively for gfx11 and gfx12.

Signed-off-by: Michael Chen <michael.chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-27 17:32:05 -04:00
Sunil Khatri
076362d931 drm/amdgpu: print VCN instance dump for valid instance
VCN dump is dependent on power state of the ip. Dump is
valid if VCN was powered up at the time of ip dump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-27 17:31:10 -04:00
Venkata Narendra Kumar Gutta
25dd25f86e drm/amdgpu: Add MFD support for ISP I2C bus
ISP I2C bus device can't be enumerated via ACPI mechanism
since it shares the memory map with the AMDGPU.
So use the MFD mechanism for registering the ISP I2C device
and add the required resources.

Signed-off-by: Venkata Narendra Kumar Gutta <vengutta@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-27 17:28:58 -04:00
Christian König
fbfb5f0342 drm/amdgpu: fix contiguous handling for IB parsing v2
Otherwise we won't get correct access to the IB.

v2: keep setting AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS to avoid problems in
    the VRAM backend.

Signed-off-by: Christian König <christian.koenig@amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3501
Fixes: e362b7c8f8 ("drm/amdgpu: Modify the contiguous flags behaviour")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Tested-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-27 17:28:47 -04:00
Sunil Khatri
cd162ae9bc drm/amdgpu: add print support for vcn_v3_0 ip dump
Add support for logging the registers in devcoredump
buffer for vcn_v3_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-27 17:28:41 -04:00
Sunil Khatri
58d283801d drm/amdgpu: add vcn_v3_0 ip dump support
Add support of vcn ip dump in the devcoredump
for vcn_v3_0.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-27 17:28:35 -04:00
Sunil Khatri
50d10d9271 drm/amdgpu: add macro to calculate offset with instance
Add macro definition which calculate offset of the
register with index override.

This is useful in case when there is an array of
registers which is common for all instances.
To read registers in that case it is easy to define
registers once and the index value is manually passed
to calculate proper offset of register for each instance.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-27 17:28:28 -04:00
Sunil Khatri
f3392e662e drm/amdgpu: add vcn ip dump ptr in vcn global struct
Add pointer to the vcn ip dump in the vcn global structure
to be accessible for all vcn version via global adev.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-27 17:28:17 -04:00
Alex Deucher
8155566a26 drm/amdgpu: properly handle vbios fake edid sizing
The comment in the vbios structure says:
// = 128 means EDID length is 128 bytes, otherwise the EDID length = ucFakeEDIDLength*128

This fake edid struct has not been used in a long time, so I'm
not sure if there were actually any boards out there with a non-128 byte
EDID, but align the code with the comment.

Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Reported-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lists.freedesktop.org/archives/amd-gfx/2024-June/109964.html
Fixes: d38ceaf99e ("drm/amdgpu: add core driver (v4)")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-25 17:41:46 -04:00
Christian König
83b501c179 drm/scheduler: remove full_recover from drm_sched_start
This was basically just another one of amdgpus hacks. The parameter
allowed to restart the scheduler without turning fence signaling on
again.

That this is absolutely not a good idea should be obvious by now since
the fences will then just sit there and never signal.

While at it cleanup the code a bit.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240722083816.99685-1-christian.koenig@amd.com
2024-07-25 14:05:12 +02:00
ZhenGuo Yin
5659b0c93a drm/amdgpu: reset vm state machine after gpu reset(vram lost)
[Why]
Page table of compute VM in the VRAM will lost after gpu reset.
VRAM won't be restored since compute VM has no shadows.

[How]
Use higher 32-bit of vm->generation to record a vram_lost_counter.
Reset the VM state machine when vm->genertaion is not equal to
the new generation token.

v2: Check vm->generation instead of calling drm_sched_entity_error
in amdgpu_vm_validate.
v3: Use new generation token instead of vram_lost_counter for check.

Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
(cherry picked from commit 47c0388b05)
2024-07-24 17:30:49 -04:00
Tim Huang
fab1ead0ae drm/amdgpu: add missed harvest check for VCN IP v4/v5
To prevent below probe failure, add a check for models with VCN
IP v4.0.6 where VCN1 may be harvested.

v2:
Apply the same check to VCN IP v4.0 and v5.0.

[   54.070117] RIP: 0010:vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu]
[   54.071055] Code: 80 fb ff 8d 82 00 80 fe ff 81 fe 00 06 00 00 0f 43
c2 49 69 d5 38 0d 00 00 48 8d 71 04 c1 e8 02 4c 01 f2 48 89 b2 50 f6 02
00 <89> 01 48 8b 82 50 f6 02 00 48 8d 48 04 48 89 8a 50 f6 02 00 c7 00
[   54.072408] RSP: 0018:ffffb17985f736f8 EFLAGS: 00010286
[   54.072793] RAX: 00000000000000d6 RBX: ffff99a82f680000 RCX:
0000000000000000
[   54.073315] RDX: ffff99a82f680000 RSI: 0000000000000004 RDI:
ffff99a82f680000
[   54.073835] RBP: ffffb17985f73730 R08: 0000000000000001 R09:
0000000000000000
[   54.074353] R10: 0000000000000008 R11: ffffb17983c05000 R12:
0000000000000000
[   54.074879] R13: 0000000000000000 R14: ffff99a82f680000 R15:
0000000000000001
[   54.075400] FS:  00007f8d9c79a000(0000) GS:ffff99ab2f140000(0000)
knlGS:0000000000000000
[   54.075988] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   54.076408] CR2: 0000000000000000 CR3: 0000000140c3a000 CR4:
0000000000750ef0
[   54.076927] PKRU: 55555554
[   54.077132] Call Trace:
[   54.077319]  <TASK>
[   54.077484]  ? show_regs+0x69/0x80
[   54.077747]  ? __die+0x28/0x70
[   54.077979]  ? page_fault_oops+0x180/0x4b0
[   54.078286]  ? do_user_addr_fault+0x2d2/0x680
[   54.078610]  ? exc_page_fault+0x84/0x190
[   54.078910]  ? asm_exc_page_fault+0x2b/0x30
[   54.079224]  ? vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu]
[   54.079941]  ? vcn_v4_0_5_start_dpg_mode+0xe6/0x36b0 [amdgpu]
[   54.080617]  vcn_v4_0_5_set_powergating_state+0x82/0x19b0 [amdgpu]
[   54.081316]  amdgpu_device_ip_set_powergating_state+0x64/0xc0
[amdgpu]
[   54.082057]  amdgpu_vcn_ring_begin_use+0x6f/0x1d0 [amdgpu]
[   54.082727]  amdgpu_ring_alloc+0x44/0x70 [amdgpu]
[   54.083351]  amdgpu_vcn_dec_sw_ring_test_ring+0x40/0x110 [amdgpu]
[   54.084054]  amdgpu_ring_test_helper+0x22/0x90 [amdgpu]
[   54.084698]  vcn_v4_0_5_hw_init+0x87/0xc0 [amdgpu]
[   54.085307]  amdgpu_device_init+0x1f96/0x2780 [amdgpu]
[   54.085951]  amdgpu_driver_load_kms+0x1e/0xc0 [amdgpu]
[   54.086591]  amdgpu_pci_probe+0x19f/0x550 [amdgpu]
[   54.087215]  local_pci_probe+0x48/0xa0
[   54.087509]  pci_device_probe+0xc9/0x250
[   54.087812]  really_probe+0x1a4/0x3f0
[   54.088101]  __driver_probe_device+0x7d/0x170
[   54.088443]  driver_probe_device+0x24/0xa0
[   54.088765]  __driver_attach+0xdd/0x1d0
[   54.089068]  ? __pfx___driver_attach+0x10/0x10
[   54.089417]  bus_for_each_dev+0x8e/0xe0
[   54.089718]  driver_attach+0x22/0x30
[   54.090000]  bus_add_driver+0x120/0x220
[   54.090303]  driver_register+0x62/0x120
[   54.090606]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[   54.091255]  __pci_register_driver+0x62/0x70
[   54.091593]  amdgpu_init+0x67/0xff0 [amdgpu]
[   54.092190]  do_one_initcall+0x5f/0x330
[   54.092495]  do_init_module+0x68/0x240
[   54.092794]  load_module+0x201c/0x2110
[   54.093093]  init_module_from_file+0x97/0xd0
[   54.093428]  ? init_module_from_file+0x97/0xd0
[   54.093777]  idempotent_init_module+0x11c/0x2a0
[   54.094134]  __x64_sys_finit_module+0x64/0xc0
[   54.094476]  do_syscall_64+0x58/0x120
[   54.094767]  entry_SYSCALL_64_after_hwframe+0x6e/0x76

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
(cherry picked from commit 0b071245dd)
2024-07-24 17:30:23 -04:00
Stanley.Yang
1a8825259a drm/amdgpu: Fix eeprom max record count
The eeprom table is empty before initializing,
set eeprom table version first before initializing.

Changed from V1:
	Reuse amdgpu_ras_set_eeprom_table_version function

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 015b8a2fdf)
2024-07-24 17:30:23 -04:00
YiPeng Chai
afac8c6554 drm/amdgpu: fix ras UE error injection failure issue
The ras command shared memory is allocated from
VRAM and the response status of the command
buffer will not be zero due to gpu being in
fatal error state after ras UE error injection.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8284951a6e)
2024-07-24 17:30:23 -04:00
Jane Jian
6728f55590 drm/amdgpu/vcn: Use offsets local to VCN/JPEG in VF
For VCN/JPEG 4.0.3, use only the local addressing scheme.

- Mask bit higher than AID0 range

v2
remain the case for mmhub use master XCC

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit caaf576292)
2024-07-24 17:30:23 -04:00
Lijo Lazar
485432d090 drm/amdgpu: Add empty HDP flush function to VCN v4.0.3
VCN 4.0.3 does not HDP flush with RRMT enabled. Instead, mmsch
will do the HDP flush.

This change is necessary for VCN v4.0.3, no need for backward compatibility

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 49cfaebe48)
2024-07-24 17:30:23 -04:00
Lijo Lazar
23df34997d drm/amdgpu: Add empty HDP flush function to JPEG v4.0.3
JPEG v4.0.3 doesn't support HDP flush when RRMT is enabled. Instead,
mmsch fw will do the flush.

This change is necessary for JPEG v4.0.3, no need for backward compatibility

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 585e3fdb36)
2024-07-24 17:30:23 -04:00
Ma Ke
df65aabef3 drm/amd/amdgpu: Fix uninitialized variable warnings
Return 0 to avoid returning an uninitialized variable r.

Cc: stable@vger.kernel.org
Fixes: 230dd6bb61 ("drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6472de66c0)
2024-07-24 17:30:23 -04:00
David Belanger
73048bda46 drm/amdgpu: Fix atomics on GFX12
If PCIe supports atomics, configure register to prevent DF from
breaking atomics in separate load/store operations.

Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 666f14cab2)
2024-07-24 17:30:23 -04:00
Alex Deucher
a03ebf1163 drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell
We seem to have a case where SDMA will sometimes miss a doorbell
if GFX is entering the powergating state when the doorbell comes in.
To workaround this, we can update the wptr via MMIO, however,
this is only safe because we disallow gfxoff in begin_ring() for
SDMA 5.2 and then allow it again in end_ring().

Enable this workaround while we are root causing the issue with
the HW team.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/3440
Tested-by: Friedrich Vock <friedrich.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
(cherry picked from commit f2ac526349)
2024-07-24 17:29:57 -04:00
ZhenGuo Yin
47c0388b05 drm/amdgpu: reset vm state machine after gpu reset(vram lost)
[Why]
Page table of compute VM in the VRAM will lost after gpu reset.
VRAM won't be restored since compute VM has no shadows.

[How]
Use higher 32-bit of vm->generation to record a vram_lost_counter.
Reset the VM state machine when vm->genertaion is not equal to
the new generation token.

v2: Check vm->generation instead of calling drm_sched_entity_error
in amdgpu_vm_validate.
v3: Use new generation token instead of vram_lost_counter for check.

Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-24 14:45:24 -04:00
Tim Huang
0b071245dd drm/amdgpu: add missed harvest check for VCN IP v4/v5
To prevent below probe failure, add a check for models with VCN
IP v4.0.6 where VCN1 may be harvested.

v2:
Apply the same check to VCN IP v4.0 and v5.0.

[   54.070117] RIP: 0010:vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu]
[   54.071055] Code: 80 fb ff 8d 82 00 80 fe ff 81 fe 00 06 00 00 0f 43
c2 49 69 d5 38 0d 00 00 48 8d 71 04 c1 e8 02 4c 01 f2 48 89 b2 50 f6 02
00 <89> 01 48 8b 82 50 f6 02 00 48 8d 48 04 48 89 8a 50 f6 02 00 c7 00
[   54.072408] RSP: 0018:ffffb17985f736f8 EFLAGS: 00010286
[   54.072793] RAX: 00000000000000d6 RBX: ffff99a82f680000 RCX:
0000000000000000
[   54.073315] RDX: ffff99a82f680000 RSI: 0000000000000004 RDI:
ffff99a82f680000
[   54.073835] RBP: ffffb17985f73730 R08: 0000000000000001 R09:
0000000000000000
[   54.074353] R10: 0000000000000008 R11: ffffb17983c05000 R12:
0000000000000000
[   54.074879] R13: 0000000000000000 R14: ffff99a82f680000 R15:
0000000000000001
[   54.075400] FS:  00007f8d9c79a000(0000) GS:ffff99ab2f140000(0000)
knlGS:0000000000000000
[   54.075988] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   54.076408] CR2: 0000000000000000 CR3: 0000000140c3a000 CR4:
0000000000750ef0
[   54.076927] PKRU: 55555554
[   54.077132] Call Trace:
[   54.077319]  <TASK>
[   54.077484]  ? show_regs+0x69/0x80
[   54.077747]  ? __die+0x28/0x70
[   54.077979]  ? page_fault_oops+0x180/0x4b0
[   54.078286]  ? do_user_addr_fault+0x2d2/0x680
[   54.078610]  ? exc_page_fault+0x84/0x190
[   54.078910]  ? asm_exc_page_fault+0x2b/0x30
[   54.079224]  ? vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu]
[   54.079941]  ? vcn_v4_0_5_start_dpg_mode+0xe6/0x36b0 [amdgpu]
[   54.080617]  vcn_v4_0_5_set_powergating_state+0x82/0x19b0 [amdgpu]
[   54.081316]  amdgpu_device_ip_set_powergating_state+0x64/0xc0
[amdgpu]
[   54.082057]  amdgpu_vcn_ring_begin_use+0x6f/0x1d0 [amdgpu]
[   54.082727]  amdgpu_ring_alloc+0x44/0x70 [amdgpu]
[   54.083351]  amdgpu_vcn_dec_sw_ring_test_ring+0x40/0x110 [amdgpu]
[   54.084054]  amdgpu_ring_test_helper+0x22/0x90 [amdgpu]
[   54.084698]  vcn_v4_0_5_hw_init+0x87/0xc0 [amdgpu]
[   54.085307]  amdgpu_device_init+0x1f96/0x2780 [amdgpu]
[   54.085951]  amdgpu_driver_load_kms+0x1e/0xc0 [amdgpu]
[   54.086591]  amdgpu_pci_probe+0x19f/0x550 [amdgpu]
[   54.087215]  local_pci_probe+0x48/0xa0
[   54.087509]  pci_device_probe+0xc9/0x250
[   54.087812]  really_probe+0x1a4/0x3f0
[   54.088101]  __driver_probe_device+0x7d/0x170
[   54.088443]  driver_probe_device+0x24/0xa0
[   54.088765]  __driver_attach+0xdd/0x1d0
[   54.089068]  ? __pfx___driver_attach+0x10/0x10
[   54.089417]  bus_for_each_dev+0x8e/0xe0
[   54.089718]  driver_attach+0x22/0x30
[   54.090000]  bus_add_driver+0x120/0x220
[   54.090303]  driver_register+0x62/0x120
[   54.090606]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[   54.091255]  __pci_register_driver+0x62/0x70
[   54.091593]  amdgpu_init+0x67/0xff0 [amdgpu]
[   54.092190]  do_one_initcall+0x5f/0x330
[   54.092495]  do_init_module+0x68/0x240
[   54.092794]  load_module+0x201c/0x2110
[   54.093093]  init_module_from_file+0x97/0xd0
[   54.093428]  ? init_module_from_file+0x97/0xd0
[   54.093777]  idempotent_init_module+0x11c/0x2a0
[   54.094134]  __x64_sys_finit_module+0x64/0xc0
[   54.094476]  do_syscall_64+0x58/0x120
[   54.094767]  entry_SYSCALL_64_after_hwframe+0x6e/0x76

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-24 14:44:05 -04:00
Yifan Zhang
3b37e2725a drm/amdgpu: skip kfd init if GFX is not ready.
avoid kfd init crash in that case.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-24 14:43:58 -04:00
Alex Deucher
bd4bea5ab2 drm/amdgpu/gfx9.4.3: Enable bad opcode interrupt
For the bad opcode case, it will cause CP/ME hang.
The firmware will prevent the ME side from hanging by raising a bad opcode interrupt.
And the driver needs to perform a vmid reset when receiving the interrupt.

Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:45:46 -04:00
Alex Deucher
238352b494 drm/amdgpu/gfx9: Enable bad opcode interrupt
For the bad opcode case, it will cause CP/ME hang.
The firmware will prevent the ME side from hanging by raising a bad opcode interrupt.
And the driver needs to perform a vmid reset when receiving the interrupt.

Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:45:46 -04:00
Jesse Zhang
5ebca62eb8 drm/amdgpu/gfx12: Enable bad opcode interrupt
For the bad opcode case, it will cause CP/ME hang.
The firmware will prevent the ME side from hanging by raising a bad opcode interrupt.
And the driver needs to perform a vmid reset when receiving the interrupt.

v2: update irq naming (drop priv) (Alex)

Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:45:46 -04:00
Jesse Zhang
bc6c2a6f64 drm/amdgpu/gfx10: Enable bad opcode interrupt
For the bad opcode case, it will cause CP/ME hang.
The firmware will prevent the ME side from hanging by raising a bad opcode interrupt.
And the driver needs to perform a vmid reset when receiving the interrupt.

v2: update irq naming (drop priv) (Alex)

Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:45:45 -04:00
Jesse Zhang
a790902237 drm/amdgpu/gfx11: Enable bad opcode interrupt
For the bad opcode case, it will cause CP/ME hang.
The firmware will prevent the ME side from hanging by raising a bad opcode interrupt.
And the driver needs to perform a vmid reset when receiving the interrupt.

v2: update irq naming (drop priv) (Alex)

Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:45:45 -04:00
Alex Deucher
acddd5cf70 drm/amdgpu/gfx: add bad opcode interrupt
Add the irq source for bad opcodes.

Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:45:45 -04:00
Alex Deucher
48695573d2 drm/amdgpu/gfx9: properly handle error ints on all pipes
Need to handle the interrupt enables for all pipes.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:45:45 -04:00
Alex Deucher
3987932176 drm/amdgpu/gfx12: properly handle error ints on all pipes
Need to handle the interrupt enables for all pipes.

v2: fix indexing (Jessie)

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:45:45 -04:00
Alex Deucher
2662b7d9d8 drm/amdgpu/gfx11: properly handle error ints on all pipes
Need to handle the interrupt enables for all pipes.

v2: fix indexing (Jessie)

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:45:45 -04:00
Alex Deucher
4b95cec689 drm/amdgpu/gfx10: properly handle error ints on all pipes
Need to handle the interrupt enables for all pipes.

v2: fix indexing (Jessie)

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:45:45 -04:00
Alex Deucher
af4808ac40 drm/amdgpu/gfx12: enable wave kill for compute queues
It should work the same for compute as well as gfx.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:45:45 -04:00
Alex Deucher
f53f526f70 drm/amdgpu/gfx11: enable wave kill for compute queues
It should work the same for compute as well as gfx.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:45:45 -04:00
Alex Deucher
a2737c404c drm/amdgpu/gfx10: enable wave kill for compute queues
It should work the same for compute as well as gfx.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:45:45 -04:00
Stanley.Yang
015b8a2fdf drm/amdgpu: Fix eeprom max record count
The eeprom table is empty before initializing,
set eeprom table version first before initializing.

Changed from V1:
	Reuse amdgpu_ras_set_eeprom_table_version function

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:45:45 -04:00
YiPeng Chai
8284951a6e drm/amdgpu: fix ras UE error injection failure issue
The ras command shared memory is allocated from
VRAM and the response status of the command
buffer will not be zero due to gpu being in
fatal error state after ras UE error injection.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:43:06 -04:00
Philip Yang
834368eab3 drm/amdkfd: Ensure user queue buffers residency
Add atomic queue_refcount to struct bo_va, return -EBUSY to fail unmap
BO from the GPU if the bo_va queue_refcount is not zero.

Create queue to increase the bo_va queue_refcount, destroy queue to
decrease the bo_va queue_refcount, to ensure the queue buffers mapped on
the GPU when queue is active.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:42:54 -04:00
Alex Deucher
22a9d5cbf8 drm/amdgpu/gfx9.4.3: implement wave kill for compute queues
Based on gfx9.0 implementation.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:42:45 -04:00
Sunil Khatri
eac3b274aa drm/amdgpu: add print support for sdma_v_4_4_2 ip_dump
Add print support for ip dump for sdma_v_4_4_2 in
devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:42:37 -04:00
Alex Deucher
9c7e69d2e1 drm/amdgpu/gfx9: enable wave kill for compute queues
It should work the same for compute as well as gfx.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:42:11 -04:00
Alex Deucher
7e60ecc2b7 drm/amdgpu/gfx8: enable wave kill for compute queues
It should work the same for compute as well as gfx.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:42:08 -04:00
Alex Deucher
12fb3e9c88 drm/amdgpu/gfx7: enable wave kill for compute queues
It should work the same for compute as well as gfx.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:42:03 -04:00
Philip Yang
fb91065851 drm/amdkfd: Refactor queue wptr_bo GART mapping
Add helper function kfd_queue_acquire_buffers to get queue wptr_bo
reference from queue write_ptr if it is mapped to the KFD node with
expected size.

Add wptr_bo to structure queue_properties because structure queue is
allocated after queue buffers are validated, then we can remove wptr_bo
parameter from pqm_create_queue.

Rename structure queue wptr_bo_gart to hold wptr_bo reference for GART
mapping and umapping. Move MES wptr_bo_gart mapping to init_user_queue,
the same location with queue ctx_bo GART mapping.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:35:04 -04:00
Sunil Khatri
db54a725d5 drm/amdgpu: Add sdma_v4_4_2 ip dump for devcoredump
Add ip dump for sdma_v4_4_2 for devcoredump for all
instances of sdma.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:34:58 -04:00
Sunil Khatri
a11b36ba9c drm/amdgpu: add print support for sdma_v_4_0 ip_dump
Add print support for ip dump for sdma_v_4_0 in
devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:34:49 -04:00
Philip Yang
c86ad39140 drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer
Pass pointer reference to amdgpu_bo_unref to clear the correct pointer,
otherwise amdgpu_bo_unref clear the local variable, the original pointer
not set to NULL, this could cause use-after-free bug.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:34:44 -04:00
Philip Yang
f9e292cbba drm/amdkfd: kfd_bo_mapped_dev support partition
Change amdgpu_amdkfd_bo_mapped_to_dev to use drm_priv as parameter
instead of adev, to support spatial partition. This is only used by CRIU
checkpoint restore now. No functional change.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:34:36 -04:00
Jane Jian
caaf576292 drm/amdgpu/vcn: Use offsets local to VCN/JPEG in VF
For VCN/JPEG 4.0.3, use only the local addressing scheme.

- Mask bit higher than AID0 range

v2
remain the case for mmhub use master XCC

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:34:28 -04:00
Lijo Lazar
49cfaebe48 drm/amdgpu: Add empty HDP flush function to VCN v4.0.3
VCN 4.0.3 does not HDP flush with RRMT enabled. Instead, mmsch
will do the HDP flush.

This change is necessary for VCN v4.0.3, no need for backward compatibility

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:34:18 -04:00
Lijo Lazar
585e3fdb36 drm/amdgpu: Add empty HDP flush function to JPEG v4.0.3
JPEG v4.0.3 doesn't support HDP flush when RRMT is enabled. Instead,
mmsch fw will do the flush.

This change is necessary for JPEG v4.0.3, no need for backward compatibility

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:34:12 -04:00
Pierre-Eric Pelloux-Prayer
fec5f8e8c6 drm/amdgpu: disallow multiple BO_HANDLES chunks in one submit
Before this commit, only submits with both a BO_HANDLES chunk and a
'bo_list_handle' would be rejected (by amdgpu_cs_parser_bos).

But if UMD sent multiple BO_HANDLES, what would happen is:
* only the last one would be really used
* all the others would leak memory as amdgpu_cs_p1_bo_handles would
  overwrite the previous p->bo_list value

This commit rejects submissions with multiple BO_HANDLES chunks to
match the implementation of the parser.

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:33:57 -04:00
Sunil Khatri
80237bfc03 drm/amdgpu: Add sdma_v4_0 ip dump for devcoredump
Add ip dump for sdma_v4_0 for devcoredump for all
instances of sdma.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:33:52 -04:00
Sunil Khatri
abf839f5eb drm/amdgpu: add print support for sdma_v_7_0 ip_dump
Add print support for ip dump for sdma_v_7_0 in
devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:33:45 -04:00
Ma Ke
6472de66c0 drm/amd/amdgpu: Fix uninitialized variable warnings
Return 0 to avoid returning an uninitialized variable r.

Cc: stable@vger.kernel.org
Fixes: 230dd6bb61 ("drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:33:38 -04:00
Ma Ke
93381e6b61 drm/amdgpu: fix a possible null pointer dereference
In amdgpu_connector_add_common_modes(), the return value of drm_cvt_mode()
is assigned to mode, which will lead to a NULL pointer dereference on
failure of drm_cvt_mode(). Add a check to avoid npd.

Cc: stable@vger.kernel.org
Fixes: d38ceaf99e ("drm/amdgpu: add core driver (v4)")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:33:25 -04:00
David Belanger
666f14cab2 drm/amdgpu: Fix atomics on GFX12
If PCIe supports atomics, configure register to prevent DF from
breaking atomics in separate load/store operations.

Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:33:17 -04:00
Sunil Khatri
4df9e2200f drm/amdgpu: Add sdma_v7_0 ip dump for devcoredump
Add ip dump for sdma_v7_0 for devcoredump for all
instances of sdma.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:33:10 -04:00
Alex Deucher
f2ac526349 drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell
We seem to have a case where SDMA will sometimes miss a doorbell
if GFX is entering the powergating state when the doorbell comes in.
To workaround this, we can update the wptr via MMIO, however,
this is only safe because we disallow gfxoff in begin_ring() for
SDMA 5.2 and then allow it again in end_ring().

Enable this workaround while we are root causing the issue with
the HW team.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/3440
Tested-by: Friedrich Vock <friedrich.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:32:58 -04:00
YiPeng Chai
a7e8467fbe drm/amdgpu: Remove unused code
Remove unused code.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:32:20 -04:00
YiPeng Chai
56631dee29 drm/amdgpu: optimize logging deferred error info
1. Use pa_pfn as the radix-tree key index to log
   deferred error info.
2. Use local array to store a row of bad pages.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:32:14 -04:00
YiPeng Chai
27cdf8c3ca drm/amdgpu: optimize umc v12 address conversion function
Split into 3 parts:
1. Convert soc physical address via ras ta.
2. Expand bad pages from soc physical address.
3. Dump bad address info.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:31:59 -04:00
Sunil Khatri
e84f798a93 drm/amdgpu: add print support for sdma_v_5_0 ip_dump
Add support for ip dump for sdma_v_5_0 in devcoredump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:07:09 -04:00
Sunil Khatri
0f1a93704a drm/amdgpu: Add sdma_v5_0 ip dump for devcoredump
Add ip dump for sdma_v5_0 for devcoredump for all
instances of sdma.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:07:09 -04:00
Sunil Khatri
ccb54d7d91 drm/amdgpu: add print support for sdma_v_6_0 ip_dump
Add print support for ip dump for sdma_v_6_0 in
devcoredump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:07:09 -04:00
Sunil Khatri
1eba165aa4 drm/amdgpu: Add sdma_v6_0 ip dump for devcoredump
Add ip dump for sdma_v6_0 for devcoredump for all
instances of sdma.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:07:09 -04:00
Sunil Khatri
00bb3223bf drm/amdgpu: fix the print message in devcoredump
Fix the memory type logged for gtt memory size
which is wrongly logged as visible vram size.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:07:09 -04:00
Sunil Khatri
43796955a8 drm/amdgpu: fix the extra space between two functions
fix extra line space between two functions.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:07:09 -04:00
Sunil Khatri
08bed7e4ff drm/amdgpu: add print support for sdma_v_5_2 ip_dump
Add support for ip dump for sdma_v_5_2 in devcoredump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:07:08 -04:00
Sunil Khatri
f763c3b543 drm/amdgpu: Add sdma_v5_2 ip dump for devcoredump
Add ip dump for sdma_v5_2 for devcoredump for all
instances of sdma.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:07:08 -04:00
YiPeng Chai
b3fb79cda5 drm/amdgpu: add mutex to protect ras shared memory
Add mutex to protect ras shared memory.

v2:
  Add TA_RAS_COMMAND__TRIGGER_ERROR command call
  status check.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-16 11:44:30 -04:00
Boyuan Zhang
7d75ef3736 drm/amdgpu/vcn: not pause dpg for unified queue
For unified queue, DPG pause for encoding is done inside VCN firmware,
so there is no need to pause dpg based on ring type in kernel.

For VCN3 and below, pausing DPG for encoding in kernel is still needed.

v2: add more comments
v3: update commit message

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-16 11:44:04 -04:00
Boyuan Zhang
ecfa23c8df drm/amdgpu/vcn: identify unified queue in sw init
Determine whether VCN using unified queue in sw_init, instead of calling
functions later on.

v2: fix coding style

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-16 11:43:48 -04:00
Aurabindo Pillai
28814be882 drm/amd: Bump KMS_DRIVER_MINOR version
Increase the KMS minor version to indicate GFX12 DCC support since this
contains a major change in how DCC is managed across IPs like GFX, DCN
etc. This will be used mainly by userspace like Mesa to figure out
DCC support on GFX12 hardware.

v2: fix version number (Alex)

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-15 17:41:23 -04:00
Alex Deucher
1cff1010be drm/amdgpu/mes12: add missing opcode string
Fixes the indexing of the string array.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-12 11:46:46 -04:00
Alex Deucher
478cb8badf drm/amdgpu/mes11: update opcode strings
Add new packet.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-12 11:46:46 -04:00
Mario Limonciello
9d8c094dda
drm/amd: Add power_saving_policy drm property to eDP connectors
When the `power_saving_policy` property is set to bit mask
"Require color accuracy" ABM should be disabled immediately and
any requests by sysfs to update will return an -EBUSY error.

When the `power_saving_policy` property is set to bit mask
"Require low latency" PSR should be disabled.

When the property is restored to an empty bit mask ABM and PSR
can be enabled again.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240703051722.328-3-mario.limonciello@amd.com
2024-07-10 17:00:07 -04:00
Alex Deucher
8030f6533e drm/amdgpu: remove exp hw support check for gfx12
Enable it by default.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 13:45:57 -04:00
YiPeng Chai
e23300dfff drm/amdgpu: timely save bad pages to eeprom after gpu ras reset is completed
The problem case is as follows:
1. GPU A triggers a gpu ras reset, and GPU A drives
   GPU B to also perform a gpu ras reset.
2. After gpu B ras reset started, gpu B queried a DE
   data. Since the DE data was queried in the ras reset
   thread instead of the page retirement thread, bad
   page retirement work would not be triggered. Then
   even if all gpu resets are completed, the bad pages
   will be cached in RAM until GPU B's bad page retirement
   work is triggered again and then saved to eeprom.

This patch can save the bad pages to eeprom in time after gpu
ras reset is completed.

v2:
  1. Add the above description to code comments.
  2. Reuse existing function.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:41 -04:00
YiPeng Chai
c04706914d drm/amdgpu: flush all cached ras bad pages to eeprom
Before uninstalling gpu driver, flush all cached ras
bad pages to eeprom.

v2:
  Put the same code into a function and reuse the function.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:35 -04:00
Sunil Khatri
c39385710c drm/amdgpu: select compute ME engines dynamically
GFX ME right now is one but this could change in
future SOC's. Use no of ME for GFX as start point
for ME for compute for GFX12.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:22 -04:00
Sunil Khatri
a85cc86cce drm/amdgpu: select compute ME engines dynamically
GFX ME right now is one but this could change in
future SOC's. Use no of ME for GFX as start point
for ME for compute for GFX11.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:04 -04:00
Alex Deucher
7d570f56f1 drm/amdgpu/job: Replace DRM_INFO/ERROR logging
Use the dev_info/err variants so we get per device logging
in multi-GPU cases.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:12:55 -04:00
Sunil Khatri
948f2828a6 drm/amdgpu: select compute ME engines dynamically
GFX ME right now is one but this could change in
future SOC's. Use no of ME for GFX as start point
for ME for compute for GFX10.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:12:48 -04:00
Lijo Lazar
d02ddefc7e drm/amdgpu: Initialize VF partition mode
For SOCs with GFX v9.4.3, a VF may have multiple compute partitions.
Fetch the partition information during init and initialize partition
nodes. There is no support to switch partition mode in VF mode, hence
disable the same.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:12:28 -04:00
Gavin Wan
5d64af40e3 drm/amd/amdgpu: fix SDMA IRQ client ID <-> req mapping.
sdma has 2 instances in SRIOV cpx mode. Odd numbered VFs have
sdma0/sdma1 instances. Even numbered vfs have sdma2/sdma3. For
Even numbered vfs, the sdma2 & sdma3 (irq srouce id
CLIENTID_SDMA2 and CLIENTID_SDMA3) should map to irq seq 0 & 1.

Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:12:19 -04:00
Thomas Hellström
4c44f89c5d drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk sublist moves
To address the problem with hitches moving when bulk move
sublists are lru-bumped, register the list cursors with the
ttm_lru_bulk_move structure when traversing its list, and
when lru-bumping the list, move the cursor hitch to the tail.
This also means it's mandatory for drivers to call
ttm_lru_bulk_move_init() and ttm_lru_bulk_move_fini() when
initializing and finalizing the bulk move structure, so add
those calls to the amdgpu- and xe driver.

Compared to v1 this is slightly more code but less fragile
and hopefully easier to understand.

Changes in previous series:
- Completely rework the functionality
- Avoid a NULL pointer dereference assigning manager->mem_type
- Remove some leftover code causing build problems
v2:
- For hitch bulk tail moves, store the mem_type in the cursor
  instead of with the manager.
v3:
- Remove leftover mem_type member from change in v2.
v6:
- Add some lockdep asserts (Matthew Brost)
- Avoid NULL pointer dereference (Matthew Brost)
- No need to check bo->resource before dereferencing
  bo->bulk_move (Matthew Brost)

Cc: Christian König <christian.koenig@amd.com>
Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240705153206.68526-5-thomas.hellstrom@linux.intel.com
Signed-off-by: Christian König <christian.koenig@amd.com>
2024-07-09 12:39:33 +02:00
Zhigang Luo
ee98fb71ba drm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1
to avoid reading wrong WPTR from doorbell in sriov vf, set
CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 to read WPTR from MQD.

Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:56:27 -04:00
Yang Wang
59f488be76 drm/amdgpu: add ras event state device attribute support
add amdgpu ras 'event_state' sysfs device attribute support

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:56:13 -04:00
Yang Wang
12b435a40c drm/amdgpu: add ras POSION_CONSUMPTION event id support
add amdgpu ras POSION_CONSUMPTION event id support.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:55:37 -04:00
Yang Wang
5b9de2596f drm/amdgpu: add ras POSION_CREATION event id support
add amdgpu ras POSION_CREATION event id support.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:55:18 -04:00
Yang Wang
75ac6a2506 drm/amdgpu: refine amdgpu ras event id core code
v1:
- use unified event id to manage ras events
- add a new function amdgpu_ras_query_error_status_with_event() to accept
  event type as parameter.

v2:
add a warn log to show the location of function failure
when calling amdgpu_ras_mark_event(). (Tao Zhou)

v3:
change RAS_EVENT_TYPE_ISR to RAS_EVENT_TYPE_FATAL.

v4:
rename amdgpu_ras_get_recovery_event() to
amdgpu_ras_get_fatal_error_event().

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:55:11 -04:00
Christian König
320debca1b drm/amdgpu: reject gang submit on reserved VMIDs
A gang submit won't work if the VMID is reserved and we can't flush out
VM changes from multiple engines at the same time.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:50:53 -04:00
Saleemkhan Jamadar
b6ad109166 drm/amdgpu: enable dpg for vcn and jpeg on GC 11_5_2
DPG mode is enabled for vcn and jpeg on VCN v4_0_5

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:50:40 -04:00
Yang Wang
332210c13a drm/amdgpu: remove redundant semicolons in RAS_EVENT_LOG
remove redundant semicolons in RAS_EVENT_LOG to avoid
code format check warning.

Fixes: b712d7c201 ("drm/amdgpu: fix compiler 'side-effect' check issue for RAS_EVENT_LOG()")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:43 -04:00
Frank Min
54837bd2be drm/amdgpu: restore dcc bo tilling configs while moving
While moving buffer which has dcc tiling config, it is needed to restore
its original dcc tiling.

1. extend copy flag to cover tiling bits
2. add logic to restore original dcc tiling config

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:27 -04:00
Sunil Khatri
c8714ac982 drm/amdgpu: add gfx queue support for gfx12 ipdump
Add support of all the CP GFX queues for gfx12 ipdump
to be used by devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:22 -04:00
Sunil Khatri
495e6173a4 drm/amdgpu: add cp queue registers for gfx12 ipdump
Add gfx12 support of CP queue registers for all queues
to be used by devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:15 -04:00
Sunil Khatri
f0c6b79bfc drm/amdgpu: enable redirection of irq's for IH v7.0
Enable redirection of irq for pagefaults for specific
clients to avoid overflow without dropping interrupts.

So here we redirect the interrupts to another IH ring
i.e ring1 where only these interrupts are processed.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:09 -04:00
Sunil Khatri
906219ec94 drm:amdgpu: enable IH ring1 for IH v7.0
We need IH ring1 for handling the pagefault
interrupts which over flow in default
ring for specific usecases.

Enable ring1 allows software to redirect
high interrupts to ring1 from default IH
ring.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:02 -04:00
Yifan Zha
33f23fc315 drm/amdgpu: Set no_hw_access when VF request full GPU fails
[Why]
If VF request full GPU access and the request failed,
the VF driver can get stuck accessing registers for an extended period during
the unload of KMS.

[How]
Set no_hw_access flag when VF request for full GPU access fails
This prevents further hardware access attempts, avoiding the prolonged
stuck state.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:46:56 -04:00
Sunil Khatri
2262acad0a drm/amdgpu: add print support for gfx12 ipdump
Add support of gfx12 ipdump print so devcoredump
could trigger it to dump the captured registers
in devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:46:49 -04:00
Sunil Khatri
fbbbb62112 drm/amdgpu: add gfx12 register support in ipdump
Add general registers of gfx12 in ipdump for
devcoredump support.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:46:43 -04:00
Frank Min
ffcc5745ed drm/amdgpu: update gfxhub client id for gfx12
update gfxhub client id for gfx12

Signed-off-by: Frank Min <Frank.Min@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:46:36 -04:00
Tim Huang
064d92436b drm/amd/pm: avoid to load smu firmware for APUs
Certain call paths still load the SMU firmware for APUs,
which needs to be skipped.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:46:30 -04:00
YiPeng Chai
78347b651a drm/amdgpu: sysfs node disable query error count during gpu reset
Sysfs node disable query error count during gpu reset.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:46:14 -04:00
Daniel Vetter
6be146cf57 amd-drm-next-6.11-2024-07-03:
amdgpu:
 - Use vmalloc for dc_state
 - Replay fixes
 - Freesync fixes
 - DCN 4.0.1 fixes
 - DML fixes
 - DCC updates
 - Misc code cleanups and bug fixes
 - 8K display fixes
 - DCN 3.5 fixes
 - Restructure DIO code
 - DML1 fixes
 - DML2 fixes
 - GFX11 fix
 - GFX12 updates
 - GFX12 modifiers fixes
 - RAS fixes
 - IP dump fixes
 - Add some updated IP version checks
 _ Silence UBSAN warning
 
 radeon:
 - GPUVM fix
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZoW9RgAKCRC93/aFa7yZ
 2AfjAQDJH95ijhdClk4S7t0ofjam7UPhVkJiUl8u73BNbuul/QEAk7RadnWw8SwC
 wvSXMCyHY3XszMJ1AdWG+4uwq5axPQE=
 =aWkn
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.11-2024-07-03' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.11-2024-07-03:

amdgpu:
- Use vmalloc for dc_state
- Replay fixes
- Freesync fixes
- DCN 4.0.1 fixes
- DML fixes
- DCC updates
- Misc code cleanups and bug fixes
- 8K display fixes
- DCN 3.5 fixes
- Restructure DIO code
- DML1 fixes
- DML2 fixes
- GFX11 fix
- GFX12 updates
- GFX12 modifiers fixes
- RAS fixes
- IP dump fixes
- Add some updated IP version checks
_ Silence UBSAN warning

radeon:
- GPUVM fix

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240703211314.2041893-1-alexander.deucher@amd.com
2024-07-05 12:02:12 +02:00
Daniel Vetter
71e9f407fd amd-drm-next-6.11-2024-06-28:
amdgpu:
 - JPEG 5.x fixes
 - More FW loading cleanups
 - Misc code cleanups
 - GC 12.x fixes
 - ASPM fix
 - DCN 4.0.1 updates
 - SR-IOV fixes
 - HDCP fix
 - USB4 fixes
 - Silence UBSAN warnings
 - MES submission fixes
 - Update documentation for new products
 - DCC updates
 - Initial ISP 4.x plumbing
 - RAS fixes
 - Misc small fixes
 
 amdkfd:
 - Fix missing unlock in error path for adding queues
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZn8qfAAKCRC93/aFa7yZ
 2H9QAQCZZsu8vps+HVUY5vihNXNL9TdUbrrroQBEyk+DsJzE9QD8CJW8/oiP8/fa
 /EsAWXNZUH+mPxRaVWTxp0j2P941pww=
 =uZbG
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.11-2024-06-28' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.11-2024-06-28:

amdgpu:
- JPEG 5.x fixes
- More FW loading cleanups
- Misc code cleanups
- GC 12.x fixes
- ASPM fix
- DCN 4.0.1 updates
- SR-IOV fixes
- HDCP fix
- USB4 fixes
- Silence UBSAN warnings
- MES submission fixes
- Update documentation for new products
- DCC updates
- Initial ISP 4.x plumbing
- RAS fixes
- Misc small fixes

amdkfd:
- Fix missing unlock in error path for adding queues

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240628213135.427214-1-alexander.deucher@amd.com
2024-07-05 11:39:23 +02:00
Daniel Vetter
86634fa4e6 Linux 6.10-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmaB0NweHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGkvwH/36UJRk/o6wvXnyH
 E6QjCSWo2226APyWks22NjtC3I/8Iqdvkneuh6wG0qL2sXAB078EMjUq5R81bF8H
 wWFBJwetjYTp8GEyLioMEb2wCH/J3R29dLFC4UYTplafXRGP6//xcpJaKmTxcgdR
 31IzvTPXbApZ7L3k1U6rA2bK9PNKcFCOvZlrNMUCuwMrabymHsDfOUt1DqXyg2xp
 zjqiWYBwlklozmgawSWt/mdEgkWuTcAbg+KyqDVQF59s9aj/OOwZ0j+HACq5V8CM
 quTPIAYL6CC9p7uxa69lGr/sgC0Is/BZLPX7RTZAwCgarGvnX+1HUsjDcaFCtrVg
 O6fPUV8=
 =pgUx
 -----END PGP SIGNATURE-----

Merge v6.10-rc6 into drm-next

The exynos-next pull is based on a newer -rc than drm-next. hence
backmerge first to make sure the unrelated conflicts we accumulated
don't end up randomly in the exynos merge pull, but are separated out.

Conflicts are all benign: Adjacent changes in amdgpu and fbdev-dma
code, and cherry-pick conflict in xe.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2024-07-05 10:47:28 +02:00
Sunil Khatri
ea67deb03c drm/amdgpu: fix out of bounds access in gfx11 during ip dump
During ip dump in gfx11 the index variable is reused but is
not reinitialized to 0 and this causes the index calculation
to be wrong and access out of bound access.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:07:16 -04:00
Tim Huang
94845ea057 drm/amdgpu: add firmware for PSP IP v14.0.4
This patch is to add firmware for PSP 14.0.4.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:07:10 -04:00
Tim Huang
b709f949f0 drm/amdgpu: enable mode2 reset for SMU IP v14.0.4
Set the default reset method to mode2 for SMU 14.0.4.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:07:02 -04:00
Tim Huang
38a16bfe6f drm/amdgpu: add SMU IP v14.0.4 discovery support
This patch is to add SMU 14.0.4 support

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:06:57 -04:00
Tim Huang
9cd2ad14d8 drm/amdgpu: add PSP IP v14.0.4 discovery support
This patch is to add PSP 14.0.4 support.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:06:36 -04:00
Tim Huang
c7c3f786b9 drm/amdgpu: add PSP IP v14.0.4 support
This patch is to add PSP 14.0.4 support.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:06:30 -04:00
Tim Huang
614a9f5ed5 drm/amdgpu: add firmware for VPE IP v6.1.3
This patch is to add firmware for VPE 6.1.3.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:06:24 -04:00
Tim Huang
ca15cd559f drm/amdgpu: add VPE IP v6.1.3 discovery support
This patch is to add VPE 6.1.3 support.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:06:19 -04:00
Tim Huang
f3e2a425c6 drm/amdgpu: add VPE IP v6.1.3 support
This patch is to add VPE 6.1.3 support.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:06:13 -04:00
Tim Huang
410bb279a8 drm/amdgpu: Add NBIO IP v7.11.3 support
Enable setting soc21 common clockgating for NBIO 7.11.3.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:06:06 -04:00
Tim Huang
5aea871694 drm/amdgpu: add NBIO IP v7.11.3 discovery support
This patch is to add NBIO 7.11.3 support.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:06:00 -04:00
Tim Huang
6857669a22 drm/amdgpu: add firmware for SDMA IP v6.1.2
This patch is to add firmware for SDMA 6.1.2.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:05:53 -04:00
Tim Huang
dfeccf4d54 drm/amdgpu: add SDMA IP v6.1.2 discovery support
This patch is to add SDMA 6.1.2 support.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:05:41 -04:00
Tim Huang
4448b1ff4d drm/amdgpu: add firmware for GC IP v11.5.2
This patch is to add firmware for GC 11.5.2.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:05:35 -04:00
Tim Huang
23c1ea0241 drm/amdgpu: add GC IP v11.5.2 to GC 11.5.0 family
This patch is to add GC 11.5.2 to GC 11.5.0 family.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:05:23 -04:00
Tim Huang
43e4cc2299 drm/amdgpu: add GC IP v11.5.2 soc21 support
Add CG and PG flags for GFX IP v11.5.2 and
PG flags for VCN IP v4.0.5.

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Signed-off-by: Li Ma <li.ma@amd.com>
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:05:15 -04:00
Tim Huang
98392782df drm/amdgpu: add tmz support for GC IP v11.5.2
Add tmz support for GC 11.5.2.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:05:09 -04:00
Tim Huang
02cf3ed627 drm/amdgpu: add GFXHUB IP v11.5.2 support
This patch is to add GFXHUB 11.5.2 support.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:05:03 -04:00
Tim Huang
b6a343df46 drm/amdgpu: initialize GC IP v11.5.2
Initialize GC 11.5.2 and set gfx hw configuration.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:04:57 -04:00
Sunil Khatri
425c4a6f8b drm/amdgpu: fix out of bounds access in gfx10 during ip dump
During ip dump in gfx10 the index variable is reused but is
not reinitialized to 0 and this causes the index calculation
to be wrong and access out of bound access.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:04:49 -04:00
Marek Olšák
f340f2bad1 drm/amdgpu: rewrite convert_tiling_flags_to_modifier_gfx12
There were multiple bugs, like checking SWIZZLE_MODE before checking
GFX12_SWIZZLE_MODE, which has undefined behavior.

The function had no effect before (it always returned -EINVAL).

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:47 -04:00
Hawking Zhang
d3dbccacfd drm/amdgpu: Fix hbm stack id in boot error report
To align with firmware, hbm id field 0x1 refers to
hbm stack 0, 0x2 refers to hbm statck 1.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:47 -04:00
Marek Olšák
0d3157d04d drm/amdgpu: add amdgpu_framebuffer::gfx12_dcc
amdgpu_framebuffer doesn't have tiling_flags, so we need this.

amdgpu_display_get_fb_info never gets NULL parameters, so checking for NULL
was useless.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:47 -04:00
Marek Olšák
8dd1426e2c drm/amdgpu: handle gfx12 in amdgpu_display_verify_sizes
It verified GFX9-11 swizzle modes on GFX12, which has undefined behavior.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:47 -04:00
Hawking Zhang
c2fad73174 drm/amdgpu: Correct register used to clear fault status
Driver should write to fault_cntl registers to do
one-shot address/status clear.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:47 -04:00
Marek Olšák
fd536d2e12 drm/amdgpu: don't use amdgpu_lookup_format_info on gfx12
It only uses fields for GFX9-11 related to the separate DCC buffer,
which doesn't exist in GFX12.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:46 -04:00
Marek Olšák
30fb9cad6f drm/amdgpu/gfx12: remove GDS leftovers
GDS doesn't exist in gfx12. The incomplete packet allows userspace to hang
the hw from the kernel.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:46 -04:00
Marek Olšák
e5f6bfe402 drm/amdgpu/gfx12: remove superfluous cache flags
If any INV flags are needed, they should be executed via ACQUIRE_MEM
before INDIRECT_BUFFER.

GLM flags are also removed because the hw ignores them.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:46 -04:00
Marek Olšák
b16ec6300f drm/amdgpu/gfx11: remove superfluous cache flags
If any INV flags are needed, they should be executed via ACQUIRE_MEM
before INDIRECT_BUFFER.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:46 -04:00
Marek Olšák
11317d2963 drm/amdgpu: check for LINEAR_ALIGNED correctly in check_tiling_flags_gfx6
Fix incorrect check.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:36 -04:00
Mario Limonciello
15eb8573ad drm/amd: Don't initialize ISP hardware without FW
Although designs may contain an ISP IP block, the camera might be a USB
camera. Because of this the ISP firmware is considered optional from
amdgpu.  However if the firmware doesn't get loaded the hardware should
not be initialized.

Adjust the return code for early init to ensure the IP block doesn't go
through the other init and fini sequences. Also decrease the message
about firmware load failure to debug so it's not as alarming to users.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:40 -04:00
Yang Wang
71fe449484 drm/amdgpu: refine isp firmware loading
refine isp firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:40 -04:00
Pratap Nirujogi
75be61aa77 drm/amd/amdgpu: Enable MMHUB prefetch for ISP v4.1.0 and 4.1.1
Remove temporary WA to disable ISP prefetch as MMHUB SAW is initialized
to support ISP HW access GART memory using the TLSi path with prefetch
enabled.

Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:40 -04:00
Pratap Nirujogi
7c2d3112b2 drm/amd/amdgpu: Fix 'snprintf' output truncation warning
snprintf can truncate the output fw filename if the isp ucode_prefix
exceeds 29 characters. Knowing ISP ucode_prefix is in the format
isp_x_x_x, limiting the size of ucode_prefix[] to 10 characters
to fix the warning.

Fixes the below warning:

   drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c: In function 'isp_early_init':
   drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c:192:58: warning: 'snprintf'
   output may be truncated before the last format character
   [-Wformat-truncation=]
     192 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix);
         |                                                          ^
   In function 'isp_load_fw_by_psp',
       inlined from 'isp_early_init' at drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c:218:8:
   drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c:192:9: note: 'snprintf' output between 12 and 41 bytes into a destination of size 40
     192 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix);
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:40 -04:00
Pratap Nirujogi
062666ffbc drm/amd/amdgpu: Disable MMHUB prefetch for ISP v4.1.1
Disable MMHUB prefetch for ISP v4.1.1 as a temporary WA until
the GART supports MMHUB TLSi and SAW for ISP HW to access
GART memory using the TLSi path.

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:40 -04:00
Pratap Nirujogi
05bafe95e5 drm/amd/amdgpu: Add ISP4.1.0 and ISP4.1.1 modules
Add independent IP centric modules for ISP4.1.0 and ISP4.1.1 hw blocks.

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:40 -04:00
Pratap Nirujogi
0253d718a0 drm/amd/amdgpu: Map ISP interrupts as generic IRQs
Map ISP IH interrupts to Linux generic IRQ for ISP driver to
handle the interrupts using MFD IORESOURCE_IRQ resource.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:40 -04:00
Alex Deucher
8930b90be6 drm/amdgpu: fix Kconfig for ISP v2
Add new config option and set proper dependencies for ISP.

v2: add missed guards, drop separate Kconfig

Reviewed-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Pratap Nirujogi <pratap.nirujogi@amd.com>
2024-06-27 17:34:40 -04:00
Pratap Nirujogi
d232584ae3 drm/amd/amdgpu: Enable ISP in amdgpu_discovery
Enable ISP for ISP V4.1.0 and V4.1.1 in amdgpu_discovery.

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:39 -04:00
Pratap Nirujogi
8fcbfd53ea drm/amd/amdgpu: Add ISP driver support
Add the isp driver in amdgpu to support ISP device on the APUs that
supports ISP IP block. ISP hw block is used for camera front-end, pre
and post processing operations.

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:39 -04:00
Pratap Nirujogi
772e4d56da drm/amd/amdgpu: Add ISP support to amdgpu_discovery
ISP hw block is supported in some of the AMD GPU versions, add support
to discover ISP IP in amdgpu_discovery.

v2: squash in documentation update (Alex)

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:39 -04:00
Sonny Jiang
d4b8386c86 drm/amdgpu/jpeg5: Add support for DPG mode
Add DPG support for JPEG 5.0

Signed-off-by: Sonny Jiang <sonjiang@amd.com>
Acked-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:33 -04:00
Frank Min
291af3f598 drm/amdgpu: tolerate allocating GTT bo with dcc flag
Do not return failure for allocating GTT bo with dcc flag on gfx12.
This will improve compatibility for UMD.

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:17 -04:00
Jane Jian
b17eecc08f drm/amdgpu: normalize registers as local xcc to read/write in gfx_v9_4_3
[WHY]
sriov has the higher bit violation when flushing tlb

[HOW]
normalize the registers to keep lower 16-bit(dword aligned) to aviod higher bit violation
RLCG will mask xcd out and always assume it's accessing its own xcd

v2
add check in wait mem that only do the normalization on regspace

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Yiqing Yao <YiQing.Yao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:33:27 -04:00
YiPeng Chai
f852c9795c drm/amdgpu: add gpu reset check and exception handling
Add gpu reset check and exception handling for
page retirement.

v2:
  Clear poison consumption messages cached in fifo after
non mode-1 reset.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:32:12 -04:00
YiPeng Chai
e278849cb2 drm/amdgpu: refine poison consumption interrupt handler
1. The poison fifo is only used for poison consumption
   requests.
2. Merge reset requests when poison fifo caches multiple
   poison consumption messages

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:32:06 -04:00
YiPeng Chai
5f08275cfd drm/amdgpu: refine poison creation interrupt handler
In order to apply to the case where a large number
of ras poison interrupts:
1. Change to use variable to record poison creation
   requests to avoid fifo full.
2. Prioritize handling poison creation requests
   instead of following the order of requests
   received by the driver.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:31:43 -04:00
Vignesh Chander
cbda2758d8 drm/amdgpu: process RAS fatal error MB notification
For RAS error scenario, VF guest driver will check mailbox
and set fed flag to avoid unnecessary HW accesses.
additionally, poll for reset completion message first
to avoid accidentally spamming multiple reset requests to host.

v2: add another mailbox check for handling case where kfd detects
timeout first

v3: set host_flr bit and use wait_for_reset

Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com>
Reviewed-by: Zhigang Luo <Zhigang.Luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:31:37 -04:00
YiPeng Chai
78146c1dcd drm/amdgpu: add variable to record the deferred error number read by driver
Add variable to record the deferred error
number read by driver.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:31:20 -04:00
Vignesh Chander
29b6985de5 drm/amdgpu: Use dev_ prints for virtualization as it supports multi adapter
So we can get clearer per device logging.

Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com>
Reviewed-by: Zhigang Luo <Zhigang.Luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:30:39 -04:00
Danijel Slivka
afbf7955ff drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts
Why:
Setting IH_RB_WPTR register to 0 will not clear the RB_OVERFLOW bit
if RB_ENABLE is not set.

How to fix:
Set WPTR_OVERFLOW_CLEAR bit after RB_ENABLE bit is set.
The RB_ENABLE bit is required to be set, together with
WPTR_OVERFLOW_ENABLE bit so that setting WPTR_OVERFLOW_CLEAR bit
would clear the RB_OVERFLOW.

Signed-off-by: Danijel Slivka <danijel.slivka@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:30:27 -04:00
Kenneth Feng
bf826ba9b4 Revert "drm/amd/amdgpu: add module parameter for jpeg"
This reverts commit d3620eeae8.
Revert this due to a final solution:
commit ed3165d660 ("drm/amdgpu/jpeg5: reprogram doorbell setting after power up for each playback")

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Sonny Jiang <sonjiang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:10:40 -04:00
Lijo Lazar
fe86c4d1a2 drm/amdgpu: Don't show false warning for reg list
If reg list is already loaded on PSP 13.0.2 SOCs, psp will give
TEE_ERR_CANCEL response on second time load. Avoid printing warn
message for it.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:10:39 -04:00
Julia Zhang
79ea35c7d8 drm/amdgpu: avoid using null object of framebuffer
Instead of using state->fb->obj[0] directly, get object from framebuffer
by calling drm_gem_fb_get_obj() and return error code when object is
null to avoid using null object of framebuffer.

Reported-by: Fusheng Huang <fusheng.huang@ecarxgroup.com>
Signed-off-by: Julia Zhang <Julia.Zhang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:10:39 -04:00
Hawking Zhang
bdbdc7cecd drm/amdgpu: Fix smatch static checker warning
adev->gfx.imu.funcs could be NULL

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:10:39 -04:00
Bob Zhou
9ff2e14cf0 drm/amdgpu: add missing error handling in function amdgpu_gmc_flush_gpu_tlb_pasid
Fix the unchecked return value warning reported by Coverity,
so add error handling.

Signed-off-by: Bob Zhou <bob.zhou@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:10:39 -04:00
Hawking Zhang
9da0f77367 drm/amdgpu: Fix register access violation
fault_status is read only register. fault_cntl
is not accessible from guest environment.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:10:35 -04:00
Frank Min
748bd8ebae drm/amdgpu: access ltr through pci cfg space
Access ltr through pci cfg space instead of mmio while programing
aspm on gfx12

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:10:35 -04:00
Yang Wang
6bab222b8b drm/amdgpu: refine gfx12 firmware loading
refine gfx12 firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:10:35 -04:00
Frank Min
541fe90ee6 drm/amdgpu: update MTYPE mapping for gfx12
gfx12 only support MTYPE UC and NC, so update it accordingly.

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:09:47 -04:00
Pierre-Eric Pelloux-Prayer
c71c9aafd5 amdgpu: don't dereference a NULL resource in sysfs code
dma_resv_trylock being successful doesn't guarantee that bo->tbo.base.resv
is not NULL, so check its validity before using it.

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:09:46 -04:00
Lijo Lazar
e779af8e8b drm/amdgpu: Fix pci state save during mode-1 reset
Cache the PCI state before bus master is disabled. The saved state is
later used for other cases like restoring config space after mode-2
reset.

Fixes: 5c03e5843e ("drm/amdgpu:add smu mode1/2 support for aldebaran")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:09:46 -04:00
Yang Wang
dab70d9f65 drm/amdgpu: refine gfx11 firmware loading
refine gfx11 firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:09:46 -04:00
Sonny Jiang
ed3165d660 drm/amdgpu/jpeg5: reprogram doorbell setting after power up for each playback
Doorbell needs to be configured after power up during each playback

Signed-off-by: Sonny Jiang <sonjiang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:09:41 -04:00
Alex Deucher
83edf00d89 drm/amdgpu/atomfirmware: fix parsing of vram_info
v3.x changed the how vram width was encoded.  The previous
implementation actually worked correctly for most boards.
Fix the implementation to work correctly everywhere.

This fixes the vram width reported in the kernel log on
some boards.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:09:41 -04:00
Dave Airlie
365aa9f573 amd-drm-next-6.11-2024-06-22:
amdgpu:
 - HPD fixes
 - PSR fixes
 - DCC updates
 - DCN 4.0.1 fixes
 - FAMS fixes
 - Misc code cleanups
 - SR-IOV fixes
 - GPUVM TLB flush cleanups
 - Make VCN less verbose
 - ACPI backlight fixes
 - MES fixes
 - Firmware loading cleanups
 - Replay fixes
 - LTTPR fixes
 - Trap handler fixes
 - Cursor and overlay fixes
 - Primary plane zpos fixes
 - DML 2.1 fixes
 - RAS updates
 - USB4 fixes
 - MALL fixes
 - Reserved VMID fix
 - Silence UBSAN warnings
 
 amdkfd:
 - Misc code cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZnbsMgAKCRC93/aFa7yZ
 2GTUAQDGl46/pxpFRmZ8n3Hy6OCnmvoFqI9Re1uAc2RbQNNOAgEAi72PwS2iquU/
 69ectl+oi8P/yNMxP4rO1KgOP3AMsg8=
 =bvKZ
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.11-2024-06-22' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.11-2024-06-22:

amdgpu:
- HPD fixes
- PSR fixes
- DCC updates
- DCN 4.0.1 fixes
- FAMS fixes
- Misc code cleanups
- SR-IOV fixes
- GPUVM TLB flush cleanups
- Make VCN less verbose
- ACPI backlight fixes
- MES fixes
- Firmware loading cleanups
- Replay fixes
- LTTPR fixes
- Trap handler fixes
- Cursor and overlay fixes
- Primary plane zpos fixes
- DML 2.1 fixes
- RAS updates
- USB4 fixes
- MALL fixes
- Reserved VMID fix
- Silence UBSAN warnings

amdkfd:
- Misc code cleanups

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240622152523.2267072-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-06-27 17:18:49 +10:00
Lijo Lazar
48880f9686 drm/amdgpu: Don't show false warning for reg list
If reg list is already loaded on PSP 13.0.2 SOCs, psp will give
TEE_ERR_CANCEL response on second time load. Avoid printing warn
message for it.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-25 14:22:56 -04:00
Julia Zhang
bcfa48ff78 drm/amdgpu: avoid using null object of framebuffer
Instead of using state->fb->obj[0] directly, get object from framebuffer
by calling drm_gem_fb_get_obj() and return error code when object is
null to avoid using null object of framebuffer.

Reported-by: Fusheng Huang <fusheng.huang@ecarxgroup.com>
Signed-off-by: Julia Zhang <Julia.Zhang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-06-25 14:22:08 -04:00
Lijo Lazar
74fa02c4a5 drm/amdgpu: Fix pci state save during mode-1 reset
Cache the PCI state before bus master is disabled. The saved state is
later used for other cases like restoring config space after mode-2
reset.

Fixes: 5c03e5843e ("drm/amdgpu:add smu mode1/2 support for aldebaran")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-25 14:13:12 -04:00
Alex Deucher
f6f49dda49 drm/amdgpu/atomfirmware: fix parsing of vram_info
v3.x changed the how vram width was encoded.  The previous
implementation actually worked correctly for most boards.
Fix the implementation to work correctly everywhere.

This fixes the vram width reported in the kernel log on
some boards.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-06-25 14:11:46 -04:00
Dave Airlie
f680df51ca drm-misc-next for 6.11:
UAPI Changes:
   - Deprecate DRM date and return a 0 date in DRM_IOCTL_VERSION
 
 Core Changes:
   - connector: Create a set of helpers to help with HDMI support
   - fbdev: Create memory manager optimized fbdev emulation
   - panic: Allow to select fonts, improve drm_fb_dma_get_scanout_buffer
 
 Driver Changes:
   - Remove driver owner assignments
   - Allow more drivers to compile with COMPILE_TEST
   - Conversions to drm_edid
   - ivpu: hardware scheduler support, profiling support, improvements
     to the platform support layer
   - mgag200: general reworks and improvements
   - nouveau: Add NVreg_RegistryDwords command line option
   - rockchip: Conversion to the hdmi helpers
   - sun4i: Conversion to the hdmi helpers
   - vc4: Conversion to the hdmi helpers
   - v3d: Perf counters improvements
   - zynqmp: IRQ and debugfs improvements
   - bridge:
     - Remove redundant checks on bridge->encoder
   - panels:
     - Switch panels from register table initialization to proper code
     - Now that the panel code tracks the panel state, remove every
       ad-hoc implementation in the panel drivers
     - New panels: Lincoln Tech Sol LCD185-101CT, Microtips Technology
       13-101HIEBCAF0-C, Microtips Technology MF-103HIEB0GA0, BOE
       nv110wum-l60, IVO t109nw41
 -----BEGIN PGP SIGNATURE-----
 
 iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCZlhUKAAKCRAnX84Zoj2+
 dgHoAYDTpShgXFXnlnMtqZr+ZuShcjcwiqzwM4qNWdtyji9MONtJJU3ZQnGlnXbI
 ZU+oZP0Bf0PyT0/8bf+rmZBJ1UdAxt2IQaLkP1tTHOad4E+KlcL5n1opzMi160mB
 EZSm9f7aNw==
 =bZPt
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2024-05-30' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for 6.11:

UAPI Changes:
  - Deprecate DRM date and return a 0 date in DRM_IOCTL_VERSION

Core Changes:
  - connector: Create a set of helpers to help with HDMI support
  - fbdev: Create memory manager optimized fbdev emulation
  - panic: Allow to select fonts, improve drm_fb_dma_get_scanout_buffer

Driver Changes:
  - Remove driver owner assignments
  - Allow more drivers to compile with COMPILE_TEST
  - Conversions to drm_edid
  - ivpu: hardware scheduler support, profiling support, improvements
    to the platform support layer
  - mgag200: general reworks and improvements
  - nouveau: Add NVreg_RegistryDwords command line option
  - rockchip: Conversion to the hdmi helpers
  - sun4i: Conversion to the hdmi helpers
  - vc4: Conversion to the hdmi helpers
  - v3d: Perf counters improvements
  - zynqmp: IRQ and debugfs improvements
  - bridge:
    - Remove redundant checks on bridge->encoder
  - panels:
    - Switch panels from register table initialization to proper code
    - Now that the panel code tracks the panel state, remove every
      ad-hoc implementation in the panel drivers
    - New panels: Lincoln Tech Sol LCD185-101CT, Microtips Technology
      13-101HIEBCAF0-C, Microtips Technology MF-103HIEB0GA0, BOE
      nv110wum-l60, IVO t109nw41

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240530-hilarious-flat-magpie-5fa186@houat
2024-06-21 10:30:31 +10:00
Likun Gao
ed5a4484f0 drm/amdgpu: init TA fw for psp v14
Add support to init TA firmware for psp v14.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 18:25:58 -04:00
Christian König
e356d321d0 drm/amdgpu: cleanup MES11 command submission
The approach of having a separate WB slot for each submission doesn't
really work well and for example breaks GPU reset.

Use a status query packet for the fence update instead since those
should always succeed we can use the fence of the original packet to
signal the state of the operation.

While at it cleanup the coding style.

Fixes: eef016ba89 ("drm/amdgpu/mes11: Use a separate fence per transaction")
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 18:25:58 -04:00
Christian König
8bd82363e2 drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2
This reverts commit b8c415e3bf ("drm/amdgpu: take runtime pm reference
when we attach a buffer") and commit 425285d39a ("drm/amdgpu: add amdgpu
runpm usage trace for separate funcs").

Taking a runtime pm reference for DMA-buf is actually completely
unnecessary and even dangerous.

The problem is that calling pm_runtime_get_sync() from the DMA-buf
callbacks is illegal because we have the reservation locked here
which is also taken during resume. So this would deadlock.

When the buffer is in GTT it is still accessible even when the GPU
is powered down and when it is in VRAM the buffer gets migrated to
GTT before powering down.

The only use case which would make it mandatory to keep the runtime
pm reference would be if we pin the buffer into VRAM, and that's not
something we currently do.

v2: improve the commit message

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
2024-06-19 14:17:25 -04:00
Harish Kasiviswanathan
49c9ffabde drm/amdgpu: Indicate CU havest info to CP
To achieve full occupancy CP hardware needs to know if CUs in SE are
symmetrically or asymmetrically harvested

v2: Reset is_symmetric_cus for each loop

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 14:17:25 -04:00
Yunxiang Li
84801d4f1e drm/amdgpu: fix locking scope when flushing tlb
Which method is used to flush tlb does not depend on whether a reset is
in progress or not. We should skip flush altogether if the GPU will get
reset. So put both path under reset_domain read lock.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
2024-06-19 14:17:25 -04:00
Likun Gao
1ecef55893 drm/amdgpu: init TA fw for psp v14
Add support to init TA firmware for psp v14.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:52:43 -04:00
Yang Wang
017d0b67bf drm/amdgpu: refine gfx6 firmware loading
refine gfx6 firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:52:36 -04:00
Yang Wang
a4fcb5f733 Revert "drm/amdgpu: change aca bank error lock type to spinlock"
This reverts commit f6bce954f4.

Revert this patch to modify lock type back to 'mutex' to avoid kernel
calltrace issue.

[  602.668806] Workqueue: amdgpu-reset-dev amdgpu_ras_do_recovery [amdgpu]
[  602.668939] Call Trace:
[  602.668940]  <TASK>
[  602.668941]  dump_stack_lvl+0x4c/0x70
[  602.668945]  dump_stack+0x14/0x20
[  602.668946]  __schedule_bug+0x5a/0x70
[  602.668950]  __schedule+0x940/0xb30
[  602.668952]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.668955]  ? hrtimer_reprogram+0x77/0xb0
[  602.668957]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.668959]  ? hrtimer_start_range_ns+0x126/0x370
[  602.668961]  schedule+0x39/0xe0
[  602.668962]  schedule_hrtimeout_range_clock+0xb1/0x140
[  602.668964]  ? __pfx_hrtimer_wakeup+0x10/0x10
[  602.668966]  schedule_hrtimeout_range+0x17/0x20
[  602.668967]  usleep_range_state+0x69/0x90
[  602.668970]  psp_cmd_submit_buf+0x132/0x570 [amdgpu]
[  602.669066]  psp_ras_invoke+0x75/0x1a0 [amdgpu]
[  602.669156]  psp_ras_query_address+0x9c/0x120 [amdgpu]
[  602.669245]  umc_v12_0_update_ecc_status+0x16d/0x520 [amdgpu]
[  602.669337]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669339]  ? stack_depot_save+0x12/0x20
[  602.669342]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669343]  ? set_track_prepare+0x52/0x70
[  602.669346]  ? kmemleak_alloc+0x4f/0x90
[  602.669348]  ? __kmalloc_node+0x34b/0x450
[  602.669352]  amdgpu_umc_update_ecc_status+0x23/0x40 [amdgpu]
[  602.669438]  mca_umc_mca_get_err_count+0x85/0xc0 [amdgpu]
[  602.669554]  mca_smu_parse_mca_error_count+0x120/0x1d0 [amdgpu]
[  602.669655]  amdgpu_mca_dispatch_mca_set.part.0+0x141/0x250 [amdgpu]
[  602.669743]  ? kmemleak_free+0x36/0x60
[  602.669745]  ? kvfree+0x32/0x40
[  602.669747]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669749]  ? kfree+0x15d/0x2a0
[  602.669752]  amdgpu_mca_smu_log_ras_error+0x1f6/0x210 [amdgpu]
[  602.669839]  amdgpu_ras_query_error_status_helper+0x2ad/0x390 [amdgpu]
[  602.669924]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669925]  ? __call_rcu_common.constprop.0+0xa6/0x2b0
[  602.669929]  amdgpu_ras_query_error_status+0xf3/0x620 [amdgpu]
[  602.670014]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.670017]  amdgpu_ras_log_on_err_counter+0xe1/0x170 [amdgpu]
[  602.670103]  amdgpu_ras_do_recovery+0xd2/0x2c0 [amdgpu]
[  602.670187]  ? srso_alias_return_thunk+0x5/0

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: YiPeng Chai <yipeng.chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:51:23 -04:00
Yang Wang
8c9ee18019 Revert "drm/amdgpu: change bank cache lock type to spinlock"
This reverts commit 258ed689bc

revert this patch to modify lock type back to 'mutex' to avoid kernel
calltrace issue.

[  602.668806] Workqueue: amdgpu-reset-dev amdgpu_ras_do_recovery [amdgpu]
[  602.668939] Call Trace:
[  602.668940]  <TASK>
[  602.668941]  dump_stack_lvl+0x4c/0x70
[  602.668945]  dump_stack+0x14/0x20
[  602.668946]  __schedule_bug+0x5a/0x70
[  602.668950]  __schedule+0x940/0xb30
[  602.668952]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.668955]  ? hrtimer_reprogram+0x77/0xb0
[  602.668957]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.668959]  ? hrtimer_start_range_ns+0x126/0x370
[  602.668961]  schedule+0x39/0xe0
[  602.668962]  schedule_hrtimeout_range_clock+0xb1/0x140
[  602.668964]  ? __pfx_hrtimer_wakeup+0x10/0x10
[  602.668966]  schedule_hrtimeout_range+0x17/0x20
[  602.668967]  usleep_range_state+0x69/0x90
[  602.668970]  psp_cmd_submit_buf+0x132/0x570 [amdgpu]
[  602.669066]  psp_ras_invoke+0x75/0x1a0 [amdgpu]
[  602.669156]  psp_ras_query_address+0x9c/0x120 [amdgpu]
[  602.669245]  umc_v12_0_update_ecc_status+0x16d/0x520 [amdgpu]
[  602.669337]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669339]  ? stack_depot_save+0x12/0x20
[  602.669342]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669343]  ? set_track_prepare+0x52/0x70
[  602.669346]  ? kmemleak_alloc+0x4f/0x90
[  602.669348]  ? __kmalloc_node+0x34b/0x450
[  602.669352]  amdgpu_umc_update_ecc_status+0x23/0x40 [amdgpu]
[  602.669438]  mca_umc_mca_get_err_count+0x85/0xc0 [amdgpu]
[  602.669554]  mca_smu_parse_mca_error_count+0x120/0x1d0 [amdgpu]
[  602.669655]  amdgpu_mca_dispatch_mca_set.part.0+0x141/0x250 [amdgpu]
[  602.669743]  ? kmemleak_free+0x36/0x60
[  602.669745]  ? kvfree+0x32/0x40
[  602.669747]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669749]  ? kfree+0x15d/0x2a0
[  602.669752]  amdgpu_mca_smu_log_ras_error+0x1f6/0x210 [amdgpu]
[  602.669839]  amdgpu_ras_query_error_status_helper+0x2ad/0x390 [amdgpu]
[  602.669924]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669925]  ? __call_rcu_common.constprop.0+0xa6/0x2b0
[  602.669929]  amdgpu_ras_query_error_status+0xf3/0x620 [amdgpu]
[  602.670014]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.670017]  amdgpu_ras_log_on_err_counter+0xe1/0x170 [amdgpu]
[  602.670103]  amdgpu_ras_do_recovery+0xd2/0x2c0 [amdgpu]
[  602.670187]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.670189]  ? __schedule+0x37d/0xb30
[  602.670191]  process_one_work+0x176/0x350
[  602.670194]  worker_thread+0x2f7/0x420
[  602.670197]  ?

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:50:31 -04:00
Alex Deucher
19797687e6 drm/amdgpu: remove amdgpu_mes_fence_wait_polling()
No longer used so remove it.

Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:50:24 -04:00
Alex Deucher
fffe347e14 drm/amdgpu: cleanup MES12 command submission
The approach of having a separate WB slot for each submission doesn't
really work well and for example breaks GPU reset.

Use a status query packet for the fence update instead since those
should always succeed we can use the fence of the original packet to
signal the state of the operation.

While at it cleanup the coding style.

Fixes: ade887c633 ("drm/amdgpu/mes12: Use a separate fence per transaction")
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:49:59 -04:00
Yang Wang
3af2c80ae2 drm/amdgpu: refine gfx10 firmware loading
refine gfx10 firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:49:53 -04:00
Yang Wang
23fc94795b drm/amdgpu: refine gfx9 firmware loading
refine gfx9 firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:49:28 -04:00
Christian König
de32462541 drm/amdgpu: cleanup MES11 command submission
The approach of having a separate WB slot for each submission doesn't
really work well and for example breaks GPU reset.

Use a status query packet for the fence update instead since those
should always succeed we can use the fence of the original packet to
signal the state of the operation.

While at it cleanup the coding style.

Fixes: eef016ba89 ("drm/amdgpu/mes11: Use a separate fence per transaction")
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:48:25 -04:00
Christian König
a6328c9c3d drm/amdgpu: fix using the reserved VMID with gang submit
We need to ensure that even when using a reserved VMID that the gang
members can still run in parallel.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:48:00 -04:00
Victor Lu
b32563859d drm/amdgpu: Do not wait for MP0_C2PMSG_33 IFWI init in SRIOV
SRIOV does not need to wait for IFWI init, and MP0_C2PMSG_33 is blocked
for VF access.

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Vignesh Chander <Vignesh.Chander@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:47:52 -04:00
Mukul Joshi
4d14a74054 Revert "drm/amdgpu: Add missing locking for MES API calls"
This reverts commit 3612702852.

This is causing a BUG message during suspend.

[   61.603542] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:283
[   61.603550] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2028, name: kworker/u64:14
[   61.603553] preempt_count: 1, expected: 0
[   61.603555] RCU nest depth: 0, expected: 0
[   61.603557] Preemption disabled at:
[   61.603559] [<ffffffffc08a3261>] amdgpu_gfx_disable_kgq+0x61/0x160 [amdgpu]
[   61.603789] CPU: 9 PID: 2028 Comm: kworker/u64:14 Tainted: G        W          6.8.0+ #7
[   61.603795] Workqueue: events_unbound async_run_entry_fn
[   61.603801] Call Trace:
[   61.603803]  <TASK>
[   61.603806]  dump_stack_lvl+0x37/0x50
[   61.603811]  ? amdgpu_gfx_disable_kgq+0x61/0x160 [amdgpu]
[   61.604007]  dump_stack+0x10/0x20
[   61.604010]  __might_resched+0x16f/0x1d0
[   61.604016]  __might_sleep+0x43/0x70
[   61.604020]  mutex_lock+0x1f/0x60
[   61.604024]  amdgpu_mes_unmap_legacy_queue+0x6d/0x100 [amdgpu]
[   61.604226]  gfx11_kiq_unmap_queues+0x3dc/0x430 [amdgpu]
[   61.604422]  ? srso_alias_return_thunk+0x5/0xfbef5
[   61.604429]  amdgpu_gfx_disable_kgq+0x122/0x160 [amdgpu]
[   61.604621]  gfx_v11_0_hw_fini+0xda/0x100 [amdgpu]
[   61.604814]  gfx_v11_0_suspend+0xe/0x20 [amdgpu]
[   61.605008]  amdgpu_device_ip_suspend_phase2+0x135/0x1d0 [amdgpu]
[   61.605175]  amdgpu_device_suspend+0xec/0x180 [amdgpu]

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:46:39 -04:00
Yang Wang
3a3be8bb97 drm/amdgpu: refine gfx8 firmware loading
refine gfx8 firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:18:27 -04:00
Tao Zhou
09a3d8202d drm/amdgpu: set RAS fed status for more cases
Indicate fatal error for each RAS block and NBIO.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:18:26 -04:00
Tao Zhou
7e4371676e drm/amdgpu: create amdgpu_ras_in_recovery to simplify code
Reduce redundant code and user doesn't need to pay attention to RAS
details.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:18:26 -04:00
Tao Zhou
5f7697bbc1 drm/amdgpu: trigger mode1 reset for RAS RMA status
Check RMA status in bad page retirement flow.

v2: fix coding bugs in v1.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:18:26 -04:00
Yang Wang
9d26e0cfc2 drm/amdgpu: refine gfx7 firmware loading
refine gfx7 firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:15 -04:00
Christian König
030631e97b drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2
This reverts commit b8c415e3bf ("drm/amdgpu: take runtime pm reference
when we attach a buffer") and commit 425285d39a ("drm/amdgpu: add amdgpu
runpm usage trace for separate funcs").

Taking a runtime pm reference for DMA-buf is actually completely
unnecessary and even dangerous.

The problem is that calling pm_runtime_get_sync() from the DMA-buf
callbacks is illegal because we have the reservation locked here
which is also taken during resume. So this would deadlock.

When the buffer is in GTT it is still accessible even when the GPU
is powered down and when it is in VRAM the buffer gets migrated to
GTT before powering down.

The only use case which would make it mandatory to keep the runtime
pm reference would be if we pin the buffer into VRAM, and that's not
something we currently do.

v2: improve the commit message

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
2024-06-14 16:17:13 -04:00
Yang Wang
c37b8f7868 drm/amdgpu: refine imu firmware loading
refine imu firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:12 -04:00
Yang Wang
cd093c24ee drm/amdgpu: refine gmc firmware loading
refine gmc firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:12 -04:00
Yang Wang
8d7ff60f36 drm/amdgpu: refine vpe firmware loading
refine vpe firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:12 -04:00
Yang Wang
b441e9ac9d drm/amdgpu: refine vcn firmware loading
refine vcn firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:12 -04:00
Yang Wang
9817f06173 drm/amdgpu: move aca/mca init functions into ras_init() stage
adjust the function position to better match aca/mca fini code in ras_fini().

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:12 -04:00
Bob Zhou
be6a69b21a drm/amdgpu: fix overflowed constant warning in mmhub_set_clockgating()
To fix potential overflowed constant warning, modify the variables to u32
for getting the return value of RREG32_SOC15().

Signed-off-by: Bob Zhou <bob.zhou@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:12 -04:00
Harish Kasiviswanathan
199d69d5f9 drm/amdgpu: Indicate CU havest info to CP
To achieve full occupancy CP hardware needs to know if CUs in SE are
symmetrically or asymmetrically harvested

v2: Reset is_symmetric_cus for each loop

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:12 -04:00
Lijo Lazar
3a86fdc422 drm/amdgpu: Skip coredump during resets for debug
Skip scheduling coredump when gpu reset is intentionally triggered
through debugfs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:12 -04:00
Yang Wang
3618fa26c8 drm/amdgpu: refine sdma firmware loading
refine sdma firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:12 -04:00
Yang Wang
8cae4b578e drm/amdgpu: refine psp firmware loading
refine psp firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:11 -04:00
Yang Wang
bf349b036d drm/amdgpu: refine mes firmware loading
v1:
refine mes firmware loading

v2:
use dev_info instead of DRM_INFO

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:11 -04:00
Mukul Joshi
3612702852 drm/amdgpu: Add missing locking for MES API calls
Add missing locking at a few places when calling MES APIs to ensure
exclusive access to MES queue.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:11 -04:00
Mario Limonciello
2fe87f54ab drm/amd/display: Set default brightness according to ACPI
Currently, amdgpu will always set up the brightness at 100% when it
loads.  However this is jarring when the BIOS has it previously
programmed to a much lower value.

The ACPI ATIF method includes two members for "ac_level" and "dc_level".
These represent the default values that should be used if the system is
brought up in AC and DC respectively.

Use these values to set up the default brightness when the backlight
device is registered.

v2: squash in ACPI fix

Reviewed-by: Leo Li <sunpeng.li@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:11 -04:00
David (Ming Qiang) Wu
ee3942d9ab drm/amdgpu: drop some kernel messages in VCN code
Similar to commit 813e7d4cd0 where some kernel log
messages are dropped. With this commit, more log
messages in older version of VCN/JPEG code are dropped.

Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:16:51 -04:00
Yang Wang
a777c9d70a drm/amdgpu: refine gpu_info firmware loading
refine gpu_info firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:59 -04:00
Yang Wang
1bfe5e7746 drm/amdgpu: enhance amdgpu_ucode_request() function flexibility
v1:
Adding formatting string feature to improve function flexibility.

v2:
modify macro name to ADMGPU_UCODE_MAX_NAME.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:59 -04:00
Bob Zhou
37f432481d drm/amdgpu: fix the overflowed constant warning for RREG32_SOC15()
To fix potential overflowed constant warning reported by Coverity,
modify the variables to uint32_t.

Signed-off-by: Bob Zhou <bob.zhou@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:59 -04:00
Yunxiang Li
18f2525d31 drm/amdgpu: add lock in amdgpu_gart_invalidate_tlb
We need to take the reset domain lock before flush hdp. We can't put the
lock inside amdgpu_device_flush_hdp itself because it is used during
reset where we already take the write side lock.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:59 -04:00
Yunxiang Li
9c33e5fd4f drm/amdgpu: fix locking scope when flushing tlb
Which method is used to flush tlb does not depend on whether a reset is
in progress or not. We should skip flush altogether if the GPU will get
reset. So put both path under reset_domain read lock.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
2024-06-14 16:15:59 -04:00
Yunxiang Li
ba531117a8 drm/amdgpu: call flush_gpu_tlb directly in gfxhub enable
Here since we are in reset and takes the reset_domain write side lock
already. We can't use the flush tlb helper which tries to take the read
side.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:59 -04:00
Yunxiang Li
c1f9d82b92 drm/amdgpu: use helper in amdgpu_gart_unbind
When amdgpu_gart_invalidate_tlb helper is introduced this part was left
out of the conversion. Avoid the code duplication here.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:59 -04:00
Yunxiang Li
4b0e76e4c1 drm/amdgpu: remove tlb flush in amdgpu_gtt_mgr_recover
At this point the gart is not set up, there's no point to invalidate tlb
here and it could even be harmful.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:58 -04:00
Yunxiang Li
5c0a1cdd17 drm/amdgpu: fix sriov host flr handler
We send back the ready to reset message before we stop anything. This is
wrong. Move it to when we are actually ready for the FLR to happen.

In the current state since we take tens of seconds to stop everything,
it is very likely that host would give up waiting and reset the GPU
before we send ready, so it would be the same as before. But this gets
rid of the hack with reset_domain locking and also let us tell how slow
ready to reset actually is from the host. The ready to reset speed can
be improved later.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:58 -04:00
Yunxiang Li
b3948ad1ac drm/amdgpu: add skip_hw_access checks for sriov
Accessing registers via host is missing the check for skip_hw_access and
the lockdep check that comes with it.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:58 -04:00
Eric Huang
bac640ddb5 drm/amdgpu: add reset source in various cases
To fullfill the reset event description.

Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:58 -04:00
Eric Huang
7bed1df814 drm/amdgpu: fix NULL pointer in amdgpu_reset_get_desc
amdgpu_job_ring may return NULL, which causes kernel NULL
pointer error, using another way to print ring name instead
of ring->name.

Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:58 -04:00
Jesse Zhang
27b500b77b drm/amdgpu: remove dead code in atom_get_src_int
Since the range of align is 0~7, the expression is: align = (attr >> 3) & 7.
In the case of ATOM_ARG_IMM, the code cannot reach the default case.
So there is no need for "break".

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 15:34:10 -04:00
Frank Min
faa64f633c drm/amdgpu: add sdma 7.0 support for copy dcc buffer
1. Add dcc buffer flag for copy buffer
2. Add sdma 7.0 support copy dcc buffer

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 15:22:14 -04:00
Likun Gao
7c85e97083 drm/amdgpu: support for DCC feature
Deal with AMDGPU_GEM_CREATE_GFX12_DCC to set DCC bit
when needed.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 15:21:51 -04:00
Alex Deucher
6b83b94a94 drm/amdgpu: add additional VM bits
Add additional VM PTE bits.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 15:20:56 -04:00
Maxime Ripard
14731a640e
Merge drm/drm-fixes into drm-misc-fixes
Roll -rc3 and current drm/fixes in.

This will also unstuck our for-next branch.

Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-06-14 09:55:46 +02:00
Dave Airlie
1ddaaa2440 amd-drm-next-6.11-2024-06-07:
amdgpu:
 - DCN 4.0.x support
 - DCN 3.5 updates
 - GC 12.0 support
 - DP MST fixes
 - Cursor fixes
 - MES11 updates
 - MMHUB 4.1 support
 - DML2 Updates
 - DCN 3.1.5 fixes
 - IPS fixes
 - Various code cleanups
 - GMC 12.0 support
 - SDMA 7.0 support
 - SMU 13 updates
 - SR-IOV fixes
 - VCN 5.x fixes
 - MES12 support
 - SMU 14.x updates
 - Devcoredump improvements
 - Fixes for HDP flush on platforms with >4k pages
 - GC 9.4.3 fixes
 - RAS ACA updates
 - Silence UBSAN flex array warnings
 - MMHUB 3.3 updates
 
 amdkfd:
 - Contiguous VRAM allocations
 - GC 12.0 support
 - SDMA 7.0 support
 - SR-IOV fixes
 
 radeon:
 - Backlight workaround for iMac
 - Silence UBSAN flex array warnings
 
 UAPI:
 - GFX12 modifier and DCC support
   Proposed Mesa changes:
   https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29510
 - KFD GFX ALU exceptions
   Proposed ROCdebugger changes:
   08c760622b
   944fe1c141
 - KFD Contiguous VRAM allocation flag
   Proposed ROCr/HIP changes:
   f7b4a26991
   26e8530d05
   1d48f2a1ab
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZmNlVQAKCRC93/aFa7yZ
 2MLSAP9lflyG//+wC9WWX5OjvqFWO6qkhVm1w55xy6TwN9NkqQEA76TqmcNZ6rk1
 4o9RaYpMJQU275FvK1NvwUbl4PPQYAs=
 =cFxt
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.11-2024-06-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.11-2024-06-07:

amdgpu:
- DCN 4.0.x support
- DCN 3.5 updates
- GC 12.0 support
- DP MST fixes
- Cursor fixes
- MES11 updates
- MMHUB 4.1 support
- DML2 Updates
- DCN 3.1.5 fixes
- IPS fixes
- Various code cleanups
- GMC 12.0 support
- SDMA 7.0 support
- SMU 13 updates
- SR-IOV fixes
- VCN 5.x fixes
- MES12 support
- SMU 14.x updates
- Devcoredump improvements
- Fixes for HDP flush on platforms with >4k pages
- GC 9.4.3 fixes
- RAS ACA updates
- Silence UBSAN flex array warnings
- MMHUB 3.3 updates

amdkfd:
- Contiguous VRAM allocations
- GC 12.0 support
- SDMA 7.0 support
- SR-IOV fixes

radeon:
- Backlight workaround for iMac
- Silence UBSAN flex array warnings

UAPI:
- GFX12 modifier and DCC support
  Proposed Mesa changes:
  https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29510
- KFD GFX ALU exceptions
  Proposed ROCdebugger changes:
  08c760622b
  944fe1c141
- KFD Contiguous VRAM allocation flag
  Proposed ROCr/HIP changes:
  f7b4a26991
  26e8530d05
  1d48f2a1ab

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240607195900.902537-1-alexander.deucher@amd.com
2024-06-11 14:01:55 +10:00
Arunpravin Paneer Selvam
31849bf07e drm/amdgpu: Fix the BO release clear memory warning
This happens when the amdgpu_bo_release_notify running
before amdgpu_ttm_set_buffer_funcs_status set the buffer
funcs to enabled.

check the buffer funcs enablement before calling the fill
buffer memory.

v2:(Christian)
  - Apply it only for GEM buffers and since GEM buffers are only
    allocated/freed while the driver is loaded we never run into
    the issue to clear with buffer funcs disabled.

v3:(Mario)
  - drop the stable tag as this will presumably go into a
    -fixes PR for 6.10

Log snip:
*ERROR* Trying to clear memory with ring turned off.
RIP: 0010:amdgpu_bo_release_notify+0x201/0x220 [amdgpu]

Fixes: a68c7eaa7a ("drm/amdgpu: Enable clear page functionality")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Richard Gong <richard.gong@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240610180401.9540-1-Arunpravin.PaneerSelvam@amd.com
2024-06-10 23:46:33 +05:30
Tao Zhou
b95fa494d6 drm/amdgpu: add RAS is_rma flag
Set the flag to true if bad page number reaches threshold.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:25:14 -04:00
Lin.Cao
c8ad1bbbc2 drm/amdgpu: fix failure mapping legacy queue when FLR
Flag "mes.ring.shced.ready" will be set as true after mes hw init and set
as false when mes hw fini to avoid duplicate initialization. But hw fini
will not be called when function level reset, which will cause mes hw
init be skipped during FLR, which will leads to mapping legacy queue
fail. Set this flag as false when post reset will fix this issue.

Signed-off-by: Lin.Cao <lincao12@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:25:14 -04:00
Eric Huang
dbe2c4c8ab drm/amdkfd: add reset cause in gpu pre-reset smi event
reset cause is requested by customer as additional
info for gpu reset smi event.

v2: integerate reset sources suggested by Lijo Lazar

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:25:14 -04:00
Frank Min
978f5428c9 drm/amdgpu: Set PTE_IS_PTE bit for gfx12
Set PTE_IS_PTE bit while PRT is enabled on gfx12.

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:25:14 -04:00
Eric Huang
2656e1ce78 drm/amdgpu: add reset sources in gpu reset context
reset source or reset cause is very useful info
for reset context, it will be used by events API.

Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:25:13 -04:00
Shane Xiao
45bd39fb3b drm/amdgpu: Update the impelmentation of AMDGPU_PTE_MTYPE_VG10
This patch changes the implementation of AMDGPU_PTE_MTYPE_VG10,
clear the bits before setting the new one.

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: longlyao <Longlong.Yao@amd.com>
Signed-off-by: Shane Xiao <shane.xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:25:13 -04:00
Alex Deucher
a9ebd10482 Revert "drm/amdgpu/gfx11: enable gfx pipe1 hardware support"
This reverts commit 6670142d25.

Pierre-Eric reported problems with this on his navi33.  Revert
for now until we understand what is going wrong.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Pierre-eric.Pelloux-prayer@amd.com
2024-06-05 11:03:45 -04:00
Shane Xiao
3928290102 drm/amdgpu: Update the impelmentation of AMDGPU_PTE_MTYPE_NV10
This patch changes the implementation of AMDGPU_PTE_MTYPE_NV10,
clear the bits before setting the new one.

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: longlyao <Longlong.Yao@amd.com>
Signed-off-by: Shane Xiao <shane.xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:02:43 -04:00
Yifan Zhang
301dfbfc84 drm/amdgpu: disable lane0 L1TLB and enable lane1 L1TLB
This patch to disable lane0 L1TLB and enable lane1 L1TLB.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:02:37 -04:00
Yifan Zhang
b7e2170b87 drm/amdgpu: init SAW registers for mmhub v3.3
This patch to configure mmhub3.3 SAW registers

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:02:30 -04:00
Sunil Khatri
a1a049bd59 drm/amdgpu: fix comments and error message for ipdump
Fix comments and error messages to rightly represent
the information.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:02:09 -04:00
Sunil Khatri
33837d62a4 drm/amdgpu: rename ip_dump_cp_queues to compute queues
Rename the variable ip_dump_cp_queues to ip_dump_compute_queue
as it represent compute queues.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:02:03 -04:00
Sunil Khatri
34b8d94b6c drm/amdgpu: add cp queue registers for gfx9 ipdump
Add gfx9 support of CP queue registers for all queues
to be used by devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:01:57 -04:00
Sunil Khatri
173ef9182a drm/amdgpu: add print support for gfx9 ipdump
Add support of gfx9 ipdump print so devcoredump
could trigger it to dump the captured registers
in devcoredump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:01:50 -04:00
Sunil Khatri
514dc965b2 drm/amdgpu: add gfx9 register support in ipdump
Add general registers of gfx9 in ipdump for
devcoredump support.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:01:28 -04:00
Srinivasan Shanmugam
745f7170db drm/amdgpu: Fix type mismatch in amdgpu_gfx_kiq_init_ring
This commit fixes a type mismatch in the amdgpu_gfx_kiq_init_ring
function triggered by the snprintf function expecting unsigned char
arguments due to the '%hhu' format specifier, but receiving int and u32
arguments.

The issue occurred because the arguments xcc_id, ring->me, ring->pipe,
and ring->queue were of type int and u32, not unsigned char. This led to
a type mismatch when these arguments were passed to snprintf.

To resolve this, the snprintf line was modified to cast these arguments
to unsigned char. This ensures that the arguments are of the correct
type for the '%hhu' format specifier and resolves the warning.

Fixes the below:
>> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:4: warning: format
>> specifies type 'unsigned char' but the argument has type 'int'
>> [-Wformat]
                    xcc_id, ring->me, ring->pipe, ring->queue);
                    ^~~~~~
>> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:12: warning: format
>> specifies type 'unsigned char' but the argument has type 'u32' (aka
>> 'unsigned int') [-Wformat]
                    xcc_id, ring->me, ring->pipe, ring->queue);
                            ^~~~~~~~
   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:22: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat]
                    xcc_id, ring->me, ring->pipe, ring->queue);
                                      ^~~~~~~~~~
   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:34: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat]
                    xcc_id, ring->me, ring->pipe, ring->queue);
                                                  ^~~~~~~~~~~
   4 warnings generated.

Fixes: 0ea5544555 ("drm/amdgpu: Fix snprintf usage in amdgpu_gfx_kiq_init_ring")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405250446.XeaWe66u-lkp@intel.com/
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 10:58:43 -04:00
Jesse Zhang
c09d2eff81 drm/amdgu: fix Unintentional integer overflow for mall size
Potentially overflowing expression mall_size_per_umc * adev->gmc.num_umc with type unsigned int (32 bits, unsigned)
is evaluated using 32-bit arithmetic,and then used in a context that expects an expression of type u64 (64 bits, unsigned).

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 10:58:36 -04:00
Hawking Zhang
a474161e84 drm/amdgpu: Update programming for boot error reporting
AMDGPU_RAS_GPU_ERR_BOOT_STATUS field is no longer valid.
The polling sequence is also simplifed according to
the latest firmware change.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 10:58:26 -04:00
Alex Deucher
7d3b9668e6 drm/amdgpu/soc24: use common nbio callback to set remap offset
This fixes HDP flushes on systems with non-4K pages.

Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 10:58:10 -04:00
Hawking Zhang
473af28d3e drm/amdgpu: Estimate RAS reservation when report capacity v2
Add estimate of how much vram we need to reserve for RAS
when caculating the total available vram.

v2: apply the change to MP0 v13_0_2 and v13_0_14

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 10:57:53 -04:00
Tao Zhou
76bec2a031 drm/amdgpu: use u32 for buf size in __amdgpu_eeprom_xfer
And also make sure the value of msg[1].len should be in the range of u16.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 10:57:47 -04:00
David (Ming Qiang) Wu
813e7d4cd0 drm/amdgpu: drop some kernel messages in VCN code
We have messages when the VCN fails to initialize and
there is no need to report on success.
Also PSP loading FWs is the default for production.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sonny Jiang <sonjiang@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 10:57:34 -04:00
Shane Xiao
eba791dc17 drm/amdgpu: Update the impelmentation of AMDGPU_PTE_MTYPE_GFX12
This patch changes the implementation of AMDGPU_PTE_MTYPE_GFX12,
clear the bits before setting the new one.
This fixed the potential issue that GFX12 setting memory to NC.

v2: Clear mtype field before setting the new one (Alex)
v3: Fix typo (Felix)

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: longlyao <Longlong.Yao@amd.com>
Signed-off-by: Shane Xiao <shane.xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 10:57:24 -04:00
Dave Airlie
bb61cf46b6 amd-drm-fixes-6.10-2024-05-30:
amdgpu:
 - RAS fix
 - Fix colorspace property for MST connectors
 - Fix for PCIe DPM
 - Silence UBSAN warning
 - GPUVM robustness fix
 - Partition fix
 - Drop deprecated I2C_CLASS_SPD
 
 amdkfd:
 - Revert unused changes for certain 11.0.3 devices
 - Simplify APU VRAM handling
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZljezAAKCRC93/aFa7yZ
 2PZpAP43w/Q7aNgEpzdViO6QckU2GtOaRskavnzew4yxd5Y4xAD/U4vEQ3V7Rc0Q
 XRv4c+rJAVXIlfcS4guLvHSTyqzAPQY=
 =WQzB
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-fixes-6.10-2024-05-30' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.10-2024-05-30:

amdgpu:
- RAS fix
- Fix colorspace property for MST connectors
- Fix for PCIe DPM
- Silence UBSAN warning
- GPUVM robustness fix
- Partition fix
- Drop deprecated I2C_CLASS_SPD

amdkfd:
- Revert unused changes for certain 11.0.3 devices
- Simplify APU VRAM handling

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240530202316.2246826-1-alexander.deucher@amd.com
2024-05-31 08:38:15 +10:00
Rajneesh Bhardwaj
a9bc5a19e4 drm/amdgpu: Make CPX mode auto default in NPS4
On GFXIP9.4.3, make CPX mode as the default compute mode if the node is
setup in NPS4 memory partition mode. This change is only applicable for
dGPU, for APU, continue to use TPX mode.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 17:06:48 -04:00
Alex Deucher
1f327dfc84 drm/amdkfd: simplify APU VRAM handling
With commit 89773b8559
("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs")
big and small APU "VRAM" handling in KFD was unified.  Since AMD_IS_APU
is set for both big and small APUs, we can simplify the checks in
the code.

v2: clean up a few more places (Lang)

Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 17:06:15 -04:00
Jesse Zhang
a0cf36546c drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent
The pointer parent may be NULLed by the function amdgpu_vm_pt_parent.
To make the code more robust, check the pointer parent.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 17:03:20 -04:00
Alex Deucher
ba46b3bda2 drm/amdgpu: Adjust logic in amdgpu_device_partner_bandwidth()
Use current speed/width on devices which don't support
dynamic PCIe switching.

Fixes: 466a7d1153 ("drm/amd: Use the first non-dGPU PCI device for BW limits")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3289
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 17:01:49 -04:00
Alex Deucher
6670142d25 drm/amdgpu/gfx11: enable gfx pipe1 hardware support
Enable gfx pipe1 hardware support.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:48:30 -04:00
Rajneesh Bhardwaj
ef5c0f897e drm/amdgpu: Make CPX mode auto default in NPS4
On GFXIP9.4.3, make CPX mode as the default compute mode if the node is
setup in NPS4 memory partition mode. This change is only applicable for
dGPU, for APU, continue to use TPX mode.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:48:30 -04:00
Alex Deucher
2e216b1e6b drm/amdgpu/gfx11: handle priority setup for gfx pipe1
Set up pipe1 as a high priority queue.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:48:30 -04:00
Alex Deucher
eab57bf22f drm/amdgpu/gfx11: select HDP ref/mask according to gfx ring pipe
Use correct ref/mask for differnent gfx ring pipe. Ported from
ZhenGuo's patch for gfx10.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:48:30 -04:00
Victor Skvortsov
e864180ee4 drm/amdgpu: Add lock around VF RLCG interface
flush_gpu_tlb may be called from another thread while
device_gpu_recover is running.

Both of these threads access registers through the VF
RLCG interface during VF Full Access. Add a lock around this interface
to prevent race conditions between these threads.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:48:30 -04:00
Alex Deucher
7978c4d414 drm/amdkfd: simplify APU VRAM handling
With commit 89773b8559
("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs")
big and small APU "VRAM" handling in KFD was unified.  Since AMD_IS_APU
is set for both big and small APUs, we can simplify the checks in
the code.

v2: clean up a few more places (Lang)

Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:48:30 -04:00
Yang Wang
3c603b1fa8 drm/amdgpu: fix typo in amdgpu_ras_aca_sysfs_read() function
fix typo "info.ue_count" in amdgpu_ras_aca_sysfs_read() function.

Fixes: 865d339763 ("drm/amdgpu: add aca deferred error type support")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:09:48 -04:00
Jesse Zhang
511a623fb4 drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent
The pointer parent may be NULLed by the function amdgpu_vm_pt_parent.
To make the code more robust, check the pointer parent.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:09:41 -04:00
David (Ming Qiang) Wu
ab47fa8358 drm/amd/amdgpu: add AMD_PG_SUPPORT_VCN_DPG flag
AMD_PG_SUPPORT_VCN_DPG is needed for secure parts
and should/can be enabled by now.

Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Sonny Jiang <sonjiang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:09:35 -04:00
Alex Deucher
f2a1fbdd1f drm/amdgpu: drop MES 10.1 support v3
It was an enablement vehicle for MES 11 and was never
productized.  Remove it.

v2: drop additional checks in the GFX10 code.
v3: drop mes_api_def.h

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:09:01 -04:00
Alex Deucher
87dfeb47a5 drm/amdgpu: Adjust logic in amdgpu_device_partner_bandwidth()
Use current speed/width on devices which don't support
dynamic PCIe switching.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3289
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:08:44 -04:00
Maxime Ripard
375c4d1583
Merge drm/drm-next into drm-misc-next
Let's start the new release cycle.

Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-05-27 11:08:31 +02:00
Linus Torvalds
56fb6f9285 drm fixes for 6.10-rc1
nouveau:
 - fix bo metadata uAPI for vm bind
 
 panthor:
 - Fixes for panthor's heap logical block.
 - Reset on unrecoverable fault
 - Fix VM references.
 - Reset fix.
 
 xlnx:
 - xlnx compile and doc fixes.
 
 amdgpu:
 - Handle vbios table integrated info v2.3
 
 amdkfd:
 - Handle duplicate BOs in reserve_bo_and_cond_vms
 - Handle memory limitations on small APUs
 
 dp/mst:
 - MST null deref fix.
 
 bridge:
 - Don't let next bridge create connector in adv7511 to make probe work.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmZQ9rAACgkQDHTzWXnE
 hr7kew//e+OwUJcock4rqpMsXrVwgse4bJ1H7gnHLmN/AQKsXWcaHKixJu+WkkQI
 62gyhOEiSH+J4Je6zi5JbIt9AKgIKyZeJqg/QeSMqUFJzYkPKhrhj0aX4AefuNJB
 dxCI1+UnkNdQ/LM+Dwb4sq757FX0/yPqhzBiytWEJJVHP4Th5SMRFmSs8YsT1Z4F
 bn220QFvPv+KrVV2eKEf6plJqYhs92nT8cVkNw8x9LhnivLRhyh7we3Ew6R2zouV
 Izm1nYx1CtTqaiHo3+UOeqRLzhqpmukzmJyML9zk/qt+s+YMj9yI/ie02EnTPqg6
 NQCqqmoDy1BpUa37N486Q7MFTiS076HMaQpYiDmVz3PIWWPvKZDqcC5vonoh99ra
 b88X4J4w3e747xZP7VDsnAOFsaLgcRah4JNtv8H3rcurbz0FrcV/0g/W/kt+Nenc
 k2m2s2hv56yWXd6B7bd3tfUe5dOwxc+CgiNddM5X3WPenEvnH+DKBOoFojSRl2o7
 LpRL7WhY6fceRUqVrHzfIPLYgdfz/Ho9LYST2jGIdKDnwV0Hw0sNfU9C0KOzm0J6
 rjgGEeMHgVdZPzSeXSmaO82mBRvR7Ke6da7FIazbocledOapsB5qxjhLvkRyULAp
 mlGZPRPJdNVM8slNLxEqrT/ugorIo5owtQdEnJmDJXFdozcLTMY=
 =pexB
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Some fixes for the end of the merge window, mostly amdgpu and panthor,
  with one nouveau uAPI change that fixes a bad decision we made a few
  months back.

  nouveau:
   - fix bo metadata uAPI for vm bind

  panthor:
   - Fixes for panthor's heap logical block.
   - Reset on unrecoverable fault
   - Fix VM references.
   - Reset fix.

  xlnx:
   - xlnx compile and doc fixes.

  amdgpu:
   - Handle vbios table integrated info v2.3

  amdkfd:
   - Handle duplicate BOs in reserve_bo_and_cond_vms
   - Handle memory limitations on small APUs

  dp/mst:
   - MST null deref fix.

  bridge:
   - Don't let next bridge create connector in adv7511 to make probe
     work"

* tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel:
  drm/amdgpu/atomfirmware: add intergrated info v2.3 table
  drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2
  drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs
  drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms
  drm/bridge: adv7511: Attach next bridge without creating connector
  drm/buddy: Fix the warn on's during force merge
  drm/nouveau: use tile_mode and pte_kind for VM_BIND bo allocations
  drm/panthor: Call panthor_sched_post_reset() even if the reset failed
  drm/panthor: Reset the FW VM to NULL on unplug
  drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level
  drm/panthor: Force an immediate reset on unrecoverable faults
  drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints
  drm/panthor: Fix an off-by-one in the heap context retrieval logic
  drm/panthor: Relax the constraints on the tiler chunk size
  drm/panthor: Make sure the tiler initial/max chunks are consistent
  drm/panthor: Fix tiler OOM handling to allow incremental rendering
  drm: xlnx: zynqmp_dpsub: Fix compilation error
  drm: xlnx: zynqmp_dpsub: Fix few function comments
2024-05-24 17:28:02 -07:00
Hawking Zhang
ec58991054 drm/amdgpu: correct hbm field in boot status
hbm filed takes bit 13 and bit 14 in boot status.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 18:13:23 -04:00
Sunil Khatri
498906d376 drm/amdgpu: add gfx queue support for gfx11 ipdump
Add support of all the CP GFX queues for gfx11 ipdump
to be used by devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:14:08 -04:00
Sunil Khatri
368c33ac8a drm/amdgpu: add cp queue registers for gfx11 ipdump
Add gfx11 support of CP queue registers for all queues
to be used by devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:14:00 -04:00
Sunil Khatri
015a04a59e drm/amdgpu: add print support for gfx11 ipdump
Add support of gfx11 ipdump print so devcoredump
could trigger it to dump the captured registers
in devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:13:48 -04:00
Sunil Khatri
b5812822d9 drm/amdgpu: add gfx11 registers support in ipdump
Add general registers of gfx11 in ipdump for
devcoredump support.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:13:41 -04:00
Sunil Khatri
836bc350a5 drm/amdgpu: add more device info to the devcoredump
Adding more device information:
a. PCI info
b. VRAM and GTT info
c. GDC config

Also correct the print layout and section information for
in devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:13:28 -04:00
Sunil Khatri
29b1fc665c drm/amdgpu: add prints in IP State dump
add prints before and after ip state is dumped.
It avoids user to think of system being
stuck/hung as dump could take some time after a
gpu hang.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:13:20 -04:00
Sunil Khatri
8444453dce drm/amdgpu: add gfx queue support of gfx10 in ipdump
Add gfx queue register for all instances in devcoredump
for gfx10.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:13:13 -04:00
Sunil Khatri
0f83227bc8 drm/amdgpu: Add cp queues support fro gfx10 in ipdump
Add support to dump registers of all instances of
cp queue registers of gfx10 to devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:13:06 -04:00
Sunil Khatri
74feef5667 drm/amdgpu: rename the ip_dump to ip_dump_core
Rename the memory pointer from ip_dump to ip_dump_core
to make it specific to core registers and rest other
registers to be dumped in their respective memories.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:12:58 -04:00
Lijo Lazar
621a4e9efb drm/amdgpu: Add CRC16 selection in config
KFD uses crc16 for gpu_id generation.

Fixes: 3ed181b8ff ("drm/amdkfd: Ensure gpu_id is unique")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405211405.TidTWIBX-lkp@intel.com/
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:11:52 -04:00
Victor Zhao
e21e0b7824 drm/amd/amdgpu: fix the inst passed to amdgpu_virt_rlcg_reg_rw
the inst passed to amdgpu_virt_rlcg_reg_rw should be physical instance.
Fix the miss matched code.

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:11:38 -04:00
Jane Jian
f889f9c68b drm/amdgpu - optimize rlc spm cntl
v1
- driver MMIO read the register to check whether write is required
- if write is required, sriov full time to use rlcg, otherwise use KIQ

v2
- include gfx v11 sriov runtime case

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:11:31 -04:00
Hawking Zhang
cf85764e2b drm/amdgpu: correct hbm field in boot status
hbm filed takes bit 13 and bit 14 in boot status.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:11:12 -04:00
Frank Min
8d7b149675 drm/amdgpu: program device_cntl2 through pci cfg space
device_cntl2 is accessible from pci config space, so program it through pci cfg
space instead of mmio.

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:11:05 -04:00
Li Ma
19f0edd897 drm/amdgpu/atomfirmware: add intergrated info v2.3 table
[Why]
The vram width value is 0.
Because the integratedsysteminfo table in VBIOS has updated to 2.3.

[How]
Driver needs a new intergrated info v2.3 table too.
Then the vram width value will be correct.

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:10:48 -04:00
Srinivasan Shanmugam
0ea5544555 drm/amdgpu: Fix snprintf usage in amdgpu_gfx_kiq_init_ring
This commit fixes a format truncation issue arosed by the snprintf
function potentially writing more characters into the ring->name buffer
than it can hold, in the amdgpu_gfx_kiq_init_ring function

The issue occurred because the '%d' format specifier could write between
1 and 10 bytes into a region of size between 0 and 8, depending on the
values of xcc_id, ring->me, ring->pipe, and ring->queue. The snprintf
function could output between 12 and 41 bytes into a destination of size
16, leading to potential truncation.

To resolve this, the snprintf line was modified to use the '%hhu' format
specifier for xcc_id, ring->me, ring->pipe, and ring->queue. The '%hhu'
specifier is used for unsigned char variables and ensures that these
values are printed as unsigned decimal integers.

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_kiq_init_ring’:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:61: warning: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size between 0 and 8 [-Wformat-truncation=]
  332 |         snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d",
      |                                                             ^~
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:50: note: directive argument in the range [0, 2147483647]
  332 |         snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d",
      |                                                  ^~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:9: note: ‘snprintf’ output between 12 and 41 bytes into a destination of size 16
  332 |         snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d",
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  333 |                  xcc_id, ring->me, ring->pipe, ring->queue);
      |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: 345a36c4f1 ("drm/amdgpu: prefer snprintf over sprintf")
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:10:03 -04:00
Jesse Zhang
806e8c5579 drm/amdgpu: fix invadate operation for pg_flags
Since the type of pg_flags is u32, adev->pg_flags >> 16 >> 16 is 0
regardless of the values of its operands.

So removing the operations upper_32_bits and lower_32_bits.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:09:50 -04:00
Jack Xiao
fa10408116 drm/amdgpu/mes12: mes hw_fini fix for mode1 reset
Port mes11 hw_fini to mes12, fix for mode1 reset.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:09:42 -04:00
Jesse Zhang
64da71ea76 drm/amdgpu: fix invadate operation for umsch
Since the type of data_size is uint32_t, adev->umsch_mm.data_size - 1 >> 16 >> 16 is 0
regardless of the values of its operands

So removing the operations upper_32_bits and lower_32_bits.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:09:35 -04:00
Jesse Zhang
030ffd4d43 drm/admgpu: fix dereferencing null pointer context
When user space sets an invalid ta type, the pointer context will be empty.
So it need to check the pointer context before using it

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:09:15 -04:00
Yang Wang
9262f411dc drm/amdgpu: skip to create ras xxx_err_count node when ACA is enabled
skip to create 'xxx_err_count' node when ACA is enabled.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:08:25 -04:00
Jani Nikula
42505ab120 drm/amdgpu: remove amdgpu_connector_edid() and stop using edid_blob_ptr
amdgpu_connector_edid() copies the EDID from edid_blob_ptr as a side
effect if amdgpu_connector->edid isn't initialized. However, everywhere
that the returned EDID is used, the EDID should have been set
beforehands.

Only the drm EDID code should look at the EDID property, anyway, so stop
using it.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Pan
Cc: amd-gfx@lists.freedesktop.org
Reviewed-by: Robert Foss <rfoss@kernel.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1463862965d76e9458551598fd4d287a08d3d264.1715353572.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2024-05-23 14:37:24 +03:00
Steven Rostedt (Google)
2c92ca849f tracing/treewide: Remove second parameter of __assign_str()
With the rework of how the __string() handles dynamic strings where it
saves off the source string in field in the helper structure[1], the
assignment of that value to the trace event field is stored in the helper
value and does not need to be passed in again.

This means that with:

  __string(field, mystring)

Which use to be assigned with __assign_str(field, mystring), no longer
needs the second parameter and it is unused. With this, __assign_str()
will now only get a single parameter.

There's over 700 users of __assign_str() and because coccinelle does not
handle the TRACE_EVENT() macro I ended up using the following sed script:

  git grep -l __assign_str | while read a ; do
      sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file;
      mv /tmp/test-file $a;
  done

I then searched for __assign_str() that did not end with ';' as those
were multi line assignments that the sed script above would fail to catch.

Note, the same updates will need to be done for:

  __assign_str_len()
  __assign_rel_str()
  __assign_rel_str_len()

I tested this with both an allmodconfig and an allyesconfig (build only for both).

[1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/

Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts.
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for
Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>	# xfs
Tested-by: Guenter Roeck <linux@roeck-us.net>
2024-05-22 20:14:47 -04:00
Li Ma
e64e8f7c17 drm/amdgpu/atomfirmware: add intergrated info v2.3 table
[Why]
The vram width value is 0.
Because the integratedsysteminfo table in VBIOS has updated to 2.3.

[How]
Driver needs a new intergrated info v2.3 table too.
Then the vram width value will be correct.

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-05-22 12:01:39 -04:00
Linus Torvalds
f0bae243b2 pci-v6.10-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmZLzNIUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwr/Q//STe2XGKI8bAKqP2wbbkzm+ISnK4A
 Lqf3FEAIXunxDRspszfXKKV2p4vaIkmOFiwIdtp/kWvd0DQn5+ATXJ/iQtp8aFX/
 R+6BQ7EZc2G7fN5fbQuK54+CvmWEpkKEMbXYbd6ivQ14Cijdb3Nbu+w+DYFjS+6C
 k2a9lS1bTW7Xcy0fyiO1w6GQiWqtmOH8U3OlQtIrI0EVkDG9OG1LsLuc92/FgkOo
 REN+sU+hX1K5fHrvm2CtjYDn/9/B6bJ/It22H1dPgUL9nKvKC67fYzosMtUCOX1M
 6XSPjZIuXOmQGeZXHhpSlVwaidxoUjYO98I7nMquxKdCy6yct3geK7ULG/xeQCgD
 ML7MGQB4+sTiSWalXUQaziKqF1FIDEvU3HMGXFWnoBL5l56eRp8KS1EI9Eqk9pU3
 pk9fJaCkcFnkzPtMFzqPOm5q9zUZ6bGbfYb0hs72TUKplmVDhFo2T1YsW2AOyHZ7
 mjuDzUYZX0H7uM1tntA56IgZX+oNOrLvhBt5L5M/BQeCsZFBBUfIcAEaYoL9LwXO
 AYgIG3jdqzHHyAUzutJF+XHKinJLMHm0XVYbFmO6saPhFzrUJSNHqT7NzW1DGGTl
 OnO8e1WNMX1EcnKvnc6fXyGmM3SgVwy45FsbG/zRnhn4uBKqKtjrh6uX/myA22LK
 CSeqSUK9XmXxFNA=
 =xjoS
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Skip E820 checks for MCFG ECAM regions for new (2016+) machines,
     since there's no requirement to describe them in E820 and some
     platforms require ECAM to work (Bjorn Helgaas)

   - Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX to be more specific (Damien
     Le Moal)

   - Remove last user and pci_enable_device_io() (Heiner Kallweit)

   - Wait for Link Training==0 to avoid possible race (Ilpo Järvinen)

   - Skip waiting for devices that have been disconnected while
     suspended (Ilpo Järvinen)

   - Clear Secondary Status errors after enumeration since Master Aborts
     and Unsupported Request errors are an expected part of enumeration
     (Vidya Sagar)

  MSI:

   - Remove unused IMS (Interrupt Message Store) support (Bjorn Helgaas)

  Error handling:

   - Mask Genesys GL975x SD host controller Replay Timer Timeout
     correctable errors caused by a hardware defect; the errors cause
     interrupts that prevent system suspend (Kai-Heng Feng)

   - Fix EDR-related _DSM support, which previously evaluated revision 5
     but assumed revision 6 behavior (Kuppuswamy Sathyanarayanan)

  ASPM:

   - Simplify link state definitions and mask calculation (Ilpo
     Järvinen)

  Power management:

   - Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports, where BIOS
     apparently doesn't know how to put them back in D0 (Mario
     Limonciello)

  CXL:

   - Support resetting CXL devices; special handling required because
     CXL Ports mask Secondary Bus Reset by default (Dave Jiang)

  DOE:

   - Support DOE Discovery Version 2 (Alexey Kardashevskiy)

  Endpoint framework:

   - Set endpoint BAR to be 64-bit if the driver says that's all the
     device supports, in addition to doing so if the size is >2GB
     (Niklas Cassel)

   - Simplify endpoint BAR allocation and setting interfaces (Niklas
     Cassel)

  Cadence PCIe controller driver:

   - Drop DT binding redundant msi-parent and pci-bus.yaml (Krzysztof
     Kozlowski)

  Cadence PCIe endpoint driver:

   - Configure endpoint BARs to be 64-bit based on the BAR type, not the
     BAR value (Niklas Cassel)

  Freescale Layerscape PCIe controller driver:

   - Convert DT binding to YAML (Frank Li)

  MediaTek MT7621 PCIe controller driver:

   - Add DT binding missing 'reg' property for child Root Ports
     (Krzysztof Kozlowski)

   - Fix theoretical string truncation in PHY name (Sergio Paracuellos)

  NVIDIA Tegra194 PCIe controller driver:

   - Return success for endpoint probe instead of falling through to the
     failure path (Vidya Sagar)

  Renesas R-Car PCIe controller driver:

   - Add DT binding missing IOMMU properties (Geert Uytterhoeven)

   - Add DT binding R-Car V4H compatible for host and endpoint mode
     (Yoshihiro Shimoda)

  Rockchip PCIe controller driver:

   - Configure endpoint BARs to be 64-bit based on the BAR type, not the
     BAR value (Niklas Cassel)

   - Add DT binding missing maxItems to ep-gpios (Krzysztof Kozlowski)

   - Set the Subsystem Vendor ID, which was previously zero because it
     was masked incorrectly (Rick Wertenbroek)

  Synopsys DesignWare PCIe controller driver:

   - Restructure DBI register access to accommodate devices where this
     requires Refclk to be active (Manivannan Sadhasivam)

   - Remove the deinit() callback, which was only need by the
     pcie-rcar-gen4, and do it directly in that driver (Manivannan
     Sadhasivam)

   - Add dw_pcie_ep_cleanup() so drivers that support PERST# can clean
     up things like eDMA (Manivannan Sadhasivam)

   - Rename dw_pcie_ep_exit() to dw_pcie_ep_deinit() to make it parallel
     to dw_pcie_ep_init() (Manivannan Sadhasivam)

   - Rename dw_pcie_ep_init_complete() to dw_pcie_ep_init_registers() to
     reflect the actual functionality (Manivannan Sadhasivam)

   - Call dw_pcie_ep_init_registers() directly from all the glue
     drivers, not just those that require active Refclk from the host
     (Manivannan Sadhasivam)

   - Remove the "core_init_notifier" flag, which was an obscure way for
     glue drivers to indicate that they depend on Refclk from the host
     (Manivannan Sadhasivam)

  TI J721E PCIe driver:

   - Add DT binding J784S4 SoC Device ID (Siddharth Vadapalli)

   - Add DT binding J722S SoC support (Siddharth Vadapalli)

  TI Keystone PCIe controller driver:

   - Add DT binding missing num-viewport, phys and phy-name properties
     (Jan Kiszka)

  Miscellaneous:

   - Constify and annotate with __ro_after_init (Heiner Kallweit)

   - Convert DT bindings to YAML (Krzysztof Kozlowski)

   - Check for kcalloc() failure in of_pci_prop_intr_map() (Duoming
     Zhou)"

* tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits)
  PCI: Do not wait for disconnected devices when resuming
  x86/pci: Skip early E820 check for ECAM region
  PCI: Remove unused pci_enable_device_io()
  ata: pata_cs5520: Remove unnecessary call to pci_enable_device_io()
  PCI: Update pci_find_capability() stub return types
  PCI: Remove PCI_IRQ_LEGACY
  scsi: vmw_pvscsi: Do not use PCI_IRQ_LEGACY instead of PCI_IRQ_LEGACY
  scsi: pmcraid: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: mpt3sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: megaraid_sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: ipr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: hpsa: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: arcmsr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  wifi: rtw89: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  dt-bindings: PCI: rockchip,rk3399-pcie: Add missing maxItems to ep-gpios
  Revert "genirq/msi: Provide constants for PCI/IMS support"
  Revert "x86/apic/msi: Enable PCI/IMS"
  Revert "iommu/vt-d: Enable PCI/IMS"
  Revert "iommu/amd: Enable PCI/IMS"
  Revert "PCI/MSI: Provide IMS (Interrupt Message Store) support"
  ...
2024-05-21 10:09:28 -07:00
Lang Yu
eb853413d0 drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs
Small APUs(i.e., consumer, embedded products) usually have a small
carveout device memory which can't satisfy most compute workloads
memory allocation requirements.

We can't even run a Basic MNIST Example with a default 512MB carveout.
https://github.com/pytorch/examples/tree/main/mnist. Error Log:

"torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate
84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes
is free. Of the allocated memory 103.83 MiB is allocated by PyTorch,
and 22.17 MiB is reserved by PyTorch but unallocated"

Though we can change BIOS settings to enlarge carveout size,
which is inflexible and may bring complaint. On the other hand,
the memory resource can't be effectively used between host and device.

The solution is MI300A approach, i.e., let VRAM allocations go to GTT.
Then device and host can flexibly and effectively share memory resource.

v2: Report local_mem_size_private as 0. (Felix)

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-20 17:17:53 -04:00
Lang Yu
2a705f3e49 drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms
Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used.
Two attachments use the same VM, root PD would be locked twice.

[   57.910418] Call Trace:
[   57.793726]  ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu]
[   57.793820]  amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x6c/0x1c0 [amdgpu]
[   57.793923]  ? idr_get_next_ul+0xbe/0x100
[   57.793933]  kfd_process_device_free_bos+0x7e/0xf0 [amdgpu]
[   57.794041]  kfd_process_wq_release+0x2ae/0x3c0 [amdgpu]
[   57.794141]  ? process_scheduled_works+0x29c/0x580
[   57.794147]  process_scheduled_works+0x303/0x580
[   57.794157]  ? __pfx_worker_thread+0x10/0x10
[   57.794160]  worker_thread+0x1a2/0x370
[   57.794165]  ? __pfx_worker_thread+0x10/0x10
[   57.794167]  kthread+0x11b/0x150
[   57.794172]  ? __pfx_kthread+0x10/0x10
[   57.794177]  ret_from_fork+0x3d/0x60
[   57.794181]  ? __pfx_kthread+0x10/0x10
[   57.794184]  ret_from_fork_asm+0x1b/0x30

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-20 17:17:53 -04:00
Tvrtko Ursulin
9c1a429217 drm/amdgpu: Fix amdgpu_vm_is_bo_always_valid kerneldoc
Align kerneldoc with the function argument name.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 26e20235ce ("drm/amdgpu: Add amdgpu_bo_is_vm_bo helper")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-20 16:20:26 -04:00
Dr. David Alan Gilbert
191ef65b4e drm/amdgpu: remove unused struct 'hqd_registers'
'hqd_registers' used to be used in a member of the 'bonaire_mqd'
struct. 'bonaire_mqd' was removed by
commit 486d807cd9 ("drm/amdgpu: remove duplicate definition of cik_mqd")
It's now unused.

Remove 'hqd_registers' as well.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-20 16:20:26 -04:00
Tao Zhou
2aadb520bf drm/amdgpu: update type of buf size to u32 for eeprom functions
Avoid overflow issue.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-20 16:20:26 -04:00
Victor Skvortsov
5434bc03f5 drm/amdgpu: Queue KFD reset workitem in VF FED
The guest recovery sequence is buggy in Fatal Error when both
FLR & KFD reset workitems are queued at the same time. In addition,
FLR guest recovery sequence is out of order when PF/VF communication
breaks due to a GPU fatal error

As a temporary work around, perform a KFD style reset (Initiate reset
request from the guest) inside the pf2vf thread on FED.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-20 16:20:25 -04:00
Victor Skvortsov
3a19a8af64 drm/amdgpu: Extend KIQ reg polling wait for VF
Runtime KIQ interface to read/write registers in VF may take longer
than expected for BM environment. Extend the timeout.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-20 16:20:25 -04:00
Linus Torvalds
ff9a79307f Kbuild updates for v6.10
- Avoid 'constexpr', which is a keyword in C23
 
  - Allow 'dtbs_check' and 'dt_compatible_check' run independently of
    'dt_binding_check'
 
  - Fix weak references to avoid GOT entries in position-independent
    code generation
 
  - Convert the last use of 'optional' property in arch/sh/Kconfig
 
  - Remove support for the 'optional' property in Kconfig
 
  - Remove support for Clang's ThinLTO caching, which does not work with
    the .incbin directive
 
  - Change the semantics of $(src) so it always points to the source
    directory, which fixes Makefile inconsistencies between upstream and
    downstream
 
  - Fix 'make tar-pkg' for RISC-V to produce a consistent package
 
  - Provide reasonable default coverage for objtool, sanitizers, and
    profilers
 
  - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc.
 
  - Remove the last use of tristate choice in drivers/rapidio/Kconfig
 
  - Various cleanups and fixes in Kconfig
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmZFlGcVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsG8voQALC8NtFpduWVfLRj2Qg6Ll/xf1vX
 2igcTJEOFHkeqXLGoT8dTDKLEipUBUvKyguPq66CGwVTe2g6zy/nUSXeVtFrUsIa
 msLTi8FqhqUo5lodNvGMRf8qqmuqcvnXoiQwIocF92jtsFy14bhiFY+n4HfcFNjj
 GOKwqBZYQUwY/VVb090efc7RfS9c7uwABJSBelSoxg3AGZriwjGy7Pw5aSKGgVYi
 inqL1eR6qwPP6z7CgQWM99soP+zwybFZmnQrsD9SniRBI4rtAat8Ih5jQFaSUFUQ
 lk2w0NQBRFN88/uR2IJ2GWuIlQ74WeJ+QnCqVuQ59tV5zw90wqSmLzngfPD057Dv
 JjNuhk0UyXVtpIg3lRtd4810ppNSTe33b9OM4O2H846W/crju5oDRNDHcflUXcwm
 Rmn5ho1rb5QVzDVejJbgwidnUInSgJ9PZcvXQ/RJVZPhpgsBzAY9pQexG1G3hviw
 y9UDrt6KP6bF9tHjmolmtdIes9Pj0c4dN6/Rdj4HS4hIQ/GDar0tnwvOvtfUctNL
 orJlBsA6GeMmDVXKkR0ytOCWRYqWWbyt8g70RVKQJfuHX7/hGyAQPaQ2/u4mQhC2
 aevYfbNJMj0VDfGz81HDBKFtkc5n+Ite8l157dHEl2LEabkOkRdNVcn7SNbOvZmd
 ZCSnZ31h7woGfNho
 =D5B/
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Avoid 'constexpr', which is a keyword in C23

 - Allow 'dtbs_check' and 'dt_compatible_check' run independently of
   'dt_binding_check'

 - Fix weak references to avoid GOT entries in position-independent code
   generation

 - Convert the last use of 'optional' property in arch/sh/Kconfig

 - Remove support for the 'optional' property in Kconfig

 - Remove support for Clang's ThinLTO caching, which does not work with
   the .incbin directive

 - Change the semantics of $(src) so it always points to the source
   directory, which fixes Makefile inconsistencies between upstream and
   downstream

 - Fix 'make tar-pkg' for RISC-V to produce a consistent package

 - Provide reasonable default coverage for objtool, sanitizers, and
   profilers

 - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc.

 - Remove the last use of tristate choice in drivers/rapidio/Kconfig

 - Various cleanups and fixes in Kconfig

* tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (46 commits)
  kconfig: use sym_get_choice_menu() in sym_check_prop()
  rapidio: remove choice for enumeration
  kconfig: lxdialog: remove initialization with A_NORMAL
  kconfig: m/nconf: merge two item_add_str() calls
  kconfig: m/nconf: remove dead code to display value of bool choice
  kconfig: m/nconf: remove dead code to display children of choice members
  kconfig: gconf: show checkbox for choice correctly
  kbuild: use GCOV_PROFILE and KCSAN_SANITIZE in scripts/Makefile.modfinal
  Makefile: remove redundant tool coverage variables
  kbuild: provide reasonable defaults for tool coverage
  modules: Drop the .export_symbol section from the final modules
  kconfig: use menu_list_for_each_sym() in sym_check_choice_deps()
  kconfig: use sym_get_choice_menu() in conf_write_defconfig()
  kconfig: add sym_get_choice_menu() helper
  kconfig: turn defaults and additional prompt for choice members into error
  kconfig: turn missing prompt for choice members into error
  kconfig: turn conf_choice() into void function
  kconfig: use linked list in sym_set_changed()
  kconfig: gconf: use MENU_CHANGED instead of SYMBOL_CHANGED
  kconfig: gconf: remove debug code
  ...
2024-05-18 12:39:20 -07:00
Yang Wang
062a7ce676 drm/amdgpu: fix ACA no query result after gpu reset
fix ACA no query result after gpu reset.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:39 -04:00
Lijo Lazar
546e6309d1 drm/amd/pm: Remove legacy interface for xgmi plpd
Replace the legacy interface with amdgpu_dpm_set_pm_policy to set XGMI
PLPD mode. Also, xgmi_plpd_policy sysfs node is not used by any client.
Remove that as well.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:39 -04:00
Yang Wang
258ed689bc drm/amdgpu: change bank cache lock type to spinlock
modify the lock type to 'spinlock' to avoid schedule issue
in interrupt context.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:39 -04:00
Yang Wang
f6bce954f4 drm/amdgpu: change aca bank error lock type to spinlock
modify the lock type to 'spinlock' to avoid schedule issue
in interrupt context.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:38 -04:00
Tvrtko Ursulin
50bff04d02 drm/amdgpu: Describe all object placements in debugfs
Accurately show all placements when describing objects in debugfs, instead
of bunching them up under the 'CPU' placement.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:38 -04:00
Tvrtko Ursulin
8fb0efb101 drm/amdgpu: Reduce mem_type to domain double indirection
All apart from AMDGPU_GEM_DOMAIN_GTT memory domains map 1:1 to TTM
placements. And the former be either AMDGPU_PL_PREEMPT or TTM_PL_TT,
depending on AMDGPU_GEM_CREATE_PREEMPTIBLE.

Simplify a few places in the code which convert the TTM placement into
a domain by checking against the current placement directly.

In the conversion AMDGPU_PL_PREEMPT either does not have to be handled
because amdgpu_mem_type_to_domain() cannot return that value anyway.

v2:
 * Remove AMDGPU_PL_PREEMPT handling.

v3:
 * Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com> # v1
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> # v2
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:38 -04:00
Tvrtko Ursulin
26e20235ce drm/amdgpu: Add amdgpu_bo_is_vm_bo helper
Help code readability by replacing a bunch of:

bo->tbo.base.resv == vm->root.bo->tbo.base.resv

With:

amdgpu_vm_is_bo_always_valid(vm, bo)

No functional changes.

v2:
 * Rename helper and move to amdgpu_vm. (Christian)

v3:
 * Use Christian's kerneldoc.

v4:
 * Fixed logic inversion in amdgpu_vm_bo_get_memory.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:38 -04:00
Srinivasan Shanmugam
e060c7ba7e drm/amdgpu: Remove duplicate check for *is_queue_unmap in sdma_v7_0_ring_set_wptr
This commit removes a duplicate check for *is_queue_unmap in the
sdma_v7_0_ring_set_wptr function. The check at line 171 was considered
dead code because at this point in the code, we already know that
*is_queue_unmap is false due to the check at line 161.

By removing this unnecessary check, improves the readability of the
code

Fixes the below:
	drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c:171 sdma_v7_0_ring_set_wptr()
	warn: duplicate check '*is_queue_unmap' (previous on line 161)

drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
    140 static void sdma_v7_0_ring_set_wptr(struct amdgpu_ring *ring)
    141 {
    142         struct amdgpu_device *adev = ring->adev;
    143         uint32_t *wptr_saved;
    144         uint32_t *is_queue_unmap;
    145         uint64_t aggregated_db_index;
    146         uint32_t mqd_size = adev->mqds[AMDGPU_HW_IP_DMA].mqd_size;
    147
    148         DRM_DEBUG("Setting write pointer\n");
    149
    150         if (ring->is_mes_queue) {
    151                 wptr_saved = (uint32_t *)(ring->mqd_ptr + mqd_size);
    152                 is_queue_unmap = (uint32_t *)(ring->mqd_ptr + mqd_size +
                        ^^^^^^^^^^^^^^^^ Set here

    153                                               sizeof(uint32_t));
    154                 aggregated_db_index =
    155                         amdgpu_mes_get_aggregated_doorbell_index(adev,
    156                                                          ring->hw_prio);
    157
    158                 atomic64_set((atomic64_t *)ring->wptr_cpu_addr,
    159                              ring->wptr << 2);
    160                 *wptr_saved = ring->wptr << 2;
    161                 if (*is_queue_unmap) {
                            ^^^^^^^^^^^^^^^ Checked here

    162                         WDOORBELL64(aggregated_db_index, ring->wptr << 2);
    163                         DRM_DEBUG("calling WDOORBELL64(0x%08x, 0x%016llx)\n",
    164                                         ring->doorbell_index, ring->wptr << 2);
    165                         WDOORBELL64(ring->doorbell_index, ring->wptr << 2);
    166                 } else {
    167                         DRM_DEBUG("calling WDOORBELL64(0x%08x, 0x%016llx)\n",
    168                                         ring->doorbell_index, ring->wptr << 2);
    169                         WDOORBELL64(ring->doorbell_index, ring->wptr << 2);
    170
--> 171                         if (*is_queue_unmap)
                                    ^^^^^^^^^^^^^^^ This is dead code.  We know it's false.

    172                                 WDOORBELL64(aggregated_db_index,
    173                                             ring->wptr << 2);
    174                 }
    175         } else {
    176                 if (ring->use_doorbell) {
    177                         DRM_DEBUG("Using doorbell -- "
    178                                   "wptr_offs == 0x%08x "

Fixes: b412351e91 ("drm/amdgpu: Add sdma v7_0 ip block support (v7)")
Cc: Likun Gao <Likun.Gao@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:38 -04:00
Tim Van Patten
1446226d32 drm/amdgpu: Remove GC HW IP 9.3.0 from noretry=1
The following commit updated gmc->noretry from 0 to 1 for GC HW IP
9.3.0:

    commit 5f3854f1f4 ("drm/amdgpu: add more cases to noretry=1")

This causes the device to hang when a page fault occurs, until the
device is rebooted. Instead, revert back to gmc->noretry=0 so the device
is still responsive.

Fixes: 5f3854f1f4 ("drm/amdgpu: add more cases to noretry=1")
Signed-off-by: Tim Van Patten <timvp@google.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:38 -04:00
Ruijing Dong
a1a9143c96 drm/amdgpu/vcn: update vcn5 enc/dec capabilities
Update the capabilities for supporting 8k encoding.

Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:38 -04:00
Jiapeng Chong
72e6ea95c4 drm/amdgpu: Remove duplicate amdgpu_umsch_mm.h header
./drivers/gpu/drm/amd/amdgpu/amdgpu.h: amdgpu_umsch_mm.h is included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9063
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:38 -04:00
Kenneth Feng
d3620eeae8 drm/amd/amdgpu: add module parameter for jpeg
add module parameter for jpeg. this is a temporary workaround for jpeg unit test fail
on vcn 5.0 now. will be removed later.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:37 -04:00
Alex Deucher
736f911204 drm/amdgpu: fix documentation errors in gmc v12.0
Fix up parameter descriptions.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:37 -04:00
Sreekant Somasekharan
0cce5f285d drm/amdkfd: Check correct memory types for is_system variable
To catch GPU mapping of system memory, TTM_PL_TT and AMDGPU_PL_PREEMPT
must be checked.

Fixes: 628e1ace23 ("drm/amdkfd: mark GFX12 system and peer GPU memory mappings as MTYPE_NC")
Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:37 -04:00
Alex Deucher
b72fa761fc drm/amdgpu: fix documentation errors in sdma v7.0
Fix up parameter descriptions.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:37 -04:00
Yang Wang
b712d7c201 drm/amdgpu: fix compiler 'side-effect' check issue for RAS_EVENT_LOG()
create a new helper function to avoid compiler 'side-effect'
check about RAS_EVENT_LOG() macro.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:37 -04:00
Frank Min
9a55c77978 drm/amdgpu: fix getting vram info for gfx12
gfx12 query video mem channel/type/width from umc_info of atom list, so
fix it accordingly.

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:37 -04:00
Frank Min
b80160a53a drm/amdgpu/mes: use mc address for wptr in add queue packet
use mc address for wptr in add queue packet

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:37 -04:00
Likun Gao
c14d5b5095 drm/amdgpu: enable gfxoff for gc v12.0.0
Enable GFXOFF for GC v12.0.0.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:37 -04:00
Likun Gao
d8cd2d617a drm/amd/amdgpu: enable mmhub and athub cg on gc 12.0.0
Enable mmhub and athub cg on gc 12.0.0

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Likun Gao
7be73af53b drm/amdgpu: switch default mes to uni mes
Switch the default mes to uni mes for gfx v12.
V2: remove uni_mes set for gfx v11.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Yang Wang
b2aa3d4b30 drm/amdgpu: add debug flag to enable RAS ACA
Use debug_mask=0x10 (BIT.4) param to help enable RAS ACA.
(RAS ACA is disabled by default.)

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Likun Gao
2ac72cbc7e drm/amdgpu: enable some cg feature for gc 12.0.0
Enable 3D cgcg, 3D cgls, sram fgcg, perfcounter mgcg,
repeater fgcg for gc v12.0.0.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Ma Jun
4c11d30c95 drm/amdgpu: Fix the null pointer dereference to ras_manager
Check ras_manager before using it

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Xiaogang Chen
f326d7cc74 drm/kfd: Correct pinned buffer handling at kfd restore and validate process
This reverts commit 8a774fe912 ("drm/amdgpu: avoid restore process run into dead loop")
since buffer got pinned is not related whether it needs mapping
And skip buffer validation at kfd driver if the buffer has been pinned.

Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Lijo Lazar
b194d21b9b drm/amdgpu: Use NPS ranges from discovery table
Add GMC API to fetch NPS range information from discovery table. Use NPS
range information in GMC 9.4.3 SOCs when available, otherwise fallback
to software method.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Friedrich Vock
0cdb3f9740 drm/amdgpu: Check if NBIO funcs are NULL in amdgpu_device_baco_exit
The special case for VM passthrough doesn't check adev->nbio.funcs
before dereferencing it. If GPUs that don't have an NBIO block are
passed through, this leads to a NULL pointer dereference on startup.

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Fixes: 1bece222ea ("drm/amdgpu: Clear doorbell interrupt status for Sienna Cichlid")
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Lijo Lazar
ce798376ef drm/amdgpu: Fix memory range calculation
Consider the 16M reserved region also before range calculation for GMC
9.4.3 SOCs.

Fixes: a433f1f594 ("drm/amdgpu: Initialize memory ranges for GC 9.4.3")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Likun Gao
73fbc3e000 drm/amdgpu: enable gfx cgcg&cgls for gfx v12_0_0
Enable GFX CGCG and CGLS for gfx version 12.0.0.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:11:02 -04:00
Sonny Jiang
75125e6b4c drm/amdgpu/jpeg5: enable power gating
Enable PG on JPEG5

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:10:56 -04:00
Likun Gao
ef57158462 drm/amdgpu: support imu for gc 12_0_0
Support IMU for ASIC with GC 12.0.0

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:10:31 -04:00
Ma Jun
01b3297336 drm/amdgpu: Remove dead code in amdgpu_ras_add_mca_err_addr
Remove dead code in amdgpu_ras_add_mca_err_addr

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:09:55 -04:00
Ma Jun
d1a6bfff94 drm/amdgpu: Fix null pointer dereference to bo
Check bo before using it

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:09:47 -04:00
shaoyunl
4488cd671c drm/amdgpu: enable unmapped doorbell handling basic mode on mes 12
This reverts commit fcc5df722d.

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:05:38 -04:00
Linus Torvalds
db5d28c0bf drm for 6.10-rc1
new drivers:
 - panthor: ARM Mali/Immortalis CSF-based GPU driver
 
 core:
 - add a CONFIG_DRM_WERROR option
 - make more headers self-contained
 - grab resv lock in pin/unpin
 - fix vmap resv locking
 - EDID/eDP panel matching
 - Kconfig cleanups
 - DT sound bindings
 - Add SIZE_HINTS property for cursor planes
 - Add struct drm_edid_product_id and helpers.
 - Use drm device based logging in more drm functions.
 - drop seq_file.h from a bunch of places
 - use drm_edid driver conversions
 
 dp:
 - DP Tunnel documentation
 - MST read sideband cap
 - Adaptive sync SDP prep work
 
 ttm:
 - improve placement for TTM BOs in idle/busy handling
 
 panic:
 - Fixes for drm-panic, and option to test it.
 - Add drm panic to simpledrm, mgag200, imx, ast
 
 bridge:
 - improve init ordering
 - adv7511: allow GPIO pin sharing
 - tc358775: add tc358675 support
 
 panel:
 - AUO B120XAN01.0
 - Samsung s6e3fa7
 - BOE NT116WHM-N44
 - CMN N116BCA-EA1,
 - CrystalClear CMT430B19N00
 - Startek KD050HDFIA020-C020A
 - powertip PH128800T006-ZHC01
 - Innolux G121X1-L03
 - LG sw43408
 - Khadas TS050 V2
 - EDO RM69380 OLED
 - CSOT MNB601LS1-1
 
 amdgpu:
 - HDCP/ODM/RAS fixes
 - Devcoredump improvements
 - Expose VCN activity via sysfs
 - SMY 13.0.x updates
 - Enable fast updates on DCN 3.1.4
 - Add dclk and vclk reporting on additional devices
 - Add ACA RAS infrastructure
 - Implement TLB flush fence
 - EEPROM handling fixes
 - SMUIO 14.0.2 support
 - SMU 14.0.1 Updates
 - SMU 14.0.2 support
 - Sync page table freeing with TLB flushes
 - DML2 refactor
 - DC debug improvements
 - DCN 3.5.x Updates
 - GPU reset fixes
 - HDP fix for second GFX pipe on GC 10.x
 - Enable secondary GFX pipe on GC 10.3
 - Refactor and clean up BACO/BOCO/BAMACO handling
 - Remove invalid TTM resource start check
 - UAF fix in VA IOCTL
 - GPUVM page fault redirection to secondary IH rings for IH 6.x
 - Initial support for mapping kernel queues via MES
 - Fix VRAM memory accounting
 
 amdkfd:
 - MQD handling cleanup
 - Preemption handling fixes for XCDs
 - TLB flush fix for GC 9.4.2
 - Properly clean up workqueue during module unload
 - Fix memory leak process create failure
 - Range check CP bad op exception targets to avoid reporting invalid exceptions to userspace
 - Fix eviction fence handling
 - Fix leak in GPU memory allocation failure case
 - DMABuf import handling fix
 - Enable SQ watchpoint for gfx10
 
 i915:
 - Adding new DG2 PCI ID
 - add context hints for GT frequency
 - enable only one CCS for compute workloads
 - new workarounds
 - Fix UAF on destroy against retire race and remove two earlier partial fixes
 - Limit the reserved VM space to only the platforms that need it
 - Fix gt reset with GuC submission is disable
 - Add and use gt_to_guc() wrapper
 
 i915/xe display:
 - Lunar Lake display enabling, including cdclk and other refactors
 - BIOS/VBT/opregion related refactor
 - Digital port related refactor/clean-up
 - Fix 2s boot time regression on DP panel replay init
 - Remove duplication on audio enable/disable on SDVO and g4x+ DP
 - Disable AuxCCS framebuffers if built for Xe
 - Make crtc disable more atomic
 - Increase DP idle pattern wait timeout to 2ms
 - Start using container_of_const() for some extra const safety
 - Fix Jasper Lake boot freeze
 - Enable MST mode for 128b/132b single-stream sideband
 - Enable Adaptive Sync SDP Support for DP
 - Fix MTL supported DP rates - removal of UHBR13.5
 - PLL refactoring
 - Limit eDP MSO pipe only for display version 20
 - More display refactor towards independence from i915 dev_priv
 - Convert i915/xe fbdev to DRM client
 - More initial work to make display code more independent from i915
 
 xe:
 - improved error capture
 - clean up some uAPI leftovers
 - devcoredump update
 - Add BMG mocs table
 - Handle GSCCS ER interrupt
 - Implement xe2- and GuC workarounds
 - struct xe_device cleanup
 - Hwmon updates
 - Add LRC parsing for more GPU instruction
 - Increase VM_BIND number of per-ioctl Ops
 - drm/xe: Add XE_BO_GGTT_INVALIDATE flag
 - Initial development for SR-IOV support
 - Add new PCI IDs to DG2 platform
 - Move userptr over to start using hmm_range_fault
 
 msm:
 - Switched to generating register header files during build process
   instead of shipping pre-generated headers
 - Merged DPU and MDP4 format databases.
 - DP:
 - Stop using compat string to distinguish DP and eDP cases
 - Added support for X Elite platform (X1E80100)
 - Reworked DP aux/audio support
 - Added SM6350 DP to the bindings
 - GPU:
 - a7xx perfcntr reg fixes
 - MAINTAINERS updates
 - a750 devcoredump support
 
 radeon:
 - Silence UBSAN warnings related to flexible arrays
 
 nouveau:
 - move some uAPI objects to uapi headers
 
 omapdrm:
 - console fix
 
 ast:
 - add i2c polling
 
 qaic:
 - add debugfs entries
 
 exynos:
 - fix platform_driver .owner
 - drop cleanup code
 
 mediatek:
 - Use devm_platform_get_and_ioremap_resource() in mtk_hdmi_ddc_probe()
 - Add GAMMA 12-bit LUT support for MT8188
 - Rename mtk_drm_* to mtk_*
 - Drop driver owner initialization
 - Correct calculation formula of PHY Timing
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmZEUU0ACgkQDHTzWXnE
 hr5qMBAAjUFF0w3YOQMsn0LEAm628kMRHpoVeSXmIfO9z9lTyad30EtiS4ggFgj7
 Q/oQ6hHCd5jdsvGSJDgtTTAsTQX+aCkXrgf/18ENbqR5mM3MdefUAPR/zawZ7HR4
 8+b2h6p7gHBw8wDjuIvQ5e9InHcnIkKWJc82qnJG5Urgxa05SDh3mu3cosPTJiBw
 a851vlWaYcxC0yAUwJlWaXDdN8yzdFaSQNboZBS/CMLXF/WE6Ht257uxJmaouc0Y
 Z0kBybok5x0TPQEXF9IV+kuSW3EYpYcwRi0BFFM9sJjkEBdH3rYRZwuYP1LR+7VZ
 HKsmIkie8YzCm2VwTquYzUvHgF+swZX4RRch9XJlGz7UvBLc0eBO/2n4X6fNd8Kl
 QGNNqEfsnUQrAHKvGsOUgoGjSCmEo8voGcMZ3JPIAdJ/GcnJwpMvNxtF6XB08hEu
 rDxuU6o7WkM4dJbtiaFEHNh0Fmjj6aXdBL23UD9pcqPT1fc9cT3xnUd5RJIRuRwV
 /tpb2WfkFAoxCkKFiunaC4rE8oG6ME6wr/trYjvoYuhCI5hCVaXRBGzJEtC30IP6
 lG2YZ8r0jHjktbgjZ0Cz/hY424H4sxSN9SJAnXXFDzcfjBJ/nOgo5nMD1jKajAD5
 SYfqWaD5Y+YygtyLJPMfZQI2XMOpCzteXD8uaNXXFJfpV7Apeyg=
 =ocVM
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2024-05-15' of https://gitlab.freedesktop.org/drm/kernel

Pull drm updates from Dave Airlie:
 "This is the main pull request for the drm subsystems for 6.10.

  In drivers the main thing is a new driver for ARM Mali firmware based
  GPUs, otherwise there are a lot of changes to amdgpu/xe/i915/msm and
  scattered changes to everything else.

  In the core a bunch of headers and Kconfig was refactored, along with
  the addition of a new panic handler which is meant to provide a user
  friendly message when a panic happens and graphical display is
  enabled.

  New drivers:
   - panthor: ARM Mali/Immortalis CSF-based GPU driver

  Core:
   - add a CONFIG_DRM_WERROR option
   - make more headers self-contained
   - grab resv lock in pin/unpin
   - fix vmap resv locking
   - EDID/eDP panel matching
   - Kconfig cleanups
   - DT sound bindings
   - Add SIZE_HINTS property for cursor planes
   - Add struct drm_edid_product_id and helpers.
   - Use drm device based logging in more drm functions.
   - drop seq_file.h from a bunch of places
   - use drm_edid driver conversions

  dp:
   - DP Tunnel documentation
   - MST read sideband cap
   - Adaptive sync SDP prep work

  ttm:
   - improve placement for TTM BOs in idle/busy handling

  panic:
   - Fixes for drm-panic, and option to test it.
   - Add drm panic to simpledrm, mgag200, imx, ast

  bridge:
   - improve init ordering
   - adv7511: allow GPIO pin sharing
   - tc358775: add tc358675 support

  panel:
   - AUO B120XAN01.0
   - Samsung s6e3fa7
   - BOE NT116WHM-N44
   - CMN N116BCA-EA1,
   - CrystalClear CMT430B19N00
   - Startek KD050HDFIA020-C020A
   - powertip PH128800T006-ZHC01
   - Innolux G121X1-L03
   - LG sw43408
   - Khadas TS050 V2
   - EDO RM69380 OLED
   - CSOT MNB601LS1-1

  amdgpu:
   - HDCP/ODM/RAS fixes
   - Devcoredump improvements
   - Expose VCN activity via sysfs
   - SMY 13.0.x updates
   - Enable fast updates on DCN 3.1.4
   - Add dclk and vclk reporting on additional devices
   - Add ACA RAS infrastructure
   - Implement TLB flush fence
   - EEPROM handling fixes
   - SMUIO 14.0.2 support
   - SMU 14.0.1 Updates
   - SMU 14.0.2 support
   - Sync page table freeing with TLB flushes
   - DML2 refactor
   - DC debug improvements
   - DCN 3.5.x Updates
   - GPU reset fixes
   - HDP fix for second GFX pipe on GC 10.x
   - Enable secondary GFX pipe on GC 10.3
   - Refactor and clean up BACO/BOCO/BAMACO handling
   - Remove invalid TTM resource start check
   - UAF fix in VA IOCTL
   - GPUVM page fault redirection to secondary IH rings for IH 6.x
   - Initial support for mapping kernel queues via MES
   - Fix VRAM memory accounting

  amdkfd:
   - MQD handling cleanup
   - Preemption handling fixes for XCDs
   - TLB flush fix for GC 9.4.2
   - Properly clean up workqueue during module unload
   - Fix memory leak process create failure
   - Range check CP bad op exception targets to avoid reporting invalid exceptions to userspace
   - Fix eviction fence handling
   - Fix leak in GPU memory allocation failure case
   - DMABuf import handling fix
   - Enable SQ watchpoint for gfx10

  i915:
   - Adding new DG2 PCI ID
   - add context hints for GT frequency
   - enable only one CCS for compute workloads
   - new workarounds
   - Fix UAF on destroy against retire race and remove two earlier partial fixes
   - Limit the reserved VM space to only the platforms that need it
   - Fix gt reset with GuC submission is disable
   - Add and use gt_to_guc() wrapper

  i915/xe display:
   - Lunar Lake display enabling, including cdclk and other refactors
   - BIOS/VBT/opregion related refactor
   - Digital port related refactor/clean-up
   - Fix 2s boot time regression on DP panel replay init
   - Remove duplication on audio enable/disable on SDVO and g4x+ DP
   - Disable AuxCCS framebuffers if built for Xe
   - Make crtc disable more atomic
   - Increase DP idle pattern wait timeout to 2ms
   - Start using container_of_const() for some extra const safety
   - Fix Jasper Lake boot freeze
   - Enable MST mode for 128b/132b single-stream sideband
   - Enable Adaptive Sync SDP Support for DP
   - Fix MTL supported DP rates - removal of UHBR13.5
   - PLL refactoring
   - Limit eDP MSO pipe only for display version 20
   - More display refactor towards independence from i915 dev_priv
   - Convert i915/xe fbdev to DRM client
   - More initial work to make display code more independent from i915

  xe:
   - improved error capture
   - clean up some uAPI leftovers
   - devcoredump update
   - Add BMG mocs table
   - Handle GSCCS ER interrupt
   - Implement xe2- and GuC workarounds
   - struct xe_device cleanup
   - Hwmon updates
   - Add LRC parsing for more GPU instruction
   - Increase VM_BIND number of per-ioctl Ops
   - drm/xe: Add XE_BO_GGTT_INVALIDATE flag
   - Initial development for SR-IOV support
   - Add new PCI IDs to DG2 platform
   - Move userptr over to start using hmm_range_fault

  msm:
   - Switched to generating register header files during build process
     instead of shipping pre-generated headers
   - Merged DPU and MDP4 format databases.
   - DP:
     - Stop using compat string to distinguish DP and eDP cases
     - Added support for X Elite platform (X1E80100)
     - Reworked DP aux/audio support
     - Added SM6350 DP to the bindings
   - GPU:
     - a7xx perfcntr reg fixes
     - MAINTAINERS updates
     - a750 devcoredump support

  radeon:
   - Silence UBSAN warnings related to flexible arrays

  nouveau:
   - move some uAPI objects to uapi headers

  omapdrm:
   - console fix

  ast:
   - add i2c polling

  qaic:
   - add debugfs entries

  exynos:
   - fix platform_driver .owner
   - drop cleanup code

  mediatek:
   - Use devm_platform_get_and_ioremap_resource() in mtk_hdmi_ddc_probe()
   - Add GAMMA 12-bit LUT support for MT8188
   - Rename mtk_drm_* to mtk_*
   - Drop driver owner initialization
   - Correct calculation formula of PHY Timing"

* tag 'drm-next-2024-05-15' of https://gitlab.freedesktop.org/drm/kernel: (1477 commits)
  drm/xe/ads: Use flexible-array
  drm/xe: Use ordered WQ for G2H handler
  drm/msm/gen_header: allow skipping the validation
  drm/msm/a6xx: Cleanup indexed regs const'ness
  drm/msm: Add devcoredump support for a750
  drm/msm: Adjust a7xx GBIF debugbus dumping
  drm/msm: Update a6xx registers XML
  drm/msm: Fix imported a750 snapshot header for upstream
  drm/msm: Import a750 snapshot registers from kgsl
  MAINTAINERS: Add Konrad Dybcio as a reviewer for the Adreno driver
  MAINTAINERS: Add a separate entry for Qualcomm Adreno GPU drivers
  drm/msm/a6xx: Avoid a nullptr dereference when speedbin setting fails
  drm/msm/adreno: fix CP cycles stat retrieval on a7xx
  drm/msm/a7xx: allow writing to CP_BV counter selection registers
  drm: zynqmp_dpsub: Always register bridge
  Revert "drm/bridge: ti-sn65dsi83: Fix enable error path"
  drm/fb_dma: Add checks in drm_fb_dma_get_scanout_buffer()
  drm/fbdev-generic: Do not set physical framebuffer address
  drm/panthor: Fix the FW reset logic
  drm/panthor: Make sure we handle 'unknown group state' case properly
  ...
2024-05-15 09:43:42 -07:00
Frank Min
b61467778e drm/amdgpu: fix mqd corruption for gfx12
1. restore mqd from backup while resuming
2. use copy_toio and copy_fromio while mqd in vram

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 16:12:02 -04:00
Frank Min
ef168e6de9 drm/amdgpu: add initial value for gfx12 AGP aperture
add initial value for gfx12 AGP aperture

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 16:12:02 -04:00
Jesse Zhang
e6ae021adb drm/amdgpu: fix the warning bad bit shift operation for aca_error_type type
Filter invalid aca error types before performing a shift operation.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 16:12:02 -04:00