2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

12981 Commits

Author SHA1 Message Date
Srinivasan Shanmugam
a0cc8e1512 drm/amdgpu: Update min() to min_t() in 'amdgpu_info_ioctl'
Fixes the following:

WARNING: min() should probably be min_t(size_t, size, sizeof(ip))
+               ret = copy_to_user(out, &ip, min((size_t)size, sizeof(ip)));

And other style fixes:

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Missing a blank line after declarations

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:47:26 -04:00
Srinivasan Shanmugam
ce83aa7bad drm/amdgpu: Remove else after return in 'is_fru_eeprom_supported'
Expressions under 'else' branch under case 'CHIP_SIENNA_CICHLID' in
function 'is_fru_eeprom_supported' are executed whenever the expression
in 'if' is False. Otherwise, return from case occurs. Therefore, there
is no need in 'else', and it has been removed.

Fixes the following:

WARNING: else is not generally useful after a break or return
+                               return false;
+                       } else {

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:47:26 -04:00
Srinivasan Shanmugam
50fbe0cc95 drm/amdgpu: Add -ENOMEM error handling when there is no memory
Return -ENOMEM, when there is no sufficient dynamically allocated memory

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:47:26 -04:00
Stanley.Yang
fcb7a1849a drm/amdgpu: Check APU flag to disable RAS
Only disable RAS by default for aqua vanjaram on APU platform.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:47:26 -04:00
Shiwu Zhang
9bc12db4e2 drm/amdgpu: fix the indexing issue during rlcg access ctrl init
In case that the GET_INST() is used for looping, only loops for the
times of actual num of xcc, otherwise GET_INST() will return the invalid
index, a.k.a -1

And also remove the redundant mask checking in case of GET_INST()

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:47:26 -04:00
Pierre-Eric Pelloux-Prayer
818c158fd4 drm/amdgpu: add VISIBLE info in amdgpu_bo_print_info
This allows tools to distinguish between VRAM and visible VRAM.

Use the opportunity to fix locking before accessing bo.

v2: squash in unused variable fix

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:47:26 -04:00
Srinivasan Shanmugam
30953c4d00 drm/amdgpu: Fix style issues in amdgpu_gem.c
Fixes the following to align to linux coding style:

WARNING: braces {} are not necessary for any arm of this statement
WARNING: Missing a blank line after declarations
ERROR: space prohibited before that close parenthesis ')'
WARNING: unnecessary whitespace before a quoted newline
WARNING: %LX is non-standard C, use %llX

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:41:00 -04:00
Lijo Lazar
6cb209ed68 drm/amdgpu: Update ring scheduler info as needed
Not all rings have scheduler associated. Only update scheduler data for
rings with scheduler. It could result in out of bound access as total
rings are more than those associated with particular IPs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:39:29 -04:00
sguttula
c6195ef5ee drm/amdgpu: Enabling FW workaround through shared memory for VCN4_0_2
This patch will enable VCN FW workaround using
DRM KEY INJECT WORKAROUND method,
which is helping in fixing the secure playback.

Signed-off-by: sguttula <Suresh.Guttula@amd.com>
Reviewed-by: Leo Liu <leo.liiu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:39:21 -04:00
Srinivasan Shanmugam
93125cb704 drm/amd/amdgpu: Fix warnings in amdgpu/amdgpu_display.c
Fixes the below checkpatch.pl warnings:

WARNING: Block comments use * on subsequent lines
WARNING: Block comments use a trailing */ on a separate line
WARNING: suspect code indent for conditional statements (8, 12)
WARNING: braces {} are not necessary for single statement blocks

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:36:58 -04:00
Srinivasan Shanmugam
37c3fc6620 drm/amdgpu: Return -ENOMEM when there is no memory in 'amdgpu_gfx_mqd_sw_init'
Return -ENOMEM, when there is no sufficient dynamically allocated memory
to create MQD backup for ring

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:36:16 -04:00
Srinivasan Shanmugam
88dd0b188e drm/amdgpu: Fix do not add new typedefs in amdgpu_fw_attestation.c
Fixes the following to align to coding style:

WARNING: do not add new typedefs
+typedef struct FW_ATT_DB_HEADER

WARNING: do not add new typedefs
+typedef struct FW_ATT_RECORD

WARNING: Symbolic permissions 'S_IRUSR' are not preferred. Consider using octal permissions '0400'.
+                           S_IRUSR,

ERROR: "(foo*)" should be "(foo *)"
WARNING: please, no space before tabs

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:36:08 -04:00
Srinivasan Shanmugam
b25b359926 drm/amdgpu: Prefer #if IS_ENABLED over #if defined in amdgpu_drv.c
Adhere to linux coding style

Fixes the following:

WARNING: Prefer IS_ENABLED(<FOO>) to CONFIG_<FOO> || CONFIG_<FOO>_MODULE
+#if defined(CONFIG_DRM_RADEON) || defined(CONFIG_DRM_RADEON_MODULE)

WARNING: Prefer IS_ENABLED(<FOO>) to CONFIG_<FOO> || CONFIG_<FOO>_MODULE
+#if defined(CONFIG_DRM_RADEON) || defined(CONFIG_DRM_RADEON_MODULE)

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:36:00 -04:00
Jonathan Kim
7a1c5c6753 drm/amdkfd: enable cooperative groups for gfx11
MES can concurrently schedule queues on the device that require
exclusive device access if marked exclusively_scheduled without the
requirement of GWS.  Similar to the F32 HWS, MES will manage
quality of service for these queues.
Use this for cooperative groups since cooperative groups are device
occupancy limited.

Since some GFX11 devices can only be debugged with partial CUs, do not
allow the debugging of cooperative groups on these devices as the CU
occupancy limit will change on attach.

In addition, zero initialize the MES add queue submission vector for MES
initialization tests as we do not want these to be cooperative
dispatches.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:35:43 -04:00
Horace Chen
83f24a8f05 drm/amdgpu: set sw state to gfxoff after SR-IOV reset
[Why]
Current SR-IOV will not set GC to off state, while it is a real
GC hard reset. Whthout GFX off flag, driver may do gfxhub invalidation
before firmware load and gfxhub gart enable. This operation may cause
CP to become busy because GC is not in the right state for invalidation.

[How]
Add a function for SR-IOV to clean up some sw state before recover. Set
adev->gfx.is_poweron to false to prevent gfxhub invalidation before gfx
firmware autoload complete.

Signed-off-by: Horace Chen <horace.chen@amd.com>
Reviewed-by: HaiJun Chang <HaiJun.Chang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:35:23 -04:00
Yang Li
f135b0fc31 drm/amdgpu: Fix one kernel-doc comment
Use colon to separate parameter name from their specific meaning.
silence the warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c:793: warning: Function parameter or member 'adev' not described in 'amdgpu_vm_pte_update_noretry_flags'

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-21 16:52:25 -04:00
Mario Limonciello
6b4cf4a35f drm/amd: Fix an error handling mistake in psp_sw_init()
If the second call to amdgpu_bo_create_kernel() fails, the memory
allocated from the first call should be cleared.  If the third call
fails, the memory from the second call should be cleared.

Fixes: b95b539168 ("drm/amdgpu/psp: move PSP memory alloc from hw_init to sw_init")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-21 16:52:25 -04:00
Victor Lu
9196b63bee drm/amdgpu: Fix infinite loop in gfxhub_v1_2_xcc_gart_enable (v2)
An instance of for_each_inst() was not changed to match its new
behaviour and is causing a loop.

v2: remove tmp_mask variable

Fixes: b579ea632f ("drm/amdgpu: Modify for_each_inst macro")
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-21 16:52:25 -04:00
Lijo Lazar
0bdebfef3f drm/amdgpu: Program xcp_ctl registers as needed
XCP_CTL register is expected to be programmed by firmware. Under certain
conditions FW may not have programmed it correctly. As a workaround,
program it when FW has not programmed the right values.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-21 16:52:25 -04:00
sguttula
5dbb59247b drm/amdgpu: allow secure submission on VCN4 ring
This patch will enable secure decode playback on VCN4_0_2

Signed-off-by: sguttula <Suresh.Guttula@amd.com>
Reviewed-by: Leo Liu <leo.liiu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-21 16:52:24 -04:00
Mario Limonciello
adf64e2142 drm/amd: Avoid reading the VBIOS part number twice
The VBIOS part number is read both in amdgpu_atom_parse() as well
as in atom_get_vbios_pn() and stored twice in the `struct atom_context`
structure. Remove the first unnecessary read and move the `pr_info`
line from that read into the second.

v2: squash in unused variable removal

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-21 16:52:24 -04:00
Guchun Chen
18cf073faa drm/amdgpu: use a macro to define no xcp partition case
~0 as no xcp partition is used in several places, so improve its
definition by a macro for code consistency.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 11:19:02 -04:00
Guchun Chen
e379b5e7dc drm/amdgpu/vm: use the same xcp_id from root PD
Other PDs/PTs allocation should just use the same xcp_id as that
stored in root PD.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 11:18:53 -04:00
Guchun Chen
5003ca63bc drm/amdgpu: fix slab-out-of-bounds issue in amdgpu_vm_pt_create
Recent code set xcp_id stored from file private data when opening
device to amdgpu bo for accounting memory usage etc, but not all
VMs are attached to this fpriv structure like the vm cases in
amdgpu_mes_self_test, otherwise, KASAN will complain below out
of bound access. And more importantly, VM code should not touch
fpriv structure, so drop fpriv code handling from amdgpu_vm_pt.

[   77.292314] BUG: KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[   77.293845] Read of size 4 at addr ffff888102c48a48 by task modprobe/1069
[   77.294146] Call Trace:
[   77.294178]  <TASK>
[   77.294208]  dump_stack_lvl+0x49/0x63
[   77.294260]  print_report+0x16f/0x4a6
[   77.294307]  ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[   77.295979]  ? kasan_complete_mode_report_info+0x3c/0x200
[   77.296057]  ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[   77.297556]  kasan_report+0xb4/0x130
[   77.297609]  ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[   77.299202]  __asan_load4+0x6f/0x90
[   77.299272]  amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[   77.300796]  ? amdgpu_init+0x6e/0x1000 [amdgpu]
[   77.302222]  ? amdgpu_vm_pt_clear+0x750/0x750 [amdgpu]
[   77.303721]  ? preempt_count_sub+0x18/0xc0
[   77.303786]  amdgpu_vm_init+0x39e/0x870 [amdgpu]
[   77.305186]  ? amdgpu_vm_wait_idle+0x90/0x90 [amdgpu]
[   77.306683]  ? kasan_set_track+0x25/0x30
[   77.306737]  ? kasan_save_alloc_info+0x1b/0x30
[   77.306795]  ? __kasan_kmalloc+0x87/0xa0
[   77.306852]  amdgpu_mes_self_test+0x169/0x620 [amdgpu]

v2: without specifying xcp partition for PD/PT bo, the xcp id is -1.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2686
Fixes: 3ebfd221c1 ("drm/amdkfd: Store xcp partition id to amdgpu bo")
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 11:18:16 -04:00
Guchun Chen
50e633081e drm/amdgpu: Allocate root PD on correct partition
file_priv needs to be setup firstly, otherwise, root PD
will always be allocated on partition 0, even if opening
the device from other partitions.

Fixes: 3ebfd221c1 ("drm/amdkfd: Store xcp partition id to amdgpu bo")
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 11:16:48 -04:00
Victor Lu
8ed49dd1d3 drm/amdgpu: Add RLCG interface driver implementation for gfx v9.4.3 (v3)
Add RLCG interface support for gfx v9.4.3 and multiple XCCs.
Do not enable it yet.

v2: Fix amdgpu_rlcg_reg_access_ctrl init, add support for multiple XCCs
    in amdgpu_mm_wreg_mmio_rlc

v3: Use GET_INST() when indexing amdgpu_rlcg_reg_access_ctrl

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 11:16:41 -04:00
Shashank Sharma
43c064db65 drm/amdgpu: create a new file for doorbell manager
This patch:
- creates a new file for doorbell management.
- moves doorbell code from amdgpu_device.c to this file.

V2:
 - remove doc from function declaration (Christian)
 - remove 'device' from function names to make it consistent (Alex)
 - add SPDX license identifier (Luben)

V3:
 - change license to MIT license(Christian)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 11:12:08 -04:00
Candice Li
5229a37e17 drm/amdgpu: Allow the initramfs generator to include psp_13_0_6_ta
Allow the initramfs generator to automatically include psp_13_0_6_ta
firmware to initramfs.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 11:11:49 -04:00
Stanley.Yang
276f6e8cb7 drm/amdgpu: Disable RAS by default on APU flatform
Disable RAS feature by default for aqua vanjaram on APU platform.

Changed from V1:
	Splite Disable RAS by default on APU platform into a
	separated patch.

Changed from V2:
	Avoid to modify global variable amdgpu_ras_enable.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 11:11:36 -04:00
Stanley.Yang
cb906ce32b drm/amdgpu: Enable aqua vanjaram RAS
Enable RAS for aqua vanjaram.

Changed from V1:
	Split the change in amdgpu_ras_asic_supported into a
	separated patch.

Changed from V2:
	Avoid to modify global variable amdgpu_ras_enable.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 11:11:23 -04:00
Srinivasan Shanmugam
a62e702ee1 drm/amdgpu: Avoid possiblity of kernel crash in 'gmc_v8_0, gmc_v7_0_init_microcode()'
If the function 'gmc_v8_0_ or gmc_v7_0_init_microcode()' fails, the
driver will just fail to load, hence return -EINVAL rather having BUG(),
fixes WARNING: Do not crash the kernel unless it is absolutely
unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead
of BUG() or variants

Fixes: 2f77b5931f ("drm/amdgpu: Fix error & warnings in gmc_v8_0.c")
Fixes: 0cfc1d6830 ("drm/amdgpu: Fix errors & warnings in gmc_ v6_0, v7_0.c")
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 11:09:30 -04:00
Saleemkhan Jamadar
33e88286d6 Revert "drm/amdgpu:update kernel vcn ring test"
VCN FW depncencies revert it to unblock others

This reverts commit f3fa86f5c7.

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Acked-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18 11:06:54 -04:00
Daniel Vetter
6c7f27441d drm-misc-next for v6.6:
UAPI Changes:
 
  * fbdev:
    * Make fbdev userspace interfaces optional; only leaves the
      framebuffer console active
 
  * prime:
    * Support dma-buf self-import for all drivers automatically: improves
      support for many userspace compositors
 
 Cross-subsystem Changes:
 
  * backlight:
    * Fix interaction with fbdev in several drivers
 
  * base: Convert struct platform.remove to return void; part of a larger,
    tree-wide effort
 
  * dma-buf: Acquire reservation lock for mmap() in exporters; part
    of an on-going effort to simplify locking around dma-bufs
 
  * fbdev:
    * Use Linux device instead of fbdev device in many places
    * Use deferred-I/O helper macros in various drivers
 
  * i2c: Convert struct i2c from .probe_new to .probe; part of a larger,
    tree-wide effort
 
  * video:
    * Avoid including <linux/screen_info.h>
 
 Core Changes:
 
  * atomic:
    * Improve logging
 
  * prime:
    * Remove struct drm_driver.gem_prime_mmap plus driver updates: all
      drivers now implement this callback with drm_gem_prime_mmap()
 
  * gem:
    * Support execution contexts: provides locking over multiple GEM
      objects
 
  * ttm:
    * Support init_on_free
    * Swapout fixes
 
 Driver Changes:
 
  * accel:
    * ivpu: MMU updates; Support debugfs
 
  * ast:
    * Improve device-model detection
    * Cleanups
 
  * bridge:
    * dw-hdmi: Improve support for YUV420 bus format
    * dw-mipi-dsi: Fix enable/disable of DSI controller
    * lt9611uxc: Use MODULE_FIRMWARE()
    * ps8640: Remove broken EDID code
    * samsung-dsim: Fix command transfer
    * tc358764: Handle HS/VS polarity; Use BIT() macro; Various cleanups
    * Cleanups
 
  * ingenic:
    * Kconfig REGMAP fixes
 
  * loongson:
    * Support display controller
 
  * mgag200:
    * Minor fixes
 
  * mxsfb:
    * Support disabling overlay planes
 
  * nouveau:
    * Improve VRAM detection
    * Various fixes and cleanups
 
  * panel:
    * panel-edp: Support AUO B116XAB01.4
    * Support Visionox R66451 plus DT bindings
    * Cleanups
 
  * ssd130x:
    * Support per-controller default resolution plus DT bindings
    * Reduce memory-allocation overhead
    * Cleanups
 
  * tidss:
    * Support TI AM625 plus DT bindings
    * Implement new connector model plus driver updates
 
  * vkms
    * Improve write-back support
    * Documentation fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmSvvRAACgkQaA3BHVML
 eiNpGQgAs8jq1XjN9t8jZsdgXnoCbkZyVUI2NO0HwoVwpRCLgbXp5AX5qq2oRciE
 TBhe4Fceh/ZsYqHTZQahnguxgRKM5JgXwbI4Z0iiOVcqasNbycaKAqipxJJ7kdo1
 qPhGCbgQFVX7oIq2xjfXehh6O0SYX+R9r88X8dMJxMYv/pcLwOHG74kS040WOcQq
 uATgcnobOf/D8ZmlqvfKGAeTUoFo/RSR2Uhlauka58qgeUbicrTELZT2barY9d+k
 as6U5vv4wx2zMklTkjrlkMpAT1ZpbB9d3jGHwL27VEnjlfd3wV2bdH7Dzn9qZRf/
 gn0ALg/b3u5yBWk/k7YBvijXyNcH6Q==
 =bBuG
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2023-07-13' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for v6.6:

UAPI Changes:

 * fbdev:
   * Make fbdev userspace interfaces optional; only leaves the
     framebuffer console active

 * prime:
   * Support dma-buf self-import for all drivers automatically: improves
     support for many userspace compositors

Cross-subsystem Changes:

 * backlight:
   * Fix interaction with fbdev in several drivers

 * base: Convert struct platform.remove to return void; part of a larger,
   tree-wide effort

 * dma-buf: Acquire reservation lock for mmap() in exporters; part
   of an on-going effort to simplify locking around dma-bufs

 * fbdev:
   * Use Linux device instead of fbdev device in many places
   * Use deferred-I/O helper macros in various drivers

 * i2c: Convert struct i2c from .probe_new to .probe; part of a larger,
   tree-wide effort

 * video:
   * Avoid including <linux/screen_info.h>

Core Changes:

 * atomic:
   * Improve logging

 * prime:
   * Remove struct drm_driver.gem_prime_mmap plus driver updates: all
     drivers now implement this callback with drm_gem_prime_mmap()

 * gem:
   * Support execution contexts: provides locking over multiple GEM
     objects

 * ttm:
   * Support init_on_free
   * Swapout fixes

Driver Changes:

 * accel:
   * ivpu: MMU updates; Support debugfs

 * ast:
   * Improve device-model detection
   * Cleanups

 * bridge:
   * dw-hdmi: Improve support for YUV420 bus format
   * dw-mipi-dsi: Fix enable/disable of DSI controller
   * lt9611uxc: Use MODULE_FIRMWARE()
   * ps8640: Remove broken EDID code
   * samsung-dsim: Fix command transfer
   * tc358764: Handle HS/VS polarity; Use BIT() macro; Various cleanups
   * Cleanups

 * ingenic:
   * Kconfig REGMAP fixes

 * loongson:
   * Support display controller

 * mgag200:
   * Minor fixes

 * mxsfb:
   * Support disabling overlay planes

 * nouveau:
   * Improve VRAM detection
   * Various fixes and cleanups

 * panel:
   * panel-edp: Support AUO B116XAB01.4
   * Support Visionox R66451 plus DT bindings
   * Cleanups

 * ssd130x:
   * Support per-controller default resolution plus DT bindings
   * Reduce memory-allocation overhead
   * Cleanups

 * tidss:
   * Support TI AM625 plus DT bindings
   * Implement new connector model plus driver updates

 * vkms
   * Improve write-back support
   * Documentation fixes

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230713090830.GA23281@linux-uq9g
2023-07-17 15:37:57 +02:00
Dave Airlie
38d88d5e97 amd-drm-fixes-6.5-2023-07-12:
amdgpu:
 - SMU i2c locking fix
 - Fix a possible deadlock in process restoration for ROCm apps
 - Disable PCIe lane/speed switching on Intel platforms (the platforms don't support it)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZK7yYQAKCRC93/aFa7yZ
 2JvYAQDpMj8/rLUsmWRk30jvkaZivgaeUEAG0FGaMpKaaATbvwEA2eHxN3xk5GKs
 ethEPp/zdivIrz6h/JWSCFrpCzqg4g8=
 =rsRJ
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-fixes-6.5-2023-07-12' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.5-2023-07-12:

amdgpu:
- SMU i2c locking fix
- Fix a possible deadlock in process restoration for ROCm apps
- Disable PCIe lane/speed switching on Intel platforms (the platforms don't support it)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230712184009.7740-1-alexander.deucher@amd.com
2023-07-14 13:19:54 +10:00
Saleemkhan Jamadar
093b21f431 Revert "drm/amdgpu: update kernel vcn ring test"
VCN FW depncencies revert it to unlock others

This reverts commit 3ebfa943b8.

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Acked-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-13 17:32:40 -04:00
Guchun Chen
826c1e923b drm/amdgpu/vkms: relax timer deactivation by hrtimer_try_to_cancel
In below thousands of screen rotation loop tests with virtual display
enabled, a CPU hard lockup issue may happen, leading system to unresponsive
and crash.

do {
	xrandr --output Virtual --rotate inverted
	xrandr --output Virtual --rotate right
	xrandr --output Virtual --rotate left
	xrandr --output Virtual --rotate normal
} while (1);

NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

? hrtimer_run_softirq+0x140/0x140
? store_vblank+0xe0/0xe0 [drm]
hrtimer_cancel+0x15/0x30
amdgpu_vkms_disable_vblank+0x15/0x30 [amdgpu]
drm_vblank_disable_and_save+0x185/0x1f0 [drm]
drm_crtc_vblank_off+0x159/0x4c0 [drm]
? record_print_text.cold+0x11/0x11
? wait_for_completion_timeout+0x232/0x280
? drm_crtc_wait_one_vblank+0x40/0x40 [drm]
? bit_wait_io_timeout+0xe0/0xe0
? wait_for_completion_interruptible+0x1d7/0x320
? mutex_unlock+0x81/0xd0
amdgpu_vkms_crtc_atomic_disable

It's caused by a stuck in lock dependency in such scenario on different
CPUs.

CPU1                                             CPU2
drm_crtc_vblank_off                              hrtimer_interrupt
    grab event_lock (irq disabled)                   __hrtimer_run_queues
        grab vbl_lock/vblank_time_block                  amdgpu_vkms_vblank_simulate
            amdgpu_vkms_disable_vblank                       drm_handle_vblank
                hrtimer_cancel                                         grab dev->event_lock

So CPU1 stucks in hrtimer_cancel as timer callback is running endless on
current clock base, as that timer queue on CPU2 has no chance to finish it
because of failing to hold the lock. So NMI watchdog will throw the errors
after its threshold, and all later CPUs are impacted/blocked.

So use hrtimer_try_to_cancel to fix this, as disable_vblank callback
does not need to wait the handler to finish. And also it's not necessary
to check the return value of hrtimer_try_to_cancel, because even if it's
-1 which means current timer callback is running, it will be reprogrammed
in hrtimer_start with calling enable_vblank to make it works.

v2: only re-arm timer when vblank is enabled (Christian) and add a Fixes
tag as well

v3: drop warn printing (Christian)

v4: drop superfluous check of blank->enabled in timer function, as it's
guaranteed in drm_handle_vblank (Christian)

Fixes: 84ec374bd5 ("drm/amdgpu: create amdgpu_vkms (v4)")
Cc: stable@vger.kernel.org
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-13 17:32:15 -04:00
Srinivasan Shanmugam
2f77b5931f drm/amdgpu: Fix error & warnings in gmc_v8_0.c
Fix below checkpatch error & warnings:

ERROR: trailing statements should be on next line
+       default: BUG();

WARNING: braces {} are not necessary for single statement blocks
WARNING: braces {} are not necessary for any arm of this statement
WARNING: Block comments should align the * on each line

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-13 17:29:11 -04:00
Luben Tuikov
52b82609bf drm/amdgpu: Rename to amdgpu_vm_tlb_seq_struct
Rename struct amdgpu_vm_tlb_seq_cb {...} to struct amdgpu_vm_tlb_seq_struct
{...}, so as to not conflict with documentation processing tools. Of course, C
has no problem with this.

Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/b5ebc891-ee63-1638-0377-7b512d34b823@infradead.org
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-12 12:22:52 -04:00
Srinivasan Shanmugam
bd3c414254 drm/amdkfd: Fix stack size in 'amdgpu_amdkfd_unmap_hiq'
Dynamically allocate large local variable instead of putting it onto the
stack to avoid exceeding the stack size:
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c: In function ‘amdgpu_amdkfd_unmap_hiq’:
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:868:1: warning: the frame size of 1280 bytes is larger than 1024 bytes [-Wframe-larger-than=]

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307080505.V12qS0oz-lkp@intel.com
Suggested-by: Guchun Chen <guchun.chen@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-12 12:22:52 -04:00
Mario Limonciello
188623076d drm/amd: Move helper for dynamic speed switch check out of smu13
This helper is used for checking if the connected host supports
the feature, it can be moved into generic code to be used by other
smu implementations as well.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
2023-07-12 12:09:54 -04:00
gaba
8a774fe912 drm/amdgpu: avoid restore process run into dead loop.
In restore process worker, pinned BO cause update PTE fail, then
the function re-schedule the restore_work. This will generate dead loop.

Signed-off-by: gaba <gaba@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-07-12 12:07:13 -04:00
Mario Limonciello
5d1eb4c4c8 drm/amd: Move helper for dynamic speed switch check out of smu13
This helper is used for checking if the connected host supports
the feature, it can be moved into generic code to be used by other
smu implementations as well.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-12 11:12:10 -04:00
Saleemkhan Jamadar
3ebfa943b8 drm/amdgpu: update kernel vcn ring test
add session context buffer to decoder ring test for vcn v1 to v3.

v3 - correct the cmd for sesssion ctx buf
v2 - add the buffer into IB (Leo liu)

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-12 11:12:10 -04:00
Saleemkhan Jamadar
f3fa86f5c7 drm/amdgpu:update kernel vcn ring test
add session context buffer to decoder ring test.

v5 - clear the session ct buffer (Christian)
v4 - data type, explain change of ib size change (Christian)
v3 - indent and  v2 changes correction. (Christian)
v2 - put the buffer at the end of the IB (Christian)

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-12 11:12:09 -04:00
Tao Zhou
bd97449837 drm/amdgpu: add watchdog timer enablement for gfx_v9_4_3
Configure SQ watchdog timer setting.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-12 11:12:09 -04:00
Mukul Joshi
1879e009a4 drm/amdkfd: Update CWSR grace period for GFX9.4.3
For GFX9.4.3, setup a reduced default CWSR grace period equal to
1000 cycles instead of 64000 cycles.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-12 11:12:09 -04:00
Lang Yu
1ddcdb7cb6 drm/amdgpu: use psp_execute_load_ip_fw instead
Replace the old ones with psp_execute_load_ip_fw.

Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-12 11:12:09 -04:00
Lang Yu
45b51acb38 drm/amdgpu: rename psp_execute_non_psp_fw_load and make it global
This will make this function more general, and then serve other IPs.

Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-12 11:12:09 -04:00
Eric Huang
036e348fdc drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3
Implement the similarities as GC v9.4.2, and the difference
for GC v9.4.3 HW spec, i.e. xcc instance.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-12 10:58:01 -04:00
Arnd Bergmann
822130b5e8 drm/amdgpu: avoid integer overflow warning in amdgpu_device_resize_fb_bar()
On 32-bit architectures comparing a resource against a value larger than
U32_MAX can cause a warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1344:18: error: result of comparison of constant 4294967296 with expression of type 'resource_size_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
                    res->start > 0x100000000ull)
                    ~~~~~~~~~~ ^ ~~~~~~~~~~~~~~

As gcc does not warn about this in dead code, add an IS_ENABLED() check at
the start of the function. This will always return success but not actually resize
the BAR on 32-bit architectures without high memory, which is exactly what
we want here, as the driver can fall back to bank switching the VRAM
access.

Fixes: 31b8adab32 ("drm/amdgpu: require a root bus window above 4GB for BAR resize")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-12 10:57:41 -04:00
Philip Yang
bf80d34b6c drm/amdgpu: Increase soft IH ring size
Retry faults are delegated to soft IH ring and then processed by
deferred worker. Current soft IH ring size PAGE_SIZE can store 128
entries, which may overflow and drop retry faults, causes HW stucks
because the retry fault is not recovered.

Increase soft IH ring size to 8KB, enough to store 256 CAM entries
because we clear the CAM entry after handling the retry fault from soft
ring.

Define macro IH_RING_SIZE and IH_SW_RING_SIZE to remove duplicate
constant.

Show warning message if soft IH ring overflows with CAM enabled because
this should not happen.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-12 10:57:25 -04:00
Alex Deucher
95b88ea1af drm/amdgpu/gfx10: move update_spm_vmid() out of rlc_init()
rlc_init() is part of sw_init() so it should not touch hardware.
Additionally, calling the rlc update_spm_vmid() callback
directly invokes a gfx on/off cycle which could result in
powergating being enabled before hw init is complete.  Split
update_spm_vmid() into an internal implementation for local
use without gfxoff interaction and then the rlc callback
which includes gfxoff handling.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-12 10:57:22 -04:00
Alex Deucher
08b6e1725d drm/amdgpu/gfx9: move update_spm_vmid() out of rlc_init()
rlc_init() is part of sw_init() so it should not touch hardware.
Additionally, calling the rlc update_spm_vmid() callback
directly invokes a gfx on/off cycle which could result in
powergating being enabled before hw init is complete.  Split
update_spm_vmid() into an internal implementation for local
use without gfxoff interaction and then the rlc callback
which includes gfxoff handling.  lbpw_init also touches
hardware so mvoe that to rlc_resume as well.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-12 10:57:15 -04:00
Christian König
ca6c1e210a drm/amdgpu: use the new drm_exec object for CS v3
Use the new component here as well and remove the old handling.

v2: drop dupplicate handling
v3: fix memory leak pointed out by Tatsuyuki

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230711133122.3710-7-christian.koenig@amd.com
2023-07-12 14:14:50 +02:00
Christian König
2acc73f81f drm/amdgpu: use drm_exec for MES testing
Start using the new component here as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230711133122.3710-6-christian.koenig@amd.com
2023-07-12 14:14:44 +02:00
Christian König
8a206685d3 drm/amdgpu: use drm_exec for GEM and CSA handling v2
Start using the new component here as well.

v2: ignore duplicates to allow per VM BO mappings

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230711133122.3710-5-christian.koenig@amd.com
2023-07-12 14:14:36 +02:00
Christian König
8abc1eb298 drm/amdkfd: switch over to using drm_exec v3
Avoids quite a bit of logic and kmalloc overhead.

v2: fix multiple problems pointed out by Felix
v3: two more nit picks from Felix fixed

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230711133122.3710-4-christian.koenig@amd.com
2023-07-12 14:14:26 +02:00
Srinivasan Shanmugam
6dda3f18bd drm/amdgpu: Fix errors & warnings in gfx_v10_0.c
Fix the below checkpatch errors & warnings:

ERROR: that open brace { should be on the previous line
ERROR: space prohibited before that ',' (ctx:WxV)
ERROR: space required after that ',' (ctx:WxV)
ERROR: code indent should use tabs where possible
ERROR: switch and case should be at the same indent

WARNING: please, no spaces at the start of a line
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: space prohibited before semicolon
WARNING: Block comments use a trailing */ on a separate line
WARNING: Block comments use * on subsequent lines
WARNING: braces {} are not necessary for any arm of this statement
WARNING: Missing a blank line after declarations

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-10 09:02:37 -04:00
Srinivasan Shanmugam
fe018cf2a1 drm/amdgpu: Fix warnings in gfxhub_ v3_0, v3_0_3.c
Fix the below checkpatch warnings:

WARNING: static const char * array should probably be static const char * const
+static const char *gfxhub_client_ids[] = {

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+       unsigned i;

WARNING: static const char * array should probably be static const char * const
+static const char *gfxhub_client_ids[] = {

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+       unsigned i;

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-10 09:02:37 -04:00
Srinivasan Shanmugam
e8483e682a drm/amdgpu: Fix warnings in gmc_v8_0.c
Fix below checkpatch warnings:

WARNING: braces {} are not necessary for single statement blocks
WARNING: braces {} are not necessary for any arm of this statement
WARNING: Block comments should align the * on each line

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-10 09:02:37 -04:00
gaba
edc857a682 drm/amdgpu: avoid restore process run into dead loop.
In restore process worker, pinned BO cause update PTE fail, then
the function re-schedule the restore_work. This will generate dead loop.

Signed-off-by: gaba <gaba@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-10 09:02:37 -04:00
Guchun Chen
e2770d76d4 drm/amdgpu/vkms: drop redundant set of fb_modifiers_not_supported
Due to a coding typo.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-10 09:02:37 -04:00
Srinivasan Shanmugam
c7a6c2b6b8 drm/amdgpu: Remove else after return statement in 'gfx_v10_0_check_grbm_cam_remapping'
Fix below checkpatch warnings:

WARNING: else is not generally useful after a break or return
+                       return true;
+               } else {

WARNING: else is not generally useful after a break or return
+                       return true;
+               } else {

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-10 09:02:37 -04:00
Srinivasan Shanmugam
38d47145b0 drm/amdgpu: Fix warnings in gmc_v11_0.c
Fix below checkpatch warnings:

WARNING: quoted string split across lines
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: void function return statements are not generally useful
WARNING: braces {} are not necessary for any arm of this statement

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-10 09:02:37 -04:00
Srinivasan Shanmugam
b8f68f1da5 drm/amdgpu: Remove else after return statement in 'gmc_v8_0_check_soft_reset'
Fix below checkpatch warnings:

WARNING: else is not generally useful after a break or return
+               return true;
+       } else {

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-10 09:02:37 -04:00
Srinivasan Shanmugam
f51f2088f1 drm/amdgpu: Fix warnings in gfxhub_v2_1.c
Fix the below checkpatch warnings:

WARNING: static const char * array should probably be static const char * const
+static const char *gfxhub_client_ids[] = {

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+       unsigned i;

WARNING: Missing a blank line after declarations
+       int i;
+       adev->gmc.VM_L2_CNTL = RREG32_SOC15(GC, 0, mmGCVM_L2_CNTL);

WARNING: Missing a blank line after declarations
+       int i;
+       WREG32_SOC15(GC, 0, mmGCVM_L2_CNTL, adev->gmc.VM_L2_CNTL);

WARNING: braces {} are not necessary for single statement blocks
+       if (!time) {
+               DRM_WARN("failed to wait for GRBM(EA) idle\n");
+       }

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-10 09:02:36 -04:00
Srinivasan Shanmugam
0cfc1d6830 drm/amdgpu: Fix errors & warnings in gmc_ v6_0, v7_0.c
Fix below checkpatch errors & warnings:

ERROR: trailing statements should be on next line
+       default: BUG();
ERROR: trailing statements should be on next line

WARNING: braces {} are not necessary for single statement blocks
WARNING: braces {} are not necessary for any arm of this statement
WARNING: Block comments use * on subsequent lines
WARNING: Missing a blank line after declarations
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-10 09:02:36 -04:00
Srinivasan Shanmugam
8612a435f3 drm/amdgpu: Fix warnings in gmc_v10_0.c
Fix below checkpatch warnings:

WARNING: Consider removing the code enclosed by this #if 0 and its #endif
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: quoted string split across lines

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-10 09:02:36 -04:00
Srinivasan Shanmugam
e2710187bb drm/amdgpu: Prefer dev_warn over printk
Fix the below warning:

WARNING: Prefer [subsystem eg: netdev]_warn([subsystem]dev, ... then
dev_warn(dev, ... then pr_warn(...  to printk(KERN_WARNING ...

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-10 09:02:36 -04:00
Srinivasan Shanmugam
0e2b8507c4 drm/amdgpu: Fix warnings in gfxhub_v2_0.c
Fix the below checkpatch warnings:

WARNING: static const char * array should probably be static const char * const
+static const char *gfxhub_client_ids[] = {

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+       unsigned i;

WARNING: Missing a blank line after declarations
+       u32 tmp;
+       tmp = RREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_CNTL);

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-10 09:02:36 -04:00
Lijo Lazar
67769b7cdd drm/amdgpu: Remove redundant GFX v9.4.3 sequence
Programming of XCC id is already taken care with partition mode change.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-10 09:02:36 -04:00
Srinivasan Shanmugam
62e6771ae8 drm/amdgpu: Fix warnings in gfxhub_ v1_0, v1_2.c
Fix the below checkpatch warnings:

WARNING: Block comments should align the * on each line
+                       /*
+                       * Raven2 has a HW issue that it is unable to use the

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+       unsigned num_level, block_size;

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+       unsigned i;

WARNING: Missing a blank line after declarations
+       u32 tmp;
+       tmp = RREG32_SOC15(GC, 0, mmVM_L2_PROTECTION_FAULT_CNTL);

WARNING: Block comments should align the * on each line
+                               /*
+                               * Raven2 has a HW issue that it is unable to use the

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+       unsigned num_level, block_size;

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-10 09:02:36 -04:00
Srinivasan Shanmugam
08e8521576 drm/amdgpu: Fix error & warnings in gmc_v9_0.c
Fix below checkpatch error & warnings:

ERROR: that open brace { should be on the previous line

WARNING: static const char * array should probably be static const char * const
WARNING: Block comments use * on subsequent lines
WARNING: Block comments use a trailing */ on a separate line

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07 13:51:48 -04:00
Lijo Lazar
4755bfbd99 drm/amdgpu: Change golden settings for GFX v9.4.3
Change the settings applicable for A0. GRBM_MCM_ADDR setting will be
applied by firmware.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Acked-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Tested-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07 13:51:48 -04:00
Sreekant Somasekharan
c4cde7358d drm/amd/amdgpu: Add cu_occupancy sysfs file to GFX9.4.3
Include kgd_gfx_v9_get_cu_occupancy call inside kfd2kgd_calls for
GFX9.4.3 to expose cu_occupancy sysfs file.

Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07 13:51:48 -04:00
Xiaogang Chen
eb58ad143d drm/amdgpu: have bos for PDs/PTS cpu accessible when kfd uses cpu to update vm
When kfd uses cpu to update vm iterates all current PDs/PTs bos, adds
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag and kmap them to kernel virtual
address space before kfd updates the vm that was created by gfx.

Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07 13:51:48 -04:00
Mukul Joshi
9041b53a59 drm/amdkfd: Use KIQ to unmap HIQ
Currently, we unmap HIQ by directly writing to HQD
registers. This doesn't work for GFX9.4.3. Instead,
use KIQ to unmap HIQ, similar to how we use KIQ to
map HIQ. Using KIQ to unmap HIQ works for all GFX
series post GFXv9.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07 13:51:48 -04:00
Tao Zhou
a80fe1a698 drm/amdgpu: skip address adjustment for GFX RAS injection
The address parameter of GFX RAS injection isn't related to XGMI node
number, keep it unchanged.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Candice Li <candice.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07 13:51:47 -04:00
Mukul Joshi
e77673d14f drm/amdgpu: Update invalid PTE flag setting
Update the invalid PTE flag setting with TF enabled.
This is to ensure, in addition to transitioning the
retry fault to a no-retry fault, it also causes the
wavefront to enter the trap handler. With the current
setting, the fault only transitions to a no-retry fault.
Additionally, have 2 sets of invalid PTE settings, one for
TF enabled, the other for TF disabled. The setting with
TF disabled, doesn't work with TF enabled.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07 13:51:47 -04:00
Alex Deucher
bc8ba5f2da drm/amdgpu: return an error if query_video_caps is not set
Should only be an issue for bring up when the function
pointer is not set, but check it anyway to be safe.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07 13:51:47 -04:00
Mario Limonciello
a90d36a49a drm/amd: adjust whitespace for amdgpu_psp.h
Adjust the whitespace to be consistent with the rest of the
`struct psp_context` structure.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07 13:51:47 -04:00
Mario Limonciello
e7347f1c73 drm/amd: Detect IFWI or PD upgrade support in psp_early_init()
Rather than evaluating the IP version for visibility, evaluate it
at the same time as the IP is initialized.

Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07 13:51:47 -04:00
Mario Limonciello
649663af73 drm/amd: Add documentation for how to flash a dGPU
The flashing process for dGPUs uses sysfs files in a
non-obvious way, so document it for users.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07 13:51:47 -04:00
Mario Limonciello
98d19a6c49 drm/amd: Convert USB-C PD F/W attributes into groups
Rather than special casing the creation of the file, special case
the visibility to the supported dGPUs.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07 13:51:47 -04:00
Mario Limonciello
1cc506f08b drm/amd: Make flashing messages quieter
Debug messages related to the kernel process of flashing an updated
IFWI are needlessly noisy and also confusing.

Downgrade them to debug instead and clarify what they are actually
doing.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07 13:51:47 -04:00
Mario Limonciello
521289d2a2 drm/amd: Use attribute groups for PSP flashing attributes
Individually creating attributes can be racy, instead make attributes
using attribute groups and control their visibility with an is_visible
callback to only show when using appropriate products.

v2: squash in fix for PSP 13.0.10

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07 13:51:46 -04:00
Lijo Lazar
fe9aaddf90 drm/amdgpu: Rename aqua_vanjaram_reg_init.c
This contains SOC specific functions, rename to a more generic format
<soc>.c => aqua_vanjaram.c

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07 13:38:27 -04:00
Linus Torvalds
5133c9e51d drm fixes for 6.5-rc1
fbdev:
 - Fix module infos on sparc
 
 panel:
 - Fix mode on Starry-ili9882t
 
 i915:
 - Allow DC states along with PW2 only for PWB functionality [adlp+]
 - Fix SSC selection for MPLLA [mtl]
 - Use hw.adjusted mode when calculating io/fast wake times [psr]
 - Apply min softlimit correctly [guc/slpc]
 - Assign correct hdcp content type [hdcp]
 - Add missing forward declarations/includes to display power headers
 - Fix BDW PSR AUX CH data register offsets [psr]
 - Use mock device info for creating mock device
 
 amdgpu:
 - Misc cleanups
 - GFX 9.4.3 fixes
 - DEBUGFS build fix
 - Fix LPDDR5 reporting
 - ASPM fixes
 - DCN 3.1.4 fixes
 - DP MST fixes
 - DCN 3.2.x fixes
 - Display PSR TCON fixes
 - SMU 13.x fixes
 - RAS fixes
 - Vega12/20 SMU fixes
 - PSP flashing cleanup
 - GFX9 MCBP fixes
 - SR-IOV fixes
 - GPUVM clear mappings fix for always valid BOs
 - Add FAMS quirk for problematic monitor
 - Fix possible UAF
 - Better handle monentary temperature fluctuations
 - SDMA 4.4.2 fixes
 - Fencing fix
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmSnZkkACgkQDHTzWXnE
 hr5qlA//RZ5T0FaCV7lNJUkoHuHkd/nEyeqJCMD1GC+LJH7NgQv2ahIZte218nUZ
 602MMsWqJYLRrP13TbuYy/McKe56w3lq87eeILmV6CGAUB96yjqjTaHoKHU49/PC
 kTzpgBgcKo0gqp73B+FZ1qBE3I3RowPwPYnL3ddH236lv4FBus6nowwJf/pmPU8f
 ok41lefHyo601ypHFVAWRfJNl+qnXpBAvZxpeTlExqklqZLgvgNTwvJvNHmVUoi3
 hDgGJVQaMJH9O3Chwgr7JdE3Lk+ku+XH+CU3Qw0SdBB4NaMxU5KMagwQFYfV18cV
 VZ9DqL9G/y4xpRVl0anr/MM2eF4E0P/MZMCKGBHOPkP/Ol8mmlAUSDoIzDaeDiPc
 tGpSNuWywr/DzWBWUZFeZBYdR4a8A4jMg84SPzVvPNNjRUXMutKxFe9eMNCpDc6C
 yTInY4EuwrYSPFPa2sT/B5efKouieH4XsF6ORK3uct0ZWrGwf05DX+HdykhZQwH1
 eQRpWhR6lZKtovCceGFstxaC0B+Gh58D3uUFznxVKXEQyIVZvjEtm/aMLtHZTYT/
 Wtj0cAr12a1Mbyy5XXh1ZGeq5QqmIaQt/KGnatp2fxWEmccwd22Uw5q1YZuNm2A7
 AlndyRnJtw9u6zfnvkNd/Fj/9v7QdZquuluSKQYoHe4ZPS3+GJs=
 =OBzT
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2023-07-07' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Lots of fixes, mostly i915 and amdgpu. It's two weeks of i915, and I
  think three weeks of amdgpu.

  fbdev:
   - Fix module infos on sparc

  panel:
   - Fix mode on Starry-ili9882t

  i915:
   - Allow DC states along with PW2 only for PWB functionality [adlp+]
   - Fix SSC selection for MPLLA [mtl]
   - Use hw.adjusted mode when calculating io/fast wake times [psr]
   - Apply min softlimit correctly [guc/slpc]
   - Assign correct hdcp content type [hdcp]
   - Add missing forward declarations/includes to display power headers
   - Fix BDW PSR AUX CH data register offsets [psr]
   - Use mock device info for creating mock device

  amdgpu:
   - Misc cleanups
   - GFX 9.4.3 fixes
   - DEBUGFS build fix
   - Fix LPDDR5 reporting
   - ASPM fixes
   - DCN 3.1.4 fixes
   - DP MST fixes
   - DCN 3.2.x fixes
   - Display PSR TCON fixes
   - SMU 13.x fixes
   - RAS fixes
   - Vega12/20 SMU fixes
   - PSP flashing cleanup
   - GFX9 MCBP fixes
   - SR-IOV fixes
   - GPUVM clear mappings fix for always valid BOs
   - Add FAMS quirk for problematic monitor
   - Fix possible UAF
   - Better handle monentary temperature fluctuations
   - SDMA 4.4.2 fixes
   - Fencing fix"

* tag 'drm-next-2023-07-07' of git://anongit.freedesktop.org/drm/drm: (83 commits)
  drm/i915: use mock device info for creating mock device
  drm/i915/psr: Fix BDW PSR AUX CH data register offsets
  drm/amdgpu: Fix potential fence use-after-free v2
  drm/amd/pm: avoid unintentional shutdown due to temperature momentary fluctuation
  drm/amd/pm: expose swctf threshold setting for legacy powerplay
  drm/amd/display: 3.2.241
  drm/amd/display: Take full update path if number of planes changed
  drm/amd/display: Create debugging mechanism for Gaming FAMS
  drm/amd/display: Add monitor specific edid quirk
  drm/amd/display: For new fast update path, loop through each surface
  drm/amd/display: Remove Phantom Pipe Check When Calculating K1 and K2
  drm/amd/display: Limit new fast update path to addr and gamma / color
  drm/amd/display: Fix the delta clamping for shaper LUT
  drm/amdgpu: Keep non-psp path for partition switch
  drm/amd/display: program DPP shaper and 3D LUT if updated
  Revert "drm/amd/display: edp do not add non-edid timings"
  drm/amdgpu: share drm device for pci amdgpu device with 1st partition device
  drm/amd/pm: Add GFX v9.4.3 unique id to sysfs
  drm/amd/pm: Enable pp_feature attribute
  drm/amdgpu/vcn: Need to unpause dpg before stop dpg
  ...
2023-07-06 22:42:54 -07:00
shanzhulig
2e54154b9f drm/amdgpu: Fix potential fence use-after-free v2
fence Decrements the reference count before exiting.
Avoid Race Vulnerabilities for fence use-after-free.

v2 (chk): actually fix the use after free and not just move it.

Signed-off-by: shanzhulig <shanzhulig@gmail.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30 13:12:16 -04:00
Evan Quan
b75efe88b2 drm/amd/pm: avoid unintentional shutdown due to temperature momentary fluctuation
An intentional delay is added on soft ctf triggered. Then there will
be a double check for the GPU temperature before taking further
action. This can avoid unintended shutdown due to temperature
momentary fluctuation.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30 13:12:16 -04:00
Lijo Lazar
a28eb4871a drm/amdgpu: Keep non-psp path for partition switch
When PSP block is not present, use direct programming.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Tested-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30 13:12:15 -04:00
James Zhu
150c213139 drm/amdgpu: share drm device for pci amdgpu device with 1st partition device
To save render node resoure, share drm device setting for pci amdgpu
device with 1st XCP partition device.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30 13:12:15 -04:00
Emily Deng
803f31814f drm/amdgpu/vcn: Need to unpause dpg before stop dpg
Need to unpause dpg first, or it will hit follow error during stop dpg:
"[drm] Register(1) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000000n"

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30 13:12:15 -04:00
Le Ma
67af691626 drm/amdgpu: remove duplicated doorbell range init for sdma v4.4.2
Handled in earlier phase

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30 13:12:15 -04:00
YiPeng Chai
2c7cd280e5 drm/amdgpu: gpu recovers from fatal error in poison mode
Fatal error occurs in ras poison mode, mode1 reset
is used to recover gpu.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30 13:12:15 -04:00
Alex Deucher
50a7c8765c drm/amdgpu: enable mcbp by default on gfx9
It's required for high priority queues.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2535
Reviewed-and-tested-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30 13:12:15 -04:00
Alex Deucher
02ff519e99 drm/amdgpu: make mcbp a per device setting
So we can selectively enable it on certain devices.  No
intended functional change.

Reviewed-and-tested-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30 13:12:14 -04:00
Mario Limonciello
5efe0f3eed drm/amd: Don't initialize PSP twice for Navi3x
PSP functions are already set by psp_early_init() so initializing
them a second time is unnecessary.
No intended functional changes.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30 13:12:14 -04:00
Zhigang Luo
2036b34d4a drm/amdgpu: port SRIOV VF missed changes
port SRIOV VF missed changes from gfx_v9_0 to gfx_v9_4_3.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30 13:11:35 -04:00
Mario Limonciello
5c6d52ff4b drm/amd: Don't try to enable secure display TA multiple times
If the securedisplay TA failed to load the first time, it's unlikely
to work again after a suspend/resume cycle or reset cycle and it appears
to be causing problems in futher attempts.

Fixes: e42dfa66d5 ("drm/amdgpu: Add secure display TA load for Renoir")
Reported-by: Filip Hejsek <filip.hejsek@gmail.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2633
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30 13:11:35 -04:00
Christian König
570b295248 drm/amdgpu: fix number of fence calculations
Since adding gang submit we need to take the gang size into account
while reserving fences.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 4624459c84 ("drm/amdgpu: add gang submit frontend v6")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30 13:11:35 -04:00
Lijo Lazar
b579ea632f drm/amdgpu: Modify for_each_inst macro
Modify it such that it doesn't change the instance mask parameter.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Victor Skvortsov <victor.skvortsov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30 13:11:35 -04:00
Mangesh Gadre
7f03b1d14d drm/amdgpu:Remove sdma halt/unhalt during frontdoor load
sdma halt/unhalt is performed by psp when frontdoor
loading used,so this can be skipped.

v2: Instead of removing halt/unhalt completely,
    driver will do it only during backdoor load.

Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30 13:11:35 -04:00
Tao Zhou
4ff96bcc0d drm/amdgpu: check RAS irq existence for VCN/JPEG
No RAS irq is allowed.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30 13:11:35 -04:00
Xiaogang Chen
1d7776cc14 drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute
Since we allow kfd and graphic operate on same GPU VM to have interoperation
between them GPU VM may have been used by graphic vm operations before kfd turns
a GPU VM into a compute VM. Remove vm clean checking at amdgpu_vm_make_compute.

Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-30 13:11:35 -04:00
Linus Torvalds
1b722407a1 drm changes for 6.5-rc1:
core:
 - replace strlcpy with strscpy
 - EDID changes to support further conversion to struct drm_edid
 - Move i915 DSC parameter code to common DRM helpers
 - Add Colorspace functionality
 
 aperture:
 - ignore framebuffers with non-primary devices
 
 fbdev:
 - use fbdev i/o helpers
 - add Kconfig options for fb_ops helpers
 - use new fb io helpers directly in drivers
 
 sysfs:
 - export DRM connector ID
 
 scheduler:
 - Avoid an infinite loop
 
 ttm:
 - store function table in .rodata
 - Add query for TTM mem limit
 - Add NUMA awareness to pools
 - Export ttm_pool_fini()
 
 bridge:
 - fsl-ldb: support i.MX6SX
 - lt9211, lt9611: remove blanking packets
 - tc358768: implement input bus formats, devm cleanups
 - ti-snd65dsi86: implement wait_hpd_asserted
 - analogix: fix endless probe loop
 - samsung-dsim: support swapped clock, fix enabling, support var clock
 - display-connector: Add support for external power supply
 - imx: Fix module linking
 - tc358762: Support reset GPIO
 
 panel:
 - nt36523: Support Lenovo J606F
 - st7703: Support Anbernic RG353V-V2
 - InnoLux G070ACE-L01 support
 - boe-tv101wum-nl6: Improve initialization
 - sharp-ls043t1le001: Mode fixes
 - simple: BOE EV121WXM-N10-1850, S6D7AA0
 - Ampire AM-800480L1TMQW-T00H
 - Rocktech RK043FN48H
 - Starry himax83102-j02
 - Starry ili9882t
 
 amdgpu:
 - add new ctx query flag to handle reset better
 - add new query/set shadow buffer for rdna3
 - DCN 3.2/3.1.x/3.0.x updates
 - Enable DC_FP on loongarch
 - PCIe fix for RDNA2
 - improve DC FAMS/SubVP support for better power management
 - partition support for lots of engines
 - Take NUMA into account when allocating memory
 - Add new DRM_AMDGPU_WERROR config parameter to help with CI
 - Initial SMU13 overdrive support
 - Add support for new colorspace KMS API
 - W=1 fixes
 
 amdkfd:
 - Query TTM mem limit rather than hardcoding it
 - GC 9.4.3 partition support
 - Handle NUMA for partitions
 - Add debugger interface for enabling gdb
 - Add KFD event age tracking
 
 radeon:
 - Fix possible UAF
 
 i915:
 - new getparam for PXP support
 - GSC/MEI proxy driver
 - Meteorlake display enablement
 - avoid clearing preallocated framebuffers with TTM
 - implement framebuffer mmap support
 - Disable sampler indirect state in bindless heap
 - Enable fdinfo for GuC backends
 - GuC loading and firmware table handling fixes
 - Various refactors for multi-tile enablement
 - Define MOCS and PAT tables for MTL
 - GSC/MEI support for Meteorlake
 - PMU multi-tile support
 - Large driver kernel doc cleanup
 - Allow VRR toggling and arbitrary refresh rates
 - Support async flips on linear buffers on display ver 12+
 - Expose CRTC CTM property on ILK/SNB/VLV
 - New debugfs for display clock frequencies
 - Hotplug refactoring
 - Display refactoring
 - I915_GEM_CREATE_EXT_SET_PAT for Mesa on Meteorlake
 - Use large rings for compute contexts
 - HuC loading for MTL
 - Allow user to set cache at BO creation
 - MTL powermanagement enhancements
 - Switch to dedicated workqueues to stop using flush_scheduled_work()
 - Move display runtime init under display/
 - Remove 10bit gamma on desktop gen3 parts, they don't support it
 
 habanalabs:
 - uapi: return 0 for user queries if there was a h/w or f/w error
 - Add pci health check when we lose connection with the firmware. This can be used to
   distinguish between pci link down and firmware getting stuck.
 - Add more info to the error print when TPC interrupt occur.
 - Firmware fixes
 
 msm:
 - Adreno A660 bindings
 - SM8350 MDSS bindings fix
 - Added support for DPU on sm6350 and sm6375 platforms
 - Implemented tearcheck support to support vsync on SM150 and newer platforms
 - Enabled missing features (DSPP, DSC, split display) on sc8180x, sc8280xp, sm8450
 - Added support for DSI and 28nm DSI PHY on MSM8226 platform
 - Added support for DSI on sm6350 and sm6375 platforms
 - Added support for display controller on MSM8226 platform
 - A690 GPU support
 - Move cmdstream dumping out of fence signaling path
 - a610 support
 - Support for a6xx devices without GMU
 
 nouveau:
 - NULL ptr before deref fixes
 
 armada:
 - implement fbdev emulation as client
 
 sun4i:
 - fix mipi-dsi dotclock
 - release clocks
 
 vc4:
 - rgb range toggle property
 - BT601 / BT2020 HDMI support
 
 vkms:
 - convert to drmm helpers
 - add reflection and rotation support
 - fix rgb565 conversion
 
 gma500:
 - fix iomem access
 
 shmobile:
 - support renesas soc platform
 - enable fbdev
 
 mxsfb:
 - Add support for i.MX93 LCDIF
 
 stm:
 - dsi: Use devm_ helper
 - ltdc: Fix potential invalid pointer deref
 
 renesas:
 - Group drivers in renesas subdirectory to prepare for new platform
 - Drop deprecated R-Car H3 ES1.x support
 
 meson:
 - Add support for MIPI DSI displays
 
 virtio:
 - add sync object support
 
 mediatek:
 - Add display binding document for MT6795
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmSc3UwACgkQDHTzWXnE
 hr69fQ/+PF9L7FSB/qfjaoqJnk6wJyCehv7pDX2/UK7FUrW0e4EwNVx4KKIRqO/P
 pKSU9wRlC72ViGgqOYnw0pwzuh45630vWo1stbgxipU2cvM6Ywlq8FiQFdymFe+P
 tLYWe5MR55Y+E9Y+bCrKn2yvQ7v+f6EZ6ITIX7mrXL77Bpxhv58VzmZawkxmw5MV
 vwhSqJaaeeWNoyfSIDdN8Oj9fE6ScTyiA0YisOP6jnK/TiQofXQxFrMIdKctCcoA
 HjolfEEPVCDOSBipkV3hLiyN8lXmt47BmuHp9opSL/g1aASteVeD1/GrccTaA4xV
 ah+Jx1hBLcH5sm8CZzbCcHhNu3ILnPCFZFCx8gwflQqmDIOZvoMdL75j7lgqJZG8
 TePEiifG3kYO/ZiDc5TUBdeMfbgeehPOsxbvOlA3LxJrgyxe/5o9oejX2Uvvzhoq
 9fno1PLqeCILqYaMiCocJwyTw/2VKYCCH7Wiypd4o3h0nmAbbqPT3KeZgNOjoa2X
 GXpiIU9rTQ8LZgSmOXdCt2rc9Jb6q+eCiDgrZzAukbP8veQyOvO16Nx1+XzLhOYc
 BfjEOoA7nBJD+UPLWkwj42gKtoEWN7IOMTHgcK11d8jdpGISGupl/1nntGhYk0jO
 +3RRZXMB/Gjwe9ge4K9bFC81pbfuAE7ELQtPsgV9LapMmWHKccY=
 =FmUA
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2023-06-29' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "There is one set of patches to misc for a i915 gsc/mei proxy driver.

  Otherwise it's mostly amdgpu/i915/msm, lots of hw enablement and lots
  of refactoring.

  core:
   - replace strlcpy with strscpy
   - EDID changes to support further conversion to struct drm_edid
   - Move i915 DSC parameter code to common DRM helpers
   - Add Colorspace functionality

  aperture:
   - ignore framebuffers with non-primary devices

  fbdev:
   - use fbdev i/o helpers
   - add Kconfig options for fb_ops helpers
   - use new fb io helpers directly in drivers

  sysfs:
   - export DRM connector ID

  scheduler:
   - Avoid an infinite loop

  ttm:
   - store function table in .rodata
   - Add query for TTM mem limit
   - Add NUMA awareness to pools
   - Export ttm_pool_fini()

  bridge:
   - fsl-ldb: support i.MX6SX
   - lt9211, lt9611: remove blanking packets
   - tc358768: implement input bus formats, devm cleanups
   - ti-snd65dsi86: implement wait_hpd_asserted
   - analogix: fix endless probe loop
   - samsung-dsim: support swapped clock, fix enabling, support var
     clock
   - display-connector: Add support for external power supply
   - imx: Fix module linking
   - tc358762: Support reset GPIO

  panel:
   - nt36523: Support Lenovo J606F
   - st7703: Support Anbernic RG353V-V2
   - InnoLux G070ACE-L01 support
   - boe-tv101wum-nl6: Improve initialization
   - sharp-ls043t1le001: Mode fixes
   - simple: BOE EV121WXM-N10-1850, S6D7AA0
   - Ampire AM-800480L1TMQW-T00H
   - Rocktech RK043FN48H
   - Starry himax83102-j02
   - Starry ili9882t

  amdgpu:
   - add new ctx query flag to handle reset better
   - add new query/set shadow buffer for rdna3
   - DCN 3.2/3.1.x/3.0.x updates
   - Enable DC_FP on loongarch
   - PCIe fix for RDNA2
   - improve DC FAMS/SubVP support for better power management
   - partition support for lots of engines
   - Take NUMA into account when allocating memory
   - Add new DRM_AMDGPU_WERROR config parameter to help with CI
   - Initial SMU13 overdrive support
   - Add support for new colorspace KMS API
   - W=1 fixes

  amdkfd:
   - Query TTM mem limit rather than hardcoding it
   - GC 9.4.3 partition support
   - Handle NUMA for partitions
   - Add debugger interface for enabling gdb
   - Add KFD event age tracking

  radeon:
   - Fix possible UAF

  i915:
   - new getparam for PXP support
   - GSC/MEI proxy driver
   - Meteorlake display enablement
   - avoid clearing preallocated framebuffers with TTM
   - implement framebuffer mmap support
   - Disable sampler indirect state in bindless heap
   - Enable fdinfo for GuC backends
   - GuC loading and firmware table handling fixes
   - Various refactors for multi-tile enablement
   - Define MOCS and PAT tables for MTL
   - GSC/MEI support for Meteorlake
   - PMU multi-tile support
   - Large driver kernel doc cleanup
   - Allow VRR toggling and arbitrary refresh rates
   - Support async flips on linear buffers on display ver 12+
   - Expose CRTC CTM property on ILK/SNB/VLV
   - New debugfs for display clock frequencies
   - Hotplug refactoring
   - Display refactoring
   - I915_GEM_CREATE_EXT_SET_PAT for Mesa on Meteorlake
   - Use large rings for compute contexts
   - HuC loading for MTL
   - Allow user to set cache at BO creation
   - MTL powermanagement enhancements
   - Switch to dedicated workqueues to stop using flush_scheduled_work()
   - Move display runtime init under display/
   - Remove 10bit gamma on desktop gen3 parts, they don't support it

  habanalabs:
   - uapi: return 0 for user queries if there was a h/w or f/w error
   - Add pci health check when we lose connection with the firmware.
     This can be used to distinguish between pci link down and firmware
     getting stuck.
   - Add more info to the error print when TPC interrupt occur.
   - Firmware fixes

  msm:
   - Adreno A660 bindings
   - SM8350 MDSS bindings fix
   - Added support for DPU on sm6350 and sm6375 platforms
   - Implemented tearcheck support to support vsync on SM150 and newer
     platforms
   - Enabled missing features (DSPP, DSC, split display) on sc8180x,
     sc8280xp, sm8450
   - Added support for DSI and 28nm DSI PHY on MSM8226 platform
   - Added support for DSI on sm6350 and sm6375 platforms
   - Added support for display controller on MSM8226 platform
   - A690 GPU support
   - Move cmdstream dumping out of fence signaling path
   - a610 support
   - Support for a6xx devices without GMU

  nouveau:
   - NULL ptr before deref fixes

  armada:
   - implement fbdev emulation as client

  sun4i:
   - fix mipi-dsi dotclock
   - release clocks

  vc4:
   - rgb range toggle property
   - BT601 / BT2020 HDMI support

  vkms:
   - convert to drmm helpers
   - add reflection and rotation support
   - fix rgb565 conversion

  gma500:
   - fix iomem access

  shmobile:
   - support renesas soc platform
   - enable fbdev

  mxsfb:
   - Add support for i.MX93 LCDIF

  stm:
   - dsi: Use devm_ helper
   - ltdc: Fix potential invalid pointer deref

  renesas:
   - Group drivers in renesas subdirectory to prepare for new platform
   - Drop deprecated R-Car H3 ES1.x support

  meson:
   - Add support for MIPI DSI displays

  virtio:
   - add sync object support

  mediatek:
   - Add display binding document for MT6795"

* tag 'drm-next-2023-06-29' of git://anongit.freedesktop.org/drm/drm: (1791 commits)
  drm/i915: Fix a NULL vs IS_ERR() bug
  drm/i915: make i915_drm_client_fdinfo() reference conditional again
  drm/i915/huc: Fix missing error code in intel_huc_init()
  drm/i915/gsc: take a wakeref for the proxy-init-completion check
  drm/msm/a6xx: Add A610 speedbin support
  drm/msm/a6xx: Add A619_holi speedbin support
  drm/msm/a6xx: Use adreno_is_aXYZ macros in speedbin matching
  drm/msm/a6xx: Use "else if" in GPU speedbin rev matching
  drm/msm/a6xx: Fix some A619 tunables
  drm/msm/a6xx: Add A610 support
  drm/msm/a6xx: Add support for A619_holi
  drm/msm/adreno: Disable has_cached_coherent in GMU wrapper configurations
  drm/msm/a6xx: Introduce GMU wrapper support
  drm/msm/a6xx: Move CX GMU power counter enablement to hw_init
  drm/msm/a6xx: Extend and explain UBWC config
  drm/msm/a6xx: Remove both GBIF and RBBM GBIF halt on hw init
  drm/msm/a6xx: Add a helper for software-resetting the GPU
  drm/msm/a6xx: Improve a6xx_bus_clear_pending_transactions()
  drm/msm/a6xx: Move a6xx_bus_clear_pending_transactions to a6xx_gpu
  drm/msm/a6xx: Move force keepalive vote removal to a6xx_gmu_force_off()
  ...
2023-06-29 11:00:17 -07:00
Linus Torvalds
582c161cf3 hardening updates for v6.5-rc1
- Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko)
 
 - Convert strreplace() to return string start (Andy Shevchenko)
 
 - Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook)
 
 - Add missing function prototypes seen with W=1 (Arnd Bergmann)
 
 - Fix strscpy() kerndoc typo (Arne Welzel)
 
 - Replace strlcpy() with strscpy() across many subsystems which were
   either Acked by respective maintainers or were trivial changes that
   went ignored for multiple weeks (Azeem Shaikh)
 
 - Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers)
 
 - Add KUnit tests for strcat()-family
 
 - Enable KUnit tests of FORTIFY wrappers under UML
 
 - Add more complete FORTIFY protections for strlcat()
 
 - Add missed disabling of FORTIFY for all arch purgatories.
 
 - Enable -fstrict-flex-arrays=3 globally
 
 - Tightening UBSAN_BOUNDS when using GCC
 
 - Improve checkpatch to check for strcpy, strncpy, and fake flex arrays
 
 - Improve use of const variables in FORTIFY
 
 - Add requested struct_size_t() helper for types not pointers
 
 - Add __counted_by macro for annotating flexible array size members
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmSbftQWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJj0MD/9X9jzJzCmsAU+yNldeoAzC84Sk
 GVU3RBxGcTNysL1gZXynkIgigw7DWc4htMGeSABHHwQRVP65JCH1Kw/VqIkyumbx
 9LdX6IklMJb4pRT4PVU3azebV4eNmSjlur2UxMeW54Czm91/6I8RHbJOyAPnOUmo
 2oomGdP/hpEHtKR7hgy8Axc6w5ySwQixh2V5sVZG3VbvCS5WKTmTXbs6puuRT5hz
 iHt7v+7VtEg/Qf1W7J2oxfoghvVBsaRrSLrExWT/oZYh1ZxM7DsCAAoG/IsDgHGA
 9LBXiRECgAFThbHVxLvvKZQMXdVk0i8iXLX43XMKC0wTA+NTyH7wlcQQ4RWNMuo8
 sfA9Qm9gMArXaf64aymr3Uwn20Zan0391HdlbhOJZAE6v3PPJbleUnM58AzD2d3r
 5Lz6AIFBxDImy+3f9iDWgacCT5/PkeiXTHzk9QnKhJyKKtRA58XJxj4q2+rPnGJP
 n4haXqoxD5FJbxdXiGKk31RS0U5HBug7wkOcUrTqDHUbc/QNU2b7dxTKUx+zYtCU
 uV5emPzpF4H4z+91WpO47n9gkMAfwV0lt9S2dwS8pxsgqctbmIan+Jgip7rsqZ2G
 OgLXBsb43eEs+6WgO8tVt/ZHYj9ivGMdrcNcsIfikzNs/xweUJ53k2xSEn2xEa5J
 cwANDmkL6QQK7yfeeg==
 =s0j1
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:
 "There are three areas of note:

  A bunch of strlcpy()->strscpy() conversions ended up living in my tree
  since they were either Acked by maintainers for me to carry, or got
  ignored for multiple weeks (and were trivial changes).

  The compiler option '-fstrict-flex-arrays=3' has been enabled
  globally, and has been in -next for the entire devel cycle. This
  changes compiler diagnostics (though mainly just -Warray-bounds which
  is disabled) and potential UBSAN_BOUNDS and FORTIFY _warning_
  coverage. In other words, there are no new restrictions, just
  potentially new warnings. Any new FORTIFY warnings we've seen have
  been fixed (usually in their respective subsystem trees). For more
  details, see commit df8fc4e934.

  The under-development compiler attribute __counted_by has been added
  so that we can start annotating flexible array members with their
  associated structure member that tracks the count of flexible array
  elements at run-time. It is possible (likely?) that the exact syntax
  of the attribute will change before it is finalized, but GCC and Clang
  are working together to sort it out. Any changes can be made to the
  macro while we continue to add annotations.

  As an example of that last case, I have a treewide commit waiting with
  such annotations found via Coccinelle:

    https://git.kernel.org/linus/adc5b3cb48a049563dc673f348eab7b6beba8a9b

  Also see commit dd06e72e68 for more details.

  Summary:

   - Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko)

   - Convert strreplace() to return string start (Andy Shevchenko)

   - Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook)

   - Add missing function prototypes seen with W=1 (Arnd Bergmann)

   - Fix strscpy() kerndoc typo (Arne Welzel)

   - Replace strlcpy() with strscpy() across many subsystems which were
     either Acked by respective maintainers or were trivial changes that
     went ignored for multiple weeks (Azeem Shaikh)

   - Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers)

   - Add KUnit tests for strcat()-family

   - Enable KUnit tests of FORTIFY wrappers under UML

   - Add more complete FORTIFY protections for strlcat()

   - Add missed disabling of FORTIFY for all arch purgatories.

   - Enable -fstrict-flex-arrays=3 globally

   - Tightening UBSAN_BOUNDS when using GCC

   - Improve checkpatch to check for strcpy, strncpy, and fake flex
     arrays

   - Improve use of const variables in FORTIFY

   - Add requested struct_size_t() helper for types not pointers

   - Add __counted_by macro for annotating flexible array size members"

* tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (54 commits)
  netfilter: ipset: Replace strlcpy with strscpy
  uml: Replace strlcpy with strscpy
  um: Use HOST_DIR for mrproper
  kallsyms: Replace all non-returning strlcpy with strscpy
  sh: Replace all non-returning strlcpy with strscpy
  of/flattree: Replace all non-returning strlcpy with strscpy
  sparc64: Replace all non-returning strlcpy with strscpy
  Hexagon: Replace all non-returning strlcpy with strscpy
  kobject: Use return value of strreplace()
  lib/string_helpers: Change returned value of the strreplace()
  jbd2: Avoid printing outside the boundary of the buffer
  checkpatch: Check for 0-length and 1-element arrays
  riscv/purgatory: Do not use fortified string functions
  s390/purgatory: Do not use fortified string functions
  x86/purgatory: Do not use fortified string functions
  acpi: Replace struct acpi_table_slit 1-element array with flex-array
  clocksource: Replace all non-returning strlcpy with strscpy
  string: use __builtin_memcpy() in strlcpy/strlcat
  staging: most: Replace all non-returning strlcpy with strscpy
  drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
  ...
2023-06-27 21:24:18 -07:00
Thomas Zimmermann
71e801b9b4 drm: Clear fd/handle callbacks in struct drm_driver
Clear all assignments of struct drm_driver's fd/handle callbacks to
drm_gem_prime_fd_to_handle() and drm_gem_prime_handle_to_fd(). These
functions are called by default. Add a TODO item to convert vmwgfx
to the defaults as well.

v2:
	* remove TODO item (Zack)
	* also update amdgpu's amdgpu_partition_driver

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Simon Ser <contact@emersion.fr>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Jeffrey Hugo <quic_jhugo@quicinc.com> # qaic
Link: https://patchwork.freedesktop.org/patch/msgid/20230620080252.16368-3-tzimmermann@suse.de
2023-06-26 11:08:41 +02:00
Evan Quan
1af3d0a8e8 drm/amd/pm: update the LC_L1_INACTIVITY setting to address possible noise issue
It is proved that insufficient LC_L1_INACTIVITY setting can cause audio
noise on some platform. With the LC_L1_INACTIVITY increased to 4ms, the
issue can be resolved.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23 15:46:22 -04:00
Zhigang Luo
acbe761046 drm/amdgpu: Skip TMR for MP0_HWIP 13.0.6
For SRIOV VF, no TMR needed.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23 15:36:48 -04:00
Evan Quan
fd21987274 drm/amd/pm: revise the ASPM settings for thunderbolt attached scenario
Also, correct the comment for NAVI10_PCIE__LC_L1_INACTIVITY_TBT_DEFAULT
as 0x0000000E stands for 400ms instead of 4ms.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23 15:36:03 -04:00
Samuel Pitoiset
ea2c3c0855 drm/amdgpu: fix clearing mappings for BOs that are always valid in VM
Per VM BOs must be marked as moved or otherwise their ranges are not
updated on use which might be necessary when the replace operation
splits mappings.

This fixes random GPU hangs when replacing sparse mappings from the
userspace, while OP_MAP/OP_UNMAP works fine because always valid BOs
are correctly handled there.

Cc: stable@vger.kernel.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23 15:35:51 -04:00
Lijo Lazar
184d838482 drm/amdgpu: Add vbios attribute only if supported
Not all devices carry VBIOS version information. Add the device
attribute only if supported.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23 15:35:26 -04:00
Alex Deucher
c09b3bf736 drm/amdgpu/atomfirmware: fix LPDDR5 width reporting
LPDDR5 channels are 32 bit rather than 64, report the width properly
in the log.

v2: Only LPDDR5 are 32 bits per channel.  DDR5 is 64 bits per channel

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2468
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23 15:35:16 -04:00
Nathan Chancellor
4e3f85d1c0 drm/amdgpu: Remove CONFIG_DEBUG_FS guard around body of amdgpu_rap_debugfs_init()
After commit 8020f0f931 ("drm/amd/amdgpu: enable W=1 for amdgpu"),
there is an instance of -Wunused-const-variable when CONFIG_DEBUG_FS is
disabled:

  drivers/gpu/drm/amd/amdgpu/amdgpu_rap.c:110:37: error: unused variable 'amdgpu_rap_debugfs_ops' [-Werror,-Wunused-const-variable]
    110 | static const struct file_operations amdgpu_rap_debugfs_ops = {
        |                                     ^
  1 error generated.

There is no reason for the body of this function to be guarded when
CONFIG_DEBUG_FS is disabled, as debugfs_create_file() is a stub that
just returns an error pointer in that situation. Remove the preprocessor
guards so that the variable never appears unused, while not changing
anything at run time.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23 15:33:09 -04:00
Lijo Lazar
025654ae42 drm/amdgpu: Move calculation of xcp per memory node
Its value is required for finding the memory id of xcp.

Fixes: d26ea1b346 ("drm/amdgpu: Add xcp manager num_xcp_per_mem_partition")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23 15:31:55 -04:00
Jiadong Zhu
ef3c36a6e0 drm/amdgpu: Skip mark offset for high priority rings
Only low priority rings are using chunks to save the offset.
Bypass the mark offset callings from high priority rings.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23 15:31:50 -04:00
Thomas Zimmermann
042aeecc02 drm/amdgpu: Remove struct drm_driver.gem_prime_mmap
The callback struct drm_driver.gem_prime_mmap as been removed in
commit 0adec22702 ("drm: Remove struct drm_driver.gem_prime_mmap").
Do not assign to it. The assigned function, drm_gem_prime_mmap(), is
now the default for the operation, so there is no change in functionality.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 0adec22702 ("drm: Remove struct drm_driver.gem_prime_mmap")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230619141129.2002-1-tzimmermann@suse.de
2023-06-19 16:35:27 +02:00
Thomas Zimmermann
de8a334f21 Merge drm/drm-next into drm-misc-next
Backmerging into drm-misc-next to get commit 2c1c7ba457
("drm/amdgpu: support partition drm devices"), which is required to fix
commit 0adec22702 ("drm: Remove struct drm_driver.gem_prime_mmap").

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2023-06-19 16:33:14 +02:00
Thomas Zimmermann
0adec22702 drm: Remove struct drm_driver.gem_prime_mmap
All drivers initialize this field with drm_gem_prime_mmap(). Call
the function directly and remove the field. Simplifies the code and
resolves a long-standing TODO item.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230613150441.17720-3-tzimmermann@suse.de
2023-06-19 13:56:40 +02:00
Philip Yang
5d1c70bb6e drm/amdgpu: Increase hmm range get pages timeout
If hmm_range_fault returns -EBUSY, we should call hmm_range_fault again
to validate the remaining pages. On one system with NUMA auto balancing
enabled, hmm_range_fault takes 6 seconds for 1GB range because CPU
migrate the range one page at a time. To be safe, increase timeout value
to 1 second for 128MB range.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 17:55:41 -04:00
Philip Yang
d728eda3c5 drm/amdgpu: Enable translate further for GC v9.4.3
To extend UTCL2 reach.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 17:55:41 -04:00
Lijo Lazar
0e41639d9a drm/amdgpu: Remove unused NBIO interface
Set compute partition mode interface in NBIO is no longer used. Remove
the only implementation from NBIO v7.9

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 14:33:56 -04:00
ZhenGuo Yin
71eaac368d drm/amdgpu: add entity error check in amdgpu_ctx_get_entity
[Why]
UMD is not aware of entity error, and will keep submitting jobs
into the error entity.

[How]
Add entity error check when getting entity from ctx.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:37:55 -04:00
Christian König
f88e295e90 drm/amdgpu: add VM generation token
Instead of using the VRAM lost counter add a 64bit token which indicates
if a context or job is still valid to use.

Should the VRAM be lost or the page tables need re-creation the token will
change indicating that userspace needs to act and re-create the contexts
and re-submit the work.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:37:55 -04:00
Christian König
55bf196f60 drm/amdgpu: reset VM when an error is detected
When some problem with the updates of page tables is detected reset the
state machine of the VM and re-create all page tables from scratch.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:37:55 -04:00
Christian König
e84e697d92 drm/amdgpu: abort submissions during prepare on error
Forward errors from previous submissions to this one.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:37:55 -04:00
Christian König
89fae8dc41 drm/amdgpu: mark soft recovered fences with -ENODATA
Set the fence error code before trying to soft-recover it.

It gets overwritten when a hard recovery is required.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:37:55 -04:00
Christian König
0a33b11d26 drm/amdgpu: mark force completed fences with -ECANCELED
When we force complete fences we should mark them as canceled.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:37:55 -04:00
Christian König
b13eb02ba8 drm/amdgpu: add amdgpu_error_* debugfs file
This allows us to insert some error codes into the bottom of the pipeline
on an engine.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:37:54 -04:00
Alex Deucher
2eb841bdbc drm/amdgpu: mark GC 9.4.3 experimental for now
Mark as experimental for now until we get closer to production
to avoid possible undesireable behavior when mixing newer
boards with older kernels.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:37:54 -04:00
Lijo Lazar
b00f55374c drm/amdgpu: Use PSP FW API for partition switch
Use PSP firmware interface for switching compute partitions.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:59 -04:00
Lijo Lazar
fe381726c9 drm/amdgpu: Change nbio v7.9 xcp status definition
PARTITION_MODE field in PARTITION_COMPUTE_STATUS register is defined as
below by firmware.

	SPX = 0, DPX = 1, TPX = 2, QPX = 3, CPX = 4

Change driver definition accordingly.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:59 -04:00
Christian König
ca0b954a43 drm/amdgpu: make sure that BOs have a backing store
It's perfectly possible that the BO is about to be destroyed and doesn't
have a backing store associated with it.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:59 -04:00
Christian König
e2ad8e2df4 drm/amdgpu: make sure BOs are locked in amdgpu_vm_get_memory
We need to grab the lock of the BO or otherwise can run into a crash
when we try to inspect the current location.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:59 -04:00
Stanley.Yang
43aedbf4da drm/amdgpu: Add checking mc_vram_size
Do not compare injection address with mc_vram_size
if mc_vram_size is zero.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:59 -04:00
Stanley.Yang
38298ce6fc drm/amdgpu: Optimize checking ras supported
Using "is_app_apu" to identify device in the native
APU mode or carveout mode.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:59 -04:00
Candice Li
6fac3964a9 drm/amdgpu: Add channel_dis_num to ras init flags
Add disabled channel number to ras init flags.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:59 -04:00
Candice Li
bcd9a5f8b9 drm/amdgpu: Update total channel number for umc v8_10
Update total channel number for umc v8_10.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:59 -04:00
Luben Tuikov
71344a718a drm/amdgpu: Fix usage of UMC fill record in RAS
The fixed commit listed in the Fixes tag below, introduced a bug in
amdgpu_ras.c::amdgpu_reserve_page_direct(), in that when introducing the new
amdgpu_umc_fill_error_record() and internally in that new function the physical
address (argument "uint64_t retired_page"--wrong name) is right-shifted by
AMDGPU_GPU_PAGE_SHIFT. Thus, in amdgpu_reserve_page_direct() when we pass
"address" to that new function, we should NOT right-shift it, since this
results, erroneously, in the page address to be 0 for first
2^(2*AMDGPU_GPU_PAGE_SHIFT) memory addresses.

This commit fixes this bug.

Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Fixes: 400013b268 ("drm/amdgpu: add umc_fill_error_record to make code more simple")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20230610113536.10621-1-luben.tuikov@amd.com
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:58 -04:00
Alex Deucher
e5df16d942 drm/amdgpu/sdma4: set align mask to 255
The wptr needs to be incremented at at least 64 dword intervals,
use 256 to align with windows.  This should fix potential hangs
with unaligned updates.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:58 -04:00
Luben Tuikov
740f42a28f drm/amdgpu: Report ras_num_recs in debugfs
Report the number of records stored in the RAS EEPROM table in debugfs.

This can be used by user-space to calculate the capacity of the RAS EEPROM
table since "bad_page_cnt_threshold" is also reported in the same place in
debugfs.

See commit 7fb6407145 ("drm/amdgpu: Add bad_page_cnt_threshold to debugfs").

ras_num_recs can already be inferred by dumping the RAS EEPROM table, also in
the same debugfs location, see commit reference c65b0805e7 (drm/amdgpu:
RAS EEPROM table is now in debugfs, 2021-04-08). This commit makes it an
integer value easily shown in a single file.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: Stanley Yang <Stanley.Yang@amd.com>
Cc: John Clements <john.clements@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20230603051043.211548-1-luben.tuikov@amd.com
Acked-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:58 -04:00
Lijo Lazar
82a1f42f6a drm/amdgpu: Release SDMAv4.4.2 ecc irq properly
Release ECC irq only if irq is enabled - only when RAS feature is enabled
ECC irq gets enabled.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:58 -04:00
Likun Gao
d4a4ff1c8e drm/amdgpu: add wait_for helper for spirom update
Spirom update typically requires extremely long
duration for command execution, and special helper
function to wait for it completion.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:58 -04:00
Ma Jun
a1c23485b8 drm/amdgpu: Print client id for the unregistered interrupt resource
Modify the debug information and print the clien id for these
interrupts as well.

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:57 -04:00
Arunpravin Paneer Selvam
59eddd4e21 Revert "drm/amdgpu: remove TOPDOWN flags when allocating VRAM in large bar system"
This reverts commit c105518679.

This patch disables the TOPDOWN flag for APU and few dGPU cards
which has the VRAM size equal to the BAR size.

When we enable the TOPDOWN flag, we get the free blocks at
the highest available memory region and we don't split the
lower order blocks. This change is required to keep off
the fragmentation related issues particularly in ASIC
which has VRAM space <= 500MiB

Hence, we are reverting this patch.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2270
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 10:43:36 -04:00
Shiwu Zhang
8d8ffe3740 drm/amdgpu: expose num_hops and num_links xgmi info through dev attr
Add these two dev attrs for xgmi info details which is helpful for
developers checking the xgmi topology by catting the sys file directly.

Take 4 cards with xgmi connection as an example, get the num_hops for each
device or node through xmig_hive_info dir like,
cat /sys/bus/pci/devices/0000:41:00.0/xgmi_hive_info/node1/num_hops
will return "00 41 41 41" where "00" stands for the hops to node1 itself
and "41" is the hops in hex format to every other node in the same hive.
There are node1/node2/node3/node4 representing 4 cards in the hive.

The same for num_links dev attr.

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Acked-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 10:43:31 -04:00
Sonny Jiang
188d3f80fc drm/amdgpu: vcn_4_0 set instance 0 init sched score to 1
Only vcn0 can process AV1 codecx. In order to use both vcn0 and
vcn1 in h264/265 transcode to AV1 cases, set vcn0 sched score to 1
at initialization time.

Signed-off-by: Sonny Jiang <sonjiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 10:43:26 -04:00
Hamza Mahfooz
8020f0f931 drm/amd/amdgpu: enable W=1 for amdgpu
We have a clean build with W=1 as of
commit c168feed5d ("drm/amd/display/amdgpu_dm/amdgpu_dm_helpers: Move
SYNAPTICS_DEVICE_ID into CONFIG_DRM_AMD_DC_DCN ifdef"). So, let's enable
these checks unconditionally for the entire module to catch these errors
during development.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 10:42:42 -04:00
Srinivasan Shanmugam
e06da81749 drm/amdgpu: Fix kdoc warning
Fixes the following gcc with W=1:

drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c:76: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * EEPROM Table structure v1
drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c:98: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * EEPROM Table structrue v2.1

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 10:42:39 -04:00
Mukul Joshi
41ce6d6d03 drm/amdgpu: Rename DRM schedulers in amdgpu TTM
Rename mman.entity to mman.high_pr to make the distinction
clearer that this is a high priority scheduler. Similarly,
rename the recently added mman.delayed to mman.low_pr to
make it clear it is a low priority scheduler.
No functional change in this patch.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 10:42:33 -04:00
Dave Airlie
901bdf5ea1 Merge tag 'amd-drm-next-6.5-2023-06-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.5-2023-06-02:

amdgpu:
- SR-IOV fixes
- Warning fixes
- Misc code cleanups and spelling fixes
- DCN 3.2 updates
- Improved DC FAMS support for better power management
- Improved DC SubVP support for better power management
- DCN 3.1.x fixes
- Max IB size query
- DC GPU reset fixes
- RAS updates
- DCN 3.0.x fixes
- S/G display fixes
- CP shadow buffer support
- Implement connector force callback
- Z8 power improvements
- PSP 13.0.10 vbflash support
- Mode2 reset fixes
- Store MQDs in VRAM to improve queue switch latency
- VCN 3.x fixes
- JPEG 3.x fixes
- Enable DC_FP on LoongArch
- GFXOFF fixes
- GC 9.4.3 partition support
- SDMA 4.4.2 partition support
- VCN/JPEG 4.0.3 partition support
- VCN 4.0.3 updates
- NBIO 7.9 updates
- GC 9.4.3 updates
- Take NUMA into account when allocating memory
- Handle NUMA for partitions
- SMU 13.0.6 updates
- GC 9.4.3 RAS updates
- Stop including unused swiotlb.h
- SMU 13.0.7 fixes
- Fix clock output ordering on some APUs
- Clean up DC FPGA code
- GFX9 preemption fixes
- Misc irq fixes
- S0ix fixes
- Add new DRM_AMDGPU_WERROR config parameter to help with CI
- PCIe fix for RDNA2
- kdoc fixes
- Documentation updates

amdkfd:
- Query TTM mem limit rather than hardcoding it
- GC 9.4.3 partition support
- Handle NUMA for partitions

radeon:
- Fix possible double free
- Stop including unused swiotlb.h
- Fix possible division by zero

ttm:
- Add query for TTM mem limit
- Add NUMA awareness to pools
- Export ttm_pool_fini()

UAPI:
- Add new ctx query flag to better handle GPU resets
  Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22290
- Add new interface to query and set shadow buffer for RDNA3
  Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21986
- Add new INFO query for max IB size
  Proposed userspace: https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/-/commits/ib-rejection-v3

amd-drm-next-6.5-2023-06-09:

amdgpu:
- S0ix fixes
- Initial SMU13 Overdrive support
- kdoc fixes
- Misc clode cleanups
- Flexible array fixes
- Display OTG fixes
- SMU 13.0.6 updates
- Revert some broken clock counter updates
- Misc display fixes
- GFX9 preemption fixes
- Add support for newer EEPROM bad page table format
- Add missing radeon secondary id
- Add support for new colorspace KMS API
- CSA fix
- Stable pstate fixes for APUs
- make vbl interface admin only
- Handle PCI accelerator class

amdkfd:
- Add debugger support for gdb

radeon:
- Fix possible UAF

drm:
- Add Colorspace functionality

UAPI:
- Add debugger interface for enabling gdb
  Proposed userspace: https://github.com/ROCm-Developer-Tools/ROCdbgapi/tree/wip-dbgapi
- Add KMS colorspace API
  Discussion: https://lists.freedesktop.org/archives/dri-devel/2023-June/408128.html

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230609174817.7764-1-alexander.deucher@amd.com
2023-06-15 14:11:22 +10:00
Arunpravin Paneer Selvam
34e5a54327 Revert "drm/amdgpu: remove TOPDOWN flags when allocating VRAM in large bar system"
This reverts commit c105518679.

This patch disables the TOPDOWN flag for APU and few dGPU cards
which has the VRAM size equal to the BAR size.

When we enable the TOPDOWN flag, we get the free blocks at
the highest available memory region and we don't split the
lower order blocks. This change is required to keep off
the fragmentation related issues particularly in ASIC
which has VRAM space <= 500MiB

Hence, we are reverting this patch.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2270
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-06-13 17:12:41 -04:00
Sonny Jiang
9db5ec1ceb drm/amdgpu: vcn_4_0 set instance 0 init sched score to 1
Only vcn0 can process AV1 codecx. In order to use both vcn0 and
vcn1 in h264/265 transcode to AV1 cases, set vcn0 sched score to 1
at initialization time.

Signed-off-by: Sonny Jiang <sonjiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
2023-06-13 17:08:42 -04:00
Mario Limonciello
7ab1a4913d drm/amd: Tighten permissions on VBIOS flashing attributes
Non-root users shouldn't be able to try to trigger a VBIOS flash
or query the flashing status.  This should be reserved for users with the
appropriate permissions.

Cc: stable@vger.kernel.org
Fixes: 8424f2ccb3 ("drm/amdgpu/psp: Add vbflash sysfs interface support")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-13 17:04:51 -04:00
Mario Limonciello
3eb1a3a040 drm/amd: Make sure image is written to trigger VBIOS image update flow
The VBIOS image update flow requires userspace to:
1) Write the image to `psp_vbflash`
2) Read `psp_vbflash`
3) Poll `psp_vbflash_status` to check for completion

If userspace reads `psp_vbflash` before writing an image, it's
possible that it causes problems that can put the dGPU into an invalid
state.

Explicitly check that an image has been written before letting a read
succeed.

Cc: stable@vger.kernel.org
Fixes: 8424f2ccb3 ("drm/amdgpu/psp: Add vbflash sysfs interface support")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-13 17:04:45 -04:00
Alex Deucher
e61f67749b drm/amdgpu: add missing radeon secondary PCI ID
0x5b70 is a missing RV370 secondary id.  Add it so
we don't try and probe it with amdgpu.

Cc: michel@daenzer.net
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Tested-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-06-13 17:02:51 -04:00
Jiadong Zhu
5b711e7f9c drm/amdgpu: Implement gfx9 patch functions for resubmission
Patch the packages including CONTEXT_CONTROL and WRITE_DATA for gfx9
during the resubmission scenario.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.3.x
2023-06-13 16:58:23 -04:00
Jiadong Zhu
87af86ae89 drm/amdgpu: Modify indirect buffer packages for resubmission
When the preempted IB frame resubmitted to cp, we need to modify the frame
data including:
1. set PRE_RESUME 1 in CONTEXT_CONTROL.
2. use meta data(DE and CE) read from CSA in WRITE_DATA.

Add functions to save the location the first time IBs emitted and callback
to patch the package when resubmission happens.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.3.x
2023-06-13 16:57:44 -04:00
Jiadong Zhu
94034b306d drm/amdgpu: Program gds backup address as zero if no gds allocated
It is firmware requirement to set gds_backup_addrlo and gds_backup_addrhi
of DE meta both zero if no gds partition is allocated for the frame.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.3.x
2023-06-13 16:57:14 -04:00
Jiadong Zhu
1dbcf770cc drm/amdgpu: Reset CP_VMID_PREEMPT after trailing fence signaled
When MEC executes unmap_queue for mid command buffer preemption, it will
kick the write pointer of the gfx ring, set CP_VMID_PREEMPT to trigger the
preemption and wait for CP_VMID_PREEMPT becomes zero after the preemption
done. There is a race condition that PFP may excute the resetting command
before MEC set CP_VMID_PREEMPT. As a result, hang happens as
CP_VMID_PREEMPT is always 0xffff.

To avoid this, we send resetting CP_VMID_PREEMPT command after the trailing
fence is siganled and update gfx write pointer explicitly.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.3.x
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2535
2023-06-13 16:51:05 -04:00
Chong Li
80e709ee6e drm/amdgpu: add option params to enforce process isolation between graphics and compute
enforce process isolation between graphics and compute via using the same reserved vmid.

v2: remove params "struct amdgpu_vm *vm" from
    amdgpu_vmid_alloc_reserved and amdgpu_vmid_free_reserved.

Signed-off-by: Chong Li <chongli2@amd.com>
Reviewed-by: Christian Koenig <Christian.Koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:49:48 -04:00
Nathan Chancellor
4e70da985c drm/amdgpu: Wrap -Wunused-but-set-variable in cc-option
-Wunused-but-set-variable was only supported in clang starting with
13.0.0, so earlier versions will emit a warning, which is turned into a
hard error for the kernel to mirror GCC:

  error: unknown warning option '-Wunused-but-set-variable'; did you mean '-Wunused-const-variable'? [-Werror,-Wunknown-warning-option]

The minimum supported version of clang for building the kernel is
11.0.0, so match the rest of the kernel and wrap
-Wunused-but-set-variable in a cc-option call, so that it is only used
when supported by the compiler.

Closes: https://github.com/ClangBuiltLinux/linux/issues/1869
Fixes: 1b320ad3f5 ("drm/amd/amdgpu: introduce DRM_AMDGPU_WERROR")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:49:06 -04:00
Shiwu Zhang
9d65b1b4bc drm/amdgpu: add the accelerator PCIe class
Add the accelerator PCIe class and match the
class in amdgpu for 0x1002 devices of that class.

From PCI spec:
"PCI Code and ID Assignment, r1.9, sec 1, 1.19"

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>  # pci_ids.h
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:48:57 -04:00
Jonathan Kim
09d49e14ea drm/amdkfd: fix and enable debugging for gfx11
There are a couple of fixes required to enable gfx11 debugging.

First, ADD_QUEUE.trap_en is an inappropriate place to toggle
a per-process register so move it to SET_SHADER_DEBUGGER.trap_en.
When ADD_QUEUE.skip_process_ctx_clear is set, MES will prioritize
the SET_SHADER_DEBUGGER.trap_en setting.

Second, to preserve correct save/restore priviledged wave states
in coordination with the trap enablement setting, resume suspended
waves early in the disable call.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:48:19 -04:00
Mario Limonciello
fe56c6ee04 drm/amd: Tighten permissions on VBIOS flashing attributes
Non-root users shouldn't be able to try to trigger a VBIOS flash
or query the flashing status.  This should be reserved for users with the
appropriate permissions.

Cc: stable@vger.kernel.org
Fixes: 8424f2ccb3 ("drm/amdgpu/psp: Add vbflash sysfs interface support")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:48:10 -04:00
Mario Limonciello
3537d6a48c drm/amd: Make sure image is written to trigger VBIOS image update flow
The VBIOS image update flow requires userspace to:
1) Write the image to `psp_vbflash`
2) Read `psp_vbflash`
3) Poll `psp_vbflash_status` to check for completion

If userspace reads `psp_vbflash` before writing an image, it's
possible that it causes problems that can put the dGPU into an invalid
state.

Explicitly check that an image has been written before letting a read
succeed.

Cc: stable@vger.kernel.org
Fixes: 8424f2ccb3 ("drm/amdgpu/psp: Add vbflash sysfs interface support")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:47:50 -04:00
Yang Wang
cab69d36cc drm/amdgpu: skip to resume rlcg for gc 9.4.3 in vf side
skip to resume rlcg, because rlcg is already enabled in pf side.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:47:35 -04:00
Yang Wang
731b48463b drm/amdgpu: disable virtual display support on APP device
virtual display is not support on APP device.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Gavin Wan <gavin.wan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:47:31 -04:00
Lang Yu
5daff15cd0 drm/amdgpu: unmap and remove csa_va properly
Root PD BO should be reserved before unmap and remove
a bo_va from VM otherwise lockdep will complain.

v2: check fpriv->csa_va is not NULL instead of amdgpu_mcbp (christian)

[14616.936827] WARNING: CPU: 6 PID: 1711 at drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1762 amdgpu_vm_bo_del+0x399/0x3f0 [amdgpu]
[14616.937096] Call Trace:
[14616.937097]  <TASK>
[14616.937102]  amdgpu_driver_postclose_kms+0x249/0x2f0 [amdgpu]
[14616.937187]  drm_file_free+0x1d6/0x300 [drm]
[14616.937207]  drm_close_helper.isra.0+0x62/0x70 [drm]
[14616.937220]  drm_release+0x5e/0x100 [drm]
[14616.937234]  __fput+0x9f/0x280
[14616.937239]  ____fput+0xe/0x20
[14616.937241]  task_work_run+0x61/0x90
[14616.937246]  exit_to_user_mode_prepare+0x215/0x220
[14616.937251]  syscall_exit_to_user_mode+0x2a/0x60
[14616.937254]  do_syscall_64+0x48/0x90
[14616.937257]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:47:26 -04:00
Alex Deucher
c1ac2ea802 drm/amdgpu: add missing radeon secondary PCI ID
0x5b70 is a missing RV370 secondary id.  Add it so
we don't try and probe it with amdgpu.

Cc: michel@daenzer.net
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Tested-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:44:49 -04:00
Stanley.Yang
0bc3137b21 drm/amdgpu: Set EEPROM ras info
Set EEPROM ras info: rma status, health percent and bad
page threshold.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:44:40 -04:00
Stanley.Yang
7c2551fa1d drm/amdgpu: Calculate EEPROM table ras info bytes sum
It's more reasonable to check EEPROM table ras info bytes.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:44:38 -04:00
Stanley.Yang
7f599fed3b drm/amdgpu: Add support EEPROM table v2.1
Add ras info to EEPROM table, app can analyse device ECC
status without GPU driver through EEPROM table ras info.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:44:34 -04:00
Stanley.Yang
b573cf88c0 drm/amdgpu: Support setting EEPROM table version
Add setting EEPROM table version interface for umcv8.10,
Add EEPROM table v2.1 to UMC v8.10.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:44:31 -04:00
Stanley.Yang
65183faec8 drm/amdgpu: Add RAS table v2.1 macro definition
Add RAS EEPROM table version 2.1 macro definition.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:44:28 -04:00
Stanley.Yang
71c79a1960 drm/amdgpu: Rename ras table version
Rename RAS_TABLE_VER to RAS_TABLE_VER_V1,
move RAS_TABLE_VER_V1 from amdgpu_ras_eeprom.c to amdgpu_ras_eeprom.h.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:44:24 -04:00
Jiadong Zhu
ea791e704b drm/amdgpu: Implement gfx9 patch functions for resubmission
Patch the packages including CONTEXT_CONTROL and WRITE_DATA for gfx9
during the resubmission scenario.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:44:22 -04:00
Jiadong Zhu
8ff865be93 drm/amdgpu: Modify indirect buffer packages for resubmission
When the preempted IB frame resubmitted to cp, we need to modify the frame
data including:
1. set PRE_RESUME 1 in CONTEXT_CONTROL.
2. use meta data(DE and CE) read from CSA in WRITE_DATA.

Add functions to save the location the first time IBs emitted and callback
to patch the package when resubmission happens.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:44:15 -04:00
Emily Deng
f2bcc0c7db drm/amdgpu/mmsch: Correct the definition for mmsch init header
For the header, it is version related, shouldn't use MAX_VCN_INSTANCES.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:44:12 -04:00
YiPeng Chai
869bcf59fd drm/amdgpu: change reserved vram info print
The link object of mgr->reserved_pages is the blocks
variable in struct amdgpu_vram_reservation, not the
link variable in struct drm_buddy_block.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-06-09 12:41:14 -04:00
Chia-I Wu
5a863904ba drm/amdgpu: fix xclk freq on CHIP_STONEY
According to Alex, most APUs from that time seem to have the same issue
(vbios says 48Mhz, actual is 100Mhz).  I only have a CHIP_STONEY so I
limit the fixup to CHIP_STONEY

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:41:11 -04:00
Alex Deucher
5a03159ab7 Revert "drm/amdgpu: switch to golden tsc registers for raven/raven2"
This reverts commit f03eb1d26c.

This results in inconsistent timing reported via asynchronous
GPU queries.

Link: https://lists.freedesktop.org/archives/amd-gfx/2023-May/093731.html
Cc: Jesse.Zhang@amd.com
Cc: michel@daenzer.net
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:40:56 -04:00
Alex Deucher
f9bfc9fff2 Revert "drm/amdgpu: Differentiate between Raven2 and Raven/Picasso according to revision id"
This reverts commit 9d2d1827af.

This results in inconsistent timing reported via asynchronous
GPU queries.

Link: https://lists.freedesktop.org/archives/amd-gfx/2023-May/093731.html
Cc: Jesse.Zhang@amd.com
Cc: michel@daenzer.net
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:40:17 -04:00
Alex Deucher
a15a77c8e6 Revert "drm/amdgpu: change the reference clock for raven/raven2"
This reverts commit fbc24293ca.

This results in inconsistent timing reported via asynchronous
GPU queries.

Link: https://lists.freedesktop.org/archives/amd-gfx/2023-May/093731.html
Cc: Jesse.Zhang@amd.com
Cc: michel@daenzer.net
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:40:01 -04:00
Stanley.Yang
3898c8fc42 drm/amdgpu: convert vcn/jpeg logical mask to physical mask
Changed from V1:
 	Remove amdgpu_ras_logical_mask_to_physical_mask
	due to GET_MASK provides same feature.
	Support convert VCN/JPEG logical mask to physical
	mask.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:39:58 -04:00
Stanley.Yang
e3959cb547 drm/amdgpu: support check vcn jpeg block mask
Support VCN/JPEG instance mask checking, pass logical
mask directly except GFX/SDMA/VCN/JPEG blocks.

Changed from V1:
	correct a typo

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:39:54 -04:00
Stanley.Yang
61a7c16239 drm/amdgpu: pass xcc mask to ras ta
pass xcc mask to ras ta, ras ta will compare
the mask with the one from chiplet topology.

Changed from V1:
	Remove IP version checking.
	Set ras_cmd->ras_init_message.init_flags.xcc_mask
	directly due to xcc_mask is common structres to
	all the devices.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:39:51 -04:00
Srinivasan Shanmugam
25c30a12d7 drm/amdgpu: Mark 'kgd_gfx_aldebaran_clear_address_watch' & 'kgd_gfx_v11_clear_address_watch' functions as static
Below two functions cause a warning because they lack a prototype:

drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c:164:10: warning: no previous prototype for ‘kgd_gfx_aldebaran_clear_address_watch’ [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c:782:10: warning: no previous prototype for ‘kgd_gfx_v11_clear_address_watch’ [-Wmissing-prototypes]

There are no callers from other files, so just mark them as 'static'.

Also fixes the following checks:

CHECK: Alignment should match open parenthesis +static uint32_t
kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev,
uint32_t watch_id)

CHECK: Alignment should match open parenthesis +static uint32_t
kgd_gfx_v11_clear_address_watch(struct amdgpu_device *adev, uint32_t
watch_id)

Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:39:36 -04:00
Chia-I Wu
9f0bcf49e9 amdgpu: validate offset_in_bo of drm_amdgpu_gem_va
This is motivated by OOB access in amdgpu_vm_update_range when
offset_in_bo+map_size overflows.

v2: keep the validations in amdgpu_vm_bo_map
v3: add the validations to amdgpu_vm_bo_map/amdgpu_vm_bo_replace_map
    rather than to amdgpu_gem_va_ioctl

Fixes: 9f7eb5367d ("drm/amdgpu: actually use the VM map parameters")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:37:56 -04:00
Jonathan Kim
9bd443cb74 drm/amdgpu: fix debug wait on idle for gfx9.4.1
Wait calls for amd_ip_block_type not amd_hw_ip_block_type.

Reported-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:37:50 -04:00
Jonathan Kim
e0f85f4690 drm/amdkfd: add debug set and clear address watch points operation
Shader read, write and atomic memory operations can be alerted to the
debugger as an address watch exception.

Allow the debugger to pass in a watch point to a particular memory
address per device.

Note that there exists only 4 watch points per devices to date, so have
the KFD keep track of what watch points are allocated or not.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:36:46 -04:00
Jonathan Kim
a70a93fa56 drm/amdkfd: add debug suspend and resume process queues operation
In order to inspect waves from the saved context at any point during a
debug session, the debugger must be able to preempt queues to trigger
context save by suspending them.

On queue suspend, the KFD will copy the context save header information
so that the debugger can correctly crawl the appropriate size of the saved
context. The debugger must then also be allowed to resume suspended queues.

A queue that is newly created cannot be suspended because queue ids are
recycled after destruction so the debugger needs to know that this has
occurred.  Query functions will be later added that will clear a given
queue of its new queue status.

A queue cannot be destroyed while it is suspended to preserve its saved
context during debugger inspection.  Have queue destruction block while
a queue is suspended and unblocked when it is resumed.  Likewise, if a
queue is about to be destroyed, it cannot be suspended.

Return the number of queues successfully suspended or resumed along with
a per queue status array where the upper bits per queue status show that
the request was invalid (new/destroyed queue suspend request, missing
queue) or an error occurred (HWS in a fatal state so it can't suspend or
resume queues).

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:36:43 -04:00
Jonathan Kim
aea1b4738b drm/amdkfd: add debug wave launch mode operation
Allow the debugger to set wave behaviour on to either normally operate,
halt at launch, trap on every instruction, terminate immediately or
stall on allocation.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:36:39 -04:00
Jonathan Kim
101827e130 drm/amdkfd: add debug wave launch override operation
This operation allows the debugger to override the enabled HW
exceptions on the device.

On debug devices that only support the debugging of a single process,
the HW exceptions are global and set through the SPI_GDBG_TRAP_MASK
register.
Because they are global, only address watch exceptions are allowed to
be enabled.  In other words, the debugger must preserve all non-address
watch exception states in normal mode operation by barring a full
replacement override or a non-address watch override request.

For multi-process debugging, all HW exception overrides are per-VMID so
all exceptions can be overridden or fully replaced.

In order for the debugger to know what is permissible, returned the
supported override mask back to the debugger along with the previously
enable overrides.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:36:37 -04:00
Jonathan Kim
12fb1ad70d drm/amdkfd: update process interrupt handling for debug events
The debugger must be notified by any debugger subscribed exception
that comes from hardware interrupts.

If a debugger session exits, any exceptions it subscribed to may still
have interrupts in the interrupt ring buffer or KGD/KFD pipeline.
To prevent a new session from inheriting stale interrupts, when a new
queue is created, open an interrupt drain and allow the IH ring to drain
from a timestamped checkpoint.  Then inject a custom IV so that once
the custom IV is picked up by the KFD, it's safe to close the drain
and proceed with queue creation.

The drain must also be on debug disable as SW interrupts may still
be processed.  Drain at this time and clear all the exception status.

The debugger may also not be attached nor subscibed to certain
exceptions so forward them directly to the runtime.

GFX10 also requires its own IV processing, hence the creation of
kfd_int_process_v10.c.  This is because the IV from SQ interrupts are
packed into a new continguous format unlike GFX9. To make this clear,
a separate interrupting handling code file was created.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:36:17 -04:00
Jonathan Kim
69a8c3ae2d drm/amdkfd: apply trap workaround for gfx11
Due to a HW bug, waves in only half the shader arrays can enter trap.

When starting a debug session, relocate all waves to the first shader
array of each shader engine and mask off the 2nd shader array as
unavailable.

When ending a debug session, re-enable the 2nd shader array per
shader engine.

User CU masking per queue cannot be guaranteed to remain functional
if requested during debugging (e.g. user cu mask requests only 2nd shader
array as an available resource leading to zero HW resources available)
nor can runtime be alerted of any of these changes during execution.

Make user CU masking and debugging mutual exclusive with respect to
availability.

If the debugger tries to attach to a process with a user cu masked
queue, return the runtime status as enabled but busy.

If the debugger tries to attach and fails to reallocate queue waves to
the first shader array of each shader engine, return the runtime status
as enabled but with an error.

In addition, like any other mutli-process debug supported devices,
disable trap temporary setup per-process to avoid performance impact from
setup overhead.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:35:52 -04:00
Jonathan Kim
a9818854ea drm/amdgpu: expose debug api for mes
Similar to the F32 HWS, the RS64 HWS for GFX11 now supports a multi-process
debug API.

The skip_process_ctx_clear ADD_QUEUE requirement is to prevent the MES
from clearing the process context when the first queue is added to the
scheduler in order to maintain debug mode settings during queue preemption
and restore.  The MES clears the process context in this case due to an
unresolved FW caching bug during normal mode operations.
During debug mode, the KFD will hold a reference to the target process
so the process context should never go stale and MES can afford to skip
this requirement.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:35:43 -04:00
Jonathan Kim
7cee6a6824 drm/amdgpu: add configurable grace period for unmap queues
The HWS schedule allows a grace period for wave completion prior to
preemption for better performance by avoiding CWSR on waves that can
potentially complete quickly. The debugger, on the other hand, will
want to inspect wave status immediately after it actively triggers
preemption (a suspend function to be provided).

To minimize latency between preemption and debugger wave inspection, allow
immediate preemption by setting the grace period to 0.

Note that setting the preepmtion grace period to 0 will result in an
infinite grace period being set due to a CP FW bug so set it to 1 for now.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:35:31 -04:00
Jonathan Kim
33f3437ae1 drm/amdgpu: add gfx11 hw debug mode enable and disable calls
Implement the per-device calls to enable or disable HW debug mode
for GFX11.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:35:28 -04:00
Jonathan Kim
be6f94039e drm/amdgpu: add gfx9.4.2 hw debug mode enable and disable calls
GFX9.4.2 now supports per-VMID debug mode controls registers
(SPI_GDBG_PER_VMID_CNTL).

Because the KFD lets the HWS handle PASID-VMID mapping, the KFD will
forward all debug mode setting register writes to the HWS scheduler
using a new MAP_PROCESS API, so instead of writing to registers, return
the required register values that the HWS needs to write on debug enable
and disable.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:35:25 -04:00
Jonathan Kim
d13f050fee drm/amdgpu: add gfx10 hw debug mode enable and disable calls
Similar to GFX9 debug devices, set the hardware debug mode by draining
the SPI appropriately prior the mode setting request.

Because GFX10 has waves allocated by the work group boundary and each
SE's SPI instances do not communicate, the SPI drain time is much longer.
This long drain time will be fixed for GFX11 onwards.

Also remove a bunch of deprecated misplaced references for GFX10.3.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:35:21 -04:00
Jonathan Kim
01f648202c drm/amdgpu: add gfx9.4.1 hw debug mode enable and disable calls
On GFX9.4.1, the implicit wait count instruction on s_barrier is
disabled by default in the driver during normal operation for
performance requirements.

There is a hardware bug in GFX9.4.1 where if the implicit wait count
instruction after an s_barrier instruction is disabled, any wave that
hits an exception may step over the s_barrier when returning from the
trap handler with the barrier logic having no ability to be
aware of this, thereby causing other waves to wait at the barrier
indefinitely resulting in a shader hang.  This bug has been corrected
for GFX9.4.2 and onward.

Since the debugger subscribes to hardware exceptions, in order to avoid
this bug, the debugger must enable implicit wait count on s_barrier
for a debug session and disable it on detach.

In order to change this setting in the in the device global SQ_CONFIG
register, the GFX pipeline must be idle.  GFX9.4.1 as a compute device
will either dispatch work through the compute ring buffers used for
image post processing or through the hardware scheduler by the KFD.

Have the KGD suspend and drain the compute ring buffer, then suspend the
hardware scheduler and block any future KFD process job requests before
changing the implicit wait count setting.  Once set, resume all work.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:35:15 -04:00
Jonathan Kim
cde2e087a3 drm/amdgpu: add gfx9 hw debug mode enable and disable calls
Implement the per-device calls to enable or disable HW debug mode for
GFX9 prior to GFX9.4.1.

GFX9.4.1 and onward will require their own enable/disable sequence as
follow on patches.

When hardware debug mode setting is requested, waves will inherit
these settings in the Shader Processor Input's (SPI) Sequencer Global
Block (SQG). This means that the KGD must drain all waves from the SPI
into SQG (approximately 96 SPI clock cycles) prior to debug mode setting
to ensure that the order of operations that the debugger expects with
regards to debug mode setting transaction requests and wave inheritence
of that mode is upheld.

Also ensure that exception overrides are reset to their original state
prior to debug enable or disable.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:35:11 -04:00
Mario Limonciello
257d7b7be2 drm/amd: Make lack of ACPI_FADT_LOW_POWER_S0 or CONFIG_AMD_PMC louder during suspend path
Users have reported that s2idle wasn't working on OEM Phoenix systems,
but it was root caused to be because `CONFIG_AMD_PMC` wasn't set in
the distribution kernel config.

To make this more apparent, raise the messaging to err instead of warn.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217497
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:34:59 -04:00
Jonathan Kim
4504f14338 drm/amdgpu: setup hw debug registers on driver initialization
Add missing debug trap registers references and initialize all debug
registers on boot by clearing the hardware exception overrides and the
wave allocation ID index.

The debugger requires that TTMPs 6 & 7 save the dispatch ID to map
waves onto dispatch during compute context inspection.
In order to correctly set this up, set the special reserved CP bit by
default whenever the MQD is initailized.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:34:56 -04:00
Horatio Zhang
cbb63eccc0 drm/amdgpu: fix Null pointer dereference error in amdgpu_device_recover_vram
Use the function of amdgpu_bo_vm_destroy to handle the resource release
of shadow bo. During the amdgpu_mes_self_test, shadow bo released, but
vmbo->shadow_list was not, which caused a null pointer reference error
in amdgpu_device_recover_vram when GPU reset.

Fixes: 6c032c37ac ("drm/amdgpu: Fix vram recover doesn't work after whole GPU reset (v2)")
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Acked-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:34:02 -04:00
Srinivasan Shanmugam
2e9fee9b8e drm/amdgpu: Fix up kdoc in amdgpu_device.c
Fix these warnings by deleting the deviant arguments.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:799: warning: Excess function parameter 'pcie_index' description in 'amdgpu_device_indirect_wreg'
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:799: warning: Excess function parameter 'pcie_data' description in 'amdgpu_device_indirect_wreg'
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:870: warning: Excess function parameter 'pcie_index' description in 'amdgpu_device_indirect_wreg64'
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:870: warning: Excess function parameter 'pcie_data' description in 'amdgpu_device_indirect_wreg64'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:33:55 -04:00
Srinivasan Shanmugam
ebe884e8b9 drm/amdgpu: Fix up kdoc 'ring' parameter in sdma_v6_0_ring_pad_ib
Fix this warning by adding 'ring' arguments to kdoc.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:1128: warning: Function parameter or member 'ring' not described in 'sdma_v6_0_ring_pad_ib'

Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:33:50 -04:00
Hamza Mahfooz
1b320ad3f5 drm/amd/amdgpu: introduce DRM_AMDGPU_WERROR
We want to do -Werror builds on our CI. However, non-amdgpu breakages
have prevented us from doing so thus far. Also, there are a number of
additional checks that we should enable, that the community cares about
and are hidden behind -Wextra. So, define DRM_AMDGPU_WERROR to only
enable -Werror for the amdgpu kernel module and enable -Wextra while
disabling all of the checks that are too noisy.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Kenny Ho <kenny.ho@amd.com>
Suggested-by: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Kenny Ho <Kenny.Ho@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:33:42 -04:00
Mario Limonciello
09521b5d49 drm/amd: Disallow s0ix without BIOS support again
commit cf488dcd0a ("drm/amd: Allow s0ix without BIOS support") showed
improvements to power consumption over suspend when s0ix wasn't enabled in
BIOS and the system didn't support S3.

This patch however was misguided because the reason the system didn't
support S3 was because SMT was disabled in OEM BIOS setup.
This prevented the BIOS from allowing S3.

Also allowing GPUs to use the s2idle path actually causes problems if
they're invoked on systems that may not support s2idle in the platform
firmware. `systemd` has a tendency to try to use `s2idle` if `deep` fails
for any reason, which could lead to unexpected flows.

The original commit also fixed a problem during resume from suspend to idle
without hardware support, but this is no longer necessary with commit
ca47518663 ("drm/amd: Don't allow s0ix on APUs older than Raven")

Revert commit cf488dcd0a ("drm/amd: Allow s0ix without BIOS support")
to make it match the expected behavior again.

Cc: Rafael Ávila de Espíndola <rafael@espindo.la>
Link: https://github.com/torvalds/linux/blob/v6.1/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c#L1060
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2599
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:33:12 -04:00
ZhenGuo Yin
7a66ad6c08 drm/amdgpu: set finished fence error if job timedout
Set finished fence to ETIME error if job timedout.

Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:32:43 -04:00
Srinivasan Shanmugam
932fc49479 drm/amdgpu: Fix missing parameter desc for 'xcp_id' in amdgpu_amdkfd_reserve_mem_limit
Fix these warnings by adding 'xcp_id' argument.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:160: warning: Function parameter or member 'xcp_id' not described in 'amdgpu_amdkfd_reserve_mem_limit'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:32:40 -04:00
Srinivasan Shanmugam
1bae03aab2 drm/amdgpu: Fix up missing parameter in kdoc for 'inst' in gmc_ v7, v8, v9, v10, v11.c
Fix these warnings by adding 'inst' arguments to kdocs.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:428: warning: Function parameter or member 'inst' not described in 'gmc_v7_0_flush_gpu_tlb_pasid'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:626: warning: Function parameter or member 'inst' not described in 'gmc_v8_0_flush_gpu_tlb_pasid'
drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c:423: warning: Function parameter or member 'inst' not described in 'gmc_v10_0_flush_gpu_tlb_pasid'
drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c:328: warning: Function parameter or member 'inst' not described in 'gmc_v11_0_flush_gpu_tlb_pasid'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:950: warning: Function parameter or member 'inst' not described in 'gmc_v9_0_flush_gpu_tlb_pasid'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:32:35 -04:00
Srinivasan Shanmugam
3eeb0d037a drm/amdgpu: Fix up missing kdoc parameter 'inst' in get_wave_count() & kgd_gfx_v9_get_cu_occupancy()
Fix these warnings by adding 'inst' arguments to kdocs.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:692: warning: Function parameter or member 'inst' not described in 'get_wave_count'
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:763: warning: Function parameter or member 'inst' not described in 'kgd_gfx_v9_get_cu_occupancy'

Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:32:32 -04:00
Srinivasan Shanmugam
ca2943fe0a drm/amdgpu: Fix missing parameter desc for 'xcc_id' in gfx_v7_0.c & amdgpu_rlc.c
Fix these warnings by adding 'xcc_id' arguments.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c:1557: warning: Function parameter or member 'xcc_id' not described in 'gfx_v7_0_select_se_sh'
drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c:38: warning: Function parameter or member 'xcc_id' not described in 'amdgpu_gfx_rlc_enter_safe_mode'
drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c:62: warning: Function parameter or member 'xcc_id' not described in 'amdgpu_gfx_rlc_exit_safe_mode'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:32:30 -04:00
Lijo Lazar
c6a64ad9b7 drm/amdgpu: Initialize xcc mask
For ASICs which are not initialized through discovery, initialize GFX
cluster as 1.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:32:21 -04:00
Srinivasan Shanmugam
837d4e071d drm/amdgpu: Fix create_dmamap_sg_bo kdoc warnings
Fix the following gcc with W=1:

drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:292: warning: Cannot understand  * @create_dmamap_sg_bo: Creates a amdgpu_bo object to reflect information

Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:32:12 -04:00
Srinivasan Shanmugam
9eba1b8b70 drm/amdgpu: Fix up missing kdoc in sdma_v6_0.c
Address a bunch of kdoc warnings:

gcc with W=1
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:248: warning: Function parameter or member 'job' not described in 'sdma_v6_0_ring_emit_ib'
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:248: warning: Function parameter or member 'flags' not described in 'sdma_v6_0_ring_emit_ib'
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:946: warning: Function parameter or member 'timeout' not described in 'sdma_v6_0_ring_test_ib'
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:1125: warning: Function parameter or member 'ring' not described in 'sdma_v6_0_ring_pad_ib'
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:1176: warning: Function parameter or member 'vmid' not described in 'sdma_v6_0_ring_emit_vm_flush'
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:1176: warning: Function parameter or member 'pd_addr' not described in 'sdma_v6_0_ring_emit_vm_flush'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:31:29 -04:00
Srinivasan Shanmugam
66dadf1ab1 drm/amdgpu: Fix up kdoc in amdgpu_acpi.c
Fix these warnings by adding & deleting the deviant arguments.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:906: warning: Function parameter or member 'numa_info' not described in 'amdgpu_acpi_get_node_id'
drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:906: warning: Excess function parameter 'nid' description in 'amdgpu_acpi_get_node_id'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:31:26 -04:00
Srinivasan Shanmugam
23616d1ff3 drm/amdgpu: Fix up kdoc in sdma_v4_4_2.c
Address a bunch of kdoc warnings:

gcc with W=1
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:426: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_gfx_stop'
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:457: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_rlc_stop'
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:470: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_page_stop'
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:506: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_ctx_switch_enable'
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:794: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_rlc_resume'
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:810: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_load_microcode'
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:854: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_start'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:31:23 -04:00
Ikshwaku Chauhan
3a25071a97 drm/amdgpu: enable tmz by default for GC 11.0.1
Add IP GC 11.0.1 in the list of target to have
tmz enabled by default.

Signed-off-by: Ikshwaku Chauhan <ikshwaku.chauhan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:31:20 -04:00
Guchun Chen
8ffd6f0442 drm/amdgpu: keep irq count in amdgpu_irq_disable_all
This can clean up all irq warnings because of unbalanced
amdgpu_irq_get/put when unplugging/unbinding device, and leave
irq count decrease in each ip fini function.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:31:12 -04:00
Dan Carpenter
9c9d501b28 drm/amd/amdgpu: Fix up locking etc in amdgpu_debugfs_gprwave_ioctl()
There are two bugs here.
1) Drop the lock if copy_from_user() fails.
2) If the copy fails then the correct error code is -EFAULT instead of
   -EINVAL.

I also broke up the long line and changed "sizeof rd->id" to
"sizeof(rd->id)".

Fixes: 553f973a0d ("drm/amd/amdgpu: Update debugfs for XCC support (v3)")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:10:04 -04:00
James Zhu
9938333a46 drm/amdgpu: use amdxcp platform device as spatial partition
Use amdxcp platform device as spatial partition device.

-v2: remove unused variable

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:08:58 -04:00
Srinivasan Shanmugam
3034983db3 drm/amdgpu: Mark mmhub_v1_8_mmea_err_status_reg as __maybe_unused
Silencing the compiler from below compilation error:

drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c:704:23: error: variable 'mmhub_v1_8_mmea_err_status_reg' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
static const uint32_t mmhub_v1_8_mmea_err_status_reg[] = {
                      ^
1 error generated.

Mark the variable as __maybe_unused to make it clear to clang that this
is expected, so there is no more warning.

Cc: Christian König <christian.koenig@amd.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:08:43 -04:00
Shiwu Zhang
5d6cd20075 drm/amdgpu: add the accelerator pcie class
v2: add the base class id for accelerator (lijo)
v3: add the new pci class in amdgpu tree (hawking)

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:07:16 -04:00
James Zhu
2310554172 drm/amdgpu: save/restore part of xcp drm_device fields
Redirect xcp allocated drm_device::rdev/pdev/driver with
amdgpu pci_device/drm_device setting. They need be saved
before redirect and restored after unregister xcp drm_device.

-v2: fix warning discarded-qualifiers

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:07:12 -04:00
Shiwu Zhang
20997c04b7 drm/amdgpu: set the APU flag based on package type
Since currently APU and dGPU share the same pcie class
while gmc init needs the flag to set up correctly for upcomming
memory allocations

v2: call get_pkg_type in smuio 13_0_3 is enough (hawking)

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:07:04 -04:00
James Zhu
28bb7f13e7 drm/jpeg: add init value for num_jpeg_rings
Need init new num_jpeg_rings to 1 on jpeg.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Richard Liang <rliang1@amd.com>
Tested-by: Ying Li <ying.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:07:00 -04:00
Shiwu Zhang
491ae27829 drm/amdgpu: complement the 4, 6 and 8 XCC cases
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:02:17 -04:00
Shiwu Zhang
89f8576555 drm/amdgpu: golden settings for ASIC rev_id 0
Suggested by FW team that GB_ADDR_CONFIG is handled by golden
settings in driver to get the expected value

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:02:15 -04:00
Shiwu Zhang
9535a86a40 drm/amdgpu: bypass bios dependent operations
Since bios reading does not work currently so just bypass all operations
related to bios

v2: hardcode the vram info for APP_APU case (hawking)
v3: correct the vram_width with channel number * channel size (lijo)

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:02:12 -04:00
Jiadong Zhu
cfdce59417 drm/amdgpu: Program gds backup address as zero if no gds allocated
It is firmware requirement to set gds_backup_addrlo and gds_backup_addrhi
of DE meta both zero if no gds partition is allocated for the frame.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:02:08 -04:00
Jiadong Zhu
b7941e2fef drm/amdgpu: Reset CP_VMID_PREEMPT after trailing fence signaled
When MEC executes unmap_queue for mid command buffer preemption, it will
kick the write pointer of the gfx ring, set CP_VMID_PREEMPT to trigger the
preemption and wait for CP_VMID_PREEMPT becomes zero after the preemption
done. There is a race condition that PFP may excute the resetting command
before MEC set CP_VMID_PREEMPT. As a result, hang happens as
CP_VMID_PREEMPT is always 0xffff.

To avoid this, we send resetting CP_VMID_PREEMPT command after the trailing
fence is siganled and update gfx write pointer explicitly.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:02:05 -04:00
Srinivasan Shanmugam
8cce16826f drm/amdgpu: Fix unused variable in amdgpu_gfx.c
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_disable_kcq’:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:497:6: warning: variable ‘j’ set but not used [-Wunused-but-set-variable]
  497 |  int j;
      |      ^
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_disable_kgq’:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:528:6: warning: variable ‘j’ set but not used [-Wunused-but-set-variable]
  528 |  int j;
      |      ^
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_enable_kgq’:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:630:12: warning: variable ‘j’ set but not used [-Wunused-but-set-variable]
  630 |  int r, i, j;
      |

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:01:10 -04:00
Srinivasan Shanmugam
a09e206510 drm/amdgpu: Fix defined but not used gfx9_cs_data in gfx_v9_4_3.c
gcc with W=1
In file included from drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:33:
drivers/gpu/drm/amd/amdgpu/clearstate_gfx9.h:939:36: warning: ‘gfx9_cs_data’ defined but not used [-Wunused-const-variable=]
  939 | static const struct cs_section_def gfx9_cs_data[] = {
      |

gfx9_cs_data is not used in gfx_v9_4_3.c, hence remove its
include in gfx_v9_4_3.c

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:00:53 -04:00
Nathan Chancellor
6c882a573b drm/amdgpu: Fix return types of certain NBIOv7.9 callbacks
When building with clang's -Wincompatible-function-pointer-types-strict,
which ensures that function pointer signatures match exactly to avoid
tripping clang's Control Flow Integrity (kCFI) checks at run time and
will eventually be turned on for the kernel, the following instances
appear in the NBIOv7.9 code:

  drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c:465:32: error: incompatible function pointer types initializing 'int (*)(struct amdgpu_device *)' with an expression of type 'enum amdgpu_gfx_partition (struct amdgpu_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .get_compute_partition_mode = nbio_v7_9_get_compute_partition_mode,
                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c:467:31: error: incompatible function pointer types initializing 'u32 (*)(struct amdgpu_device *, u32 *)' (aka 'unsigned int (*)(struct amdgpu_device *, unsigned int *)') with an expression of type 'enum amdgpu_memory_partition (struct amdgpu_device *, u32 *)' (aka 'enum amdgpu_memory_partition (struct amdgpu_device *, unsigned int *)') [-Werror,-Wincompatible-function-pointer-types-strict]
          .get_memory_partition_mode = nbio_v7_9_get_memory_partition_mode,
                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  2 errors generated.

Change the return types of these callbacks to match the prototypes to
clear up the warning and avoid tripping kCFI at run time. Both functions
return a value from ffs(), which is an integer that can fit into either
int or unsigned int.

Fixes: 98a54e88e8 ("drm/amdgpu: add sysfs node for compute partition mode")
Fixes: ea2d2f8ece ("drm/amdgpu: detect current GPU memory partition mode")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:59:07 -04:00
Lijo Lazar
3525844d48 drm/amdgpu: Use single copy per SDMA instance type (v2)
v1: Only single copy per instance type is required for PSP. All instance types
use the same firmware copy. (Lijo)

v2: Apply the change into amdgpu_sdma_init_microcode() due to rebase. (Morris)

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:57:16 -04:00
Guchun Chen
93ab59ac6d drm/amdgpu: switch to unified amdgpu_ring_test_helper
This will simplify code.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:57:13 -04:00
Guchun Chen
232f243189 drm/amdgpu/gfx: set sched.ready status after ring/IB test in gfx
sched.ready is nothing with ring initialization, it needs to set
to be true after ring/IB test in amdgpu_ring_test_helper to tell
the ring is ready for submission.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:57:11 -04:00
Guchun Chen
61c31b8b6c drm/amdgpu/sdma: set sched.ready status after ring/IB test in sdma
sched.ready is nothing with ring initialization, it needs to set
to be true after ring/IB test in amdgpu_ring_test_helper to tell
the ring is ready for submission.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:57:07 -04:00
Lijo Lazar
194224a54c drm/amdgpu: Fix warnings
Fix warnings reported by kernel test bot/smatch

smatch warnings:
drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c:65 amdgpu_xcp_run_transition()
error: buffer overflow 'xcp_mgr->xcp' 8 <= 8

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/202305231453.I0bXngYn-lkp@intel.com/
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:56:58 -04:00
Srinivasan Shanmugam
413521a4c9 drm/amd/amdgpu: Fix warnings in amdgpu_irq.c
checkpatch reports:

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: braces {} are not necessary for any arm of this statement
+               if (nvec <= 0) {
[...]
+               } else {
[...]
WARNING: Block comments use a trailing */ on a separate line

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:55:38 -04:00
Mukul Joshi
c3aaca43fb drm/amdgpu: Add a low priority scheduler for VRAM clearing
Add a low priority DRM scheduler for VRAM clearing instead of using
the exisiting high priority scheduler. Use the high priority scheduler
for migrations and evictions.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:54:40 -04:00
Jiapeng Chong
1fbc69b8f5 drm/amdgpu/vcn: Modify mismatched function name
No functional modification involved.

drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:374: warning: expecting prototype for vcn_v4_0_mc_resume_dpg_mode(). Prototype was for vcn_v4_0_3_mc_resume_dpg_mode() instead.
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:631: warning: expecting prototype for vcn_v4_0_enable_clock_gating(). Prototype was for vcn_v4_0_3_enable_clock_gating() instead.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=5284
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:54:37 -04:00
Jiapeng Chong
aaff9c0899 drm/amdgpu: Modify mismatched function name
No functional modification involved.

drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:426: warning: expecting prototype for sdma_v4_4_2_gfx_stop(). Prototype was for sdma_v4_4_2_inst_gfx_stop() instead.
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:457: warning: expecting prototype for sdma_v4_4_2_rlc_stop(). Prototype was for sdma_v4_4_2_inst_rlc_stop() instead.
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:470: warning: expecting prototype for sdma_v4_4_2_page_stop(). Prototype was for sdma_v4_4_2_inst_page_stop() instead.
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:506: warning: expecting prototype for sdma_v4_4_2_ctx_switch_enable(). Prototype was for sdma_v4_4_2_inst_ctx_switch_enable() instead.
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:561: warning: expecting prototype for sdma_v4_4_2_enable(). Prototype was for sdma_v4_4_2_inst_enable() instead.
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:798: warning: expecting prototype for sdma_v4_4_2_rlc_resume(). Prototype was for sdma_v4_4_2_inst_rlc_resume() instead.
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:814: warning: expecting prototype for sdma_v4_4_2_load_microcode(). Prototype was for sdma_v4_4_2_inst_load_microcode() instead.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=5283
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:54:34 -04:00
Jiapeng Chong
e7665d0ca7 drm/amdgpu: Remove duplicate include
./drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c: amdgpu_xcp.h is included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=5281
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:54:31 -04:00
Philip Yang
acf429dcac drm/amdkfd: Align partition memory size to page size
The compute partition memory size calculated from KFD_XCP_MEMORY_SIZE
may not align to page size if xcp_mgr->num_xcp_per_mem_partition is 6.

Change the KFD_XCP_MEMORY_SIZE macro to return page align size, so KFD
node memory size reported in sysfs is page align size, to avoid
application VRAM allocation failure because application may use the size
directly and Thunk requires the memory allocation size is page size
align.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:53:03 -04:00
Tom Rix
803e4c9efc drm/amdgpu: remove unused variable num_xcc
gcc with W=1 reports
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:2138:13: error: variable
  ‘num_xcc’ set but not used [-Werror=unused-but-set-variable]
 2138 |         int num_xcc;
      |             ^~~~~~~

This variable is not used so remove it.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:52:45 -04:00