2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

12597 Commits

Author SHA1 Message Date
Mario Limonciello
09521b5d49 drm/amd: Disallow s0ix without BIOS support again
commit cf488dcd0a ("drm/amd: Allow s0ix without BIOS support") showed
improvements to power consumption over suspend when s0ix wasn't enabled in
BIOS and the system didn't support S3.

This patch however was misguided because the reason the system didn't
support S3 was because SMT was disabled in OEM BIOS setup.
This prevented the BIOS from allowing S3.

Also allowing GPUs to use the s2idle path actually causes problems if
they're invoked on systems that may not support s2idle in the platform
firmware. `systemd` has a tendency to try to use `s2idle` if `deep` fails
for any reason, which could lead to unexpected flows.

The original commit also fixed a problem during resume from suspend to idle
without hardware support, but this is no longer necessary with commit
ca47518663 ("drm/amd: Don't allow s0ix on APUs older than Raven")

Revert commit cf488dcd0a ("drm/amd: Allow s0ix without BIOS support")
to make it match the expected behavior again.

Cc: Rafael Ávila de Espíndola <rafael@espindo.la>
Link: https://github.com/torvalds/linux/blob/v6.1/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c#L1060
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2599
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:33:12 -04:00
ZhenGuo Yin
7a66ad6c08 drm/amdgpu: set finished fence error if job timedout
Set finished fence to ETIME error if job timedout.

Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:32:43 -04:00
Srinivasan Shanmugam
932fc49479 drm/amdgpu: Fix missing parameter desc for 'xcp_id' in amdgpu_amdkfd_reserve_mem_limit
Fix these warnings by adding 'xcp_id' argument.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:160: warning: Function parameter or member 'xcp_id' not described in 'amdgpu_amdkfd_reserve_mem_limit'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:32:40 -04:00
Srinivasan Shanmugam
1bae03aab2 drm/amdgpu: Fix up missing parameter in kdoc for 'inst' in gmc_ v7, v8, v9, v10, v11.c
Fix these warnings by adding 'inst' arguments to kdocs.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:428: warning: Function parameter or member 'inst' not described in 'gmc_v7_0_flush_gpu_tlb_pasid'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:626: warning: Function parameter or member 'inst' not described in 'gmc_v8_0_flush_gpu_tlb_pasid'
drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c:423: warning: Function parameter or member 'inst' not described in 'gmc_v10_0_flush_gpu_tlb_pasid'
drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c:328: warning: Function parameter or member 'inst' not described in 'gmc_v11_0_flush_gpu_tlb_pasid'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:950: warning: Function parameter or member 'inst' not described in 'gmc_v9_0_flush_gpu_tlb_pasid'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:32:35 -04:00
Srinivasan Shanmugam
3eeb0d037a drm/amdgpu: Fix up missing kdoc parameter 'inst' in get_wave_count() & kgd_gfx_v9_get_cu_occupancy()
Fix these warnings by adding 'inst' arguments to kdocs.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:692: warning: Function parameter or member 'inst' not described in 'get_wave_count'
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:763: warning: Function parameter or member 'inst' not described in 'kgd_gfx_v9_get_cu_occupancy'

Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:32:32 -04:00
Srinivasan Shanmugam
ca2943fe0a drm/amdgpu: Fix missing parameter desc for 'xcc_id' in gfx_v7_0.c & amdgpu_rlc.c
Fix these warnings by adding 'xcc_id' arguments.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c:1557: warning: Function parameter or member 'xcc_id' not described in 'gfx_v7_0_select_se_sh'
drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c:38: warning: Function parameter or member 'xcc_id' not described in 'amdgpu_gfx_rlc_enter_safe_mode'
drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c:62: warning: Function parameter or member 'xcc_id' not described in 'amdgpu_gfx_rlc_exit_safe_mode'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:32:30 -04:00
Lijo Lazar
c6a64ad9b7 drm/amdgpu: Initialize xcc mask
For ASICs which are not initialized through discovery, initialize GFX
cluster as 1.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:32:21 -04:00
Srinivasan Shanmugam
837d4e071d drm/amdgpu: Fix create_dmamap_sg_bo kdoc warnings
Fix the following gcc with W=1:

drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:292: warning: Cannot understand  * @create_dmamap_sg_bo: Creates a amdgpu_bo object to reflect information

Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:32:12 -04:00
Srinivasan Shanmugam
9eba1b8b70 drm/amdgpu: Fix up missing kdoc in sdma_v6_0.c
Address a bunch of kdoc warnings:

gcc with W=1
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:248: warning: Function parameter or member 'job' not described in 'sdma_v6_0_ring_emit_ib'
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:248: warning: Function parameter or member 'flags' not described in 'sdma_v6_0_ring_emit_ib'
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:946: warning: Function parameter or member 'timeout' not described in 'sdma_v6_0_ring_test_ib'
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:1125: warning: Function parameter or member 'ring' not described in 'sdma_v6_0_ring_pad_ib'
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:1176: warning: Function parameter or member 'vmid' not described in 'sdma_v6_0_ring_emit_vm_flush'
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:1176: warning: Function parameter or member 'pd_addr' not described in 'sdma_v6_0_ring_emit_vm_flush'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:31:29 -04:00
Srinivasan Shanmugam
66dadf1ab1 drm/amdgpu: Fix up kdoc in amdgpu_acpi.c
Fix these warnings by adding & deleting the deviant arguments.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:906: warning: Function parameter or member 'numa_info' not described in 'amdgpu_acpi_get_node_id'
drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:906: warning: Excess function parameter 'nid' description in 'amdgpu_acpi_get_node_id'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:31:26 -04:00
Srinivasan Shanmugam
23616d1ff3 drm/amdgpu: Fix up kdoc in sdma_v4_4_2.c
Address a bunch of kdoc warnings:

gcc with W=1
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:426: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_gfx_stop'
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:457: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_rlc_stop'
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:470: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_page_stop'
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:506: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_ctx_switch_enable'
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:794: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_rlc_resume'
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:810: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_load_microcode'
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:854: warning: Function parameter or member 'inst_mask' not described in 'sdma_v4_4_2_inst_start'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:31:23 -04:00
Ikshwaku Chauhan
3a25071a97 drm/amdgpu: enable tmz by default for GC 11.0.1
Add IP GC 11.0.1 in the list of target to have
tmz enabled by default.

Signed-off-by: Ikshwaku Chauhan <ikshwaku.chauhan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:31:20 -04:00
Guchun Chen
8ffd6f0442 drm/amdgpu: keep irq count in amdgpu_irq_disable_all
This can clean up all irq warnings because of unbalanced
amdgpu_irq_get/put when unplugging/unbinding device, and leave
irq count decrease in each ip fini function.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:31:12 -04:00
Dan Carpenter
9c9d501b28 drm/amd/amdgpu: Fix up locking etc in amdgpu_debugfs_gprwave_ioctl()
There are two bugs here.
1) Drop the lock if copy_from_user() fails.
2) If the copy fails then the correct error code is -EFAULT instead of
   -EINVAL.

I also broke up the long line and changed "sizeof rd->id" to
"sizeof(rd->id)".

Fixes: 553f973a0d ("drm/amd/amdgpu: Update debugfs for XCC support (v3)")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:10:04 -04:00
James Zhu
9938333a46 drm/amdgpu: use amdxcp platform device as spatial partition
Use amdxcp platform device as spatial partition device.

-v2: remove unused variable

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:08:58 -04:00
Srinivasan Shanmugam
3034983db3 drm/amdgpu: Mark mmhub_v1_8_mmea_err_status_reg as __maybe_unused
Silencing the compiler from below compilation error:

drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c:704:23: error: variable 'mmhub_v1_8_mmea_err_status_reg' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
static const uint32_t mmhub_v1_8_mmea_err_status_reg[] = {
                      ^
1 error generated.

Mark the variable as __maybe_unused to make it clear to clang that this
is expected, so there is no more warning.

Cc: Christian König <christian.koenig@amd.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:08:43 -04:00
Shiwu Zhang
5d6cd20075 drm/amdgpu: add the accelerator pcie class
v2: add the base class id for accelerator (lijo)
v3: add the new pci class in amdgpu tree (hawking)

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:07:16 -04:00
James Zhu
2310554172 drm/amdgpu: save/restore part of xcp drm_device fields
Redirect xcp allocated drm_device::rdev/pdev/driver with
amdgpu pci_device/drm_device setting. They need be saved
before redirect and restored after unregister xcp drm_device.

-v2: fix warning discarded-qualifiers

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:07:12 -04:00
Shiwu Zhang
20997c04b7 drm/amdgpu: set the APU flag based on package type
Since currently APU and dGPU share the same pcie class
while gmc init needs the flag to set up correctly for upcomming
memory allocations

v2: call get_pkg_type in smuio 13_0_3 is enough (hawking)

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:07:04 -04:00
James Zhu
28bb7f13e7 drm/jpeg: add init value for num_jpeg_rings
Need init new num_jpeg_rings to 1 on jpeg.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Richard Liang <rliang1@amd.com>
Tested-by: Ying Li <ying.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:07:00 -04:00
Shiwu Zhang
491ae27829 drm/amdgpu: complement the 4, 6 and 8 XCC cases
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:02:17 -04:00
Shiwu Zhang
89f8576555 drm/amdgpu: golden settings for ASIC rev_id 0
Suggested by FW team that GB_ADDR_CONFIG is handled by golden
settings in driver to get the expected value

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:02:15 -04:00
Shiwu Zhang
9535a86a40 drm/amdgpu: bypass bios dependent operations
Since bios reading does not work currently so just bypass all operations
related to bios

v2: hardcode the vram info for APP_APU case (hawking)
v3: correct the vram_width with channel number * channel size (lijo)

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:02:12 -04:00
Jiadong Zhu
cfdce59417 drm/amdgpu: Program gds backup address as zero if no gds allocated
It is firmware requirement to set gds_backup_addrlo and gds_backup_addrhi
of DE meta both zero if no gds partition is allocated for the frame.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:02:08 -04:00
Jiadong Zhu
b7941e2fef drm/amdgpu: Reset CP_VMID_PREEMPT after trailing fence signaled
When MEC executes unmap_queue for mid command buffer preemption, it will
kick the write pointer of the gfx ring, set CP_VMID_PREEMPT to trigger the
preemption and wait for CP_VMID_PREEMPT becomes zero after the preemption
done. There is a race condition that PFP may excute the resetting command
before MEC set CP_VMID_PREEMPT. As a result, hang happens as
CP_VMID_PREEMPT is always 0xffff.

To avoid this, we send resetting CP_VMID_PREEMPT command after the trailing
fence is siganled and update gfx write pointer explicitly.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:02:05 -04:00
Srinivasan Shanmugam
8cce16826f drm/amdgpu: Fix unused variable in amdgpu_gfx.c
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_disable_kcq’:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:497:6: warning: variable ‘j’ set but not used [-Wunused-but-set-variable]
  497 |  int j;
      |      ^
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_disable_kgq’:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:528:6: warning: variable ‘j’ set but not used [-Wunused-but-set-variable]
  528 |  int j;
      |      ^
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_enable_kgq’:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:630:12: warning: variable ‘j’ set but not used [-Wunused-but-set-variable]
  630 |  int r, i, j;
      |

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:01:10 -04:00
Srinivasan Shanmugam
a09e206510 drm/amdgpu: Fix defined but not used gfx9_cs_data in gfx_v9_4_3.c
gcc with W=1
In file included from drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:33:
drivers/gpu/drm/amd/amdgpu/clearstate_gfx9.h:939:36: warning: ‘gfx9_cs_data’ defined but not used [-Wunused-const-variable=]
  939 | static const struct cs_section_def gfx9_cs_data[] = {
      |

gfx9_cs_data is not used in gfx_v9_4_3.c, hence remove its
include in gfx_v9_4_3.c

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 11:00:53 -04:00
Nathan Chancellor
6c882a573b drm/amdgpu: Fix return types of certain NBIOv7.9 callbacks
When building with clang's -Wincompatible-function-pointer-types-strict,
which ensures that function pointer signatures match exactly to avoid
tripping clang's Control Flow Integrity (kCFI) checks at run time and
will eventually be turned on for the kernel, the following instances
appear in the NBIOv7.9 code:

  drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c:465:32: error: incompatible function pointer types initializing 'int (*)(struct amdgpu_device *)' with an expression of type 'enum amdgpu_gfx_partition (struct amdgpu_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .get_compute_partition_mode = nbio_v7_9_get_compute_partition_mode,
                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c:467:31: error: incompatible function pointer types initializing 'u32 (*)(struct amdgpu_device *, u32 *)' (aka 'unsigned int (*)(struct amdgpu_device *, unsigned int *)') with an expression of type 'enum amdgpu_memory_partition (struct amdgpu_device *, u32 *)' (aka 'enum amdgpu_memory_partition (struct amdgpu_device *, unsigned int *)') [-Werror,-Wincompatible-function-pointer-types-strict]
          .get_memory_partition_mode = nbio_v7_9_get_memory_partition_mode,
                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  2 errors generated.

Change the return types of these callbacks to match the prototypes to
clear up the warning and avoid tripping kCFI at run time. Both functions
return a value from ffs(), which is an integer that can fit into either
int or unsigned int.

Fixes: 98a54e88e8 ("drm/amdgpu: add sysfs node for compute partition mode")
Fixes: ea2d2f8ece ("drm/amdgpu: detect current GPU memory partition mode")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:59:07 -04:00
Lijo Lazar
3525844d48 drm/amdgpu: Use single copy per SDMA instance type (v2)
v1: Only single copy per instance type is required for PSP. All instance types
use the same firmware copy. (Lijo)

v2: Apply the change into amdgpu_sdma_init_microcode() due to rebase. (Morris)

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:57:16 -04:00
Guchun Chen
93ab59ac6d drm/amdgpu: switch to unified amdgpu_ring_test_helper
This will simplify code.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:57:13 -04:00
Guchun Chen
232f243189 drm/amdgpu/gfx: set sched.ready status after ring/IB test in gfx
sched.ready is nothing with ring initialization, it needs to set
to be true after ring/IB test in amdgpu_ring_test_helper to tell
the ring is ready for submission.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:57:11 -04:00
Guchun Chen
61c31b8b6c drm/amdgpu/sdma: set sched.ready status after ring/IB test in sdma
sched.ready is nothing with ring initialization, it needs to set
to be true after ring/IB test in amdgpu_ring_test_helper to tell
the ring is ready for submission.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:57:07 -04:00
Lijo Lazar
194224a54c drm/amdgpu: Fix warnings
Fix warnings reported by kernel test bot/smatch

smatch warnings:
drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c:65 amdgpu_xcp_run_transition()
error: buffer overflow 'xcp_mgr->xcp' 8 <= 8

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/202305231453.I0bXngYn-lkp@intel.com/
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:56:58 -04:00
Srinivasan Shanmugam
413521a4c9 drm/amd/amdgpu: Fix warnings in amdgpu_irq.c
checkpatch reports:

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: braces {} are not necessary for any arm of this statement
+               if (nvec <= 0) {
[...]
+               } else {
[...]
WARNING: Block comments use a trailing */ on a separate line

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:55:38 -04:00
Mukul Joshi
c3aaca43fb drm/amdgpu: Add a low priority scheduler for VRAM clearing
Add a low priority DRM scheduler for VRAM clearing instead of using
the exisiting high priority scheduler. Use the high priority scheduler
for migrations and evictions.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:54:40 -04:00
Jiapeng Chong
1fbc69b8f5 drm/amdgpu/vcn: Modify mismatched function name
No functional modification involved.

drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:374: warning: expecting prototype for vcn_v4_0_mc_resume_dpg_mode(). Prototype was for vcn_v4_0_3_mc_resume_dpg_mode() instead.
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:631: warning: expecting prototype for vcn_v4_0_enable_clock_gating(). Prototype was for vcn_v4_0_3_enable_clock_gating() instead.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=5284
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:54:37 -04:00
Jiapeng Chong
aaff9c0899 drm/amdgpu: Modify mismatched function name
No functional modification involved.

drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:426: warning: expecting prototype for sdma_v4_4_2_gfx_stop(). Prototype was for sdma_v4_4_2_inst_gfx_stop() instead.
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:457: warning: expecting prototype for sdma_v4_4_2_rlc_stop(). Prototype was for sdma_v4_4_2_inst_rlc_stop() instead.
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:470: warning: expecting prototype for sdma_v4_4_2_page_stop(). Prototype was for sdma_v4_4_2_inst_page_stop() instead.
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:506: warning: expecting prototype for sdma_v4_4_2_ctx_switch_enable(). Prototype was for sdma_v4_4_2_inst_ctx_switch_enable() instead.
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:561: warning: expecting prototype for sdma_v4_4_2_enable(). Prototype was for sdma_v4_4_2_inst_enable() instead.
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:798: warning: expecting prototype for sdma_v4_4_2_rlc_resume(). Prototype was for sdma_v4_4_2_inst_rlc_resume() instead.
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:814: warning: expecting prototype for sdma_v4_4_2_load_microcode(). Prototype was for sdma_v4_4_2_inst_load_microcode() instead.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=5283
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:54:34 -04:00
Jiapeng Chong
e7665d0ca7 drm/amdgpu: Remove duplicate include
./drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c: amdgpu_xcp.h is included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=5281
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:54:31 -04:00
Philip Yang
acf429dcac drm/amdkfd: Align partition memory size to page size
The compute partition memory size calculated from KFD_XCP_MEMORY_SIZE
may not align to page size if xcp_mgr->num_xcp_per_mem_partition is 6.

Change the KFD_XCP_MEMORY_SIZE macro to return page align size, so KFD
node memory size reported in sysfs is page align size, to avoid
application VRAM allocation failure because application may use the size
directly and Thunk requires the memory allocation size is page size
align.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:53:03 -04:00
Tom Rix
803e4c9efc drm/amdgpu: remove unused variable num_xcc
gcc with W=1 reports
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:2138:13: error: variable
  ‘num_xcc’ set but not used [-Werror=unused-but-set-variable]
 2138 |         int num_xcc;
      |             ^~~~~~~

This variable is not used so remove it.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:52:45 -04:00
Arnd Bergmann
1501fe94ee drm/amdgpu: fix acpi build warnings
Two newly introduced functions are in the global namespace but have no prototypes
or callers outside of amdgpu_acpi.c, another function is static but only has
a caller inside of an #ifdef:

drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:902:13: error: no previous prototype for 'amdgpu_acpi_get_node_id' [-Werror=missing-prototypes]
drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:928:30: error: no previous prototype for 'amdgpu_acpi_get_dev' [-Werror=missing-prototypes]
drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:860:33: error: 'amdgpu_acpi_get_numa_info' defined but not used [-Werror=unused-function]

Avoid the warnings by marking all of them static and ensuring that the compiler is
able to see the callsites.

v2: rebase on latest code (Alex)

Fixes: fa0497c34e ("drm/amdgpu: Add API to get numa information of XCC")
Fixes: 1cc823011a ("drm/amdgpu: Store additional numa node information")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:51:35 -04:00
Arnd Bergmann
1f9bb94f12 drm/amdgpu: use %pad format string for dma_addr_t
DMA addresses can be shorter than u64, which results in a broken debug output:

drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c: In function 'amdgpu_gart_table_ram_alloc':
drivers/gpu/drm/amd/amdgpu/amdgpu.h:41:22: error: format '%llx' expects argument of type 'long long unsigned int', but argument 4 has type 'dma_addr_t' {aka 'unsigned int'} [-Werror=format=]
drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:146:9: note: in expansion of macro 'dev_info'
  146 |         dev_info(adev->dev, "%s dma_addr:%llx\n", __func__, dma_addr);

Use the special %pad format string and pass the DMA address by reference.

Fixes: c9a502e981 ("drm/amdgpu: Allocate GART table in RAM for AMD APU")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:50:55 -04:00
Arnd Bergmann
1b177b5c68 drm/amdgpu:mark aqua_vanjaram_reg_init.c function as static
A few newly added global functions have no prototype, which causes warnings:

drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c:169:5: error: no previous prototype for 'aqua_vanjaram_select_scheds' [-Werror=missing-prototypes]
drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c:310:5: error: no previous prototype for '__aqua_vanjaram_get_xcc_per_xcp' [-Werror=missing-prototypes]
drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c:337:5: error: no previous prototype for '__aqua_vanjaram_get_xcp_ip_info' [-Werror=missing-prototypes]
drivers/gpu/drm/amd/amdgpu/aqua_vanjaram_reg_init.c:593:5: error: no previous prototype for 'aqua_vanjaram_get_xcp_ip_details' [-Werror=missing-prototypes]

There are no callers from other files, so just mark them as 'static'.

Fixes: cd7d8400aa ("drm/amdgpu: add partition schedule for GC(9, 4, 3)")
Fixes: 9cb18287d8 ("drm/amdgpu: Add SOC partition funcs for GC v9.4.3")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:48:43 -04:00
Arnd Bergmann
d522458e63 drm/amdkfd: mark local functions as static
The file was newly added and causes some -Wmissing-prototype warnings:

drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c:57:5: error: no previous prototype for 'kgd_gfx_v9_4_3_hqd_sdma_load' [-Werror=missing-prototypes]
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c:126:5: error: no previous prototype for 'kgd_gfx_v9_4_3_hqd_sdma_dump' [-Werror=missing-prototypes]
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c:163:6: error: no previous prototype for 'kgd_gfx_v9_4_3_hqd_sdma_is_occupied' [-Werror=missing-prototypes]
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c:181:5: error: no previous prototype for 'kgd_gfx_v9_4_3_hqd_sdma_destroy' [-Werror=missing-prototypes]

Mark these all as 'static' since there are no outside callers.

Fixes: a805889a15 ("drm/amdkfd: Update SDMA queue management for GFX9.4.3")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:47:53 -04:00
Harshit Mogalapalli
6dabce860d drm/amdgpu: Fix unsigned comparison with zero in gmc_v9_0_process_interrupt()
Smatch warns:
	drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:579:
	unsigned 'xcc_id' is never less than zero.

gfx_v9_4_3_ih_to_xcc_inst() returns negative numbers as well.
Fix this by changing type of xcc_id to int.

Fixes: 98b2e9cad2 ("drm/amdgpu: correct the vmhub index when page fault occurs")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:45:46 -04:00
Colin Ian King
9f77af014c drm/amdgpu: Fix a couple of spelling mistakes in info and debug messages
There are a couple of spelling mistakes, one in a dev_info message
and the other in a dev_debug message. Fix them.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:45:43 -04:00
Lijo Lazar
a3ffabb250 drm/amdgpu: Disable interrupt tracker on NBIOv7.9
Enabling nBIF interrupt history tracker prevents LCLK deep sleep,
hence disable it

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:45:23 -04:00
Shiwu Zhang
89fb3020d6 drm/amdgpu: init the XCC_DOORBELL_FENCE regs
add the the init_registers callback for nbio_v7_9

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:44:37 -04:00
Tao Zhou
332bb09352 drm/amdgpu: remove unused definition
mmhub_v1_8_mmea_cgtt_clk_cntl_reg is defined but not used.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:44:33 -04:00
Srinivasan Shanmugam
1893549af6 drm/amdgpu: Fix uninitialized variable in gfxhub_v1_2_xcp_resume
drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c:657:6: error: variable 'ret' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
        if (!amdgpu_sriov_vf(adev))
            ^~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c:660:9: note: uninitialized use occurs here
        return ret;
               ^~~
drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c:657:2: note: remove the 'if' if its condition is always true
        if (!amdgpu_sriov_vf(adev))
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c:648:9: note: initialize the variable 'ret' to silence this warning
        int ret;
               ^
                = 0
1 error generated.

Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:44:30 -04:00
Srinivasan Shanmugam
40e39d7227 drm/amdgpu: Fix unused amdgpu_acpi_get_numa_info function in amdgpu_acpi_get_node_id()
Fix the below compiler complaining error:

drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:860:33: error: unused function 'amdgpu_acpi_get_numa_info' [-Werror,-Wunused-function]
static struct amdgpu_numa_info *amdgpu_acpi_get_numa_info(uint32_t pxm)
                                ^
1 error generated.

By guarding amdgpu_acpi_get_numa_info & amdgpu_acpi_get_numa_size
function, only when CONFIG_ACPI_NUMA is enabled.

Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:44:27 -04:00
Christoph Hellwig
73ade646c5 drm/amdgpu: stop including swiotlb.h
amdgpu does not need swiotlb.h, so stop including it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:42:21 -04:00
Srinivasan Shanmugam
b3b0e016ec drm/amdgpu: Fix uninitalized variable in jpeg_v4_0_3_is_idle & jpeg_v4_0_3_wait_for_idle
drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c:752:4: error: variable 'ret' is uninitialized when used here [-Werror,-Wuninitialized]
                        ret &= ((RREG32_SOC15_OFFSET(
                        ^~~
drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c:745:10: note: initialize the variable 'ret' to silence this warning
        bool ret;
                ^
                 = 0
drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c:774:4: error: variable 'ret' is uninitialized when used here [-Werror,-Wuninitialized]
                        ret &= SOC15_WAIT_ON_RREG_OFFSET(
                        ^~~
drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c:767:9: note: initialize the variable 'ret' to silence this warning
        int ret;
               ^
                = 0
2 errors generated.

Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: James Zhu <James.Zhu@amd.com>
Cc: Leo Liu <leo.liu@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:41:58 -04:00
Srinivasan Shanmugam
ff6b11cc72 drm/amd/amdgpu: Fix errors & warnings in mmhub_v1_8.c
Fix below errors & warnings reported by checkpatch:

ERROR: code indent should use tabs where possible
WARNING: please, no space before tabs
WARNING: please, no spaces at the start of a line
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
ERROR: space prohibited before that '++' (ctx:WxB)
WARNING: Block comments use a trailing */ on a separate line

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:41:53 -04:00
Likun Gao
20a29ac091 drm/amdgpu: retire set_vga_state for some ASIC
set_vga_state operation only allowed on SI generation
ASIC, retire the realted function on those ASIC which
did not do anything.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:41:50 -04:00
Likun Gao
b336c681bd drm/amdgpu: fix vga_set_state NULL pointer issue
Fix NULL pointer issue for vga_set_state function
as not all the ASIC need this operation.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:41:40 -04:00
Srinivasan Shanmugam
d7b8e68dc0 drm/amdgpu: Fix uninitialized variable in gfx_v9_4_3_cp_resume
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:1925:6: error: variable 'r' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
        if (amdgpu_xcp_query_partition_mode(adev->xcp_mgr,
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:1931:6: note: uninitialized use occurs here
        if (r)
            ^
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:1925:2: note: remove the 'if' if its condition is always true
        if (amdgpu_xcp_query_partition_mode(adev->xcp_mgr,
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:1923:7: note: initialize the variable 'r' to silence this warning
        int r, i, num_xcc;
             ^
              = 0
1 error generated.

Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:41:35 -04:00
Horatio Zhang
a34b09060a drm/amdgpu: add RAS POISON interrupt funcs for jpeg_v4_0
Add ras_poison_irq and functions. And fix the amdgpu_irq_put
call trace in jpeg_v4_0_hw_fini.

[   50.497562] RIP: 0010:amdgpu_irq_put+0xa4/0xc0 [amdgpu]
[   50.497619] RSP: 0018:ffffaa2400fcfcb0 EFLAGS: 00010246
[   50.497620] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[   50.497621] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[   50.497621] RBP: ffffaa2400fcfcd0 R08: 0000000000000000 R09: 0000000000000000
[   50.497622] R10: 0000000000000000 R11: 0000000000000000 R12: ffff99b2105242d8
[   50.497622] R13: 0000000000000000 R14: ffff99b210500000 R15: ffff99b210500000
[   50.497623] FS:  0000000000000000(0000) GS:ffff99b518480000(0000) knlGS:0000000000000000
[   50.497623] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   50.497624] CR2: 00007f9d32aa91e8 CR3: 00000001ba210000 CR4: 0000000000750ee0
[   50.497624] PKRU: 55555554
[   50.497625] Call Trace:
[   50.497625]  <TASK>
[   50.497627]  jpeg_v4_0_hw_fini+0x43/0xc0 [amdgpu]
[   50.497693]  jpeg_v4_0_suspend+0x13/0x30 [amdgpu]
[   50.497751]  amdgpu_device_ip_suspend_phase2+0x240/0x470 [amdgpu]
[   50.497802]  amdgpu_device_ip_suspend+0x41/0x80 [amdgpu]
[   50.497854]  amdgpu_device_pre_asic_reset+0xd9/0x4a0 [amdgpu]
[   50.497905]  amdgpu_device_gpu_recover.cold+0x548/0xcf1 [amdgpu]
[   50.498005]  amdgpu_debugfs_reset_work+0x4c/0x80 [amdgpu]
[   50.498060]  process_one_work+0x21f/0x400
[   50.498063]  worker_thread+0x200/0x3f0
[   50.498064]  ? process_one_work+0x400/0x400
[   50.498065]  kthread+0xee/0x120
[   50.498067]  ? kthread_complete_and_exit+0x20/0x20
[   50.498068]  ret_from_fork+0x22/0x30

Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:41:29 -04:00
Horatio Zhang
674f90f83b drm/amdgpu: add RAS POISON interrupt funcs for jpeg_v2_6
Add ras_poison_irq and functions.

Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:41:27 -04:00
Horatio Zhang
18dad20c3d drm/amdgpu: separate ras irq from jpeg instance irq for UVD_POISON
Separate jpegbRAS poison consumption handling from the instance irq, and
register dedicated ras_poison_irq src and funcs for UVD_POISON.

v2:
- Separate ras irq from jpeg instance irq
- Improve the subject and code comments

v3:
- Split the patch into three parts
- Improve the code comments

Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:41:23 -04:00
Horatio Zhang
66a11ecbde drm/amdgpu: add RAS POISON interrupt funcs for vcn_v4_0
Add ras_poison_irq and functions. And fix the amdgpu_irq_put
call trace in vcn_v4_0_hw_fini.

[   44.563572] RIP: 0010:amdgpu_irq_put+0xa4/0xc0 [amdgpu]
[   44.563629] RSP: 0018:ffffb36740edfc90 EFLAGS: 00010246
[   44.563630] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[   44.563630] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[   44.563631] RBP: ffffb36740edfcb0 R08: 0000000000000000 R09: 0000000000000000
[   44.563631] R10: 0000000000000000 R11: 0000000000000000 R12: ffff954c568e2ea8
[   44.563631] R13: 0000000000000000 R14: ffff954c568c0000 R15: ffff954c568e2ea8
[   44.563632] FS:  0000000000000000(0000) GS:ffff954f584c0000(0000) knlGS:0000000000000000
[   44.563632] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   44.563633] CR2: 00007f028741ba70 CR3: 000000026ca10000 CR4: 0000000000750ee0
[   44.563633] PKRU: 55555554
[   44.563633] Call Trace:
[   44.563634]  <TASK>
[   44.563634]  vcn_v4_0_hw_fini+0x62/0x160 [amdgpu]
[   44.563700]  vcn_v4_0_suspend+0x13/0x30 [amdgpu]
[   44.563755]  amdgpu_device_ip_suspend_phase2+0x240/0x470 [amdgpu]
[   44.563806]  amdgpu_device_ip_suspend+0x41/0x80 [amdgpu]
[   44.563858]  amdgpu_device_pre_asic_reset+0xd9/0x4a0 [amdgpu]
[   44.563909]  amdgpu_device_gpu_recover.cold+0x548/0xcf1 [amdgpu]
[   44.564006]  amdgpu_debugfs_reset_work+0x4c/0x80 [amdgpu]
[   44.564061]  process_one_work+0x21f/0x400
[   44.564062]  worker_thread+0x200/0x3f0
[   44.564063]  ? process_one_work+0x400/0x400
[   44.564064]  kthread+0xee/0x120
[   44.564065]  ? kthread_complete_and_exit+0x20/0x20
[   44.564066]  ret_from_fork+0x22/0x30

Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:41:16 -04:00
Horatio Zhang
46d75d2300 drm/amdgpu: add RAS POISON interrupt funcs for vcn_v2_6
Add ras_poison_irq and functions.

Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:41:13 -04:00
Horatio Zhang
2ecf927b17 drm/amdgpu: separate ras irq from vcn instance irq for UVD_POISON
Separate vcn RAS poison consumption handling from the instance irq, and
register dedicated ras_poison_irq src and funcs for UVD_POISON.

v2:
- Separate ras irq from vcn instance irq
- Improve the subject and code comments

v3:
- Split the patch into three parts
- Improve the code comments

Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:41:11 -04:00
YuanShang
c77b3608b8 drm/amdgpu: Remove IMU ucode in vf2pf
The IMU firmware is loaded on the host side, not the guest.
Remove IMU in vf2pf ucode id enum.

Signed-off-by: YuanShang <YuanShang.Mao@amd.com>
Reviewed-By: Horace Chen <horace.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:41:07 -04:00
Shiwu Zhang
e825fb641b drm/amdgpu: fix the memory override in kiq ring struct
This is introduced by the code merge and will let the
adev->gfx.kiq[0].ring struct being overrided

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:41:04 -04:00
Shiwu Zhang
c796d7e039 drm/amdgpu: add the smu_v13_0_6 and gfx_v9_4_3 ip block
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:41:02 -04:00
Jesse Zhang
b3122c9269 drm/amdgpu: don't enable secure display on incompatible platforms
[why]
[drm] psp gfx command LOAD_TA(0x1) failed and response status is (0x7)
[drm] psp gfx command INVOKE_CMD(0x3) failed and response status is (0x4)
amdgpu 0000:04:00.0: amdgpu: Secure display: Generic Failure.

[how]
don't enable secure display on incompatible platforms

Suggested-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Jesse zhang <jesse.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:40:57 -04:00
Sukrut Bellary
1385d88c6a drm:amd:amdgpu: Fix missing buffer object unlock in failure path
smatch warning -
1) drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:3615 gfx_v9_0_kiq_resume()
warn: inconsistent returns 'ring->mqd_obj->tbo.base.resv'.

2) drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:6901 gfx_v10_0_kiq_resume()
warn: inconsistent returns 'ring->mqd_obj->tbo.base.resv'.

Signed-off-by: Sukrut Bellary <sukrut.bellary@linux.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:40:52 -04:00
Bas Nieuwenhuizen
a2b308044d drm/amdgpu: Validate VM ioctl flags.
None have been defined yet, so reject anybody setting any. Mesa sets
it to 0 anyway.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-06-09 10:40:25 -04:00
Su Hui
109b4d8cfe drm/amdgpu: remove unnecessary (void*) conversions
No need cast (void*) to (struct amdgpu_device *).

Signed-off-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:40:12 -04:00
Tong Liu01
04e8595819 drm/amdgpu: fix incorrect pcie_gen_mask in passthrough case
[why]
Passthrough case is treated as root bus and pcie_gen_mask is set as
default value that does not support GEN 3 and GEN 4 for PCIe link
speed. So PCIe link speed will be downgraded at smu hw init in
passthrough condition

[how]
Move get pci info after detect virtualization and check if it is
passthrough case when set pcie_gen_mask

Signed-off-by: Tong Liu01 <Tong.Liu01@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:40:00 -04:00
Jack Xiao
e602157ec0 drm/amdgpu: fix S3 issue if MQD in VRAM
1. Need flush HDP for MQD putting in vram
2. Zero out mes MQD

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:38:33 -04:00
Srinivasan Shanmugam
e03f04b849 drm/amdgpu: Fix warnings in amdgpu _sdma, _ucode.c
Fix below checkpatch warnings:

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Comparisons should place the constant on the right side of the test
WARNING: Missing a blank line after declarations

Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:38:29 -04:00
Srinivasan Shanmugam
f10984a353 drm/amd/amdgpu: Fix errors & warnings in amdgpu _uvd, _vce.c
Fix below checkpatch errors & warnings:

In amdgpu_uvd.c:

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Prefer 'unsigned int *' to bare use of 'unsigned *'
WARNING: Missing a blank line after declarations
WARNING: %Lx is non-standard C, use %llx
ERROR: space required before the open parenthesis '('
ERROR: space required before the open brace '{'
WARNING: %LX is non-standard C, use %llX
WARNING: Block comments use * on subsequent lines
+/* multiple fence commands without any stream commands in between can
+   crash the vcpu so just try to emmit a dummy create/destroy msg to
WARNING: Block comments use a trailing */ on a separate line
+   avoid this */
WARNING: braces {} are not necessary for single statement blocks
+               for (j = 0; j < adev->uvd.num_enc_rings; ++j) {
+                       fences += amdgpu_fence_count_emitted(&adev->uvd.inst[i].ring_enc[j]);
+               }

In amdgpu_vce.c:

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Missing a blank line after declarations
WARNING: %Lx is non-standard C, use %llx
WARNING: Possible repeated word: 'we'
ERROR: space required before the open parenthesis '('

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:38:22 -04:00
YiPeng Chai
6c47a79b3b drm/amdgpu: perform mode2 reset for sdma fed error on gfx v11_0_3
perform mode2 reset for sdma fed error on gfx v11_0_3.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:38:19 -04:00
Srinivasan Shanmugam
5d0622705f drm/amd/amdgpu: Fix errors & warnings in amdgpu_vcn.c
Fix below checkpatch insisted error & warnings:

ERROR: space required before the open brace '{'
WARNING: braces {} are not necessary for any arm of this statement
+       if ((type == VCN_ENCODE_RING) && (vcn_config & VCN_BLOCK_ENCODE_DISABLE_MASK)) {
[...]
+       } else if ((type == VCN_DECODE_RING) && (vcn_config & VCN_BLOCK_DECODE_DISABLE_MASK)) {
[...]
+       } else if ((type == VCN_UNIFIED_RING) && (vcn_config & VCN_BLOCK_QUEUE_DISABLE_MASK)) {
[...]
ERROR: code indent should use tabs where possible
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: braces {} are not necessary for single statement blocks
+               for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
+                       fence[j] += amdgpu_fence_count_emitted(&adev->vcn.inst[j].ring_enc[i]);
+
ERROR: space required before the open parenthesis '('
WARNING: Missing a blank line after declarations
WARNING: please, no spaces at the start of a line
WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:38:15 -04:00
Srinivasan Shanmugam
29f187f71e drm/amd/amdgpu: Fix warnings in amdgpu_encoders.c
Fix below checkpatch warnings:

WARNING: Missing a blank line after declarations
+                       struct amdgpu_connector *amdgpu_connector = to_amdgpu_connector(connector);
+                       amdgpu_encoder->active_device = amdgpu_encoder->devices & amdgpu_connector->devices;

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:38:09 -04:00
Srinivasan Shanmugam
01c3f46474 drm/amd/amdgpu: Fix errors & warnings in amdgpu_ttm.c
Fix below checkpatch insisted error & warnings:

ERROR: Macros with complex values should be enclosed in parentheses
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: braces {} are not necessary for single statement blocks
WARNING: Block comments use a trailing */ on a separate line
WARNING: Missing a blank line after declarations

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:38:03 -04:00
Alex Deucher
0ce50b2efe drm/amdgpu/vcn4: fix endian conversion
sq.is_enabled is a byte so there is no need to endian swap it.

Acked-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:59 -04:00
Alex Deucher
45b3a914d4 drm/amdgpu/gmc9: fix 64 bit division in partition code
Rework logic or use do_div() to avoid problems on 32 bit.

v2: add a missing case for XCP macro
v3: fix out of bounds array access
v4: fix xcp handling harder

Acked-by: Guchun Chen <guchun.chen@amd.com> (v1)
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> (v3)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:52 -04:00
Tao Zhou
92ecb92ccc drm/amdgpu: initialize RAS for gfx_v9_4_3
Register GFX RAS functions and initialize GFX RAS.

v2: remove xcp operations.
v3: reuse the return value of gfx_ras_sw_init.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:50 -04:00
Tao Zhou
0386d52d15 drm/amdgpu: add sq timeout status functions for gfx_v9_4_3
Query and reset sq timeout status.

v2: change instance from 0 to xcc_id for register access.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:47 -04:00
Tao Zhou
30feef0676 drm/amdgpu: add RAS error count reset for gfx_v9_4_3
Add GFX RAS error count reset function.

v2: remove xcp operation.
    only select_se_sh when instance number is more than 1.
v3: add check for se_num before select_se_sh.
    change instance from 0 to xcc_id for register access.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:44 -04:00
Tao Zhou
bfa84da618 drm/amdgpu: add RAS error count query for gfx_v9_4_3
Query GFX RAS ce/ue count.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:42 -04:00
Tao Zhou
5c1c09a716 drm/amdgpu: add RAS error count definitions for gfx_v9_4_3
Prepare for the query of GFX RAS ce/ue count.

v2: remove xcp operation.
    only select_se_sh when instance number is more than 1.
v3: add more CE/UE registsers to query list.
    add check for se_num before select_se_sh.
    change instance from 0 to xcc_id for register access.
v4: move gfx memory id definitions to gfx_v9_4_3.
v5: create a dedicated patch for adding error count query function.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:39 -04:00
Tao Zhou
77462ab8c6 drm/amdgpu: add RAS definitions for GFX
Add common GFX RAS definitions.

v2: remove instance from amdgpu_gfx_ras_reg_entry,
    amdgpu_ras_err_status_reg_entry has already defined it.
v3: remove memory id definitions from amdgpu_gfx.h, they are
    related to IP version.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:37 -04:00
Tao Zhou
47e7f527c8 drm/amdgpu: add RAS status reset for gfx_v9_4_3
Reset GFX RAS status registers.

v2: fix typo in title.
    remove xcp operation.
v3: change instance from 0 to xcc_id for register access.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:32 -04:00
Tao Zhou
bf16235b39 drm/amdgpu: add RAS status query for gfx_v9_4_3
Query GFX RAS status.

v2: remove xcp operation.
v3: change instance from 0 to xcc_id for register access.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:29 -04:00
Tao Zhou
d78c71321e drm/amdgpu: add GFX RAS common function
The common function can help reduce redundant code.

v2: remove xcp operation, only need to do RAS operations for all
instances.
v3: remove check for GFX RAS support, will be checked in higher level.
    add amdgpu prefix for the function name.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:26 -04:00
Hawking Zhang
9a3ce1a7a9 drm/amdgpu: Do not access members of xcp w/o check (v2)
Not all the asic needs xcp. ensure check xcp availabity
before accessing its member.

v2: add missing change in kfd_topology.c

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:24 -04:00
Tao Zhou
f464c5dd4d drm/amdgpu: add check for RAS instance mask
The mask is only needed to be set when RAS block instance number is
more than 1 and invalid bits should be also masked out.
We only check valid bits for GFX and SDMA block for now, and will
add check for other RAS blocks in the future.

v2: move the check under injection operation since the mask is only
    used by RAS error inject.
v3: add valid bits handling for SDMA.
v4: print message if the mask is adjusted.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:18 -04:00
Tao Zhou
6e3c51a581 drm/amdgpu: remove RAS GFX injection for gfx_v9_4/gfx_v9_4_2
No special requirement in RAS injection for the two versions, switch to
use default injection interface.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:15 -04:00
Tao Zhou
27c5f29526 drm/amdgpu: reorganize RAS injection flow
So GFX RAS injection could use default function if it doesn't define its
own injection interface.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:13 -04:00
Tao Zhou
2c22ed0bdb drm/amdgpu: add instance mask for RAS inject
User can specify injected instances by the mask. For backward
compatibility, the mask value is incorporated into sub block index
without interface change of RAS TA.
User uses logical mask and driver should convert it to physical value
before sending it to RAS TA.

v2: update parameter name.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:10 -04:00
Tao Zhou
af2ba36883 drm/amdgpu: convert logical instance mask to physical one
Convert instance mask for the convenience of RAS TA.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:08 -04:00
Mukul Joshi
40b832aac0 drm/amdgpu: Enable IH CAM on GFX9.4.3
This patch enables IH CAM on GFX9.4.3 ASIC.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:04 -04:00
Philip Yang
3cde91172d drm/amdgpu: Correct get_xcp_mem_id calculation
Current calculation only works for NPS4/QPX mode, correct it for
NPS4/CPX mode.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:37:02 -04:00
Philip Yang
84b4dd3f84 drm/amdkfd: Refactor migrate init to support partition switch
Rename smv_migrate_init to a better name kgd2kfd_init_zone_device
because it setup zone devive pgmap for page migration and keep it in
kfd_migrate.c to access static functions svm_migrate_pgmap_ops. Call it
only once in amdgpu_device_ip_init after adev ip blocks are initialized,
but before amdgpu_amdkfd_device_init initialize kfd nodes which enable
SVM support based on pgmap.

svm_range_set_max_pages is called by kgd2kfd_device_init everytime after
switching compute partition mode.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:36:58 -04:00
Shiwu Zhang
44a9766555 drm/amdgpu: route ioctls on primary node of XCPs to primary device
During XCP init, unlike the primary device, there is no amdgpu_device
attached to each XCP's drm_device

In case that user trying to open/close the primary node of XCP drm_device
this rerouting is to solve the NULL pointer issue causing by referring
to any member of the amdgpu_device

 BUG: unable to handle page fault for address: 0000000000020c80
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 Oops: 0002 [#1] PREEMPT SMP NOPTI
 Call Trace:
  <TASK>
  lock_timer_base+0x6b/0x90
  try_to_del_timer_sync+0x2b/0x80
  del_timer_sync+0x29/0x40
  flush_delayed_work+0x1c/0x50
  amdgpu_driver_open_kms+0x2c/0x280 [amdgpu]
  drm_file_alloc+0x1b3/0x260 [drm]
  drm_open+0xaa/0x280 [drm]
  drm_stub_open+0xa2/0x120 [drm]
  chrdev_open+0xa6/0x1c0

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:36:53 -04:00
Mukul Joshi
1c77527a69 drm/amdkfd: Fix memory reporting on GFX 9.4.3
This patch fixes memory reporting on the GFX 9.4.3 APU and dGPU
by reporting available memory on a per partition basis. If its an
APU, available and used memory calculations take into account
system and TTM memory.

v2: squash in fix ("drm/amdkfd: Fix array out of bound warning")
    squash in fix ("drm/amdgpu: Update memory reporting for GFX9.4.3")

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 10:36:48 -04:00