linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00

Author	SHA1	Message	Date
Hawking Zhang	8e9f1575d1	drm/amdgpu: Add mmhub v4_1_0 ip block support (v4) Add initial support for MMHUB 4.1.0. v1: Add mmhub v4_1_0 ip block support. v2: Switch to AMDGPU_MMHUB0(0). v3: squash in fix for ip version check (Alex) v4: squash in vm_contexts_disable fix (Alex) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:58:25 -04:00
Philip Yang	7005b169da	drm/amdgpu: Evict BOs from same process for contiguous allocation When TTM failed to alloc VRAM, TTM try evict BOs from VRAM to system memory then retry the allocation, this skips the KFD BOs from the same process because KFD require all BOs are resident for user queues. If TTM with TTM_PL_FLAG_CONTIGUOUS flag to alloc contiguous VRAM, allow TTM evict KFD BOs from the same process, this will evict the user queues first, and restore the queues later after contiguous VRAM allocation. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:58:15 -04:00
Philip Yang	b2dba064c9	drm/amdgpu: Handle sg size limit for contiguous allocation Define macro AMDGPU_MAX_SG_SEGMENT_SIZE 2GB, because struct scatterlist length is unsigned int, and some users of it cast to a signed int, so every segment of sg table is limited to size 2GB maximum. For contiguous VRAM allocation, don't limit the max buddy block size in order to get contiguous VRAM memory. To workaround the sg table segment size limit, allocate multiple segments if contiguous size is bigger than AMDGPU_MAX_SG_SEGMENT_SIZE. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:58:08 -04:00
Yang Wang	7e0357bef4	drm/amdgpu: remove unused MCA driver codes - remove unused callback functions. - make part of mca functions static and refine the function order. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:51:44 -04:00
Likun Gao	d22c075676	drm/amdgpu/discovery: Add common soc24 ip block Add common soc24 ip block. v2: squash in updates (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:51:06 -04:00
Jack Xiao	ff518e13eb	drm/amdgpu/mes11: adjust mes initialization sequence Adjust mes queue initialization before kgq/kcq initialization to enable mes mapping legacy queue. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:50:47 -04:00
Jack Xiao	486eb6b5a8	drm/amdgpu/mes11: add mes mapping legacy queue support Add mes11 map legacy queue packet submission. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:50:40 -04:00
Christian König	ffda708148	drm/amdgpu: once more fix the call oder in amdgpu_ttm_move() v2 This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap. The basic problem here is that after the move the old location is simply not available any more. Some fixes were suggested, but essentially we should call the move notification before actually moving things because only this way we have the correct order for DMA-buf and VM move notifications as well. Also rework the statistic handling so that we don't update the eviction counter before the move. v2: add missing NULL check Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `94aeb41173` ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3171 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-04-30 09:49:51 -04:00
Hawking Zhang	98b912c50e	drm/amdgpu: Add soc24 common ip block (v2) Add initial soc24 support. v1: Add soc24 common ip block. v2: Switch to new select_se_sh/enter_safe_mode interface. v3: squash in correct ext rev id, etc. (Alex) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:46:51 -04:00
Philip Yang	155ce502e9	drm/amdgpu: Support contiguous VRAM allocation RDMA device with limited scatter-gather ability requires contiguous VRAM buffer allocation for RDMA peer direct support. Add a new KFD alloc memory flag and store as bo alloc flag AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pin this bo to export for RDMA peerdirect access, this will set TTM_PL_FLAG_CONTIGUOUS flag, and ask VRAM buddy allocator to get contiguous VRAM. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:44:46 -04:00
Ma Jun	c0d6bd3cd2	drm/amdgpu: Fix uninitialized variable warning in amdgpu_afmt_acr Assign value to clock to fix the warning below: "Using uninitialized value res. Field res.clock is uninitialized" Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:44:34 -04:00
Aurabindo Pillai	96557f785a	drm/amd: GFX12 changes for converting tiling flags to modifiers GFX12 swizzle mode and GCC formats changed and is much simpler. Use a separate function for the same. Changes: * Swizzle mode is now 3 bits only * DCC enablement doesn't come from tiling_flags, it is always set in modifiers * DCC max compressed block size of 128B Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Acked-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:58 -04:00
Jesse Zhang	ea686fef54	drm/amdgpu: fix the warning about the expression (int)size - len Converting size from size_t to int may overflow. v2: keep reverse xmas tree order (Christian) Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:45 -04:00
Jack Xiao	029c2b0389	drm/amdgpu/mes: add mes mapping legacy queue support Add mes mapping legacy queue framework support. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:45 -04:00
Tim Huang	9a5f15d2a2	drm/amdgpu: fix uninitialized scalar variable warning Clear warning that uses uninitialized value fw_size. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:45 -04:00
Sunil Khatri	2a8f7464d3	drm/amdgpu: skip ip dump if devcoredump flag is set Do not dump the ip registers during driver reload in passthrough environment. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:44 -04:00
Arunpravin Paneer Selvam	e362b7c8f8	drm/amdgpu: Modify the contiguous flags behaviour Now we have two flags for contiguous VRAM buffer allocation. If the application request for AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the buffer's placement function. This patch will change the default behaviour of the two flags. When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS - This means contiguous is not mandatory. - we will try to allocate the contiguous buffer. Say if the allocation fails, we fallback to allocate the individual pages. When we set TTM_PL_FLAG_CONTIGUOUS - This means contiguous allocation is mandatory. - we are setting this in amdgpu_bo_pin_restricted() before bo validation and check this flag in the vram manager file. - if this is set, we should allocate the buffer pages contiguously. If the allocation fails, we return -ENOSPC. v2: - keep the mem_flags and bo->flags check as is(Christian) - place the TTM_PL_FLAG_CONTIGUOUS flag setting into the amdgpu_bo_pin_restricted function placement range iteration loop(Christian) - rename find_pages with amdgpu_vram_mgr_calculate_pages_per_block (Christian) - Keep the kernel BO allocation as is(Christian) - If BO pin vram allocation failed, we need to return -ENOSPC as RDMA cannot work with scattered VRAM pages(Philip) v3(Christian): - keep contiguous flag handling outside of pages_per_block calculation - remove the hacky implementation in contiguous flag error handling code v4(Christian): - use any variable and return value for non-contiguous fallback v5: rebase to amd-staging-drm-next branch Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:44 -04:00
Lancelot SIX	bd31e5026d	drm/amdkfd: Enable SQ watchpoint for gfx10 There are new control registers introduced in gfx10 used to configure hardware watchpoints triggered by SMEM instructions: SQ_WATCH{0,1,2,3}_{CNTL_ADDR_HI,ADDR_L}. Those registers work in a similar way as the TCP_WATCH* registers currently used for gfx9 and above. This patch adds support to program the SQ_WATCH registers for gfx10. The SQ_WATCH?_CNTL.MASK field has one bit more than TCP_WATCH?_CNTL.MASK, so SQ watchpoints can have a finer granularity than TCP_WATCH watchpoints. In this patch, we keep the capabilities advertised to the debugger unchanged (HSA_DBG_WATCH_ADDR_MASK_*_BIT_GFX10) as this reflects what both TCP and SQ watchpoints can do and both watchpoints are configured together. Signed-off-by: Lancelot SIX <lancelot.six@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:44 -04:00
Srinivasan Shanmugam	acce6479e3	drm/amdgpu: Fix buffer size in gfx_v9_4_3_init_ cp_compute_microcode() and rlc_microcode() The function gfx_v9_4_3_init_microcode in gfx_v9_4_3.c was generating about potential truncation of output when using the snprintf function. The issue was due to the size of the buffer 'ucode_prefix' being too small to accommodate the maximum possible length of the string being written into it. The string being written is "amdgpu/%s_mec.bin" or "amdgpu/%s_rlc.bin", where %s is replaced by the value of 'chip_name'. The length of this string without the %s is 16 characters. The warning message indicated that 'chip_name' could be up to 29 characters long, resulting in a total of 45 characters, which exceeds the buffer size of 30 characters. To resolve this issue, the size of the 'ucode_prefix' buffer has been reduced from 30 to 15. This ensures that the maximum possible length of the string being written into the buffer will not exceed its size, thus preventing potential buffer overflow and truncation issues. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c: In function ‘gfx_v9_4_3_early_init’: drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:379:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 379 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name); \| ^~ ...... 439 \| r = gfx_v9_4_3_init_rlc_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:379:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30 379 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:413:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 413 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name); \| ^~ ...... 443 \| r = gfx_v9_4_3_init_cp_compute_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:413:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30 413 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: `8630112969` ("drm/amdgpu: split gc v9_4_3 functionality from gc v9_0") Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
YiPeng Chai	6f3b69139c	drm/amdgpu: Fix ras mode2 reset failure in ras aca mode Fix ras mode2 reset failure in ras aca mode. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
Bob Zhou	506c245f3f	drm/amdgpu: fix double free err_addr pointer warnings In amdgpu_umc_bad_page_polling_timeout, the amdgpu_umc_handle_bad_pages will be run many times so that double free err_addr in some special case. So set the err_addr to NULL to avoid the warnings. Signed-off-by: Bob Zhou <bob.zhou@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
Jesse Zhang	7bfd16d0ec	drm/amdgpu: initialize the last_jump_jiffies in atom_exec_context The parameter "last_jump_jiffies" should be initialized before being used in the function atom_op_jump. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
Jesse Zhang	2d10c3dbde	drm/amdgpu: add check before free wb entry Check if ring is not a mes queue before freeing the wb entry, because we only allocate a wb entry when it's not a mes queue. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
Bob Zhou	cd48b97ce7	drm/amdgpu: add return result for amdgpu_i2c_{get/put}_byte After amdgpu_i2c_get_byte fail, amdgpu_i2c_put_byte shouldn't be conducted to put wrong value. So return and check the i2c transfer result. Signed-off-by: Bob Zhou <bob.zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
Bob Zhou	8b2faf1a4f	drm/amdgpu: add error handle to avoid out-of-bounds if the sdma_v4_0_irq_id_to_seq return -EINVAL, the process should be stop to avoid out-of-bounds read, so directly return -EINVAL. Signed-off-by: Bob Zhou <bob.zhou@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
Ma Jun	2e55bcf3d7	drm/amdgpu: Initialize timestamp for some legacy SOCs Initialize the interrupt timestamp for some legacy SOCs to fix the coverity issue "Uninitialized scalar variable" Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
YiPeng Chai	48fa90718b	drm/amdgpu: Use new interface to reserve bad page Use new interface to reserve bad page. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
YiPeng Chai	bcc0934885	drm/amdgpu: Fix address translation defect retired_page is page frame and should be expanded to the full address when querying status. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:42 -04:00
YiPeng Chai	e023874081	drm/amdgpu: support ACA logging ecc errors support ACA logging ecc errors. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:42 -04:00
YiPeng Chai	370fbff4cc	drm/amdgpu: add poison consumption handler Add poison consumption handler. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:42 -04:00
YiPeng Chai	bfa579b38b	drm/amdgpu: prepare to handle pasid poison consumption Prepare to handle pasid poison consumption. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:42 -04:00
YiPeng Chai	314c38cde6	drm/amdgpu: retire bad pages for umc v12_0 Retire bad pages for umc v12_0. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:42 -04:00
YiPeng Chai	e74313be5a	drm/amdgpu: add condition check for amdgpu_umc_fill_error_record Add condition check for amdgpu_umc_fill_error_record. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
YiPeng Chai	2cf8e50ec3	drm/amdgpu: Add delay work to retire bad pages Add delay work to retire bad pages. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
YiPeng Chai	f27defca68	drm/amdgpu: umc v12_0 logs ecc errors 1. umc v12_0 logs ecc errors. 2. Reserve newly detected ecc error pages. 3. Add tag for bad pages, so that they can be retired later. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
YiPeng Chai	b2aa6b108d	drm/amdgpu: umc v12_0 converts error address Umc v12_0 converts error address. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
YiPeng Chai	95b4063de4	drm/amdgpu: add interface to update umc v12_0 ecc status Add interface to update umc v12_0 ecc status. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
YiPeng Chai	a734adfbcd	drm/amdgpu: add poison creation handler Add poison creation handler. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
YiPeng Chai	f493dd64ee	drm/amdgpu: prepare for logging ecc errors Prepare for logging ecc errors. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
YiPeng Chai	98b5bc878d	drm/amdgpu: add message fifo to handle RAS poison events Add message fifo to handle RAS poison events. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
Jesse Zhang	88a9a467c5	drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_reloc Initialize the size before calling amdgpu_vce_cs_reloc, such as case 0x03000001. V2: To really improve the handling we would actually need to have a separate value of 0xffffffff.(Christian) Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
Alex Deucher	eef016ba89	drm/amdgpu/mes11: Use a separate fence per transaction We can't use a shared fence location because each transaction should be considered independently. Reviewed-by: Shaoyun.liu <shaoyunl@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
Alex Deucher	497d7cee24	drm/amdgpu: add a spinlock to wb allocation As we use wb slots more dynamically, we need to lock access to avoid racing on allocation or free. Reviewed-by: Shaoyun.liu <shaoyunl@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:40 -04:00
Sonny Jiang	754c366e41	drm/amdgpu: update fw_share for VCN5 kmd_fw_shared changed in VCN5 Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:40 -04:00
Mukul Joshi	8e1d190595	drm/amdgpu: Fix VRAM memory accounting Subtract the VRAM pinned memory when checking for available memory in amdgpu_amdkfd_reserve_mem_limit function since that memory is not available for use. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:40 -04:00
Sathishkumar S	c551316e15	drm/amdgpu: update jpeg max decode resolution jpeg ip version v2.1 and higher supports 16kx16k resolution decode Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:40 -04:00
Sunil Khatri	af8644121e	drm/amdgpu: add ip dump for each ip in devcoredump Add ip dump for each ip of the asic in the devcoredump for all the ips where a callback is registered for register dump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:40 -04:00
Sunil Khatri	e043a35dc2	drm/amdgpu: dump ip state before reset for each ip Invoke the dump_ip_state function for each ip before the asic resets and save the register values for debugging via devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:39 -04:00
Sunil Khatri	c8732c80de	drm/amdgpu: add support for gfx v10 print Add support to print ip information to be used to print registers in devcoredump buffer. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:39 -04:00
Sunil Khatri	40356542c3	drm/amdgpu: add protype for print ip state Add the prototype for print ip state to be used to print the registers in devcoredump during a gpu reset. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:39 -04:00
Sunil Khatri	c395dbb68b	drm/amdgpu: add support of gfx10 register dump Adding gfx10 gc registers to be used for register dump via devcoredump during a gpu reset. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:39 -04:00
Sunil Khatri	e21d253bd7	drm/amdgpu: add prototype for ip dump Add the prototype to dump ip registers for all ips of different asics and set them to NULL for now. Based on the requirement add a function pointer for each of them. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:39 -04:00
YiPeng Chai	af730e0820	drm/amdgpu: Add interface to reserve bad page Add interface to reserve bad page. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:39 -04:00
Ma Jun	60c448439f	drm/amdgpu: Fix uninitialized variable warnings return 0 to avoid returning an uninitialized variable r Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:39 -04:00
Jack Xiao	f88da7fbf6	drm/amdgpu/mes: fix use-after-free issue Delete fence fallback timer to fix the random use-after-free issue. v2: move to amdgpu_mes.c Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:39 -04:00
Alex Deucher	e0a9bbeea0	drm/amdgpu/sdma5.2: use legacy HDP flush for SDMA2/3 This avoids a potential conflict with firmwares with the newer HDP flush mechanism. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:39 -04:00
Rajneesh Bhardwaj	a16b951586	drm/amdgpu: Update CGCG settings for GFXIP 9.4.3 Tune coarse grain clock gating idle threshold and rlc idle timeout to achieve better kernel launch latency. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:38 -04:00
Frank Min	ea9238a81b	drm/amdgpu: replace tmz flag into buffer flag Replace tmz flag into buffer flag to make it easier to understand and extend Signed-off-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:38 -04:00
Le Ma	92ed1e9cd5	drm/amdgpu: init microcode chip name from ip versions To adapt to different gc versions in gfx_v9_4_3.c file. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:38 -04:00
Prike Liang	26de73bc0a	drm/amdgpu: Fix the ring buffer size for queue VM flush Here are the corrections needed for the queue ring buffer size calculation for the following cases: - Remove the KIQ VM flush ring usage. - Add the invalidate TLBs packet for gfx10 and gfx11 queue. - There's no VM flush and PFP sync, so remove the gfx9 real ring and compute ring buffer usage. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:38 -04:00
Lang Yu	a522ec528c	drm/amdgpu/umsch: don't execute umsch test when GPU is in reset/suspend umsch test needs full GPU functionality(e.g., VM update, TLB flush, possibly buffer moving under memory pressure) which may be not ready under these states. Just skip it to avoid potential issues. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:38 -04:00
Felix Kuehling	e76691f45a	drm/amdgpu: Update BO eviction priorities Make SVM BOs more likely to get evicted than other BOs. These BOs opportunistically use available VRAM, but can fall back relatively seamlessly to system memory. It also avoids SVM migrations evicting other, more important BOs as they will evict other SVM allocations first. Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Mukul Joshi <mukul.joshi@amd.com> Tested-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-23 12:08:31 -04:00
Pierre-Eric Pelloux-Prayer	6e042cee74	drm/amdgpu/vcn: fix unitialized variable warnings Avoid returning an uninitialized value if we never enter the loop. This case should never be hit in practice, but returning 0 doesn't hurt. The same fix is applied to the 4 places using the same pattern. v2: - fixed typos in commit message (Alex) - use "return 0;" before the done label instead of initializing r to 0 Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-23 12:08:30 -04:00
Alex Deucher	3f0664110a	drm/amdgpu/mes11: print MES opcodes rather than numbers Makes it easier to review the logs when there are MES errors. v2: use dbg for emitted, add helpers for fetching strings v3: fix missing commas (Harish) v4: drop command prefixes (Felix) v5: squash in bounds fix (Jun) Acked-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed by Shaoyun.liu <Shaoyun.liu@amd.com> (v2) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-23 12:08:30 -04:00
Peyton Lee	2476c6bd95	drm/amdgpu/vpe: fix vpe dpm setup failed The vpe dpm settings should be done before firmware is loaded. Otherwise, the frequency cannot be successfully raised. Signed-off-by: Peyton Lee <peytolee@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-23 12:08:30 -04:00
Lijo Lazar	7b19f1f346	drm/amdgpu: Assign correct bits for SDMA HDP flush HDP Flush request bit can be kept unique per AID, and doesn't need to be unique SOC-wide. Assign only bits 10-13 for SDMA v4.4.2. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-23 12:08:30 -04:00
Stanley.Yang	939c475181	drm/amdgpu: Support setting reset_method at runtime In order to support more test cases, support user change reset_method at runtime. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-23 12:08:30 -04:00
Dave Airlie	377b5b397d	Merge tag 'amd-drm-next-6.10-2024-04-19' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.10-2024-04-19: amdgpu: - DC resource allocation logic updates - DC IPS fixes - DC YUV fixes - DMCUB fixes - DML2 fixes - Devcoredump updates - USB-C DSC fix - Misc display code cleanups - PSR fixes - MES timeout fix - RAS updates - UAF fix in VA IOCTL - Fix visible VRAM handling during faults - Fix IP discovery handling during PCI rescans - Misc code cleanups - PSP 14 updates - More runtime PM code rework - SMU 14.0.2 support - GPUVM page fault redirection to secondary IH rings for IH 6.x - Suspend/resume fixes - SR-IOV fixes amdkfd: - Fix eviction fence handling - Fix leak in GPU memory allocation failure case - DMABuf import handling fix radeon: - Silence UBSAN warnings related to flexible arrays Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240419224332.2938259-1-alexander.deucher@amd.com	2024-04-22 12:28:49 +10:00
Lang Yu	81bf14519a	drm/amdkfd: make sure VM is ready for updating operations When page table BOs were evicted but not validated before updating page tables, VM is still in evicting state, amdgpu_vm_update_range returns -EBUSY and restore_process_worker runs into a dead loop. v2: Split the BO validation and page table update into two separate loops in amdgpu_amdkfd_restore_process_bos. (Felix) 1.Validate BOs 2.Validate VM (and DMABuf attachments) 3.Update page tables for the BOs validated above Fixes: `50661eb1a2` ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs") Signed-off-by: Lang Yu <Lang.Yu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-18 23:54:49 -04:00
Mukul Joshi	e53a1713de	drm/amdgpu: Fix leak when GPU memory allocation fails Free the sync object if the memory allocation fails for any reason. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-18 23:54:49 -04:00
Zhigang Luo	6a009ca1bf	drm/amdgpu: remove virt_init_data_exchange from poison consumption handler Host will initiate an FLR for all poison consumption. Guest should wait for FLR message to re-init data exchange. Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-18 23:47:26 -04:00
Lijo Lazar	8954c3fbe7	drm/amdgpu: Change AID detection logic On GFX 9.4.3 SOCs, only 2 SDMA instances need to be available to be considered as a valid AID. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-18 23:47:19 -04:00
Sunil Khatri	93522c1948	drm/amdgpu: enable redirection of irq's for IH V6.1 Enable redirection of irq for pagefaults for specific clients to avoid overflow without dropping interrupts. So here we redirect the interrupts to another IH ring i.e ring1 where only these interrupts are processed. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-18 23:46:56 -04:00
Ahmad Rehman	ea137071ad	drm/amdgpu: Skip the coredump collection on reset during driver reload In a passthrough environment, the driver triggers a mode-1 reset on reload. The reset causes core dump collection, which is a delayed task and prevents the driver from unloading until it is completed. Since we do not need to collect data in the "reset on reload" case, we can skip core dump collection. v2: Use the same flag to avoid calling amdgpu_reset_reg_dumps as well. Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-18 23:46:45 -04:00
Sunil Khatri	ca0afa2f41	drm/amdgpu: enable redirection of irq's for IH V6.0 Enable redirection of irq for pagefaults for specific clients to avoid overflow without dropping interrupts. So here we redirect the interrupts to another IH ring i.e ring1 where only these interrupts are processed. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-18 23:46:37 -04:00
Sunil Khatri	5adcd78fa2	drm:amdgpu: enable IH ring1 for IH v6.1 We need IH ring1 for handling the pagefault interrupts which overflow the default ring for specific use cases. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-18 23:45:37 -04:00
Sunil Khatri	eefc85a277	drm:amdgpu: enable IH RB ring1 for IH v6.0 We need IH ring1 for handling the pagefault interrupts which are overflowing the default ring for specific usecases. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-18 23:45:22 -04:00
Dave Airlie	34633158b8	Merge tag 'amd-drm-next-6.10-2024-04-13' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.10-2024-04-13: amdgpu: - HDCP fixes - ODM fixes - RAS fixes - Devcoredump improvements - Misc code cleanups - Expose VCN activity via sysfs - SMU 13.0.x updates - Enable fast updates on DCN 3.1.4 - Add dclk and vclk reporting on additional devices - Add ACA RAS infrastructure - Implement TLB flush fence - EEPROM handling fixes - SMUIO 14.0.2 support - SMU 14.0.1 Updates - Sync page table freeing with TLB flushes - DML2 refactor - DC debug improvements - SR-IOV fixes - Suspend and Resume fixes - DCN 3.5.x Updates - Z8 fixes - UMSCH fixes - GPU reset fixes - HDP fix for second GFX pipe on GC 10.x - Enable secondary GFX pipe on GC 10.3 - Refactor and clean up BACO/BOCO/BAMACO handling - VCN partitioning fix - DC DWB fixes - VSC SDP fixes - DCN 3.1.6 fix - GC 11.5 fixes - Remove invalid TTM resource start check - DCN 1.0 fixes amdkfd: - MQD handling cleanup - Preemption handling fixes for XCDs - TLB flush fix for GC 9.4.2 - Properly clean up workqueue during module unload - Fix memory leak process create failure - Range check CP bad op exception targets to avoid reporting invalid exceptions to userspace radeon: - Misc code cleanups From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240413213708.3427038-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>	2024-04-17 15:48:59 +10:00
Kenneth Feng	0c1195ca0d	drm/amd/swsmu: support smu block discovery for smu v14 Support for smu ip block add for SMU v14. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 22:39:16 -04:00
Hawking Zhang	577cbed318	drm/amdgpu: rename DBG_DRV to HAD_DRV for psp v14 Add a psp bl command enum for HAD_DRV. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 22:39:16 -04:00
Ma Jun	1347853271	drm/amdgpu: refactoring the runtime pm mode detection code Refactor the runtime pm mode detection code to support both the amdgpu_runtime_pm=2 and amdgpu_runtime_pm=1 cases. Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 22:39:16 -04:00
Hawking Zhang	6c6acc5f33	drm/amdgpu: Load ipkeymgr drv for psp v14 while DBG_DRV is renamed to HAD_DRV for psp v14, part of its APIs/functionality is moved to a new component named Ipkeymgr_Drv. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 22:39:16 -04:00
Thorsten Blum	12b8b4e685	drm/amdgpu: Add missing space to DRM_WARN() message s/,please/, please/ Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 22:39:15 -04:00
Ma Jun	959056982a	drm/amdgpu: Fix discovery initialization failure during pci rescan Waiting for system ready to fix the discovery initialization failure issue. This failure usually occurs when dGPU is removed and then rescanned via command line. It's caused by following two errors: [1] vram size is 0 [2] wrong binary signature Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 22:39:15 -04:00
Christian König	394ae0603a	drm/amdgpu: fix visible VRAM handling during faults When we removed the hacky start code check we actually didn't take into account that all VRAM pages need to be CPU accessible. Clean up the code and unify the handling into a single helper which checks if the whole resource is CPU accessible. The only place where a partial check would make sense is during eviction, but that is negligible. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `aed01a6804` ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-04-16 22:39:15 -04:00
xinhui pan	98856136c4	drm/amdgpu: validate the parameters of bo mapping operations more clearly Verify the parameters of amdgpu_vm_bo_(map/replace_map/clearing_mappings) in one common place. Fixes: `dc54d3d174` ("drm/amdgpu: implement AMDGPU_VA_OP_CLEAR v2") Cc: stable@vger.kernel.org Reported-by: Vlad Stolyarov <hexed@google.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 22:39:15 -04:00
Yang Wang	f23558627f	drm/amdgpu: add new aca smu callback func parse_error_code() Add a new aca smu callback parse_error_code() to avoid ASIC-specific checks in the amdgpu_aca.c file. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 22:39:15 -04:00
Jonathan Kim	f7c161a4c2	drm/amdgpu: increase mes submission timeout MES internally has a timeout allowance of 2 seconds. Increase driver timeout to 3 seconds to be safe. Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 22:38:59 -04:00
Sunil Khatri	3c858cf65e	drm/amdgpu: add missing vbios version from devcoredump Add vbios version in the devcoredump along with formatting the information with proper alignment. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 21:25:28 -04:00
Alex Deucher	8b9130bae0	drm/amdgpu/gfx11: properly handle regGRBM_GFX_CNTL in soft reset Need to take the srbm_mutex and while we are here, use the helper function soc21_grbm_select(); Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 21:25:23 -04:00
Christian König	c8962679af	drm/amdgpu: remove invalid resource->start check v2 The majority of those were removed in the commit `aed01a6804` ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2"), but this one was missed because it's working on the resource and not the BO. Since we also no longer use a fake start address for visible BOs this will now trigger invalid mapping errors. v2: also remove the unused variable Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `aed01a6804` ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2") CC: stable@vger.kernel.org Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-12 00:33:51 -04:00
Jack Xiao	a0e002cdac	drm/amdgpu/sdma6: set sdma hang watchdog Set SDMAx_WATCHDOG_CNTL.QUEUE_HANG_COUNT registers to improve SDMA reliability. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-12 00:33:43 -04:00
Luqmaan Irshad	6b0d78032f	drm/amd/amdgpu: Update PF2VF Header Adding a new field for GPU Capacity to align the header with the host. Signed-off-by: Luqmaan Irshad <Luqmaan.Irshad@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-12 00:33:11 -04:00
Yifan Zhang	526b184e88	drm/amdgpu: differentiate external rev id for gfx 11.5.0 Differentiate the external rev id for gfx 11.5.0. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:20:01 -04:00
Tim Huang	d6d6561f93	drm/amdgpu: fix incorrect number of active RBs for gfx11 The RB bitmap should be global active RB bitmap & active RB bitmap based on active SA. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:14:55 -04:00
Zhigang Luo	d1999b4017	amd/amdgpu: improve VF recover time 1. change AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT from 30 to 5. 2. set fatal error detected flag. Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:14:30 -04:00
Lijo Lazar	b41f742d6f	drm/amdgpu: Set fatal errror detected flag earlier In case of fatal errors, set FED status when interrupt is received. Set the flag on other devices in the hive before RAS recovery work. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:13:36 -04:00
ZhenGuo Yin	05e4014168	drm/amdgpu: clear set_q_mode_offs when VM changed [Why] set_q_mode_offs doesn't get cleared after GPU reset, so the next SET_Q_MODE packet to init shadow memory will be skipped, which causes a page fault. [How] A VM flush is needed after GPU reset, so clear set_q_mode_offs when emitting the VM flush. Fixes: `8bc75586ea` ("drm/amdgpu: workaround to avoid SET_Q_MODE packets v2") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:09:21 -04:00
Tao Zhou	4b0cb230bd	drm/amdgpu: retire UMC v12 mca_addr_to_pa RAS TA will handle it, the function is useless. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:09:15 -04:00
chongli2	f6ac084236	drm/amd/amdgpu: support MES command SET_HW_RESOURCE1 in sriov support MES command SET_HW_RESOURCE1 in sriov Signed-off-by: chongli2 <chongli2@amd.com> Reviewed-by: Jingwen Chen <Jingwen.Chen2@amd.com> Acked-by: Jingwen Chen <Jingwen.Chen2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:08:53 -04:00
Tao Zhou	9ecef5b2d0	drm/amdgpu: update check condition for XGMI ACA UE Check more possible ext error codes. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:08:47 -04:00
Lijo Lazar	91bc860116	drm/amdgpu: Fix VCN allocation in CPX partition VCN need not be shared in CPX mode always for all GFX 9.4.3 SOC SKUs. In certain configs, VCN instance can be exclusively allocated to a partition even under CPX mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:08:34 -04:00
Ma Jun	fcc0735b00	drm/amdgpu: Add support for BAMACO mode checking Optimize the code to add support for BAMACO mode checking Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:08:18 -04:00
Hawking Zhang	327eec5427	drm/amdgpu: Bypass asd if display hw is not available ASD is not needed by headless GPU. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:08:05 -04:00
Ma Jun	b2207dc698	drm/amdgpu/pm: Add support for MACO flag checking Add support for MACO flag checking. MACO mode only works if BACO is supported. Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:07:59 -04:00
Kenneth Feng	7c1d9e10e6	drm/amd/pm: fix the high voltage issue after unload fix the high voltage issue after unload on smu 13.0.10 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:07:30 -04:00
Danijel Slivka	3e2dacca54	drm/amdgpu: use vm_update_mode=0 as default in sriov for gfx10.3 onwards Apply this rule to all newer asics in sriov case. For asic with VF MMIO access protection avoid using CPU for VM table updates. CPU pagetable updates have issues with HDP flush as VF MMIO access protection blocks write to BIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register during sriov runtime. Moved the check to amdgpu_device_init() to ensure it is done after amdgpu_device_ip_early_init() where the IP versions are discovered. Signed-off-by: Danijel Slivka <danijel.slivka@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:02:37 -04:00
Arunpravin Paneer Selvam	b7a1a0ef12	drm/amd/amdgpu: add pipe1 hardware support Enable pipe1 support starting from SIENNA CICHLID asic Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2117 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:02:24 -04:00
ZhenGuo Yin	0453e5f220	drm/amdgpu: select HDP ref/mask according to gfx ring pipe Use the correct ref/mask for each gfx ring pipe. This should fix the gfx hang issue after enabling gfx pipe1. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2117 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 21:59:44 -04:00
Tao Zhou	029faefb73	drm/amdgpu: implement IRQ_STATE_ENABLE for SDMA v4.4.2 SDMA_CNTL is not set in some cases, driver configures it by itself. v2: simplify code Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 21:59:20 -04:00
Yifan Zhang	f5a3507c4a	drm/amdgpu: add smu 14.0.1 discovery support Add smu 14.0.1 support. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 21:51:03 -04:00
shaoyunl	8966c31674	drm/amdgpu : Increase the mes log buffer size as per new MES FW version From MES version 0x54, the log entry size increased, which requires a larger log buffer. 16k is the agreed maximum size. Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 21:50:48 -04:00
shaoyunl	e58acb7613	drm/amdgpu : Add mes_log_enable to control mes log feature MES logging may reduce performance because of the extra step of logging the data, so disable it by default and introduce a parameter to enable it when necessary. Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 21:50:27 -04:00
Yang Wang	81d96e8b5a	drm/amdgpu: refine function signature of amdgpu_aca_get_error_data() refine function signature of amdgpu_aca_get_error_data(); Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 21:50:09 -04:00
Lijo Lazar	df3c7dc5c5	drm/amdgpu: Reset dGPU if suspend got aborted For SOC21 ASICs, there is an issue in re-enabling PM features if a suspend got aborted. In such cases, reset the device during resume phase. This is a workaround till a proper solution is finalized. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-04-09 21:49:42 -04:00
Sunil Khatri	6a0e1bafd7	drm/amdgpu: add IP's FW information to devcoredump Add FW information of all the IP's in the devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 21:49:34 -04:00
Lang Yu	108ab31be9	drm/amdgpu/umsch: reinitialize write pointer in hw init Otherwise the old one will be used during GPU reset. That's not expected. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 21:49:19 -04:00
Lijo Lazar	cd409dbc69	drm/amdgpu: Refine IB schedule error logging Downgrade to debug information when IBs are skipped. Also, use dev_* to identify the device. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 21:49:09 -04:00
Dave Airlie	fee54d08bc	Merge tag 'drm-misc-next-2024-03-28' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next Two misc-next in one. drm-misc-next for v6.10-rc1: The deal of a lifetime! You get ALL of the previous drm-misc-next-2024-03-21-1 tag!! But WAIT, there's MORE! Cross-subsystem Changes: - Assorted DT binding updates. Core Changes: - Clarify how optional wait_hpd_asserted is. - Shuffle Kconfig names around. Driver Changes: - Assorted build fixes for panthor, imagination, - Add AUO B120XAN01.0 panels. - Assorted small fixes to panthor, panfrost. drm-misc-next for v6.10: UAPI Changes: - Move some nouveau magic constants to uapi. Cross-subsystem Changes: - Move drm-misc to gitlab and freedesktop hosting. - Add entries for panfrost. Core Changes: - Improve placement for TTM bo's in idle/busy handling. - Improve drm/bridge init ordering. - Add CONFIG_DRM_WERROR, and use W=1 for drm. - Assorted documentation updates. - Make more (drm and driver) headers self-contained and add header guards. - Grab reservation lock in pin/unpin callbacks. - Fix reservation lock handling for vmap. - Add edp and edid panel matching, use it to fix a nearly identical panel. Driver Changes: - Add drm/panthor driver and assorted fixes. - Assorted small fixes to xlnx, panel-edp, tidss, ci, nouveau, panel and bridge drivers. - Add Samsung s6e3fa7, BOE NT116WHM-N44, CMN N116BCA-EA1, CrystalClear CMT430B19N00, Startek KD050HDFIA020-C020A, powertip PH128800T006-ZHC01 panels. - Fix console for omapdrm. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/bea310a6-6ff6-477e-9363-f9f053cfd12a@linux.intel.com	2024-04-05 13:16:17 +10:00
Maxime Ripard	f6d2dc03fa	drm: Switch DRM_DISPLAY_HDMI_HELPER to depends on Most of our helpers have relied on being selected so far through Kconfig, but that creates issues when we have multiple layers of helpers with some depending on others. Indeed, select doesn't select a dependency's dependencies, and thus isn't super intuitive. Depends on however doesn't have that limitation, so we can just switch all the drivers that were selecting DRM_DISPLAY_HDMI_HELPER to depend on it. Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20240327-kms-kconfig-helpers-v3-12-eafee11b84b3@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-03-28 11:26:52 +01:00
Maxime Ripard	3166e7e6d9	drm: Switch DRM_DISPLAY_HDCP_HELPER to depends on Most of our helpers have relied on being selected so far through Kconfig, but that creates issues when we have multiple layers of helpers with some depending on others. Indeed, select doesn't select a dependency's dependencies, and thus isn't super intuitive. Depends on however doesn't have that limitation, so we can just switch all the drivers that were selecting DRM_DISPLAY_HDCP_HELPER to depend on it. Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20240327-kms-kconfig-helpers-v3-11-eafee11b84b3@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-03-28 11:26:51 +01:00
Maxime Ripard	0323287de8	drm: Switch DRM_DISPLAY_DP_HELPER to depends on Most of our helpers have relied on being selected so far through Kconfig, but that creates issues when we have multiple layers of helpers with some depending on others. Indeed, select doesn't select a dependency's dependencies, and thus isn't super intuitive. Depends on however doesn't have that limitation, so we can just switch all the drivers that were selecting DRM_DISPLAY_DP_HELPER to depend on it. Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20240327-kms-kconfig-helpers-v3-10-eafee11b84b3@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-03-28 11:26:51 +01:00
Maxime Ripard	e075e496f5	drm: Switch DRM_DISPLAY_HELPER to depends on Most of our helpers have relied on being selected so far through Kconfig, but that creates issues when we have multiple layers of helpers with some depending on others. Indeed, select doesn't select a dependency's dependencies, and thus isn't super intuitive. Depends on however doesn't have that limitation, so we can just switch all the drivers that were selecting DRM_DISPLAY_HELPER to depend on it. Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20240327-kms-kconfig-helpers-v3-8-eafee11b84b3@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-03-28 11:26:49 +01:00
Johannes Weiner	8678b1060a	drm/amdgpu: fix deadlock while reading mqd from debugfs An errant disk backup on my desktop got into debugfs and triggered the following deadlock scenario in the amdgpu debugfs files. The machine also hard-resets immediately after those lines are printed (although I wasn't able to reproduce that part when reading by hand): [ 1318.016074][ T1082] ====================================================== [ 1318.016607][ T1082] WARNING: possible circular locking dependency detected [ 1318.017107][ T1082] 6.8.0-rc7-00015-ge0c8221b72c0 #17 Not tainted [ 1318.017598][ T1082] ------------------------------------------------------ [ 1318.018096][ T1082] tar/1082 is trying to acquire lock: [ 1318.018585][ T1082] ffff98c44175d6a0 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x40/0x80 [ 1318.019084][ T1082] [ 1318.019084][ T1082] but task is already holding lock: [ 1318.020052][ T1082] ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu] [ 1318.020607][ T1082] [ 1318.020607][ T1082] which lock already depends on the new lock. 
[ 1318.020607][ T1082] [ 1318.022081][ T1082] [ 1318.022081][ T1082] the existing dependency chain (in reverse order) is: [ 1318.023083][ T1082] [ 1318.023083][ T1082] -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}: [ 1318.024114][ T1082] __ww_mutex_lock.constprop.0+0xe0/0x12f0 [ 1318.024639][ T1082] ww_mutex_lock+0x32/0x90 [ 1318.025161][ T1082] dma_resv_lockdep+0x18a/0x330 [ 1318.025683][ T1082] do_one_initcall+0x6a/0x350 [ 1318.026210][ T1082] kernel_init_freeable+0x1a3/0x310 [ 1318.026728][ T1082] kernel_init+0x15/0x1a0 [ 1318.027242][ T1082] ret_from_fork+0x2c/0x40 [ 1318.027759][ T1082] ret_from_fork_asm+0x11/0x20 [ 1318.028281][ T1082] [ 1318.028281][ T1082] -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}: [ 1318.029297][ T1082] dma_resv_lockdep+0x16c/0x330 [ 1318.029790][ T1082] do_one_initcall+0x6a/0x350 [ 1318.030263][ T1082] kernel_init_freeable+0x1a3/0x310 [ 1318.030722][ T1082] kernel_init+0x15/0x1a0 [ 1318.031168][ T1082] ret_from_fork+0x2c/0x40 [ 1318.031598][ T1082] ret_from_fork_asm+0x11/0x20 [ 1318.032011][ T1082] [ 1318.032011][ T1082] -> #0 (&mm->mmap_lock){++++}-{3:3}: [ 1318.032778][ T1082] __lock_acquire+0x14bf/0x2680 [ 1318.033141][ T1082] lock_acquire+0xcd/0x2c0 [ 1318.033487][ T1082] __might_fault+0x58/0x80 [ 1318.033814][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu] [ 1318.034181][ T1082] full_proxy_read+0x55/0x80 [ 1318.034487][ T1082] vfs_read+0xa7/0x360 [ 1318.034788][ T1082] ksys_read+0x70/0xf0 [ 1318.035085][ T1082] do_syscall_64+0x94/0x180 [ 1318.035375][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e [ 1318.035664][ T1082] [ 1318.035664][ T1082] other info that might help us debug this: [ 1318.035664][ T1082] [ 1318.036487][ T1082] Chain exists of: [ 1318.036487][ T1082] &mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex [ 1318.036487][ T1082] [ 1318.037310][ T1082] Possible unsafe locking scenario: [ 1318.037310][ T1082] [ 1318.037838][ T1082] CPU0 CPU1 [ 1318.038101][ T1082] ---- ---- [ 
1318.038350][ T1082] lock(reservation_ww_class_mutex); [ 1318.038590][ T1082] lock(reservation_ww_class_acquire); [ 1318.038839][ T1082] lock(reservation_ww_class_mutex); [ 1318.039083][ T1082] rlock(&mm->mmap_lock); [ 1318.039328][ T1082] [ 1318.039328][ T1082] * DEADLOCK * [ 1318.039328][ T1082] [ 1318.040029][ T1082] 1 lock held by tar/1082: [ 1318.040259][ T1082] #0: ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu] [ 1318.040560][ T1082] [ 1318.040560][ T1082] stack backtrace: [ 1318.041053][ T1082] CPU: 22 PID: 1082 Comm: tar Not tainted 6.8.0-rc7-00015-ge0c8221b72c0 #17 3316c85d50e282c5643b075d1f01a4f6365e39c2 [ 1318.041329][ T1082] Hardware name: Gigabyte Technology Co., Ltd. B650 AORUS PRO AX/B650 AORUS PRO AX, BIOS F20 12/14/2023 [ 1318.041614][ T1082] Call Trace: [ 1318.041895][ T1082] <TASK> [ 1318.042175][ T1082] dump_stack_lvl+0x4a/0x80 [ 1318.042460][ T1082] check_noncircular+0x145/0x160 [ 1318.042743][ T1082] __lock_acquire+0x14bf/0x2680 [ 1318.043022][ T1082] lock_acquire+0xcd/0x2c0 [ 1318.043301][ T1082] ? __might_fault+0x40/0x80 [ 1318.043580][ T1082] ? __might_fault+0x40/0x80 [ 1318.043856][ T1082] __might_fault+0x58/0x80 [ 1318.044131][ T1082] ? __might_fault+0x40/0x80 [ 1318.044408][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu 8fe2afaa910cbd7654c8cab23563a94d6caebaab] [ 1318.044749][ T1082] full_proxy_read+0x55/0x80 [ 1318.045042][ T1082] vfs_read+0xa7/0x360 [ 1318.045333][ T1082] ksys_read+0x70/0xf0 [ 1318.045623][ T1082] do_syscall_64+0x94/0x180 [ 1318.045913][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.046201][ T1082] ? lockdep_hardirqs_on+0x7d/0x100 [ 1318.046487][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.046773][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047057][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047337][ T1082] ? 
do_syscall_64+0xa0/0x180 [ 1318.047611][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e [ 1318.047887][ T1082] RIP: 0033:0x7f480b70a39d [ 1318.048162][ T1082] Code: 91 ba 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb b2 e8 18 a3 01 00 0f 1f 84 00 00 00 00 00 80 3d a9 3c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83 [ 1318.048769][ T1082] RSP: 002b:00007ffde77f5c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 1318.049083][ T1082] RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f480b70a39d [ 1318.049392][ T1082] RDX: 0000000000000800 RSI: 000055c9f2120c00 RDI: 0000000000000008 [ 1318.049703][ T1082] RBP: 0000000000000800 R08: 000055c9f2120a94 R09: 0000000000000007 [ 1318.050011][ T1082] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c9f2120c00 [ 1318.050324][ T1082] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000000800 [ 1318.050638][ T1082] </TASK> amdgpu_debugfs_mqd_read() holds a reservation when it calls put_user(), which may fault and acquire the mmap_sem. This violates the established locking order. Bounce the mqd data through a kernel buffer to get put_user() out of the illegal section. Fixes: `445d85e3c1` ("drm/amdgpu: add debugfs interface for reading MQDs") Cc: stable@vger.kernel.org # v6.5+ Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-27 09:30:34 -04:00
Lang Yu	68a2afbcca	drm/amdgpu: enable UMSCH 4.0.6 Share same codes with 4.0.5 and enable collaborate mode for VPE. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-27 09:30:05 -04:00
Lang Yu	6b154c00cd	drm/amdgpu/umsch: update UMSCH 4.0 FW interface Align with FW changes. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-27 09:29:42 -04:00
Mario Limonciello	ca299b4512	drm/amd: Flush GFXOFF requests in prepare stage If the system hasn't entered GFXOFF when suspend starts it can cause hangs accessing GC and RLC during the suspend stage. Cc: <stable@vger.kernel.org> # 6.1.y: `5095d54181` ("drm/amd: Evict resources during PM ops prepare() callback") Cc: <stable@vger.kernel.org> # 6.1.y: `cb11ca3233` ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks") Cc: <stable@vger.kernel.org> # 6.1.y: `2ceec37b0e` ("drm/amd: Add missing kernel doc for prepare_suspend()") Cc: <stable@vger.kernel.org> # 6.1.y: `3a9626c816` ("drm/amd: Stop evicting resources on APUs in suspend") Cc: <stable@vger.kernel.org> # 6.6.y: `5095d54181` ("drm/amd: Evict resources during PM ops prepare() callback") Cc: <stable@vger.kernel.org> # 6.6.y: `cb11ca3233` ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks") Cc: <stable@vger.kernel.org> # 6.6.y: `2ceec37b0e` ("drm/amd: Add missing kernel doc for prepare_suspend()") Cc: <stable@vger.kernel.org> # 6.6.y: `3a9626c816` ("drm/amd: Stop evicting resources on APUs in suspend") Cc: <stable@vger.kernel.org> # 6.1+ Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132 Fixes: `ab4750332d` ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-27 08:55:54 -04:00
Peyton Lee	eed14eb48e	drm/amdgpu/vpe: power on vpe when hw_init To fix mode2 reset failure. Should power on VPE when hw_init. Signed-off-by: Peyton Lee <peytolee@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-27 08:50:20 -04:00
Alex Deucher	d7f1487643	drm/amdgpu: always force full reset for SOC21 There are cases where soft reset seems to succeed, but does not, so always use mode1/2 for now. Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-27 01:45:22 -04:00
Johannes Weiner	c25d09bcb7	drm/amdgpu: fix deadlock while reading mqd from debugfs An errant disk backup on my desktop got into debugfs and triggered the following deadlock scenario in the amdgpu debugfs files. The machine also hard-resets immediately after those lines are printed (although I wasn't able to reproduce that part when reading by hand): [ 1318.016074][ T1082] ====================================================== [ 1318.016607][ T1082] WARNING: possible circular locking dependency detected [ 1318.017107][ T1082] 6.8.0-rc7-00015-ge0c8221b72c0 #17 Not tainted [ 1318.017598][ T1082] ------------------------------------------------------ [ 1318.018096][ T1082] tar/1082 is trying to acquire lock: [ 1318.018585][ T1082] ffff98c44175d6a0 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x40/0x80 [ 1318.019084][ T1082] [ 1318.019084][ T1082] but task is already holding lock: [ 1318.020052][ T1082] ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu] [ 1318.020607][ T1082] [ 1318.020607][ T1082] which lock already depends on the new lock. 
[ 1318.020607][ T1082] [ 1318.022081][ T1082] [ 1318.022081][ T1082] the existing dependency chain (in reverse order) is: [ 1318.023083][ T1082] [ 1318.023083][ T1082] -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}: [ 1318.024114][ T1082] __ww_mutex_lock.constprop.0+0xe0/0x12f0 [ 1318.024639][ T1082] ww_mutex_lock+0x32/0x90 [ 1318.025161][ T1082] dma_resv_lockdep+0x18a/0x330 [ 1318.025683][ T1082] do_one_initcall+0x6a/0x350 [ 1318.026210][ T1082] kernel_init_freeable+0x1a3/0x310 [ 1318.026728][ T1082] kernel_init+0x15/0x1a0 [ 1318.027242][ T1082] ret_from_fork+0x2c/0x40 [ 1318.027759][ T1082] ret_from_fork_asm+0x11/0x20 [ 1318.028281][ T1082] [ 1318.028281][ T1082] -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}: [ 1318.029297][ T1082] dma_resv_lockdep+0x16c/0x330 [ 1318.029790][ T1082] do_one_initcall+0x6a/0x350 [ 1318.030263][ T1082] kernel_init_freeable+0x1a3/0x310 [ 1318.030722][ T1082] kernel_init+0x15/0x1a0 [ 1318.031168][ T1082] ret_from_fork+0x2c/0x40 [ 1318.031598][ T1082] ret_from_fork_asm+0x11/0x20 [ 1318.032011][ T1082] [ 1318.032011][ T1082] -> #0 (&mm->mmap_lock){++++}-{3:3}: [ 1318.032778][ T1082] __lock_acquire+0x14bf/0x2680 [ 1318.033141][ T1082] lock_acquire+0xcd/0x2c0 [ 1318.033487][ T1082] __might_fault+0x58/0x80 [ 1318.033814][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu] [ 1318.034181][ T1082] full_proxy_read+0x55/0x80 [ 1318.034487][ T1082] vfs_read+0xa7/0x360 [ 1318.034788][ T1082] ksys_read+0x70/0xf0 [ 1318.035085][ T1082] do_syscall_64+0x94/0x180 [ 1318.035375][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e [ 1318.035664][ T1082] [ 1318.035664][ T1082] other info that might help us debug this: [ 1318.035664][ T1082] [ 1318.036487][ T1082] Chain exists of: [ 1318.036487][ T1082] &mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex [ 1318.036487][ T1082] [ 1318.037310][ T1082] Possible unsafe locking scenario: [ 1318.037310][ T1082] [ 1318.037838][ T1082] CPU0 CPU1 [ 1318.038101][ T1082] ---- ---- [ 
1318.038350][ T1082] lock(reservation_ww_class_mutex); [ 1318.038590][ T1082] lock(reservation_ww_class_acquire); [ 1318.038839][ T1082] lock(reservation_ww_class_mutex); [ 1318.039083][ T1082] rlock(&mm->mmap_lock); [ 1318.039328][ T1082] [ 1318.039328][ T1082] * DEADLOCK * [ 1318.039328][ T1082] [ 1318.040029][ T1082] 1 lock held by tar/1082: [ 1318.040259][ T1082] #0: ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu] [ 1318.040560][ T1082] [ 1318.040560][ T1082] stack backtrace: [ 1318.041053][ T1082] CPU: 22 PID: 1082 Comm: tar Not tainted 6.8.0-rc7-00015-ge0c8221b72c0 #17 3316c85d50e282c5643b075d1f01a4f6365e39c2 [ 1318.041329][ T1082] Hardware name: Gigabyte Technology Co., Ltd. B650 AORUS PRO AX/B650 AORUS PRO AX, BIOS F20 12/14/2023 [ 1318.041614][ T1082] Call Trace: [ 1318.041895][ T1082] <TASK> [ 1318.042175][ T1082] dump_stack_lvl+0x4a/0x80 [ 1318.042460][ T1082] check_noncircular+0x145/0x160 [ 1318.042743][ T1082] __lock_acquire+0x14bf/0x2680 [ 1318.043022][ T1082] lock_acquire+0xcd/0x2c0 [ 1318.043301][ T1082] ? __might_fault+0x40/0x80 [ 1318.043580][ T1082] ? __might_fault+0x40/0x80 [ 1318.043856][ T1082] __might_fault+0x58/0x80 [ 1318.044131][ T1082] ? __might_fault+0x40/0x80 [ 1318.044408][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu 8fe2afaa910cbd7654c8cab23563a94d6caebaab] [ 1318.044749][ T1082] full_proxy_read+0x55/0x80 [ 1318.045042][ T1082] vfs_read+0xa7/0x360 [ 1318.045333][ T1082] ksys_read+0x70/0xf0 [ 1318.045623][ T1082] do_syscall_64+0x94/0x180 [ 1318.045913][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.046201][ T1082] ? lockdep_hardirqs_on+0x7d/0x100 [ 1318.046487][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.046773][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047057][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047337][ T1082] ? 
do_syscall_64+0xa0/0x180 [ 1318.047611][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e [ 1318.047887][ T1082] RIP: 0033:0x7f480b70a39d [ 1318.048162][ T1082] Code: 91 ba 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb b2 e8 18 a3 01 00 0f 1f 84 00 00 00 00 00 80 3d a9 3c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83 [ 1318.048769][ T1082] RSP: 002b:00007ffde77f5c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 1318.049083][ T1082] RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f480b70a39d [ 1318.049392][ T1082] RDX: 0000000000000800 RSI: 000055c9f2120c00 RDI: 0000000000000008 [ 1318.049703][ T1082] RBP: 0000000000000800 R08: 000055c9f2120a94 R09: 0000000000000007 [ 1318.050011][ T1082] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c9f2120c00 [ 1318.050324][ T1082] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000000800 [ 1318.050638][ T1082] </TASK> amdgpu_debugfs_mqd_read() holds a reservation when it calls put_user(), which may fault and acquire the mmap_sem. This violates the established locking order. Bounce the mqd data through a kernel buffer to get put_user() out of the illegal section. Fixes: `445d85e3c1` ("drm/amdgpu: add debugfs interface for reading MQDs") Cc: stable@vger.kernel.org # v6.5+ Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-27 01:44:24 -04:00
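The "bounce through a kernel buffer" idea in the commit above can be sketched in userspace. This is only an analogy under stated assumptions, not the actual patch: a pthread mutex stands in for the reservation lock, the final `memcpy()` stands in for the user copy that may fault, and `struct guarded_data` and `guarded_read()` are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for data protected by a lock (the reservation in the commit). */
struct guarded_data {
	pthread_mutex_t lock;
	unsigned char payload[64];
	size_t len;
};

/*
 * Copy at most `size` bytes of the payload to `dst`.
 * The copy into `dst` plays the role of put_user(): it may need other
 * locks, so it must only run after our own lock has been dropped.
 * Returns the number of bytes copied, or -1 on allocation failure.
 */
static long guarded_read(struct guarded_data *g, unsigned char *dst, size_t size)
{
	unsigned char *bounce;
	size_t n;

	assert(g && dst);

	/* Allocate the bounce buffer before taking the lock. */
	bounce = malloc(sizeof(g->payload));
	if (!bounce)
		return -1;

	pthread_mutex_lock(&g->lock);
	n = g->len < size ? g->len : size;
	memcpy(bounce, g->payload, n);	/* plain memory copy inside the locked section */
	pthread_mutex_unlock(&g->lock);

	/* Stand-in for the user copy: runs with no lock held. */
	memcpy(dst, bounce, n);
	free(bounce);
	return (long)n;
}
```

The design point is ordering: everything that might acquire another lock happens either before the lock is taken (the allocation) or after it is released (the final copy), so the locked section cannot contribute to a lock-order cycle.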
Lang Yu	b9a8aee136	drm/amdgpu: enable UMSCH 4.0.6 Share same codes with 4.0.5 and enable collaborate mode for VPE. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-27 01:44:17 -04:00
Lang Yu	f3e698978c	drm/amdgpu/umsch: update UMSCH 4.0 FW interface Align with FW changes. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-27 01:44:07 -04:00
Mario Limonciello	0355b24bde	drm/amd: Flush GFXOFF requests in prepare stage If the system hasn't entered GFXOFF when suspend starts it can cause hangs accessing GC and RLC during the suspend stage. Cc: <stable@vger.kernel.org> # 6.1.y: `5095d54181` ("drm/amd: Evict resources during PM ops prepare() callback") Cc: <stable@vger.kernel.org> # 6.1.y: `cb11ca3233` ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks") Cc: <stable@vger.kernel.org> # 6.1.y: `2ceec37b0e` ("drm/amd: Add missing kernel doc for prepare_suspend()") Cc: <stable@vger.kernel.org> # 6.1.y: `3a9626c816` ("drm/amd: Stop evicting resources on APUs in suspend") Cc: <stable@vger.kernel.org> # 6.6.y: `5095d54181` ("drm/amd: Evict resources during PM ops prepare() callback") Cc: <stable@vger.kernel.org> # 6.6.y: `cb11ca3233` ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks") Cc: <stable@vger.kernel.org> # 6.6.y: `2ceec37b0e` ("drm/amd: Add missing kernel doc for prepare_suspend()") Cc: <stable@vger.kernel.org> # 6.6.y: `3a9626c816` ("drm/amd: Stop evicting resources on APUs in suspend") Cc: <stable@vger.kernel.org> # 6.1+ Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132 Fixes: `ab4750332d` ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:57:04 -04:00
Srinivasan Shanmugam	eb4f6eca26	drm/amdgpu: Fix truncations in gfx_v11_0_init_microcode() Reducing the size of ucode_prefix to 25 in the gfx_v11_0_init_microcode function. This would ensure that the total number of characters being written into fw_name does not exceed its size of 40. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c: In function ‘gfx_v11_0_early_init’: drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:523:54: warning: ‘_pfp.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 523 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", ucode_prefix); \| ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:523:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 523 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:540:54: warning: ‘_me.bin’ directive output may be truncated writing 7 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 540 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", ucode_prefix); \| ^~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:540:9: note: ‘snprintf’ output between 15 and 44 bytes into a destination of size 40 540 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:557:70: warning: ‘_rlc.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 557 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix); \| ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:557:25: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 557 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix); \| 
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:569:54: warning: ‘_mec.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 569 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", ucode_prefix); \| ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:569:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 569 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ CC [M] drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/smu7_clockpowergating.o Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:56:57 -04:00
Tao Zhou	8e4617c25d	drm/amdgpu: simplify convert_error_address interface for UMC v12 Replace separate parameters with struct ta_ras_query_address_input. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:56:18 -04:00
Srinivasan Shanmugam	539ff12ee5	drm/amdgpu: Fix truncation issues in gfx_v9_0.c The size of fw_name is increased to ensure that it can accommodate the maximum possible size of the string being written into it. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c: In function ‘gfx_v9_0_early_init’: drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1255:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1255 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", chip_name); \| ^~ ...... 1393 \| r = gfx_v9_0_init_cp_gfx_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1255:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30 1255 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1261:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1261 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", chip_name); \| ^~ ...... 1393 \| r = gfx_v9_0_init_cp_gfx_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1261:9: note: ‘snprintf’ output between 15 and 44 bytes into a destination of size 30 1261 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1267:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1267 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_ce.bin", chip_name); \| ^~ ...... 
1393 \| r = gfx_v9_0_init_cp_gfx_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1267:9: note: ‘snprintf’ output between 15 and 44 bytes into a destination of size 30 1267 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_ce.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1303:60: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1303 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc_am4.bin", chip_name); \| ^~ ...... 1398 \| r = gfx_v9_0_init_rlc_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1303:17: note: ‘snprintf’ output between 20 and 49 bytes into a destination of size 30 1303 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc_am4.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1309:60: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1309 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_kicker_rlc.bin", chip_name); \| ^~ ...... 1398 \| r = gfx_v9_0_init_rlc_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1309:17: note: ‘snprintf’ output between 23 and 52 bytes into a destination of size 30 1309 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_kicker_rlc.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1311:60: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1311 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name); \| ^~ ...... 
1398 \| r = gfx_v9_0_init_rlc_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1311:17: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30 1311 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1344:60: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1344 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_sjt_mec.bin", chip_name); \| ^~ ...... 1402 \| r = gfx_v9_0_init_cp_compute_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1344:17: note: ‘snprintf’ output between 20 and 49 bytes into a destination of size 30 1344 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_sjt_mec.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1346:60: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1346 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name); \| ^~ ...... 1402 \| r = gfx_v9_0_init_cp_compute_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1346:17: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30 1346 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1356:68: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1356 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_sjt_mec2.bin", chip_name); \| ^~ ...... 
1402 \| r = gfx_v9_0_init_cp_compute_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1356:25: note: ‘snprintf’ output between 21 and 50 bytes into a destination of size 30 1356 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_sjt_mec2.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1358:68: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1358 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec2.bin", chip_name); \| ^~ ...... 1402 \| r = gfx_v9_0_init_cp_compute_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1358:25: note: ‘snprintf’ output between 17 and 46 bytes into a destination of size 30 1358 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec2.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:56:10 -04:00
Srinivasan Shanmugam	927a8a800e	drm/amdgpu: Fix truncation in gfx_v10_0_init_microcode The total size of the fw_name buffer is 8 (for "amdgpu/") + 30 (for ucode_prefix) + 5 (for "_pfp") + 5 (for "_wks") + 5 (for ".bin") = 53 characters. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c: In function ‘gfx_v10_0_early_init’: drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:3982:58: warning: ‘%s’ directive output may be truncated writing up to 4 bytes into a region of size between 0 and 29 [-Wformat-truncation=] 3982 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp%s.bin", ucode_prefix, wks); \| ^~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:3982:9: note: ‘snprintf’ output between 16 and 49 bytes into a destination of size 40 3982 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp%s.bin", ucode_prefix, wks); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:3988:57: warning: ‘%s’ directive output may be truncated writing up to 4 bytes into a region of size between 1 and 30 [-Wformat-truncation=] 3988 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me%s.bin", ucode_prefix, wks); \| ^~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:3988:9: note: ‘snprintf’ output between 15 and 48 bytes into a destination of size 40 3988 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me%s.bin", ucode_prefix, wks); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:3994:57: warning: ‘%s’ directive output may be truncated writing up to 4 bytes into a region of size between 1 and 30 [-Wformat-truncation=] 3994 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_ce%s.bin", ucode_prefix, wks); \| ^~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:3994:9: note: ‘snprintf’ output between 15 and 48 bytes into a destination of size 40 3994 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_ce%s.bin", ucode_prefix, wks); \| 
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4001:62: warning: ‘_rlc.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 4001 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix); \| ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4001:17: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 4001 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4017:58: warning: ‘%s’ directive output may be truncated writing up to 4 bytes into a region of size between 0 and 29 [-Wformat-truncation=] 4017 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec%s.bin", ucode_prefix, wks); \| ^~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4017:9: note: ‘snprintf’ output between 16 and 49 bytes into a destination of size 40 4017 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec%s.bin", ucode_prefix, wks); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4024:54: warning: ‘_mec2’ directive output may be truncated writing 5 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 4024 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec2%s.bin", ucode_prefix, wks); \| ^~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4024:9: note: ‘snprintf’ output between 17 and 50 bytes into a destination of size 40 4024 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec2%s.bin", ucode_prefix, wks); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar 
<lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:55:54 -04:00
Srinivasan Shanmugam	20fd14460f	drm/amdgpu: Fix 'fw_name' buffer size to prevent truncations in amdgpu_mes_init_microcode The snprintf function is used to write a formatted string into fw_name. The format of the string is "amdgpu/%s_mes%s.bin", where %s is replaced by the string in ucode_prefix and the second %s is replaced by either "_2" or "1" depending on the condition pipe == AMDGPU_MES_SCHED_PIPE. The length of the string "amdgpu/%s_mes%s.bin" is 16 characters plus the length of ucode_prefix and the length of the string "_2" or "1". The size of ucode_prefix is 30, so the maximum length of ucode_prefix is 29 characters (since one character is needed for the null terminator). Therefore, the maximum possible length of the string written into fw_name is 16 + 29 + 2 = 47 characters. The size of fw_name is 40, so if the length of the string written into fw_name is more than 39 characters (since one character is needed for the null terminator), it will be truncated by the snprintf function, and thus warnings will be seen. By increasing the size of fw_name to 50, we ensure that fw_name is large enough to hold the maximum possible length of the string, so the snprintf function will not truncate the output. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c: In function ‘amdgpu_mes_init_microcode’: drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1482:66: warning: ‘%s’ directive output may be truncated writing up to 1 bytes into a region of size between 0 and 29 [-Wformat-truncation=] 1482 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin", \| ^~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1482:17: note: ‘snprintf’ output between 16 and 46 bytes into a destination of size 40 1482 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin", \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1483 \| ucode_prefix, \| ~~~~~~~~~~~~~ 1484 \| pipe == AMDGPU_MES_SCHED_PIPE ? 
"" : "1"); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1477:66: warning: ‘%s’ directive output may be truncated writing 1 byte into a region of size between 0 and 29 [-Wformat-truncation=] 1477 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin", \| ^~ 1478 \| ucode_prefix, 1479 \| pipe == AMDGPU_MES_SCHED_PIPE ? "_2" : "1"); \| ~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1477:17: note: ‘snprintf’ output between 17 and 46 bytes into a destination of size 40 1477 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin", \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1478 \| ucode_prefix, \| ~~~~~~~~~~~~~ 1479 \| pipe == AMDGPU_MES_SCHED_PIPE ? "_2" : "1"); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1477:66: warning: ‘%s’ directive output may be truncated writing 2 bytes into a region of size between 0 and 29 [-Wformat-truncation=] 1477 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin", \| ^~ 1478 \| ucode_prefix, 1479 \| pipe == AMDGPU_MES_SCHED_PIPE ? "_2" : "1"); \| ~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1477:17: note: ‘snprintf’ output between 18 and 47 bytes into a destination of size 40 1477 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin", \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1478 \| ucode_prefix, \| ~~~~~~~~~~~~~ 1479 \| pipe == AMDGPU_MES_SCHED_PIPE ? 
"_2" : "1"); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1489:62: warning: ‘_mes.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 1489 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes.bin", \| ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1489:17: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 1489 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes.bin", \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1490 \| ucode_prefix); \| ~~~~~~~~~~~~~ Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:55:44 -04:00
Srinivasan Shanmugam	7c2bc34ab9	drm/amdgpu: Fix format character cut-off issues in amdgpu_vcn_early_init() Reducing the size of ucode_prefix to 25 in the amdgpu_vcn_early_init function. This would ensure that the total number of characters being written into fw_name does not exceed its size of 40. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c: In function ‘amdgpu_vcn_early_init’: drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c:102:66: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=] 102 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix); \| ^ drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c:102:17: note: ‘snprintf’ output between 12 and 41 bytes into a destination of size 40 102 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c:102:66: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=] 102 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix); \| ^ drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c:102:17: note: ‘snprintf’ output between 12 and 41 bytes into a destination of size 40 102 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c:105:73: warning: ‘.bin’ directive output may be truncated writing 4 bytes into a region of size between 2 and 31 [-Wformat-truncation=] 105 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_%d.bin", ucode_prefix, i); \| ^~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c:105:25: note: ‘snprintf’ output between 14 and 43 bytes into a destination of size 40 105 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_%d.bin", ucode_prefix, i); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Cc: Alex Deucher <alexander.deucher@amd.com> Cc: 
Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:55:01 -04:00
Tao Zhou	8b3495eafb	drm/amdgpu: add socket id parameter for psp query address cmd And set the socket id. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:54:54 -04:00
Shashank Sharma	f88a7dd06a	drm/amdgpu: Add a NULL check for freeing root PT This patch adds a NULL check to fix this crash reported during the freeing of root PT entry: BUG: unable to handle page fault for address: ffffc9002d637aa0 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page RIP: 0010:amdgpu_vm_pt_free+0x66/0xe0 [amdgpu] PKRU: 55555554 Call Trace: <TASK> amdgpu_vm_pt_free_root+0x60/0xa0 [amdgpu] amdgpu_vm_fini+0x2cb/0x5d0 [amdgpu] ? amdgpu_ctx_mgr_entity_fini+0x53/0x1c0 [amdgpu] amdgpu_driver_postclose_kms+0x191/0x2d0 [amdgpu] drm_file_free.part.0+0x1e5/0x260 [drm] Cc: Christian König <Christian.Koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Acked-by: Christian König <Christian.Koenig@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:51:55 -04:00
Sunil Khatri	9022f01b97	drm/amdgpu: refactor code to split devcoredump code Refactor the devcoredump code into new files: its functionality has been expanded further, so it is better to split it out and give devcoredump its own file. v2: Fix the build failure caught by arm compiler of implicit function declaration with #ifdef v3: squash in fix for implicit declaration error Cc: Ivan Lipski <ivan.lipski@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:51:48 -04:00
Peyton Lee	9ddafd1d14	drm/amdgpu/vpe: power on vpe when hw_init To fix mode2 reset failure. Should power on VPE when hw_init. Signed-off-by: Peyton Lee <peytolee@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:50:20 -04:00
Yang Wang	31fd330b97	drm/amdgpu: add ras event id support for ACA add ras event id support for ACA. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:48:18 -04:00
Yang Wang	bd15bf742f	drm/amdgpu: avoid update aca bank multi times during ras isr Because the UE Valid MCA count will only be cleared after reset, in order to avoid repeated counting of the error count, the aca bank is only updated once during ras isr. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:48:11 -04:00
Yang Wang	f7bcfb7a56	drm/amdgpu: retrieve umc odecc error count for aca umc v12.0 retrieve umc odecc error count for aca umc v12.0 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:48:03 -04:00
Shashank Sharma	b6c4f90b38	drm/amdgpu: sync page table freeing with tlb flush The idea behind this patch is to delay the freeing of PT entry objects until the TLB flush is done. This patch: - Adds a tlb_flush_waitlist in amdgpu_vm_update_params which will keep the objects that need to be freed after tlb_flush. - Adds PT entries in this list in amdgpu_vm_ptes_update after finding the PT entry. - Changes functionality of amdgpu_vm_pt_free_dfs from (df_search + free) to simply freeing of the BOs, also renames it to amdgpu_vm_pt_free_list to reflect this same. - Exports function amdgpu_vm_pt_free_list to be called directly. - Calls amdgpu_vm_pt_free_list directly from amdgpu_vm_update_range. V2: rebase V4: Addressed review comments from Christian - add only locked PTEs entries in TLB flush waitlist. - do not create a separate function for list flush. - do not create a new lock for TLB flush. - there is no need to wait on tlb_flush_fence exclusively. V5: Addressed review comments from Christian - change the amdgpu_vm_pt_free_dfs's functionality to simple freeing of the objects and rename it. 
- add all the PTE objects in params->tlb_flush_waitlist - let amdgpu_vm_pt_free_root handle the freeing of BOs independently - call amdgpu_vm_pt_free_list directly V6: Rebase V7: Rebase V8: Added a NULL check to fix this backtrace issue: [ 415.351447] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 415.359245] #PF: supervisor write access in kernel mode [ 415.365081] #PF: error_code(0x0002) - not-present page [ 415.370817] PGD 101259067 P4D 101259067 PUD 10125a067 PMD 0 [ 415.377140] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 415.382004] CPU: 0 PID: 25481 Comm: test_with_MPI.e Tainted: G OE 5.18.2-mi300-build-140423-ubuntu-22.04+ #24 [ 415.394437] Hardware name: AMD Corporation Sh51p/Sh51p, BIOS RMO1001AS 02/21/2024 [ 415.402797] RIP: 0010:amdgpu_vm_ptes_update+0x6fd/0xa10 [amdgpu] [ 415.409648] Code: 4c 89 ff 4d 8d 66 30 e8 f1 ed ff ff 48 85 db 74 42 48 39 5d a0 74 40 48 8b 53 20 48 8b 4b 18 48 8d 43 18 48 8d 75 b0 4c 89 ff <48 > 89 51 08 48 89 0a 49 8b 56 30 48 89 42 08 48 89 53 18 4c 89 63 [ 415.430621] RSP: 0018:ffffc9000401f990 EFLAGS: 00010287 [ 415.436456] RAX: ffff888147bb82f0 RBX: ffff888147bb82d8 RCX: 0000000000000000 [ 415.444426] RDX: 0000000000000000 RSI: ffffc9000401fa30 RDI: ffff888161f80000 [ 415.452397] RBP: ffffc9000401fa80 R08: 0000000000000000 R09: ffffc9000401fa00 [ 415.460368] R10: 00000007f0cc0000 R11: 00000007f0c85000 R12: ffffc9000401fb20 [ 415.468340] R13: 00000007f0d00000 R14: ffffc9000401faf0 R15: ffff888161f80000 [ 415.476312] FS: 00007f132ff89840(0000) GS:ffff889f87c00000(0000) knlGS:0000000000000000 [ 415.485350] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 415.491767] CR2: 0000000000000008 CR3: 0000000161d46003 CR4: 0000000000770ef0 [ 415.499738] PKRU: 55555554 [ 415.502750] Call Trace: [ 415.505482] <TASK> [ 415.507825] amdgpu_vm_update_range+0x32a/0x880 [amdgpu] [ 415.513869] amdgpu_vm_clear_freed+0x117/0x250 [amdgpu] [ 415.519814] amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x18c/0x250 [amdgpu] [ 415.527729] 
kfd_ioctl_unmap_memory_from_gpu+0xed/0x340 [amdgpu] [ 415.534551] kfd_ioctl+0x3b6/0x510 [amdgpu] V9: Addressed review comments from Christian - No NULL check reqd for root PT freeing - Free PT list regardless of needs_flush - Move adding BOs in list in a separate function V10: Added Christian's RB V11: squash in list fix Cc: Christian König <Christian.Koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Reviewed-by: Christian König <Christian.Koenig@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:47:17 -04:00
Linus Torvalds	7ee0490121	drm fixes for 6.9-rc1 core: - fix rounding in drm_fixp2int_round() bridge: - fix documentation for DRM_BRIDGE_OP_EDID sun4i: - fix 64-bit division on 32-bit architectures tests: - fix dependency on DRM_KMS_HELPER probe-helper: - never return negative values from .get_modes() plus driver fixes xe: - invalidate userptr vma on page pin fault - fail early on sysfs file creation error - skip VMA pinning on xe_exec if no batches nouveau: - clear bo resource bus after eviction - documentation fixes - don't check devinit disable on GSP amdgpu: - Freesync fixes - UAF IOCTL fixes - Fix mmhub client ID mapping - IH 7.0 fix - DML2 fixes - VCN 4.0.6 fix - GART bind fix - GPU reset fix - SR-IOV fix - OD table handling fixes - Fix TA handling on boards without display hardware - DML1 fix - ABM fix - eDP panel fix - DPPCLK fix - HDCP fix - Revert incorrect error case handling in ioremap - VPE fix - HDMI fixes - SDMA 4.4.2 fix - Other misc fixes amdkfd: - Fix duplicate BO handling in process restore Merge tag 'drm-next-2024-03-22' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "Fixes from the last week (or 3 
weeks in amdgpu case), after amdgpu, it's xe and nouveau then a few scattered core fixes. core: - fix rounding in drm_fixp2int_round() bridge: - fix documentation for DRM_BRIDGE_OP_EDID sun4i: - fix 64-bit division on 32-bit architectures tests: - fix dependency on DRM_KMS_HELPER probe-helper: - never return negative values from .get_modes() plus driver fixes xe: - invalidate userptr vma on page pin fault - fail early on sysfs file creation error - skip VMA pinning on xe_exec if no batches nouveau: - clear bo resource bus after eviction - documentation fixes - don't check devinit disable on GSP amdgpu: - Freesync fixes - UAF IOCTL fixes - Fix mmhub client ID mapping - IH 7.0 fix - DML2 fixes - VCN 4.0.6 fix - GART bind fix - GPU reset fix - SR-IOV fix - OD table handling fixes - Fix TA handling on boards without display hardware - DML1 fix - ABM fix - eDP panel fix - DPPCLK fix - HDCP fix - Revert incorrect error case handling in ioremap - VPE fix - HDMI fixes - SDMA 4.4.2 fix - Other misc fixes amdkfd: - Fix duplicate BO handling in process restore" * tag 'drm-next-2024-03-22' of https://gitlab.freedesktop.org/drm/kernel: (50 commits) drm/amdgpu/pm: Don't use OD table on Arcturus drm/amdgpu: drop setting buffer funcs in sdma442 drm/amd/display: Fix noise issue on HDMI AV mute drm/amd/display: Revert Remove pixle rate limit for subvp Revert "drm/amdgpu/vpe: don't emit cond exec command under collaborate mode" Revert "drm/amd/amdgpu: Fix potential ioremap() memory leaks in amdgpu_device_init()" drm/amd/display: Add a dc_state NULL check in dc_state_release drm/amd/display: Return the correct HDCP error code drm/amd/display: Implement wait_for_odm_update_pending_complete drm/amd/display: Lock all enabled otg pipes even with no planes drm/amd/display: Amend coasting vtotal for replay low hz drm/amd/display: Fix idle check for shared firmware state drm/amd/display: Update odm when ODM combine is changed on an otg master pipe with no plane drm/amd/display: Init DPPCLK 
from SMU on dcn32 drm/amd/display: Add monitor patch for specific eDP drm/amd/display: Allow dirty rects to be sent to dmub when abm is active drm/amd/display: Override min required DCFCLK in dml1_validate drm/amdgpu: Bypass display ta if display hw is not available drm/amdgpu: correct the KGQ fallback message drm/amdgpu/pm: Check the validity of overdiver power limit ...	2024-03-21 19:04:31 -07:00
Hawking Zhang	a61e2ce9d4	drm/amdgpu: Enable smuio v14_0_2 callbacks Enable smuio v14_0_2_callbacks Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:16 -04:00
Hawking Zhang	d80e44a34e	drm/amdgpu: Add smuio callback to get gpu clk counter Add smuio callback to get gpu clk counter Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:16 -04:00


14044 Commits