linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00

Author	SHA1	Message	Date
Evan Quan	b75efe88b2	drm/amd/pm: avoid unintentional shutdown due to temperature momentary fluctuation An intentional delay is added on soft ctf triggered. Then there will be a double check for the GPU temperature before taking further action. This can avoid unintended shutdown due to temperature momentary fluctuation. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-30 13:12:16 -04:00
Lijo Lazar	b579ea632f	drm/amdgpu: Modify for_each_inst macro Modify it such that it doesn't change the instance mask parameter. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Victor Skvortsov <victor.skvortsov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-30 13:11:35 -04:00
Chong Li	80e709ee6e	drm/amdgpu: add option params to enforce process isolation between graphics and compute enforce process isolation between graphics and compute via using the same reserved vmid. v2: remove params "struct amdgpu_vm *vm" from amdgpu_vmid_alloc_reserved and amdgpu_vmid_free_reserved. Signed-off-by: Chong Li <chongli2@amd.com> Reviewed-by: Christian Koenig <Christian.Koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:49:48 -04:00
Jonathan Kim	01f648202c	drm/amdgpu: add gfx9.4.1 hw debug mode enable and disable calls On GFX9.4.1, the implicit wait count instruction on s_barrier is disabled by default in the driver during normal operation for performance requirements. There is a hardware bug in GFX9.4.1 where if the implicit wait count instruction after an s_barrier instruction is disabled, any wave that hits an exception may step over the s_barrier when returning from the trap handler with the barrier logic having no ability to be aware of this, thereby causing other waves to wait at the barrier indefinitely resulting in a shader hang. This bug has been corrected for GFX9.4.2 and onward. Since the debugger subscribes to hardware exceptions, in order to avoid this bug, the debugger must enable implicit wait count on s_barrier for a debug session and disable it on detach. In order to change this setting in the in the device global SQ_CONFIG register, the GFX pipeline must be idle. GFX9.4.1 as a compute device will either dispatch work through the compute ring buffers used for image post processing or through the hardware scheduler by the KFD. Have the KGD suspend and drain the compute ring buffer, then suspend the hardware scheduler and block any future KFD process job requests before changing the implicit wait count setting. Once set, resume all work. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:35:15 -04:00
Likun Gao	b336c681bd	drm/amdgpu: fix vga_set_state NULL pointer issue Fix NULL pointer issue for vga_set_state function as not all the ASIC need this operation. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 10:41:40 -04:00
Tao Zhou	af2ba36883	drm/amdgpu: convert logical instance mask to physical one Convert instance mask for the convenience of RAS TA. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 10:37:08 -04:00
James Zhu	be3800f57c	drm/amdgpu: find partition ID when open device Find partition ID when open device from render device minor. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-and-tested-by: Philip Yang<Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:59:27 -04:00
James Zhu	2c1c7ba457	drm/amdgpu: support partition drm devices Support partition drm devices on GC_HWIP IP_VERSION(9, 4, 3). This is a temporary solution and will be superceded. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-and-tested-by: Philip Yang<Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:59:20 -04:00
David Francis	76eb9c95a4	drm/amdgpu/bu: add mtype_local as a module parameter Selects the MTYPE to be used for local memory, (0 = MTYPE_CC (default), 1 = MTYPE_NC, 2 = MTYPE_RW) v2: squash in build fix (Alex) Reviewed-by: Graham Sider <Graham.Sider@amd.com> Signed-off-by: David Francis <David.Francis@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:59:13 -04:00
Graham Sider	895797d919	drm/amdgpu/bu: Add use_mtype_cc_wa module param By default, set use_mtype_cc_wa to 1 to set PTE coherence flag MTYPE_CC instead of MTYPE_RW by default. This is required for the time being to mitigate a bug causing XCCs to hit stale data due to TCC marking fully dirty lines as exclusive. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Joseph Greathouse <Joseph.Greathouse@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:59:03 -04:00
Lijo Lazar	570de94b9c	drm/amdgpu: Add auto mode for compute partition When auto mode is specified, driver will choose the right compute partition mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Reviewed-by: Philip Yang <philip.yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:57:27 -04:00
Lijo Lazar	fa0497c34e	drm/amdgpu: Add API to get numa information of XCC Add interface to get numa information of ACPI XCC object. The interface uses logical id to identify an XCC. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:56:59 -04:00
Lijo Lazar	6e01882267	drm/amdgpu: Add API to get tmr info from acpi In certain configs, TMR information is available from ACPI. Add API to fetch the information. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:55:40 -04:00
Lijo Lazar	4d5275ab0b	drm/amdgpu: Add parsing of acpi xcc objects Add parsing of ACPI xcc objects and fill in relevant info from them by invoking the DSM methods. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-and-tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:55:38 -04:00
Mukul Joshi	9e4216cf2d	drm/amdgpu: Increase Max GPU instance to 64 Increase Max GPU instances to 64 to handle multi-socket system with GFX 9.4.3 asic. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:54:09 -04:00
Lijo Lazar	7a1efad04c	drm/amdgpu: Use mask for active clusters Use a mask of available active clusters instead of using only the number of active clusters. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:50:08 -04:00
Lijo Lazar	75d1692393	drm/amdgpu: Add initial version of XCP routines Within a device, an accelerator core partition can be constituted with different IP instances. These partitions are spatial in nature. Number of partitions which can exist at the same time depends on the 'partition mode'. Add a manager entity which is responsible for switching between different partition modes and maintaining partitions. It is also responsible for suspend/resume of different partitions. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:49:31 -04:00
Lijo Lazar	dd1a02e280	drm/amdgpu: Add xcc specific functions for gfxhub GFXHUB 1.2 supports multiple XCC instances. Add XCC specific functions to handle XCC instances separately. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:49:14 -04:00
Le Ma	2fa480d36e	drm/amdgpu: add helpers to access registers on different AIDs SMN address which is larger than 32bit has different indications through bit[34:32] on different AIDs. v2: put smn addressing of different AIDs into asic specific place v3: change to ext_id/ext_offset naming Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:48:05 -04:00
Le Ma	7e0eebdc47	drm/amdgpu: extend max instances Number of instances is extended. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:47:46 -04:00
Lijo Lazar	5d30cbb4db	drm/amdgpu: Add map of logical to physical inst Add a map for logical to physical instances of an IP. For ex: on some device configurations, the first logical XCC may not be the first physical XCC. Software may continue to access in logical IP instance order. The map provides a convenient way to get to the actual physical instance. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:45:42 -04:00
Le Ma	0c552ed387	drm/amdgpu: add indirect r/w interface for smn address greater than 32bits On multiple AIDs platform, bit[34:32] in SMD address is leveraged to access nonAID0 register smn address and new PCI_INDEX_HI register is introduced to access the higher bits. v2: rebase on latest register accessors (Alex) Signed-off-by: Le Ma <le.ma@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:45:29 -04:00
Le Ma	386ea27c3b	drm/amdgpu: adjust some basic elements for multiple AID case add some elements below: - num_aid - aid_id for each sdma instance - num_inst_per_aid for sdma and extend macro size below: - SDMA_MAX_INSTANCES to 16 - AMDGPU_MAX_RINGS to 96 - AMDGPU_MAX_HWIP_RINGS to 32 v2: move aid_id from amdgpu_ring to amdgpu_sdma_instance. (Lijo) Signed-off-by: Le Ma <le.ma@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:43:35 -04:00
James Zhu	81283fee15	drm/amdgpu/: add more macro to support offset variant Add more macro to support offset variant and simplify macro SOC15_WAIT_ON_RREG. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:40:46 -04:00
Shiwu Zhang	0fa49d1083	drm/amdgpu: override partition mode through module parameter Add a module parameter to override the partition mode. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:40:32 -04:00
Le Ma	d9426c3d9b	drm/amdgpu: add bitmask to iterate vmhubs As the layout of VMHUB definition has been changed to cover multiple XCD/AID case, the original num_vmhubs is not appropriate to do vmhub iteration any more. Drop num_vmhubs and introduce vmhubs_mask instead. v2: switch to the new VMHUB layout v3: use DECLARE_BITMAP to define vmhubs_mask Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:40:17 -04:00
Mukul Joshi	f1f6f48a33	drm/amdgpu: Set GTT size equal to TTM mem limit Use the helper function in TTM to get TTM mem limit and set GTT size to be equal to TTL mem limit. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:24:46 -04:00
Le Ma	541372bb62	drm/amdgpu: add some basic elements for multiple XCD case Add some basic definitions and structure member. Inscrease MAX_WB slots to 1024 to support the increasing number of rings for multiple partitions. v2: unify naming style Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-04-14 13:47:49 -04:00
Lijo Lazar	418431bcc9	drm/amdgpu: Fix warnings Fix below warning due to incompatible types in conditional operator ../pm/swsmu/smu13/smu_v13_0_6_ppt.c:315:17: sparse: sparse: incompatible types in conditional expression (different base types): Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Link: https://lore.kernel.org/oe-kbuild-all/202303082135.NjdX1Bij-lkp@intel.com/ Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-04-11 18:03:44 -04:00
Srinivasan Shanmugam	11f25c844e	drm/amd/amdgpu: Drop the hang limit parameter The driver doesn't resubmit jobs on hangs any more, hence drop the hang limit parameter - amdgpu_job_hang_limit, wherever it is used. Suggested-by: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Kent Russell <kent.russell@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-04-11 18:03:43 -04:00
Daniel Vetter	82bbec189a	Linux 6.3-rc4 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmQgu8QeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG2fAH/0+pSM+u9ZiTCRcc 9cxpHqRu4J16oDJVroxD5c95C8t6pqq8emla7OcPjEbt+2jijEmTVda6i4IpAscc e8c3t4zonzIjxWQxukZ2iPb7rUAsDYjom9Y5NpHB5J4eJMEvrgiDo7MIBtvwECYp SejOpecquUcpFF1JTsDqqUUKgt56A/F/9jNyC/3p31doHbUfQWwTAiEA60gXwjiV DtFFwUlKivLgkTOgigL0MUCrKLpwKXfpBOAbbp5h+zkV/6yjBmkx/OViovsJfrwX kauOPOA/bSLXby4YyHkIB3nUY7mu7KDJfkpy2KMvZkRnJkYYW1Gt3urf714KVo5G dnjfrA8= =q5sO -----END PGP SIGNATURE----- Merge v6.3-rc4 into drm-next I just landed the fence deadline PR from Rob that a bunch of drivers want/need to apply driver-specific patches. Backmerge -rc4 so that they don't have to be stuck on -rc2 for no reason at all. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2023-03-29 16:00:23 +02:00
Kai-Heng Feng	2b072442f4	drm/amdgpu/nv: Apply ASPM quirk on Intel ADL + AMD Navi S2idle resume freeze can be observed on Intel ADL + AMD WX5500. This is caused by commit `0064b0ce85` ("drm/amd/pm: enable ASPM by default"). The root cause is still not clear for now. So extend and apply the ASPM quirk from commit `e02fe3bc7a` ("drm/amdgpu: vi: disable ASPM on Intel Alder Lake based systems"), to workaround the issue on Navi cards too. Fixes: `0064b0ce85` ("drm/amd/pm: enable ASPM by default") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2023-03-23 09:34:35 -04:00
Tim Huang	aaee0ce460	drm/amdgpu: reposition the gpu reset checking for reuse Move the amdgpu_acpi_should_gpu_reset out of CONFIG_SUSPEND to share it with hibernate case. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x	2023-03-23 09:29:32 -04:00
Dave Airlie	90031bc33f	Merge tag 'amd-drm-next-6.4-2023-03-17' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.4-2023-03-17: amdgpu: - Misc code cleanups - Documentation fixes - Make kobj structures const - Add thermal throttling adjustments for supported APUs - UMC RAS fixes - Display reset fixes - DCN 3.2 fixes - Freesync fixes - DC code reorg - Generalize dmabuf import to work with KFD - DC DML fixes - SRIOV fixes - UVD code cleanups - IH 4.4.2 updates - HDP 4.4.2 updates - SDMA 4.4.2 updates - PSP 13.0.6 updates - Add capped/uncapped workload handling for supported APUs - DCN 3.1.4 updates - Re-org DC Kconfig - USB4 fixes - Reorg DC plane and stream handling - Register vga_switcheroo for apple-gmux - SMU 13.0.6 updates - Fix error checking in read_mm_registers functions for affected families - VCN 4.0.4 fix - Drop redundant pci_enable_pcie_error_reporting() call - RDNA2 SMU OD suspend/resume fix - Expose additional memory stats via fdinfo - RAS fixes - Misc display fixes - DP MST fixes - IOMMU regression fix for KFD amdkfd: - Make kobj structures const - Support for exporting buffers via dmabuf - Multi-VMA page migration fixes - NBIO fixes - Misc code cleanups - Fix possible double free - Fix possible UAF radeon: - iMac fix UAPI: - KFD dmabuf export support. Required for importing KFD buffers into GEM contexts and for RDMA P2P support. Proposed user mode changes: https://github.com/fxkamd/ROCT-Thunk-Interface/commits/fxkamd/dmabuf From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230317164416.138340-1-alexander.deucher@amd.com	2023-03-20 16:44:36 +10:00
Hawking Zhang	dabc114e4b	drm/amdgpu: Move to common helper to query soc rev_id Replace soc15, nv, soc21 get_rev_id callback with common helper so we don't need to duplicate code when introduce new asics. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:27 -04:00
Hawking Zhang	65ba96e91b	drm/amdgpu: Move to common indirect reg access helper Replace soc15, nv, soc21 specific callbacks with common one. so we don't need to duplicate code when introduce new asics. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-15 18:45:27 -04:00
Dave Airlie	faf0d83e10	Merge tag 'drm-misc-next-2023-03-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v6.4-rc1: Note: Only changes since pull request from 2023-02-23 are included here. UAPI Changes: - Convert rockchip bindings to YAML. - Constify kobj_type structure in dma-buf. - FBDEV cmdline parser fixes, and other small fbdev fixes for mode parsing. Cross-subsystem Changes: - Add Neil Armstrong as linaro maintainer. - Actually signal the private stub dma-fence. Core Changes: - Add function for adding syncobj dep to sched_job and use it in panfrost, v3d. - Improve DisplayID 2.0 topology parsing and EDID parsing in general. - Add a gem eviction function and callback for generic GEM shrinker purposes. - Prepare to convert shmem helper to use the GEM reservation lock instead of own locking. (Actual commit itself got reverted for now) - Move the suballocator from radeon and amdgpu drivers to core in preparation for Xe. - Assorted small fixes and documentation. - Fixes to HPD polling. - Assorted small fixes in simpledrm, bridge, accel, shmem-helper, and the selftest of format-helper. - Remove dummy resource when ttm bo is created, and during pipelined gutting. Fix all drivers to accept a NULL ttm_bo->resource. - Handle pinned BO moving prevention in ttm core. - Set drm panel-bridge orientation before connector is registered. - Remove dumb_destroy callback. - Add documentation to GEM_CLOSE, PRIME_HANDLE_TO_FD, PRIME_FD_TO_HANDLE, GETFB2 ioctl's. - Add atomic enable_plane callback, use it in ast, mgag200, tidss. Driver Changes: - Use drm_gem_objects_lookup in vc4. - Assorted small fixes to virtio, ast, bridge/tc358762, meson, nouveau. - Allow virtio KMS to be disabled and compiled out. - Add Radxa 8/10HD, Samsung AMS495QA01 panels. - Fix ivpu compiler errors. - Assorted fixes to drm/panel, malidp, rockchip, ivpu, amdgpu, vgem, nouveau, vc4. - Assorted cleanups, simplifications and fixes to vmwgfx. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ac1f5186-54bb-02f4-ac56-907f5b76f3de@linux.intel.com	2023-03-14 12:18:54 +10:00
Guchun Chen	53e9d836ea	drm/amdgpu: drop pm_sysfs_en flag from amdgpu_device structure pm_sysfs_en is overlapped with pm.sysfs_initialized, so drop it for simplifying code(no functional change). Signed-off-by: Guchun Chen <guchun.chen@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-13 17:27:48 -04:00
Bjorn Helgaas	58265640fb	drm/amdgpu: Drop redundant pci_enable_pcie_error_reporting() pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since `f26e58bf6f` ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration, so the driver doesn't need to do it itself. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Note that this only controls ERR_* Messages from the device. An ERR_* Message may cause the Root Port to generate an interrupt, depending on the AER Root Error Command register managed by the AER service driver. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-03-08 14:05:57 -05:00
Maarten Lankhorst	c103a23f2f	drm/amd: Convert amdgpu to use suballocation helper. Now that we have a generic suballocation helper, Use it in amdgpu. For lines that get moved or changed, also fix up pre-existing style issues. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230224095152.30134-3-thomas.hellstrom@linux.intel.com	2023-03-01 17:18:19 +01:00
Alex Deucher	bf0207e172	drm/amdgpu: add S/G display parameter Some users have reported flickerng with S/G display. We've tried extensively to reproduce and debug the issue on a wide variety of platform configurations (DRAM bandwidth, etc.) and a variety of monitors, but so far have not been able to. We disabled S/G display on a number of platforms to address this but that leads to failure to pin framebuffers errors and blank displays when there is memory pressure or no displays at all on systems with limited carveout (e.g., Chromebooks). Add a option to disable this as a debugging option as a way for users to disable this, depending on their use case, and for us to help debug this further. v2: fix typo Reviewed-by: Harry Wentland <harry.wentland@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-02-09 10:30:36 -05:00
Dave Airlie	45be204806	Merge tag 'amd-drm-next-6.3-2023-01-06' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.3-2023-01-06: amdgpu: - secure display support for multiple displays - DML optimizations - DCN 3.2 updates - PSR updates - DP 2.1 updates - SR-IOV RAS updates - VCN RAS support - SMU 13.x updates - Switch 1 element arrays to flexible arrays - Add RAS support for DF 4.3 - Stack size improvements - S0ix rework - Soft reset fix - Allow 0 as a vram limit on APUs - Display fixes - Misc code cleanups - Documentation fixes - Handle profiling modes for SMU13.x amdkfd: - Error handling fixes - PASID fixes radeon: - Switch 1 element arrays to flexible arrays drm: - Add DP adaptive sync DPCD definitions UAPI: - Add new INFO queries for peak and min sclk/mclk for profile modes on newer chips Proposed mesa patch: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/278 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230106222037.7870-1-alexander.deucher@amd.com	2023-01-16 14:00:12 +10:00
xurui	90f56611fc	drm/amdgpu: Retry DDC probing on DVI on failure if we got an HPD interrupt HPD signals on DVI ports can be fired off before the pins required for DDC probing actually make contact, due to the pins for HPD making contact first. This results in a HPD signal being asserted but DDC probing failing, resulting in hotplugging occasionally failing. Rescheduling the hotplug work for a second when we run into an HPD signal with a failing DDC probe usually gives enough time for the rest of the connector's pins to make contact, and fixes this issue. Signed-off-by: xurui <xurui@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-05 11:42:09 -05:00
Michel Dänzer	4243c84aa0	Revert "drm/amd/display: Enable Freesync Video Mode by default" This reverts commit `de05abe6b9`. The bug referenced below was bisected to this commit. There has been no activity toward fixing it in 3 months, so let's revert for now. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2162 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-03 16:57:58 -05:00
Christian König	0b04ea391c	drm/amdgpu: allow zero as vram limit This allows testing the driver without any VRAM. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-03 16:50:22 -05:00
Christian König	7ccfd79fdd	drm/amdgpu: rename vram_scratch into mem_scratch Rename vram_scratch into mem_scratch and allow allocating it into GTT as well. The only problem with that is that we won't have a default page for the system aperture any more. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-03 16:50:03 -05:00
Christian König	58ab2c08d7	drm/amdgpu: use VRAM\|GTT for a bunch of kernel allocations Technically all of those can use GTT as well, no need to force things into VRAM. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-01-03 16:49:54 -05:00
Christian König	a3185f91d0	drm/ttm: merge ttm_bo_api.h and ttm_bo_driver.h v2 Merge and cleanup the two headers into a single description of the object API. Also move all the documentation to the implementation and drop unnecessary includes from the header. No functional change. v2: minimal checkpatch.pl cleanup Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221125102137.1801-4-christian.koenig@amd.com	2022-12-06 12:54:14 +01:00
Christian König	d9483ecd32	drm/amdgpu: rename the files for HMM handling Clean that up a bit, no functional change. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-17 00:23:36 -05:00
Alex Deucher	220c8cc855	drm/amdgpu: there is no vbios fb on devices with no display hw (v2) If we enable virtual display functionality on parts with no display hardware we can end up trying to check for and reserve the vbios FB area on devices where it doesn't exist. Check if display hardware is actually present on the hardware before trying to reserve the memory. v2: move the check into common code Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-15 13:35:16 -05:00
Alex Deucher	d09ef24303	drm/amdgpu: clarify DC checks There are several places where we don't want to check if a particular asic could support DC, but rather, if DC is enabled. Set a flag if DC is enabled and check for that rather than if a device supports DC or not. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-15 11:51:45 -05:00
Alex Deucher	25263da376	drm/amdgpu: rework SR-IOV virtual display handling virtual display is enabled unconditionally in SR-IOV, but without specifying the virtual_display module, the number of crtcs defaults to 0. Set a single display by default for SR-IOV if the virtual_display parameter is not set. Only enable virtual display by default on SR-IOV on asics which actually have display hardware. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-15 11:51:32 -05:00
Graham Sider	9a1662f549	drm/amdgpu: extend halt_if_hws_hang to MES Hang on MES timeout if halt_if_hws_hang is set to 1. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-04 16:05:53 -04:00
Victor Zhao	6aa5893926	Revert "drm/amdgpu: add debugfs amdgpu_reset_level" This reverts commit `5bd8d53f6f`. This commit breaks the reset logic for aldebaran, revert it for now. Will move the mask inside the reset handler. Fixes: `5bd8d53f6f` ("drm/amdgpu: add debugfs amdgpu_reset_level") Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-17 17:41:20 -04:00
Hawking Zhang	7a94c8602f	drm/amdgpu: extend HWIP_MAX_INSTANCE to 28 more ip instances are available Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-17 17:41:19 -04:00
Christian König	68ce8b2422	drm/amdgpu: add gang submit backend v2 Allows submitting jobs as gang which needs to run on multiple engines at the same time. Basic idea is that we have a global gang submit fence representing when the gang leader is finally pushed to run on the hardware last. Jobs submitted as gang are never re-submitted in case of a GPU reset since this won't work and will just deadlock the hardware immediately again. v2: fix logic inversion, improve documentation, fix rcu Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-09-20 12:40:32 -04:00
Victor Zhao	194eb174cb	drm/amdgpu: reduce reset time In multi container use case, reset time is important, so skip ring tests and cp halt wait during ip suspending for reset as they are going to fail and cost more time on reset v2: add a hang flag to indicate the reset comes from a job timeout, skip ring test and cp halt wait in this case v3: move hang flag to adev Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-08-16 18:14:31 -04:00
Victor Zhao	5bd8d53f6f	drm/amdgpu: add debugfs amdgpu_reset_level Introduce amdgpu_reset_level debugfs in order to help debug and test specific type of reset. Also helps blocking unwanted type of resets. By default, mode2 reset will not be enabled v2: make this debugfs in adev and use debugfs_create_u32 Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-08-16 18:14:31 -04:00
Dusica Milinkovic	373008bfc9	drm/amdgpu: Increase tlb flush timeout for sriov [Why] During multi-vf executing benchmark (Luxmark) observed kiq error timeout. It happenes because all of VFs do the tlb invalidation at the same time. Although each VF has the invalidate register set, from hardware side the invalidate requests are queue to execute. [How] In case of 12 VF increase timeout on 12*100ms Signed-off-by: Dusica Milinkovic <Dusica.Milinkovic@amd.com> Acked-by: Shaoyun Liu <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-08-16 18:08:01 -04:00
Roy Sun	1f83db6be3	drm/amdgpu: Fix the incomplete product number The comments say that the product number is a 16-digit HEX string so the buffer needs to be at least 17 characters to hold the NUL terminator. Expand the buffer size to 20 to avoid the alignment issues. The comment:Product number should only be 16 characters. Any more,and something could be wrong. Cap it at 16 to be safe Signed-off-by: Roy Sun <Roy.Sun@amd.com> Reviewed-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-07-28 16:05:14 -04:00
Leo Li	792a0cdde3	drm/amd/display: Add visualconfirm module parameter [Why] Being able to configure visual confirm at boot or in cmdline is helpful when debugging. [How] Add a module parameter to configure DC visual confirm, which works the same way as the equivalent debugfs entry. Signed-off-by: Leo Li <sunpeng.li@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-07-25 09:31:03 -04:00
Guchun Chen	9c913f3803	drm/amdgpu: drop runpm from amdgpu_device structure It's redundant, as now switching to rpm_mode to indicate runtime power management mode. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-07-18 16:42:39 -04:00
Likun Gao	f1549c09c5	drm/amdgpu: support reset flag set for gpu reset Move reset_context out of gpu recover function to make it configurable for different reset purpose. For the reset way of call gpu_recovery sysfs, force to use full reset method. Otherwise, try soft reset by default if the related ASIC supportted, if soft reset failed, will use full reset. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-07-13 11:25:17 -04:00
Andrey Grodzovsky	cf72704414	drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to amdgpu_device_gpu_recover We removed the wrapper that was queueing the recover function into reset domain queue who was using this name. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-06-10 15:26:12 -04:00
Andrey Grodzovsky	2f83658ffc	drm/amdgpu: Add work_struct for GPU reset from debugfs We need to have a work_struct to cancel this reset if another already in progress. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-06-10 15:25:48 -04:00
Evan Quan	62f8f5c3bf	drm/amdgpu: enable ASPM support for PCIE 7.4.0/7.6.0 Enable ASPM support for PCIE 7.4.0 and 7.6.0. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-06-08 11:43:00 -04:00
Ramesh Errabolu	08a2fd23c6	drm/amdgpu: Add peer-to-peer support among PCIe connected AMD GPUs Add support for peer-to-peer communication among AMD GPUs over PCIe bus. Support REQUIRES enablement of config HSA_AMD_P2P. Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-06-08 11:40:12 -04:00
Somalapuram Amaranath	3d8785f6c0	drm/amdgpu: adding device coredump support Added device coredump information: - Kernel version - Module - Time - VRAM status - Guilty process name and PID - GPU register dumps v1 -> v2: Variable name change v1 -> v2: NULL check v1 -> v2: Code alignment v1 -> v2: Adding dummy amdgpu_devcoredump_free v1 -> v2: memset reset_task_info to zero v2 -> v3: add CONFIG_DEV_COREDUMP for variables v2 -> v3: remove NULL check on amdgpu_devcoredump_read Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by: Shashank Sharma <Shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-06-06 14:41:19 -04:00
Somalapuram Amaranath	651d7ee63f	drm/amdgpu: save the reset dump register value for devcoredump Allocate memory for register value and use the same values for devcoredump. v1 -> v2: Change krealloc_array() to kmalloc_array() v2 -> v3: Fix alignment Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by: Shashank Sharma <Shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-06-06 14:41:12 -04:00
pengfuyuan	faf26f2b12	drm/amd: Fix spelling typo in comments Fix spelling typo in comments. Reported-by: k2ci <kernel-bot@kylinos.cn> Signed-off-by: pengfuyuan <pengfuyuan@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-06-03 16:43:36 -04:00
Mario Limonciello	0223e51647	drm/amd: Don't reset dGPUs if the system is going to s2idle An A+A configuration on ASUS ROG Strix G513QY proves that the ASIC reset for handling aborted suspend can't work with s2idle. This functionality was introduced in commit `daf8de0874` ("drm/amdgpu: always reset the asic in suspend (v2)"). A few other commits have gone on top of the ASIC reset, but this still doesn't work on the A+A configuration in s2idle. Avoid doing the reset on dGPUs specifically when using s2idle. Fixes: `daf8de0874` ("drm/amdgpu: always reset the asic in suspend (v2)") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2008 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-05-18 15:20:18 -04:00
Likun Gao	1b49133042	drm/amdgpu: add lsdma block Add Light SDMA (LSDMA) block and related function. LSDMA is a small instance of SDMA mainly for kernel driver use. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-05-10 17:53:11 -04:00
Likun Gao	8424f2ccb3	drm/amdgpu/psp: Add vbflash sysfs interface support Add sysfs interface to copy VBIOS. v2: squash in fix for proper vmalloc API (Alex) Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-05-10 17:53:10 -04:00
Alex Deucher	5a90c24ad0	Revert "drm/amdgpu: disable runpm if we are the primary adapter" This reverts commit `b95dc06af3`. This workaround is no longer necessary. We have a better workaround in commit `f95af4a923` ("drm/amdgpu: don't runtime suspend if there are displays attached (v3)"). Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-05-05 16:50:00 -04:00
Jack Xiao	928fe236c0	drm/amdgpu: add mes_kiq module parameter v2 mes_kiq parameter is used to enable mes kiq pipe. This module parameter is unneccessary or enabled by default in final version. v2: reword commit message. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-05-04 10:43:49 -04:00
Jack Xiao	2bc956ef54	drm/amdgpu: add the per-context meta data v3 The per-context meta data is a per-context data structure associated with a mes-managed hardware ring, which includes MCBP CSA, ring buffer and etc. v2: fix typo v3: a. use structure instead of typedef b. move amdgpu_mes_ctx_get_offs_* to amdgpu_ring.h c. use __aligned to make alignement Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-05-04 10:03:17 -04:00
Jack Xiao	5405a52627	drm/amdgpu: define MQD abstract layer for hw ip Define MQD abstract layer for hw ip, for the passing mqd configuration not only from ring but more sources, like user queue. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-05-04 10:03:11 -04:00
Likun Gao	7f318f4e30	drm/amdgpu: add tracking for the enablement of SCPM Add parmeter to shows whether SCPM feature is enabled or not, and whether is valid. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-05-04 09:56:51 -04:00
Likun Gao	563fcfbf31	drm/amdgpu: add hdp version 6 functions Unify hdp related function into hdp structure for hdp version 6. V2: Remove hdp invalidate function as hdp v6 doesn't have read cache. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-05-04 09:53:58 -04:00
Likun Gao	1d5eee7dd6	drm/amdgpu: add function to decode ip version Add function to decode IP version. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-04-28 17:46:28 -04:00
Likun Gao	3202c7e782	drm/amdgpu: increase HWIP MAX INSTANCE Extend HWIP MAX INSTANCE to 11. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-04-28 17:46:24 -04:00
Evan Quan	25faeddcf3	drm/amdgpu: expand cg_flags from u32 to u64 With this, we can support more CG flags. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-04-08 17:24:24 -04:00
Christian König	a190f8dc4a	drm/amdgpu: header cleanup No function change, just move a bunch of definitions from amdgpu.h into separate header files. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-03-04 13:03:30 -05:00
Ruijing Dong	11eb648d01	drm/amdgpu/vcn: Add vcn firmware log vcn fwlog is for debugging purpose only, by default, it is disabled. Signed-off-by: Ruijing Dong <ruijing.dong@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-03-04 13:03:30 -05:00
Dave Airlie	38a15ad948	Merge tag 'amd-drm-next-5.18-2022-02-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.18-2022-02-25: amdgpu: - Raven2 suspend/resume fix - SDMA 5.2.6 updates - VCN 3.1.2 updates - SMU 13.0.5 updates - DCN 3.1.5 updates - Virtual display fixes - SMU code cleanup - Harvest fixes - Expose benchmark tests via debugfs - Drop no longer relevant gart aperture tests - More RAS restructuring - W=1 fixes - PSR rework - DP/VGA adapter fixes - DP MST fixes - GPUVM eviction fix - GPU reset debugfs register dumping support - Misc display fixes - SR-IOV fix - Aldebaran mGPU fix - Add module parameter to disable XGMI for testing amdkfd: - IH ring overflow logging fixes - CRIU fixes - Misc fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220225183535.5907-1-alexander.deucher@amd.com	2022-03-01 16:19:02 +10:00
Alex Sierra	158a05a0b8	drm/amdgpu: Add use_xgmi_p2p module parameter This parameter controls xGMI p2p communication, which is enabled by default. However, it can be disabled by setting it to 0. In case xGMI p2p is disabled in a dGPU, PCIe p2p interface will be used instead. This parameter is ignored in GPUs that do not support xGMI p2p configuration. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-24 17:25:12 -05:00
Dave Airlie	54f43c17d6	drm-misc-next for v5.18: UAPI Changes: Cross-subsystem Changes: - Split out panel-lvds and lvds dt bindings . - Put yes/no on/off disabled/enabled strings in linux/string_helpers.h and use it in drivers and tomoyo. - Clarify dma_fence_chain and dma_fence_array should never include eachother. - Flatten chains in syncobj's. - Don't double add in fbdev/defio when page is already enlisted. - Don't sort deferred-I/O pages by default in fbdev. Core Changes: - Fix missing pm_runtime_put_sync in bridge. - Set modifier support to only linear fb modifier if drivers don't advertise support. - As a result, we remove allow_fb_modifiers. - Add missing clear for EDID Deep Color Modes in drm_reset_display_info. - Assorted documentation updates. - Warn once in drm_clflush if there is no arch support. - Add missing select for dp helper in drm_panel_edp. - Assorted small fixes. - Improve fb-helper's clipping handling. - Don't dump shmem mmaps in a core dump. - Add accounting to ttm resource manager, and use it in amdgpu. - Allow querying the detected eDP panel through debugfs. - Add helpers for xrgb8888 to 8 and 1 bits gray. - Improve drm's buddy allocator. - Add selftests for the buddy allocator. Driver Changes: - Add support for nomodeset to a lot of drm drivers. - Use drm_module__driver in a lot of drm drivers. - Assorted small fixes to bridge/lt9611, v3d, vc4, vmwgfx, mxsfb, nouveau, bridge/dw-hdmi, panfrost, lima, ingenic, sprd, bridge/anx7625, ti-sn65dsi86. - Add bridge/it6505. - Create DP and DVI-I connectors in ast. - Assorted nouveau backlight fixes. - Rework amdgpu reset handling. - Add dt bindings for ingenic,jz4780-dw-hdmi. - Support reading edid through aux channel in ingenic. - Add a drm driver for Solomon SSD130x OLED displays. - Add simple support for sharp LQ140M1JW46. - Add more panels to nt35560. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAmIWLSEACgkQ/lWMcqZw E8OP7hAAjix94EX5fhFa7OAdqUbFtsiKhK/4zNtV9FWpFiEsDBz+dlbfDQWIx5an FIiiiQtSfWjpDv6pcMhoNf80w+dDbc/Cuauz6nNGO7Pkaerh2D/EPG74FD7f7nE3 EIScVs1heYtzM9usKrFKupNYgIdgZxmeovClWuE0OTjLOes2PGvvxXK6iJqNqJMX VlDO5SR7GRqsDUDV6vmwl63uKL77xJXAahAXIx+BQ/1xrtEhlu6NwsgHIsmPmMSN YluX34zc1xD/6/uUqvEdp7u46/5/He1c5Q/ia1WV3wRxsO/eMZ+axXqCZP3XGZdt rMdGNtj1MWKkudYiowStWkCVSG/0fXJCFIAhvRmeZy+YqPdVlqZ2W7g4H1l9iJoo UVfT9cHrKoxHsukvIEckC5Ov9v1yr39Bd4wUuqaUTUSxY8VID5vjY63TsXl9Zke1 SluTFe9qybbnRNz/hYRvwIS1eT8HvUauAfAhypGTLI5DYHTD7PawcfMJkNzCtJm4 Ta4SC3rTpkpN+7oc8SoNgqRHQ8U9KL5oksP0wVa8vwHsMptSd3X4pUljc6TcfjLv GEo41D5AuJz3HRVcn9yqPbLoPE2FFB7bfwIMH77yNnoos4Izy/LGhKpN0YdImmI5 W5XVFB0jltGSIhkzLe1mFpLrdJwdUTSUVeCK4H5PhZZQEHLkVtg= =HuwD -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2022-02-23' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v5.18: UAPI Changes: Cross-subsystem Changes: - Split out panel-lvds and lvds dt bindings . - Put yes/no on/off disabled/enabled strings in linux/string_helpers.h and use it in drivers and tomoyo. - Clarify dma_fence_chain and dma_fence_array should never include eachother. - Flatten chains in syncobj's. - Don't double add in fbdev/defio when page is already enlisted. - Don't sort deferred-I/O pages by default in fbdev. Core Changes: - Fix missing pm_runtime_put_sync in bridge. - Set modifier support to only linear fb modifier if drivers don't advertise support. - As a result, we remove allow_fb_modifiers. - Add missing clear for EDID Deep Color Modes in drm_reset_display_info. - Assorted documentation updates. - Warn once in drm_clflush if there is no arch support. - Add missing select for dp helper in drm_panel_edp. - Assorted small fixes. - Improve fb-helper's clipping handling. - Don't dump shmem mmaps in a core dump. - Add accounting to ttm resource manager, and use it in amdgpu. - Allow querying the detected eDP panel through debugfs. - Add helpers for xrgb8888 to 8 and 1 bits gray. - Improve drm's buddy allocator. - Add selftests for the buddy allocator. Driver Changes: - Add support for nomodeset to a lot of drm drivers. - Use drm_module__driver in a lot of drm drivers. - Assorted small fixes to bridge/lt9611, v3d, vc4, vmwgfx, mxsfb, nouveau, bridge/dw-hdmi, panfrost, lima, ingenic, sprd, bridge/anx7625, ti-sn65dsi86. - Add bridge/it6505. - Create DP and DVI-I connectors in ast. - Assorted nouveau backlight fixes. - Rework amdgpu reset handling. - Add dt bindings for ingenic,jz4780-dw-hdmi. - Support reading edid through aux channel in ingenic. - Add a drm driver for Solomon SSD130x OLED displays. - Add simple support for sharp LQ140M1JW46. - Add more panels to nt35560. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/686ec871-e77f-c230-22e5-9e3bb80f064a@linux.intel.com	2022-02-25 05:50:18 +10:00
Somalapuram Amaranath	5ce5a584cb	drm/amdgpu: add debugfs for reset registers list List of register populated for dump collection during the GPU reset. Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-23 14:26:36 -05:00
Alex Deucher	b784f42cf7	drm/amdgpu: drop testing module parameter This test is not particularly useful now that GTT and GART are decoupled in the driver. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-23 14:02:51 -05:00
Alex Deucher	0b1a63487b	drm/amdgpu: drop benchmark module parameter Now that we expose the benchmarks via debugfs, there is no longer a need for the module parameter. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-23 14:02:51 -05:00
Alex Deucher	f113cc32e3	drm/amdgpu: add a benchmark mutex To avoid multiple runs in parallel to avoid mixing results. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-23 14:02:50 -05:00
Alex Deucher	e460f244fb	drm/amdgpu: plumb error handling though amdgpu_benchmark() So we can tell when this function fails. v2: squash in error handling fix (Alex) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-23 14:02:50 -05:00
Mario Limonciello	0ab5d711ec	drm/amd: Refactor `amdgpu_aspm` to be evaluated per device Evaluating `pcie_aspm_enabled` as part of driver probe has the implication that if one PCIe bridge with an AMD GPU connected doesn't support ASPM then none of them do. This is an invalid assumption as the PCIe core will configure ASPM for individual PCIe bridges. Create a new helper function that can be called by individual dGPUs to react to the `amdgpu_aspm` module parameter without having negative results for other dGPUs on the PCIe bus. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-17 15:59:05 -05:00
Luben Tuikov	a6c40b1780	drm/amdgpu: Show IP discovery in sysfs Add IP discovery data in sysfs. The format is: /sys/class/drm/cardX/device/ip_discovery/die/D/B/I/<attrs> where, X is the card ID, an integer, D is the die ID, an integer, B is the IP HW ID, an integer, aka block type, I is the IP HW ID instance, an integer. <attrs> are the attributes of the block instance. At the moment these include HW ID, instance number, major, minor, revision, number of base addresses, and the base addresses themselves. A symbolic link of the acronym HW ID is also created, under D/, if you prefer to browse by something humanly accessible. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Tom StDenis <tom.stdenis@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-14 15:08:40 -05:00
Dave Airlie	123db17ddf	Merge tag 'amd-drm-next-5.18-2022-02-11-1' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.18-2022-02-11-1: amdgpu: - Clean up of power management code - Enable freesync video mode by default - Clean up of RAS code - Improve VRAM access for debug using SDMA - Coding style cleanups - SR-IOV fixes - More display FP reorg - TLB flush fixes for Arcuturus, Vega20 - Misc display fixes - Rework special register access methods for SR-IOV - DP2 fixes - DP tunneling fixes - DSC fixes - More IP discovery cleanups - Misc RAS fixes - Enable both SMU i2c buses where applicable - s2idle improvements - DPCS header cleanup - Add new CAP firmware support for SR-IOV amdkfd: - Misc cleanups - SVM fixes - CRIU support - Clean up MQD manager UAPI: - Add interface to amdgpu CTX ioctl to request a stable power state for profiling https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/207 - Add amdkfd support for CRIU https://github.com/checkpoint-restore/criu/pull/1709 - Remove old unused amdkfd debugger interface Was only implemented for Kaveri and was only ever used by an old HSA tool that was never open sourced radeon: - Fix error handling in radeon_driver_open_kms - UVD suspend fix - Misc fixes From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220211220706.5803-1-alexander.deucher@amd.com	2022-02-14 10:31:51 +10:00
Andrey Grodzovsky	89a7a87093	drm/amdgpu: Move in_gpu_reset into reset_domain We should have a single instance per entrire reset domain. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74116.html	2022-02-09 12:17:57 -05:00
Andrey Grodzovsky	d0fb18b535	drm/amdgpu: Move reset sem into reset_domain We want single instance of reset sem across all reset clients because in case of XGMI we should stop access cross device MMIO because any of them could be in a reset in the moment. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74117.html	2022-02-09 12:17:32 -05:00
Andrey Grodzovsky	cfbb6b0047	drm/amdgpu: Rework reset domain to be refcounted. The reset domain contains register access semaphor now and so needs to be present as long as each device in a hive needs it and so it cannot be binded to XGMI hive life cycle. Adress this by making reset domain refcounted and pointed by each member of the hive and the hive itself. v4: Fix crash on boot witrh XGMI hive by adding type to reset_domain. XGMI will only create a new reset_domain if prevoius was of single device type meaning it's first boot. Otherwsie it will take a refocunt to exsiting reset_domain from the amdgou device. Add a wrapper around reset_domain->refcount get/put and a wrapper around send to reset wq (Lijo) Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74121.html	2022-02-09 12:17:09 -05:00
Andrey Grodzovsky	54f329cc7a	drm/amdgpu: Serialize non TDR gpu recovery with TDRs Use reset domain wq also for non TDR gpu recovery trigers such as sysfs and RAS. We must serialize all possible GPU recoveries to gurantee no concurrency there. For TDR call the original recovery function directly since it's already executed from within the wq. For others just use a wrapper to qeueue work and wait on it to finish. v2: Rename to amdgpu_recover_work_struct Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74113.html	2022-02-09 12:15:23 -05:00
Andrey Grodzovsky	a4c63cafa5	drm/amdgpu: Introduce reset domain Defined a reset_domain struct such that all the entities that go through reset together will be serialized one against another. Do it for both single device and XGMI hive cases. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Suggested-by: Christian König <ckoenig.leichtzumerken@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74111.html	2022-02-09 12:14:32 -05:00

1 2 3 4 5 ...

1101 Commits