mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
15196 Commits

Author SHA1 Message Date
Alex Deucher
4c28e645aa drm/amdgpu/gmc7: fix wait_for_idle callers
The wait_for_idle signature was changed, but the callers
were not.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reported-by: Michel Dänzer <michel@daenzer.net>
Fixes: 82ae6619a4 ("drm/amdgpu: update the handle ptr in wait_for_idle")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Sunil Khatri <sunil.khatri@amd.com>
2024-11-21 15:56:22 -05:00
Thomas Weißschuh
c2753b2471 drm/amd/display: Add support for minimum backlight quirk
Not all platforms provide the full range of PWM backlight capabilities
supported by the hardware through ATIF.
Use the generic drm panel minimum backlight quirk infrastructure to
override the capabilities where necessary.

Testing the backlight quirk together with the "panel_power_savings"
sysfs file has not shown any negative impact.
One quirk is that 0% at panel_power_savings=0 seems to be
slightly darker than at panel_power_savings=4.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Tested-by: Dustin L. Howett <dustin@howett.net>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241111-amdgpu-min-backlight-quirk-v7-2-f662851fda69@weissschuh.net
2024-11-21 09:28:13 -06:00
Lijo Lazar
e283f4fb08 drm/amdgpu: Use reset recovery state checks
Some in_reset checks are in fact checking whether the state is
reinitialization after reset. Replace with reset_in_recovery calls to
identify that it's really checking for recovery stage after reset.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-20 10:03:05 -05:00
Lijo Lazar
a86e0c0e94 drm/amdgpu: Add init level for post reset reinit
When a device needs to be reset before initialization, it's not required
for all IPs to be initialized before a reset. In such cases, it needs to
identify whether the IP/feature is initialized for the first time or
whether it's reinitialized after a reset.

Add RESET_RECOVERY init level to identify post reset reinitialization
phase. This only provides a device level identification, IP/features may
choose to track their state independently also.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-20 10:03:05 -05:00
Mario Limonciello
349af06a3a drm/amd: Fix initialization mistake for NBIO 7.11 devices
There is a strapping issue on NBIO 7.11.x that can lead to spurious PME
events while in the D0 state.

Cc: stable@vger.kernel.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20241118174611.10700-2-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-20 10:03:05 -05:00
Asad Kamal
466a59abac drm/amd/pm: Get xgmi link status for XGMI_v_6_4_0
Get XGMI_v_6_4_0 link status and populate it to metrics v1_7 for
SMU_v_13_0_6

v2: Get link status register value for each soc from separate
function (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-20 09:36:48 -05:00
Linus Torvalds
0f25f0e4ef the bulk of struct fd memory safety stuff
Making sure that struct fd instances are destroyed in the same
 scope where they'd been created, getting rid of reassignments
 and passing them by reference, converting to CLASS(fd{,_pos,_raw}).
 
 We are getting very close to having the memory safety of that stuff
 trivial to verify.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull 'struct fd' class updates from Al Viro:
 "The bulk of struct fd memory safety stuff

  Making sure that struct fd instances are destroyed in the same scope
  where they'd been created, getting rid of reassignments and passing
  them by reference, converting to CLASS(fd{,_pos,_raw}).

  We are getting very close to having the memory safety of that stuff
  trivial to verify"

* tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
  deal with the last remaing boolean uses of fd_file()
  css_set_fork(): switch to CLASS(fd_raw, ...)
  memcg_write_event_control(): switch to CLASS(fd)
  assorted variants of irqfd setup: convert to CLASS(fd)
  do_pollfd(): convert to CLASS(fd)
  convert do_select()
  convert vfs_dedupe_file_range().
  convert cifs_ioctl_copychunk()
  convert media_request_get_by_fd()
  convert spu_run(2)
  switch spufs_calls_{get,put}() to CLASS() use
  convert cachestat(2)
  convert do_preadv()/do_pwritev()
  fdget(), more trivial conversions
  fdget(), trivial conversions
  privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget()
  o2hb_region_dev_store(): avoid goto around fdget()/fdput()
  introduce "fd_pos" class, convert fdget_pos() users to it.
  fdget_raw() users: switch to CLASS(fd_raw)
  convert vmsplice() to CLASS(fd)
  ...
2024-11-18 12:24:06 -08:00
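
The pull message for 0f25f0e4ef describes struct fd instances being destroyed in the same scope where they were created. As a rough userspace illustration of that scope-bound idea, and not the kernel's CLASS() macro itself, the sketch below uses the GCC/Clang `cleanup` variable attribute. `struct handle`, `handle_acquire()`, `handle_release()` and `use_handle()` are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical resource; stands in for something like a file reference. */
struct handle {
	int *open_count;	/* counter owned by the caller, for illustration */
	bool valid;
};

/* Destructor that runs automatically when a scoped handle leaves scope. */
static void handle_release(struct handle *h)
{
	if (h->valid) {
		(*h->open_count)--;
		h->valid = false;
	}
}

/* GCC/Clang extension: call handle_release() when the variable goes out of scope. */
#define SCOPED __attribute__((cleanup(handle_release)))

static struct handle handle_acquire(int *open_count)
{
	struct handle h = { .open_count = open_count, .valid = true };

	(*open_count)++;
	return h;
}

/* Every return path releases the handle without an explicit put or goto. */
static int use_handle(int *open_count, bool fail_early)
{
	struct handle h SCOPED = handle_acquire(open_count);

	assert(h.valid);
	if (fail_early)
		return -1;		/* handle_release(&h) still runs here */
	return *open_count;		/* evaluated before the cleanup runs */
}
```

Because the release is tied to the variable's scope, an early return cannot leak the reference, which is the property that makes this style easy to verify by inspection.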
Thomas Zimmermann
b86711c6d6 drm/client: Move public client header to clients/ subdirectory
Move the public header file drm_client_setup.h to the clients/
subdirectory and update all drivers. No functional changes.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241108154600.126162-3-tzimmermann@suse.de
2024-11-15 09:42:13 +01:00
Vijendar Mukunda
7013a8268d drm/amd: Fix initialization mistake for NBIO 7.7.0
There is a strapping issue on NBIO 7.7.0 that can lead to spurious PME
events while in the D0 state.

Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20241112161142.28974-1-mario.limonciello@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 447a54a0f7)
Cc: stable@vger.kernel.org
2024-11-12 17:37:39 -05:00
Christian König
5a67c31669 drm/amdgpu: enable GTT fallback handling for dGPUs only
That is just a waste of time on APUs.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3704
Fixes: 216c1282dd ("drm/amdgpu: use GTT only as fallback for VRAM|GTT")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e8fc090d32)
Cc: stable@vger.kernel.org
2024-11-12 17:37:38 -05:00
Vijendar Mukunda
447a54a0f7 drm/amd: Fix initialization mistake for NBIO 7.7.0
There is a strapping issue on NBIO 7.7.0 that can lead to spurious PME
events while in the D0 state.

Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20241112161142.28974-1-mario.limonciello@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-12 17:10:40 -05:00
Christian König
e8fc090d32 drm/amdgpu: enable GTT fallback handling for dGPUs only
That is just a waste of time on APUs.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3704
Fixes: 216c1282dd ("drm/amdgpu: use GTT only as fallback for VRAM|GTT")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-12 17:10:05 -05:00
Shaoyun Liu
8521e3c5f0 drm/amd/amdgpu: limit single process inside MES
This lets MES limit the user queues to a single process.

Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-12 17:02:04 -05:00
Jack Xiao
79365ea707 drm/amdgpu/mes12: correct kiq unmap latency
Correct kiq unmap queue timeout value.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit cfe98204a0)
Cc: stable@vger.kernel.org # 6.11.x
2024-11-11 14:05:51 -05:00
Christian König
0e5ac88fb9 drm/amdgpu: fix check in gmc_v9_0_get_vm_pte()
The coherency flags can only be determined when the BO is locked and that
in turn is only guaranteed when the mapping is validated.

Fix the check, move the resource check into the function and add an assert
that the BO is locked.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: d1a372af1c ("drm/amdgpu: Set MTYPE in PTE based on BO flags")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1b4ca8546f)
Cc: stable@vger.kernel.org
2024-11-11 14:05:44 -05:00
David Rosca
d641a151fc drm/amdgpu: Fix video caps for H264 and HEVC encode maximum size
H264 supports 4096x4096 starting from Polaris.
HEVC also supports 4096x4096, with VCN 3 and newer 8192x4352
is supported.

Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 69e9a9e65b)
Cc: stable@vger.kernel.org
2024-11-11 14:05:36 -05:00
Jack Xiao
cfe98204a0 drm/amdgpu/mes12: correct kiq unmap latency
Correct kiq unmap queue timeout value.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11 12:22:58 -05:00
Advait Dhamorikar
408d208127 drm/amdgpu: Cleanup shift coding style
Improves the coding style by updating bit-shift
operations in the amdgpu_jpeg.c driver file.
It ensures consistency and avoids potential issues
by explicitly using 1U and 1ULL for unsigned
and unsigned long long shifts in all relevant instances.

Signed-off-by: Advait Dhamorikar <advaitdhamorikar@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11 11:56:03 -05:00
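
Commit 408d208127 switches bit shifts to explicit 1U and 1ULL constants. A minimal sketch of why that matters follows; `bit32()` and `bit64()` are hypothetical helpers written for this example and are not taken from amdgpu_jpeg.c.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 32-bit mask with one bit set; 1U keeps the shift unsigned. */
static inline uint32_t bit32(unsigned int n)
{
	assert(n < 32);
	return 1U << n;		/* a plain 1 << 31 overflows a signed int */
}

/* Build a 64-bit mask; 1ULL makes the shift happen in 64 bits. */
static inline uint64_t bit64(unsigned int n)
{
	assert(n < 64);
	return 1ULL << n;	/* a plain 1 << n is computed in int width */
}
```

With a plain `1`, the shift is evaluated as a signed `int` before any conversion to the wider destination type, so high bits are lost or the behaviour is undefined.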
shaoyunl
92fd1714ee drm/amd/amdgpu: Increase MES log buffer to dump mes scratch data
MES internal scratch data is useful for MES debugging. It can only be located
in VRAM, so change the allocation type and increase the size for MES 11.

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Feifei Xu <Feifei.Xu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11 11:55:49 -05:00
Victor Skvortsov
84a2947ecc drm/amdgpu: Implement virt req_ras_err_count
Enable RAS late init if VF RAS Telemetry is supported.

When enabled, the VF can use this interface to query total
RAS error counts from the host.

The VF FB access may abruptly end due to a fatal error,
therefore the VF must cache and sanitize the input.

The Host allows 15 Telemetry messages every 60 seconds, after which
the host will ignore any more incoming telemetry messages. The VF will
rate limit its msg calling to once every 5 seconds (12 times in 60 seconds).
While the VF is rate limited, it will continue to report the last
good cached data.

v2: Flip generate report & update statistics order for VF

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Acked-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11 11:55:42 -05:00
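
Commit 84a2947ecc describes a VF that sends at most one telemetry message every 5 seconds (60 / 5 = 12 per minute, under the host limit of 15) and keeps reporting the last good cached data in between. The sketch below shows one way such a rate-limited cache could look. The structure, the function names and the `fake_host_query()` stand-in are assumptions made for illustration, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QUERY_INTERVAL_SEC 5	/* from the commit text: once every 5 seconds */

struct ras_cache {
	uint64_t last_query_sec;	/* time of the last message to the host */
	uint64_t err_count;		/* last good value received */
	bool queried;			/* true once any message has been sent */
};

/* Hypothetical callback standing in for the real host message. */
typedef bool (*host_query_fn)(uint64_t *count);

static uint64_t ras_get_err_count(struct ras_cache *c, uint64_t now_sec,
				  host_query_fn query)
{
	uint64_t fresh;

	assert(c && query);
	/* While rate limited, keep reporting the last good cached data. */
	if (c->queried && now_sec - c->last_query_sec < QUERY_INTERVAL_SEC)
		return c->err_count;

	c->queried = true;
	c->last_query_sec = now_sec;
	if (query(&fresh))
		c->err_count = fresh;	/* only overwrite on a good reply */
	return c->err_count;
}

/* Illustrative fake host so the sketch can be exercised in userspace. */
static uint64_t fake_host_value;
static int fake_host_calls;

static bool fake_host_query(uint64_t *count)
{
	fake_host_calls++;
	*count = fake_host_value;
	return true;
}
```

A failed query leaves the cached value untouched but still counts against the interval, so a broken host link cannot push the VF over the message budget.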
Victor Skvortsov
907fec2dfd drm/amdgpu: VF Query RAS Caps from Host if supported
If VF RAS Capability support is enabled, guest is able to
retrieve the real RAS support from the host.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11 11:55:36 -05:00
Victor Skvortsov
9928509dfc drm/amdgpu: Add msg handlers for SRIOV RAS Telemetry
Add message handlers for RAS telemetry.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11 11:55:08 -05:00
Victor Skvortsov
60c58d72af drm/amdgpu: Update SRIOV Exchange Headers for RAS Telemetry Support
The SRIOV PF/VF Data exchange is extended by 64KB for VF RAS Telemetry data.
Add Host RAS Telemetry enable capabilities bitfields.
Add a new VF msg REQ_RAS_ERROR_COUNT, the host response data will be populated
in the RAS Telemetry region.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11 11:55:01 -05:00
Srinivasan Shanmugam
dfb214ec91 drm/amdgpu/gfx11: Enable cleaner shader for GFX11.0.0/11.0.2 GPUs
Enable the cleaner shader for GFX11.0.0/11.0.2 GPUs to provide data
isolation between GPU workloads. The cleaner shader is responsible for
clearing the Local Data Store (LDS), Vector General Purpose Registers
(VGPRs), and Scalar General Purpose Registers (SGPRs), which helps
prevent data leakage and ensures accurate computation results.

This update extends cleaner shader support to GFX11.0.0/11.0.2 GPUs,
previously available for GFX11.0.3. It enhances security by clearing GPU
memory between processes and maintains a consistent GPU state across KGD
and KFD workloads.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-08 11:45:29 -05:00
Christian König
1b4ca8546f drm/amdgpu: fix check in gmc_v9_0_get_vm_pte()
The coherency flags can only be determined when the BO is locked and that
in turn is only guaranteed when the mapping is validated.

Fix the check, move the resource check into the function and add an assert
that the BO is locked.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: d1a372af1c ("drm/amdgpu: Set MTYPE in PTE based on BO flags")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-08 11:45:29 -05:00
Ramesh Errabolu
8e29057eec drm/amdgpu: Inform if PCIe based P2P links are not available
Raise an info message in kernel log if PCIe root complex
determines that an AMD GPU device D<i> cannot have P2P
communication with another AMD GPU device D<j>.

Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-08 11:45:29 -05:00
David Rosca
69e9a9e65b drm/amdgpu: Fix video caps for H264 and HEVC encode maximum size
H264 supports 4096x4096 starting from Polaris.
HEVC also supports 4096x4096, with VCN 3 and newer 8192x4352
is supported.

Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-08 11:45:29 -05:00
Jesse.zhang@amd.com
96f0b56c34 drm/amdgpu: Add sysfs interface for jpeg reset mask
Add the sysfs interface for jpeg:
jpeg_reset_mask

The interface is read-only and shows the resets supported by the IP.
For example, full adapter reset (mode1/mode2/BACO/etc),
soft reset, queue reset, and pipe reset.

V2: the sysfs node returns a text string instead of some flags (Christian)
v3: add a generic helper which takes the ring as parameter
    and print the strings in the order they are applied (Christian)

    check amdgpu_gpu_recovery before creating sysfs file itself,
    and initialize supported_reset_types in IP version files (Lijo)

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-08 11:45:29 -05:00
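
The changelog of 96f0b56c34 notes that the sysfs node returns a text string rather than raw flags, printed in the order the resets are applied. A minimal sketch of turning a mask into such a string follows. The flag values, their names and the order of the table are assumptions for illustration; the real driver's definitions may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag values, not the driver's actual definitions. */
enum reset_type {
	RESET_SOFT  = 1 << 0,
	RESET_QUEUE = 1 << 1,
	RESET_PIPE  = 1 << 2,
	RESET_FULL  = 1 << 3,
};

/*
 * Write the names of the set flags into buf, separated by spaces, in
 * table order. Returns the string length, or -1 if buf is too small.
 */
static int format_reset_mask(unsigned int mask, char *buf, size_t len)
{
	static const struct {
		unsigned int flag;
		const char *name;
	} names[] = {
		{ RESET_SOFT,  "soft"  },
		{ RESET_QUEUE, "queue" },
		{ RESET_PIPE,  "pipe"  },
		{ RESET_FULL,  "full"  },
	};
	size_t used = 0;

	assert(buf && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		int n;

		if (!(mask & names[i].flag))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			return -1;	/* output would be truncated */
		used += (size_t)n;
	}
	return (int)used;
}
```

Keeping the names in one ordered table means the output order is decided in a single place, which matches the idea of a generic helper shared by several IP blocks.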
Jesse.zhang@amd.com
ea02ea9437 drm/amdgpu: Add sysfs interface for vpe reset mask
Add the sysfs interface for vpe:
    vpe_reset_mask

The interface is read-only and shows the resets supported by the IP.
For example, full adapter reset (mode1/mode2/BACO/etc),
soft reset, queue reset, and pipe reset.

V2: the sysfs node returns a text string instead of some flags (Christian)
v3: add a generic helper which takes the ring as parameter
    and print the strings in the order they are applied (Christian)

    check amdgpu_gpu_recovery before creating sysfs file itself,
    and initialize supported_reset_types in IP version files (Lijo)

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-08 11:45:29 -05:00
Jesse.zhang@amd.com
59fd50b866 drm/amdgpu: Add sysfs interface for sdma reset mask
Add the sysfs interface for sdma:
sdma_reset_mask

The interface is read-only and shows the resets supported by the IP.
For example, full adapter reset (mode1/mode2/BACO/etc),
soft reset, queue reset, and pipe reset.

V2: the sysfs node returns a text string instead of some flags (Christian)
v3: add a generic helper which takes the ring as parameter
   and print the strings in the order they are applied (Christian)

   check amdgpu_gpu_recovery before creating sysfs file itself,
   and initialize supported_reset_types in IP version files (Lijo)

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-08 11:45:18 -05:00
Sathishkumar S
edd345f7ef drm/amdgpu: Normalize reg offsets on VCN v4.0.3
Remote access to external AIDs isn't possible with VCN RRMT disabled
and it is disabled on SoCs with GC 9.4.4, so use only local offsets.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-08 11:11:49 -05:00
Lijo Lazar
7b1ebbe856 drm/amdgpu: Avoid kcq disable during reset
Reset sequence indicates that hardware already ran into a bad state.
Avoid sending unmap queue request to reset KCQ. This will also cover RAS
error scenarios which need a reset to recover, hence remove the check.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-08 11:11:42 -05:00
Lijo Lazar
fa31798582 drm/amdgpu: Fix map/unmap queue logic
In current logic, it calls ring_alloc followed by a ring_test. ring_test
in turn will call another ring_alloc. This is illegal usage as a
ring_alloc is expected to be closed properly with a ring_commit. Change
to commit the map/unmap queue packet first followed by a ring_test. Add a
comment about the usage of ring_test.

Also, reorder the current pre-condition checks of job hang or kiq ring
scheduler not ready. Without them being met, it is not useful to attempt
ring or memory allocations.

Fixes tag refers to the original patch which introduced this issue which
then got carried over into newer code.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: 6c10b5cc4e ("drm/amdgpu: Remove duplicate code in gfx_v8_0.c")
2024-11-08 11:10:00 -05:00
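
Commit fa31798582 explains that a ring_alloc must be closed with a ring_commit before ring_test performs its own ring_alloc. The toy model below makes that pairing rule visible by rejecting a nested allocation. All `toy_*` and `map_queue_*` names are hypothetical, and the real ring code does not necessarily fail this way; the sketch only illustrates the ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy ring: tracks whether an allocation is open and how many were committed. */
struct toy_ring {
	bool open;
	unsigned int committed;
};

static int toy_ring_alloc(struct toy_ring *r)
{
	if (r->open)
		return -1;	/* nested alloc without a commit in between */
	r->open = true;
	return 0;
}

static void toy_ring_commit(struct toy_ring *r)
{
	assert(r->open);
	r->open = false;
	r->committed++;
}

/* The test submits its own packet, so it needs its own alloc/commit pair. */
static int toy_ring_test(struct toy_ring *r)
{
	if (toy_ring_alloc(r))
		return -1;
	toy_ring_commit(r);
	return 0;
}

/* Problematic order: the test runs while the first allocation is still open. */
static int map_queue_nested(struct toy_ring *r)
{
	int ret;

	if (toy_ring_alloc(r))
		return -1;
	ret = toy_ring_test(r);
	toy_ring_commit(r);
	return ret;
}

/* Fixed order: commit the map/unmap packet first, then run the test. */
static int map_queue_fixed(struct toy_ring *r)
{
	if (toy_ring_alloc(r))
		return -1;
	toy_ring_commit(r);
	return toy_ring_test(r);
}
```

In the fixed order both the queue packet and the test packet are committed, giving two complete alloc/commit pairs.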
Yang Wang
2bb7dced1c drm/amdgpu: fix ACA bank count boundary check error
Fix the ACA bank count boundary check error.

Fixes: f5e4cc8461 ("drm/amdgpu: implement RAS ACA driver framework")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-08 11:08:38 -05:00
Jesse.zhang@amd.com
6c8d1f4b04 drm/amdgpu: Add sysfs interface for gc reset mask
Add two sysfs interfaces for gfx and compute:
gfx_reset_mask
compute_reset_mask

These interfaces are read-only and show the resets supported by the IP.
For example, full adapter reset (mode1/mode2/BACO/etc),
soft reset, queue reset, and pipe reset.

V2: the sysfs node returns a text string instead of some flags (Christian)
v3: add a generic helper which takes the ring as parameter
    and print the strings in the order they are applied (Christian)

    check amdgpu_gpu_recovery before creating sysfs file itself,
    and initialize supported_reset_types in IP version files (Lijo)
v4: Fixing uninitialized variables (Tim)

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-08 11:08:01 -05:00
chongli2
f4a3246a2c drm/amdgpu: fix return random value when multiple threads read registers via mes.
The current code uses the address "adev->mes.read_val_ptr" to
store the value read from a register via MES.
So when multiple threads read registers,
they all share that one address
and overwrite each other's values.

Assign an address with "amdgpu_device_wb_get" to store the register value.
Each thread will then have its own address to store the register value.

Signed-off-by: chongli2 <chongli2@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-08 11:07:50 -05:00
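
Commit f4a3246a2c replaces one shared result address with a per-caller address so that concurrent register reads stop overwriting each other. The sketch below models that idea with a small slot pool using C11 atomics. `slot_get()`, `slot_put()` and the pool size are hypothetical and only loosely inspired by the description; they are not the amdgpu_device_wb_get implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLOTS 8	/* arbitrary pool size for the example */

static atomic_bool slot_used[NUM_SLOTS];	/* zero-initialized: all free */
static uint32_t slot_value[NUM_SLOTS];		/* one result word per slot */

/* Claim a free slot; returns its index, or -1 if the pool is exhausted. */
static int slot_get(void)
{
	for (int i = 0; i < NUM_SLOTS; i++) {
		/* atomic_exchange returns the old value: false means we won it */
		if (!atomic_exchange(&slot_used[i], true))
			return i;
	}
	return -1;
}

/* Return a slot to the pool once its value has been consumed. */
static void slot_put(int i)
{
	assert(i >= 0 && i < NUM_SLOTS);
	atomic_store(&slot_used[i], false);
}
```

Each caller writes its result to `slot_value[i]` for the slot it owns, so two readers in flight at the same time never share a destination.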
Asad Kamal
04e9101766 drm/amdgpu: Add supported NPS modes node
Add sysfs node to show supported NPS mode for the
partition configuration selected using xcp_config

v2: Hide node if dynamic nps switch not supported

v3: Fix removal of files in case of error

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-08 11:07:34 -05:00
Dave Airlie
1f8bdc31c7 amd-drm-next-6.13-2024-11-06:
amdgpu:
 - Misc cleanups
 - OLED fixes
 - DCN 4.x fixes
 - DCN 3.5 fixes
 - 8K fixes
 - IPS fixes
 - DSC fixes
 - S3 fix
 - KASAN fix
 - SMU13 fixes
 - fdinfo fixes
 - USB-C fixes
 - ACPI fix
 - Fix dummy page overlapping mappings
 - Fix workload profile handling
 - Add user control for zero RPM on SMU13
 - Cleaner shader updates
 - Stop syncing PRT map operations
 - Debugfs permissions fixes
 - Debugfs bounds check fix
 - RAS cleanups
 - Enforce isolation updates
 
 amdkfd:
 - Add topology cap flag for per queue reset
 - Add an interface to query whether KFD queues are present
 - Use dynamic allocation for get_cu_occupancy

Merge tag 'amd-drm-next-6.13-2024-11-06' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.13-2024-11-06:

amdgpu:
- Misc cleanups
- OLED fixes
- DCN 4.x fixes
- DCN 3.5 fixes
- 8K fixes
- IPS fixes
- DSC fixes
- S3 fix
- KASAN fix
- SMU13 fixes
- fdinfo fixes
- USB-C fixes
- ACPI fix
- Fix dummy page overlapping mappings
- Fix workload profile handling
- Add user control for zero RPM on SMU13
- Cleaner shader updates
- Stop syncing PRT map operations
- Debugfs permissions fixes
- Debugfs bounds check fix
- RAS cleanups
- Enforce isolation updates

amdkfd:
- Add topology cap flag for per queue reset
- Add an interface to query whether KFD queues are present
- Use dynamic allocation for get_cu_occupancy

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241106163904.189108-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-11-08 12:04:24 +10:00
Alex Deucher
4d75b94680 drm/amdgpu: add missing size check in amdgpu_debugfs_gprwave_read()
Avoid a possible buffer overflow if size is larger than 4K.

Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f5d873f582)
Cc: stable@vger.kernel.org
2024-11-05 10:54:11 -05:00
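
Commit 4d75b94680 adds a size check to avoid overflowing a buffer when the requested size exceeds 4K. A minimal userspace sketch of that kind of guard is shown below. `bounded_read()`, its parameters and the choice of `-EINVAL` are assumptions for illustration and do not reproduce the actual debugfs handler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define STAGING_SIZE 4096	/* the 4K limit mentioned in the commit text */

/*
 * Copy up to 'size' bytes from src to dst through a fixed staging buffer.
 * Requests larger than the staging buffer are rejected before any copy.
 */
static long bounded_read(char *dst, size_t size, const char *src, size_t src_len)
{
	char staging[STAGING_SIZE];

	assert(dst && src);
	if (size > sizeof(staging))
		return -EINVAL;		/* would overflow the staging buffer */
	if (size > src_len)
		size = src_len;		/* never read past the source */

	memcpy(staging, src, size);
	memcpy(dst, staging, size);
	return (long)size;
}
```

The check has to come before the first copy; validating the size afterwards would be too late to prevent the overflow.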
Alex Deucher
f790a2c494 drm/amdgpu: Adjust debugfs eviction and IB access permissions
Users should not be able to run these.

Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7ba9395430)
Cc: stable@vger.kernel.org
2024-11-05 10:53:48 -05:00
Alex Deucher
b46dadf7e3 drm/amdgpu: Adjust debugfs register access permissions
Regular users shouldn't have read access.

Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c0cfd2e652)
Cc: stable@vger.kernel.org
2024-11-05 10:53:21 -05:00
Lijo Lazar
3ce3f85787 drm/amdgpu: Fix DPX valid mode check on GC 9.4.3
For DPX mode, the number of memory partitions supported should be less
than or equal to 2.

Fixes: 1589c82a10 ("drm/amdgpu: Check memory ranges for valid xcp mode")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 990c4f5807)
Cc: stable@vger.kernel.org
2024-11-05 10:52:40 -05:00
Alex Deucher
f5d873f582 drm/amdgpu: add missing size check in amdgpu_debugfs_gprwave_read()
Avoid a possible buffer overflow if size is larger than 4K.

Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05 10:35:59 -05:00
Alex Deucher
7ba9395430 drm/amdgpu: Adjust debugfs eviction and IB access permissions
Users should not be able to run these.

Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05 10:35:56 -05:00
Alex Deucher
c0cfd2e652 drm/amdgpu: Adjust debugfs register access permissions
Regular users shouldn't have read access.

Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05 10:35:49 -05:00
Christian König
bc56678184 drm/amdgpu: stop syncing PRT map operations
Requested by both Bas and Friedrich. Mapping PTEs as PRT doesn't need to
sync for anything.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05 10:35:39 -05:00
Prike Liang
e2e9743578 drm/amdgpu: set the right AMDGPU sg segment limitation
The driver needs to set the correct max_segment_size;
otherwise debug_dma_map_sg() will complain about the
over-mapping of the AMDGPU sg length as follows:

WARNING: CPU: 6 PID: 1964 at kernel/dma/debug.c:1178 debug_dma_map_sg+0x2dc/0x370
[  364.049444] Modules linked in: veth amdgpu(OE) amdxcp drm_exec gpu_sched drm_buddy drm_ttm_helper ttm(OE) drm_suballoc_helper drm_display_helper drm_kms_helper i2c_algo_bit rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat xt_addrtype iptable_filter br_netfilter nvme_fabrics overlay nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bridge stp llc amd_atl intel_rapl_msr intel_rapl_common sunrpc sch_fq_codel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg edac_mce_amd binfmt_misc snd_hda_codec snd_pci_acp6x snd_hda_core snd_acp_config snd_hwdep snd_soc_acpi kvm_amd snd_pcm kvm snd_seq_midi snd_seq_midi_event crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 snd_rawmidi sha256_ssse3 sha1_ssse3 aesni_intel snd_seq nls_iso8859_1 crypto_simd snd_seq_device cryptd snd_timer rapl input_leds snd
[  364.049532]  ipmi_devintf wmi_bmof ccp serio_raw k10temp sp5100_tco soundcore ipmi_msghandler cm32181 industrialio mac_hid msr parport_pc ppdev lp parport drm efi_pstore ip_tables x_tables pci_stub crc32_pclmul nvme ahci libahci i2c_piix4 r8169 nvme_core i2c_designware_pci realtek i2c_ccgx_ucsi video wmi hid_generic cdc_ether usbnet usbhid hid r8152 mii
[  364.049576] CPU: 6 PID: 1964 Comm: rocminfo Tainted: G           OE      6.10.0-custom #492
[  364.049579] Hardware name: AMD Majolica-RN/Majolica-RN, BIOS RMJ1009A 06/13/2021
[  364.049582] RIP: 0010:debug_dma_map_sg+0x2dc/0x370
[  364.049585] Code: 89 4d b8 e8 36 b1 86 00 8b 4d b8 48 8b 55 b0 44 8b 45 a8 4c 8b 4d a0 48 89 c6 48 c7 c7 00 4b 74 bc 4c 89 4d b8 e8 b4 73 f3 ff <0f> 0b 4c 8b 4d b8 8b 15 c8 2c b8 01 85 d2 0f 85 ee fd ff ff 8b 05
[  364.049588] RSP: 0018:ffff9ca600b57ac0 EFLAGS: 00010286
[  364.049590] RAX: 0000000000000000 RBX: ffff88b7c132b0c8 RCX: 0000000000000027
[  364.049592] RDX: ffff88bb0f521688 RSI: 0000000000000001 RDI: ffff88bb0f521680
[  364.049594] RBP: ffff9ca600b57b20 R08: 000000000000006f R09: ffff9ca600b57930
[  364.049596] R10: ffff9ca600b57928 R11: ffffffffbcb46328 R12: 0000000000000000
[  364.049597] R13: 0000000000000001 R14: ffff88b7c19c0700 R15: ffff88b7c9059800
[  364.049599] FS:  00007fb2d3516e80(0000) GS:ffff88bb0f500000(0000) knlGS:0000000000000000
[  364.049601] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  364.049603] CR2: 000055610bd03598 CR3: 00000001049f6000 CR4: 0000000000350ef0
[  364.049605] Call Trace:
[  364.049607]  <TASK>
[  364.049609]  ? show_regs+0x6d/0x80
[  364.049614]  ? __warn+0x8c/0x140
[  364.049618]  ? debug_dma_map_sg+0x2dc/0x370
[  364.049621]  ? report_bug+0x193/0x1a0
[  364.049627]  ? handle_bug+0x46/0x80
[  364.049631]  ? exc_invalid_op+0x1d/0x80
[  364.049635]  ? asm_exc_invalid_op+0x1f/0x30
[  364.049642]  ? debug_dma_map_sg+0x2dc/0x370
[  364.049647]  __dma_map_sg_attrs+0x90/0xe0
[  364.049651]  dma_map_sgtable+0x25/0x40
[  364.049654]  amdgpu_bo_move+0x59a/0x850 [amdgpu]
[  364.049935]  ? srso_return_thunk+0x5/0x5f
[  364.049939]  ? amdgpu_ttm_tt_populate+0x5d/0xc0 [amdgpu]
[  364.050095]  ttm_bo_handle_move_mem+0xc3/0x180 [ttm]
[  364.050103]  ttm_bo_validate+0xc1/0x160 [ttm]
[  364.050108]  ? amdgpu_ttm_tt_get_user_pages+0xe5/0x1b0 [amdgpu]
[  364.050263]  amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0xa12/0xc90 [amdgpu]
[  364.050473]  kfd_ioctl_alloc_memory_of_gpu+0x16b/0x3b0 [amdgpu]
[  364.050680]  kfd_ioctl+0x3c2/0x530 [amdgpu]
[  364.050866]  ? __pfx_kfd_ioctl_alloc_memory_of_gpu+0x10/0x10 [amdgpu]
[  364.051054]  ? srso_return_thunk+0x5/0x5f
[  364.051057]  ? tomoyo_file_ioctl+0x20/0x30
[  364.051063]  __x64_sys_ioctl+0x9c/0xd0
[  364.051068]  x64_sys_call+0x1219/0x20d0
[  364.051073]  do_syscall_64+0x51/0x120
[  364.051077]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  364.051081] RIP: 0033:0x7fb2d2f1a94f

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05 10:35:26 -05:00
Lijo Lazar
990c4f5807 drm/amdgpu: Fix DPX valid mode check on GC 9.4.3
For DPX mode, the number of memory partitions supported should be less
than or equal to 2.

Fixes: 1589c82a10 ("drm/amdgpu: Check memory ranges for valid xcp mode")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05 10:34:06 -05:00
Srinivasan Shanmugam
949d817c78 drm/amdgpu/gfx11: Add cleaner shader for GFX11.0.3
This commit adds the cleaner shader microcode for GFX11.0.3 GPUs. The
cleaner shader is a piece of GPU code that is used to clear or
initialize certain GPU resources, such as Local Data Share (LDS), Vector
General Purpose Registers (VGPRs), and Scalar General Purpose Registers
(SGPRs).

Clearing these resources is important for ensuring data isolation
between different workloads running on the GPU. Without the cleaner
shader, residual data from a previous workload could potentially be
accessed by a subsequent workload, leading to data leaks and incorrect
computation results.

The cleaner shader microcode is represented as an array of 32-bit words
(`gfx_11_0_3_cleaner_shader_hex`). This array is the binary
representation of the cleaner shader code, which is written in a
low-level GPU instruction set.

When the cleaner shader feature is enabled, the AMDGPU driver loads this
array into a specific location in the GPU memory. The GPU then reads
this memory location to fetch and execute the cleaner shader
instructions.

The cleaner shader is executed automatically by the GPU at the end of
each workload, before the next workload starts. This ensures that all
GPU resources are in a clean state before the start of each workload.

This addition is part of the cleaner shader feature implementation. The
cleaner shader feature helps resource utilization by cleaning up GPU
resources after they are used. It also enhances security and reliability
by preventing data leaks between workloads.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05 10:33:56 -05:00
Alex Deucher
e89bd3615b drm/amdgpu/mes: fetch fw version from firmware header
We need this prior to the firmware being loaded so fetch
from the header.

v2: fetch directly from the firmware
v3: store both fw versions

Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05 10:33:39 -05:00
Thomas Weißschuh
b626816fdd sysfs: treewide: constify attribute callback of bin_is_visible()
The is_bin_visible() callbacks should not modify the struct
bin_attribute passed as argument.
Enforce this by marking the argument as const.

As there are not many callback implementers, perform this change
throughout the tree at once.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Krzysztof Wilczyński <kw@linux.com>
Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-5-71110628844c@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-05 14:00:28 +01:00
Antonio Quartulli
a6dd15981c drm/amdgpu: prevent NULL pointer dereference if ATIF is not supported
acpi_evaluate_object() may return AE_NOT_FOUND (failure), which
would result in dereferencing buffer.pointer (obj) while being NULL.

Although this case may be unrealistic for the current code, it is
still better to protect against possible bugs.

Bail out also when status is AE_NOT_FOUND.

This fixes 1 FORWARD_NULL issue reported by Coverity
Report: CID 1600951:  Null pointer dereferences  (FORWARD_NULL)

Signed-off-by: Antonio Quartulli <antonio@mandelbit.com>
Fixes: c9b7c809b8 ("drm/amd: Guard against bad data for ATIF ACPI method")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241031152848.4716-1-antonio@mandelbit.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 91c9e221fe)
Cc: stable@vger.kernel.org
2024-11-04 12:48:21 -05:00
Lijo Lazar
e5ad71779d drm/amdgpu: Add compatible NPS mode info
Populate the compatible NPS modes also for providing partition
configuration details through sysfs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 12:06:23 -05:00
Lijo Lazar
81db4eab28 drm/amdgpu: Skip IP coredump for RAS errors
For RAS errors, the source of the error is known. Skip the core dump of IP
states.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 12:06:23 -05:00
Lijo Lazar
047767ddc9 drm/amdgpu: Group gfx sysfs functions
Make the amdgpu_gfx_sysfs_init/fini functions the common entry points for
all gfx-related sysfs nodes.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 12:06:23 -05:00
Candice Li
12e5df81bb drm/amdgpu: Add nps_mode in RAS init_flag
Add nps_mode in RAS init_flag.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 12:06:23 -05:00
Jesse Zhang
d2e3961ae3 drm/amdgpu: add amdgpu_sdma_sched_mask debugfs
Userspace wants to run jobs on a specific sdma ring for verification purposes.
This debugfs entry helps to disable or enable submitting jobs to a specific ring.
This entry is populated only if there are at least two cores in the sdma ip.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 12:06:15 -05:00
Jesse Zhang
c5c63d9cb5 drm/amdgpu: add amdgpu_gfx_sched_mask and amdgpu_compute_sched_mask debugfs
compute/gfx may have multiple rings on some hardware.
In some cases, userspace wants to run jobs on a specific ring for validation purposes.
This debugfs entry helps to disable or enable submitting jobs to a specific ring.
This entry is populated only if there are at least two cores in the gfx/compute ip.
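
A rough Python model of the mask semantics described above, not the
driver code. It assumes that bit i of the written value keeps ring i
enabled and that a mask disabling every ring is rejected; both the
function name and those two rules are assumptions made for illustration.

```python
def apply_sched_mask(num_rings, mask):
    """Return a per-ring list of 'accepts new jobs' flags for a written mask.

    Assumption: bit i set means ring i stays enabled.
    """
    if num_rings < 2:
        # Mirrors the message: the entry only exists with at least two.
        raise ValueError("mask entry is not populated for fewer than two rings")
    valid_bits = (1 << num_rings) - 1
    if (mask & valid_bits) == 0:
        # Hypothetical safety rule: never disable every ring at once.
        raise ValueError("mask must leave at least one ring enabled")
    return [bool(mask & (1 << i)) for i in range(num_rings)]
```

With four rings, a mask of 0b1010 would leave only rings 1 and 3
accepting jobs under these assumptions.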

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 12:05:53 -05:00
Prike Liang
b78612939d drm/amdgpu: Fix dummy_read_page overlapping mappings
Map the dummy read page with dma_map_page_attrs() and the
DMA_ATTR_SKIP_CPU_SYNC attribute to handle its overlapping
mappings.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 12:05:30 -05:00
Victor Zhao
afe260df55 drm/amdgpu: skip amdgpu_device_cache_pci_state under sriov
Under sriov, the host driver saves and restores the vf pci cfg space during
reset. During device init under sriov, pci_restore_state happens after
full access is released, so it can race with the host side enabling mmio
protection, leading to missing interrupts.

So skip amdgpu_device_cache_pci_state for sriov.

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 12:05:30 -05:00
Antonio Quartulli
91c9e221fe drm/amdgpu: prevent NULL pointer dereference if ATIF is not supported
acpi_evaluate_object() may return AE_NOT_FOUND (failure), which
would result in dereferencing buffer.pointer (obj) while being NULL.

Although this case may be unrealistic for the current code, it is
still better to protect against possible bugs.

Bail out also when status is AE_NOT_FOUND.

This fixes 1 FORWARD_NULL issue reported by Coverity
Report: CID 1600951:  Null pointer dereferences  (FORWARD_NULL)

Signed-off-by: Antonio Quartulli <antonio@mandelbit.com>
Fixes: c9b7c809b8 ("drm/amd: Guard against bad data for ATIF ACPI method")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241031152848.4716-1-antonio@mandelbit.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 12:05:16 -05:00
R Sundar
b95264cf75 drm/amdgpu: use string choice helpers
Use string choice helpers for better readability.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Closes: https://lore.kernel.org/r/202410161814.I6p2Nnux-lkp@intel.com/
Signed-off-by: R Sundar <prosunofficial@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 11:33:24 -05:00
jeffbai@aosc.io
0174c0791c drm/amdgpu: fix comment about amdgpu.abmlevel defaults
Since 040fdcde28 ("drm/amdgpu: respect the abmlevel module parameter value
if it is set"), the default value for amdgpu.abmlevel was set to -1, or auto.
However, the comment explaining the default value was not updated to reflect
the change (-1, or auto; not -1, or disabled).

Clarify that the default value (-1) means auto.

Fixes: 040fdcde28 ("drm/amdgpu: respect the abmlevel module parameter value if it is set")
Reported-by: Ruikai Liu <rickliu2000@outlook.com>
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 11:32:57 -05:00
Tvrtko Ursulin
aa2ac51c8e drm/amdgpu: Expose special on chip memory pools in fdinfo
In the past these specialized on chip memory pools were reported as system
memory (aka 'cpu') which was not correct and misleading. That has since
been removed so let's make them visible as their own respective memory
regions.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Yunxiang Li <Yunxiang.Li@amd.com>
Cc: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 11:32:52 -05:00
Tvrtko Ursulin
cd3037f3fc drm/amdgpu: Stop reporting special chip memory pools as CPU memory in fdinfo
So far these specialized on chip memory pools were reported as system
memory (aka 'cpu') which is not correct and misleading. Let's remove that
and consider later making them visible as their own thing.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Yunxiang Li <Yunxiang.Li@amd.com>
Cc: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 11:32:47 -05:00
Yunxiang Li
fdee0872a2 drm/amdgpu: stop tracking visible memory stats
Since on modern systems all of vram can be made visible anyway, drop
tracking of how much memory is visible for now, to simplify the new
implementation. If this is really needed we can add it back on top of
the new implementation, or just report all the BOs as visible.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 11:32:40 -05:00
Yunxiang Li
f286365038 drm/amdgpu: make drm-memory-* report resident memory
The old behavior reported the resident memory usage for this key, and the
documentation says so as well. However, this was accidentally changed to
include buffers that were evicted.
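
A small Python sketch of the intended accounting, assuming "resident"
means the buffer's current placement is the region being reported. The
dict keys ('size', 'home', 'current') are hypothetical and only serve
this example.

```python
def resident_bytes(buffers, region):
    """Sum the sizes of buffers whose current placement is `region`.

    A buffer evicted from its home region is counted where it currently
    lives, not where it would prefer to be.
    """
    return sum(b["size"] for b in buffers if b["current"] == region)


buffers = [
    {"size": 4096, "home": "vram", "current": "vram"},
    {"size": 8192, "home": "vram", "current": "gtt"},   # evicted from vram
    {"size": 1024, "home": "vram", "current": "vram"},
]
```

Here the evicted 8192-byte buffer adds to the gtt figure and is left out
of the vram figure.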

Fixes: 04bdba4654 ("drm/amdgpu: Use drm_print_memory_stats helper from fdinfo")
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 11:30:58 -05:00
Li Huafei
a1144da794 drm/amdgpu: Fix the memory allocation issue in amdgpu_discovery_get_nps_info()
Fix two issues with memory allocation in amdgpu_discovery_get_nps_info()
for mem_ranges:

 - Add a check for allocation failure to avoid dereferencing a null
   pointer.

 - As suggested by Christophe, use kvcalloc() for memory allocation,
   which checks for multiplication overflow.

Additionally, assign the output parameters nps_type and range_cnt after
the kvcalloc() call to prevent modifying the output parameters in case
of an error return.
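
The ordering described above can be modelled in Python as follows. This
is a sketch, not the kernel function: checked_calloc() stands in for an
overflow-checking allocator, a 64-bit size_t is assumed, and the dict
`out` plays the role of the output parameters.

```python
ENOMEM = 12             # Linux errno value
SIZE_MAX = 2**64 - 1    # assumes a 64-bit size_t


def checked_calloc(count, elem_size):
    """Stand-in allocator: return None if count * elem_size overflows."""
    if count * elem_size > SIZE_MAX:
        return None
    return [0] * count


def get_nps_info(out, nps_type, count, elem_size, alloc=checked_calloc):
    """Fill `out` only after the allocation has succeeded."""
    ranges = alloc(count, elem_size)
    if ranges is None:
        return -ENOMEM  # `out` is left exactly as the caller passed it
    out["nps_type"] = nps_type
    out["range_cnt"] = count
    out["mem_ranges"] = ranges
    return 0
```

On failure the caller sees a negative error code and unchanged outputs,
which is the property the last paragraph above asks for.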

Fixes: b194d21b9b ("drm/amdgpu: Use NPS ranges from discovery table")
Suggested-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 11:30:28 -05:00
Alex Deucher
35984fd4a0 drm/amdgpu: add ring reset messages
Add messages to make it clear when a per ring reset
happens.  This is helpful for debugging and aligns with
other reset methods.

v2: add ring name in success/fail messages (Lijo)

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com> (v1)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 11:29:49 -05:00
Alex Deucher
efe6a87743 drm/amdgpu: fix fairness in enforce isolation handling
Make sure KFD gets a turn when serializing access to
the GC IP.  Currently non-KFD jobs can starve KFD if they
submit often enough.  This patch prevents that by stalling
non-KFD if its time period has elapsed.
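
As a toy model of the idea (not the driver logic), assume a fixed time
slice in milliseconds and a flag saying whether KFD has work waiting.
All names below are hypothetical.

```python
def non_kfd_must_stall(now_ms, slice_start_ms, slice_len_ms, kfd_active):
    """Decide whether a non-KFD submission should wait for KFD's turn."""
    if not kfd_active:
        # Nobody is waiting, so there is no one to be fair to.
        return False
    # Stall once the non-KFD side has used up its time period.
    return now_ms - slice_start_ms >= slice_len_ms
```

Without the elapsed-time test, a steady stream of non-KFD submissions
would never yield, which is the starvation this patch addresses.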

v2: fix units
v3: check enablement properly

Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 11:27:04 -05:00
Alex Deucher
8fe7cf58ff drm/amdkfd: add an interface to query whether is KFD is active
Add an interface to query whether KFD has any active queues.

v2: fix build issues

Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04 11:25:42 -05:00
Al Viro
6348be02ee fdget(), trivial conversions
fdget() is the first thing done in scope, all matching fdput() are
immediately followed by leaving the scope.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-03 01:28:06 -05:00
Dave Airlie
8a07b2623e Merge tag 'drm-misc-next-2024-10-31' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.13:

All of the previous pull request, with MORE!

Core Changes:
- Update documentation for scheduler start/stop and job init.
- Add dedede and sm8350-hdk hardware to ci runs.

Driver Changes:
- Small fixes and cleanups to panfrost, omap, nouveau, ivpu, zynqmp, v3d,
  panthor docs, and leadtek-ltk050h3146w.
- Crashdump support for qaic.
- Support DP compliance in zynqmp.
- Add Samsung S6E88A0-AMS427AP24 panel.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/deeef745-f3fb-4e85-a9d0-e8d38d43c1cf@linux.intel.com
2024-11-01 13:46:03 +10:00
Dave Airlie
e7103f8785 amd-drm-next-6.13-2024-10-25:
Merge tag 'amd-drm-next-6.13-2024-10-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.13-2024-10-25:

amdgpu:
- SDMA queue reset support
- SMU 13.0.6 updates
- Add debugfs interface to help limit jpeg queue scheduling for testing
- JPEG 4.0.3 updates
- Initial runtime repartitioning support
- GFX9 fixes
- Misc code cleanups
- Rework IP structures to better handle multiple instances of an IP
- DML updates
- DSC fixes
- HDR fixes
- Brightness control updates
- Runtime pm cleanup
- DMCUB fixes
- DCN 3.5 updates
- Struct drm_edid cleanup
- Fetch EDID from _DDC if available
- Ring noop optimizations
- MES logging fixes
- 3DLUT fixes
- DCN 4.x fixes
- SMU 13.x fixes
- Fixes for set_soft_freq_range()
- ACPI fixes
- SMU 14.x updates
- PSR-SU fixes
- fdinfo cleanup
- DCN documentation updates

amdkfd:
- Misc code cleanups
- Increase event FIFO size
- Copy wave state fixes for SDMA

radeon:
- Fix possible overflow in packet3 check
- Late init connector fix
- Always set GEM function pointer

Documentation:
- Update drm-memory documentation

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241025132336.2416913-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-10-29 18:25:24 +10:00
Yang Wang
7daa0f6b28 drm/amdgpu: optimize ACA log print
- skip printing the CE ACA log.
- optimize ACA log print for MCA.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-28 16:41:26 -04:00
Le Ma
ea9d8863da drm/amdgpu: add generic func to check if ta fw is applicable
A separate xgmi ta is required for a specific APU, and the driver needs to
parse the ta binary properly when the aux xgmi ta is packed in.

v2: make the check function more generic (Lijo)

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-28 16:41:13 -04:00
Prike Liang
d5e3d8a2a6 drm/amdgpu: clean up the suspend_complete
To check the status of S3 suspend completion,
use the PM core pm_suspend_global_flags bit(1)
to detect S3 abort events. Therefore, clean up
the AMDGPU driver's private flag suspend_complete.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-28 16:40:58 -04:00
Prike Liang
58a8c756fc drm/amdgpu: correct the S3 abort check condition
In a normal S3 entry, the TOS cycle counter is not
reset while the BIOS executes the _S3 method, so it cannot
reliably indicate whether the _S3 method was executed.
However, when the PM core performs the S3 suspend, it sets the
PM_SUSPEND_FLAG_FW_RESUME bit if all the devices suspend
successfully. Therefore, drivers can check the
pm_suspend_global_flags bit(1) to detect the S3 suspend
abort event.
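
The check described above amounts to a single bit test. A Python sketch,
using only the flag name and bit position given in this message (the
function name is hypothetical):

```python
PM_SUSPEND_FLAG_FW_RESUME = 1 << 1  # bit(1), as named in this message


def s3_suspend_aborted(pm_suspend_global_flags):
    """True if the bit set on a fully completed suspend is absent."""
    return not (pm_suspend_global_flags & PM_SUSPEND_FLAG_FW_RESUME)
```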

Fixes: 6704dbf719 ("drm/amdgpu: update suspend status for aborting from deeper suspend")
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-28 16:39:23 -04:00
Christian König
57e92d991e drm/amdgpu: drop volatile from ring buffer
Volatile only prevents the compiler from re-ordering reads and writes.
Since we always only modify the ring buffer from one CPU thread and have
an explicit barrier before signaling the HW this should have no effect at
all and just prevents compiler optimisations.

While at it drop the local variables as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-28 16:32:03 -04:00
Dan Carpenter
dac64cb3e0 drm/amdgpu: Fix amdgpu_ip_block_hw_fini()
This NULL check is reversed so the function doesn't work.

Fixes: dad01f93f4 ("drm/amdgpu: validate hw_fini before function call")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/f4fc849e-4e76-4448-8657-caa4c69910b0@stanley.mountain
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-24 18:07:10 -04:00
Kent Russell
3c0be69bad amdgpu: Don't print L2 status if there's nothing to print
If a 2nd fault comes in before the 1st is handled, the 1st fault will
clear out the FAULT STATUS registers before the 2nd fault is handled.
Thus we get a lot of zeroes. If status=0, just skip the L2 fault status
information, to avoid confusion of why some VM fault status prints in
dmesg are all zeroes.
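
In outline, and only as an illustrative Python sketch with made-up field
names, the reporting rule looks like this:

```python
def fault_report(addr, status):
    """Build the fields to report for a VM fault.

    The L2 status is included only when it is non-zero, since an
    all-zero value means the register was already cleared.
    """
    report = {"addr": addr}
    if status != 0:
        report["l2_status"] = status
    return report
```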

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-24 18:06:51 -04:00
Jonathan Kim
e46738a58f drm/amdkfd: sever xgmi io link if host driver has disable sharing
Host drivers can create partial hives per guest by disabling xgmi sharing
between certain peers in the main hive.
Typically, these partial hives are fully connected per guest session.
In the event that the host makes a mistake by adding a non-shared node
to a guest session, have the KFD reflect sharing disabled by severing
the IO link.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Tested-by: James Yao <yiqing.yao@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-24 18:06:34 -04:00
Lang Yu
46186667f9 drm/amdgpu: refine error handling in amdgpu_ttm_tt_pin_userptr
Free the sg table when dma_map_sgtable() fails, to avoid a memory leak.

Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-24 18:06:24 -04:00
Lijo Lazar
d37bc6a4ed drm/amdgpu: Fix the logic for NPS request failure
On a hive, NPS request is placed by the first one for all devices in the
hive. If the request fails, mark the mode as UNKNOWN so that subsequent
devices on unload don't request it. Also, fix the mutex double lock
issue in the error path; the call should have been mutex_unlock.

Fixes: ee52489d12 ("drm/amdgpu: Place NPS mode request on unload")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-24 18:04:59 -04:00
YiPeng Chai
3d0ffc6418 drm/amdgpu: Reduce redundant gpu resets on nbio v7.4
On nbio v7.4, ras controller interrupt and athub
interrupt are generated after injecting UE to PCIE,
but gpu reset only needs to be triggered once.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-24 18:04:34 -04:00
Frank Min
108bc59fe8 drm/amdgpu: fix random data corruption for sdma 7
There is random data corruption caused by const fill. It happens because
the write compression mode is not correctly configured.

So correct compression mode for const fill.

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 75400f8d6e)
Cc: stable@vger.kernel.org # 6.11.x
2024-10-22 18:11:43 -04:00
Mario Limonciello
bf58f03931 drm/amd: Guard against bad data for ATIF ACPI method
If a BIOS provides bad data in response to an ATIF method call,
this causes a NULL pointer dereference in the caller.

```
? show_regs (arch/x86/kernel/dumpstack.c:478 (discriminator 1))
? __die (arch/x86/kernel/dumpstack.c:423 arch/x86/kernel/dumpstack.c:434)
? page_fault_oops (arch/x86/mm/fault.c:544 (discriminator 2) arch/x86/mm/fault.c:705 (discriminator 2))
? do_user_addr_fault (arch/x86/mm/fault.c:440 (discriminator 1) arch/x86/mm/fault.c:1232 (discriminator 1))
? acpi_ut_update_object_reference (drivers/acpi/acpica/utdelete.c:642)
? exc_page_fault (arch/x86/mm/fault.c:1542)
? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:387 (discriminator 2)) amdgpu
? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:386 (discriminator 1)) amdgpu
```

It has been encountered on at least one system, so guard for it.

Fixes: d38ceaf99e ("drm/amdgpu: add core driver (v4)")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c9b7c809b8)
Cc: stable@vger.kernel.org
2024-10-22 18:08:12 -04:00
Prike Liang
32e7ee293f drm/amdgpu: Dereference the ATCS ACPI buffer
Need to dereference the atcs acpi buffer after
the method is executed, otherwise it will result in
a memory leak.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:51:19 -04:00
Lijo Lazar
591aec150a drm/amdgpu: Save VCN shared memory with init reset
VCN shared memory is in framebuffer and there are some flags initialized
during sw_init. Ideally, such programming should be during hw_init.

Make sure the flags are saved during reset on initialization since that
reset will affect frame buffer region. For clarity, separate it out to
another function.

Fixes: 1e4acf4d93 ("drm/amdgpu: Add reset on init handler for XGMI")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reported-by: Hao Zhou <hao.zhou@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:51:19 -04:00
Sunil Khatri
971d8e1c3f drm/amdgpu: clean unused functions of uvd/vcn/vce
Some of the function pointers of amdgpu_ip_funcs
are not used and are left commented out. Hence this
cleans up those which aren't used.

Cc: Leo Liu <leo.liu@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:51:19 -04:00
Victor Lu
8b22f04833 drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts for vega20_ih
Port this change to vega20_ih.c:
commit afbf7955ff ("drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts")

Original commit message:
"Why:
Setting IH_RB_WPTR register to 0 will not clear the RB_OVERFLOW bit
if RB_ENABLE is not set.

How to fix:
Set WPTR_OVERFLOW_CLEAR bit after RB_ENABLE bit is set.
The RB_ENABLE bit is required to be set, together with
WPTR_OVERFLOW_ENABLE bit so that setting WPTR_OVERFLOW_CLEAR bit
would clear the RB_OVERFLOW."
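
The precondition quoted above can be captured in a toy Python model. It
is not the register layout or the driver sequence; the class and method
names are invented, and only the rule that the clear bit takes effect
once RB_ENABLE and WPTR_OVERFLOW_ENABLE are set comes from the text.

```python
class IhRingModel:
    """Toy state holder for the three bits discussed above."""

    def __init__(self):
        self.rb_enable = False
        self.wptr_overflow_enable = False
        self.rb_overflow = True  # stale overflow left over from earlier

    def pulse_overflow_clear(self):
        """Model writing WPTR_OVERFLOW_CLEAR; return True if it took effect."""
        if self.rb_enable and self.wptr_overflow_enable:
            self.rb_overflow = False
            return True
        return False
```

Pulsing the clear bit before the ring is enabled leaves rb_overflow set
in this model, which is why the fix orders the writes as described.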

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:40 -04:00
Sunil Khatri
0016e87054 drm/amdgpu: Clean the functions pointer set as NULL
We don't need to set function pointers that aren't
needed to NULL, as global structure members are by default
set to zero, or NULL for pointers.

Cc: Leo Liu <leo.liu@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Sunil Khatri
8231e3af96 drm/amdgpu: clean the dummy soft_reset functions
Remove the dummy soft_reset functions for all
ip blocks.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Sunil Khatri
f13c7da118 drm/amdgpu: clean the dummy wait_for_idle functions
Remove the dummy wait_for_idle functions for all
ip blocks.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Sunil Khatri
aa980de3b5 drm/amdgpu: clean the dummy suspend functions
Remove the dummy suspend functions for all
ip blocks.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Sunil Khatri
fbcd0ad5d1 drm/amdgpu: clean the dummy resume functions
Remove the dummy resume functions for all
ip blocks.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Sunil Khatri
780002b654 drm/amdgpu: validate wait_for_idle before function call
Before making a function call to wait_for_idle,
validate the function pointer like we do in sw_init.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Sunil Khatri
502d76308d drm/amdgpu: validate resume before function call
Before making a function call to resume, validate
the function pointer like we do in sw_init.

Use the helper function amdgpu_ip_block_resume where the
same checks and calls are repeated.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Sunil Khatri
e095026f00 drm/amdgpu: validate suspend before function call
Before making a function call to suspend, validate
the function pointer like we do in sw_init.

Use the helper function amdgpu_ip_block_suspend where the
same checks and calls are repeated.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Sunil Khatri
dad01f93f4 drm/amdgpu: validate hw_fini before function call
Before making a function call to hw_fini, validate
the function pointer like we do in sw_init.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Srinivasan Shanmugam
9343b904e7 drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.2
This commit adds the cleaner shader microcode for GFX9.4.2 GPUs. The
cleaner shader is a piece of GPU code that is used to clear or
initialize certain GPU resources, such as Local Data Share (LDS), Vector
General Purpose Registers (VGPRs), and Scalar General Purpose Registers
(SGPRs).

Clearing these resources is important for ensuring data isolation
between different workloads running on the GPU. Without the cleaner
shader, residual data from a previous workload could potentially be
accessed by a subsequent workload, leading to data leaks and incorrect
computation results.

The cleaner shader microcode is represented as an array of 32-bit words
(`gfx_9_4_2_cleaner_shader_hex`). This array is the binary
representation of the cleaner shader code, which is written in a
low-level GPU instruction set.

Also, this patch updates the `gfx_v9_0_sw_init` function to initialize
the cleaner shader if the MEC firmware version is 88 or higher. It sets
the `cleaner_shader_ptr` and `cleaner_shader_size` to the appropriate
values and attempts to initialize the cleaner shader.

When the cleaner shader feature is enabled, the AMDGPU driver loads this
array into a specific location in the GPU memory. The GPU then reads
this memory location to fetch and execute the cleaner shader
instructions.

The cleaner shader is executed automatically by the GPU at the end of
each workload, before the next workload starts. This ensures that all
GPU resources are in a clean state before the start of each workload.

This change ensures that the GPU memory is properly cleared between
different processes, preventing data leakage and enhancing security. It
also aligns with the serialization mechanism between KGD and KFD,
ensuring that the GPU state is consistent across different workloads.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Frank Min
c379dcf797 drm/amdgpu: fix typo for sdma6 constant fill packet
Fix typo for sdma6 constant fill packet

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:38 -04:00
Frank Min
75400f8d6e drm/amdgpu: fix random data corruption for sdma 7
Const fill can cause random data corruption because the write
compression mode is not configured correctly.

Correct the compression mode for const fill.

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:38 -04:00
Sunil Khatri
5ebdb6fd60 drm/amdgpu: clean the dummy sw_fini functions
Remove the dummy sw_fini functions for all
IP blocks.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:37 -04:00
Lijo Lazar
785504dd7f drm/amdgpu: Use SPX as default in partition config
In certain cases, e.g. when a reset is required on initialization, the
XCP manager won't have a valid partition mode. In such cases, use SPX as
the default selected mode for which partition configuration details are
populated.

Fixes: 4ae86dc878 ("drm/amdgpu: Add sysfs nodes to get xcp details")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reported-by: Hao Zhou <hao.zhou@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:37 -04:00
Sunil Khatri
278b8fbf06 drm/amdgpu: validate sw_fini before function call
Before making a function call to sw_fini, validate
the function pointer like we do in sw_init.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:37 -04:00
Sunil Khatri
7fd12379bd drm/amdgpu: clean the dummy sw_init functions
Remove the dummy sw_init functions for all
IP blocks.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:37 -04:00
Sunil Khatri
df6e463d8f drm/amdgpu: validate sw_init before function call
Before making a function call to sw_init, validate
the function pointer like we do in late_init.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:37 -04:00
Xiaogang Chen
10112bf828 drm/amdkfd: Not restore userptr buffer if kfd process has been removed
When the KFD process has been terminated, do not restore the userptr
buffer after the MMU notifier invalidates a range.

Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:12 -04:00
Lijo Lazar
8e3a3e847e drm/amdgpu: Zero-initialize mqd backup memory
Zero-initialize mqd backup memory, otherwise the check for
'already-backed-up' could go wrong.
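
Why zeroed memory matters for such a check can be sketched like this. The structure and the `valid` flag are hypothetical; the real check in the driver may look at different fields.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical queue descriptor; 'valid' is nonzero once it holds real state. */
struct toy_mqd {
	unsigned int valid;
	unsigned int regs[4];
};

/* calloc() returns zeroed memory, so 'valid' reliably starts out as 0. */
static struct toy_mqd *alloc_backup(void)
{
	return calloc(1, sizeof(struct toy_mqd));
}

/* Copy the live descriptor only if no backup has been taken yet. */
static int backup_once(struct toy_mqd *backup, const struct toy_mqd *live)
{
	assert(backup != NULL && live != NULL);

	if (backup->valid)
		return 0; /* already backed up, keep the first copy */

	memcpy(backup, live, sizeof(*backup));
	backup->valid = 1;
	return 1;
}
```

With an allocator that does not clear memory, `valid` could hold a stale nonzero value and the first backup would be skipped.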

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:12 -04:00
Alex Deucher
32f0028969 Revert "drm/amdgpu/gfx9: put queue resets behind a debug option"
This reverts commit 7c1a2d8aba.

Extended validation has completed successfully, so enable
these features by default.

Acked-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Jonathan Kim <jonathan.kim@amd.com>
Cc: Jiadong Zhu <Jiadong.Zhu@amd.com>
2024-10-22 17:50:11 -04:00
Zhu Lingshan
9ee8ab245c drm/amdgpu: init saw registers for mmhub v1.0
This commit initializes registers in the Stand Alone Walker (SAW)
for mmhub v1.0, to support ISP use cases.

Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com>
Reported-and-tested-by: Du Bin <bin.du@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:49:38 -04:00
Alex Deucher
d2f57b6d89 drm/amdgpu/discovery: add ISP discovery entries for old APUs
Raven1/2 and Picasso have ISP 2.0.0, however their ISP blocks
are not in the IP discovery table yet.

This commit fixes this issue by adding new ISP entries for
Raven and Picasso in the IP discovery table.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:49:30 -04:00
Mario Limonciello
c9b7c809b8 drm/amd: Guard against bad data for ATIF ACPI method
If a BIOS provides bad data in response to an ATIF method call,
this causes a NULL pointer dereference in the caller.

```
? show_regs (arch/x86/kernel/dumpstack.c:478 (discriminator 1))
? __die (arch/x86/kernel/dumpstack.c:423 arch/x86/kernel/dumpstack.c:434)
? page_fault_oops (arch/x86/mm/fault.c:544 (discriminator 2) arch/x86/mm/fault.c:705 (discriminator 2))
? do_user_addr_fault (arch/x86/mm/fault.c:440 (discriminator 1) arch/x86/mm/fault.c:1232 (discriminator 1))
? acpi_ut_update_object_reference (drivers/acpi/acpica/utdelete.c:642)
? exc_page_fault (arch/x86/mm/fault.c:1542)
? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:387 (discriminator 2)) amdgpu
? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:386 (discriminator 1)) amdgpu
```

It has been encountered on at least one system, so guard for it.
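
A guard of this kind can be sketched as below. The object layout and names are hypothetical and only loosely modelled on a firmware reply; they are not the ACPI API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical firmware reply: a tagged object with an optional payload. */
enum toy_type { TOY_TYPE_INTEGER, TOY_TYPE_BUFFER };

struct toy_object {
	enum toy_type type;
	size_t length;
	const unsigned char *data;
};

/* Read a 16-bit little-endian size field from the start of a reply buffer. */
static int read_reply_size(const struct toy_object *obj, unsigned int *size)
{
	assert(size != NULL);

	if (!obj)
		return -EIO;    /* the method returned nothing */
	if (obj->type != TOY_TYPE_BUFFER)
		return -EINVAL; /* wrong object type */
	if (!obj->data || obj->length < 2)
		return -EINVAL; /* missing or truncated payload */

	*size = obj->data[0] | (obj->data[1] << 8);
	return 0;
}
```

All validation happens before the first dereference of the payload, so a malformed reply turns into an error code rather than a crash.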

Fixes: d38ceaf99e ("drm/amdgpu: add core driver (v4)")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:22:44 -04:00
Thomas Zimmermann
1f828b4dd4 drm/client: Make client support optional
Only build client code if DRM_CLIENT has been selected. Automatically
do so if one of the default clients has been enabled. If client support
has been disabled, the helpers for client-related events are empty and
the regular client functions are not present.
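
The "empty helper" arrangement is a common C idiom and can be sketched as follows. `CONFIG_TOY_CLIENT` and the `toy_*` names are hypothetical stand-ins, not the DRM symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Kconfig symbol; 0 means "support disabled". */
#ifndef CONFIG_TOY_CLIENT
#define CONFIG_TOY_CLIENT 0
#endif

struct toy_device {
	int client_suspended;
};

#if CONFIG_TOY_CLIENT
/* Real helper: forwards the event to the client code. */
static inline void toy_client_dev_suspend(struct toy_device *dev)
{
	dev->client_suspended = 1;
}
#else
/* Empty helper: compiles to nothing when client support is disabled. */
static inline void toy_client_dev_suspend(struct toy_device *dev)
{
	(void)dev;
}
#endif

/* The driver calls the helper unconditionally, without any #ifdef. */
static void toy_driver_suspend(struct toy_device *dev)
{
	assert(dev != NULL);
	toy_client_dev_suspend(dev);
}
```

The conditional lives in one header-like place, so call sites stay free of preprocessor checks.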

Amdgpu has an internal DRM client, so it has to select DRM_CLIENT by
itself unconditionally.

v3:
- provide empty drm_client_debugfs_init() if DRM_CLIENT=n (kernel
  test robot)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Xinhui Pan <Xinhui.Pan@amd.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014085740.582287-12-tzimmermann@suse.de
2024-10-18 09:23:03 +02:00
Thomas Zimmermann
4cf50bae05 drm/amdgpu: Suspend and resume internal clients with client helpers
Replace calls to drm_fb_helper_set_suspend_unlocked() with calls
to the client functions drm_client_dev_suspend() and
drm_client_dev_resume(). Any registered in-kernel client will now
receive suspend and resume events.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Xinhui Pan <Xinhui.Pan@amd.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014085740.582287-9-tzimmermann@suse.de
2024-10-18 09:23:03 +02:00
Srinivasan Shanmugam
e7457532cb drm/amd/amdgpu: Fix double unlock in amdgpu_mes_add_ring
This patch addresses a double unlock issue in the amdgpu_mes_add_ring
function. The mutex was being unlocked twice under certain error
conditions, which could lead to undefined behavior.

The fix ensures that the mutex is unlocked only once before jumping to
the clean_up_memory label. The unlock operation is moved to just before
the goto statement within the conditional block that checks the return
value of amdgpu_ring_init. This prevents the second unlock attempt after
the clean_up_memory label, which is no longer necessary as the mutex is
already unlocked by this point in the code flow.

This change resolves the potential double unlock and maintains the
correct mutex handling throughout the function.
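
The resulting control flow can be sketched in isolation. This is a toy model under stated assumptions: `toy_lock` stands in for the MES mutex and counts misuse, and `fail_init` / `fail_queue` are hypothetical knobs that simulate the two failing steps.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy lock that records misuse; a stand-in for the real mutex. */
struct toy_lock {
	int held;
	int double_unlocks;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
	if (!l->held)
		l->double_unlocks++;
	l->held = 0;
}

static int add_ring(struct toy_lock *lock, int fail_init, int fail_queue)
{
	void *ring;
	int r;

	assert(lock != NULL);
	toy_lock_acquire(lock);

	ring = malloc(64);
	if (!ring) {
		toy_lock_release(lock);
		return -12;
	}

	r = fail_init ? -22 : 0;        /* step performed under the lock */
	if (r) {
		toy_lock_release(lock); /* unlock here, before the goto */
		goto clean_up_memory;
	}

	toy_lock_release(lock);         /* normal path drops the lock */

	r = fail_queue ? -16 : 0;       /* step performed without the lock */
	if (r)
		goto clean_up_ring;

	free(ring); /* the toy frees it; a real caller would keep the ring */
	return 0;

clean_up_ring:
	/* ring teardown would go here */
clean_up_memory:
	free(ring);
	return r;                       /* no unlock on the shared exit path */
}
```

Because the shared cleanup labels are reached both with and without the lock held, the unlock has to happen at the jump site rather than after the label.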

Fixes below:
Commit d0c423b647 ("drm/amdgpu/mes: use ring for kernel queue
submission"), leads to the following Smatch static checker warning:

	drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1240 amdgpu_mes_add_ring()
	warn: double unlock '&adev->mes.mutex_hidden' (orig line 1213)

drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
    1143 int amdgpu_mes_add_ring(struct amdgpu_device *adev, int gang_id,
    1144                         int queue_type, int idx,
    1145                         struct amdgpu_mes_ctx_data *ctx_data,
    1146                         struct amdgpu_ring **out)
    1147 {
    1148         struct amdgpu_ring *ring;
    1149         struct amdgpu_mes_gang *gang;
    1150         struct amdgpu_mes_queue_properties qprops = {0};
    1151         int r, queue_id, pasid;
    1152
    1153         /*
    1154          * Avoid taking any other locks under MES lock to avoid circular
    1155          * lock dependencies.
    1156          */
    1157         amdgpu_mes_lock(&adev->mes);
    1158         gang = idr_find(&adev->mes.gang_id_idr, gang_id);
    1159         if (!gang) {
    1160                 DRM_ERROR("gang id %d doesn't exist\n", gang_id);
    1161                 amdgpu_mes_unlock(&adev->mes);
    1162                 return -EINVAL;
    1163         }
    1164         pasid = gang->process->pasid;
    1165
    1166         ring = kzalloc(sizeof(struct amdgpu_ring), GFP_KERNEL);
    1167         if (!ring) {
    1168                 amdgpu_mes_unlock(&adev->mes);
    1169                 return -ENOMEM;
    1170         }
    1171
    1172         ring->ring_obj = NULL;
    1173         ring->use_doorbell = true;
    1174         ring->is_mes_queue = true;
    1175         ring->mes_ctx = ctx_data;
    1176         ring->idx = idx;
    1177         ring->no_scheduler = true;
    1178
    1179         if (queue_type == AMDGPU_RING_TYPE_COMPUTE) {
    1180                 int offset = offsetof(struct amdgpu_mes_ctx_meta_data,
    1181                                       compute[ring->idx].mec_hpd);
    1182                 ring->eop_gpu_addr =
    1183                         amdgpu_mes_ctx_get_offs_gpu_addr(ring, offset);
    1184         }
    1185
    1186         switch (queue_type) {
    1187         case AMDGPU_RING_TYPE_GFX:
    1188                 ring->funcs = adev->gfx.gfx_ring[0].funcs;
    1189                 ring->me = adev->gfx.gfx_ring[0].me;
    1190                 ring->pipe = adev->gfx.gfx_ring[0].pipe;
    1191                 break;
    1192         case AMDGPU_RING_TYPE_COMPUTE:
    1193                 ring->funcs = adev->gfx.compute_ring[0].funcs;
    1194                 ring->me = adev->gfx.compute_ring[0].me;
    1195                 ring->pipe = adev->gfx.compute_ring[0].pipe;
    1196                 break;
    1197         case AMDGPU_RING_TYPE_SDMA:
    1198                 ring->funcs = adev->sdma.instance[0].ring.funcs;
    1199                 break;
    1200         default:
    1201                 BUG();
    1202         }
    1203
    1204         r = amdgpu_ring_init(adev, ring, 1024, NULL, 0,
    1205                              AMDGPU_RING_PRIO_DEFAULT, NULL);
    1206         if (r)
    1207                 goto clean_up_memory;
    1208
    1209         amdgpu_mes_ring_to_queue_props(adev, ring, &qprops);
    1210
    1211         dma_fence_wait(gang->process->vm->last_update, false);
    1212         dma_fence_wait(ctx_data->meta_data_va->last_pt_update, false);
    1213         amdgpu_mes_unlock(&adev->mes);
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    1214
    1215         r = amdgpu_mes_add_hw_queue(adev, gang_id, &qprops, &queue_id);
    1216         if (r)
    1217                 goto clean_up_ring;
                         ^^^^^^^^^^^^^^^^^^

    1218
    1219         ring->hw_queue_id = queue_id;
    1220         ring->doorbell_index = qprops.doorbell_off;
    1221
    1222         if (queue_type == AMDGPU_RING_TYPE_GFX)
    1223                 sprintf(ring->name, "gfx_%d.%d.%d", pasid, gang_id, queue_id);
    1224         else if (queue_type == AMDGPU_RING_TYPE_COMPUTE)
    1225                 sprintf(ring->name, "compute_%d.%d.%d", pasid, gang_id,
    1226                         queue_id);
    1227         else if (queue_type == AMDGPU_RING_TYPE_SDMA)
    1228                 sprintf(ring->name, "sdma_%d.%d.%d", pasid, gang_id,
    1229                         queue_id);
    1230         else
    1231                 BUG();
    1232
    1233         *out = ring;
    1234         return 0;
    1235
    1236 clean_up_ring:
    1237         amdgpu_ring_fini(ring);
    1238 clean_up_memory:
    1239         kfree(ring);
--> 1240         amdgpu_mes_unlock(&adev->mes);
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    1241         return r;
    1242 }

Fixes: d0c423b647 ("drm/amdgpu/mes: use ring for kernel queue submission")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: Jack Xiao <Jack.Xiao@amd.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bfaf188360)
2024-10-15 11:49:08 -04:00
Michael Chen
7760d7f93c drm/amdgpu/mes: fix issue of writing to the same log buffer from 2 MES pipes
With unified MES enabled on gfx12, each of the 2 MES pipes needs its own
event log buffer to avoid overwriting the other pipe's data.

Signed-off-by: Michael Chen <michael.chen@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 144df260f3)
Cc: stable@vger.kernel.org # 6.11.x
2024-10-15 11:48:36 -04:00
Mohammed Anees
c0ec082f10 drm/amdgpu: prevent BO_HANDLES error from being overwritten
Before this patch, if multiple BO_HANDLES chunks were submitted,
the error -EINVAL would be correctly set but could be overwritten
by the return value from amdgpu_cs_p1_bo_handles(). This patch
ensures that if there are multiple BO_HANDLES, we stop.
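
The difference between the two shapes can be sketched as follows. The chunk kinds and function names are hypothetical, and `handle_bo_handles()` is a placeholder that always succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum chunk_kind { CHUNK_IB, CHUNK_BO_HANDLES };

/* Hypothetical per-chunk handler; it always succeeds in this sketch. */
static int handle_bo_handles(void)
{
	return 0;
}

/* Buggy shape: -EINVAL is assigned, then clobbered by the next call. */
static int parse_chunks_buggy(const enum chunk_kind *chunks, size_t n)
{
	int ret = 0, seen = 0;
	size_t i;

	assert(chunks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (chunks[i] != CHUNK_BO_HANDLES)
			continue;
		if (seen)
			ret = -EINVAL;
		ret = handle_bo_handles(); /* overwrites the error above */
		if (ret)
			return ret;
		seen = 1;
	}
	return ret;
}

/* Fixed shape: stop as soon as a second BO_HANDLES chunk is seen. */
static int parse_chunks_fixed(const enum chunk_kind *chunks, size_t n)
{
	int ret = 0, seen = 0;
	size_t i;

	assert(chunks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (chunks[i] != CHUNK_BO_HANDLES)
			continue;
		if (seen)
			return -EINVAL;
		ret = handle_bo_handles();
		if (ret)
			return ret;
		seen = 1;
	}
	return ret;
}
```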

Fixes: fec5f8e8c6 ("drm/amdgpu: disallow multiple BO_HANDLES chunks in one submit")
Signed-off-by: Mohammed Anees <pvmohammedanees2003@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 40f2cd9882)
Cc: stable@vger.kernel.org
2024-10-15 11:48:05 -04:00
Alex Deucher
d2c72d96df drm/amdgpu: enable enforce_isolation sysfs node on VFs
It should be enabled on both bare metal and VFs.

Fixes: e189be9b2e ("drm/amdgpu: Add enforce_isolation sysfs attribute")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Cc: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
(cherry picked from commit dc8847b054)
2024-10-15 11:47:41 -04:00
Dan Carpenter
9f7e94af35 drm/amdgpu: Fix off by one in current_memory_partition_show()
The > ARRAY_SIZE() check should be >= ARRAY_SIZE() to prevent an out of
bounds read.
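
As a reminder of the rule: an index is valid only while it is strictly less than ARRAY_SIZE(), so the rejecting comparison must include equality. A minimal sketch, with a hypothetical table of mode names:

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table of mode names; only indices 0..3 are valid. */
static const char *const mode_desc[] = { "UNKNOWN", "NPS1", "NPS2", "NPS4" };

static const char *mode_name(size_t mode)
{
	/* mode == ARRAY_SIZE(mode_desc) is already one past the end. */
	if (mode >= ARRAY_SIZE(mode_desc))
		return "INVALID";

	assert(mode_desc[mode] != NULL);
	return mode_desc[mode];
}
```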

Fixes: 012be6f22c ("drm/amdgpu: Add sysfs interfaces for NPS mode")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:26:35 -04:00
Lijo Lazar
d25d26b8a8 drm/amdgpu: Wait for reset on init completion
When reset on initialization is requested, wait for the reset to finish.
In cases where the module is loaded after boot, this makes sure all
initialization work is complete by the time modprobe returns successfully.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Ramesh Errabolu <ramesh.errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:22:26 -04:00
Srinivasan Shanmugam
bfaf188360 drm/amd/amdgpu: Fix double unlock in amdgpu_mes_add_ring
This patch addresses a double unlock issue in the amdgpu_mes_add_ring
function. The mutex was being unlocked twice under certain error
conditions, which could lead to undefined behavior.

The fix ensures that the mutex is unlocked only once before jumping to
the clean_up_memory label. The unlock operation is moved to just before
the goto statement within the conditional block that checks the return
value of amdgpu_ring_init. This prevents the second unlock attempt after
the clean_up_memory label, which is no longer necessary as the mutex is
already unlocked by this point in the code flow.

This change resolves the potential double unlock and maintains the
correct mutex handling throughout the function.

Fixes below:
Commit d0c423b647 ("drm/amdgpu/mes: use ring for kernel queue
submission"), leads to the following Smatch static checker warning:

	drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1240 amdgpu_mes_add_ring()
	warn: double unlock '&adev->mes.mutex_hidden' (orig line 1213)

drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
    1143 int amdgpu_mes_add_ring(struct amdgpu_device *adev, int gang_id,
    1144                         int queue_type, int idx,
    1145                         struct amdgpu_mes_ctx_data *ctx_data,
    1146                         struct amdgpu_ring **out)
    1147 {
    1148         struct amdgpu_ring *ring;
    1149         struct amdgpu_mes_gang *gang;
    1150         struct amdgpu_mes_queue_properties qprops = {0};
    1151         int r, queue_id, pasid;
    1152
    1153         /*
    1154          * Avoid taking any other locks under MES lock to avoid circular
    1155          * lock dependencies.
    1156          */
    1157         amdgpu_mes_lock(&adev->mes);
    1158         gang = idr_find(&adev->mes.gang_id_idr, gang_id);
    1159         if (!gang) {
    1160                 DRM_ERROR("gang id %d doesn't exist\n", gang_id);
    1161                 amdgpu_mes_unlock(&adev->mes);
    1162                 return -EINVAL;
    1163         }
    1164         pasid = gang->process->pasid;
    1165
    1166         ring = kzalloc(sizeof(struct amdgpu_ring), GFP_KERNEL);
    1167         if (!ring) {
    1168                 amdgpu_mes_unlock(&adev->mes);
    1169                 return -ENOMEM;
    1170         }
    1171
    1172         ring->ring_obj = NULL;
    1173         ring->use_doorbell = true;
    1174         ring->is_mes_queue = true;
    1175         ring->mes_ctx = ctx_data;
    1176         ring->idx = idx;
    1177         ring->no_scheduler = true;
    1178
    1179         if (queue_type == AMDGPU_RING_TYPE_COMPUTE) {
    1180                 int offset = offsetof(struct amdgpu_mes_ctx_meta_data,
    1181                                       compute[ring->idx].mec_hpd);
    1182                 ring->eop_gpu_addr =
    1183                         amdgpu_mes_ctx_get_offs_gpu_addr(ring, offset);
    1184         }
    1185
    1186         switch (queue_type) {
    1187         case AMDGPU_RING_TYPE_GFX:
    1188                 ring->funcs = adev->gfx.gfx_ring[0].funcs;
    1189                 ring->me = adev->gfx.gfx_ring[0].me;
    1190                 ring->pipe = adev->gfx.gfx_ring[0].pipe;
    1191                 break;
    1192         case AMDGPU_RING_TYPE_COMPUTE:
    1193                 ring->funcs = adev->gfx.compute_ring[0].funcs;
    1194                 ring->me = adev->gfx.compute_ring[0].me;
    1195                 ring->pipe = adev->gfx.compute_ring[0].pipe;
    1196                 break;
    1197         case AMDGPU_RING_TYPE_SDMA:
    1198                 ring->funcs = adev->sdma.instance[0].ring.funcs;
    1199                 break;
    1200         default:
    1201                 BUG();
    1202         }
    1203
    1204         r = amdgpu_ring_init(adev, ring, 1024, NULL, 0,
    1205                              AMDGPU_RING_PRIO_DEFAULT, NULL);
    1206         if (r)
    1207                 goto clean_up_memory;
    1208
    1209         amdgpu_mes_ring_to_queue_props(adev, ring, &qprops);
    1210
    1211         dma_fence_wait(gang->process->vm->last_update, false);
    1212         dma_fence_wait(ctx_data->meta_data_va->last_pt_update, false);
    1213         amdgpu_mes_unlock(&adev->mes);
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    1214
    1215         r = amdgpu_mes_add_hw_queue(adev, gang_id, &qprops, &queue_id);
    1216         if (r)
    1217                 goto clean_up_ring;
                         ^^^^^^^^^^^^^^^^^^

    1218
    1219         ring->hw_queue_id = queue_id;
    1220         ring->doorbell_index = qprops.doorbell_off;
    1221
    1222         if (queue_type == AMDGPU_RING_TYPE_GFX)
    1223                 sprintf(ring->name, "gfx_%d.%d.%d", pasid, gang_id, queue_id);
    1224         else if (queue_type == AMDGPU_RING_TYPE_COMPUTE)
    1225                 sprintf(ring->name, "compute_%d.%d.%d", pasid, gang_id,
    1226                         queue_id);
    1227         else if (queue_type == AMDGPU_RING_TYPE_SDMA)
    1228                 sprintf(ring->name, "sdma_%d.%d.%d", pasid, gang_id,
    1229                         queue_id);
    1230         else
    1231                 BUG();
    1232
    1233         *out = ring;
    1234         return 0;
    1235
    1236 clean_up_ring:
    1237         amdgpu_ring_fini(ring);
    1238 clean_up_memory:
    1239         kfree(ring);
--> 1240         amdgpu_mes_unlock(&adev->mes);
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    1241         return r;
    1242 }

Fixes: d0c423b647 ("drm/amdgpu/mes: use ring for kernel queue submission")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: Jack Xiao <Jack.Xiao@amd.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:21:31 -04:00
Michael Chen
144df260f3 drm/amdgpu/mes: fix issue of writing to the same log buffer from 2 MES pipes
With unified MES enabled on gfx12, each of the 2 MES pipes needs its own
event log buffer to avoid overwriting the other pipe's data.

Signed-off-by: Michael Chen <michael.chen@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:21:08 -04:00
Lijo Lazar
f8588f051d drm/amdgpu: Show current compute partition on VF
Enable sysfs node for current compute partition mode on VFs also.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Vignesh Chander <Vignesh.Chander@amd.com>
Tested-by: Vignesh Chander <Vignesh.Chander@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:20:32 -04:00
Lijo Lazar
b3c6871692 drm/amdgpu: Fetch NPS mode for GCv9.4.3 VFs
Use the memory ranges published in the discovery table to deduce the NPS
mode of GC v9.4.3 VFs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Vignesh Chander <Vignesh.Chander@amd.com>
Tested-by: Vignesh Chander <Vignesh.Chander@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:19:46 -04:00
Mohammed Anees
40f2cd9882 drm/amdgpu: prevent BO_HANDLES error from being overwritten
Before this patch, if multiple BO_HANDLES chunks were submitted,
the error -EINVAL would be correctly set but could be overwritten
by the return value from amdgpu_cs_p1_bo_handles(). This patch
ensures that if there are multiple BO_HANDLES, we stop.

Fixes: fec5f8e8c6 ("drm/amdgpu: disallow multiple BO_HANDLES chunks in one submit")
Signed-off-by: Mohammed Anees <pvmohammedanees2003@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:18:49 -04:00
Alex Deucher
dc8847b054 drm/amdgpu: enable enforce_isolation sysfs node on VFs
It should be enabled on both bare metal and VFs.

Fixes: e189be9b2e ("drm/amdgpu: Add enforce_isolation sysfs attribute")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Cc: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
2024-10-15 11:17:32 -04:00
Lijo Lazar
c29aeadf0b drm/amdgpu: Add NPS switch support for GC 9.4.3
Add dynamic NPS switch support for GC 9.4.3 variants. Only GC v9.4.3 and
GC v9.4.4 currently support this. NPS switch is only supported if an SOC
supports multiple NPS modes.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:17:25 -04:00
Srinivasan Shanmugam
d594ddc686 drm/amdgpu/gfx12: Apply Isolation Enforcement to GFX & Compute rings
This commit applies isolation enforcement to the GFX and Compute rings
in the gfx_v12_0 module.

The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and
`amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be
called when a ring begins and ends its use, respectively.

`amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring
begins its use. This function cancels any scheduled
`enforce_isolation_work` and, if necessary, signals the Kernel Fusion
Driver (KFD) to stop the runqueue.

`amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends
its use. This function schedules `enforce_isolation_work` to be run
after a delay.

These functions are part of the Enforce Isolation Handler, which
enforces shader isolation on AMD GPUs to prevent data leakage between
different processes.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:17:16 -04:00
Sunil Khatri
f83fc3abd5 drm/amdgpu: optimize fn gfx_v12_ring_insert_nop
Optimize gfx_v12_ring_insert_nop() to call the
optimized version of amdgpu_ring_insert_nop
instead of calling amdgpu_ring_write once per
NOP.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:17:09 -04:00
Sunil Khatri
950dcb0158 drm/amdgpu: optimize fn gfx_v11_ring_insert_nop
Optimize gfx_v11_ring_insert_nop() to call the
optimized version of amdgpu_ring_insert_nop
instead of calling amdgpu_ring_write once per
NOP.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:17:03 -04:00
Sunil Khatri
6aa902938b drm/amdgpu: optimize fn gfx_v10_ring_insert_nop
Optimize gfx_v10_ring_insert_nop() to call the
optimized version of amdgpu_ring_insert_nop
instead of calling amdgpu_ring_write once per
NOP.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:16:57 -04:00
Sunil Khatri
1537638ae3 drm/amdgpu: optimize fn gfx_v9_ring_insert_nop
Optimize gfx_v9_ring_insert_nop() to call the
optimized version of amdgpu_ring_insert_nop
instead of calling amdgpu_ring_write once per
NOP.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:16:52 -04:00
Sunil Khatri
a23575bb3c drm/amdgpu: optimize fn gfx_v9_4_3_ring_insert_nop
Optimize gfx_v9_4_3_ring_insert_nop() to call the
optimized version of amdgpu_ring_insert_nop
instead of calling amdgpu_ring_write once per
NOP.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:16:46 -04:00
Sunil Khatri
ea4e4754c9 drm/amdgpu: optimize insert_nop using multi dwords
Optimize the ring_insert_nop function to write n dwords in one
step rather than calling amdgpu_ring_write for each
NOP packet. This avoids a function call per NOP
packet, and wptr is updated only once.
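
The idea can be sketched on a toy ring buffer. The ring size, the NOP encoding and all `toy_*` / `insert_nop_*` names are assumptions for illustration and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define RING_DWORDS 8u          /* hypothetical size, must be a power of two */
#define NOP_PACKET  0xffff1000u /* hypothetical NOP encoding */

struct toy_ring {
	uint32_t buf[RING_DWORDS];
	uint32_t wptr;          /* write pointer, in dwords */
	unsigned int wptr_updates;
};

/* Baseline: one call and one write pointer update per dword. */
static void ring_write(struct toy_ring *r, uint32_t v)
{
	r->buf[r->wptr & (RING_DWORDS - 1)] = v;
	r->wptr = (r->wptr + 1) & (RING_DWORDS - 1);
	r->wptr_updates++;
}

static void insert_nop_slow(struct toy_ring *r, uint32_t count)
{
	while (count--)
		ring_write(r, NOP_PACKET);
}

/* Batched: fill up to the end of the buffer, wrap, then fill the rest. */
static void insert_nop_fast(struct toy_ring *r, uint32_t count)
{
	uint32_t start = r->wptr & (RING_DWORDS - 1);
	uint32_t chunk1 = RING_DWORDS - start;
	uint32_t chunk2, i;

	assert(count <= RING_DWORDS);

	if (chunk1 > count)
		chunk1 = count;
	chunk2 = count - chunk1;

	for (i = 0; i < chunk1; i++)
		r->buf[start + i] = NOP_PACKET;
	for (i = 0; i < chunk2; i++)
		r->buf[i] = NOP_PACKET;

	r->wptr = (r->wptr + count) & (RING_DWORDS - 1);
	r->wptr_updates++;      /* a single update for the whole batch */
}
```

Both variants leave the buffer and write pointer in the same state; the batched one only differs in how many calls and pointer updates it takes to get there.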

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:16:40 -04:00
Lijo Lazar
ed3dac4bf9 drm/amdgpu: Check gmc requirement for reset on init
Add a callback to check if there is any condition detected by GMC block
for reset on init. One case is if a pending NPS change request is
detected. If reset is done because of NPS switch, refresh NPS info from
discovery table.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:16:32 -04:00
Lijo Lazar
ee52489d12 drm/amdgpu: Place NPS mode request on unload
If a user has requested NPS mode switch, place the request through PSP
during unload of the driver. For devices which are part of a hive, all
requests are placed together. If one of them fails, revert back to the
current NPS mode.
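
The all-or-revert behaviour for a hive can be sketched as follows. The device structure, the `fail` knob and the function names are hypothetical; the real requests go through PSP.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device: 'requested' is the mode asked for at next load. */
struct toy_dev {
	int requested;
	int fail; /* knob that simulates a failing request */
};

static int request_mode(struct toy_dev *d, int mode)
{
	if (d->fail)
		return -EIO;
	d->requested = mode;
	return 0;
}

/* Place the request on every device; if one fails, revert the others. */
static int hive_request_mode(struct toy_dev *devs, size_t n,
			     int cur_mode, int new_mode)
{
	size_t i;
	int r = 0;

	assert(devs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		r = request_mode(&devs[i], new_mode);
		if (r)
			break;
	}

	if (r) {
		/* Undo the requests that were already placed. */
		while (i--)
			devs[i].requested = cur_mode;
	}

	return r;
}
```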

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:16:20 -04:00
Thomas Zimmermann
ea1d2a38fb drm/amdgpu: Use video aperture helpers
DRM's aperture functions have long been implemented as helpers
under drivers/video/ for use with fbdev. Avoid the DRM wrappers by
calling the video functions directly.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Xinhui Pan <Xinhui.Pan@amd.com>
Acked-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240930130921.689876-2-tzimmermann@suse.de
2024-10-14 15:28:47 +02:00
Dave Airlie
fc4d262721 amd-drm-fixes-6.12-2024-10-08:
amdgpu:
 - Fix invalid UBSAN warnings
 - Fix artifacts in MPO transitions
 - Hibernation fix
 
 amdkfd:
 - Fix an eviction fence leak
 
 radeon:
 - Add late register for connectors
 - Always set GEM function pointers

Merge tag 'amd-drm-fixes-6.12-2024-10-08' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008142831.3739244-1-alexander.deucher@amd.com
2024-10-09 16:31:16 +10:00
Dave Airlie
54bc1d3255 Merge tag 'drm-misc-next-2024-09-26' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.13:

UAPI Changes:
- panthor: Add realtime group priority and priority query.

Cross-subsystem Changes:
- Add Vivek Kasireddy as udmabuf maintainer.
- Assorted udmabuf changes.
- Device tree binding updates.
- dmabuf documentation fixes.
- Move drm_rect to drm core module from kms helper.

Core Changes:
- Update scheduler documentation and concurrency fixes.
- drm/ci updates.
- Add memory-agnostic fbdev client and client-agnostic setup helper.
- Huge driver conversion for using the above.

Driver Changes:
- Assorted fixes to imx, panel/nt35510, sti, accel/ivpu, v3d, vkms,
  host1x.
- Add panel quirks for AYA NEO panels.
- Make module autoloading work for bridge/it6505 and mcde.
- Add huge page support to v3d using a custom shmfs.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/a9b95e6f-9f35-464e-83f6-bda75b35ee0b@linux.intel.com
2024-10-09 11:58:39 +10:00
Dave Airlie
7fefa1edc2 Merge tag 'drm-misc-next-2024-09-20' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.12:

UAPI Changes:
- Add panthor/DEV_QUERY_TIMESTAMP_INFO query.

Cross-subsystem Changes:
- Updated dt bindings.
- Add documentation explaining default errnos for fences.
- Mark dma-buf heaps creation functions as __init.

Core Changes:
- Split DSC helpers from DP helpers.
- Clang build fixes for drm/mm test.
- Remove simple pipeline support for gem-vram,
  no longer any users left after converting bochs.
- Add errno to drm_sched_start to distinguish between GPU and queue
  reset.
- Add drm_framebuffer testcases.
- Fix uninitialized spinlock acquisition with CONFIG_DRM_PANIC=n.
- Use read_trylock instead of read_lock in dma_fence_begin_signalling to
  quiesce lockdep.

Driver Changes:
- Assorted small fixes and updates for tegra, host1x, imagination,
  nouveau, panfrost, panthor, panel/ili9341, mali, exynos,
  panel/samsung-s6e3fa7, ast, bridge/ti-sn65dsi86, panel/himax-hx83112a,
  bridge/tc358767, bridge/imx8mp-hdmi-tx, panel/khadas-ts050,
  panel/nt36523, panel/sony-acx565akm, kmb, accel/qaic, omap, v3d.
- Add bridge/TI TDP158.
- Assorted documentation updates.
- Convert bochs from simple drm to gem shmem, and check modes
  against available memory.
- Many VC4 fixes, most related to scaling and YUV support.
- Convert some drivers to use SYSTEM_SLEEP_PM_OPS and RUNTIME_PM_OPS.
- Rockchip 4k@60 support.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/445713a6-2427-4c53-8ec2-3a894ec62405@linux.intel.com
2024-10-09 09:03:46 +10:00
Sunil Khatri
555cd714bd drm/amdgpu: no need to log error in multi ring write
There is no need to log an error in multi ring write, as it is taken
care of during ring commit.

This is in line with the change made to amdgpu_ring_write.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-08 09:46:23 -04:00
Sunil Khatri
ccc0a18748 drm/amdgpu: move error log from ring write to commit
Move the error message out of ring write as an optimization: instead of
printing that message on every write, print it once during commit if the
writes exceeded the allocated size, i.e. ring->count_dw.

Also, we do not want to log the error message between a ring write and
the completion of that write. The overflow is mostly harmless: it only
overwrites stale data, because the GPU reads from the ring faster than
the CPU writes to it.

This reduces the size of the amdgpu.ko module by around 600 Kb, as write
is a very frequently used function, and so is the print.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-08 09:46:15 -04:00
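
The idea behind ccc0a18748 can be pictured with a small user-space model. This is only a sketch under stated assumptions, not the amdgpu code: `struct toy_ring`, `toy_ring_alloc`, `toy_ring_write` and `toy_ring_commit` are hypothetical names, and only the `count_dw` field name comes from the commit message. The write path carries no logging; an overflow is reported once, at commit time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_RING_DW 8u	/* ring size in dwords, must be a power of two */

/* Hypothetical stand-in for a GPU ring; not the real struct amdgpu_ring. */
struct toy_ring {
	uint32_t buf[TOY_RING_DW];
	uint32_t wptr;		/* next write position */
	int count_dw;		/* dwords still allocated for this submission */
};

/* Reserve ndw dwords for the next submission. */
static void toy_ring_alloc(struct toy_ring *ring, int ndw)
{
	assert(ndw >= 0);
	ring->count_dw = ndw;
}

/* Hot path: no logging here, just write and account. */
static void toy_ring_write(struct toy_ring *ring, uint32_t v)
{
	ring->buf[ring->wptr++ & (TOY_RING_DW - 1)] = v;
	ring->count_dw--;	/* goes negative once the allocation is exceeded */
}

/* Cold path: report an overflow once. Returns the number of excess dwords. */
static int toy_ring_commit(struct toy_ring *ring)
{
	int excess = ring->count_dw < 0 ? -ring->count_dw : 0;

	if (excess)
		fprintf(stderr, "ring: wrote %d dwords more than allocated\n",
			excess);
	ring->count_dw = 0;
	return excess;
}
```

Because the check sits in the commit function, the frequently used write helper stays small, which is the effect the commit message attributes the module size reduction to.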
Andrew Kreimer
16445e408c drm/amdgpu: fix typos
Fix typos in comments: "wether -> whether".

Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-08 09:43:48 -04:00
Tvrtko Ursulin
89cfa73b61 drm/amdgpu: Remove the while loop from amdgpu_job_prepare_job
While loop makes it sound like amdgpu_vmid_grab() potentially needs to be
called multiple times to produce a fence, while in reality all code paths
either return an error, assign a valid job->vmid or assign a vmid which
will be valid once the returned fence signals.

Therefore we can remove the loop to make it clear the call does not need
to be repeated.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-08 09:43:43 -04:00
Tvrtko Ursulin
871f44b4ba drm/amdgpu: Drop impossible condition from amdgpu_job_prepare_job
Fence has been initialised to NULL so no need to test it.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-08 09:43:39 -04:00
Tvrtko Ursulin
04bdba4654 drm/amdgpu: Use drm_print_memory_stats helper from fdinfo
Convert fdinfo memory stats to use the common drm_print_memory_stats
helper.

This aligns the output with the common keys documented in
drm-usage-stats.rst, specifically adding the drm-total- key, which the
driver was missing until now.

Additionally I made the code stop skipping total size for objects which
currently do not have a backing store, and I added resident, active and
purgeable reporting.

Legacy keys have been preserved, with the prospect of potentially
removing only the drm-memory- keys when the time is right.

The example output now looks like this:

 pos:	0
 flags:	02100002
 mnt_id:	24
 ino:	1239
 drm-driver:	amdgpu
 drm-client-id:	4
 drm-pdev:	0000:04:00.0
 pasid:	32771
 drm-total-cpu:	0
 drm-shared-cpu:	0
 drm-active-cpu:	0
 drm-resident-cpu:	0
 drm-purgeable-cpu:	0
 drm-total-gtt:	2392 KiB
 drm-shared-gtt:	0
 drm-active-gtt:	0
 drm-resident-gtt:	2392 KiB
 drm-purgeable-gtt:	0
 drm-total-vram:	44564 KiB
 drm-shared-vram:	31952 KiB
 drm-active-vram:	0
 drm-resident-vram:	44564 KiB
 drm-purgeable-vram:	0
 drm-memory-vram:	44564 KiB
 drm-memory-gtt: 	2392 KiB
 drm-memory-cpu: 	0 KiB
 amd-memory-visible-vram:	44564 KiB
 amd-evicted-vram:	0 KiB
 amd-evicted-visible-vram:	0 KiB
 amd-requested-vram:	44564 KiB
 amd-requested-visible-vram:	11952 KiB
 amd-requested-gtt:	2392 KiB
 drm-engine-compute:	46464671 ns

v2:
 * Track purgeable via AMDGPU_GEM_CREATE_DISCARDABLE.

Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Rob Clark <robdclark@chromium.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-08 09:43:25 -04:00
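
A consumer of the fdinfo output shown in 04bdba4654 might read individual keys along the following lines. This is a minimal sketch, not part of the commit: `fdinfo_parse_bytes` is a hypothetical helper, and it assumes the `<key>:<whitespace><uint> [KiB|MiB]` layout visible in the example output, with a bare number taken as bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one fdinfo line of the form "<key>:<whitespace><uint> [KiB|MiB]".
 * Returns 1 and stores the value in bytes if the line carries `key`,
 * 0 otherwise. A value without a unit is taken as bytes (assumption).
 */
static int fdinfo_parse_bytes(const char *line, const char *key,
			      unsigned long long *bytes)
{
	unsigned long long val;
	char unit[4] = "";
	size_t klen;

	assert(line && key && bytes);

	/* Tolerate leading indentation, as in the quoted example output. */
	while (*line == ' ' || *line == '\t')
		line++;

	klen = strlen(key);
	if (strncmp(line, key, klen) != 0 || line[klen] != ':')
		return 0;

	/* The unit is optional, so one converted field is enough. */
	if (sscanf(line + klen + 1, " %llu %3s", &val, unit) < 1)
		return 0;

	if (strcmp(unit, "KiB") == 0)
		val <<= 10;
	else if (strcmp(unit, "MiB") == 0)
		val <<= 20;
	else if (unit[0] != '\0')
		return 0;	/* unit this sketch does not handle, e.g. "ns" */

	*bytes = val;
	return 1;
}
```

Fed the line `drm-total-vram:	44564 KiB`, the helper stores 44564 * 1024 bytes; lines for other keys are rejected, so a caller can loop over the file and probe each line for the keys it cares about.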
Tvrtko Ursulin
fc282e9e86 drm/amdgpu: Drop unused fence argument from amdgpu_vmid_grab_used
The fence argument is unused, so let's drop it.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-08 09:43:17 -04:00
Lang Yu
d7d7b947a4 drm/amdkfd: Fix an eviction fence leak
Only create a new reference for each process instead of each VM.

Fixes: 9a1c1339ab ("drm/amdkfd: Run restore_workers on freezable WQs")
Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5fa4362894)
Cc: stable@vger.kernel.org
2024-10-07 14:53:23 -04:00
Lijo Lazar
012be6f22c drm/amdgpu: Add sysfs interfaces for NPS mode
Add a sysfs interface to see available NPS modes to switch to -

	cat /sys/bus/pci/devices/../available_memory_partition

Make the current_memory_partition sysfs node read/write for requesting a
new NPS mode. The request is only cached and at a later point a driver
unload/reload is required to switch to the new NPS mode.

Ex:
	echo NPS1 > /sys/bus/pci/devices/../current_memory_partition
	echo NPS4 > /sys/bus/pci/devices/../current_memory_partition

The above interfaces will be available only if the SOC supports more than
one NPS mode.

Also modify the current memory partition sysfs logic to be more
generic.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:32:23 -04:00
Lijo Lazar
bbc160084e drm/amdgpu: Add gmc interface to request NPS mode
Add a common interface in GMC to request NPS mode through PSP. Also add
a variable in hive and gmc control to track the last requested mode.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:32:23 -04:00
Srinivasan Shanmugam
b1cf3ddcc3 drm/amdgpu/gfx10: Apply Isolation Enforcement to GFX & Compute rings
This commit applies isolation enforcement to the GFX and Compute rings
in the gfx_v10_0 module.

The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and
`amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be
called when a ring begins and ends its use, respectively.

`amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring
begins its use. This function cancels any scheduled
`enforce_isolation_work` and, if necessary, signals the Kernel Fusion
Driver (KFD) to stop the runqueue.

`amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends
its use. This function schedules `enforce_isolation_work` to be run
after a delay.

These functions are part of the Enforce Isolation Handler, which
enforces shader isolation on AMD GPUs to prevent data leakage between
different processes.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:32:23 -04:00
Rajneesh Bhardwaj
212cc24119 drm/amdgpu: Add PSP interface for NPS switch
Implement a PSP ring command interface for memory partitioning on the fly
on supported ASICs.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:32:00 -04:00
Srinivasan Shanmugam
fbca196953 drm/amdgpu/gfx11: Apply Isolation Enforcement to GFX & Compute rings
This commit applies isolation enforcement to the GFX and Compute rings
in the gfx_v11_0 module.

The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and
`amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be
called when a ring begins and ends its use, respectively.

`amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring
begins its use. This function cancels any scheduled
`enforce_isolation_work` and, if necessary, signals the Kernel Fusion
Driver (KFD) to stop the runqueue.

`amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends
its use. This function schedules `enforce_isolation_work` to be run
after a delay.

These functions are part of the Enforce Isolation Handler, which
enforces shader isolation on AMD GPUs to prevent data leakage between
different processes.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:09:34 -04:00
Srinivasan Shanmugam
e7cee54595 drm/amdgpu/gfx12: Implement cleaner shader support for GFX12 hardware
This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the
gfx_v12_0 module. This packet is used to emit the cleaner shader, which
is used to clear GPU memory before it's reused, helping to prevent data
leakage between different processes.

Finally, the patch updates the ring function structures to include the
new gfx_v12_0_ring_emit_cleaner_shader function. This allows the
cleaner shader to be emitted as part of the ring's operations.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:09:28 -04:00
Colin Ian King
1845752b2f drm/amdgpu: Fix spelling mistake "initializtion" -> "initialization"
There is a spelling mistake in a dev_err message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:09:10 -04:00
Srinivasan Shanmugam
8fc279e5e3 drm/amdgpu/gfx11: Implement cleaner shader support for GFX11 hardware
The patch modifies the gfx_v11_0_kiq_set_resources function to write
the cleaner shader's memory controller address to the ring buffer. It
also adds a new function, gfx_v11_0_ring_emit_cleaner_shader, which
emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer.

This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the
gfx_v11_0 module. This packet is used to emit the cleaner shader, which
is used to clear GPU memory before it's reused, helping to prevent data
leakage between different processes.

Finally, the patch updates the ring function structures to include the
new gfx_v11_0_ring_emit_cleaner_shader function. This allows the
cleaner shader to be emitted as part of the ring's operations.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:08:56 -04:00
Sunil Khatri
7e6487ab21 drm/amdgpu: change the comment from handle to ip_block
htmldoc generation depends on the documented input arguments to
generate the document. After the update from handle to
ip_block, the comments need the same update to fix the
warnings.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410021904.YyGjlpk9-lkp@intel.com
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:08:35 -04:00
Srinivasan Shanmugam
2d5f74a867 drm/amdgpu/gfx10: Implement cleaner shader support for GFX10 hardware
The patch modifies the gfx_v10_0_kiq_set_resources function to write
the cleaner shader's memory controller address to the ring buffer. It
also adds a new function, gfx_v10_0_ring_emit_cleaner_shader, which
emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer.

This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the
gfx_v10_0 module. This packet is used to emit the cleaner shader, which
is used to clear GPU memory before it's reused, helping to prevent data
leakage between different processes.

Finally, the patch updates the ring function structures to include the
new gfx_v10_0_ring_emit_cleaner_shader function. This allows the
cleaner shader to be emitted as part of the ring's operations.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:08:29 -04:00
Lang Yu
5fa4362894 drm/amdkfd: Fix an eviction fence leak
Only create a new reference for each process instead of each VM.

Fixes: 9a1c1339ab ("drm/amdkfd: Run restore_workers on freezable WQs")
Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:03:52 -04:00
Sunil Khatri
692d2cd180 drm/amdgpu: update the handle ptr in hw_fini
Update the *handle to amdgpu_ip_block ptr for all
function pointers of hw_fini.

Also update the ip_block ptr wherever needed, as
there was a cyclic dependency of hw_fini on suspend,
plus some follow-up cleanup.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:03:25 -04:00
Sunil Khatri
58608034ed drm/amdgpu: update the handle ptr in hw_init
Update the *handle to amdgpu_ip_block ptr for all
function pointers of hw_init.

Also update the ip_block ptr wherever needed, as
there was a cyclic dependency of hw_init on resume.

v2: squash in isp fix

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:03:25 -04:00
Sunil Khatri
7feb4f3ad8 drm/amdgpu: update the handle ptr in resume
Update the *handle to amdgpu_ip_block ptr for all
function pointers of resume.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:02:50 -04:00
Sunil Khatri
982d7f9bfe drm/amdgpu: update the handle ptr in suspend
Update the *handle to amdgpu_ip_block ptr for all
function pointers of suspend.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:02:45 -04:00
Sunil Khatri
82ae6619a4 drm/amdgpu: update the handle ptr in wait_for_idle
Update the *handle to amdgpu_ip_block ptr for all
function pointers of wait_for_idle.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:02:36 -04:00
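
The "update the handle ptr" series in this log follows one pattern, which the reduced model below tries to illustrate. It is a sketch under assumptions, not the driver code: every `toy_` name is hypothetical, and the structures are cut down to the minimum needed to contrast an untyped `void *handle` callback with one that takes a typed IP block pointer.

```c
#include <assert.h>

/* Hypothetical, heavily reduced stand-ins for the amdgpu types. */
struct toy_device {
	int busy;
};

struct toy_ip_block {
	struct toy_device *adev;	/* back-pointer to the owning device */
	int idle_checks;		/* per-block state, reachable directly */
};

/* Old style: an untyped handle that each callback converts back itself. */
static int toy_wait_for_idle_old(void *handle)
{
	struct toy_device *adev = handle;

	return adev->busy ? -1 : 0;
}

/* New style: the callback receives the typed IP block. */
static int toy_wait_for_idle_new(struct toy_ip_block *ip_block)
{
	assert(ip_block && ip_block->adev);
	ip_block->idle_checks++;
	return ip_block->adev->busy ? -1 : 0;
}

/* Callback table with the typed signature. */
struct toy_ip_funcs {
	int (*wait_for_idle)(struct toy_ip_block *ip_block);
};
```

With the typed signature, a caller that still passes the device pointer no longer matches the parameter type, so the compiler can flag call sites that were missed during the conversion.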
Al Viro
5f60d5f6bb move asm/unaligned.h to linux/unaligned.h
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.

auto-generated by the following:

for i in `git grep -l -w asm/unaligned.h`; do
	sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
	sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-10-02 17:23:23 -04:00
Sunil Khatri
e15ec812b5 drm/amdgpu: update the handle ptr in post_soft_reset
Update the *handle to amdgpu_ip_block ptr for all
function pointers of post_soft_reset.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:45:51 -04:00
Sunil Khatri
0ef2a1e7af drm/amdgpu: update the handle ptr in soft_reset
Update the *handle to amdgpu_ip_block ptr for all
function pointers of soft_reset.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:45:44 -04:00
Srinivasan Shanmugam
e47cb9d253 drm/amdgpu/gfx9: Add Cleaner Shader Deinitialization in gfx_v9_0 Module
This commit addresses an omission in the previous patch related to the
cleaner shader support for GFX9 hardware. Specifically, it adds the
necessary deinitialization code for the cleaner shader in the
gfx_v9_0_sw_fini function.

The added line amdgpu_gfx_cleaner_shader_sw_fini(adev); ensures that any
allocated resources for the cleaner shader are freed correctly, avoiding
potential memory leaks and ensuring that the GPU state is clean for the
next initialization sequence.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Fixes: c2e70d307f ("drm/amdgpu/gfx9: Implement cleaner shader support for GFX9 hardware")
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:44:47 -04:00
Sunil Khatri
9d5ee7ce88 drm/amdgpu: update the handle ptr in pre_soft_reset
Update the *handle to amdgpu_ip_block ptr for all
function pointers of pre_soft_reset.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:44:41 -04:00
Lijo Lazar
f0b919960d drm/amdgpu: Fix logic to determine TOS reload
Avoid comparing the TOS version on APUs. On APUs, the driver doesn't take
care of TOS load.

Fixes: 0ff3822613 ("drm/amdgpu: Add interface for TOS reload cases")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Rajneesh Bhardwaj <Rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:43:50 -04:00
Sunil Khatri
6a9456e0e3 drm/amdgpu: update the handle ptr in check_soft_reset
Update the *handle to amdgpu_ip_block ptr for all
function pointers of check_soft_reset.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:43:45 -04:00
Sunil Khatri
94b2e07ad4 drm/amdgpu: update the handle ptr in prepare_suspend
Update the *handle to amdgpu_ip_block ptr for all
function pointers of prepare_suspend.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:43:38 -04:00
Sunil Khatri
47d827f9c7 drm/amdgpu: update the handle ptr in late_fini
Update the *handle to amdgpu_ip_block ptr for all
function pointers of late_fini.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:43:33 -04:00
Sunil Khatri
904c402e97 drm/amdgpu: remove the dummy fn acp_early_init
acp_early_init is a dummy function that is not being
used, so remove it.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:42:53 -04:00
Mario Limonciello
b472b8d829 drm/amd: Taint the kernel when enabling overdrive
Some distributions have been patching amdgpu to enable overdrive by
default which may compromise stability.  Furthermore when bug reports
are brought upstream it's not obvious that the system has been tampered
with.

When overdrive is enabled taint the kernel and leave a critical message
in the logs for users so that it's obvious in a bug report it's been
tampered with.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:41:21 -04:00
Srinivasan Shanmugam
a443852f85 drm/amdkfd: Fix kdoc entry for 'get_wave_count()' function parameters
Update kdoc entries to reflect the function's parameters. The descriptor
for the 'queue_cnt' parameter has been added, and the incorrect mentions
of 'wave_cnt' and 'vmid', which are not parameters but local variables,
have been removed.

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:954: warning: Function parameter or struct member 'queue_cnt' not described in 'get_wave_count'
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:954: warning: Excess function parameter 'wave_cnt' description in 'get_wave_count'
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:954: warning: Excess function parameter 'vmid' description in 'get_wave_count'

Cc: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Cc: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:40:56 -04:00
Sunil Khatri
90410d3996 drm/amdgpu: update the handle ptr in early_fini
Update the *handle to amdgpu_ip_block ptr for all
function pointers of early_fini.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:40:49 -04:00
Sunil Khatri
36aa9ab9c0 drm/amdgpu: update the handle ptr in sw_fini
Update the *handle to amdgpu_ip_block ptr for all
function pointers of sw_fini.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:40:43 -04:00
Sunil Khatri
d5347e8d27 drm/amdgpu: update the handle ptr in sw_init
Update the *handle to amdgpu_ip_block ptr for all
function pointers of sw_init.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:40:37 -04:00
Sunil Khatri
3138ab2c5b drm/amdgpu: update the handle ptr in late_init
Update the ptr handle to amdgpu_ip_block ptr in all
the functions of late_init function ptr.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:40:31 -04:00
Sunil Khatri
146b085ead drm/amdgpu: update the handle ptr in early_init
Update the handle ptr to amdgpu_ip_block ptr
for all function pointers of early_init.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:40:22 -04:00
Asad Kamal
1007264254 drm/amdgpu: Add supported partition mode node
Add sysfs node to show supported partition modes across all NPS modes

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:40:15 -04:00
Lijo Lazar
fcd91a95df drm/amdgpu: Add option to refresh NPS data
In certain use cases, NPS data needs to be refreshed again from
discovery table. Add API parameter to refresh NPS data from discovery
table.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:40:07 -04:00
Jiadong Zhu
5682cd86d6 drm/amdgpu/sdma5.2: implement ring reset callback for sdma5.2
Implement sdma queue reset callback via MMIO.

v2: enter/exit safemode for mmio queue reset.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:39:57 -04:00
YuanShang
1fd7c37e3f drm/amdgpu: Flush tlb by VM_INVALIDATION packet in sdma_v5_2
In order for SDMA not to be switched between the VM_INVALIDATION
request and ack, use a single VM_INVALIDATION packet in function
sdma_v5_2_ring_emit_vm_flush.

Signed-off-by: YuanShang <YuanShang.Mao@amd.com>
Reviewed-By: Horace Chen <horace.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:39:38 -04:00
Jiadong Zhu
64acf8f69e drm/amdgpu/sdma5.2: split out per instance resume function
Extract the resume sequence from sdma_v5_2_gfx_resume for
starting/restarting an individual instance.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:38:44 -04:00
Jiadong Zhu
5fbba6bb98 drm/amdgpu/sdma5: implement ring reset callback for sdma5
Implement sdma queue reset callback via MMIO.

v2: enter/exit safemode when sdma queue reset.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:38:38 -04:00
Sunil Khatri
d60e78bdef drm/amdgpu: update the handle ptr in print_ip_state
Update the ptr handle to amdgpu_ip_block ptr in all
the functions affected.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:36:17 -04:00
Lijo Lazar
4ae86dc878 drm/amdgpu: Add sysfs nodes to get xcp details
Add partition config nodes in sysfs to get resource instance details for
a particular partition mode. A resource could be anything like an xcc,
vcn decoder, system dma units etc.

Details of various resource instances are available under
/sys/bus/pci/devices/.../compute_partition_config/

Select a partition configuration:
/sys/bus/pci/devices/.../compute_partition_config/xcp_config

Number of instances of a resource:
/sys/bus/pci/devices/.../compute_partition_config/<rsrc_name>/num_inst

Total partitions sharing the resource:
/sys/bus/pci/devices/.../compute_partition_config/<rsrc_name>/num_shared

v2: Update node name as per spec

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:28:56 -04:00
Sunil Khatri
fa73462dc0 drm/amdgpu: update the handle ptr in dump_ip_state
Update the ptr handle to amdgpu_ip_block ptr in all
the functions.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:28:51 -04:00
Jiadong Zhu
94daae9744 drm/amdgpu/sdma5: split out per instance resume function
Extract the resume sequence from sdma_v5_0_gfx_resume for
starting/restarting an individual instance.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:28:43 -04:00
Linus Torvalds
994aeacbb3 drm fixes for 6.12-rc1
i915:
 - Fix BMG support to UHBR13.5
 - Two PSR fixes
 - Fix colorimetry detection for DP
 
 xe:
 - Fix macro for checking minimum GuC version
 - Fix CCS offset calculation for some BMG SKUs
 - Fix locking on memory usage reporting via fdinfo and BO destroy
 - Fix GPU page fault handler on a closed VM
 - Fix overflow in oa batch buffer
 
 amdgpu:
 - MES 12 fix
 - KFD fence sync fix
 - SR-IOV fixes
 - VCN 4.0.6 fix
 - SDMA 7.x fix
 - Bump driver version to note cleared VRAM support
 - SWSMU fix
 - CU occupancy logic fix
 - SDMA queue fix

Merge tag 'drm-next-2024-09-28' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Regular fixes for the week to end the merge window, i915 and xe have a
  few each, amdgpu makes up most of it with a bunch of SR-IOV related
  fixes amongst others.

  i915:
   - Fix BMG support to UHBR13.5
   - Two PSR fixes
   - Fix colorimetry detection for DP

  xe:
   - Fix macro for checking minimum GuC version
   - Fix CCS offset calculation for some BMG SKUs
   - Fix locking on memory usage reporting via fdinfo and BO destroy
   - Fix GPU page fault handler on a closed VM
   - Fix overflow in oa batch buffer

  amdgpu:
   - MES 12 fix
   - KFD fence sync fix
   - SR-IOV fixes
   - VCN 4.0.6 fix
   - SDMA 7.x fix
   - Bump driver version to note cleared VRAM support
   - SWSMU fix
   - CU occupancy logic fix
   - SDMA queue fix"

* tag 'drm-next-2024-09-28' of https://gitlab.freedesktop.org/drm/kernel: (79 commits)
  drm/amd/pm: update workload mask after the setting
  drm/amdgpu: bump driver version for cleared VRAM
  drm/amdgpu: fix vbios fetching for SR-IOV
  drm/amdgpu: fix PTE copy corruption for sdma 7
  drm/amdkfd: Add SDMA queue quantum support for GFX12
  drm/amdgpu/vcn: enable AV1 on both instances
  drm/amdkfd: Fix CU occupancy for GFX 9.4.3
  drm/amdkfd: Update logic for CU occupancy calculations
  drm/amdgpu: skip coredump after job timeout in SRIOV
  drm/amdgpu: sync to KFD fences before clearing PTEs
  drm/amdgpu/mes12: set enable_level_process_quantum_check
  drm/i915/dp: Fix colorimetry detection
  drm/amdgpu/mes12: reduce timeout
  drm/amdgpu/mes11: reduce timeout
  drm/amdgpu: use GEM references instead of TTMs v2
  drm/amd/display: Allow backlight to go below `AMDGPU_DM_DEFAULT_MIN_BACKLIGHT`
  drm/amd/display: Fix kdoc entry for 'tps' in 'dc_process_dmub_dpia_set_tps_notification'
  drm/amdgpu: update golden regs for gfx12
  drm/amdgpu: clean up vbios fetching code
  drm/amd/display: handle nulled pipe context in DCE110's set_drr()
  ...
2024-09-28 08:47:46 -07:00
Lijo Lazar
c75c5285e5 drm/amdgpu: Add PSP reload case to reset-on-init
A reset on initialization will be needed if a different PSP TOS needs to be
loaded than the one currently active on the system. This is possible
only on SOCs which support a full device reset which results in unload
of active PSP TOS.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:07:16 -04:00
Lijo Lazar
0ff3822613 drm/amdgpu: Add interface for TOS reload cases
Add interface to check if a different TOS needs to be loaded than the
one which is already active on the SOC. Presently the interface
is restricted to specific variants of PSPv13.0.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:07:07 -04:00
Lijo Lazar
c4f00312c1 drm/amdgpu: Support reset-on-init on select SOCs
Add XGMI reset on init support to aldebaran and SOCs with GC v9.4.3.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:58 -04:00
Lijo Lazar
2accf9d683 drm/amdgpu: Drop delayed reset work handler
Drop delayed reset work handler as it is no longer used.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:53 -04:00
Lijo Lazar
631af731ee drm/amdgpu: Refactor XGMI reset on init handling
Use XGMI hive information to rely on resetting XGMI devices on
initialization rather than using mgpu structure. mgpu structure may have
other devices as well.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <feifxu@amd.com>
Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:46 -04:00
Lijo Lazar
b17f87329d drm/amdgpu: Add helper to initialize badpage info
Add a separate function to read badpage data during initialization.
Reading bad pages will need hardware access and cannot be done during
reset. Hence in cases where device needs a full reset during
init itself, attempting to read will cause a deadlock.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:38 -04:00
Dr. David Alan Gilbert
0ee2399116 drm/amdgpu: Remove unused amdgpu_i2c functions
amdgpu_i2c_add and amdgpu_i2c_init were added in 2015's commit
d38ceaf99e ("drm/amdgpu: add core driver (v4)")
but never used.

Remove them.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:36 -04:00
Dr. David Alan Gilbert
9d7a8bdb90 drm/amdgpu: Remove unused amdgpu_gfx_bit_to_me_queue
amdgpu_gfx_bit_to_me_queue has been unused since it was added in
commit 7470bfcf20 ("drm/amdgpu: add helper function for gfx queue/bitmap
transition")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:33 -04:00
Dr. David Alan Gilbert
1e10c12263 drm/amdgpu: Remove unused amdgpu_gmc_vram_cpu_pa
amdgpu_gmc_vram_cpu_pa has been unused since commit
087451f372 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:29 -04:00
Dr. David Alan Gilbert
6e261ecbb2 drm/amdgpu: Remove unused amdgpu_atpx functions
amdgpu_atpx_dgpu_req_power_for_displays has been unused since
commit bdb1ccb080 ("drm/amdgpu: remove ATPX_DGPU_REQ_POWER_FOR_DISPLAYS
check when hotplug-in")

amdgpu_atpx_get_dhandle has been unused since commit
f9b7f3703f ("drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)")

Remove them.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:24 -04:00
Dr. David Alan Gilbert
632aac6299 drm/amdgpu: Remove unused amdgpu_device_ip_is_idle
amdgpu_device_ip_is_idle is unused.
It was renamed from 'amdgpu_is_idle' which was originally added in
commit 5dbbb60ba6 ("drm/amdgpu: add IP helpers for wait_for_idle and is_idle")

but hasn't been used.

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:19 -04:00
Lijo Lazar
1e4acf4d93 drm/amdgpu: Add reset on init handler for XGMI
In some cases, device needs to be reset before first use. Add handlers
for doing device reset during driver init sequence.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <feifxu@amd.com>
Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:19 -04:00
Lijo Lazar
f501057aff drm/amdgpu: Add callback get xcp resource info
Add a callback interface to get the resource information of a partition
mode. Presently the information has number of resources and number of
entities sharing the resource.

Add the implementation for aquavanjaram SOCs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:18 -04:00
Lijo Lazar
1bc0b33915 drm/amd: Add helper to get partition config modes
Add helper to get supported/available partition config modes

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:18 -04:00
WangYuli
307b4ab7ba drm/amdgpu: Fix typo "acccess" and improve the comment style here
There are some misspellings of 'access' as 'acccess' in comments;
they should read 'access'.

And the comment style should be like this:
 /*
  * Text
  * Text
  */

Suggested-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/all/f75fbe30-528e-404f-97e4-854d27d7a401@amd.com/
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/all/0c768bf6-bc19-43de-a30b-ff5e3ddfd0b3@suse.de/
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:18 -04:00
Alex Deucher
b1281b6d55 drm/amdgpu/gfx9: Explicitly halt CP before init
Need to make sure it's halted as we don't know what state
the GPU may have been left in previously.

Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:18 -04:00
Alex Deucher
993fcc40ae drm/amdgpu/gfx9: set additional bits on CP halt
Need to set the pipe reset and cache invalidation bits
on halt otherwise we can get stale state if the CP firmware
changes (e.g., on module unload and reload).

Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:18 -04:00
Sunil Khatri
37b993225d drm/amdgpu: add amdgpu_device reference in ip block
To handle amdgpu_device reference for different GPUs
we add its reference in each ip block which can be
used to differentiate between different gpu devices.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:18 -04:00
Lijo Lazar
6e37ae8b08 drm/amdgpu: Separate reinitialization after reset
Move the reinitialization part after a reset to another function. No
functional changes.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:18 -04:00
Tim Huang
381ec8161d drm/amdgpu: check return for setting engine dram timings
This resolves the unchecked return value warning reported by Coverity.

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:18 -04:00
Lijo Lazar
5839d27d5b drm/amdgpu: Use init level for pending_reset flag
Drop pending_reset flag in gmc block. Instead use init level to
determine which type of init is preferred - in this case MINIMAL.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:18 -04:00
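The move from a boolean pending_reset flag to an init level, as described in 5839d27d5b above, can be illustrated with a small sketch. Only the level name MINIMAL comes from the commit message; the DEFAULT level, the block names, and the helper `blocks_to_init` are hypothetical and do not reflect the driver's actual tables.

```python
from enum import Enum


class InitLevel(Enum):
    DEFAULT = "default"  # hypothetical: bring up every IP block
    MINIMAL = "minimal"  # named in the commit: a reduced init


# Hypothetical block names, used only to make the sketch concrete.
ALL_BLOCKS = ["common", "gmc", "ih", "psp", "smu", "gfx", "sdma"]
MINIMAL_BLOCKS = {"common", "gmc", "ih", "psp", "smu"}


def blocks_to_init(level):
    """Return the blocks to initialize for the given level, in order."""
    if level is InitLevel.MINIMAL:
        return [b for b in ALL_BLOCKS if b in MINIMAL_BLOCKS]
    return list(ALL_BLOCKS)
```

Compared with a per-block flag, a single level gives every block one place to ask how far initialization should go.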
YiPeng Chai
9e0feb7946 amd/amdgpu: Reduce unnecessary repetitive GPU resets
In multiple GPUs case, after a GPU has started
resetting all GPUs on hive, other GPUs do not
need to trigger GPU reset again.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:18 -04:00
Lijo Lazar
14f2fe34f5 drm/amdgpu: Add init levels
Add init levels to define the level to which device needs to be
initialized.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:17 -04:00
Jane Jian
8a84d2a472 drm/amdgpu: Remove unneeded write in JPEG v4.0.3
HDP_DEBUG1(offset = 0x3fbc) is no longer functional, remove the redundant write.

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:17 -04:00
Lijo Lazar
8c50bf9beb drm/amdgpu: Fix JPEG v4.0.3 register write
EXTERNAL_REG_INTERNAL_OFFSET/EXTERNAL_REG_WRITE_ADDR should be used in
pairs. If an external register shouldn't be written, both packets
shouldn't be sent.

Fixes: a78b481469 ("drm/amdgpu: Skip PCTL0_MMHUB_DEEPSLEEP_IB write in jpegv4.0.3 under SRIOV")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:17 -04:00
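The pairing rule from 8c50bf9beb above (send both packets or neither) can be sketched as follows. The packet names are taken from the commit message; the list-based ring, the helper name `emit_external_reg_write`, and the `skip` parameter are assumptions made for illustration.

```python
def emit_external_reg_write(ring, internal_offset, write_addr, skip=False):
    """Append the packet pair for one external register write.

    ring is a plain list standing in for the command stream.
    When skip is true, neither packet is emitted, so the stream never
    contains one half of the pair without the other.
    """
    if skip:
        return ring
    ring.append(("EXTERNAL_REG_INTERNAL_OFFSET", internal_offset))
    ring.append(("EXTERNAL_REG_WRITE_ADDR", write_addr))
    return ring
```

Placing the skip check in front of both appends is what keeps the pair atomic.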
Feifei Xu
3eebfd5e9c drm/amdkfd:Add kfd function to config sq perfmon
Expose the interface for kfd to config sq perfmon.

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:17 -04:00
Sathishkumar S
f0b19b84d3 drm/amdgpu: add amdgpu_jpeg_sched_mask debugfs
JPEG_4_0_3 has up to 32 jpeg cores and a single mjpeg video decode
will use all available cores on the hardware. This debugfs entry
helps to disable or enable job submission to a cluster of cores or
one specific core in the ip for debugging. The entry is populated
only if there are at least two cores in the jpeg ip.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:17 -04:00
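A scheduling mask like the one exposed by f0b19b84d3 above is typically built and decoded with plain bit operations. The sketch below assumes that bit i of the mask corresponds to core i; that mapping and both helper names are assumptions, since the commit message does not spell out the bit layout.

```python
def mask_for_cores(cores):
    """Build a mask with one bit set per core index (assumed layout)."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


def enabled_cores(mask, num_cores):
    """List the core indices whose bit is set in mask."""
    return [core for core in range(num_cores) if mask & (1 << core)]
```

Under that assumption, restricting submission to a single core means writing a mask with exactly one bit set.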
Feifei Xu
400a7591d9 drm/amdgpu: Add psp command CONFIG_SQ_PERFMON
Add support for enable/disable perfmon profiling.

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:16 -04:00
Prike Liang
6704dbf719 drm/amdgpu: update suspend status for aborting from deeper suspend
Besides executing the _S3 method, there are other suspend abort cases
which can call the noirq suspend. Those cases need to be handled as an
incomplete suspend.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:16 -04:00
Asad Kamal
dc443aa4ab drm/amd/amdgpu: Add helper to get ip block valid
Add helper function to check if ip block is enabled

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:16 -04:00
Jiadong Zhu
92c9b3e8e4 drm/amdgpu/sdma6: implement ring reset callback for sdma6
Implement sdma queue reset callback using mes_reset_queue_mmio.

v2: check instance id before reset queue.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:16 -04:00
Jiadong Zhu
df190e6753 drm/amdgpu/sdma6: split out per instance resume function
Extract the resume sequence for individual sdma instance from sdma_v6_0_gfx_resume.
The function could be used for start/restart scenario on a certain instance.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:16 -04:00
Jiadong Zhu
ced65debf4 drm/amdgpu/mes11: update mes_reset_queue function to support sdma queue
Reset sdma queue through mmio based on me_id and queue_id.

v2: simplify callflows and register calculation.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:16 -04:00
Alex Deucher
34ad56a467 drm/amdgpu: bump driver version for cleared VRAM
Driver now clears VRAM on allocation.  Bump the
driver version so mesa knows when it will get
cleared vram by default.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.11.x
2024-09-26 17:04:47 -04:00
Alex Deucher
a8387ddc0d drm/amdgpu: fix vbios fetching for SR-IOV
SR-IOV fetches the vbios from VRAM in some cases.
Re-enable the VRAM path for dGPUs and rename the function
to make it clear that it is not IGP specific.

Fixes: 042658d17a ("drm/amdgpu: clean up vbios fetching code")
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Tested-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:04:10 -04:00
Frank Min
3cb576bc6d drm/amdgpu: fix PTE copy corruption for sdma 7
Without setting dcc bit, there is random PTE copy corruption on sdma 7.

So add this bit and update the packet format accordingly.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.11.x
2024-09-26 17:03:39 -04:00
Thomas Zimmermann
32acc286b2 drm/amdgpu: Run DRM default client setup
Call drm_client_setup() to run the kernel's default client setup
for DRM. Set fbdev_probe in struct drm_driver, so that the client
setup can start the common fbdev client.

The amdgpu driver specifies a preferred color mode depending on
the available video memory, with a default of 32. Adapt this for
the new client interface.

v5:
- select DRM_CLIENT_SELECTION
v2:
- style changes

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Xinhui Pan <Xinhui.Pan@amd.com>
Tested-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240924071734.98201-66-tzimmermann@suse.de
2024-09-26 09:31:28 +02:00
Saleemkhan Jamadar
8048e5ade8 drm/amdgpu/vcn: enable AV1 on both instances
v1 - remove cs parse code (Christian)

On VCN v4_0_6 AV1 is supported on both the instances.
Remove cs IB parse code since explicit handling of AV1 schedule is
not required.

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-09-25 12:56:19 -04:00
Mukul Joshi
e45b011d2c drm/amdkfd: Fix CU occupancy for GFX 9.4.3
Make CU occupancy calculations work on GFX 9.4.3 by
updating the logic to handle multiple XCCs correctly.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-25 12:56:07 -04:00
Mukul Joshi
6ae9e1aba9 drm/amdkfd: Update logic for CU occupancy calculations
Currently, the code uses the IH_VMID_X_LUT register to map
a queue's vmid to the corresponding PASID. This logic is racy
since CP can update the VMID-PASID mapping anytime especially
when there are more processes than number of vmids. Update the
logic to calculate CU occupancy by matching doorbell offset of
the queue with valid wave counts against the process's queues.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-25 12:56:00 -04:00
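The matching step described in 6ae9e1aba9 above can be sketched as a lookup of doorbell offsets. This is a simplified illustration under stated assumptions: hardware queues are given as (doorbell_offset, wave_count) pairs, the process's queues are given by their doorbell offsets, and the helper name `process_wave_count` is hypothetical.

```python
def process_wave_count(hw_queues, process_doorbells):
    """Sum wave counts of hardware queues owned by one process.

    hw_queues: iterable of (doorbell_offset, wave_count) pairs.
    process_doorbells: doorbell offsets of the process's queues.
    Queues with a zero wave count are ignored, mirroring the
    "valid wave counts" wording of the commit message.
    """
    owned = set(process_doorbells)
    return sum(
        waves
        for doorbell, waves in hw_queues
        if waves > 0 and doorbell in owned
    )
```

Because the doorbell offset belongs to the queue itself, this lookup does not depend on a VMID-to-PASID mapping that can change underneath it.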
ZhenGuo Yin
e1d27f7a9c drm/amdgpu: skip coredump after job timeout in SRIOV
VF FLR will be triggered by host driver before job timeout,
hence the error status of GPU gets cleared. Performing a
coredump here is unnecessary.

Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-25 12:55:52 -04:00
Christian König
126be9b2be drm/amdgpu: sync to KFD fences before clearing PTEs
This patch tries to solve the basic problem that we also need to sync to
the KFD fences of the BO because otherwise it can be that we clear
PTEs while the KFD queues are still running.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-25 12:55:44 -04:00
Jack Xiao
4771d2ecb7 drm/amdgpu/mes12: set enable_level_process_quantum_check
enable_level_process_quantum_check is required to enable process
quantum based scheduling.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.11.x
2024-09-25 12:55:14 -04:00
Linus Torvalds
f8ffbc365f struct fd layout change (and conversion to accessor helpers)

Merge tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull 'struct fd' updates from Al Viro:
 "Just the 'struct fd' layout change, with conversion to accessor
  helpers"

* tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  add struct fd constructors, get rid of __to_fd()
  struct fd: representation change
  introduce fd_file(), convert all accessors to it.
2024-09-23 09:35:36 -07:00
Linus Torvalds
de848da12f drm next for 6.12-rc1
string:
 - add mem_is_zero()
 
 core:
 - support more device numbers
 - use XArray for minor ids
 - add backlight constants
 - Split dma fence array creation into alloc and arm
 
 fbdev:
 - remove usage of old fbdev hooks
 
 kms:
 - Add might_fault() to drm_modeset_lock priming
 - Add dynamic per-crtc vblank configuration support
 
 dma-buf:
 - docs cleanup
 
 buddy:
 - Add start address support for trim function
 
 printk:
 - pass description to kmsg_dump
 
 scheduler:
 - Remove full_recover from drm_sched_start
 
 ttm:
 - Make LRU walk restartable after dropping locks
 - Allow direct reclaim to allocate local memory
 
 panic:
 - add display QR code (in rust)
 
 displayport:
 - mst: GUID improvements
 
 bridge:
 - Silence error message on -EPROBE_DEFER
 - analogix: Clean up
 - bridge-connector: Fix double free
 - lt6505: Disable interrupt when powered off
 - tc358767: Make default DP port preemphasis configurable
 - lt9611uxc: require DRM_BRIDGE_ATTACH_NO_CONNECTOR
 - anx7625: simplify OF array handling
 - dw-hdmi: simplify clock handling
 - lontium-lt8912b: fix mode validation
 - nwl-dsi: fix mode vsync/hsync polarity
 
 xe:
 - Enable LunarLake and Battlemage support
 - Introducing Xe2 ccs modifiers for integrated and discrete graphics
 - rename xe perf to xe observation
 - use wb caching on DGFX for system memory
 - add fence timeouts
 - Lunar Lake graphics/media/display workarounds
 - Battlemage workarounds
 - Battlemage GSC support
 - GSC and HuC fw updates for LL/BM
 - use dma_fence_chain_free
 - refactor hw engine lookup and mmio access
 - enable priority mem read for Xe2
 - Add first GuC BMG fw
 - fix dma-resv lock
 - Fix DGFX display suspend/resume
 - Use xe_managed for kernel BOs
 - Use reserved copy engine for user binds on faulting devices
 - Allow mixing dma-fence jobs and long-running faulting jobs
 - fix media TLB invalidation
 - fix rpm in TTM swapout path
 - track resources and VF state by PF
 
 i915:
 - Type-C programming fix for MTL+
 - FBC cleanup
 - Calc vblank delay more accurately
 - On DP MST, Enable LT fallback for UHBR<->non-UHBR rates
 - Fix DP LTTPR detection
 - limit relocations to INT_MAX
 - fix long hangs in buddy allocator on DG2/A380
 
 amdgpu:
 - Per-queue reset support
 - SDMA devcoredump support
 - DCN 4.0.1 updates
 - GFX12/VCN4/JPEG4 updates
 - Convert vbios embedded EDID to drm_edid
 - GFX9.3/9.4 devcoredump support
 - process isolation framework for GFX 9.4.3/4
 - take IOMMU mappings into account for P2P DMA
 
 amdkfd:
 - CRIU fixes
 - HMM fix
 - Enable process isolation support for GFX 9.4.3/4
 - Allow users to target recommended SDMA engines
 - KFD support for targeting queues on recommended SDMA engines
 
 radeon:
 - remove .load and drm_dev_alloc
 - Fix vbios embedded EDID size handling
 - Convert vbios embedded EDID to drm_edid
 - Use GEM references instead of TTM
 - r100 cp init cleanup
 - Fix potential overflows in evergreen CS offset tracking
 
 msm:
 - DPU:
 - implement DP/PHY mapping on SC8180X
 - Enable writeback on SM8150, SC8180X, SM6125, SM6350
 - DP:
 - Enable widebus on all relevant chipsets
 - MSM8998 HDMI support
 - GPU:
 - A642L speedbin support
 - A615/A306/A621 support
 - A7xx devcoredump support
 
 ast:
 - astdp: Support AST2600 with VGA
 - Clean up HPD
 - Fix timeout loop for DP link training
 - reorganize output code by type (VGA, DP, etc)
 - convert to struct drm_edid
 - fix BMC handling for all outputs
 
 exynos:
 - drop stale MAINTAINERS pattern
 - constify struct
 
 loongson:
 - use GEM refcount over TTM
 
 mgag200:
 - Improve BMC handling
 - Support VBLANK interrupts
 - transparently support BMC outputs
 
 nouveau:
 - Refactor and clean up internals
 - Use GEM refcount over TTM's
 
 gm12u320:
 - convert to struct drm_edid
 
 gma500:
 - update i2c terms
 
 lcdif:
 - pixel clock fix
 
 host1x:
 - fix syncpoint IRQ during resume
 - use iommu_paging_domain_alloc()
 
 imx:
 - ipuv3: convert to struct drm_edid
 
 omapdrm:
 - improve error handling
 - use common helper for_each_endpoint_of_node()
 
 panel:
 - add support for BOE TV101WUM-LL2 plus DT bindings
 - novatek-nt35950: improve error handling
 - nv3051d: improve error handling
 - panel-edp: add support for BOE NE140WUM-N6G; revert support for
   SDC ATNA45AF01
 - visionox-vtdr6130: improve error handling; use
   devm_regulator_bulk_get_const()
 - boe-th101mb31ig002: Support for starry-er88577 MIPI-DSI panel plus
   DT; Fix porch parameter
 - edp: Support AUO B116XTN02.3, AUO B116XAN06.1, AUO B116XAT04.1,
   BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2,
   CMN N116BCP-EA2, CSW MNB601LS1-4
 - himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT
 - ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT
 - jd9365da: Support Melfas lmfbx101117480 MIPI-DSI panel plus DT; Refactor
   for code sharing
 - panel-edp: fix name for HKC MB116AN01
 - jd9365da: fix "exit sleep" commands
 - jdi-fhd-r63452: simplify error handling with DSI multi-style
   helpers
 - mantix-mlaf057we51: simplify error handling with DSI multi-style
   helpers
 - simple:
   support Innolux G070ACE-LH3 plus DT bindings
   support On Tat Industrial Company KD50G21-40NT-A1 plus DT bindings
 - st7701:
   decouple DSI and DRM code
   add SPI support
   support Anbernic RG28XX plus DT bindings
 
 mediatek:
 - support alpha blending
 - remove cl in struct cmdq_pkt
 - ovl adaptor fix
 - add power domain binding for mediatek DPI controller
 
 renesas:
 - rz-du: add support for RZ/G2UL plus DT bindings
 
 rockchip:
 - Improve DP sink-capability reporting
 - dw_hdmi: Support 4k@60Hz
 - vop: Support RGB display on Rockchip RK3066; Support 4096px width
 
 sti:
 - convert to struct drm_edid
 
 stm:
 - Avoid UAF with managed plane and CRTC helpers
 - Fix module owner
 - Fix error handling in probe
 - Depend on COMMON_CLK
 - ltdc: Fix transparency after disabling plane; Remove unused interrupt
 
 tegra:
 - gr3d: improve PM domain handling
 - convert to struct drm_edid
 - Call drm_atomic_helper_shutdown()
 
 vc4:
 - fix PM during detect
 - replace DRM_ERROR() with drm_error()
 - v3d: simplify clock retrieval
 
 v3d:
 - Clean up perfmon
 
 virtio:
 - add DRM capset

Merge tag 'drm-next-2024-09-19' of https://gitlab.freedesktop.org/drm/kernel

Pull drm updates from Dave Airlie:
 "This adds a couple of patches outside the drm core, all should be
  acked appropriately, the string and pstore ones are the main ones that
  come to mind.

  Otherwise it's the usual drivers, xe is getting enabled by default on
  some new hardware, we've changed the device number handling to allow
  more devices, and we added some optional rust code to create QR codes
  in the panic handler, an idea first suggested I think 10 years ago :-)

  string:
   - add mem_is_zero()

  core:
   - support more device numbers
   - use XArray for minor ids
   - add backlight constants
   - Split dma fence array creation into alloc and arm

  fbdev:
   - remove usage of old fbdev hooks

  kms:
   - Add might_fault() to drm_modeset_lock priming
   - Add dynamic per-crtc vblank configuration support

  dma-buf:
   - docs cleanup

  buddy:
   - Add start address support for trim function

  printk:
   - pass description to kmsg_dump

  scheduler:
   - Remove full_recover from drm_sched_start

  ttm:
   - Make LRU walk restartable after dropping locks
   - Allow direct reclaim to allocate local memory

  panic:
   - add display QR code (in rust)

  displayport:
   - mst: GUID improvements

  bridge:
   - Silence error message on -EPROBE_DEFER
   - analogix: Clean up
   - bridge-connector: Fix double free
   - it6505: Disable interrupt when powered off
   - tc358767: Make default DP port preemphasis configurable
   - lt9611uxc: require DRM_BRIDGE_ATTACH_NO_CONNECTOR
   - anx7625: simplify OF array handling
   - dw-hdmi: simplify clock handling
   - lontium-lt8912b: fix mode validation
   - nwl-dsi: fix mode vsync/hsync polarity

  xe:
   - Enable LunarLake and Battlemage support
   - Introducing Xe2 ccs modifiers for integrated and discrete graphics
   - rename xe perf to xe observation
   - use wb caching on DGFX for system memory
   - add fence timeouts
   - Lunar Lake graphics/media/display workarounds
   - Battlemage workarounds
   - Battlemage GSC support
   - GSC and HuC fw updates for LL/BM
   - use dma_fence_chain_free
   - refactor hw engine lookup and mmio access
   - enable priority mem read for Xe2
   - Add first GuC BMG fw
   - fix dma-resv lock
   - Fix DGFX display suspend/resume
   - Use xe_managed for kernel BOs
   - Use reserved copy engine for user binds on faulting devices
   - Allow mixing dma-fence jobs and long-running faulting jobs
   - fix media TLB invalidation
   - fix rpm in TTM swapout path
   - track resources and VF state by PF

  i915:
   - Type-C programming fix for MTL+
   - FBC cleanup
   - Calc vblank delay more accurately
   - On DP MST, Enable LT fallback for UHBR<->non-UHBR rates
   - Fix DP LTTPR detection
   - limit relocations to INT_MAX
   - fix long hangs in buddy allocator on DG2/A380

  amdgpu:
   - Per-queue reset support
   - SDMA devcoredump support
   - DCN 4.0.1 updates
   - GFX12/VCN4/JPEG4 updates
   - Convert vbios embedded EDID to drm_edid
   - GFX9.3/9.4 devcoredump support
   - process isolation framework for GFX 9.4.3/4
   - take IOMMU mappings into account for P2P DMA

  amdkfd:
   - CRIU fixes
   - HMM fix
   - Enable process isolation support for GFX 9.4.3/4
   - Allow users to target recommended SDMA engines
   - KFD support for targeting queues on recommended SDMA engines

  radeon:
   - remove .load and drm_dev_alloc
   - Fix vbios embedded EDID size handling
   - Convert vbios embedded EDID to drm_edid
   - Use GEM references instead of TTM
   - r100 cp init cleanup
   - Fix potential overflows in evergreen CS offset tracking

  msm:
   - DPU:
      - implement DP/PHY mapping on SC8180X
      - Enable writeback on SM8150, SC8180X, SM6125, SM6350
   - DP:
      - Enable widebus on all relevant chipsets
      - MSM8998 HDMI support
   - GPU:
      - A642L speedbin support
      - A615/A306/A621 support
      - A7xx devcoredump support

  ast:
   - astdp: Support AST2600 with VGA
   - Clean up HPD
   - Fix timeout loop for DP link training
   - reorganize output code by type (VGA, DP, etc)
   - convert to struct drm_edid
   - fix BMC handling for all outputs

  exynos:
   - drop stale MAINTAINERS pattern
   - constify struct

  loongson:
   - use GEM refcount over TTM

  mgag200:
   - Improve BMC handling
   - Support VBLANK interrupts
   - transparently support BMC outputs

  nouveau:
   - Refactor and clean up internals
   - Use GEM refcount over TTM's

  gm12u320:
   - convert to struct drm_edid

  gma500:
   - update i2c terms

  lcdif:
   - pixel clock fix

  host1x:
   - fix syncpoint IRQ during resume
   - use iommu_paging_domain_alloc()

  imx:
   - ipuv3: convert to struct drm_edid

  omapdrm:
   - improve error handling
   - use common helper for_each_endpoint_of_node()

  panel:
   - add support for BOE TV101WUM-LL2 plus DT bindings
   - novatek-nt35950: improve error handling
   - nv3051d: improve error handling
   - panel-edp:
      - add support for BOE NE140WUM-N6G
      - revert support for SDC ATNA45AF01
   - visionox-vtdr6130:
      - improve error handling
      - use devm_regulator_bulk_get_const()
   - boe-th101mb31ig002:
      - Support for starry-er88577 MIPI-DSI panel plus DT
      - Fix porch parameter
   - edp: Support AUO B116XTN02.3, AUO B116XAN06.1, AUO B116XAT04.1, BOE
     NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2,
     CMN N116BCP-EA2, CSW MNB601LS1-4
   - himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT
   - ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT
   - jd9365da:
      - Support Melfas lmfbx101117480 MIPI-DSI panel plus DT
      - Refactor for code sharing
   - panel-edp: fix name for HKC MB116AN01
   - jd9365da: fix "exit sleep" commands
   - jdi-fhd-r63452: simplify error handling with DSI multi-style
     helpers
   - mantix-mlaf057we51: simplify error handling with DSI multi-style
     helpers
   - simple:
      - support Innolux G070ACE-LH3 plus DT bindings
      - support On Tat Industrial Company KD50G21-40NT-A1 plus DT
        bindings
   - st7701:
      - decouple DSI and DRM code
      - add SPI support
      - support Anbernic RG28XX plus DT bindings

  mediatek:
   - support alpha blending
   - remove cl in struct cmdq_pkt
   - ovl adaptor fix
   - add power domain binding for mediatek DPI controller

  renesas:
   - rz-du: add support for RZ/G2UL plus DT bindings

  rockchip:
   - Improve DP sink-capability reporting
   - dw_hdmi: Support 4k@60Hz
   - vop:
      - Support RGB display on Rockchip RK3066
      - Support 4096px width

  sti:
   - convert to struct drm_edid

  stm:
   - Avoid UAF with managed plane and CRTC helpers
   - Fix module owner
   - Fix error handling in probe
   - Depend on COMMON_CLK
   - ltdc:
      - Fix transparency after disabling plane
      - Remove unused interrupt

  tegra:
   - gr3d: improve PM domain handling
   - convert to struct drm_edid
   - Call drm_atomic_helper_shutdown()

  vc4:
   - fix PM during detect
   - replace DRM_ERROR() with drm_error()
   - v3d: simplify clock retrieval

  v3d:
   - Clean up perfmon

  virtio:
   - add DRM capset"

* tag 'drm-next-2024-09-19' of https://gitlab.freedesktop.org/drm/kernel: (1326 commits)
  drm/xe: Fix missing conversion to xe_display_pm_runtime_resume
  drm/xe/xe2hpg: Add Wa_15016589081
  drm/xe: Don't keep stale pointer to bo->ggtt_node
  drm/xe: fix missing 'xe_vm_put'
  drm/xe: fix build warning with CONFIG_PM=n
  drm/xe: Suppress missing outer rpm protection warning
  drm/xe: prevent potential UAF in pf_provision_vf_ggtt()
  drm/amd/display: Add all planes on CRTC to state for overlay cursor
  drm/i915/bios: fix printk format width
  drm/i915/display: Fix BMG CCS modifiers
  drm/amdgpu: get rid of bogus includes of fdtable.h
  drm/amdkfd: CRIU fixes
  drm/amdgpu: fix a race in kfd_mem_export_dmabuf()
  drm: new helper: drm_gem_prime_handle_to_dmabuf()
  drm/amdgpu/atomfirmware: Silence UBSAN warning
  drm/amdgpu: Fix kdoc entry in 'amdgpu_vm_cpu_prepare'
  drm/amd/amdgpu: apply command submission parser for JPEG v1
  drm/amd/amdgpu: apply command submission parser for JPEG v2+
  drm/amd/pm: fix the pp_dpm_pcie issue on smu v14.0.2/3
  drm/amd/pm: update the features set on smu v14.0.2/3
  ...
2024-09-19 10:18:15 +02:00
Alex Deucher
84f76408ab drm/amdgpu/mes12: reduce timeout
The firmware timeout is 2s.  Reduce the driver timeout to
2.1 seconds to avoid back pressure on queue submissions.

Fixes: 94b51a3d01 ("drm/amdgpu/mes12: increase mes submission timeout")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.11.x
2024-09-18 16:15:13 -04:00
Alex Deucher
856265caa9 drm/amdgpu/mes11: reduce timeout
The firmware timeout is 2s.  Reduce the driver timeout to
2.1 seconds to avoid back pressure on queue submissions.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3627
Fixes: f7c161a4c2 ("drm/amdgpu: increase mes submission timeout")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-09-18 16:15:13 -04:00
Christian König
6dcba0975d drm/amdgpu: use GEM references instead of TTMs v2
Instead of a TTM reference grab a GEM reference whenever necessary.

v2: fix typo in amdgpu_bo_unref pointed out by Vitaly,
    initialize the GEM funcs for kernel allocations as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v1)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18 16:15:13 -04:00
Frank Min
7b6df1d732 drm/amdgpu: update golden regs for gfx12
update golden regs for gfx12

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.11.x
2024-09-18 16:15:09 -04:00
Alex Deucher
042658d17a drm/amdgpu: clean up vbios fetching code
After splitting the logic between APU and dGPU,
clean up some of the APU and dGPU specific logic
that no longer applied.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18 16:15:09 -04:00
Alex Deucher
375b035f68 drm/amdgpu/bios: split vbios fetching between APU and dGPU
We need some different logic for dGPUs and the APU path
can be simplified because there are some methods which
are never used on APUs.  This also fixes a regression
on some older APUs causing the driver to fetch the
unpatched ROM image rather than the patched image.

Fixes: 9c081c11c6 ("drm/amdgpu: Reorder to read EFI exported ROM first")
Reviewed-by: George Zhang <George.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18 16:15:09 -04:00
Christian König
f2be7b39e4 drm/amdgpu: remove amdgpu_pin_restricted()
We haven't used the functionality to pin BOs in a certain range at all
while the driver existed. Just nuke it.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18 16:15:09 -04:00
Christian König
54b86443fd drm/amdgpu: explicitely set the AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS flag
Instead of having that in the amdgpu_bo_pin() function applied for all
pinned BOs.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18 16:15:09 -04:00
Lijo Lazar
42ac749d5b drm/amdgpu: Fix XCP instance mask calculation
Fix instance mask calculation for VCN IP. There are cases where VCN
instance could be shared across partitions. Fix here so that other
blocks don't need to check for any shared instances based on partition
mode.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18 16:15:09 -04:00
Asad Kamal
ef126c06a9 drm/amdgpu: Fix get each xcp macro
Fix get each xcp macro to loop over each partition correctly

Fixes: 4bdca20579 ("drm/amdgpu: Add utility functions for xcp")
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18 16:15:08 -04:00
Le Ma
2778701b16 drm/amdgpu: load sos binary properly on the basis of pmfw version
To be compatible with legacy IFWI, driver needs to carry legacy tOS and
query pmfw version to load them accordingly.

Add psp_firmware_header_v2_1 to handle the combined sos binary.

Double the sos count limit for the case of aux sos fw packed.

v2: pass the correct fw_bin_desc to parse_sos_bin_descriptor

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18 16:15:06 -04:00