drm/todo: Add section with task for GPU scheduler

The GPU scheduler has a great many problems and deserves its own TODO
section.

Add a section and a first task describing the problem of
drm_sched_resubmit_jobs() being deprecated without a successor.

Acked-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://patch.msgid.link/20251107135701.244659-3-phasta@kernel.org
Philipp Stanner
2025-11-07 14:57:00 +01:00
parent 7068d42048
commit 9d56cbaf12


@@ -878,6 +878,37 @@ Contact: Christian König
Level: Starter
DRM GPU Scheduler
=================
Provide a universal successor for drm_sched_resubmit_jobs()
-----------------------------------------------------------
drm_sched_resubmit_jobs() is deprecated, mainly because it leads to
reinitializing dma_fences; see that function's documentation for details. The
better approach for valid resubmissions by amdgpu and Xe is (apparently) to
figure out which job (and, through association, which entity) caused the hang.
Then the job's buffer data, together with all other jobs' buffer data currently
in the same hardware ring, must be invalidated, for example by overwriting it.
amdgpu currently determines which jobs are in the ring and need to be
overwritten by keeping copies of the jobs. Xe obtains that information by
directly accessing drm_sched's pending_list.
Tasks:
1. Implement scheduler functionality through which the driver can obtain
   information about which *broken* jobs are currently in the hardware ring.
2. Such infrastructure would then typically be used in
   drm_sched_backend_ops.timedout_job(). Document that.
3. Port a driver as the first user.
4. Document the new alternative in the documentation of the deprecated
   drm_sched_resubmit_jobs().
Contact: Christian König <christian.koenig@amd.com>
         Philipp Stanner <phasta@kernel.org>
Level: Advanced
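As a rough illustration of what task 1 asks for, the following userspace sketch
models a pending list and marks the jobs that would need to be invalidated after
a hang. It is not kernel code and not the scheduler's API: struct mock_entity,
struct mock_job and mock_invalidate_broken_jobs() are hypothetical names made up
for this example. It also assumes that a "broken" job is one that is already in
the hardware ring and belongs to the same entity as the guilty job; the real
criterion may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a scheduler entity; not struct drm_sched_entity. */
struct mock_entity {
	int id;
};

/* Hypothetical stand-in for a job; not struct drm_sched_job. */
struct mock_job {
	int id;
	const struct mock_entity *entity;
	bool in_hw_ring;	/* job was already pushed to the hardware ring */
	bool invalidated;	/* set once the job's ring contents are dropped */
};

/*
 * Hypothetical helper: walk the pending jobs and mark every job that is in
 * the hardware ring and shares its entity with the guilty job (assumption:
 * that is what makes a job "broken"). A real driver would overwrite the
 * job's ring contents at this point instead of setting a flag.
 *
 * Returns the number of jobs that were marked.
 */
static size_t mock_invalidate_broken_jobs(struct mock_job *pending, size_t n,
					  const struct mock_job *guilty)
{
	size_t count = 0;

	assert(guilty != NULL);

	for (size_t i = 0; i < n; i++) {
		struct mock_job *job = &pending[i];

		/* Skip jobs not yet in the ring and jobs of other entities. */
		if (!job->in_hw_ring || job->entity != guilty->entity)
			continue;

		job->invalidated = true;
		count++;
	}

	return count;
}
```

In a driver, a helper of this kind would presumably be called from
drm_sched_backend_ops.timedout_job() once the guilty job has been identified,
so that neither a private copy of the jobs nor direct access to pending_list
is needed.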
Outside DRM
===========