2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

11 Commits

Author SHA1 Message Date
Boris Brezillon
3ce4322b1a drm/panthor: Call panthor_sched_post_reset() even if the reset failed
We need to undo what was done in panthor_sched_pre_reset() even if the
reset failed. We just flag all previously running groups as terminated
when that happens to unblock things.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240502183813.1612017-5-boris.brezillon@collabora.com
2024-05-13 09:52:22 +02:00
Boris Brezillon
ff60c8da0a drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level
Avoids use-after-free situations when panthor_fw_unplug() is called
and the kernel BO was mapped to the FW VM.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240502183813.1612017-3-boris.brezillon@collabora.com
2024-05-13 09:52:13 +02:00
Boris Brezillon
2b2a26b331 drm/panthor: Force an immediate reset on unrecoverable faults
If the FW reports an unrecoverable fault, we need to reset the GPU
before we can start re-using it again.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240502183813.1612017-2-boris.brezillon@collabora.com
2024-05-13 09:52:09 +02:00
Antonino Maniscalco
d214329757 drm/panthor: Fix tiler OOM handling to allow incremental rendering
If the kernel couldn't allocate memory because we reached the maximum
number of chunks but no render passes are in flight
(panthor_heap_grow() returning -ENOMEM), we should defer the OOM
handling to the FW by returning a NULL chunk. The FW will then call
the tiler OOM exception handler, which is supposed to implement
incremental rendering (execute an intermediate fragment job to flush
the pending primitives, release the tiler memory that was used to
store those primitives, and start over from where it stopped).

Instead of checking for both ENOMEM and EBUSY, make panthor_heap_grow()
return ENOMEM no matter the reason of this allocation failure, the FW
doesn't care anyway.

v3:
- Add R-bs

v2:
- Make panthor_heap_grow() return -ENOMEM for all kind of allocation
  failures
- Document the panthor_heap_grow() semantics

Fixes: de85488138 ("drm/panthor: Add the scheduler logical block")
Signed-off-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240502165158.1458959-2-boris.brezillon@collabora.com
2024-05-13 09:49:12 +02:00
Boris Brezillon
8bdbd8b558 drm/panthor: Make sure we handle 'unknown group state' case properly
When we check for state values returned by the FW, we only cover part of
the 0:7 range. Make sure we catch FW inconsistencies by adding a default
to the switch statement, and flagging the group state as unknown in that
case.

When an unknown state is detected, we trigger a reset, and consider the
group as unusable after that point, to prevent the potential corruption
from creeping in other places if we continue executing stuff on this
context.

v2:
- Add Steve's R-b
- Fix commit message

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/dri-devel/3b7fd2f2-679e-440c-81cd-42fc2573b515@moroto.mountain/T/#u
Suggested-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240502155248.1430582-1-boris.brezillon@collabora.com
2024-05-02 18:42:13 +02:00
Boris Brezillon
be2d3e9d06 drm/panthor: Kill the faulty_slots variable in panthor_sched_suspend()
We can use upd_ctx.timedout_mask directly, and the faulty_slots update
in the flush_caches_failed situation is never used.

Suggested-by: Suggested-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425103920.826458-1-boris.brezillon@collabora.com
2024-05-02 17:54:05 +02:00
Dan Carpenter
a9b7dfd1d1 drm/panthor: clean up some types in panthor_sched_suspend()
These variables should be u32 instead of u64 because they're only
storing u32 values.  Also static checkers complain when we do:

	suspended_slots &= ~upd_ctx.timedout_mask;

In this code "suspended_slots" is a u64 and "upd_ctx.timedout_mask".  The
mask clears out the top 32 bits which would likely be a bug if anything
were stored there.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/85356b15-4840-4e64-8c75-922cdd6a5fef@moroto.mountain
2024-04-22 09:39:40 +02:00
Harshit Mogalapalli
45c734fdd4 drm/panthor: Don't return NULL from panthor_vm_get_heap_pool()
The kernel doc says this function returns either a valid pointer
or an ERR_PTR(), but in practice this function can return NULL if
create=false. Fix the function to match the doc (return
ERR_PTR(-ENOENT) instead of NULL) and adjust all call-sites
accordingly.

Fixes: 4bdca11507 ("drm/panthor: Add the driver frontend block")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240402141412.1707949-1-harshit.m.mogalapalli@oracle.com
2024-04-03 09:11:38 +02:00
Liviu Dudau
be7ffc821f drm/panthor: Fix some kerneldoc warnings
When compiling with W=1 the build process will flag empty comments,
misnamed documented variables and incorrect tagging of functions.
Fix them in one go.

Fixes: de85488138 ("drm/panthor: Add the scheduler logical block")
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Steven Price <steven.price@arm.com>
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240402215423.360341-2-liviu.dudau@arm.com
2024-04-03 09:06:25 +02:00
Nathan Chancellor
d76653c32d drm/panthor: Fix clang -Wunused-but-set-variable in tick_ctx_apply()
Clang warns (or errors with CONFIG_WERROR):

  drivers/gpu/drm/panthor/panthor_sched.c:2048:6: error: variable 'csg_mod_mask' set but not used [-Werror,-Wunused-but-set-variable]
   2048 |         u32 csg_mod_mask = 0, free_csg_slots = 0;
        |             ^
  1 error generated.

The variable is an artifact left over from refactoring that occurred
during the development of the initial series for this driver. Remove it
to resolve the warning.

Fixes: de85488138 ("drm/panthor: Add the scheduler logical block")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240328-panthor-drop-csg_mod_mask-v1-1-5a80be3df581@kernel.org
2024-04-02 09:35:18 +02:00
Boris Brezillon
de85488138 drm/panthor: Add the scheduler logical block
This is the piece of software interacting with the FW scheduler, and
taking care of some scheduling aspects when the FW comes short of slots
scheduling slots. Indeed, the FW only expose a few slots, and the kernel
has to give all submission contexts, a chance to execute their jobs.

The kernel-side scheduler is timeslice-based, with a round-robin queue
per priority level.

Job submission is handled with a 1:1 drm_sched_entity:drm_gpu_scheduler,
allowing us to delegate the dependency tracking to the core.

All the gory details should be documented inline.

v6:
- Add Maxime's and Heiko's acks
- Make sure the scheduler is initialized before queueing the tick work
  in the MMU fault handler
- Keep header inclusion alphabetically ordered

v5:
- Fix typos
- Call panthor_kernel_bo_destroy(group->syncobjs) unconditionally
- Don't move the group to the waiting list tail when it was already
  waiting for a different syncobj
- Fix fatal_queues flagging in the tiler OOM path
- Don't warn when more than one job timesout on a group
- Add a warning message when we fail to allocate a heap chunk
- Add Steve's R-b

v4:
- Check drmm_mutex_init() return code
- s/drm_gem_vmap_unlocked/drm_gem_vunmap_unlocked/ in
  panthor_queue_put_syncwait_obj()
- Drop unneeded WARN_ON() in cs_slot_sync_queue_state_locked()
- Use atomic_xchg() instead of atomic_fetch_and(0)
- Fix typos
- Let panthor_kernel_bo_destroy() check for IS_ERR_OR_NULL() BOs
- Defer TILER_OOM event handling to a separate workqueue to prevent
  deadlocks when the heap chunk allocation is blocked on mem-reclaim.
  This is just a temporary solution, until we add support for
  non-blocking/failable allocations
- Pass the scheduler workqueue to drm_sched instead of instantiating
  a separate one (no longer needed now that heap chunk allocation
  happens on a dedicated wq)
- Set WQ_MEM_RECLAIM on the scheduler workqueue, so we can handle
  job timeouts when the system is under mem pressure, and hopefully
  free up some memory retained by these jobs

v3:
- Rework the FW event handling logic to avoid races
- Make sure MMU faults kill the group immediately
- Use the panthor_kernel_bo abstraction for group/queue buffers
- Make in_progress an atomic_t, so we can check it without the reset lock
  held
- Don't limit the number of groups per context to the FW scheduler
  capacity. Fix the limit to 128 for now.
- Add a panthor_job_vm() helper
- Account for panthor_vm changes
- Add our job fence as DMA_RESV_USAGE_WRITE to all external objects
  (was previously DMA_RESV_USAGE_BOOKKEEP). I don't get why, given
  we're supposed to be fully-explicit, but other drivers do that, so
  there must be a good reason
- Account for drm_sched changes
- Provide a panthor_queue_put_syncwait_obj()
- Unconditionally return groups to their idle list in
  panthor_sched_suspend()
- Condition of sched_queue_{,delayed_}work fixed to be only when a reset
  isn't pending or in progress.
- Several typos in comments fixed.

Co-developed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-11-boris.brezillon@collabora.com
2024-03-01 10:04:17 +01:00