linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00

Author	SHA1	Message	Date
Rob Clark	d7a5ac67d8	drm/msm: Extend gpu devcore dumps with pgtbl info In the case of iova fault triggered devcore dumps, include additional debug information based on what we think is the current page tables, including the TTBR0 value (which should match what we have in adreno_smmu_fault_info unless things have gone horribly wrong), and the pagetable entries traversed in the process of resolving the faulting iova. Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/628117/	2025-02-27 11:58:30 -08:00
Frederic Weisbecker	b04e317b52	treewide: Introduce kthread_run_worker[_on_cpu]() kthread_create() creates a kthread without running it yet. kthread_run() creates a kthread and runs it. On the other hand, kthread_create_worker() creates a kthread worker and runs it. This difference in behaviours is confusing. Also there is no way to create a kthread worker and affine it using kthread_bind_mask() or kthread_affine_preferred() before starting it. Consolidate the behaviours and introduce kthread_run_worker[_on_cpu]() that behaves just like kthread_run(). kthread_create_worker[_on_cpu]() will now only create a kthread worker without starting it. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>	2025-01-08 18:15:03 +01:00
Antonino Maniscalco	3241504ea2	drm/msm/a6xx: Track current_ctx_seqno per ring With preemption it is not enough to track the current_ctx_seqno globally as execution might switch between rings. This is especially problematic when current_ctx_seqno is used to determine whether a page table switch is necessary as it might lead to security bugs. Track current context per ring. Tested-by: Rob Clark <robdclark@gmail.com> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK Signed-off-by: Antonino Maniscalco <antomani103@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/618012/ Signed-off-by: Rob Clark <robdclark@chromium.org>	2024-10-03 13:18:34 -07:00
Konrad Dybcio	1600776855	drm/msm/adreno: Assign msm_gpu->pdev earlier to avoid nullptrs There are some cases, such as the one uncovered by Commit `46d4efcccc` ("drm/msm/a6xx: Avoid a nullptr dereference when speedbin setting fails") where msm_gpu_cleanup() : platform_set_drvdata(gpu->pdev, NULL); is called on gpu->pdev == NULL, as the GPU device has not been fully initialized yet. Turns out that there's more than just the aforementioned path that causes this to happen (e.g. the case when there's speedbin data in the catalog, but opp-supported-hw is missing in DT). Assigning msm_gpu->pdev earlier seems like the least painful solution to this, therefore do so. Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org> Patchwork: https://patchwork.freedesktop.org/patch/602742/ Signed-off-by: Rob Clark <robdclark@chromium.org>	2024-09-01 08:17:53 -07:00
Rob Clark	f2608b70f6	drm/msm: Add obj flags to gpu devcoredump When debugging faults, it is useful to know how the BO is mapped (cached vs WC, gpu readonly, etc). Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com> Acked-by: Konrad Dybcio <konrad.dybcio@linaro.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Patchwork: https://patchwork.freedesktop.org/patch/593854/	2024-06-21 13:41:43 -07:00
Dmitry Baryshkov	84935a85a6	drm/msm: remove dependencies from core onto adreno headers Two core driver files include headers from Adreno subdir, which also brings dependency on the Adreno register headers. Rework those includes to remove unnecessary dependency. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Patchwork: https://patchwork.freedesktop.org/patch/585850/ Link: https://lore.kernel.org/r/20240401-fd-xml-shipped-v5-5-4bdb277a85a1@linaro.org	2024-04-22 16:22:49 +03:00
Rob Clark	917e9b7c23	Revert "drm/msm/gpu: Push gpu lock down past runpm" This reverts commit `abe2023b4c`. Changing the locking order means that scheduler/msm_job_run() can race with the recovery kthread worker, with the result that the GPU gets an extra runpm get when we are trying to power it off. Leaving the GPU in an unrecovered state. I'll need to come up with a different scheme for appeasing lockdep. Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/573835/	2024-02-01 15:24:10 -08:00
Rob Clark	12578c075f	drm/msm/gpu: Skip retired submits in recover worker If we somehow raced with submit retiring, either while waiting for worker to have a chance to run or acquiring the gpu lock, then the recover worker should just bail. Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/568034/	2023-11-20 17:15:19 -08:00
Rob Clark	548b61a8ce	drm/msm/gpu: Move gpu devcore's to gpu device The dpu devcore's are already associated with the dpu device. So we should associate the gpu devcore's with the gpu device, for easier classification. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Patchwork: https://patchwork.freedesktop.org/patch/567738/	2023-11-20 17:14:53 -08:00
Rob Clark	abe2023b4c	drm/msm/gpu: Push gpu lock down past runpm Avoid holding gpu lock when calling runpm, to avoid this lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 6.4.3-debug+ #14 Not tainted ------------------------------------------------------ ring0/373 is trying to acquire lock: ffffffead86efb98 (prepare_lock){+.+.}-{3:3}, at: clk_prepare_lock+0x70/0x98 but task is already holding lock: ffffff809cd19170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x7c/0x128 [msm] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (&gpu->lock){+.+.}-{3:3}: __mutex_lock+0xc8/0x388 mutex_lock_nested+0x2c/0x38 msm_job_run+0x7c/0x128 [msm] drm_sched_main+0x264/0x354 [gpu_sched] kthread+0xf0/0x100 ret_from_fork+0x10/0x20 -> #3 (dma_fence_map){++++}-{0:0}: __dma_fence_might_wait+0x74/0xc0 dma_resv_lockdep+0x1f0/0x2e8 do_one_initcall+0xb4/0x214 kernel_init_freeable+0x338/0x33c kernel_init+0x30/0x134 ret_from_fork+0x10/0x20 -> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}: fs_reclaim_acquire+0x7c/0x9c slab_pre_alloc_hook.constprop.0+0x40/0x250 __kmem_cache_alloc_node+0x60/0x18c kmalloc_node_trace+0x40/0x84 alloc_worker+0x2c/0x64 init_rescuer+0x34/0xe0 workqueue_init+0x168/0x1fc kernel_init_freeable+0x15c/0x33c kernel_init+0x30/0x134 ret_from_fork+0x10/0x20 -> #1 (fs_reclaim){+.+.}-{0:0}: __fs_reclaim_acquire+0x3c/0x48 fs_reclaim_acquire+0x50/0x9c slab_pre_alloc_hook.constprop.0+0x40/0x250 __kmem_cache_alloc_node+0x60/0x18c kmalloc_trace+0x44/0x88 clk_rcg2_dfs_determine_rate+0x60/0x214 clk_core_determine_round_nolock+0xb8/0xf0 clk_core_round_rate_nolock+0x84/0x118 clk_core_round_rate_nolock+0xd8/0x118 clk_round_rate+0x6c/0xd0 geni_se_clk_tbl_get+0x78/0xc0 geni_se_clk_freq_match+0x44/0xe4 get_spi_clk_cfg+0x50/0xf4 geni_spi_set_clock_and_bw+0x54/0x104 spi_geni_prepare_message+0x130/0x174 __spi_pump_transfer_message+0x200/0x4d8 __spi_sync+0x13c/0x23c spi_sync_locked+0x18/0x24 do_cros_ec_pkt_xfer_spi+0x124/0x3f0 cros_ec_xfer_high_pri_work+0x28/0x3c kthread_worker_fn+0x14c/0x27c kthread+0xf0/0x100 ret_from_fork+0x10/0x20 -> #0 (prepare_lock){+.+.}-{3:3}: __lock_acquire+0xdf8/0x109c lock_acquire+0x234/0x284 __mutex_lock+0xc8/0x388 mutex_lock_nested+0x2c/0x38 clk_prepare_lock+0x70/0x98 clk_prepare+0x24/0x50 clk_bulk_prepare+0x50/0x9c a6xx_gmu_resume+0x94/0x800 [msm] a6xx_gmu_pm_resume+0x38/0x158 [msm] adreno_runtime_resume+0x2c/0x38 [msm] pm_generic_runtime_resume+0x30/0x44 __rpm_callback+0x4c/0x134 rpm_callback+0x78/0x7c rpm_resume+0x3a4/0x46c __pm_runtime_resume+0x78/0xbc pm_runtime_get_sync.isra.0+0x14/0x20 [msm] msm_gpu_submit+0x4c/0x12c [msm] msm_job_run+0x88/0x128 [msm] drm_sched_main+0x264/0x354 [gpu_sched] kthread+0xf0/0x100 ret_from_fork+0x10/0x20 other info that might help us debug this: Chain exists of: prepare_lock --> dma_fence_map --> &gpu->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&gpu->lock); lock(dma_fence_map); lock(&gpu->lock); lock(prepare_lock); * DEADLOCK * 2 locks held by ring0/373: #0: ffffffead875ae50 (dma_fence_map){++++}-{0:0}, at: drm_sched_main+0x54/0x354 [gpu_sched] #1: ffffff809cd19170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x7c/0x128 [msm] stack backtrace: CPU: 2 PID: 373 Comm: ring0 Not tainted 6.4.3-debug+ #14 Hardware name: Google Villager (rev1+) with LTE (DT) Call trace: dump_backtrace+0xb4/0xf0 show_stack+0x20/0x30 dump_stack_lvl+0x60/0x84 dump_stack+0x18/0x24 print_circular_bug+0x1cc/0x234 check_noncircular+0x78/0xac __lock_acquire+0xdf8/0x109c lock_acquire+0x234/0x284 __mutex_lock+0xc8/0x388 mutex_lock_nested+0x2c/0x38 clk_prepare_lock+0x70/0x98 clk_prepare+0x24/0x50 clk_bulk_prepare+0x50/0x9c a6xx_gmu_resume+0x94/0x800 [msm] a6xx_gmu_pm_resume+0x38/0x158 [msm] adreno_runtime_resume+0x2c/0x38 [msm] pm_generic_runtime_resume+0x30/0x44 __rpm_callback+0x4c/0x134 rpm_callback+0x78/0x7c rpm_resume+0x3a4/0x46c __pm_runtime_resume+0x78/0xbc pm_runtime_get_sync.isra.0+0x14/0x20 [msm] msm_gpu_submit+0x4c/0x12c [msm] msm_job_run+0x88/0x128 [msm] drm_sched_main+0x264/0x354 [gpu_sched] kthread+0xf0/0x100 ret_from_fork+0x10/0x20 Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/552298/	2023-08-15 10:09:30 -07:00
Rob Clark	6ba5daa5d5	drm/msm: Use drm_gem_object in submit bos table Basically everywhere wants the base ptr type. So store that instead of msm_gem_object. Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/551021/	2023-08-10 10:44:02 -07:00
Ruan Jinjie	f09f5459bd	drm/msm: Remove redundant DRM_DEV_ERROR() There is no need to call the DRM_DEV_ERROR() function directly to print a custom message when handling an error from platform_get_irq() function as it is going to display an appropriate error message in case of a failure. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Patchwork: https://patchwork.freedesktop.org/patch/549499/ Link: https://lore.kernel.org/r/20230727112407.2916029-1-ruanjinjie@huawei.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>	2023-08-02 12:36:05 +03:00
Dmitry Baryshkov	03c601927b	Merge branch 'drm-next' of git://anongit.freedesktop.org/drm/drm into msm-next-lumag-base Merge the drm-next tree to pick up the DRM DSC helpers (merged via drm-intel-next tree). MSM DSC v1.2 patches depend on these helpers. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>	2023-06-13 00:23:33 +03:00
Rob Clark	171f580e32	drm/msm: Move cmdstream dumping out of sched kthread This is something that can block for arbitrary amounts of time as userspace consumes from the FIFO. So we don't really want this to be in the fence signaling path. Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/532617/	2023-06-10 06:46:12 -07:00
Rob Clark	51d86ee5e0	drm/msm: Switch to fdinfo helper Now that we have a common helper, use it. v2: Rebase on drm-misc-next Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Acked-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230524155956.382440-4-robdclark@gmail.com	2023-05-24 18:03:27 +02:00
Konrad Dybcio	9f251f9340	drm/msm/adreno: Use OPP for every GPU generation Some older GPUs (namely a2xx with no opp tables at all and a320 with downstream-remnants gpu pwrlevels) used not to have OPP tables. They both however had just one frequency defined, making it extremely easy to construct such an OPP table from within the driver if need be. Do so and switch all clk_set_rate calls on core_clk to their OPP counterparts. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org> Patchwork: https://patchwork.freedesktop.org/patch/523784/ Link: https://lore.kernel.org/r/20230223-topic-opp-v3-3-5f22163cd1df@linaro.org Signed-off-by: Rob Clark <robdclark@chromium.org>	2023-03-20 11:04:59 -07:00
Akhil P Oommen	d484301225	drm/msm/a6xx: Remove cx gdsc polling using 'reset' Remove the unused 'reset' interface which was supposed to help to ensure that cx gdsc has collapsed during gpu recovery. This is was not enabled so far due to missing gpucc driver support. Similar functionality using genpd framework will be implemented in the upcoming patch. This effectively reverts commit `1f6cca4049` ("drm/msm/a6xx: Ensure CX collapse during gpu recovery"). Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Patchwork: https://patchwork.freedesktop.org/patch/516470/ Link: https://lore.kernel.org/r/20230102161757.v5.4.I96e0bf9eaf96dd866111c1eec8a4c9b70fd7cbcb@changeid Signed-off-by: Rob Clark <robdclark@chromium.org>	2023-03-20 10:53:47 -07:00
Dave Airlie	fc70e13dd1	Merge tag 'drm-msm-fixes-2023-01-16' of https://gitlab.freedesktop.org/drm/msm into drm-fixes msm-fixes for v6.3-rc5 Two GPU fixes which were meant to be part of the previous pull request, but I'd forgotten to fetch from gitlab after the MR was merged so that git tag was applied to the wrong commit. - kexec shutdown fix - fix potential double free Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGskguoVsz2wqAK2k+f32LwcVY5JC6+e2RwLqZswz3RY2Q@mail.gmail.com	2023-01-20 07:49:01 +10:00
Rob Clark	a66f1efcf7	drm/msm/gpu: Fix potential double-free If userspace was calling the MSM_SET_PARAM ioctl on multiple threads to set the COMM or CMDLINE param, it could trigger a race causing the previous value to be kfree'd multiple times. Fix this by serializing on the gpu lock. Signed-off-by: Rob Clark <robdclark@chromium.org> Fixes: `d4726d7700` ("drm/msm: Add a way to override processes comm/cmdline") Patchwork: https://patchwork.freedesktop.org/patch/517778/ Link: https://lore.kernel.org/r/20230110212903.1925878-1-robdclark@gmail.com	2023-01-11 09:00:14 -08:00
Dave Airlie	077bd80083	Merge tag 'drm-msm-next-2022-11-28' of https://gitlab.freedesktop.org/drm/msm into drm-next msm-next for v6.2 (the gpu/gem bits) - Remove exclusive-fence hack that caused over-synchronization - Fix speed-bin detection vs. probe-defer - Enable clamp_to_idle on 7c3 - Improved hangcheck detection Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGvT1h_S4d=YRgphgR8i7aMaxQaNW8mru7QaoUo9uiUk2A@mail.gmail.com	2022-11-30 17:19:18 +10:00
Rob Clark	d73b1d02de	drm/msm: Hangcheck progress detection If the hangcheck timer expires, check if the fw's position in the cmdstream has advanced (changed) since last timer expiration, and allow it up to three additional "extensions" to it's alotted time. The intention is to continue to catch "shader stuck in a loop" type hangs quickly, but allow more time for things that are actually making forward progress. Because we need to sample the CP state twice to detect if there has not been progress, this also cuts the the timer's duration in half. v2: Fix typo (REG_A6XX_CP_CSQ_IB2_STAT), add comment v3: Only halve hangcheck timer duration for generations which support progress detection (hdanton); removed unused a5xx progress (without knowing how to adjust for data buffered in ROQ it is too likely to report a false negative) v4: Comment updates to better describe the total hangcheck duration when progress detection is applied Reviewed-by: Chia-I Wu <olvaffe@gmail.com> Tested-by: Chia-I Wu <olvaffe@gmail.com> # dEQP-GLES2.functional.flush_finish.wait Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com> Patchwork: https://patchwork.freedesktop.org/patch/511584/ Link: https://lore.kernel.org/r/20221114193049.1533391-3-robdclark@gmail.com	2022-11-17 10:39:12 -08:00
Akhil P Oommen	76efc2453d	drm/msm/gpu: Fix crash during system suspend after unbind In adreno_unbind, we should clean up gpu device's drvdata to avoid accessing a stale pointer during system suspend. Also, check for NULL ptr in both system suspend/resume callbacks. Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com> Patchwork: https://patchwork.freedesktop.org/patch/505075/ Link: https://lore.kernel.org/r/20220928124830.2.I5ee0ac073ccdeb81961e5ec0cce5f741a7207a71@changeid Signed-off-by: Rob Clark <robdclark@chromium.org>	2022-09-30 09:01:41 -07:00
Akhil P Oommen	1f6cca4049	drm/msm/a6xx: Ensure CX collapse during gpu recovery Because there could be transient votes from other drivers/tz/hyp which may keep the cx gdsc enabled, we should poll until cx gdsc collapses. We can use the reset framework to poll for cx gdsc collapse from gpucc clk driver. This feature requires support from the platform's gpucc driver. Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Patchwork: https://patchwork.freedesktop.org/patch/498397/ Link: https://lore.kernel.org/r/20220819015030.v5.5.I176567525af2b9439a7e485d0ca130528666a55c@changeid Signed-off-by: Rob Clark <robdclark@chromium.org>	2022-08-28 09:29:27 -07:00
Akhil P Oommen	f350bfb92b	drm/msm: Fix cx collapse issue during recovery There are some hardware logic under CX domain. For a successful recovery, we should ensure cx headswitch collapses to ensure all the stale states are cleard out. This is especially true to for a6xx family where we can GMU co-processor. Currently, cx doesn't collapse due to a devlink between gpu and its smmu. So the struct gpu device needs to be runtime suspended to ensure that the iommu driver removes its vote on cx gdsc. Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com> Patchwork: https://patchwork.freedesktop.org/patch/498398/ Link: https://lore.kernel.org/r/20220819015030.v5.4.I4ac27a0b34ea796ce0f938bb509e257516bc6f57@changeid Signed-off-by: Rob Clark <robdclark@chromium.org>	2022-08-28 09:29:27 -07:00
Akhil P Oommen	06097e372a	drm/msm: Correct pm_runtime votes in recover worker In the scenario where there is one a single submit which is hung, gpu is power collapsed when it is retired. Because of this, by the time we call reover(), gpu state would be already clear. Fix this by correctly managing the pm runtime votes. Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com> Patchwork: https://patchwork.freedesktop.org/patch/498391/ Link: https://lore.kernel.org/r/20220819015030.v5.3.Ib07ecec3d5c17cb0e1efa6fcddaaa019ec2fb556@changeid Signed-off-by: Rob Clark <robdclark@chromium.org>	2022-08-28 09:29:27 -07:00
Akhil P Oommen	5b26f37d13	drm/msm: Take single rpm refcount on behalf of all submits Instead of separate refcount for each submit, take single rpm refcount on behalf of all the submits. This makes it easier to drop the rpm refcount during recovery in an upcoming patch. Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com> Patchwork: https://patchwork.freedesktop.org/patch/498392/ Link: https://lore.kernel.org/r/20220819015030.v5.2.Ifee853f6d8217a0fdacc459092bbc9e81a8a7ac7@changeid Signed-off-by: Rob Clark <robdclark@chromium.org>	2022-08-28 09:29:27 -07:00
Rob Clark	b352ba54a8	drm/msm/gem: Convert to using drm_gem_lru This converts over to use the shared GEM LRU/shrinker helpers. Note that it means we are no longer tracking purgeable or willneed buffers that are active separately. But the most recently pinned buffers should be at the tail of the various LRUs, and the shrinker is already prepared to encounter objects which are still active. Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/496131/ Link: https://lore.kernel.org/r/20220802155152.1727594-11-robdclark@gmail.com	2022-08-27 09:32:45 -07:00
Rob Clark	8b5de73595	drm/msm: Deprecate MSM_BO_UNCACHED harder Handle the demotion to MSM_BO_WC at the userspace ABI level, and fix the remaining internal MSM_BO_UNCACHED user. Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/489339/ Link: https://lore.kernel.org/r/20220613194623.2588353-1-robdclark@gmail.com	2022-07-06 18:54:42 -07:00
Rob Clark	18514c3848	drm/msm/gpu: Add GEM debug label to devcore When trying to understand an iova fault devcore, once you figure out which buffer we accessed beyond the end of, it is useful to see the buffer's debug label. Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/491910/ Link: https://lore.kernel.org/r/20220629211919.563585-3-robdclark@gmail.com	2022-07-06 18:54:41 -07:00
Rob Clark	cc66a42c94	drm/msm/gpu: Capture all BO addr+size in devcore It is useful to know what buffers userspace thinks are associated with the submit, even if we don't care to capture their content. This brings things more inline with $debugfs/rd cmdstream dumping. Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/491908/ Link: https://lore.kernel.org/r/20220629211919.563585-2-robdclark@gmail.com	2022-07-06 18:54:41 -07:00
Rob Clark	cfebe3fd59	drm/msm: Expose client engine utilization via fdinfo Similar to AMD commit `8744425411` ("drm/amdgpu: Add show_fdinfo() interface"), using the infrastructure added in previous patches, we add basic client info and GPU engine utilisation for msm. Example output: # cat /proc/`pgrep glmark2`/fdinfo/6 pos: 0 flags: 02400002 mnt_id: 21 ino: 162 drm-driver: msm drm-client-id: 7 drm-engine-gpu: 1734371319 ns drm-cycles-gpu: 1153645024 drm-maxfreq-gpu: 800000000 Hz See also: https://patchwork.freedesktop.org/patch/468505/ v2: Add dev-maxfreq-$engine and update drm-usage-stats.rst v3: spelling and compiler warning Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Patchwork: https://patchwork.freedesktop.org/patch/488906/ Link: https://lore.kernel.org/r/20220609174213.2265938-2-robdclark@gmail.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>	2022-07-05 06:01:11 +03:00
Rob Clark	c8af219d18	drm/msm: Don't overwrite hw fence in hw_init Prior to the last commit, this could result in setting the GPU written fence value back to an older value, if we had missed updating completed_fence prior to suspend. This was mostly harmless as the GPU would eventually overwrite it again with the correct value. But we should just not do this. Instead just leave a sanity check that the fence looks plausible (in case the GPU scribbled on memory). Reported-by: Steev Klimaszewski <steev@kali.org> Fixes: `95d1deb02a` ("drm/msm/gem: Add fenced vma unpin") Signed-off-by: Rob Clark <robdclark@chromium.org> Tested-by: Steev Klimaszewski <steev@kali.org> Patchwork: https://patchwork.freedesktop.org/patch/490138/ Link: https://lore.kernel.org/r/20220618161120.3451993-2-robdclark@gmail.com	2022-06-18 09:13:33 -07:00
Rob Clark	3c7a52217a	drm/msm: Drop update_fences() I noticed while looking at some traces, that we could miss calls to msm_update_fence(), as the irq could have raced with retire_submits() which could have already popped the last submit on a ring out of the queue of in-flight submits. But walking the list of submits in the irq handler isn't really needed, as dma_fence_is_signaled() will dtrt. So lets just drop it entirely. v2: use spin_lock_irqsave/restore as we are no longer protected by the spin_lock_irqsave/restore() in update_fences() Reported-by: Steev Klimaszewski <steev@kali.org> Fixes: `95d1deb02a` ("drm/msm/gem: Add fenced vma unpin") Signed-off-by: Rob Clark <robdclark@chromium.org> Tested-by: Steev Klimaszewski <steev@kali.org> Patchwork: https://patchwork.freedesktop.org/patch/490136/ Link: https://lore.kernel.org/r/20220618161120.3451993-1-robdclark@gmail.com	2022-06-18 09:13:32 -07:00
Rob Clark	49e4776100	drm/msm: Switch ordering of runpm put vs devfreq_idle In msm_devfreq_suspend() we cancel idle_work synchronously so that it doesn't run after we power of the hw or in the resume path. But this means that we want to ensure that idle_work is not scheduled after we no longer hold a runpm ref. So switch the ordering of pm_runtime_put() vs msm_devfreq_idle(). v2. Only move the runpm _put_autosuspend, and not the _mark_last_busy() Fixes: `9bc9557017` ("drm/msm: Devfreq tuning") Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20210927152928.831245-1-robdclark@gmail.com Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20220608161334.2140611-1-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>	2022-06-13 15:32:32 -07:00
Luca Weiss	36a1d1bda7	drm/msm: Fix null pointer dereferences without iommu Check if 'aspace' is set before using it as it will stay null without IOMMU, such as on msm8974. Fixes: `bc2112583a` ("drm/msm/gpu: Track global faults per address-space") Signed-off-by: Luca Weiss <luca@z3ntu.xyz> Link: https://lore.kernel.org/r/20220421203455.313523-1-luca@z3ntu.xyz Signed-off-by: Rob Clark <robdclark@chromium.org>	2022-05-02 10:11:44 -07:00
Rob Clark	f9d5355fa5	drm/msm/gpu: Drop duplicate fence counter The ring seqno counter duplicates the fence-context last_fence counter. They end up getting incremented in lock-step, on the same scheduler thread, but the split just makes things less obvious. Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20220411215849.297838-3-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>	2022-04-21 15:03:11 -07:00
Rob Clark	d4726d7700	drm/msm: Add a way to override processes comm/cmdline In the cause of using the GPU via virtgpu, the host side process is really a sort of proxy, and not terribly interesting from the PoV of crash/fault logging. Add a way to override these per process so that we can see the guest process's name. v2: Handle kmalloc failure, add comment to explain kstrdup returns NULL if passed NULL [Dan Carpenter] Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20220317165144.222101-4-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>	2022-04-21 15:01:09 -07:00
Rob Clark	39ba0c0d6c	drm/msm: Split out helper to get comm/cmdline Deduplicate this from fault_worker and recover_worker. Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20220317165144.222101-3-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>	2022-04-21 15:01:08 -07:00
Rob Clark	90f45c42d7	drm/msm: Add SYSPROF param (v2) Add a SYSPROF param for system profiling tools like Mesa's pps-producer (perfetto) to control behavior related to system-wide performance counter collection. In particular, for profiling, one wants to ensure that GPU context switches do not effect perfcounter state, and might want to suppress suspend (which would cause counters to lose state). v2: Swap the order in msm_file_private_set_sysprof() [sboyd] and initialize the sysprof_active refcount to one (because the under/ overflow checking in refcount_t doesn't expect a 0->1 transition) meaning that values greater than 1 means sysprof is active. Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20220304005317.776110-4-robdclark@gmail.com	2022-03-04 11:59:31 -08:00
Akhil P Oommen	0737ab95a0	drm/msm: Use generic name for gpu resources Use generic name for resources like irq and kthread instead of hardware specific name to make it easier to grep. Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com> Link: https://lore.kernel.org/r/20220226005021.v2.1.Id3d2e7391192c86d0783aeb307d3f9fb61f9efee@changeid Signed-off-by: Rob Clark <robdclark@chromium.org>	2022-02-25 13:29:57 -08:00
Rob Clark	bc2112583a	drm/msm/gpu: Track global faults per address-space Other processes don't need to know about faults that they are isolated from by virtue of address space isolation. They are only interested in whether some of their state might have been corrupted. But to be safe, also track unattributed faults. This case should really never happen unless there is a kernel bug (and that would never happen, right?) v2: Instead of adding a new param, just change the behavior of the existing param to match what userspace actually wants [anholt] Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5934 Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20220201161618.778455-3-robdclark@gmail.com Reviewed-by: Emma Anholt <emma@anholt.net> Signed-off-by: Rob Clark <robdclark@chromium.org>	2022-02-20 09:44:52 -08:00
Dmitry Baryshkov	c0e745d73a	drm/msm: drop dbgname argument from msm_ioremap*() msm_ioremap() functions take additional argument dbgname which is now unused. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Link: https://lore.kernel.org/r/20220105232700.444170-2-dmitry.baryshkov@linaro.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>	2022-02-18 18:32:52 +03:00
Rob Clark	167a668ab0	drm/msm/gpu: Wait for idle before suspending System suspend uses pm_runtime_force_suspend(), which cheekily bypasses the runpm reference counts. This doesn't actually work so well when the GPU is active. So add a reasonable delay waiting for the GPU to become idle. Alternatively we could just return -EBUSY in this case, but that has the disadvantage of causing system suspend to fail. v2: s/ret/remaining [sboyd], and switch to using active_submits count to ensure we aren't racing with submit cleanup (and devfreq idle work getting scheduled, etc) v3: fix inverted logic Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20220108180913.814448-2-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>	2022-01-25 08:54:41 -08:00
Rob Clark	5f3aee4ceb	drm/msm: Handle fence rollover Add some helpers for fence comparision, which handle rollover properly, and stop open coding fence seqno comparisions. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Akhil P Oommen <akhilpo@codeaurora.org> Link: https://lore.kernel.org/r/20211109181117.591148-5-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>	2021-11-28 09:56:47 -08:00
Rob Clark	c28e2f2b41	drm/msm: Remove struct_mutex usage The remaining struct_mutex usage is just to serialize various gpu related things (submit/retire/recover/fault/etc), so replace struct_mutex with gpu->lock. Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20211109181117.591148-4-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>	2021-11-28 09:50:33 -08:00
Rob Clark	1d054c9b84	drm/msm: Drop priv->lastctx cur_ctx_seqno already does the same thing, but handles the edge cases where a refcnt'd context can live after lastclose. So let's not have two ways to do the same thing. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Akhil P Oommen <akhilpo@codeaurora.org> Link: https://lore.kernel.org/r/20211109181117.591148-3-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>	2021-11-28 09:50:33 -08:00
Tim Gardner	b220c15483	drm/msm: prevent NULL dereference in msm_gpu_crashstate_capture() Coverity complains of a possible NULL dereference: CID 120718 (#1 of 1): Dereference null return value (NULL_RETURNS) 23. dereference: Dereferencing a pointer that might be NULL state->bos when calling msm_gpu_crashstate_get_bo. [show details] 301 msm_gpu_crashstate_get_bo(state, submit->bos[i].obj, 302 submit->bos[i].iova, submit->bos[i].flags); Fix this by employing the same state->bos NULL check as is used in the next for loop. Cc: Rob Clark <robdclark@gmail.com> Cc: Sean Paul <sean@poorly.run> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: linux-arm-msm@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: freedreno@lists.freedesktop.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20210929162554.14295-1-tim.gardner@canonical.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Rob Clark <robdclark@chromium.org>	2021-10-15 13:26:33 -07:00
Rob Clark	1d8a5ca436	drm/msm: Conversion to drm scheduler For existing adrenos, there is one or more ringbuffer, depending on whether preemption is supported. When preemption is supported, each ringbuffer has it's own priority. A submitqueue (which maps to a gl context or vk queue in userspace) is mapped to a specific ring- buffer at creation time, based on the submitqueue's priority. Each ringbuffer has it's own drm_gpu_scheduler. Each submitqueue maps to a drm_sched_entity. And each submit maps to a drm_sched_job. Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/4 Signed-off-by: Rob Clark <robdclark@chromium.org> Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20210728010632.2633470-10-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>	2021-07-28 09:19:00 -07:00
Rob Clark	be40596bb5	drm/msm: Consolidate submit bo state Move all the locked/active/pinned state handling to msm_gem_submit.c. In particular, for drm/scheduler, we'll need to do all this before pushing the submit job to the scheduler. But while we're at it we can get rid of the dupicate pin and refcnt. Signed-off-by: Rob Clark <robdclark@chromium.org> Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20210728010632.2633470-7-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>	2021-07-27 18:09:18 -07:00
Rob Clark	030af2b05a	drm/msm: drop drm_gem_object_put_locked() No idea why we were still using this. It certainly hasn't been needed for some time. So drop the pointless twin codepaths. Signed-off-by: Rob Clark <robdclark@chromium.org> Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20210728010632.2633470-4-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org>	2021-07-27 18:09:18 -07:00

1 2 3 4

175 Commits