Commit Graph

110117 Commits

Author SHA1 Message Date
AngeloGioacchino Del Regno
f563dd9ca6 drm/mediatek: Initialize pointer in mtk_drm_of_ddp_path_build_one()
The struct device_node *next pointer is not initialized, and it is
used in an error path in which it may have never been modified by
function mtk_drm_of_get_ddp_ep_cid().

Since the error path is relying on that pointer being NULL for the
OVL Adaptor and/or invalid component check and since said pointer
is being used in prints for %pOF, in the case that it points to a
bogus address, the print may cause a KP.

To resolve that, initialize the *next pointer to NULL before usage.

Fixes: 4c932840db ("drm/mediatek: Implement OF graphs support for display paths")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/dri-devel/633f3c6d-d09f-447c-95f1-dfb4114c50e6@stanley.mountain/
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: CK Hu <ck.hu@mediatek.com>
Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20241112105030.93337-1-angelogioacchino.delregno@collabora.com/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2025-01-04 12:15:19 +00:00
Liankun Yang
5229081406 drm/mediatek: Add return value check when reading DPCD
Check the return value of drm_dp_dpcd_readb() to confirm that
AUX communication is successful. To simplify the code, replace
drm_dp_dpcd_readb() and DP_GET_SINK_COUNT() with drm_dp_read_sink_count().

Fixes: f70ac097a2 ("drm/mediatek: Add MT8195 Embedded DisplayPort driver")
Signed-off-by: Liankun Yang <liankun.yang@mediatek.com>
Reviewed-by: Guillaume Ranquet <granquet@baylibre.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20241218113448.2992-1-liankun.yang@mediatek.com/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2025-01-04 12:15:08 +00:00
Chun-Kuang Hu
c4bd13be19 drm/mediatek: Remove unneeded semicolon
cocci warnings: (new ones prefixed by >>)
>> drivers/gpu/drm/mediatek/mtk_drm_drv.c:1092:2-3: Unneeded semicolon

Fixes: 4c932840db ("drm/mediatek: Implement OF graphs support for display paths")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202412022048.kY2ZhxZ4-lkp@intel.com/
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20241230135314.5419-1-chunkuang.hu@kernel.org/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2025-01-02 13:36:18 +00:00
AngeloGioacchino Del Regno
76aed5e00f drm/mediatek: mtk_dsi: Add registers to pdata to fix MT8186/MT8188
Registers DSI_VM_CMD and DSI_SHADOW_DEBUG start at different
addresses in both MT8186 and MT8188 compared to the older IPs.

Add two members in struct mtk_dsi_driver_data to specify the
offsets for these two registers on a per-SoC basis, then do
specify those in all of the currently present SoC driver data.

This fixes writes to the Video Mode Command Packet Control
register, fixing enablement of command packet transmission
(VM_CMD_EN) and allowance of this transmission during the
VFP period (TS_VFP_EN) on both MT8186 and MT8188.

Fixes: 03d7adc410 ("drm/mediatek: Add mt8186 dsi compatible to mtk_dsi.c")
Fixes: 814d5341f3 ("drm/mediatek: Add mt8188 dsi compatible to mtk_dsi.c")
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: CK Hu <ck.hu@mediatek.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20241219112733.47907-1-angelogioacchino.delregno@collabora.com/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2024-12-30 14:50:38 +00:00
Liankun Yang
0d68b55887 drm/mediatek: Fix mode valid issue for dp
Fix dp mode valid issue to avoid abnormal display of limit state.

After DP passes link training, it can express the lane count of the
current link status is good. Calculate the maximum bandwidth supported
by DP using the current lane count.

The color format will select the best one based on the bandwidth
requirements of the current timing mode. If the current timing mode
uses RGB and meets the DP link bandwidth requirements, RGB will be used.

If the timing mode uses RGB but does not meet the DP link bandwidthi
requirements, it will continue to check whether YUV422 meets
the DP link bandwidth.

FEC overhead is approximately 2.4% from DP 1.4a spec 2.2.1.4.2.
The down-spread amplitude shall either be disabled (0.0%) or up
to 0.5% from 1.4a 3.5.2.6. Add up to approximately 3% total overhead.

Because rate is already divided by 10,
mode->clock does not need to be multiplied by 10.

Fixes: f70ac097a2 ("drm/mediatek: Add MT8195 Embedded DisplayPort driver")
Signed-off-by: Liankun Yang <liankun.yang@mediatek.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20241025083036.8829-3-liankun.yang@mediatek.com/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2024-12-30 14:25:03 +00:00
Liankun Yang
ef24fbd8f1 drm/mediatek: Fix YCbCr422 color format issue for DP
Setting up misc0 for Pixel Encoding Format.

According to the definition of YCbCr in spec 1.2a Table 2-96,
0x1 << 1 should be written to the register.

Use switch case to distinguish RGB, YCbCr422,
and unsupported color formats.

Fixes: f70ac097a2 ("drm/mediatek: Add MT8195 Embedded DisplayPort driver")
Signed-off-by: Liankun Yang <liankun.yang@mediatek.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20241025083036.8829-2-liankun.yang@mediatek.com/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2024-12-30 14:10:27 +00:00
Chun-Kuang Hu
a10f26062a Revert "drm/mediatek: Switch to for_each_child_of_node_scoped()"
This reverts commit fd620fc25d.

Boot failures reported by
KernelCI:

[    4.395400] mediatek-drm mediatek-drm.5.auto: bound 1c014000.merge (ops 0xffffd35fd12975f8)
[    4.396155] mediatek-drm mediatek-drm.5.auto: bound 1c000000.ovl (ops 0xffffd35fd12977b8)
[    4.411951] mediatek-drm mediatek-drm.5.auto: bound 1c002000.rdma (ops 0xffffd35fd12989c0)
[    4.536837] mediatek-drm mediatek-drm.5.auto: bound 1c004000.ccorr (ops 0xffffd35fd1296cf0)
[    4.545181] mediatek-drm mediatek-drm.5.auto: bound 1c005000.aal (ops 0xffffd35fd1296a80)
[    4.553344] mediatek-drm mediatek-drm.5.auto: bound 1c006000.gamma (ops 0xffffd35fd12972b0)
[    4.561680] mediatek-drm mediatek-drm.5.auto: bound 1c014000.merge (ops 0xffffd35fd12975f8)
[    4.570025] ------------[ cut here ]------------
[    4.574630] refcount_t: underflow; use-after-free.
[    4.579416] WARNING: CPU: 6 PID: 81 at lib/refcount.c:28 refcount_warn_saturate+0xf4/0x148
[    4.587670] Modules linked in:
[    4.590714] CPU: 6 UID: 0 PID: 81 Comm: kworker/u32:3 Tainted: G        W          6.12.0 #1 cab58e2e59020ebd4be8ada89a65f465a316c742
[    4.602695] Tainted: [W]=WARN
[    4.605649] Hardware name: Acer Tomato (rev2) board (DT)
[    4.610947] Workqueue: events_unbound deferred_probe_work_func
[    4.616768] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    4.623715] pc : refcount_warn_saturate+0xf4/0x148
[    4.628493] lr : refcount_warn_saturate+0xf4/0x148
[    4.633270] sp : ffff8000807639c0
[    4.636571] x29: ffff8000807639c0 x28: ffff34ff4116c640 x27: ffff34ff4368e080
[    4.643693] x26: ffffd35fd1299ac8 x25: ffff34ff46c8c410 x24: 0000000000000000
[    4.650814] x23: ffff34ff4368e080 x22: 00000000fffffdfb x21: 0000000000000002
[    4.657934] x20: ffff34ff470c6000 x19: ffff34ff410c7c10 x18: 0000000000000006
[    4.665055] x17: 666678302073706f x16: 2820656772656d2e x15: ffff800080763440
[    4.672176] x14: 0000000000000000 x13: 2e656572662d7265 x12: ffffd35fd2ed14f0
[    4.679297] x11: 0000000000000001 x10: 0000000000000001 x9 : ffffd35fd0342150
[    4.686418] x8 : c0000000ffffdfff x7 : ffffd35fd2e21450 x6 : 00000000000affa8
[    4.693539] x5 : ffffd35fd2ed1498 x4 : 0000000000000000 x3 : 0000000000000000
[    4.700660] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff34ff40932580
[    4.707781] Call trace:
[    4.710216]  refcount_warn_saturate+0xf4/0x148 (P)
[    4.714993]  refcount_warn_saturate+0xf4/0x148 (L)
[    4.719772]  kobject_put+0x110/0x118
[    4.723335]  put_device+0x1c/0x38
[    4.726638]  mtk_drm_bind+0x294/0x5c0
[    4.730289]  try_to_bring_up_aggregate_device+0x16c/0x1e0
[    4.735673]  __component_add+0xbc/0x1c0
[    4.739495]  component_add+0x1c/0x30
[    4.743058]  mtk_disp_rdma_probe+0x140/0x210
[    4.747314]  platform_probe+0x70/0xd0
[    4.750964]  really_probe+0xc4/0x2a8
[    4.754527]  __driver_probe_device+0x80/0x140
[    4.758870]  driver_probe_device+0x44/0x120
[    4.763040]  __device_attach_driver+0xc0/0x108
[    4.767470]  bus_for_each_drv+0x8c/0xf0
[    4.771294]  __device_attach+0xa4/0x198
[    4.775117]  device_initial_probe+0x1c/0x30
[    4.779286]  bus_probe_device+0xb4/0xc0
[    4.783109]  deferred_probe_work_func+0xb0/0x100
[    4.787714]  process_one_work+0x18c/0x420
[    4.791712]  worker_thread+0x30c/0x418
[    4.795449]  kthread+0x128/0x138
[    4.798665]  ret_from_fork+0x10/0x20
[    4.802229] ---[ end trace 0000000000000000 ]---

Fixes: fd620fc25d ("drm/mediatek: Switch to for_each_child_of_node_scoped()")
Cc: stable@vger.kernel.org
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Reported-by: Sasha Levin <sashal@kernel.org>
Closes: https://lore.kernel.org/lkml/Z0lNHdwQ3rODHQ2c@sashalap/T/#mfaa6343cfd4d59aae5912b095c0693c0553e746c
Link: https://patchwork.kernel.org/project/dri-devel/patch/20241223151218.7958-1-chunkuang.hu@kernel.org/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2024-12-30 13:21:16 +00:00
Arnd Bergmann
924d66011f drm/mediatek: stop selecting foreign drivers
The PHY portion of the mediatek hdmi driver was originally part of
the driver it self and later split out into drivers/phy, which a
'select' to keep the prior behavior.

However, this leads to build failures when the PHY driver cannot
be built:

WARNING: unmet direct dependencies detected for PHY_MTK_HDMI
  Depends on [n]: (ARCH_MEDIATEK || COMPILE_TEST [=y]) && COMMON_CLK [=y] && OF [=y] && REGULATOR [=n]
  Selected by [m]:
  - DRM_MEDIATEK_HDMI [=m] && HAS_IOMEM [=y] && DRM [=m] && DRM_MEDIATEK [=m]
ERROR: modpost: "devm_regulator_register" [drivers/phy/mediatek/phy-mtk-hdmi-drv.ko] undefined!
ERROR: modpost: "rdev_get_drvdata" [drivers/phy/mediatek/phy-mtk-hdmi-drv.ko] undefined!

The best option here is to just not select the phy driver and leave that
up to the defconfig. Do the same for the other PHY and memory drivers
selected here as well for consistency.

Fixes: a481bf2f0c ("drm/mediatek: Separate mtk_hdmi_phy to an independent module")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: CK Hu <ck.hu@mediatek.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20241218085837.2670434-1-arnd@kernel.org/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2024-12-25 15:29:57 +00:00
Jason-JH.Lin
5c9d7e79ba drm/mediatek: Add support for 180-degree rotation in the display driver
mediatek-drm driver reported the capability of 180-degree rotation by
adding `DRM_MODE_ROTATE_180` to the plane property, as flip-x combined
with flip-y equals a 180-degree rotation. However, we did not handle
the rotation property in the driver and lead to rotation issues.

Fixes: 74608d8fee ("drm/mediatek: Add DRM_MODE_ROTATE_0 to rotation property")
Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: CK Hu <ck.hu@mediatek.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20241118025126.30808-1-jason-jh.lin@mediatek.com/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2024-12-25 15:27:32 +00:00
Daniel Golle
f8d9b91739 drm/mediatek: Only touch DISP_REG_OVL_PITCH_MSB if AFBC is supported
Touching DISP_REG_OVL_PITCH_MSB leads to video overlay on MT2701, MT7623N
and probably other older SoCs being broken.

Move setting up AFBC layer configuration into a separate function only
being called on hardware which actually supports AFBC which restores the
behavior as it was before commit c410fa9b07 ("drm/mediatek: Add AFBC
support to Mediatek DRM driver") on non-AFBC hardware.

Fixes: c410fa9b07 ("drm/mediatek: Add AFBC support to Mediatek DRM driver")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: CK Hu <ck.hu@mediatek.com>
Link: c7fbd3c3e6.1734397800.git.daniel@makrotopia.org/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2024-12-25 15:19:27 +00:00
Jason-JH.Lin
da03801ad0 drm/mediatek: Move mtk_crtc_finish_page_flip() to ddp_cmdq_cb()
mtk_crtc_finish_page_flip() is used to notify userspace that a
page flip has been completed, allowing userspace to free the frame
buffer of the last frame and commit the next frame.

In MediaTek's hardware design for configuring display hardware by using
GCE, `DRM_EVENT_FLIP_COMPLETE` should be notified to userspace after
GCE has finished configuring all display hardware settings for each
atomic_commit().

Currently, mtk_crtc_finish_page_flip() cannot guarantee that GCE has
configured all the display hardware settings of the last frame.
Therefore, to increase the accuracy of the timing for notifying
`DRM_EVENT_FLIP_COMPLETE` to userspace, mtk_crtc_finish_page_flip()
should be moved to ddp_cmdq_cb().

Fixes: 7f82d9c438 ("drm/mediatek: Clear pending flag when cmdq packet is done")
Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com>
Reviewed-by: CK Hu <ck.hu@mediatek.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20241211034716.29241-1-jason-jh.lin@mediatek.com/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2024-12-25 15:18:55 +00:00
Guoqing Jiang
36684e9d88 drm/mediatek: Set private->all_drm_private[i]->drm to NULL if mtk_drm_bind returns err
The pointer need to be set to NULL, otherwise KASAN complains about
use-after-free. Because in mtk_drm_bind, all private's drm are set
as follows.

private->all_drm_private[i]->drm = drm;

And drm will be released by drm_dev_put in case mtk_drm_kms_init returns
failure. However, the shutdown path still accesses the previous allocated
memory in drm_atomic_helper_shutdown.

[   84.874820] watchdog: watchdog0: watchdog did not stop!
[   86.512054] ==================================================================
[   86.513162] BUG: KASAN: use-after-free in drm_atomic_helper_shutdown+0x33c/0x378
[   86.514258] Read of size 8 at addr ffff0000d46fc068 by task shutdown/1
[   86.515213]
[   86.515455] CPU: 1 UID: 0 PID: 1 Comm: shutdown Not tainted 6.13.0-rc1-mtk+gfa1a78e5d24b-dirty #55
[   86.516752] Hardware name: Unknown Product/Unknown Product, BIOS 2022.10 10/01/2022
[   86.517960] Call trace:
[   86.518333]  show_stack+0x20/0x38 (C)
[   86.518891]  dump_stack_lvl+0x90/0xd0
[   86.519443]  print_report+0xf8/0x5b0
[   86.519985]  kasan_report+0xb4/0x100
[   86.520526]  __asan_report_load8_noabort+0x20/0x30
[   86.521240]  drm_atomic_helper_shutdown+0x33c/0x378
[   86.521966]  mtk_drm_shutdown+0x54/0x80
[   86.522546]  platform_shutdown+0x64/0x90
[   86.523137]  device_shutdown+0x260/0x5b8
[   86.523728]  kernel_restart+0x78/0xf0
[   86.524282]  __do_sys_reboot+0x258/0x2f0
[   86.524871]  __arm64_sys_reboot+0x90/0xd8
[   86.525473]  invoke_syscall+0x74/0x268
[   86.526041]  el0_svc_common.constprop.0+0xb0/0x240
[   86.526751]  do_el0_svc+0x4c/0x70
[   86.527251]  el0_svc+0x4c/0xc0
[   86.527719]  el0t_64_sync_handler+0x144/0x168
[   86.528367]  el0t_64_sync+0x198/0x1a0
[   86.528920]
[   86.529157] The buggy address belongs to the physical page:
[   86.529972] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff0000d46fd4d0 pfn:0x1146fc
[   86.531319] flags: 0xbfffc0000000000(node=0|zone=2|lastcpupid=0xffff)
[   86.532267] raw: 0bfffc0000000000 0000000000000000 dead000000000122 0000000000000000
[   86.533390] raw: ffff0000d46fd4d0 0000000000000000 00000000ffffffff 0000000000000000
[   86.534511] page dumped because: kasan: bad access detected
[   86.535323]
[   86.535559] Memory state around the buggy address:
[   86.536265]  ffff0000d46fbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   86.537314]  ffff0000d46fbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   86.538363] >ffff0000d46fc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   86.544733]                                                           ^
[   86.551057]  ffff0000d46fc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   86.557510]  ffff0000d46fc100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   86.563928] ==================================================================
[   86.571093] Disabling lock debugging due to kernel taint
[   86.577642] Unable to handle kernel paging request at virtual address e0e9c0920000000b
[   86.581834] KASAN: maybe wild-memory-access in range [0x0752049000000058-0x075204900000005f]
...

Fixes: 1ef7ed4835 ("drm/mediatek: Modify mediatek-drm for mt8195 multi mmsys support")
Signed-off-by: Guoqing Jiang <guoqing.jiang@canonical.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20241223023227.1258112-1-guoqing.jiang@canonical.com/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2024-12-25 15:18:55 +00:00
Chun-Kuang Hu
d08555758f Revert "drm/mediatek: dsi: Correct calculation formula of PHY Timing"
This reverts commit 417d8c4727.

With that patch the panel in the Tentacruel ASUS Chromebook CM14
(CM1402F) flickers. There are 1 or 2 times per second a black panel.
Stable Kernel 6.11.5 and mainline 6.12-rc4 works only when reverse
that patch.

Fixes: 417d8c4727 ("drm/mediatek: dsi: Correct calculation formula of PHY Timing")
Cc: stable@vger.kernel.org
Cc: Shuijing Li <shuijing.li@mediatek.com>
Reported-by: Jens Ziller <zillerbaer@gmx.de>
Closes: https://patchwork.kernel.org/project/dri-devel/patch/20240412031208.30688-1-shuijing.li@mediatek.com/
Link: https://patchwork.kernel.org/project/dri-devel/patch/20241212001908.6056-1-chunkuang.hu@kernel.org/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2024-12-23 14:34:12 +00:00
Linus Torvalds
2ba9f676d0 Merge tag 'drm-next-2024-11-29' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "Merge window fixes, mostly amdgpu and xe, with a few other minor ones,
  all looks fairly normal,

  i915:
   - hdcp: Fix when the first read and write are retried

  xe:
   - Wake up waiters after wait condition set to true
   - Mark the preempt fence workqueue as reclaim
   - Update xe2 graphics name string
   - Fix a couple of guc submit races
   - Fix pat index usage in migrate
   - Ensure non-cached migrate pagetable bo mappings
   - Take a PM ref in the delayed snapshot capture worker

  amdgpu:
   - SMU 13.0.6 fixes
   - XGMI fixes
   - SMU 13.0.7 fixes
   - Misc code cleanups
   - Plane refcount fixes
   - DCN 4.0.1 fixes
   - DC power fixes
   - DTO fixes
   - NBIO 7.11 fixes
   - SMU 14.0.x fixes
   - Reset fixes
   - Enable DC on LoongArch
   - Sysfs hotplug warning fix
   - Misc small fixes
   - VCN 4.0.3 fix
   - Slab usage fix
   - Jpeg delayed work fix

  amdkfd:
   - wptr handling fixes

  radeon:
   - Use ttm_bo_move_null()
   - Constify struct pci_device_id
   - Fix spurious hotplug
   - HPD fix

  rockchip
   - fix 32-bit build"

* tag 'drm-next-2024-11-29' of https://gitlab.freedesktop.org/drm/kernel: (48 commits)
  drm/xe: Take PM ref in delayed snapshot capture worker
  drm/xe/migrate: use XE_BO_FLAG_PAGETABLE
  drm/xe/migrate: fix pat index usage
  drm/xe/guc_submit: fix race around suspend_pending
  drm/xe/guc_submit: fix race around pending_disable
  drm/xe: Update xe2_graphics name string
  drm/rockchip: avoid 64-bit division
  Revert "drm/radeon: Delay Connector detecting when HPD singals is unstable"
  drm/amdgpu/jpeg: cancel the jpeg worker
  drm/amdgpu: fix usage slab after free
  drm/amdgpu/vcn: reset fw_shared when VCPU buffers corrupted on vcn v4.0.3
  drm/amdgpu: Fix sysfs warning when hotplugging
  drm/amdgpu: Add sysfs interface for vcn reset mask
  drm/amdgpu/gmc7: fix wait_for_idle callers
  drm/amd/pm: Remove arcturus min power limit
  drm/amd/pm: skip setting the power source on smu v14.0.2/3
  drm/amd/pm: disable pcie speed switching on Intel platform for smu v14.0.2/3
  drm/amdkfd: Use the correct wptr size
  drm/xe: Mark preempt fence workqueue as reclaim
  drm/xe/ufence: Wake up waiters after setting ufence->signalled
  ...
2024-11-29 13:06:06 -08:00
Linus Torvalds
55cb93fd24 Merge tag 'driver-core-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
 "Here is a small set of driver core changes for 6.13-rc1.

  Nothing major for this merge cycle, except for the two simple merge
  conflicts are here just to make life interesting.

  Included in here are:

   - sysfs core changes and preparations for more sysfs api cleanups
     that can come through all driver trees after -rc1 is out

   - fw_devlink fixes based on many reports and debugging sessions

   - list_for_each_reverse() removal, no one was using it!

   - last-minute seq_printf() format string bug found and fixed in many
     drivers all at once.

   - minor bugfixes and changes full details in the shortlog"

* tag 'driver-core-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (35 commits)
  Fix a potential abuse of seq_printf() format string in drivers
  cpu: Remove spurious NULL in attribute_group definition
  s390/con3215: Remove spurious NULL in attribute_group definition
  perf: arm-ni: Remove spurious NULL in attribute_group definition
  driver core: Constify bin_attribute definitions
  sysfs: attribute_group: allow registration of const bin_attribute
  firmware_loader: Fix possible resource leak in fw_log_firmware_info()
  drivers: core: fw_devlink: Fix excess parameter description in docstring
  driver core: class: Correct WARN() message in APIs class_(for_each|find)_device()
  cacheinfo: Use of_property_present() for non-boolean properties
  cdx: Fix cdx_mmap_resource() after constifying attr in ->mmap()
  drivers: core: fw_devlink: Make the error message a bit more useful
  phy: tegra: xusb: Set fwnode for xusb port devices
  drm: display: Set fwnode for aux bus devices
  driver core: fw_devlink: Stop trying to optimize cycle detection logic
  driver core: Constify attribute arguments of binary attributes
  sysfs: bin_attribute: add const read/write callback variants
  sysfs: implement all BIN_ATTR_* macros in terms of __BIN_ATTR()
  sysfs: treewide: constify attribute callback of bin_attribute::llseek()
  sysfs: treewide: constify attribute callback of bin_attribute::mmap()
  ...
2024-11-29 11:43:29 -08:00
Dave Airlie
9794b89c50 Merge tag 'drm-xe-next-fixes-2024-11-28' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
Driver Changes:
- Update xe2 graphics name string (Matt Roper)
- Fix a couple of guc submit races (Matt Auld)
- Fix pat index usage in migrate (Matt Auld)
- Ensure non-cached migrate pagetable bo mappings (Matt Auld)
- Take a PM ref in the delayed snapshot capture worker (Matt Brost)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Z0iOjKwEGVo_DmgY@fedora
2024-11-29 04:59:28 +10:00
Dave Airlie
c54fdcc57b Merge tag 'drm-misc-next-fixes-2024-11-28' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
A single buildfix for 32-bits rockchip compilation.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1f91eeaa-d3e4-4eca-9375-24c467f6976d@linux.intel.com
2024-11-29 04:49:44 +10:00
Matthew Brost
aef0b4a072 drm/xe: Take PM ref in delayed snapshot capture worker
The delayed snapshot capture worker can access the GPU or VRAM both of
which require a PM reference. Take a reference in this worker.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: 4f04d07c0a ("drm/xe: Faster devcoredump")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241126174615.2665852-5-matthew.brost@intel.com
(cherry picked from commit 1c6878af11)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-11-28 15:22:36 +01:00
Matthew Auld
c78f439918 drm/xe/migrate: use XE_BO_FLAG_PAGETABLE
On some HW we want to avoid the host caching PTEs, since access from GPU
side can be incoherent. However here the special migrate object is
mapping PTEs which are written from the host and potentially cached. Use
XE_BO_FLAG_PAGETABLE to ensure that non-cached mapping is used, on
platforms where this matters.

Fixes: 7a060d786c ("drm/xe/mtl: Map PPGTT as CPU:WC")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241126181259.159713-4-matthew.auld@intel.com
(cherry picked from commit febc689b27)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-11-28 15:22:36 +01:00
Matthew Auld
23346f8516 drm/xe/migrate: fix pat index usage
XE_CACHE_WB must be converted into the per-platform pat index for that
particular caching mode, otherwise we are just encoding whatever happens
to be the value of that enum.

Fixes: e8babb280b ("drm/xe: Convert multiple bind ops into single job")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: <stable@vger.kernel.org> # v6.12+
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241126181259.159713-3-matthew.auld@intel.com
(cherry picked from commit f3dc9246f9)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-11-28 15:22:36 +01:00
Matthew Auld
87651f31ae drm/xe/guc_submit: fix race around suspend_pending
Currently in some testcases we can trigger:

xe 0000:03:00.0: [drm] Assertion `exec_queue_destroyed(q)` failed!
....
WARNING: CPU: 18 PID: 2640 at drivers/gpu/drm/xe/xe_guc_submit.c:1826 xe_guc_sched_done_handler+0xa54/0xef0 [xe]
xe 0000:03:00.0: [drm] *ERROR* GT1: DEREGISTER_DONE: Unexpected engine state 0x00a1, guc_id=57

Looking at a snippet of corresponding ftrace for this GuC id we can see:

162.673311: xe_sched_msg_add:     dev=0000:03:00.0, gt=1 guc_id=57, opcode=3
162.673317: xe_sched_msg_recv:    dev=0000:03:00.0, gt=1 guc_id=57, opcode=3
162.673319: xe_exec_queue_scheduling_disable: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0x29, flags=0x0
162.674089: xe_exec_queue_kill:   dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0x29, flags=0x0
162.674108: xe_exec_queue_close:  dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0xa9, flags=0x0
162.674488: xe_exec_queue_scheduling_done: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0xa9, flags=0x0
162.678452: xe_exec_queue_deregister: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0xa1, flags=0x0

It looks like we try to suspend the queue (opcode=3), setting
suspend_pending and triggering a disable_scheduling. The user then
closes the queue. However the close will also forcefully signal the
suspend fence after killing the queue, later when the G2H response for
disable_scheduling comes back we have now cleared suspend_pending when
signalling the suspend fence, so the disable_scheduling now incorrectly
tries to also deregister the queue. This leads to warnings since the queue
has yet to even be marked for destruction. We also seem to trigger
errors later with trying to double unregister the same queue.

To fix this tweak the ordering when handling the response to ensure we
don't race with a disable_scheduling that didn't actually intend to
perform an unregister.  The destruction path should now also correctly
wait for any pending_disable before marking as destroyed.

Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3371
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241122161914.321263-6-matthew.auld@intel.com
(cherry picked from commit f161809b36)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-11-28 15:22:36 +01:00
Matthew Auld
6965f91a00 drm/xe/guc_submit: fix race around pending_disable
Currently in some testcases we can trigger:

[drm] *ERROR* GT0: SCHED_DONE: Unexpected engine state 0x02b1, guc_id=8, runnable_state=0
[drm] *ERROR* GT0: G2H action 0x1002 failed (-EPROTO) len 3 msg 02 10 00 90 08 00 00 00 00 00 00 00

Looking at a snippet of corresponding ftrace for this GuC id we can see:

498.852891: xe_sched_msg_add:     dev=0000:03:00.0, gt=0 guc_id=8, opcode=3
498.854083: xe_sched_msg_recv:    dev=0000:03:00.0, gt=0 guc_id=8, opcode=3
498.855389: xe_exec_queue_kill:   dev=0000:03:00.0, 5:0x1, gt=0, width=1, guc_id=8, guc_state=0x3, flags=0x0
498.855436: xe_exec_queue_lr_cleanup: dev=0000:03:00.0, 5:0x1, gt=0, width=1, guc_id=8, guc_state=0x83, flags=0x0
498.856767: xe_exec_queue_close:  dev=0000:03:00.0, 5:0x1, gt=0, width=1, guc_id=8, guc_state=0x83, flags=0x0
498.862889: xe_exec_queue_scheduling_disable: dev=0000:03:00.0, 5:0x1, gt=0, width=1, guc_id=8, guc_state=0xa9, flags=0x0
498.863032: xe_exec_queue_scheduling_disable: dev=0000:03:00.0, 5:0x1, gt=0, width=1, guc_id=8, guc_state=0x2b9, flags=0x0
498.875596: xe_exec_queue_scheduling_done: dev=0000:03:00.0, 5:0x1, gt=0, width=1, guc_id=8, guc_state=0x2b9, flags=0x0
498.875604: xe_exec_queue_deregister: dev=0000:03:00.0, 5:0x1, gt=0, width=1, guc_id=8, guc_state=0x2b1, flags=0x0
499.074483: xe_exec_queue_deregister_done: dev=0000:03:00.0, 5:0x1, gt=0, width=1, guc_id=8, guc_state=0x2b1, flags=0x0

This looks to be the two scheduling_disable racing with each other, one
from the suspend (opcode=3) and then again during lr cleanup. While
those two operations are serialized, the G2H portion is not, therefore
when marking the queue as pending_disabled and then firing off the first
request, we proceed do the same again, however the first disable
response only fires after this which then clears the pending_disabled.
At this point the second comes back and is processed, however the
pending_disabled is no longer set, hence triggering the warning.

To fix this wait for pending_disabled when doing the lr cleanup and
calling disable_scheduling_deregister. Also do the same for all other
disable_scheduling callers.

Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3515
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <mattheq.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241122161914.321263-5-matthew.auld@intel.com
(cherry picked from commit ddb106d212)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-11-28 15:22:20 +01:00
Matt Roper
ece45026b0 drm/xe: Update xe2_graphics name string
Since both Xe2 and Xe3 platforms currently use the same set of graphics
IP feature flags, we associate the "graphics_xe2" structure with both IPs.
Update the name string on that IP structure to clarify this and avoid
confusion as Xe3 platforms start going into public CI.

Fixes: 800d75bf20 ("drm/xe/xe3: Define Xe3 feature flags")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241125194838.1190599-2-matthew.d.roper@intel.com
(cherry picked from commit 4fe70f664a)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-11-28 15:07:37 +01:00
Dave Airlie
f2fdcd5868 Merge tag 'amd-drm-fixes-6.13-2024-11-22' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-fixes-6.13-2024-11-22:

amdgpu:
- SMU 13.0.6 fixes
- XGMI fixes
- SMU 13.0.7 fixes
- Misc code cleanups
- Plane refcount fixes
- DCN 4.0.1 fixes
- DC power fixes
- DTO fixes
- NBIO 7.11 fixes
- SMU 14.0.x fixes
- Reset fixes
- Enable DC on LoongArch
- Sysfs hotplug warning fix
- Misc small fixes
- VCN 4.0.3 fix
- Slab usage fix
- Jpeg delayed work fix

amdkfd:
- wptr handling fixes

radeon:
- Use ttm_bo_move_null()
- Constify struct pci_device_id
- Fix spurious hotplug
- HPD fix

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241122154441.636075-1-alexander.deucher@amd.com
2024-11-27 11:59:16 +10:00
Dave Airlie
b8126f24b4 Merge tag 'drm-xe-next-fixes-2024-11-21' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
Driver Changes:
- Wake up waiters after wait condition set to true (Nirmoy Das)
- Mark the preempt fence workqueue as reclaim. (Matthew Brost)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Zz-MiVLFjOZQLrlc@fedora
2024-11-27 11:58:19 +10:00
Linus Torvalds
798bb342e0 Merge tag 'rust-6.13' of https://github.com/Rust-for-Linux/linux
Pull rust updates from Miguel Ojeda:
 "Toolchain and infrastructure:

   - Enable a series of lints, including safety-related ones, e.g. the
     compiler will now warn about missing safety comments, as well as
     unnecessary ones. How safety documentation is organized is a
     frequent source of review comments, thus having the compiler guide
     new developers on where they are expected (and where not) is very
     nice.

   - Start using '#[expect]': an interesting feature in Rust (stabilized
     in 1.81.0) that makes the compiler warn if an expected warning was
     _not_ emitted. This is useful to avoid forgetting cleaning up
     locally ignored diagnostics ('#[allow]'s).

   - Introduce '.clippy.toml' configuration file for Clippy, the Rust
     linter, which will allow us to tweak its behaviour. For instance,
     our first use cases are declaring a disallowed macro and, more
     importantly, enabling the checking of private items.

   - Lints-related fixes and cleanups related to the items above.

   - Migrate from 'receiver_trait' to 'arbitrary_self_types': to get the
     kernel into stable Rust, one of the major pieces of the puzzle is
     the support to write custom types that can be used as 'self', i.e.
     as receivers, since the kernel needs to write types such as 'Arc'
     that common userspace Rust would not. 'arbitrary_self_types' has
     been accepted to become stable, and this is one of the steps
     required to get there.

   - Remove usage of the 'new_uninit' unstable feature.

   - Use custom C FFI types. Includes a new 'ffi' crate to contain our
     custom mapping, instead of using the standard library 'core::ffi'
     one. The actual remapping will be introduced in a later cycle.

   - Map '__kernel_{size_t,ssize_t,ptrdiff_t}' to 'usize'/'isize'
     instead of 32/64-bit integers.

   - Fix 'size_t' in bindgen generated prototypes of C builtins.

   - Warn on bindgen < 0.69.5 and libclang >= 19.1 due to a double issue
     in the projects, which we managed to trigger with the upcoming
     tracepoint support. It includes a build test since some
     distributions backported the fix (e.g. Debian -- thanks!). All
     major distributions we list should be now OK except Ubuntu non-LTS.

  'macros' crate:

   - Adapt the build system to be able run the doctests there too; and
     clean up and enable the corresponding doctests.

  'kernel' crate:

   - Add 'alloc' module with generic kernel allocator support and remove
     the dependency on the Rust standard library 'alloc' and the
     extension traits we used to provide fallible methods with flags.

     Add the 'Allocator' trait and its implementations '{K,V,KV}malloc'.
     Add the 'Box' type (a heap allocation for a single value of type
     'T' that is also generic over an allocator and considers the
     kernel's GFP flags) and its shorthand aliases '{K,V,KV}Box'. Add
     'ArrayLayout' type. Add 'Vec' (a contiguous growable array type)
     and its shorthand aliases '{K,V,KV}Vec', including iterator
     support.

     For instance, now we may write code such as:

         let mut v = KVec::new();
         v.push(1, GFP_KERNEL)?;
         assert_eq!(&v, &[1]);

     Treewide, move as well old users to these new types.

   - 'sync' module: add global lock support, including the
     'GlobalLockBackend' trait; the 'Global{Lock,Guard,LockedBy}' types
     and the 'global_lock!' macro. Add the 'Lock::try_lock' method.

   - 'error' module: optimize 'Error' type to use 'NonZeroI32' and make
     conversion functions public.

   - 'page' module: add 'page_align' function.

   - Add 'transmute' module with the existing 'FromBytes' and 'AsBytes'
     traits.

   - 'block::mq::request' module: improve rendered documentation.

   - 'types' module: extend 'Opaque' type documentation and add simple
     examples for the 'Either' types.

  drm/panic:

   - Clean up a series of Clippy warnings.

  Documentation:

   - Add coding guidelines for lints and the '#[expect]' feature.

   - Add Ubuntu to the list of distributions in the Quick Start guide.

  MAINTAINERS:

   - Add Danilo Krummrich as maintainer of the new 'alloc' module.

  And a few other small cleanups and fixes"

* tag 'rust-6.13' of https://github.com/Rust-for-Linux/linux: (82 commits)
  rust: alloc: Fix `ArrayLayout` allocations
  docs: rust: remove spurious item in `expect` list
  rust: allow `clippy::needless_lifetimes`
  rust: warn on bindgen < 0.69.5 and libclang >= 19.1
  rust: use custom FFI integer types
  rust: map `__kernel_size_t` and friends also to usize/isize
  rust: fix size_t in bindgen prototypes of C builtins
  rust: sync: add global lock support
  rust: macros: enable the rest of the tests
  rust: macros: enable paste! use from macro_rules!
  rust: enable macros::module! tests
  rust: kbuild: expand rusttest target for macros
  rust: types: extend `Opaque` documentation
  rust: block: fix formatting of `kernel::block::mq::request` module
  rust: macros: fix documentation of the paste! macro
  rust: kernel: fix THIS_MODULE header path in ThisModule doc comment
  rust: page: add Rust version of PAGE_ALIGN
  rust: helpers: remove unnecessary header includes
  rust: exports: improve grammar in commentary
  drm/panic: allow verbose version check
  ...
2024-11-26 14:00:26 -08:00
Linus Torvalds
f5f4745a7f Merge tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:

 - The series "resource: A couple of cleanups" from Andy Shevchenko
   performs some cleanups in the resource management code

 - The series "Improve the copy of task comm" from Yafang Shao addresses
   possible race-induced overflows in the management of
   task_struct.comm[]

 - The series "Remove unnecessary header includes from
   {tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a
   small fix to the list_sort library code and to its selftest

 - The series "Enhance min heap API with non-inline functions and
   optimizations" also from Kuan-Wei Chiu optimizes and cleans up the
   min_heap library code

 - The series "nilfs2: Finish folio conversion" from Ryusuke Konishi
   finishes off nilfs2's folioification

 - The series "add detect count for hung tasks" from Lance Yang adds
   more userspace visibility into the hung-task detector's activity

 - Apart from that, singelton patches in many places - please see the
   individual changelogs for details

* tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
  gdb: lx-symbols: do not error out on monolithic build
  kernel/reboot: replace sprintf() with sysfs_emit()
  lib: util_macros_kunit: add kunit test for util_macros.h
  util_macros.h: fix/rework find_closest() macros
  Improve consistency of '#error' directive messages
  ocfs2: fix uninitialized value in ocfs2_file_read_iter()
  hung_task: add docs for hung_task_detect_count
  hung_task: add detect count for hung tasks
  dma-buf: use atomic64_inc_return() in dma_buf_getfile()
  fs/proc/kcore.c: fix coccinelle reported ERROR instances
  resource: avoid unnecessary resource tree walking in __region_intersects()
  ocfs2: remove unused errmsg function and table
  ocfs2: cluster: fix a typo
  lib/scatterlist: use sg_phys() helper
  checkpatch: always parse orig_commit in fixes tag
  nilfs2: convert metadata aops from writepage to writepages
  nilfs2: convert nilfs_recovery_copy_block() to take a folio
  nilfs2: convert nilfs_page_count_clean_buffers() to take a folio
  nilfs2: remove nilfs_writepage
  nilfs2: convert checkpoint file to be folio-based
  ...
2024-11-25 16:09:48 -08:00
Arnd Bergmann
818956c765 drm/rockchip: avoid 64-bit division
Dividing a 64-bit integer prevents building this for 32-bit targets:

ERROR: modpost: "__aeabi_uldivmod" [drivers/gpu/drm/rockchip/rockchipdrm.ko] undefined!

As this function is not performance criticial, just Use the div_u64() helper.

Fixes: 128a9bf8ac ("drm/rockchip: Add basic RK3588 HDMI output support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20241018151016.3496613-1-arnd@kernel.org
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
(cherry picked from commit 4b64b4a81f)
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-11-22 10:11:39 +01:00
Linus Torvalds
28eb75e178 Merge tag 'drm-next-2024-11-21' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie:
 "There's a lot of rework, the panic helper support is being added to
  more drivers, v3d gets support for HW superpages, scheduler
  documentation, drm client and video aperture reworks, some new
  MAINTAINERS added, amdgpu has the usual lots of IP refactors, Intel
  has some Pantherlake enablement and xe is getting some SRIOV bits, but
  just lots of stuff everywhere.

  core:
   - split DSC helpers from DP helpers
   - clang build fixes for drm/mm test
   - drop simple pipeline support for gem vram
   - document submission error signaling
   - move drm_rect to drm core module from kms helper
   - add default client setup to most drivers
   - move to video aperture helpers instead of drm ones

  tests:
   - new framebuffer tests

  ttm:
   - remove swapped and pinned BOs from TTM lru

  panic:
   - fix uninit spinlock
   - add ABGR2101010 support

  bridge:
   - add TI TDP158 support
   - use standard PM OPS

  dma-fence:
   - use read_trylock instead of read_lock to help lockdep

  scheduler:
   - add errno to sched start to report different errors
   - add locking to drm_sched_entity_modify_sched
   - improve documentation

  xe:
   - add drm_line_printer
   - lots of refactoring
   - Enable Xe2 + PES disaggregation
   - add new ARL PCI ID
   - SRIOV development work
   - fix exec unnecessary implicit fence
   - define and parse OA sync props
   - forcewake refactoring

  i915:
   - Enable BMG/LNL ultra joiner
   - Enable 10bpx + CCS scanout on ICL+, fp16/CCS on TGL+
   - use DSB for plane/color mgmt
   - Arrow lake PCI IDs
   - lots of i915/xe display refactoring
   - enable PXP GuC autoteardown
   - Pantherlake (PTL) Xe3 LPD display enablement
   - Allow fastset HDR infoframe changes
   - write DP source OUI for non-eDP sinks
   - share PCI IDs between i915 and xe

  amdgpu:
   - SDMA queue reset support
   - SMU 13.0.6, JPEG 4.0.3 updates
   - Initial runtime repartitioning support
   - rework IP structs for multiple IP instances
   - Fetch EDID from _DDC if available
   - SMU13 zero rpm user control
   - lots of fixes/cleanups

  amdkfd:
   - Increase event FIFO size
   - add topology cap flag for per queue reset

  msm:
   - DPU:
      - SA8775P support
      - (disabled by default) MSM8917, MSM8937, MSM8953 and MSM8996 support
      - Enable large framebuffer support
      - Drop MSM8998 and SDM845
   - DP:
      - SA8775P support
   - GPU:
      - a7xx preemption support
      - Adreno A663 support

  ast:
   - warn about unsupported TX chips

  ivpu:
   - add coredump
   - add pantherlake support

  rockchip:
   - 4K@60Hz display enablement
   - generate pll programming tables

  panthor:
   - add timestamp query API
   - add realtime group priority
   - add fdinfo support

  etnaviv:
   - improve handling of DMA address limits
   - improve GPU hangcheck

  exynos:
   - Decon Exynos7870 support

  mediatek:
   - add OF graph support

  omap:
   - locking fixes

  bochs:
   - convert to gem/shmem from simpledrm

  v3d:
   - support big/super pages
   - add gemfs

  vc4:
   - BCM2712 support refactoring
   - add YUV444 format support

  udmabuf:
   - folio related fixes

  nouveau:
   - add panic support on nv50+"

* tag 'drm-next-2024-11-21' of https://gitlab.freedesktop.org/drm/kernel: (1583 commits)
  drm/xe/guc: Fix dereference before NULL check
  drm/amd: Fix initialization mistake for NBIO 7.7.0
  Revert "drm/amd/display: parse umc_info or vram_info based on ASIC"
  drm/amd/display: Fix failure to read vram info due to static BP_RESULT
  drm/amdgpu: enable GTT fallback handling for dGPUs only
  drm/amd/amdgpu: limit single process inside MES
  drm/fourcc: add AMD_FMT_MOD_TILE_GFX9_4K_D_X
  drm/amdgpu/mes12: correct kiq unmap latency
  drm/amdgpu: Support vcn and jpeg error info parsing
  drm/amd : Update MES API header file for v11 & v12
  drm/amd/amdkfd: add/remove kfd queues on start/stop KFD scheduling
  drm/amdkfd: change kfd process kref count at creation
  drm/amdgpu: Cleanup shift coding style
  drm/amd/amdgpu: Increase MES log buffer to dump mes scratch data
  drm/amdgpu: Implement virt req_ras_err_count
  drm/amdgpu: VF Query RAS Caps from Host if supported
  drm/amdgpu: Add msg handlers for SRIOV RAS Telemetry
  drm/amdgpu: Update SRIOV Exchange Headers for RAS Telemetry Support
  drm/amd/display: 3.2.309
  drm/amd/display: Adjust VSDB parser for replay feature
  ...
2024-11-21 14:56:17 -08:00
Alex Deucher
979bfe291b Revert "drm/radeon: Delay Connector detecting when HPD singals is unstable"
This reverts commit 949658cb9b.

This causes a blank screen on boot.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3696
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Shixiong Ou <oushixiong@kylinos.cn>
Cc: stable@vger.kernel.org
2024-11-21 15:56:23 -05:00
Alex Deucher
93df748737 drm/amdgpu/jpeg: cancel the jpeg worker
Looks like these got missed when jpeg was split from vcn.
Cancel the jpeg workers rather than vcn workers.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-21 15:56:23 -05:00
Vitaly Prosyak
b61badd20b drm/amdgpu: fix usage slab after free
[  +0.000021] BUG: KASAN: slab-use-after-free in drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[  +0.000027] Read of size 8 at addr ffff8881b8605f88 by task amd_pci_unplug/2147

[  +0.000023] CPU: 6 PID: 2147 Comm: amd_pci_unplug Not tainted 6.10.0+ #1
[  +0.000016] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020
[  +0.000016] Call Trace:
[  +0.000008]  <TASK>
[  +0.000009]  dump_stack_lvl+0x76/0xa0
[  +0.000017]  print_report+0xce/0x5f0
[  +0.000017]  ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[  +0.000019]  ? srso_return_thunk+0x5/0x5f
[  +0.000015]  ? kasan_complete_mode_report_info+0x72/0x200
[  +0.000016]  ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[  +0.000019]  kasan_report+0xbe/0x110
[  +0.000015]  ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[  +0.000023]  __asan_report_load8_noabort+0x14/0x30
[  +0.000014]  drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[  +0.000020]  ? srso_return_thunk+0x5/0x5f
[  +0.000013]  ? __kasan_check_write+0x14/0x30
[  +0.000016]  ? __pfx_drm_sched_entity_flush+0x10/0x10 [gpu_sched]
[  +0.000020]  ? srso_return_thunk+0x5/0x5f
[  +0.000013]  ? __kasan_check_write+0x14/0x30
[  +0.000013]  ? srso_return_thunk+0x5/0x5f
[  +0.000013]  ? enable_work+0x124/0x220
[  +0.000015]  ? __pfx_enable_work+0x10/0x10
[  +0.000013]  ? srso_return_thunk+0x5/0x5f
[  +0.000014]  ? free_large_kmalloc+0x85/0xf0
[  +0.000016]  drm_sched_entity_destroy+0x18/0x30 [gpu_sched]
[  +0.000020]  amdgpu_vce_sw_fini+0x55/0x170 [amdgpu]
[  +0.000735]  ? __kasan_check_read+0x11/0x20
[  +0.000016]  vce_v4_0_sw_fini+0x80/0x110 [amdgpu]
[  +0.000726]  amdgpu_device_fini_sw+0x331/0xfc0 [amdgpu]
[  +0.000679]  ? mutex_unlock+0x80/0xe0
[  +0.000017]  ? __pfx_amdgpu_device_fini_sw+0x10/0x10 [amdgpu]
[  +0.000662]  ? srso_return_thunk+0x5/0x5f
[  +0.000014]  ? __kasan_check_write+0x14/0x30
[  +0.000013]  ? srso_return_thunk+0x5/0x5f
[  +0.000013]  ? mutex_unlock+0x80/0xe0
[  +0.000016]  amdgpu_driver_release_kms+0x16/0x80 [amdgpu]
[  +0.000663]  drm_minor_release+0xc9/0x140 [drm]
[  +0.000081]  drm_release+0x1fd/0x390 [drm]
[  +0.000082]  __fput+0x36c/0xad0
[  +0.000018]  __fput_sync+0x3c/0x50
[  +0.000014]  __x64_sys_close+0x7d/0xe0
[  +0.000014]  x64_sys_call+0x1bc6/0x2680
[  +0.000014]  do_syscall_64+0x70/0x130
[  +0.000014]  ? srso_return_thunk+0x5/0x5f
[  +0.000014]  ? irqentry_exit_to_user_mode+0x60/0x190
[  +0.000015]  ? srso_return_thunk+0x5/0x5f
[  +0.000014]  ? irqentry_exit+0x43/0x50
[  +0.000012]  ? srso_return_thunk+0x5/0x5f
[  +0.000013]  ? exc_page_fault+0x7c/0x110
[  +0.000015]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  +0.000014] RIP: 0033:0x7ffff7b14f67
[  +0.000013] Code: ff e8 0d 16 02 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 73 ba f7 ff
[  +0.000026] RSP: 002b:00007fffffffe378 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[  +0.000019] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffff7b14f67
[  +0.000014] RDX: 0000000000000000 RSI: 00007ffff7f6f47a RDI: 0000000000000003
[  +0.000014] RBP: 00007fffffffe3a0 R08: 0000555555569890 R09: 0000000000000000
[  +0.000014] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffffffe5c8
[  +0.000013] R13: 00005555555552a9 R14: 0000555555557d48 R15: 00007ffff7ffd040
[  +0.000020]  </TASK>

[  +0.000016] Allocated by task 383 on cpu 7 at 26.880319s:
[  +0.000014]  kasan_save_stack+0x28/0x60
[  +0.000008]  kasan_save_track+0x18/0x70
[  +0.000007]  kasan_save_alloc_info+0x38/0x60
[  +0.000007]  __kasan_kmalloc+0xc1/0xd0
[  +0.000007]  kmalloc_trace_noprof+0x180/0x380
[  +0.000007]  drm_sched_init+0x411/0xec0 [gpu_sched]
[  +0.000012]  amdgpu_device_init+0x695f/0xa610 [amdgpu]
[  +0.000658]  amdgpu_driver_load_kms+0x1a/0x120 [amdgpu]
[  +0.000662]  amdgpu_pci_probe+0x361/0xf30 [amdgpu]
[  +0.000651]  local_pci_probe+0xe7/0x1b0
[  +0.000009]  pci_device_probe+0x248/0x890
[  +0.000008]  really_probe+0x1fd/0x950
[  +0.000008]  __driver_probe_device+0x307/0x410
[  +0.000007]  driver_probe_device+0x4e/0x150
[  +0.000007]  __driver_attach+0x223/0x510
[  +0.000006]  bus_for_each_dev+0x102/0x1a0
[  +0.000007]  driver_attach+0x3d/0x60
[  +0.000006]  bus_add_driver+0x2ac/0x5f0
[  +0.000006]  driver_register+0x13d/0x490
[  +0.000008]  __pci_register_driver+0x1ee/0x2b0
[  +0.000007]  llc_sap_close+0xb0/0x160 [llc]
[  +0.000009]  do_one_initcall+0x9c/0x3e0
[  +0.000008]  do_init_module+0x241/0x760
[  +0.000008]  load_module+0x51ac/0x6c30
[  +0.000006]  __do_sys_init_module+0x234/0x270
[  +0.000007]  __x64_sys_init_module+0x73/0xc0
[  +0.000006]  x64_sys_call+0xe3/0x2680
[  +0.000006]  do_syscall_64+0x70/0x130
[  +0.000007]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

[  +0.000015] Freed by task 2147 on cpu 6 at 160.507651s:
[  +0.000013]  kasan_save_stack+0x28/0x60
[  +0.000007]  kasan_save_track+0x18/0x70
[  +0.000007]  kasan_save_free_info+0x3b/0x60
[  +0.000007]  poison_slab_object+0x115/0x1c0
[  +0.000007]  __kasan_slab_free+0x34/0x60
[  +0.000007]  kfree+0xfa/0x2f0
[  +0.000007]  drm_sched_fini+0x19d/0x410 [gpu_sched]
[  +0.000012]  amdgpu_fence_driver_sw_fini+0xc4/0x2f0 [amdgpu]
[  +0.000662]  amdgpu_device_fini_sw+0x77/0xfc0 [amdgpu]
[  +0.000653]  amdgpu_driver_release_kms+0x16/0x80 [amdgpu]
[  +0.000655]  drm_minor_release+0xc9/0x140 [drm]
[  +0.000071]  drm_release+0x1fd/0x390 [drm]
[  +0.000071]  __fput+0x36c/0xad0
[  +0.000008]  __fput_sync+0x3c/0x50
[  +0.000007]  __x64_sys_close+0x7d/0xe0
[  +0.000007]  x64_sys_call+0x1bc6/0x2680
[  +0.000007]  do_syscall_64+0x70/0x130
[  +0.000007]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

[  +0.000014] The buggy address belongs to the object at ffff8881b8605f80
               which belongs to the cache kmalloc-64 of size 64
[  +0.000020] The buggy address is located 8 bytes inside of
               freed 64-byte region [ffff8881b8605f80, ffff8881b8605fc0)

[  +0.000028] The buggy address belongs to the physical page:
[  +0.000011] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1b8605
[  +0.000008] anon flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[  +0.000007] page_type: 0xffffefff(slab)
[  +0.000009] raw: 0017ffffc0000000 ffff8881000428c0 0000000000000000 dead000000000001
[  +0.000006] raw: 0000000000000000 0000000000200020 00000001ffffefff 0000000000000000
[  +0.000006] page dumped because: kasan: bad access detected

[  +0.000012] Memory state around the buggy address:
[  +0.000011]  ffff8881b8605e80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[  +0.000015]  ffff8881b8605f00: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[  +0.000015] >ffff8881b8605f80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[  +0.000013]                       ^
[  +0.000011]  ffff8881b8606000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
[  +0.000014]  ffff8881b8606080: fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb fb
[  +0.000013] ==================================================================

The issue reproduced on VG20 during the IGT pci_unplug test.
The root cause of the issue is that the function drm_sched_fini is called before drm_sched_entity_kill.
In drm_sched_fini, the drm_sched_rq structure is freed, but this structure is later accessed by
each entity within the run queue, leading to invalid memory access.
To resolve this, the order of cleanup calls is updated:

    Before:
        amdgpu_fence_driver_sw_fini
        amdgpu_device_ip_fini

    After:
        amdgpu_device_ip_fini
        amdgpu_fence_driver_sw_fini

This updated order ensures that all entities in the IPs are cleaned up first, followed by proper
cleanup of the schedulers.

Additional Investigation:

During debugging, another issue was identified in the amdgpu_vce_sw_fini function. The vce.vcpu_bo
buffer must be freed only as the final step in the cleanup process to prevent any premature
access during earlier cleanup stages.

v2: Using Christian suggestion call drm_sched_entity_destroy before drm_sched_fini.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-11-21 15:56:22 -05:00
Xiang Liu
928cd772e1 drm/amdgpu/vcn: reset fw_shared when VCPU buffers corrupted on vcn v4.0.3
It is not necessarily corrupted. When there is RAS fatal error, device
memory access is blocked. Hence vcpu bo cannot be saved to system memory
as in a regular suspend sequence before going for reset. In other full
device reset cases, that gets saved and restored during resume.

v2: Remove redundant code like vcn_v4_0 did
v2: Refine commit message
v3: Drop the volatile
v3: Refine commit message

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-21 15:56:22 -05:00
Jesse.zhang@amd.com
2f1b13521d drm/amdgpu: Fix sysfs warning when hotplugging
Fix the similar warning when hotplugging:

[  155.585721] kernfs: can not remove 'enforce_isolation', no directory
[  155.592201] WARNING: CPU: 3 PID: 6960 at fs/kernfs/dir.c:1683 kernfs_remove_by_name_ns+0xb9/0xc0
[  155.601145] Modules linked in: xt_MASQUERADE xt_comment nft_compat veth bridge stp llc overlay nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr intel_rapl_msr amd_atl intel_rapl_common amd64_edac edac_mce_amd amdgpu kvm_amd kvm ipmi_ssif amdxcp rapl drm_exec gpu_sched drm_buddy i2c_algo_bit drm_suballoc_helper drm_ttm_helper ttm pcspkr drm_display_helper acpi_cpufreq drm_kms_helper video wmi k10temp i2c_piix4 acpi_ipmi ipmi_si drm zram ip_tables loop squashfs dm_multipath crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 sp5100_tco ixgbe rfkill ccp dca sunrpc be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf ipmi_msghandler fuse
[  155.685224] systemd-journald[1354]: Compressed data object 957 -> 524 using ZSTD
[  155.685687] CPU: 3 PID: 6960 Comm: amd_pci_unplug Not tainted 6.10.0-1148853.1.zuul.164395107d6642bdb451071313e9378d #1
[  155.704149] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS V1.03.B10 04/01/2019
[  155.712383] RIP: 0010:kernfs_remove_by_name_ns+0xb9/0xc0
[  155.717805] Code: a0 00 48 89 ef e8 37 96 c7 ff 5b b8 fe ff ff ff 5d 41 5c 41 5d e9 f7 96 a0 00 0f 0b eb ab 48 c7 c7 48 ba 7e 8f e8 f7 66 bf ff <0f> 0b eb dc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
[  155.736766] RSP: 0018:ffffb1685d7a3e20 EFLAGS: 00010296
[  155.742108] RAX: 0000000000000038 RBX: ffff929e94c80000 RCX: 0000000000000000
[  155.749363] RDX: ffff928e1efaf200 RSI: ffff928e1efa18c0 RDI: ffff928e1efa18c0
[  155.756612] RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000003
[  155.763855] R10: ffffb1685d7a3cd8 R11: ffffffff8fb3e1c8 R12: ffffffffc1ef5341
[  155.771104] R13: ffff929e94cc5530 R14: 0000000000000000 R15: 0000000000000000
[  155.778357] FS:  00007fd9dd8d9c40(0000) GS:ffff928e1ef80000(0000) knlGS:0000000000000000
[  155.786594] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  155.792450] CR2: 0000561245ceee38 CR3: 0000000113018000 CR4: 00000000003506f0
[  155.799702] Call Trace:
[  155.802254]  <TASK>
[  155.804460]  ? __warn+0x80/0x120
[  155.807798]  ? kernfs_remove_by_name_ns+0xb9/0xc0
[  155.812617]  ? report_bug+0x164/0x190
[  155.816393]  ? handle_bug+0x3c/0x80
[  155.819994]  ? exc_invalid_op+0x17/0x70
[  155.823939]  ? asm_exc_invalid_op+0x1a/0x20
[  155.828235]  ? kernfs_remove_by_name_ns+0xb9/0xc0
[  155.833058]  amdgpu_gfx_sysfs_fini+0x59/0xd0 [amdgpu]
[  155.838637]  gfx_v9_0_sw_fini+0x123/0x1c0 [amdgpu]
[  155.843887]  amdgpu_device_fini_sw+0xbc/0x3e0 [amdgpu]
[  155.849432]  amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
[  155.855235]  drm_dev_put.part.0+0x3c/0x60 [drm]
[  155.859914]  drm_release+0x8b/0xc0 [drm]
[  155.863978]  __fput+0xf1/0x2c0
[  155.867141]  __x64_sys_close+0x3c/0x80
[  155.870998]  do_syscall_64+0x64/0x170

V2: Add details in comments (Tim)

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reported-by: Andy Dong <andy.dong@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-21 15:56:22 -05:00
Jesse.zhang@amd.com
fb9898243a drm/amdgpu: Add sysfs interface for vcn reset mask
Add the sysfs interface for vcn:
vcn_reset_mask

The interface is read-only and show the resets supported by the IP.
For example, full adapter reset (mode1/mode2/BACO/etc),
soft reset, queue reset, and pipe reset.

V2: the sysfs node returns a text string instead of some flags (Christian)

V2: the sysfs node returns a text string instead of some flags (Christian)
v3: add a generic helper which takes the ring as parameter
    and print the strings in the order they are applied (Christian)

    check amdgpu_gpu_recovery  before creating sysfs file itself,
    and initialize supported_reset_types in IP version files (Lijo)
v4: s/sdma/vcn/ in the reset mask setup

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-21 15:56:22 -05:00
Alex Deucher
4c28e645aa drm/amdgpu/gmc7: fix wait_for_idle callers
The wait_for_idle signature was changed, but the callers
were not.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reported-by: Michel Dänzer <michel@daenzer.net>
Fixes: 82ae6619a4 ("drm/amdgpu: update the handle ptr in wait_for_idle")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Sunil Khatri <sunil.khatri@amd.com>
2024-11-21 15:56:22 -05:00
Lijo Lazar
da868898cf drm/amd/pm: Remove arcturus min power limit
As per power team, there is no need to impose a lower bound on arcturus
power limit. Any unreasonable limit set will result in frequent
throttling.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-11-21 15:56:06 -05:00
Kenneth Feng
76c7f08094 drm/amd/pm: skip setting the power source on smu v14.0.2/3
skip setting power source on smu v14.0.2/3

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.11.x
2024-11-21 15:55:58 -05:00
Kenneth Feng
b0df0e7778 drm/amd/pm: disable pcie speed switching on Intel platform for smu v14.0.2/3
disable pcie speed switching on Intel platform for smu v14.0.2/3
based on Intel's requirement.
v2: align the setting with smu v13.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.11.x
2024-11-21 15:55:40 -05:00
Lijo Lazar
cdc6705f98 drm/amdkfd: Use the correct wptr size
Write pointer could be 32-bit or 64-bit. Use the correct size during
initialization.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-11-21 15:55:20 -05:00
Matthew Brost
ed31ba0aa7 drm/xe: Mark preempt fence workqueue as reclaim
Preempt fences are in the path of reclaim, and we signal these fences in
the preempt workqueue. With that, we need to mark the preempt fence
workqueue with reclaim so that this workqueue can make forward progress
during reclaim.

Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241113171751.1677784-1-matthew.brost@intel.com
(cherry picked from commit 15cf53ece4)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-11-21 17:16:38 +01:00
Nirmoy Das
37a1cf288e drm/xe/ufence: Wake up waiters after setting ufence->signalled
If a previous ufence is not signalled, vm_bind will return -EBUSY.
Delaying the modification of ufence->signalled can cause issues if the
UMD reuses the same ufence so update ufence->signalled before waking up
waiters.

Cc: Matthew Brost <matthew.brost@intel.com>
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3233
Fixes: 977e5b82e0 ("drm/xe: Expose user fence from xe_sync_entry")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241114150537.4161573-1-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
(cherry picked from commit 553a5d14fc)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-11-21 17:16:30 +01:00
Linus Torvalds
14d0e1a09f Merge tag 'soc-drivers-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC driver updates from Arnd Bergmann:
 "Nothing particular important in the SoC driver updates, just the usual
  improvements to for drivers/soc and a couple of subsystems that don't
  fit anywhere else:

   - The largest set of updates is for Qualcomm SoC drivers, extending
     the set of supported features for additional SoCs in the QSEECOM,
     LLCC and socinfo drivers.a

   - The ti_sci firmware driver gains support for power managment

   - The drivers/reset subsystem sees a rework of the microchip sparx5
     and amlogic reset drivers to support additional chips, plus a few
     minor updates on other platforms

   - The SCMI firmware interface driver gains support for two protocol
     extensions, allowing more flexible use of the shared memory area
     and new DT binding properties for configurability.

   - Mediatek SoC drivers gain support for power managment on the MT8188
     SoC and a new driver for DVFS.

   - The AMD/Xilinx ZynqMP SoC drivers gain support for system reboot
     and a few bugfixes

   - The Hisilicon Kunpeng HCCS driver gains support for configuring
     lanes through sysfs

  Finally, there are cleanups and minor fixes for drivers/{soc, bus,
  memory}, including changing back the .remove_new callback to .remove,
  as well as a few other updates for freescale (powerpc) soc drivers,
  NXP i.MX soc drivers, cznic turris platform driver, memory controller
  drviers, TI OMAP SoC drivers, and Tegra firmware drivers"

* tag 'soc-drivers-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (116 commits)
  soc: fsl: cpm1: qmc: Set the ret error code on platform_get_irq() failure
  soc: fsl: rcpm: fix missing of_node_put() in copy_ippdexpcr1_setting()
  soc: fsl: cpm1: tsa: switch to for_each_available_child_of_node_scoped()
  platform: cznic: turris-omnia-mcu: Rename variable holding GPIO line names
  platform: cznic: turris-omnia-mcu: Document the driver private data structure
  firmware: turris-mox-rwtm: Document the driver private data structure
  bus: Switch back to struct platform_driver::remove()
  soc: qcom: ice: Remove the device_link field in qcom_ice
  drm/msm/adreno: Setup SMMU aparture for per-process page table
  firmware: qcom: scm: Introduce CP_SMMU_APERTURE_ID
  firmware: arm_scpi: Check the DVFS OPP count returned by the firmware
  soc: qcom: socinfo: add IPQ5424/IPQ5404 SoC ID
  dt-bindings: arm: qcom,ids: add SoC ID for IPQ5424/IPQ5404
  soc: qcom: llcc: Flip the manual slice configuration condition
  dt-bindings: firmware: qcom,scm: Document sm8750 SCM
  firmware: qcom: uefisecapp: Allow X1E Devkit devices
  misc: lan966x_pci: Fix dtc warn 'Missing interrupt-parent'
  misc: lan966x_pci: Fix dtc warns 'missing or empty reg/ranges property'
  soc: qcom: llcc: Add LLCC configuration for the QCS8300 platform
  dt-bindings: cache: qcom,llcc: Document the QCS8300 LLCC
  ...
2024-11-20 15:40:54 -08:00
Linus Torvalds
79caa6c88a Merge tag 'asm-generic-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
 "These are a number of unrelated cleanups, generally simplifying the
  architecture specific header files:

   - A series from Al Viro simplifies asm/vga.h, after it turns out that
     most of it can be generalized.

   - A series from Julian Vetter adds a common version of
     memcpy_{to,from}io() and memset_io() and changes most architectures
     to use that instead of their own implementation

   - A series from Niklas Schnelle concludes his work to make PC style
     inb()/outb() optional

   - Nicolas Pitre contributes improvements for the generic do_div()
     helper

   - Christoph Hellwig adds a generic version of page_to_phys() and
     phys_to_page(), replacing the slightly different architecture
     specific definitions.

   - Uwe Kleine-Koenig has a minor cleanup for ioctl definitions"

* tag 'asm-generic-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (24 commits)
  empty include/asm-generic/vga.h
  sparc: get rid of asm/vga.h
  asm/vga.h: don't bother with scr_mem{cpy,move}v() unless we need to
  vt_buffer.h: get rid of dead code in default scr_...() instances
  tty: serial: export serial_8250_warn_need_ioport
  lib/iomem_copy: fix kerneldoc format style
  hexagon: simplify asm/io.h for !HAS_IOPORT
  loongarch: Use new fallback IO memcpy/memset
  csky: Use new fallback IO memcpy/memset
  arm64: Use new fallback IO memcpy/memset
  New implementation for IO memcpy and IO memset
  watchdog: Add HAS_IOPORT dependency for SBC8360 and SBC7240
  __arch_xprod64(): make __always_inline when optimizing for performance
  ARM: div64: improve __arch_xprod_64()
  asm-generic/div64: optimize/simplify __div64_const32()
  lib/math/test_div64: add some edge cases relevant to __div64_const32()
  asm-generic: add an optional pfn_valid check to page_to_phys
  asm-generic: provide generic page_to_phys and phys_to_page implementations
  asm-generic/io.h: Remove I/O port accessors for HAS_IOPORT=n
  tty: serial: handle HAS_IOPORT dependencies
  ...
2024-11-20 15:13:02 -08:00
Linus Torvalds
e6de688e93 Merge tag 'devicetree-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
 "Bindings:

   - Enable dtc "interrupt_provider" warnings for binding examples. Fix
     the warnings in fsl,mu-msi and ti,sci-inta due to this.

   - Convert zii,rave-sp-wdt, zii,rave-sp-pwrbutton, and
     altr,fpga-passive-serial to DT schema format

   - Add some documentation on the different forms of YAML text blocks
     which are a constant source of review comments

   - Fix some schema errors in constraints for arrays

   - Add compatibles for qcom,sar2130p-pdc and onnn,adt7462

  DT core:

   - Allow overlay kunit tests to run CONFIG_OF_OVERLAY=n

   - Add some warnings on deprecated address handling

   - Rework early_init_dt_scan() so the arch can pass in the phys
     address of the DTB as __pa() is not always valid to use. This fixes
     a warning for arm64 with kexec.

   - Add and use some new DT graph iterators for iterating over ports
     and endpoints

   - Rework reserved-memory handling to be sized dynamically for fixed
     regions

   - Optimize of_modalias() to avoid a strlen() call

   - Constify struct device_node and property pointers where ever
     possible"

* tag 'devicetree-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (36 commits)
  of: Allow overlay kunit tests to run CONFIG_OF_OVERLAY=n
  dt-bindings: interrupt-controller: qcom,pdc: Add SAR2130P compatible
  of/address: Rework bus matching to avoid warnings
  of: WARN on deprecated #address-cells/#size-cells handling
  of/fdt: Don't use default address cell sizes for address translation
  dt-bindings: Enable dtc "interrupt_provider" warnings
  of/fdt: add dt_phys arg to early_init_dt_scan and early_init_dt_verify
  dt-bindings: cache: qcom,llcc: Fix X1E80100 reg entries
  dt-bindings: watchdog: convert zii,rave-sp-wdt.txt to yaml format
  dt-bindings: input: convert zii,rave-sp-pwrbutton.txt to yaml
  media: xilinx-tpg: use new of_graph functions
  fbdev: omapfb: use new of_graph functions
  gpu: drm: omapdrm: use new of_graph functions
  ASoC: audio-graph-card2: use new of_graph functions
  ASoC: audio-graph-card: use new of_graph functions
  ASoC: test-component: use new of_graph functions
  of: property: use new of_graph functions
  of: property: add of_graph_get_next_port_endpoint()
  of: property: add of_graph_get_next_port()
  of: module: remove strlen() call in of_modalias()
  ...
2024-11-20 13:19:25 -08:00
Linus Torvalds
75f2b37dd0 Merge tag 'pmdomain-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain updates from Ulf Hansson:
"pmdomain core:
   - Set the required dev for a required OPP during genpd attach
   - Add support for required OPPs to dev_pm_domain_attach_list()

  pmdomain providers:
   - ti: Enable GENPD_FLAG_ACTIVE_WAKEUP flag for ti_sci PM domains
   - mediatek: Add support for MT6735 PM domains
   - mediatek: Use OF-specific regulator API to get power domain supply
   - qcom: Add support for the SM8750/SAR2130P/qcs615/qcs8300 rpmhpds

  pmdomain consumers:
   - Convert a couple of consumer drivers to
     *_pm_domain_attach|detach_list()

  opp core:
   - Rework and cleanup some code that manages required OPPs
   - Remove *_opp_attach|detach_genpd()"

* tag 'pmdomain-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: (25 commits)
  pmdomain: qcom: rpmhpd: Add rpmhpd support for SM8750
  dt-bindings: power: qcom,rpmpd: document the SM8750 RPMh Power Domains
  pmdomain: imx: Use of_property_present() for non-boolean properties
  pmdomain: imx: gpcv2: replace dev_err() with dev_err_probe()
  pmdomain: ti-sci: Use scope based of_node_put() to simplify code.
  pmdomain: ti-sci: Add missing of_node_put() for args.np
  pmdomain: ti-sci: set the GENPD_FLAG_ACTIVE_WAKEUP flag for all PM domains
  pmdomain: mediatek: Add support for MT6735
  pmdomain: qcom: rpmhpd: add support for SAR2130P
  dt-bindings: power: Add binding for MediaTek MT6735 power controller
  dt-bindings: power: rpmpd: Add SAR2130P compatible
  OPP: Drop redundant *_opp_attach|detach_genpd()
  cpufreq: qcom-nvmem: Convert to dev_pm_domain_attach|detach_list()
  media: venus: Convert into devm_pm_domain_attach_list() for OPP PM domain
  drm/tegra: gr3d: Convert into devm_pm_domain_attach_list()
  OPP: Drop redundant code in _link_required_opps()
  pmdomain: core: Set the required dev for a required OPP during genpd attach
  pmdomain: core: Manage the default required OPP from a separate function
  PM: domains: Support required OPPs in dev_pm_domain_attach_list()
  OPP: Rework _set_required_devs() to manage a single device per call
  ...
2024-11-20 12:44:59 -08:00
Huacai Chen
4217ef9ab7 drm/amd/display: Allow building DC with clang on LoongArch
Clang on LoongArch (18+) appears to be unaffected by the bug causing
excessive stack usage in calculate_bandwidth(). But when building DC_FP
support the stack frame size can be as large as 2816 bytes, which causes
the FRAME_WARN build warnings. So on LoongArch we allow building DC with
clang, but disable DC_FP by default.

The help message is also updated.

Tested-by: Rui Wang <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-20 10:03:06 -05:00
Zicheng Qu
2bc96c9507 drm/amd/display: Fix null check for pipe_ctx->plane_state in hwss_setup_dpp
This commit addresses a null pointer dereference issue in
hwss_setup_dpp(). The issue could occur when pipe_ctx->plane_state is
null. The fix adds a check to ensure `pipe_ctx->plane_state` is not null
before accessing. This prevents a null pointer dereference.

Fixes: 0baae62463 ("drm/amd/display: Refactor fast update to use new HWSS build sequence")
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-20 10:03:06 -05:00
Zicheng Qu
6a057072dd drm/amd/display: Fix null check for pipe_ctx->plane_state in dcn20_program_pipe
This commit addresses a null pointer dereference issue in
dcn20_program_pipe(). Previously, commit 8e4ed3cf16 ("drm/amd/display:
Add null check for pipe_ctx->plane_state in dcn20_program_pipe")
partially fixed the null pointer dereference issue. However, in
dcn20_update_dchubp_dpp(), the variable pipe_ctx is passed in, and
plane_state is accessed again through pipe_ctx. Multiple if statements
directly call attributes of plane_state, leading to potential null
pointer dereference issues. This patch adds necessary null checks to
ensure stability.

Fixes: 8e4ed3cf16 ("drm/amd/display: Add null check for pipe_ctx->plane_state in dcn20_program_pipe")
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-20 10:03:05 -05:00
Steven 'Steve' Kendall
7037bb0426 drm/radeon: Fix spurious unplug event on radeon HDMI
On several HP models (tested on HP 3125 and HP Probook 455 G2),
spurious unplug events are emitted upon login on Chrome OS.
This is likely due to the way Chrome OS restarts graphics
upon login, so it's possible it's an issue on other
distributions but not as common, though I haven't
reproduced the issue elsewhere.
Use logic from an earlier version of the merged change
(see link below) which iterates over connectors and finds
matching encoders, rather than the other way around.
Also fixes an issue with screen mirroring on Chrome OS.
I've deployed this patch on Fedora and did not observe
any regression on these devices.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1569#note_1603002
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3771
Fixes: 20ea34710f ("drm/radeon: Add HD-audio component notifier support (v6)")
Signed-off-by: Steven 'Steve' Kendall <skend@chromium.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-20 10:03:05 -05:00