This tag contains habanalabs driver changes for v6.18.
It continues the previous upstream work from tags/drm-habanalabs-next-2024-06-23,
Including improvements in debug and visibility, alongside general code cleanups,
and new features such as vmalloc-backed coherent mmap, HLDIO infrastructure, etc.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: "Elbaz, Koby" <koby.elbaz@intel.com>
Link: https://lore.kernel.org/r/da02d370-9967-49d2-9eef-7aeaa40c987c@intel.com
The dce100_validate_global function was verbatim exactly the
same as dce60_validate_global and dce80_validate_global.
Share dce100_validate_global between DCE6-10 to save code size.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
DCE6-8 have very similar capabilities to DCE10, they support the
same DP and HDMI versions and work similarly.
Share dce100_validate_bandwidth between DCE6-10 to reduce code
duplication in the DC driver.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit fixes a potential race condition in the userqueue fence
signaling mechanism by replacing dma_fence_is_signaled_locked() with
dma_fence_is_signaled().
The issue occurred because:
1. dma_fence_is_signaled_locked() should only be used when holding
the fence's individual lock, not just the fence list lock
2. Using the locked variant without the proper fence lock could lead
to double-signaling scenarios:
- Hardware completion signals the fence
- Software path also tries to signal the same fence
By using dma_fence_is_signaled() instead, we properly handle the
locking hierarchy and avoid the race condition while still maintaining
the necessary synchronization through the fence_list_lock.
v2: drop the comment (Christian)
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reject modes with a pixel clock higher than the maximum display
clock. Use 400 MHz as a fallback value when the maximum display
clock is not known. Pixel clocks that are higher than the display
clock just won't work and are not supported.
With the addition of the YUV422 fallback, DC can now accidentally
select a mode requiring higher pixel clock than actually supported
when the DP version supports the required bandwidth but the clock
is otherwise too high for the display engine. DCE 6-10 don't
support these modes but they don't have a bandwidth calculation
to reject them properly.
Fixes: db291ed173 ("drm/amd/display: Add fallback path for YCBCR422")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
The main reason common modes are added is for compatibility with
clone mode when a laptop is connected to a projector or external
monitor. Since commit 978fa2f6d0 ("drm/amd/display: Use scaling
for non-native resolutions on eDP") when non-native modes are picked
for eDP the GPU scalar will be used. This is because it is inconsistent
whether eDP panels have the capability to actually drive non-native
resolutions. With panels connected to other connectors this limitation
generally doesn't exist as we the EDID will advertise support for a
number of resolutions and monitors will use built in scaling hardware.
Comparing DC and non-DC code paths the non-DC code path only adds
common modes for LVDS and eDP whereas the DC codepath does it for
all connector types.
In the past there was an experiment done to disable common mode adding
for eDP and LVDS from commit 6d396e7ac1 ("drm/amd/display: Disable
common modes for LVDS") and commit 7948afb46a ("drm/amd/display:
Disable common modes for eDP") but this was reverted in
commit a8b79b0918 ("drm/amd: Re-enable common modes for eDP and
LVDS") because it caused problems with Xorg.
[How]
Only add common modes for eDP and LVDS for DC, matching the behavior
of non-DC.
Suggested-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250924161624.1975819-2-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Variable "i" has been redeclared as integer later in the function
which is wrong and not serving any purpose.
Fixes: 899fbde146 ("drm/amdgpu: replace get_user_pages with HMM mirror helpers")
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit e44a0fe630.
Initially we used VMID reservation to enforce isolation between
processes. That has now been replaced by proper fence handling.
Both OpenGL, RADV and ROCm developers requested a way to reserve a VMID
for SPM, so restore that approach by reverting back to only allowing a
single process to use the reserved VMID.
Only compile tested for now.
v2: use -ENOENT instead of -EINVAL if VMID is not available
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a fallback mechanism to attempt pipe reset when KCQ reset
fails to recover the ring. After performing the KCQ reset and
queue remapping, test the ring functionality. If the ring test
fails, initiate a pipe reset as an additional recovery step.
v2: fix the typo (Lijo)
v3: try pipeline reset when kiq mapping fails (Lijo)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On HL338 ASICs, the Infineon first‑stage firmware is not present and
the reported version is 0. In this case printing a version number is
misleading, as it suggests valid firmware when it does not exist.
Fix this by printing the first‑stage Infineon firmware version only
if the reported value is non‑zero. This avoids confusing or incorrect
log messages on devices where the first stage is not applicable.
Signed-off-by: Pavan S <pavan.sreenivas@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
Dirty state can occur when the host VM undergoes a reset while the
device does not. In such a case, the driver must reset the device before
it can be used again. As part of this reset, the device capabilities
are zeroed. Therefore, the driver must read the Preboot status again to
learn the Preboot state, capabilities, and security configuration.
Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
Add a new passthrough type HL_GET_P_STATE to the cpucp generic ioctl
to allow userspace to read the device performance state via firmware.
Signed-off-by: Ariel Aviad <ariel.aviad@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
Introduce NVMe Direct I/O (HLDIO) infrastructure to support
peer‑to‑peer DMA in the habanalabs driver. This adds internal helpers
and data structures to enable direct transfers between NVMe storage
and device memory.
The feature is built only when CONFIG_HL_HLDIO is enabled. A debugfs
interface is also provided for functional validation.
Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com>
Reviewed-by: Farah Kassabri <farah.kassabri@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
When IOMMU is enabled, dma_alloc_coherent() with GFP_USER may return
addresses from the vmalloc range. If such an address is mapped without
VM_MIXEDMAP, vm_insert_page() will trigger a BUG_ON due to the
VM_PFNMAP restriction.
Fix this by checking for vmalloc addresses and setting VM_MIXEDMAP
in the VMA before mapping. This ensures safe mapping and avoids kernel
crashes. The memory is still driver-allocated and cannot be accessed
directly by userspace.
Signed-off-by: Moti Haimovski <moti.haimovski@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
The access_ok() API no longer requires the VERIFY_WRITE argument,
and the use of the old interface with VERIFY_WRITE is deprecated.
Clean up the habanalabs memory manager to use the modern access_ok()
interface consistently. This removes old #ifdef guards and aligns the
driver with current upstream kernel APIs.
Signed-off-by: Ilia Levi <ilia.levi@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
After CPLD shutdown event the device is not usable anymore. The common
CPLD_SHUTDOWN event handler disables any subsequent device access.
Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
After a CPLD shutdown event the device becomes unusable. Prevent further
device access once this event is received.
Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
In hl_release_dmabuf(), ctx is dereferenced after calling hl_ctx_put()
to obtain the compute device file.
This is safe because the dma-buf object holds a file reference taken in
export_dmabuf(), and the file release (which drops another ctx reference)
can only happen after we drop that file reference via fput(). Thus, this
hl_ctx_put() call cannot be the last one at this point.
Add a comment explaining this to avoid confusion.
Signed-off-by: Tomer Tayar <tomer.tayar@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
Add infrastructure for logging the last configuration register accesses
that occur via debugfs read/write operations. At interrupt time, these
log entries can be dumped to dmesg, which helps in diagnosing the cause
of RAZWI and ADDR_DEC interrupts.
The logging is implemented as a ring buffer of access entries, with each
entry recording timestamp and access details. To ensure correctness
under concurrent access, operations are now protected using spinlocks.
Entries are copied under lock and then printed after releasing it, which
minimizes time spent in the critical section.
Signed-off-by: Sharley Calzolari <sharley.calzolari@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
Change the BMON_CR register value back to its original state before
enabling, so that BMON does not continue to collect information
after being disabled.
Signed-off-by: Vered Yavniely <vered.yavniely@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
EFAULT is currently returned if less than requested user pages are
pinned. This value means a "bad address" which might be confusing to
the user, as the address of the given user memory is not necessarily
"bad".
Modify the return value to ENOMEM, as "out of memory" is more suitable
in this case.
Signed-off-by: Tomer Tayar <tomer.tayar@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
This commit adds support for VCN reset functionality in SMU v13.0.12 by:
1. Adding two new PPSMC messages in smu_v13_0_12_ppsmc.h:
- PPSMC_MSG_ResetVCN (0x5E)
- Updates PPSMC_Message_Count to 0x5F to account for new messages
2. Adding message mapping for ResetVCN in smu_v13_0_12_ppt.c:
- Maps SMU_MSG_ResetVCN to PPSMC_MSG_ResetVCN
These changes enable proper VCN reset handling through the SMU firmware
interface for compatible AMD GPUs.
v2: Added fw version check to support vcn queue reset.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch moves the initialization of the VCN supported_reset mask from
sw_init to a new late_init function for VCN 5.0.1. The change ensures
that all necessary hardware and firmware initialization is complete
before determining the supported reset types.
Key changes:
- Added vcn_v5_0_1_late_init() function to handle late initialization
- Moved supported_reset mask setup from sw_init to late_init
- Added check for per-queue reset support via amdgpu_dpm_reset_vcn_is_supported()
- Updated ip_funcs to use the new late_init function
This change helps ensure proper reset behavior by waiting until all
dependencies are initialized before determining available reset types.
Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Ruili Ji <ruiliji2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Implement the ring reset callback for VCN v5.0.1 to properly handle
hardware recovery when encountering GPU hangs. The new functionality:
1. Adds vcn_v5_0_1_ring_reset() function that:
- Prepares for reset using amdgpu_ring_reset_helper_begin()
- Performs VCN instance reset via amdgpu_dpm_reset_vcn()
- Re-initializes hardware through vcn_v5_0_1_hw_init_inst()
- Restarts DPG mode with vcn_v5_0_1_start_dpg_mode()
- Completes reset with amdgpu_ring_reset_helper_end()
2. Hooks the reset function into the unified ring functions via:
- Adding .reset = vcn_v5_0_1_ring_reset to vcn_v5_0_1_unified_ring_vm_funcs
3. Maintains existing behavior for SR-IOV VF cases by checking RRMT status
This provides proper hardware recovery capabilities for VCN 5.0.1 IP block
during fault conditions, matching functionality available in other VCN versions.
v2: Remove the RRMT_ENABLED cap setting in the reset function
and replace adev->vcn.inst[ring->me].indirect_sram with vinst->indirect_sram (Lijo)
Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Ruili Ji <ruiliji2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Split the per-instance initialization code from vcn_v5_0_1_hw_init()
into a new vcn_v5_0_1_hw_init_inst() function. This improves code
organization by:
1. Separating the instance-specific initialization logic
2. Making the main init function more readable
3. Following the pattern used in queue reset
The SR-IOV specific initialization remains in the main function since
it has different requirements.
Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Ruili Ji <ruiliji2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Remove this flag as the driver stopped managing it individually since
commit a4056c2a63 ("drm/amd/display: use HW hdr mult for brightness
boost"). After some back and forth it was reintroduced as a condition to
`set_output_transfer_func()` in [1]. Without direct management, this
flag only changes value when all surface update flags are set true on
UPDATE_TYPE_FULL with no output TF status meaning.
Fixes: bb622e0c00 ("drm/amd/display: program output tf when required") [1]
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Replace the previous O(N^2) implementation of remove_duplicates() with
a O(N) version using a fast/slow pointer approach. The new version
keeps only the first occurrence of each element and compacts the array
in place, improving efficiency without changing functionality.
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Don't update DC stream color components during atomic check. The driver
will continue validating the new CRTC color state but will not change DC
stream color components. The DC stream color state will only be
programmed at commit time in the `atomic_setup_commit` stage.
It fixes gamma LUT loss reported by KDE users when changing brightness
quickly or changing Display settings (such as overscan) with nightlight
on and HDR. As KWin can do a test commit with color settings different
from those that should be applied in a non-test-only commit, if the
driver changes DC stream color state in atomic check, this state can be
eventually HW programmed in commit tail, instead of the respective state
set by the non-blocking commit.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4444
Reported-by: Xaver Hugl <xaver.hugl@gmail.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Documentation/process/deprecated.rst recommends against the use of
kmalloc with dynamic size calculations due to the risk of overflow and
smaller allocation being made than the caller was expecting.
Replace kmalloc() with kmalloc_array() in amdgpu_amdkfd_gfx_v10.c,
amdgpu_amdkfd_gfx_v10_3.c, amdgpu_amdkfd_gfx_v11.c and
amdgpu_amdkfd_gfx_v12.c to make the intended allocation size clearer
and avoid potential overflow issues.
Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rahul Kumar <rk0006818@gmail.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>