linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-03-21 23:16:50 +08:00

Author	SHA1	Message	Date
Matthew Brost	01f2557aa6	drm/xe: Open-code GGTT MMIO access protection GGTT MMIO access is currently protected by hotplug (drm_dev_enter), which works correctly when the driver loads successfully and is later unbound or unloaded. However, if driver load fails, this protection is insufficient because drm_dev_unplug() is never called. Additionally, devm release functions cannot guarantee that all BOs with GGTT mappings are destroyed before the GGTT MMIO region is removed, as some BOs may be freed asynchronously by worker threads. To address this, introduce an open-coded flag, protected by the GGTT lock, that guards GGTT MMIO access. The flag is cleared during the dev_fini_ggtt devm release function to ensure MMIO access is disabled once teardown begins. Cc: stable@vger.kernel.org Fixes: `919bb54e98` ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node") Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-8-zhanjun.dong@intel.com (cherry picked from commit 4f3a998a173b4325c2efd90bdadc6ccd3ad9a431) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2026-03-19 17:13:54 +01:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Matt Roper	8367585154	drm/xe: Cleanup unused header includes clangd reports many "unused header" warnings throughout the Xe driver. Start working to clean this up by removing unnecessary includes in our .c files and/or replacing them with explicit includes of other headers that were previously being included indirectly. By far the most common offender here was unnecessary inclusion of xe_gt.h. That likely originates from the early days of xe.ko when xe_mmio did not exist and all register accesses, including those unrelated to GTs, were done with GT functions. There's still a lot of additional #include cleanup that can be done in the headers themselves; that will come as a followup series. v2: - Squash the 79-patch series down to a single patch. (MattB) Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260115032803.4067824-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-01-15 07:05:04 -08:00
Maarten Lankhorst	987167b119	drm/xe: Privatize xe_ggtt_node Nothing requires it any more, make the member private. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260108101014.579906-16-dev@lankhorst.se	2026-01-12 16:28:48 +01:00
Maarten Lankhorst	8d88aa149a	drm/xe: Improve xe_gt_sriov_pf_config GGTT handling Do not directly dereference xe_ggtt_node, and add a function to retrieve the allocated GGTT size. Reviewed-by: Matthew.brost@intel.com Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260108101014.579906-15-dev@lankhorst.se	2026-01-12 16:28:48 +01:00
Maarten Lankhorst	c818b26515	drm/xe: Add xe_ggtt_node_addr() to avoid dereferencing xe_ggtt_node This function makes it possible to add an offset that is applied to all xe_ggtt_node's, and hides the internals from all its users. Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260108101014.579906-12-dev@lankhorst.se	2026-01-12 16:28:34 +01:00
Maarten Lankhorst	004311aa7d	drm/xe: Convert xe_fb_pin to use a callback for insertion into GGTT The rotation details belong in xe_fb_pin.c, while the operations involving GGTT belong to xe_ggtt.c. As directly locking xe_ggtt etc results in exposing all of xe_ggtt details anyway, create a special function that allocates a ggtt_node, and allow display to populate it using a callback as a compromise. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260108101014.579906-11-dev@lankhorst.se	2026-01-12 16:28:25 +01:00
Maarten Lankhorst	22437f30d2	drm/xe: Start using ggtt->start in preparation of balloon removal Instead of having ggtt->size point to the end of ggtt, have ggtt->size be the actual size of the GGTT, and introduce ggtt->start to point to the beginning of GGTT. This will allow a massive cleanup of GGTT in case of SRIOV-VF. Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260108101014.579906-10-dev@lankhorst.se	2026-01-12 16:27:33 +01:00
Marco Crivellari	aa39abc08e	drm/xe: fix WQ_MEM_RECLAIM passed as max_active to alloc_workqueue() Workqueue xe-ggtt-wq has been allocated using WQ_MEM_RECLAIM, but the flag has been passed as 3rd parameter (max_active) instead of 2nd (flags) creating the workqueue as per-cpu with max_active = 8 (the WQ_MEM_RECLAIM value). So change this by set WQ_MEM_RECLAIM as the 2nd parameter with a default max_active. Fixes: `60df57e496` ("drm/xe: Mark GGTT work queue with WQ_MEM_RECLAIM") Cc: stable@vger.kernel.org Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260108180148.423062-1-marco.crivellari@suse.com	2026-01-08 14:33:29 -08:00
Matt Roper	8a579f4b24	drm/xe/ggtt: Use scope-based runtime pm Switch the GGTT code to scope-based runtime PM for consistency with other parts of the driver. Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patch.msgid.link/20251118164338.3572146-51-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-11-19 11:58:58 -08:00
Michał Winiarski	624ba6bfed	drm/xe/pf: Add helpers for VF GGTT migration data handling In an upcoming change, the VF GGTT migration data will be handled as part of VF control state machine. Add the necessary helpers to allow the migration data transfer to/from the HW GGTT resource. Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251112132220.516975-18-michal.winiarski@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>	2025-11-13 11:48:20 +01:00
Matthew Brost	1f1314e8e7	drm/xe: Check return value of GGTT workqueue allocation Workqueue allocation can fail, so check the return value of the GGTT workqueue allocation and fail driver initialization if the allocation fails. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20251022005538.828980-2-matthew.brost@intel.com	2025-10-22 09:10:05 -07:00
Matt Roper	d0ff153cca	drm/xe: Check for primary GT before looking up Wa_22019338487 If the primary GT is disabled via configfs, we need to make sure that we don't search for this workaround on a NULL xe_gt pointer. Since we can disable the primary GT only on igpu platforms, the media GT is the one we'd want to check anyway for this workaround. The ternary operators in ggtt_update_access_counter() were getting a bit long/complicated, so rewrite them with regular if/else statements. While we're at it, throw in a couple extra assertions to make sure that we're truly picking the expected GT according to igpu/dgpu type. v2: - Adjust indentation/wrapping; it's easier to read this with longer, unwrapped lines. (Lucas) - Tweak wording of commit message to remove ambiguity. (Gustavo) Cc: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-37-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-10-14 07:44:58 -07:00
Matthew Brost	cc9b24c6bb	drm/xe: Move GGTT lock init to alloc The GGTT lock is needed very early during GT initialization for a VF; move the GGTT lock initialization to the allocation phase. v8: - Rework function structure (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-13-matthew.brost@intel.com	2025-10-09 03:22:30 -07:00
Michal Wajdeczko	01ecf00463	drm/xe: Use tile-oriented messages in GGTT code Use recently added macros to print tile-oriented messages. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250909165941.31730-6-michal.wajdeczko@intel.com	2025-09-12 12:23:59 +02:00
Thomas Hellström	0131514f97	drm/xe: Pass down drm_exec context to validation We want all validation (potential backing store allocation) to be part of a drm_exec transaction. Therefore add a drm_exec pointer argument to xe_bo_validate() and ___xe_bo_create_locked(). Upcoming patches will deal with making all (or nearly all) calls to these functions part of a drm_exec transaction. In the meantime, define special values of the drm_exec pointer: XE_VALIDATION_UNIMPLEMENTED: Implementation of the drm_exec transaction has not been done yet. XE_VALIDATION_UNSUPPORTED: Some Middle-layers (dma-buf) doesn't allow the drm_exec context to be passed down to map_attachment where validation takes place. XE_VALIDATION_OPT_OUT: May be used only for kunit tests where exhaustive eviction isn't crucial and the ROI of converting those is very small. For XE_VALIDATION_UNIMPLEMENTED and XE_VALIDATION_OPT_OUT there is also a lockdep check that a drm_exec transaction can indeed start at the location where the macro is expanded. This is to encourage developers to take this into consideration early in the code development process. v2: - Fix xe_vm_set_validation_exec() imbalance. Add an assert that hopefully catches future instances of this (Matt Brost) v3: - Extend to psmi_alloc_object Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v3 Link: https://lore.kernel.org/r/20250908101246.65025-2-thomas.hellstrom@linux.intel.com	2025-09-10 09:15:52 +02:00
Matthew Brost	15366239e2	drm/xe: Decouple TLB invalidations from GT Decouple TLB invalidations from the GT by updating the TLB invalidation layer to accept a `struct xe_tlb_inval` instead of a `struct xe_gt`. Also, rename gt_tlb to tlb. The internals of the TLB invalidation code still operate on a GT, but this is now hidden from the rest of the driver. Signed-off-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250826182911.392550-7-stuart.summers@intel.com	2025-08-27 11:49:18 -07:00
Matthew Brost	c697ddcf27	drm/xe: s/tlb_invalidation/tlb_inval tlb_invalidation is a bit verbose leading to ugly wraps in the code, shorten to tlb_inval. Signed-off-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250826182911.392550-4-stuart.summers@intel.com	2025-08-27 11:49:00 -07:00
Matt Atwood	4d5c98eb77	drm/xe: rename XE_WA to XE_GT_WA Now that there are two types of wa tables and infrastructure, be more concise in the naming of GT wa macros. v2: update the documentation Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250807214224.32728-1-matthew.s.atwood@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-08 10:50:45 -04:00
Michal Wajdeczko	538b27a09a	drm/xe: Make GGTT TLB invalidation failure message GT oriented GGTT TLB invalidation is performed on the specific GT, thus any failure message shall be also GT specific. And to help investigate any unexpected failures, promote message from warn level to WARN to get full call stack of this unlikely case. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250723133015.206601-1-michal.wajdeczko@intel.com	2025-07-24 14:44:31 +02:00
Matthew Brost	ec9223b49a	drm/xe: Drop bo->size bo->size is redundant because the base GEM object already has a size field with the same value. Drop bo->size and use the base GEM object’s size instead. While at it, introduce xe_bo_size() to abstract the BO size. v2: - Fix typo in kernel doc (Ashutosh) - Fix kunit (CI) - Fix line wrap (Checkpatch) v3: - Fix sriov build (CI) v4: - Fix display build (CI) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://lore.kernel.org/r/20250625144128.2827577-1-matthew.brost@intel.com	2025-06-27 14:52:31 -07:00
Michal Wajdeczko	89d2835c36	drm/xe: Process deferred GGTT node removals on device unwind While we are indirectly draining our dedicated workqueue ggtt->wq that we use to complete asynchronous removal of some GGTT nodes, this happends as part of the managed-drm unwinding (ggtt_fini_early), which could be later then manage-device unwinding, where we could already unmap our MMIO/GMS mapping (mmio_fini). This was recently observed during unsuccessful VF initialization: [ ] xe 0000:00:02.1: probe with driver xe failed with error -62 [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747340 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747540 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747240 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747040 tiles_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e746840 mmio_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747f40 xe_bo_pinned_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e746b40 devm_drm_dev_init_release (16 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] drmres release begin [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef81640 __fini_relay (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80d40 guc_ct_fini (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80040 __drmm_mutex_release (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80140 ggtt_fini_early (8 bytes) and this was leading to: [ ] BUG: unable to handle page fault for address: ffffc900058162a0 [ ] #PF: supervisor write access in kernel mode [ ] #PF: error_code(0x0002) - not-present page [ ] Oops: Oops: 0002 [#1] SMP NOPTI [ ] Tainted: [W]=WARN [ ] Workqueue: xe-ggtt-wq ggtt_node_remove_work_func [xe] [ ] RIP: 0010:xe_ggtt_set_pte+0x6d/0x350 [xe] [ ] Call Trace: [ ] <TASK> [ ] xe_ggtt_clear+0xb0/0x270 [xe] [ ] ggtt_node_remove+0xbb/0x120 [xe] [ ] ggtt_node_remove_work_func+0x30/0x50 [xe] [ ] process_one_work+0x22b/0x6f0 [ ] worker_thread+0x1e8/0x3d Add managed-device action that will explicitly drain the workqueue with all pending node removals prior to releasing MMIO/GSM mapping. Fixes: `919bb54e98` ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250612220937.857-2-michal.wajdeczko@intel.com	2025-06-24 22:27:10 +02:00
Matt Roper	b5735e5e71	drm/xe: GSM size should be constant on most platforms On old Intel platforms, the size of the GSM (i.e., the stolen memory that holds the GGTT page table entries) could vary, so the driver needed to read the actual size from the PCI config space. However from Xe_HP onward, the GSM is now always guaranteed to be exactly 8MB (which translates to a 4GB GGTT address space); this is always true regardless of what the platform's much larger PPGTT address space is. The bspec doesn't document the PCI config space as being a valid way to query the size of the GSM after Xe_LP platforms, although so far it still seems to be giving us proper values for Xe_HP, Xe2, and Xe3. However we suspect that the config space will stop providing correct values on some upcoming platforms, so we should stop relying on it. Instead just use the hardcoded 8MB value as documented elsewhere in the bspec. Bspec: 49636, 67090, 50589 Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20250605225352.2333981-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-06-09 09:03:33 -07:00
Maarten Lankhorst	b2d6fd7ac5	drm/xe: Do not rely on GGTT internals in xe_guc_buf kunit tests Add a function to init ggtt for kunit, and use the GGTT function for initialising the GGTT node without populating it. This prevents the test from ever knowing about struct xe_ggtt. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250505121924.921544-11-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-09 10:24:23 +02:00
Maarten Lankhorst	e0096fdcf8	drm/xe: Implement a helper for reading out a GGTT PTE at a specified offset Split the GGTT PTE readout to a separate function, this is useful for adding testcases in the next commit, and also cleaner than manually reading out GGTT. Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Link: https://lore.kernel.org/r/20250505121924.921544-10-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-09 10:24:23 +02:00
Maarten Lankhorst	0c52d72252	drm/xe: Remove pte_encode_bo callback The users inside display have been converted to use thepte_encode_flags callback, we can now remove the pte_encode_bo cb. Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Link: https://lore.kernel.org/r/20250505121924.921544-9-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-09 10:24:23 +02:00
Maarten Lankhorst	8ce1c8cc68	drm/xe/display: Dont poke into GGTT internals to fill a DPT For DPT, it is sufficient to get the GGTT encode flags to fill the DPT. Create a function to return the encode flags, and then encode using the BO address. Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Link: https://lore.kernel.org/r/20250505121924.921544-7-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-09 10:24:22 +02:00
Maarten Lankhorst	57f6af194f	drm/xe/ggtt: Seperate flags and address in PTE encoding Pinning large linear display framebuffers is becoming a bottleneck. My plan of attack is doing a custom walk over the BO, this allows for easier optimization of consecutive entries. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250505121924.921544-6-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-09 10:24:22 +02:00
Maarten Lankhorst	e0ee402750	drm/xe: Add xe_ggtt_alloc Instead of allocating inside xe_tile, create a new function that returns an allocated struct xe_ggtt from xe_ggtt.c Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250505121924.921544-4-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-09 10:21:44 +02:00
Maarten Lankhorst	b5fe33dcb8	drm/xe: Add xe_ggtt_might_lock Another requirement of hiding more of struct xe_ggtt. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250505121924.921544-3-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-09 10:21:44 +02:00
Maarten Lankhorst	3975d35683	drm/xe: Use xe_ggtt_map_bo_unlocked for resume This is the first step to hide the details of struct xe_ggtt. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250505121924.921544-2-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-09 10:21:44 +02:00
Michal Wajdeczko	eb9b34734c	drm/xe/vf: Move tile-related VF functions to separate file Some of our VF functions, even if they take a GT pointer, work only on primary GT and really are tile-related and would be better to keep them separate from the rest of true GT-oriented functions. Move them to a file and update to take a tile pointer instead. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20250602103325.549-3-michal.wajdeczko@intel.com	2025-06-03 12:35:57 +02:00
Tomasz Lis	3e693945b1	drm/xe/vf: Shifting GGTT area post migration We have only one GGTT for all IOV functions, with each VF having assigned a range of addresses for its use. After migration, a VF can receive a different range of addresses than it had initially. This implements shifting GGTT addresses within drm_mm nodes, so that VMAs stay valid after migration. This will make the driver use new addresses when accessing GGTT from the moment the shifting ends. By taking the ggtt->lock for the period of VMA fixups, this change also adds constraint on that mutex. Any locks used during the recovery cannot ever wait for hardware response - because after migration, the hardware will not do anything until fixups are finished. v2: Moved some functs to xe_ggtt.c; moved shift computation to just after querying; improved documentation; switched some warns to asserts; skipping fixups when GGTT shift eq 0; iterating through tiles (Michal) v3: Updated kerneldocs, removed unused funct, properly allocate balloning nodes if non existent v4: Re-used ballooning functions from VF init, used bool in place of standard error codes v5: Renamed one function v6: Subject tag change, several kerneldocs updated, some functions renamed, some moved, added several asserts, shuffled declarations of variables, revealed more detail in high level functions v7: Fixed typos, added `_locked` suffix to some functs, improved readability of asserts, removed unneeded conditional v8: Moved one function, removed implementation detail from kerneldoc, added asserts v9: Code shuffling without much change, and one param rename v10: Minor error path change, added printing the shift via debugfs Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250512114018.361843-3-tomasz.lis@intel.com	2025-05-12 15:53:35 +02:00
Tomasz Lis	dd39212b5f	drm/xe/vf: Divide GGTT ballooning into allocation and insertion The balloon nodes, which are used to fill areas of GGTT inaccessible for a specific VF, were allocated and inserted into GGTT within one function. To be able to re-use that insertion code during VF migration recovery, we need to split it. This patch separates allocation (init/fini functs) from the insertion of balloons (balloon/deballoon functs). Locks are also moved to ensure calls from post-migration recovery worker will not cause a deadlock. v2: Moved declarations to proper header v3: Rephrased description, introduced "_locked" versions of some functs, more lockdep checks, some functions renamed, altered error handling, added missing kerneldocs. v4: Suffixed more functs with `_locked`, moved lockdep asserts, fixed finalization in error path, added asserts v5: Renamed another few functs, used xe_ggtt_node_allocated(), moved lockdep back again to avoid null dereference, added asserts, improved comments v6: Changed params of cleanup_ggtt() Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250512114018.361843-2-tomasz.lis@intel.com	2025-05-12 15:53:33 +02:00
Matthew Auld	8e8e9c2663	drm/xe: unconditionally apply PINNED for pin_map() Some users apply PINNED and some don't when using pin_map(). The pin in pin_map() should imply PINNED so just unconditionally apply it and clean up all users. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250403102440.266113-14-matthew.auld@intel.com	2025-04-04 11:41:08 +01:00
Nitin Gote	75fd04f276	drm/xe: Fix all typos in xe Fix all typos in files of xe, reported by codespell tool. Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250106102646.1400146-2-nitin.r.gote@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2025-01-09 17:58:09 +01:00
Niranjana Vishwanathapura	5a3b0df25d	drm/xe: Allow bo mapping on multiple ggtts Make bo->ggtt an array to support bo mapping on multiple ggtts. Add XE_BO_FLAG_GGTTx flags to map the bo on ggtt of tile 'x'. Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241120000222.204095-2-John.C.Harrison@Intel.com	2024-11-22 19:10:23 -08:00
Matthew Brost	5a71019688	drm/xe: Add mmio read before GGTT invalidate On LNL without a mmio read before a GGTT invalidate the GuC can incorrectly read the GGTT scratch page upon next access leading to jobs not getting scheduled. A mmio read before a GGTT invalidate seems to fix this. Since a GGTT invalidate is not a hot code path, blindly do a mmio read before each GGTT invalidate. Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: stable@vger.kernel.org Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3164 Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241023221200.1797832-1-matthew.brost@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-10-30 22:14:06 -07:00
Matthew Brost	60df57e496	drm/xe: Mark GGTT work queue with WQ_MEM_RECLAIM GGTT work queue is used to free memory thus we should allow this work queue to run during reclaim. Mark with GGTT work queue with WQ_MEM_RECLAIM appropriately. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241021175705.1584521-3-matthew.brost@intel.com	2024-10-23 11:08:06 -07:00
Juha-Pekka Heikkila	3ad86ae1da	drm/xe: add interface to request physical alignment for buffer objects Add xe_bo_create_pin_map_at_aligned() which augment xe_bo_create_pin_map_at() with alignment parameter allowing to pass required alignemnt if it differ from default. Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Mika Kahola <mika.kahola@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241009151947.2240099-2-juhapekka.heikkila@gmail.com	2024-10-14 17:33:39 +03:00
Francois Dugast	91b2c42c21	drm/xe: Use fault injection infrastructure to find issues at probe time The kernel fault injection infrastructure is used to test proper error handling during probe. The return code of the functions using ALLOW_ERROR_INJECTION() can be conditionnally modified at runtime by tuning some debugfs entries. This requires CONFIG_FUNCTION_ERROR_INJECTION (among others). One way to use fault injection at probe time by making each of those functions fail one at a time is: FAILTYPE=fail_function DEVICE="0000:00:08.0" # depends on the system ERRNO=-12 # -ENOMEM, can depend on the function echo N > /sys/kernel/debug/$FAILTYPE/task-filter echo 100 > /sys/kernel/debug/$FAILTYPE/probability echo 0 > /sys/kernel/debug/$FAILTYPE/interval echo -1 > /sys/kernel/debug/$FAILTYPE/times echo 0 > /sys/kernel/debug/$FAILTYPE/space echo 1 > /sys/kernel/debug/$FAILTYPE/verbose modprobe xe echo $DEVICE > /sys/bus/pci/drivers/xe/unbind grep -oP "^.* \[xe\]" /sys/kernel/debug/$FAILTYPE/injectable \| \ cut -d ' ' -f 1 \| while read -r FUNCTION ; do echo "Injecting fault in $FUNCTION" echo "" > /sys/kernel/debug/$FAILTYPE/inject echo $FUNCTION > /sys/kernel/debug/$FAILTYPE/inject printf %#x $ERRNO > /sys/kernel/debug/$FAILTYPE/$FUNCTION/retval echo $DEVICE > /sys/bus/pci/drivers/xe/bind done rmmod xe It will also be integrated into IGT for systematic execution by CI. v2: Wrappers are not needed in the cases covered by this patch, so remove them and use ALLOW_ERROR_INJECTION() directly. v3: Document the use of fault injection at probe time in xe_pci_probe and refer to it where ALLOW_ERROR_INJECTION() is used. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240927151207.399354-1-francois.dugast@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-10-03 08:58:26 -04:00
Matt Roper	d8507423d4	drm/xe/ggtt: Convert register access to use xe_mmio Stop using GT pointers for register access. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240910234719.3335472-86-matthew.d.roper@intel.com	2024-09-11 15:32:51 -07:00
Michal Wajdeczko	f2710d9572	drm/xe: Don't keep stale pointer to bo->ggtt_node When we fail to map a BO in the GGTT, we release our GGTT node placeholder, but leave stale bo->ggtt_node pointer to it, which triggers an assert immediately followed by a crash, due to UAF: [ ] xe 0000:00:02.0: [drm] Assertion `bo->ggtt_node->base.size == bo->size` failed! [ ] WARNING: CPU: 4 PID: 126 at drivers/gpu/drm/xe/xe_ggtt.c:689 xe_ggtt_remove_bo+0x1d9/0x250 [xe] [ ] RIP: 0010:xe_ggtt_remove_bo+0x1d9/0x250 [xe] [ ] Call Trace: [ ] <TASK> [ ] ? __warn+0x88/0x190 [ ] ? xe_ggtt_remove_bo+0x1d9/0x250 [xe] [ ] ? report_bug+0x1c3/0x1d0 [ ] ? handle_bug+0x42/0x70 [ ] ? exc_invalid_op+0x14/0x70 [ ] ? asm_exc_invalid_op+0x16/0x20 [ ] ? xe_ggtt_remove_bo+0x1d9/0x250 [xe] [ ] ? xe_ggtt_remove_bo+0x1d9/0x250 [xe] [ ] xe_ttm_bo_destroy+0x11f/0x260 [xe] [ ] ? ttm_bo_release+0x31c/0x350 [ttm] [ ] ? __mutex_unlock_slowpath+0x35/0x270 [ ] __xe_bo_create_locked+0x4a0/0x550 [xe] [ ] ? mark_held_locks+0x49/0x80 [ ] xe_bo_create_pin_map_at+0x37/0x200 [xe] [ ] xe_bo_create_pin_map+0x11/0x20 [xe] While around, for similar reason, also don't keep an error pointer if we fail to allocate ggtt_node placeholder. Fixes: `34e804220f` ("drm/xe: Make xe_ggtt_node struct independent") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240906220348.1836-1-michal.wajdeczko@intel.com	2024-09-09 20:08:55 +02:00
Himal Prasad Ghimiray	c72084163c	drm/xe: Fix NPD in ggtt_node_remove() Make sure that ggtt_node_remove() is invoked only if both node and ggtt are not null. Move the null checks to the caller function xe_ggtt_node_remove(). v2: Move null check below declarations (Tejas) Fixes: `919bb54e98` ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828092229.3606503-1-himal.prasad.ghimiray@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-28 13:15:40 -04:00
Nathan Chancellor	ff9c674d11	drm/xe: Fix total initialization in xe_ggtt_print_holes() Clang warns (or errors with CONFIG_DRM_WERROR or CONFIG_WERROR): drivers/gpu/drm/xe/xe_ggtt.c:810:3: error: variable 'total' is uninitialized when used here [-Werror,-Wuninitialized] 810 \| total += hole_size; \| ^~~~~ drivers/gpu/drm/xe/xe_ggtt.c:798:11: note: initialize the variable 'total' to silence this warning 798 \| u64 total; \| ^ \| = 0 1 error generated. Move the zero initialization of total from xe_gt_sriov_pf_config_print_available_ggtt() to xe_ggtt_print_holes() to resolve the warning. Fixes: `136367290e` ("drm/xe: Introduce xe_ggtt_print_holes") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240823-drm-xe-fix-total-in-xe_ggtt_print_holes-v1-1-12b02d079327@kernel.org Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-08-24 06:11:26 -07:00
Matthew Brost	5b993d00d7	drm/xe: Move ggtt_fini to devm managed ggtt->scratch is destroyed via devm, ggtt_fini sets ggtt->scratch to NULL, ggtt->scratch in GGTT clears, so ensure ggtt->scratch is set NULL before the BO is destroyed. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240820172958.1095143-2-matthew.brost@intel.com	2024-08-23 09:54:10 -07:00
Rodrigo Vivi	919bb54e98	drm/xe: Fix missing runtime outer protection for ggtt_remove_node Defer the ggtt node removal to a thread if runtime_pm is not active. The ggtt node removal can be called from multiple places, including places where we cannot protect with outer callers and places we are within other locks. So, try to grab the runtime reference if the device is already active, otherwise defer the removal to a separate thread from where we are sure we can wake the device up. v2: - use xe wq instead of system wq (Matt and CI) - Avoid GFP_KERNEL to be future proof since this removal can be called from outside our drivers and we don't want to block if atomic is needed. (Brost) v3: amend forgot chunk declaring xe_device. v4: Use a xe_ggtt_region to encapsulate the node and remova info, wihtout the need for any memory allocation at runtime. v5: Actually fill the delayed_removal.invalidate (Brost) v6: - Ensure that ggtt_region is not freed before work finishes (Auld) - Own wq to ensures that the queued works are flushed before ggtt_fini (Brost) v7: also free ggtt_region on early !bound return (Auld) v8: Address the null deref (CI) v9: Based on the new xe_ggtt_node for the proper care of the lifetime of the object. v10: Redo the lost v5 change. (Brost) v11: Simplify the invalidate_on_remove (Lucas) Cc: Matthew Auld <matthew.auld@intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240821193842.352557-12-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-22 14:00:45 -04:00
Rodrigo Vivi	34e804220f	drm/xe: Make xe_ggtt_node struct independent In some rare cases, the drm_mm node cannot be removed synchronously due to runtime PM conditions. In this situation, the node removal will be delegated to a workqueue that will be able to wake up the device before removing the node. However, in this situation, the lifetime of the xe_ggtt_node cannot be restricted to the lifetime of the parent object. So, this patch introduces the infrastructure so the xe_ggtt_node struct can be allocated in advance and freed when needed. By having the ggtt backpointer, it also ensure that the init function is always called before any attempt to insert or reserve the node in the GGTT. v2: s/xe_ggtt_node_force_fini/xe_ggtt_node_fini and use it internaly (Brost) v3: - Use GF_NOFS for node allocation (CI) - Avoid ggtt argument, now that we have it inside the node (Lucas) - Fix some missed fini cases (CI) v4: - Fix SRIOV critical case where config->ggtt_region was lost (Michal) - Avoid ggtt argument also on removal (missed case on v3) (Michal) - Remove useless checks (Michal) - Return 0 instead of negative errno on a u32 addr. (Michal) - s/xe_ggtt_assign/xe_ggtt_node_assign for coherence, while we are touching it (Michal) v5: - Fix VFs' ggtt_balloon Cc: Matthew Auld <matthew.auld@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240821193842.352557-11-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-22 14:00:45 -04:00
Rodrigo Vivi	15ca09499b	drm/xe: Refactor xe_ggtt balloon functions to make the node clear These operations are related to node. Convert them to the new appropriate name space xe_ggtt_node. v2: Also move arguments around for consistency (Lucas). v3: s/node_balloon/node_insert_balloon and s/node_deballoon/node_remove_balloon (Michal). Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240821193842.352557-10-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-22 14:00:45 -04:00
Rodrigo Vivi	136367290e	drm/xe: Introduce xe_ggtt_print_holes Introduce a new xe_ggtt_print_holes helper that attends the SRIOV demand and finishes the goal of limiting drm_mm access to xe_ggtt. Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240821193842.352557-9-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-22 14:00:45 -04:00

1 2 3

116 Commits