2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

666 Commits

Author SHA1 Message Date
Lu Baolu
25b1b75bba iommu/vt-d: Assign devtlb cache tag on ATS enablement
Commit <4f1492efb495> ("iommu/vt-d: Revert ATS timing change to fix boot
failure") placed the enabling of ATS in the probe_finalize callback. This
occurs after the default domain attachment, which is when the ATS cache
tag is assigned. Consequently, the device TLB cache tag is missed when the
domain is attached, leading to the device TLB not being invalidated in the
iommu_unmap paths.

Fix this by assigning the CACHE_TAG_DEVTLB cache tag when ATS is enabled.

Fixes: 4f1492efb4 ("iommu/vt-d: Revert ATS timing change to fix boot failure")
Cc: stable@vger.kernel.org
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20250625050135.3129955-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20250628100351.3198955-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-07-04 10:33:56 +02:00
Joerg Roedel
879b141b7c Merge branches 'fixes', 'apple/dart', 'arm/smmu/updates', 'arm/smmu/bindings', 'fsl/pamu', 'mediatek', 'renesas/ipmmu', 's390', 'intel/vt-d', 'amd/amd-vi' and 'core' into next 2025-05-23 17:14:32 +02:00
Lu Baolu
320302baed iommu/vt-d: Restore context entry setup order for aliased devices
Commit 2031c469f8 ("iommu/vt-d: Add support for static identity domain")
changed the context entry setup during domain attachment from a
set-and-check policy to a clear-and-reset approach. This inadvertently
introduced a regression affecting PCI aliased devices behind PCIe-to-PCI
bridges.

Specifically, keyboard and touchpad stopped working on several Apple
Macbooks with below messages:

 kernel: platform pxa2xx-spi.3: Adding to iommu group 20
 kernel: input: Apple SPI Keyboard as
 /devices/pci0000:00/0000:00:1e.3/pxa2xx-spi.3/spi_master/spi2/spi-APP000D:00/input/input0
 kernel: DMAR: DRHD: handling fault status reg 3
 kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr
 0xffffa000 [fault reason 0x06] PTE Read access is not set
 kernel: DMAR: DRHD: handling fault status reg 3
 kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr
 0xffffa000 [fault reason 0x06] PTE Read access is not set
 kernel: applespi spi-APP000D:00: Error writing to device: 01 0e 00 00
 kernel: DMAR: DRHD: handling fault status reg 3
 kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr
 0xffffa000 [fault reason 0x06] PTE Read access is not set
 kernel: DMAR: DRHD: handling fault status reg 3
 kernel: applespi spi-APP000D:00: Error writing to device: 01 0e 00 00

Fix this by restoring the previous context setup order.

Fixes: 2031c469f8 ("iommu/vt-d: Add support for static identity domain")
Closes: https://lore.kernel.org/all/4dada48a-c5dd-4c30-9c85-5b03b0aa01f0@bfh.ch/
Cc: stable@vger.kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20250514060523.2862195-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20250520075849.755012-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-05-23 17:11:40 +02:00
Wei Wang
f3fe7e1830 iommu/vt-d: Change dmar_ats_supported() to return boolean
According to "Function return values and names" in coding-style.rst, the
dmar_ats_supported() function should return a boolean instead of an
integer. Also, rename "ret" to "supported" to be more straightforward.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20250509140021.4029303-3-wei.w.wang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-05-16 08:49:30 +02:00
Wei Wang
3df2ebcee9 iommu/vt-d: Eliminate pci_physfn() in dmar_find_matched_satc_unit()
The function dmar_find_matched_satc_unit() contains a duplicate call to
pci_physfn(). This call is unnecessary as pci_physfn() has already been
invoked by the caller. Removing the redundant call simplifies the code
and improves efficiency a bit.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Link: https://lore.kernel.org/r/20250509140021.4029303-2-wei.w.wang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-05-16 08:49:29 +02:00
Lu Baolu
868720fe15 iommu/vt-d: Replace spin_lock with mutex to protect domain ida
The domain ID allocator is currently protected by a spin_lock. However,
ida_alloc_range can potentially block if it needs to allocate memory to
grow its internal structures.

Replace the spin_lock with a mutex which allows sleep on block. Thus,
the memory allocation flags can be updated from GFP_ATOMIC to GFP_KERNEL
to allow blocking memory allocations if necessary.

Introduce a new mutex, did_lock, specifically for protecting the domain
ida. The existing spinlock will remain for protecting other intel_iommu
fields.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20250430021135.2370244-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-05-16 08:49:28 +02:00
Lu Baolu
f93b4ac592 iommu/vt-d: Use ida to manage domain id
Switch the intel iommu driver to use the ida mechanism for managing domain
IDs, replacing the previous fixed-size bitmap.

The previous approach allocated a bitmap large enough to cover the maximum
number of domain IDs supported by the hardware, regardless of the actual
number of domains in use. This led to unnecessary memory consumption,
especially on systems supporting a large number of iommu units but only
utilizing a small number of domain IDs.

The ida allocator dynamically manages the allocation and freeing of integer
IDs, only consuming memory for the IDs that are currently in use. This
significantly optimizes memory usage compared to the fixed-size bitmap.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20250430021135.2370244-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-05-16 08:49:27 +02:00
Jason Gunthorpe
6f5dc76580 iommu/vt-d: Restore WO permissions on second-level paging entries
VT-D HW can do WO permissions on the second-stage but not the first-stage
page table formats. The commit eea53c5816 ("iommu/vt-d: Remove WO
permissions on second-level paging entries") wanted to make this uniform
for VT-D by disabling the support for WO permissions in the second-stage.

This isn't consistent with how other drivers are working. Instead if the
underlying HW can support WO, it should. For instance AMD already supports
WO on its second stage (v1) format and not its first (v2).

If WO support needs to be discoverable it should be done through an
iommu_domain capability flag.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/0-v1-c26553717e90+65f-iommu_vtd_ss_wo_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-05-16 08:49:27 +02:00
Rolf Eike Beer
ddcc66cfe8 iommu: make inclusion of intel directory conditional
Nothing in there is active if CONFIG_INTEL_IOMMU is not enabled, so the whole
directory can depend on that switch as well.

Fixes: ab65ba57e3 ("iommu/vt-d: Move Kconfig and Makefile bits down into intel directory")
Signed-off-by: Rolf Eike Beer <eb@emlix.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/3818749.MHq7AAxBmi@devpool92.emlix.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-05-16 08:46:46 +02:00
Lu Baolu
f984fb09e6 iommu: Remove iommu_dev_enable/disable_feature()
No external drivers use these interfaces anymore. Furthermore, no existing
iommu drivers implement anything in the callbacks. Remove them to avoid
dead code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/20250418080130.1844424-9-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-28 13:04:35 +02:00
Lu Baolu
17fce9d233 iommu/vt-d: Put iopf enablement in domain attach path
Update iopf enablement in the driver to use the new method, similar to
the arm-smmu-v3 driver. Enable iopf support when any domain with an
iopf_handler is attached, and disable it when the domain is removed.

Place all the logic for controlling the PRI and iopf queue in the domain
set/remove/replace paths. Keep track of the number of domains set to the
device and PASIDs that require iopf. When the first domain requiring iopf
is attached, add the device to the iopf queue and enable PRI. When the
last domain is removed, remove it from the iopf queue and disable PRI.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Link: https://lore.kernel.org/r/20250418080130.1844424-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-28 13:04:31 +02:00
Jason Gunthorpe
7c8896dd4a iommu: Remove IOMMU_DEV_FEAT_SVA
None of the drivers implement anything here anymore, remove the dead code.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Link: https://lore.kernel.org/r/20250418080130.1844424-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-28 13:04:29 +02:00
Mingcong Bai
2c8a7c66c9 iommu/vt-d: Apply quirk_iommu_igfx for 8086:0044 (QM57/QS57)
On the Lenovo ThinkPad X201, when Intel VT-d is enabled in the BIOS, the
kernel boots with errors related to DMAR, the graphical interface appeared
quite choppy, and the system resets erratically within a minute after it
booted:

DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0xb97ff000
[fault reason 0x05] PTE Write access is not set

Upon comparing boot logs with VT-d on/off, I found that the Intel Calpella
quirk (`quirk_calpella_no_shadow_gtt()') correctly applied the igfx IOMMU
disable/quirk correctly:

pci 0000:00:00.0: DMAR: BIOS has allocated no shadow GTT; disabling IOMMU
for graphics

Whereas with VT-d on, it went into the "else" branch, which then
triggered the DMAR handling fault above:

... else if (!disable_igfx_iommu) {
	/* we have to ensure the gfx device is idle before we flush */
	pci_info(dev, "Disabling batched IOTLB flush on Ironlake\n");
	iommu_set_dma_strict();
}

Now, this is not exactly scientific, but moving 0x0044 to quirk_iommu_igfx
seems to have fixed the aforementioned issue. Running a few `git blame'
runs on the function, I have found that the quirk was originally
introduced as a fix specific to ThinkPad X201:

commit 9eecabcb9a ("intel-iommu: Abort IOMMU setup for igfx if BIOS gave
no shadow GTT space")

Which was later revised twice to the "else" branch we saw above:

- 2011: commit 6fbcfb3e46 ("intel-iommu: Workaround IOTLB hang on
  Ironlake GPU")
- 2024: commit ba00196ca4 ("iommu/vt-d: Decouple igfx_off from graphic
  identity mapping")

I'm uncertain whether further testings on this particular laptops were
done in 2011 and (honestly I'm not sure) 2024, but I would be happy to do
some distro-specific testing if that's what would be required to verify
this patch.

P.S., I also see IDs 0x0040, 0x0062, and 0x006a listed under the same
`quirk_calpella_no_shadow_gtt()' quirk, but I'm not sure how similar these
chipsets are (if they share the same issue with VT-d or even, indeed, if
this issue is specific to a bug in the Lenovo BIOS). With regards to
0x0062, it seems to be a Centrino wireless card, but not a chipset?

I have also listed a couple (distro and kernel) bug reports below as
references (some of them are from 7-8 years ago!), as they seem to be
similar issue found on different Westmere/Ironlake, Haswell, and Broadwell
hardware setups.

Cc: stable@vger.kernel.org
Fixes: 6fbcfb3e46 ("intel-iommu: Workaround IOTLB hang on Ironlake GPU")
Fixes: ba00196ca4 ("iommu/vt-d: Decouple igfx_off from graphic identity mapping")
Link: https://groups.google.com/g/qubes-users/c/4NP4goUds2c?pli=1
Link: https://bugs.archlinux.org/task/65362
Link: https://bbs.archlinux.org/viewtopic.php?id=230323
Reported-by: Wenhao Sun <weiguangtwk@outlook.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=197029
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
Link: https://lore.kernel.org/r/20250415133330.12528-1-jeffbai@aosc.io
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-28 13:00:58 +02:00
Lu Baolu
4f1492efb4 iommu/vt-d: Revert ATS timing change to fix boot failure
Commit <5518f239aff1> ("iommu/vt-d: Move scalable mode ATS enablement to
probe path") changed the PCI ATS enablement logic to run earlier,
specifically before the default domain attachment.

On some client platforms, this change resulted in boot failures, causing
the kernel to panic with the following message and call trace:

 Kernel panic - not syncing: DMAR hardware is malfunctioning
 CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #175
 Call Trace:
  <TASK>
  dump_stack_lvl+0x6f/0xb0
  dump_stack+0x10/0x16
  panic+0x10a/0x2b7
  iommu_enable_translation.cold+0xc/0xc
  intel_iommu_init+0xe39/0xec0
  ? trace_hardirqs_on+0x1e/0xd0
  ? __pfx_pci_iommu_init+0x10/0x10
  pci_iommu_init+0xd/0x40
  do_one_initcall+0x5b/0x390
  kernel_init_freeable+0x26d/0x2b0
  ? __pfx_kernel_init+0x10/0x10
  kernel_init+0x15/0x120
  ret_from_fork+0x35/0x60
  ? __pfx_kernel_init+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
 RIP: 1f0f:0x0
 Code: Unable to access opcode bytes at 0xffffffffffffffd6.
 RSP: 0000:0000000000000000 EFLAGS: 841f0f2e66 ORIG_RAX:
      1f0f2e6600000000
 RAX: 0000000000000000 RBX: 1f0f2e6600000000 RCX:
      2e66000000000084
 RDX: 0000000000841f0f RSI: 000000841f0f2e66 RDI:
      00841f0f2e660000
 RBP: 00841f0f2e660000 R08: 00841f0f2e660000 R09:
      000000841f0f2e66
 R10: 0000000000841f0f R11: 2e66000000000084 R12:
      000000841f0f2e66
 R13: 0000000000841f0f R14: 2e66000000000084 R15:
      1f0f2e6600000000
  </TASK>
 ---[ end Kernel panic - not syncing: DMAR hardware is malfunctioning ]---

Fix this by reverting the timing change for ATS enablement introduced by
the offending commit and restoring the previous behavior.

Fixes: 5518f239af ("iommu/vt-d: Move scalable mode ATS enablement to probe path")
Reported-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Closes: https://lore.kernel.org/linux-iommu/01b9c72f-460d-4f77-b696-54c6825babc9@linux.intel.com/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20250416073608.1799578-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17 16:45:28 +02:00
Jason Gunthorpe
249d3327f0 iommu/vtd: Remove iommu_alloc_pages_node()
Intel is the only thing that uses this now, convert to the size versions,
trying to avoid PAGE_SHIFT.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/23-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17 16:22:54 +02:00
Jason Gunthorpe
5087f663c2 iommu/pages: Remove iommu_alloc_page_node()
Use iommu_alloc_pages_node_sz() instead.

AMD and Intel are both using 4K pages for these structures since those
drivers only work on 4K PAGE_SIZE.

riscv is also spec'd to use SZ_4K.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/21-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17 16:22:51 +02:00
Jason Gunthorpe
d50aaa4a9f iommu: Update various drivers to pass in lg2sz instead of order to iommu pages
Convert most of the places calling get_order() as an argument to the
iommu-pages allocator into order_base_2() or the _sz flavour
instead. These places already have an exact size, there is no particular
reason to use order here.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/19-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17 16:22:48 +02:00
Jason Gunthorpe
868240c34e iommu: Change iommu_iotlb_gather to use iommu_page_list
This converts the remaining places using list of pages to the new API.

The Intel free path was shared with its gather path, so it is converted at
the same time.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/11-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17 16:22:42 +02:00
Jason Gunthorpe
3e8e986ce8 iommu/pages: Remove iommu_free_page()
Use iommu_free_pages() instead.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17 16:22:36 +02:00
Jason Gunthorpe
4316ba4a50 iommu/pages: Remove the order argument to iommu_free_pages()
Now that we have a folio under the allocation iommu_free_pages() can know
the order of the original allocation and do the correct thing to free it.

The next patch will rename iommu_free_page() to iommu_free_pages() so we
have naming consistency with iommu_alloc_pages_node().

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17 16:22:33 +02:00
Petr Tesarik
7d8c490ba3 iommu/vt-d: Remove an unnecessary call set_dma_ops()
Do not touch per-device DMA ops when the driver has been converted to use
the dma-iommu API.

Fixes: c588072bba ("iommu/vt-d: Convert intel iommu driver to the iommu ops")
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Link: https://lore.kernel.org/r/20250403165605.278541-1-ptesarik@suse.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-11 09:06:06 +02:00
Sean Christopherson
548183ea38 iommu/vt-d: Wire up irq_ack() to irq_move_irq() for posted MSIs
Set the posted MSI irq_chip's irq_ack() hook to irq_move_irq() instead of
a dummy/empty callback so that posted MSIs process pending changes to the
IRQ's SMP affinity.  Failure to honor a pending set-affinity results in
userspace being unable to change the effective affinity of the IRQ, as
IRQD_SETAFFINITY_PENDING is never cleared and so irq_set_affinity_locked()
always defers moving the IRQ.

The issue is most easily reproducible by setting /proc/irq/xx/smp_affinity
multiple times in quick succession, as only the first update is likely to
be handled in process context.

Fixes: ed1e48ea43 ("iommu/vt-d: Enable posted mode for device MSIs")
Cc: Robert Lippert <rlippert@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Wentao Yang <wentaoyang@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20250321194249.1217961-1-seanjc@google.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-11 09:06:06 +02:00
Linus Torvalds
48552153cf iommufd 6.15 merge window pull
Two significant new items:
 
 - Allow reporting IOMMU HW events to userspace when the events are clearly
   linked to a device. This is linked to the VIOMMU object and is intended to
   be used by a VMM to forward HW events to the virtual machine as part of
   emulating a vIOMMU. ARM SMMUv3 is the first driver to use this
   mechanism. Like the existing fault events the data is delivered through
   a simple FD returning event records on read().
 
 - PASID support in VFIO. "Process Address Space ID" is a PCI feature that
   allows the device to tag all PCI DMA operations with an ID. The IOMMU
   will then use the ID to select a unique translation for those DMAs. This
   is part of Intel's vIOMMU support as VT-D HW requires the hypervisor to
   manage each PASID entry. The support is generic so any VFIO user could
   attach any translation to a PASID, and the support should work on ARM
   SMMUv3 as well. AMD requires additional driver work.
 
 Some minor updates, along with fixes:
 
 - Prevent using nested parents with fault's, no driver support today
 
 - Put a single "cookie_type" value in the iommu_domain to indicate what
   owns the various opaque owner fields
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ+q6NgAKCRCFwuHvBreF
 YZ3zAQDbl4/Z0O+CLN2AXq4Zeiyq1HTSoF94hzqmm7lQ17zTIwD8CCdyLXHvupaq
 tkBIv5IovpaxlrSk6M0kh2K8vPCk9Qk=
 =CIM3
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "Two significant new items:

   - Allow reporting IOMMU HW events to userspace when the events are
     clearly linked to a device.

     This is linked to the VIOMMU object and is intended to be used by a
     VMM to forward HW events to the virtual machine as part of
     emulating a vIOMMU. ARM SMMUv3 is the first driver to use this
     mechanism. Like the existing fault events the data is delivered
     through a simple FD returning event records on read().

   - PASID support in VFIO.

     The "Process Address Space ID" is a PCI feature that allows the
     device to tag all PCI DMA operations with an ID. The IOMMU will
     then use the ID to select a unique translation for those DMAs. This
     is part of Intel's vIOMMU support as VT-D HW requires the
     hypervisor to manage each PASID entry.

     The support is generic so any VFIO user could attach any
     translation to a PASID, and the support should work on ARM SMMUv3
     as well. AMD requires additional driver work.

  Some minor updates, along with fixes:

   - Prevent using nested parents with fault's, no driver support today

   - Put a single "cookie_type" value in the iommu_domain to indicate
     what owns the various opaque owner fields"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (49 commits)
  iommufd: Test attach before detaching pasid
  iommufd: Fix iommu_vevent_header tables markup
  iommu: Convert unreachable() to BUG()
  iommufd: Balance veventq->num_events inc/dec
  iommufd: Initialize the flags of vevent in iommufd_viommu_report_event()
  iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO
  iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid
  vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices
  ida: Add ida_find_first_range()
  iommufd/selftest: Add coverage for iommufd pasid attach/detach
  iommufd/selftest: Add test ops to test pasid attach/detach
  iommufd/selftest: Add a helper to get test device
  iommufd/selftest: Add set_dev_pasid in mock iommu
  iommufd: Allow allocating PASID-compatible domain
  iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  iommufd: Enforce PASID-compatible domain for RID
  iommufd: Support pasid attach/replace
  iommufd: Enforce PASID-compatible domain in PASID path
  iommufd/device: Add pasid_attach array to track per-PASID attach
  ...
2025-04-01 18:03:46 -07:00
Yi Liu
ce15c13e7a iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
Intel iommu driver just treats it as a nop since Intel VT-d does not have
special requirement on domains attached to either the PASID or RID of a
PASID-capable device.

Link: https://patch.msgid.link/r/20250321171940.7213-14-yi.l.liu@intel.com
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25 10:18:31 -03:00
Joerg Roedel
22df63a23a Merge branches 'apple/dart', 'arm/smmu/updates', 'arm/smmu/bindings', 'rockchip', 's390', 'core', 'intel/vt-d' and 'amd/amd-vi' into next 2025-03-20 09:11:09 +01:00
Lu Baolu
93ae6e68b6 iommu/vt-d: Fix possible circular locking dependency
We have recently seen report of lockdep circular lock dependency warnings
on platforms like Skylake and Kabylake:

 ======================================================
 WARNING: possible circular locking dependency detected
 6.14.0-rc6-CI_DRM_16276-gca2c04fe76e8+ #1 Not tainted
 ------------------------------------------------------
 swapper/0/1 is trying to acquire lock:
 ffffffff8360ee48 (iommu_probe_device_lock){+.+.}-{3:3},
   at: iommu_probe_device+0x1d/0x70

 but task is already holding lock:
 ffff888102c7efa8 (&device->physical_node_lock){+.+.}-{3:3},
   at: intel_iommu_init+0xe75/0x11f0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #6 (&device->physical_node_lock){+.+.}-{3:3}:
        __mutex_lock+0xb4/0xe40
        mutex_lock_nested+0x1b/0x30
        intel_iommu_init+0xe75/0x11f0
        pci_iommu_init+0x13/0x70
        do_one_initcall+0x62/0x3f0
        kernel_init_freeable+0x3da/0x6a0
        kernel_init+0x1b/0x200
        ret_from_fork+0x44/0x70
        ret_from_fork_asm+0x1a/0x30

 -> #5 (dmar_global_lock){++++}-{3:3}:
        down_read+0x43/0x1d0
        enable_drhd_fault_handling+0x21/0x110
        cpuhp_invoke_callback+0x4c6/0x870
        cpuhp_issue_call+0xbf/0x1f0
        __cpuhp_setup_state_cpuslocked+0x111/0x320
        __cpuhp_setup_state+0xb0/0x220
        irq_remap_enable_fault_handling+0x3f/0xa0
        apic_intr_mode_init+0x5c/0x110
        x86_late_time_init+0x24/0x40
        start_kernel+0x895/0xbd0
        x86_64_start_reservations+0x18/0x30
        x86_64_start_kernel+0xbf/0x110
        common_startup_64+0x13e/0x141

 -> #4 (cpuhp_state_mutex){+.+.}-{3:3}:
        __mutex_lock+0xb4/0xe40
        mutex_lock_nested+0x1b/0x30
        __cpuhp_setup_state_cpuslocked+0x67/0x320
        __cpuhp_setup_state+0xb0/0x220
        page_alloc_init_cpuhp+0x2d/0x60
        mm_core_init+0x18/0x2c0
        start_kernel+0x576/0xbd0
        x86_64_start_reservations+0x18/0x30
        x86_64_start_kernel+0xbf/0x110
        common_startup_64+0x13e/0x141

 -> #3 (cpu_hotplug_lock){++++}-{0:0}:
        __cpuhp_state_add_instance+0x4f/0x220
        iova_domain_init_rcaches+0x214/0x280
        iommu_setup_dma_ops+0x1a4/0x710
        iommu_device_register+0x17d/0x260
        intel_iommu_init+0xda4/0x11f0
        pci_iommu_init+0x13/0x70
        do_one_initcall+0x62/0x3f0
        kernel_init_freeable+0x3da/0x6a0
        kernel_init+0x1b/0x200
        ret_from_fork+0x44/0x70
        ret_from_fork_asm+0x1a/0x30

 -> #2 (&domain->iova_cookie->mutex){+.+.}-{3:3}:
        __mutex_lock+0xb4/0xe40
        mutex_lock_nested+0x1b/0x30
        iommu_setup_dma_ops+0x16b/0x710
        iommu_device_register+0x17d/0x260
        intel_iommu_init+0xda4/0x11f0
        pci_iommu_init+0x13/0x70
        do_one_initcall+0x62/0x3f0
        kernel_init_freeable+0x3da/0x6a0
        kernel_init+0x1b/0x200
        ret_from_fork+0x44/0x70
        ret_from_fork_asm+0x1a/0x30

 -> #1 (&group->mutex){+.+.}-{3:3}:
        __mutex_lock+0xb4/0xe40
        mutex_lock_nested+0x1b/0x30
        __iommu_probe_device+0x24c/0x4e0
        probe_iommu_group+0x2b/0x50
        bus_for_each_dev+0x7d/0xe0
        iommu_device_register+0xe1/0x260
        intel_iommu_init+0xda4/0x11f0
        pci_iommu_init+0x13/0x70
        do_one_initcall+0x62/0x3f0
        kernel_init_freeable+0x3da/0x6a0
        kernel_init+0x1b/0x200
        ret_from_fork+0x44/0x70
        ret_from_fork_asm+0x1a/0x30

 -> #0 (iommu_probe_device_lock){+.+.}-{3:3}:
        __lock_acquire+0x1637/0x2810
        lock_acquire+0xc9/0x300
        __mutex_lock+0xb4/0xe40
        mutex_lock_nested+0x1b/0x30
        iommu_probe_device+0x1d/0x70
        intel_iommu_init+0xe90/0x11f0
        pci_iommu_init+0x13/0x70
        do_one_initcall+0x62/0x3f0
        kernel_init_freeable+0x3da/0x6a0
        kernel_init+0x1b/0x200
        ret_from_fork+0x44/0x70
        ret_from_fork_asm+0x1a/0x30

 other info that might help us debug this:

 Chain exists of:
   iommu_probe_device_lock --> dmar_global_lock -->
     &device->physical_node_lock

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&device->physical_node_lock);
                                lock(dmar_global_lock);
                                lock(&device->physical_node_lock);
   lock(iommu_probe_device_lock);

  *** DEADLOCK ***

This driver uses a global lock to protect the list of enumerated DMA
remapping units. It is necessary due to the driver's support for dynamic
addition and removal of remapping units at runtime.

Two distinct code paths require iteration over this remapping unit list:

- Device registration and probing: the driver iterates the list to
  register each remapping unit with the upper layer IOMMU framework
  and subsequently probe the devices managed by that unit.
- Global configuration: Upper layer components may also iterate the list
  to apply configuration changes.

The lock acquisition order between these two code paths was reversed. This
caused lockdep warnings, indicating a risk of deadlock. Fix this warning
by releasing the global lock before invoking upper layer interfaces for
device registration.

Fixes: b150654f74 ("iommu/vt-d: Fix suspicious RCU usage")
Closes: https://lore.kernel.org/linux-iommu/SJ1PR11MB612953431F94F18C954C4A9CB9D32@SJ1PR11MB6129.namprd11.prod.outlook.com/
Tested-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250317035714.1041549-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-20 09:03:57 +01:00
Sean Christopherson
688124cc54 iommu/vt-d: Don't clobber posted vCPU IRTE when host IRQ affinity changes
Don't overwrite an IRTE that is posting IRQs to a vCPU with a posted MSI
entry if the host IRQ affinity happens to change.  If/when the IRTE is
reverted back to "host mode", it will be reconfigured as a posted MSI or
remapped entry as appropriate.

Drop the "mode" field, which doesn't differentiate between posted MSIs and
posted vCPUs, in favor of a dedicated posted_vcpu flag.  Note!  The two
posted_{msi,vcpu} flags are intentionally not mutually exclusive; an IRTE
can transition between posted MSI and posted vCPU.

Fixes: ed1e48ea43 ("iommu/vt-d: Enable posted mode for device MSIs")
Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250315025135.2365846-3-seanjc@google.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-20 09:03:56 +01:00
Sean Christopherson
2454823e97 iommu/vt-d: Put IRTE back into posted MSI mode if vCPU posting is disabled
Add a helper to take care of reconfiguring an IRTE to deliver IRQs to the
host, i.e. not to a vCPU, and use the helper when an IRTE's vCPU affinity
is nullified, i.e. when KVM puts an IRTE back into "host" mode.  Because
posted MSIs use an ephemeral IRTE, using modify_irte() puts the IRTE into
full remapped mode, i.e. unintentionally disables posted MSIs on the IRQ.

Fixes: ed1e48ea43 ("iommu/vt-d: Enable posted mode for device MSIs")
Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250315025135.2365846-2-seanjc@google.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-20 09:03:56 +01:00
Lu Baolu
4c293add58 iommu/vt-d: Cleanup intel_context_flush_present()
The intel_context_flush_present() is called in places where either the
scalable mode is disabled, or scalable mode is enabled but all PASID
entries are known to be non-present. In these cases, the flush_domains
path within intel_context_flush_present() will never execute. This dead
code is therefore removed.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Link: https://lore.kernel.org/r/20250228092631.3425464-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-10 09:31:05 +01:00
Lu Baolu
87caaba1d1 iommu/vt-d: Move PRI enablement in probe path
Update PRI enablement to use the new method, similar to the amd iommu
driver. Enable PRI in the device probe path and disable it when the device
is released. PRI is enabled throughout the device's iommu lifecycle. The
infrastructure for the iommu subsystem to handle iopf requests is created
during iopf enablement and released during iopf disablement.  All invalid
page requests from the device are automatically handled by the iommu
subsystem if iopf is not enabled. Add iopf_refcount to track the iopf
enablement.

Convert the return type of intel_iommu_disable_iopf() to void, as there
is no way to handle a failure when disabling this feature.  Make
intel_iommu_enable/disable_iopf() helpers global, as they will be used
beyond the current file in the subsequent patch.

The iopf_refcount is not protected by any lock. This is acceptable, as
there is no concurrent access to it in the current code. The following
patch will address this by moving it to the domain attach/detach paths,
which are protected by the iommu group mutex.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Link: https://lore.kernel.org/r/20250228092631.3425464-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-10 09:31:04 +01:00
Lu Baolu
5518f239af iommu/vt-d: Move scalable mode ATS enablement to probe path
Device ATS is currently enabled when a domain is attached to the device
and disabled when the domain is detached. This creates a limitation:
when the IOMMU is operating in scalable mode and IOPF is enabled, the
device's domain cannot be changed.

The previous code enables ATS when a domain is set to a device's RID and
disables it during RID domain switch. So, if a PASID is set with a
domain requiring PRI, ATS should remain enabled until the domain is
removed. During the PASID domain's lifecycle, if the RID's domain
changes, PRI will be disrupted because it depends on ATS, which is
disabled when the blocking domain is set for the device's RID.

Remove this limitation by moving ATS enablement to the device probe path.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Link: https://lore.kernel.org/r/20250228092631.3425464-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-10 09:31:04 +01:00
Jason Gunthorpe
607ba1bb09 iommu/vt-d: Check if SVA is supported when attaching the SVA domain
Attach of a SVA domain should fail if SVA is not supported, move the check
for SVA support out of IOMMU_DEV_FEAT_SVA and into attach.

Also check when allocating a SVA domain to match other drivers.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Link: https://lore.kernel.org/r/20250228092631.3425464-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-10 09:31:03 +01:00
Jason Gunthorpe
a8653e5cc2 iommu/vt-d: Use virt_to_phys()
If all the inlines are unwound virt_to_dma_pfn() is simply:
   return page_to_pfn(virt_to_page(p)) << (PAGE_SHIFT - VTD_PAGE_SHIFT);

Which can be re-arranged to:
   (page_to_pfn(virt_to_page(p)) << PAGE_SHIFT) >> VTD_PAGE_SHIFT

The only caller is:
   ((uint64_t)virt_to_dma_pfn(tmp_page) << VTD_PAGE_SHIFT)

re-arranged to:
   ((page_to_pfn(virt_to_page(tmp_page)) << PAGE_SHIFT) >> VTD_PAGE_SHIFT)
           << VTD_PAGE_SHIFT

Which simplifies to:
   page_to_pfn(virt_to_page(tmp_page)) << PAGE_SHIFT

That is the same as virt_to_phys(tmp_page), so just remove all of this.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/8-v3-e797f4dc6918+93057-iommu_pages_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-10 09:31:02 +01:00
Yunhui Cui
9ce7603ad3 iommu/vt-d: Fix system hang on reboot -f
We found that executing the command ./a.out &;reboot -f (where a.out is a
program that only executes a while(1) infinite loop) can probabilistically
cause the system to hang in the intel_iommu_shutdown() function, rendering
it unresponsive. Through analysis, we identified that the factors
contributing to this issue are as follows:

1. The reboot -f command does not prompt the kernel to notify the
application layer to perform cleanup actions, allowing the application to
continue running.

2. When the kernel reaches the intel_iommu_shutdown() function, only the
BSP (Bootstrap Processor) CPU is operational in the system.

3. During the execution of intel_iommu_shutdown(), the function down_write
(&dmar_global_lock) causes the process to sleep and be scheduled out.

4. At this point, though the processor's interrupt flag is not cleared,
 allowing interrupts to be accepted. However, only legacy devices and NMI
(Non-Maskable Interrupt) interrupts could come in, as other interrupts
routing have already been disabled. If no legacy or NMI interrupts occur
at this stage, the scheduler will not be able to run.

5. If the application got scheduled at this time is executing a while(1)-
type loop, it will be unable to be preempted, leading to an infinite loop
and causing the system to become unresponsive.

To resolve this issue, the intel_iommu_shutdown() function should not
execute down_write(), which can potentially cause the process to be
scheduled out. Furthermore, since only the BSP is running during the later
stages of the reboot, there is no need for protection against parallel
access to the DMAR (DMA Remapping) unit. Therefore, the following lines
could be removed:

down_write(&dmar_global_lock);
up_write(&dmar_global_lock);

After testing, the issue has been resolved.

Fixes: 6c3a44ed3c ("iommu/vt-d: Turn off translations at shutdown")
Co-developed-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Link: https://lore.kernel.org/r/20250303062421.17929-1-cuiyunhui@bytedance.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-10 09:31:02 +01:00
Lu Baolu
b150654f74 iommu/vt-d: Fix suspicious RCU usage
Commit <d74169ceb0d2> ("iommu/vt-d: Allocate DMAR fault interrupts
locally") moved the call to enable_drhd_fault_handling() to a code
path that does not hold any lock while traversing the drhd list. Fix
it by ensuring the dmar_global_lock lock is held when traversing the
drhd list.

Without this fix, the following warning is triggered:
 =============================
 WARNING: suspicious RCU usage
 6.14.0-rc3 #55 Not tainted
 -----------------------------
 drivers/iommu/intel/dmar.c:2046 RCU-list traversed in non-reader section!!
               other info that might help us debug this:
               rcu_scheduler_active = 1, debug_locks = 1
 2 locks held by cpuhp/1/23:
 #0: ffffffff84a67c50 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0
 #1: ffffffff84a6a380 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0
 stack backtrace:
 CPU: 1 UID: 0 PID: 23 Comm: cpuhp/1 Not tainted 6.14.0-rc3 #55
 Call Trace:
  <TASK>
  dump_stack_lvl+0xb7/0xd0
  lockdep_rcu_suspicious+0x159/0x1f0
  ? __pfx_enable_drhd_fault_handling+0x10/0x10
  enable_drhd_fault_handling+0x151/0x180
  cpuhp_invoke_callback+0x1df/0x990
  cpuhp_thread_fun+0x1ea/0x2c0
  smpboot_thread_fn+0x1f5/0x2e0
  ? __pfx_smpboot_thread_fn+0x10/0x10
  kthread+0x12a/0x2d0
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x4a/0x60
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>

Holding the lock in enable_drhd_fault_handling() triggers a lockdep splat
about a possible deadlock between dmar_global_lock and cpu_hotplug_lock.
This is avoided by not holding dmar_global_lock when calling
iommu_device_register(), which initiates the device probe process.

Fixes: d74169ceb0 ("iommu/vt-d: Allocate DMAR fault interrupts locally")
Reported-and-tested-by: Ido Schimmel <idosch@nvidia.com>
Closes: https://lore.kernel.org/linux-iommu/Zx9OwdLIc_VoQ0-a@shredder.mtl.com/
Tested-by: Breno Leitao <leitao@debian.org>
Cc: stable@vger.kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20250218022422.2315082-1-baolu.lu@linux.intel.com
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-28 12:19:01 +01:00
Jerry Snitselaar
64f792981e iommu/vt-d: Remove device comparison in context_setup_pass_through_cb
Remove the device comparison check in context_setup_pass_through_cb.
pci_for_each_dma_alias already makes a decision on whether the
callback function should be called for a device. With the check
in place it will fail to create context entries for aliases as
it walks up to the root bus.

Fixes: 2031c469f8 ("iommu/vt-d: Add support for static identity domain")
Closes: https://lore.kernel.org/linux-iommu/82499eb6-00b7-4f83-879a-e97b4144f576@linux.intel.com/
Cc: stable@vger.kernel.org
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20250224180316.140123-1-jsnitsel@redhat.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-28 12:19:00 +01:00
Lu Baolu
add43c4fbc iommu/vt-d: Make intel_iommu_drain_pasid_prq() cover faults for RID
This driver supports page faults on PCI RID since commit <9f831c16c69e>
("iommu/vt-d: Remove the pasid present check in prq_event_thread") by
allowing the reporting of page faults with the pasid_present field cleared
to the upper layer for further handling. The fundamental assumption here
is that the detach or replace operations act as a fence for page faults.
This implies that all pending page faults associated with a specific RID
or PASID are flushed when a domain is detached or replaced from a device
RID or PASID.

However, the intel_iommu_drain_pasid_prq() helper does not correctly
handle faults for RID. This leads to faults potentially remaining pending
in the iommu hardware queue even after the domain is detached, thereby
violating the aforementioned assumption.

Fix this issue by extending intel_iommu_drain_pasid_prq() to cover faults
for RID.

Fixes: 9f831c16c6 ("iommu/vt-d: Remove the pasid present check in prq_event_thread")
Cc: stable@vger.kernel.org
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20250121023150.815972-1-baolu.lu@linux.intel.com
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20250211005512.985563-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-14 09:12:47 +01:00
Linus Torvalds
aa44198a6c iommufd 6.14 merge window pull
No major functionality this cycle:
 
 - iommufd part of the domain_alloc_paging_flags() conversion
 
 - Move IOMMU_HWPT_FAULT_ID_VALID processing out of drivers
 
 - Increase a timeout waiting for other threads to drop transient refcounts
   that syzkaller was hitting
 
 - Fix a UBSAN hit in iova_bitmap due to shift out of bounds
 
 - Add missing cleanup of fault events during FD shutdown, fixing a memory leak
 
 - Improve the fault delivery flow to have a smaller locking critical
   region that does not include copy_to_user()
 
 - Fix 32 bit ABI breakage due to missed implicit padding, and fix the
   stack memory leakage
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ5JgnAAKCRCFwuHvBreF
 YQWrAP9ItOsbeOxYIEVQK1E90HbMWVHF8RhWDPnpChgzyorjjQEA/OOou6uTAcVC
 fJybLk3T7JrutMmBFvVLsRg+yCJPlgE=
 =ldhT
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "No major functionality this cycle:

   - iommufd part of the domain_alloc_paging_flags() conversion

   - Move IOMMU_HWPT_FAULT_ID_VALID processing out of drivers

   - Increase a timeout waiting for other threads to drop transient
     refcounts that syzkaller was hitting

   - Fix a UBSAN hit in iova_bitmap due to shift out of bounds

   - Add missing cleanup of fault events during FD shutdown, fixing a
     memory leak

   - Improve the fault delivery flow to have a smaller locking critical
     region that does not include copy_to_user()

   - Fix 32 bit ABI breakage due to missed implicit padding, and fix the
     stack memory leakage"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd: Fix struct iommu_hwpt_pgfault init and padding
  iommufd/fault: Use a separate spinlock to protect fault->deliver list
  iommufd/fault: Destroy response and mutex in iommufd_fault_destroy()
  iommufd: Keep OBJ/IOCTL lists in an alphabetical order
  iommufd/iova_bitmap: Fix shift-out-of-bounds in iova_bitmap_offset_to_index()
  iommu: iommufd: fix WARNING in iommufd_device_unbind
  iommufd: Deal with IOMMU_HWPT_FAULT_ID_VALID in iommufd core
  iommufd/selftest: Remove domain_alloc_paging()
2025-01-24 12:04:35 -08:00
Linus Torvalds
f1c243fc78 IOMMU Updates for Linux v6.14
Including:
 
 	- Core changes:
 	  - PASID support for the blocked_domain.
 
 	- ARM-SMMU Updates:
 	  - SMMUv2:
 	    * Implement per-client prefetcher configuration on Qualcomm SoCs.
 	    * Support for the Adreno SMMU on Qualcomm's SDM670 SOC.
 	  - SMMUv3:
 	    * Pretty-printing of event records.
 	    * Drop the ->domain_alloc_paging implementation in favour of
 	      ->domain_alloc_paging_flags(flags==0).
 	  - IO-PGTable:
 	    * Generalisation of the page-table walker to enable external walkers
 	      (e.g. for debugging unexpected page-faults from the GPU).
 	    * Minor fix for handling concatenated PGDs at stage-2 with 16KiB pages.
 	  - Misc:
 	    * Clean-up device probing and replace the crufty probe-deferral hack
 	      with a more robust implementation of arm_smmu_get_by_fwnode().
 	    * Device-tree binding updates for a bunch of Qualcomm platforms.
 
 	- Intel VT-d Updates:
 	  - Remove domain_alloc_paging().
 	  - Remove capability audit code.
 	  - Draining PRQ in sva unbind path when FPD bit set.
 	  - Link cache tags of same iommu unit together.
 
 	- AMD-Vi Updates:
 	  - Use CMPXCHG128 to update DTE.
 	  - Cleanups of the domain_alloc_paging() path.
 
 	- RiscV IOMMU:
 	  - Platform MSI support.
 	  - Shutdown support.
 
 	- Rockchip IOMMU:
 	  - Add DT bindings for Rockchip RK3576.
 
 	- More smaller fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmeQsnQACgkQK/BELZcB
 GuOHug/8DDIuKlUHU2+3U2kKxMb8o1kDxGPkfKgXBaxxpprvY9DARRv7N4mnF6Vu
 Db8P7QihXfQKnZHXEiqHE7TlA5B+IVQd3Kz96P4sY3OlVWGZYqyKv2GEHyG5CjN/
 bay7bfgeo2EVEiAio6VToFFWTm+oxFZzhoYFIlAyZAuIQUp17gHXf7YyhUwk4rOz
 8g0XMH6uldidID6BVpArxHh/bN9MOTdHzkyhwPF3FL8E94ziX6rWILH9ADYxBn2o
 wqHR1STxv398k62toPpWb78c2RdANI8treDXsYpCyDF87dygdP+SA0qkK3G6kAVA
 /IiPothAj6oNm+Gvwd04tEkuqVVrVqrmWE3VXSps33Tk+apYtcmLCtdpwY/F93D1
 EZwTVqveBKk2hSWYVDlEyj9XKmZ9dYWDGvg2Fx844gltQHoHtEgYL+azCMU/ry7k
 3+KlkUqFZZxUDbVBbbDdMF+NpxyZTtfKsaLB8f5laOP1R8Os3+dC69Gw6bhoxaMc
 xfL3v245V5kWSRy+w02TyZGmZSzRx0FKUbFLKpLOvZD6pfx8t8oqJTG9FwN1KzpL
 lvcBpPB5AUeNJQpKcVSjs/ne8WiOqdABt7ge4E9J5TwrDI8sXk0mwJaPrlDgK1V/
 0xkkLmxWnqsu04CyDay+Uv3Bls/rRBpikR/iGt9P3BbZs7Cyryw=
 =Xei0
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu updates from Joerg Roedel:
 "Core changes:
   - PASID support for the blocked_domain

  ARM-SMMU Updates:
   - SMMUv2:
      - Implement per-client prefetcher configuration on Qualcomm SoCs
      - Support for the Adreno SMMU on Qualcomm's SDM670 SOC
   - SMMUv3:
      - Pretty-printing of event records
      - Drop the ->domain_alloc_paging implementation in favour of
        domain_alloc_paging_flags(flags==0)
   - IO-PGTable:
      - Generalisation of the page-table walker to enable external
        walkers (e.g. for debugging unexpected page-faults from the GPU)
      - Minor fix for handling concatenated PGDs at stage-2 with 16KiB
        pages
   - Misc:
      - Clean-up device probing and replace the crufty probe-deferral
        hack with a more robust implementation of
        arm_smmu_get_by_fwnode()
      - Device-tree binding updates for a bunch of Qualcomm platforms

  Intel VT-d Updates:
   - Remove domain_alloc_paging()
   - Remove capability audit code
   - Draining PRQ in sva unbind path when FPD bit set
   - Link cache tags of same iommu unit together

  AMD-Vi Updates:
   - Use CMPXCHG128 to update DTE
   - Cleanups of the domain_alloc_paging() path

  RiscV IOMMU:
   - Platform MSI support
   - Shutdown support

  Rockchip IOMMU:
   - Add DT bindings for Rockchip RK3576

  More smaller fixes and cleanups"

* tag 'iommu-updates-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (66 commits)
  iommu: Use str_enable_disable-like helpers
  iommu/amd: Fully decode all combinations of alloc_paging_flags
  iommu/amd: Move the nid to pdom_setup_pgtable()
  iommu/amd: Change amd_iommu_pgtable to use enum protection_domain_mode
  iommu/amd: Remove type argument from do_iommu_domain_alloc() and related
  iommu/amd: Remove dev == NULL checks
  iommu/amd: Remove domain_alloc()
  iommu/amd: Remove unused amd_iommu_domain_update()
  iommu/riscv: Fixup compile warning
  iommu/arm-smmu-v3: Add missing #include of linux/string_choices.h
  iommu/arm-smmu-v3: Use str_read_write helper w/ logs
  iommu/io-pgtable-arm: Add way to debug pgtable walk
  iommu/io-pgtable-arm: Re-use the pgtable walk for iova_to_phys
  iommu/io-pgtable-arm: Make pgtable walker more generic
  iommu/arm-smmu: Add ACTLR data and support for qcom_smmu_500
  iommu/arm-smmu: Introduce ACTLR custom prefetcher settings
  iommu/arm-smmu: Add support for PRR bit setup
  iommu/arm-smmu: Refactor qcom_smmu structure to include single pointer
  iommu/arm-smmu: Re-enable context caching in smmu reset operation
  iommu/vt-d: Link cache tags of same iommu unit together
  ...
2025-01-24 07:33:46 -08:00
Linus Torvalds
4c551165e7 Updates for the interrupt subsystem:
- Consolidation of the machine_kexec_mask_interrupts() by providing a
     generic implementation and replacing the copy & pasta orgy in the
     relevant architectures.
 
   - Prevent unconditional operations on interrupt chips during kexec
     shutdown, which can trigger warnings in certain cases when the
     underlying interrupt has been shut down before.
 
   - Make the enforcement of interrupt handling in interrupt context
     unconditionally available, so that it actually works for non x86
     related interrupt chips. The earlier enablement for ARM GIC chips set
     the required chip flag, but did not notice that the check was hidden
     behind a config switch which is not selected by ARM[64].
 
   - Decrapify the handling of deferred interrupt affinity setting. Some
     interrupt chips require that affinity changes are made from the context
     of handling an interrupt to avoid certain race conditions. For x86 this
     was the default, but with interrupt remapping this requirement was
     lifted and a flag was introduced which tells the core code that
     affinity changes can be done in any context. Unrestricted affinity
     changes are the default for the majority of interrupt chips. RISCV has
     the requirement to add the deferred mode to one of it's interrupt
     controllers, but with the original implementation this would require to
     add the any context flag to all other RISC-V interrupt chips. That's
     backwards, so reverse the logic and require that chips, which need the
     deferred mode have to be marked accordingly. That avoids chasing the
     'sane' chips and marking them.
 
   - Add multi-node support to the Loongarch AVEC interrupt controller
     driver.
 
   - The usual tiny cleanups, fixes and improvements all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmePkVITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoRbQD/9bHVph/V9Ekl7JAX3aY4gG4JbRhOc7
 dp1VAcHRhktRfoTztYRbjsbMu2nvZ58GKA8bkOS2jHSF/m3PbkIJfOhwk0YdIAoa
 +kdy5yDgqCGfkqW43DN4Cr+CnzGjWMitw67tFp3fhwehMDpDjdt2L28IjtanSS0f
 hO6FV7o65MWeJwxk4Isb2/nvkO+X23Lrp6RrWS8SXBnF9FFXxiPIg/fiOPTizhCh
 1W/bSGxLLb9WwsVzmlGAKVFlXDij0QGaIUug2fdVZ63OsELXD7tJrLSPG133yk92
 ppIa0s6BT4IBsfM00us4hG15PkLuJmP3yWWcoquG0rP8Wq58VOXiN6+rcJIyvB+5
 mWceTH6IKfZGoRQKwXC7BxeBAIb147reiJtb06meq1/8ADIvzafiNy0c8x9i/UaV
 QiyhPVENjaGCGDomZmJQqN7Yb02Wge1k8InQnodDrHxZNl/bX/B1Z8Bxd0n6hPHg
 NSJXYif2AxgaddpohsdygqRDbT6SNyQdj7YjJFY5qAGJ3yFyJ4JB6WTqkWW4o1vH
 3FVqdAnJmejAmmYSkah0Hkem2T5QASQmTWb93PLxiV6q+d0NM8stWAujjyVdIV/B
 W4Uj9mQ20cz54TjLtxqX+A1k6KcqOWRgh1l2QbUlFsgsOP3V8yz47yqYdR9qMWlO
 9kNEjI3sw+G/IQ==
 =q4rj
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull interrupt subsystem updates from Thomas Gleixner:

 - Consolidate the machine_kexec_mask_interrupts() by providing a
   generic implementation and replacing the copy & pasta orgy in the
   relevant architectures.

 - Prevent unconditional operations on interrupt chips during kexec
   shutdown, which can trigger warnings in certain cases when the
   underlying interrupt has been shut down before.

 - Make the enforcement of interrupt handling in interrupt context
   unconditionally available, so that it actually works for non x86
   related interrupt chips. The earlier enablement for ARM GIC chips set
   the required chip flag, but did not notice that the check was hidden
   behind a config switch which is not selected by ARM[64].

 - Decrapify the handling of deferred interrupt affinity setting.

   Some interrupt chips require that affinity changes are made from the
   context of handling an interrupt to avoid certain race conditions.
   For x86 this was the default, but with interrupt remapping this
   requirement was lifted and a flag was introduced which tells the core
   code that affinity changes can be done in any context. Unrestricted
   affinity changes are the default for the majority of interrupt chips.

   RISCV has the requirement to add the deferred mode to one of it's
   interrupt controllers, but with the original implementation this
   would require to add the any context flag to all other RISC-V
   interrupt chips. That's backwards, so reverse the logic and require
   that chips, which need the deferred mode have to be marked
   accordingly. That avoids chasing the 'sane' chips and marking them.

 - Add multi-node support to the Loongarch AVEC interrupt controller
   driver.

 - The usual tiny cleanups, fixes and improvements all over the place.

* tag 'irq-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/generic_chip: Export irq_gc_mask_disable_and_ack_set()
  genirq/timings: Add kernel-doc for a function parameter
  genirq: Remove IRQ_MOVE_PCNTXT and related code
  x86/apic: Convert to IRQCHIP_MOVE_DEFERRED
  genirq: Provide IRQCHIP_MOVE_DEFERRED
  hexagon: Remove GENERIC_PENDING_IRQ leftover
  ARC: Remove GENERIC_PENDING_IRQ
  genirq: Remove handle_enforce_irqctx() wrapper
  genirq: Make handle_enforce_irqctx() unconditionally available
  irqchip/loongarch-avec: Add multi-nodes topology support
  irqchip/ts4800: Replace seq_printf() by seq_puts()
  irqchip/ti-sci-inta : Add module build support
  irqchip/ti-sci-intr: Add module build support
  irqchip/irq-brcmstb-l2: Replace brcmstb_l2_mask_and_ack() by generic function
  irqchip: keystone: Use syscon_regmap_lookup_by_phandle_args
  genirq/kexec: Prevent redundant IRQ masking by checking state before shutdown
  kexec: Consolidate machine_kexec_mask_interrupts() implementation
  genirq: Reuse irq_thread_fn() for forced thread case
  genirq: Move irq_thread_fn() further up in the code
2025-01-21 13:51:07 -08:00
Joerg Roedel
125f34e4c1 Merge branches 'arm/smmu/updates', 'arm/smmu/bindings', 'qualcomm/msm', 'rockchip', 'riscv', 'core', 'intel/vt-d' and 'amd/amd-vi' into next 2025-01-17 09:02:35 +01:00
Thomas Gleixner
7d04319a05 x86/apic: Convert to IRQCHIP_MOVE_DEFERRED
Instead of marking individual interrupts as safe to be migrated in
arbitrary contexts, mark the interrupt chips, which require the interrupt
to be moved in actual interrupt context, with the new IRQCHIP_MOVE_DEFERRED
flag. This makes more sense because this is a per interrupt chip property
and not restricted to individual interrupts.

That flips the logic from the historical opt-out to a opt-in model. This is
simpler to handle for other architectures, which default to unrestricted
affinity setting. It also allows to cleanup the redundant core logic
significantly.

All interrupt chips, which belong to a top-level domain sitting directly on
top of the x86 vector domain are marked accordingly, unless the related
setup code marks the interrupts with IRQ_MOVE_PCNTXT, i.e. XEN.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/all/20241210103335.563277044@linutronix.de
2025-01-15 21:38:53 +01:00
Zhenzhong Duan
acf5d49aaf iommu/vt-d: Link cache tags of same iommu unit together
Cache tag invalidation requests for a domain are accumulated until a
different iommu unit is found when traversing the cache_tags linked list.
But cache tags of same iommu unit can be distributed in the linked list,
this make batched flush less efficient. E.g., one device backed by iommu0
is attached to a domain in between two devices attaching backed by iommu1.

Group cache tags together for same iommu unit in cache_tag_assign() to
maximize the performance of batched flush.

Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/r/20241219054358.8654-1-zhenzhong.duan@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-07 09:30:53 +01:00
Lu Baolu
cf08ca81d0 iommu/vt-d: Draining PRQ in sva unbind path when FPD bit set
When a device uses a PASID for SVA (Shared Virtual Address), it's possible
that the PASID entry is marked as non-present and FPD bit set before the
device flushes all ongoing DMA requests and removes the SVA domain. This
can occur when an exception happens and the process terminates before the
device driver stops DMA and calls the iommu driver to unbind the PASID.

There's no need to drain the PRQ in the mm release path. Instead, the PRQ
will be drained in the SVA unbind path. But in such case,
intel_pasid_tear_down_entry() only checks the presence of the pasid entry
and returns directly.

Add the code to clear the FPD bit and drain the PRQ.

Fixes: c43e1ccdeb ("iommu/vt-d: Drain PRQs when domain removed from RID")
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20241217024240.139615-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-07 09:30:52 +01:00
Lu Baolu
c220629940 iommu/vt-d: Remove iommu cap audit
The capability audit code was introduced by commit <ad3d19029979>
"iommu/vt-d: Audit IOMMU Capabilities and add helper functions", aiming
to verify the consistency of capabilities across all IOMMUs for supported
features.

Nowadays, all the kAPIs of the iommu subsystem have evolved to be device
oriented, in preparation for supporting heterogeneous IOMMU architectures.
There is no longer a need to require capability consistence among IOMMUs
for any feature.

Remove the iommu cap audit code to make the driver align with the design
in the iommu core.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20241216071828.22962-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-07 09:30:51 +01:00
Jason Gunthorpe
de1dda7e0b iommu/vt-d: Remove domain_alloc_paging()
This is duplicated by intel_iommu_domain_alloc_paging_flags(), just remove
it.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/0-v1-b101d00c5ee5+17645-vtd_paging_flags_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-07 09:30:50 +01:00
Kees Bakker
60f030f741 iommu/vt-d: Avoid use of NULL after WARN_ON_ONCE
There is a WARN_ON_ONCE to catch an unlikely situation when
domain_remove_dev_pasid can't find the `pasid`. In case it nevertheless
happens we must avoid using a NULL pointer.

Signed-off-by: Kees Bakker <kees@ijzerbout.nl>
Link: https://lore.kernel.org/r/20241218201048.E544818E57E@bout3.ijzerbout.nl
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-07 09:30:50 +01:00
Yi Liu
4f0bdab175 iommu/vt-d: Make the blocked domain support PASID
The blocked domain can be extended to park PASID of a device to be the
DMA blocking state. By this the remove_dev_pasid() op is dropped.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241204122928.11987-6-yi.l.liu@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-12-18 09:39:36 +01:00
Lu Baolu
dda2b8c3c6 iommu/vt-d: Avoid draining PRQ in sva mm release path
When a PASID is used for SVA by a device, it's possible that the PASID
entry is cleared before the device flushes all ongoing DMA requests and
removes the SVA domain. This can occur when an exception happens and the
process terminates before the device driver stops DMA and calls the
iommu driver to unbind the PASID.

There's no need to drain the PRQ in the mm release path. Instead, the PRQ
will be drained in the SVA unbind path.

Unfortunately, commit c43e1ccdeb ("iommu/vt-d: Drain PRQs when domain
removed from RID") changed this behavior by unconditionally draining the
PRQ in intel_pasid_tear_down_entry(). This can lead to a potential
sleeping-in-atomic-context issue.

Smatch static checker warning:

	drivers/iommu/intel/prq.c:95 intel_iommu_drain_pasid_prq()
	warn: sleeping in atomic context

To avoid this issue, prevent draining the PRQ in the SVA mm release path
and restore the previous behavior.

Fixes: c43e1ccdeb ("iommu/vt-d: Drain PRQs when domain removed from RID")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/linux-iommu/c5187676-2fa2-4e29-94e0-4a279dc88b49@stanley.mountain/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20241212021529.1104745-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-12-13 15:54:27 +01:00
Yi Liu
74536f9196 iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
The qi_batch is allocated when assigning cache tag for a domain. While
for nested parent domain, it is missed. Hence, when trying to map pages
to the nested parent, NULL dereference occurred. Also, there is potential
memleak since there is no lock around domain->qi_batch allocation.

To solve it, add a helper for qi_batch allocation, and call it in both
the __cache_tag_assign_domain() and __cache_tag_assign_parent_domain().

  BUG: kernel NULL pointer dereference, address: 0000000000000200
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 8104795067 P4D 0
  Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted 6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632
  Call Trace:
   ? __die+0x24/0x70
   ? page_fault_oops+0x80/0x150
   ? do_user_addr_fault+0x63/0x7b0
   ? exc_page_fault+0x7c/0x220
   ? asm_exc_page_fault+0x26/0x30
   ? cache_tag_flush_range_np+0x13c/0x260
   intel_iommu_iotlb_sync_map+0x1a/0x30
   iommu_map+0x61/0xf0
   batch_to_domain+0x188/0x250
   iopt_area_fill_domains+0x125/0x320
   ? rcu_is_watching+0x11/0x50
   iopt_map_pages+0x63/0x100
   iopt_map_common.isra.0+0xa7/0x190
   iopt_map_user_pages+0x6a/0x80
   iommufd_ioas_map+0xcd/0x1d0
   iommufd_fops_ioctl+0x118/0x1c0
   __x64_sys_ioctl+0x93/0xc0
   do_syscall_64+0x71/0x140
   entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: 705c1cdf1e ("iommu/vt-d: Introduce batched cache invalidation")
Cc: stable@vger.kernel.org
Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20241210130322.17175-1-yi.l.liu@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-12-13 15:54:25 +01:00