2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

320 Commits

Author SHA1 Message Date
Jason Gunthorpe
7b6dd84e70 iommu/vt-d: Update the definition of the blocking domain
The global static should pre-define the type and the NOP free function can
be now left as NULL.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Sven Peter <sven@svenpeter.dev>
Link: https://lore.kernel.org/r/2-v2-bff223cf6409+282-dart_paging_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-26 16:53:50 +02:00
Lu Baolu
03476e687e iommu/vt-d: Disallow read-only mappings to nest parent domain
When remapping hardware is configured by system software in scalable mode
as Nested (PGTT=011b) and with PWSNP field Set in the PASID-table-entry,
it may Set Accessed bit and Dirty bit (and Extended Access bit if enabled)
in first-stage page-table entries even when second-stage mappings indicate
that corresponding first-stage page-table is Read-Only.

As the result, contents of pages designated by VMM as Read-Only can be
modified by IOMMU via PML5E (PML4E for 4-level tables) access as part of
address translation process due to DMAs issued by Guest.

This disallows read-only mappings in the domain that is supposed to be used
as nested parent. Reference from Sapphire Rapids Specification Update [1],
errata details, SPR17. Userspace should know this limitation by checking
the IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 flag reported in the IOMMU_GET_HW_INFO
ioctl.

[1] https://www.intel.com/content/www/us/en/content-details/772415/content-details.html

Link: https://lore.kernel.org/r/20231026044216.64964-9-yi.l.liu@intel.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26 11:16:34 -03:00
Lu Baolu
b41e38e225 iommu/vt-d: Add nested domain allocation
This adds the support for IOMMU_HWPT_DATA_VTD_S1 type. And 'nested_parent'
is added to mark the nested parent domain to sanitize the input parent domain.

Link: https://lore.kernel.org/r/20231026044216.64964-8-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26 11:16:34 -03:00
Yi Liu
d86724d4dc iommu/vt-d: Make domain attach helpers to be extern
This makes the helpers visible to nested.c.

Link: https://lore.kernel.org/r/20231026044216.64964-6-yi.l.liu@intel.com
Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26 11:16:34 -03:00
Yi Liu
a2cdecdf9d iommu/vt-d: Enhance capability check for nested parent domain allocation
This adds the scalable mode check before allocating the nested parent domain
as checking nested capability is not enough. User may turn off scalable mode
which also means no nested support even if the hardware supports it.

Fixes: c97d1b20d3 ("iommu/vt-d: Add domain_alloc_user op")
Link: https://lore.kernel.org/r/20231024150011.44642-1-yi.l.liu@intel.com
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26 11:16:11 -03:00
Yi Liu
2bdabb8e82 iommu: Pass in parent domain with user_data to domain_alloc_user op
domain_alloc_user op already accepts user flags for domain allocation, add
a parent domain pointer and a driver specific user data support as well.
The user data would be tagged with a type for iommu drivers to add their
own driver specific user data per hw_pagetable.

Add a struct iommu_user_data as a bundle of data_ptr/data_len/type from an
iommufd core uAPI structure. Make the user data opaque to the core, since
a userspace driver must match the kernel driver. In the future, if drivers
share some common parameter, there would be a generic parameter as well.

Link: https://lore.kernel.org/r/20231026043938.63898-7-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26 11:15:57 -03:00
Joao Martins
f35f22cc76 iommu/vt-d: Access/Dirty bit support for SS domains
IOMMU advertises Access/Dirty bits for second-stage page table if the
extended capability DMAR register reports it (ECAP, mnemonic ECAP.SSADS).
The first stage table is compatible with CPU page table thus A/D bits are
implicitly supported. Relevant Intel IOMMU SDM ref for first stage table
"3.6.2 Accessed, Extended Accessed, and Dirty Flags" and second stage table
"3.7.2 Accessed and Dirty Flags".

First stage page table is enabled by default so it's allowed to set dirty
tracking and no control bits needed, it just returns 0. To use SSADS, set
bit 9 (SSADE) in the scalable-mode PASID table entry and flush the IOTLB
via pasid_flush_caches() following the manual. Relevant SDM refs:

"3.7.2 Accessed and Dirty Flags"
"6.5.3.3 Guidance to Software for Invalidations,
 Table 23. Guidance to Software for Invalidations"

PTE dirty bit is located in bit 9 and it's cached in the IOTLB so flush
IOTLB to make sure IOMMU attempts to set the dirty bit again. Note that
iommu_dirty_bitmap_record() will add the IOVA to iotlb_gather and thus the
caller of the iommu op will flush the IOTLB. Relevant manuals over the
hardware translation is chapter 6 with some special mention to:

"6.2.3.1 Scalable-Mode PASID-Table Entry Programming Considerations"
"6.2.4 IOTLB"

Select IOMMUFD_DRIVER only if IOMMUFD is enabled, given that IOMMU dirty
tracking requires IOMMUFD.

Link: https://lore.kernel.org/r/20231024135109.73787-13-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24 11:58:43 -03:00
Jingqi Liu
d87731f609 iommu/vt-d: debugfs: Create/remove debugfs file per {device, pasid}
Add a debugfs directory per pair of {device, pasid} if the mappings of
its page table are created and destroyed by the iommu_map/unmap()
interfaces. i.e. /sys/kernel/debug/iommu/intel/<device source id>/<pasid>.
Create a debugfs file in the directory for users to dump the page
table corresponding to {device, pasid}. e.g.
/sys/kernel/debug/iommu/intel/0000:00:02.0/1/domain_translation_struct.
For the default domain without pasid, it creates a debugfs file in the
debugfs device directory for users to dump its page table. e.g.
/sys/kernel/debug/iommu/intel/0000:00:02.0/domain_translation_struct.

When setting a domain to a PASID of device, create a debugfs file in
the pasid debugfs directory for users to dump the page table of the
specified pasid. Remove the debugfs device directory of the device
when releasing a device. e.g.
/sys/kernel/debug/iommu/intel/0000:00:01.0

Signed-off-by: Jingqi Liu <Jingqi.liu@intel.com>
Link: https://lore.kernel.org/r/20231013135811.73953-3-Jingqi.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-16 09:34:51 +02:00
Yi Liu
c97d1b20d3 iommu/vt-d: Add domain_alloc_user op
Add the domain_alloc_user() op implementation. It supports allocating
domains to be used as parent under nested translation.

Unlike other drivers VT-D uses only a single page table format so it only
needs to check if the HW can support nesting.

Link: https://lore.kernel.org/r/20230928071528.26258-7-yi.l.liu@intel.com
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-10 13:31:24 -03:00
Niklas Schnelle
fa4c450709 iommu: Allow .iotlb_sync_map to fail and handle s390's -ENOMEM return
On s390 when using a paging hypervisor, .iotlb_sync_map is used to sync
mappings by letting the hypervisor inspect the synced IOVA range and
updating a shadow table. This however means that .iotlb_sync_map can
fail as the hypervisor may run out of resources while doing the sync.
This can be due to the hypervisor being unable to pin guest pages, due
to a limit on mapped addresses such as vfio_iommu_type1.dma_entry_limit
or lack of other resources. Either way such a failure to sync a mapping
should result in a DMA_MAPPING_ERROR.

Now especially when running with batched IOTLB flushes for unmap it may
be that some IOVAs have already been invalidated but not yet synced via
.iotlb_sync_map. Thus if the hypervisor indicates running out of
resources, first do a global flush allowing the hypervisor to free
resources associated with these mappings as well a retry creating the
new mappings and only if that also fails report this error to callers.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> # sun50i
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://lore.kernel.org/r/20230928-dma_iommu-v13-1-9e5fc4dacc36@linux.ibm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-02 08:42:57 +02:00
Zhang Rui
59df44bfb0 iommu/vt-d: Avoid memory allocation in iommu_suspend()
The iommu_suspend() syscore suspend callback is invoked with IRQ disabled.
Allocating memory with the GFP_KERNEL flag may re-enable IRQs during
the suspend callback, which can cause intermittent suspend/hibernation
problems with the following kernel traces:

Calling iommu_suspend+0x0/0x1d0
------------[ cut here ]------------
WARNING: CPU: 0 PID: 15 at kernel/time/timekeeping.c:868 ktime_get+0x9b/0xb0
...
CPU: 0 PID: 15 Comm: rcu_preempt Tainted: G     U      E      6.3-intel #r1
RIP: 0010:ktime_get+0x9b/0xb0
...
Call Trace:
 <IRQ>
 tick_sched_timer+0x22/0x90
 ? __pfx_tick_sched_timer+0x10/0x10
 __hrtimer_run_queues+0x111/0x2b0
 hrtimer_interrupt+0xfa/0x230
 __sysvec_apic_timer_interrupt+0x63/0x140
 sysvec_apic_timer_interrupt+0x7b/0xa0
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x1f/0x30
...
------------[ cut here ]------------
Interrupts enabled after iommu_suspend+0x0/0x1d0
WARNING: CPU: 0 PID: 27420 at drivers/base/syscore.c:68 syscore_suspend+0x147/0x270
CPU: 0 PID: 27420 Comm: rtcwake Tainted: G     U  W   E      6.3-intel #r1
RIP: 0010:syscore_suspend+0x147/0x270
...
Call Trace:
 <TASK>
 hibernation_snapshot+0x25b/0x670
 hibernate+0xcd/0x390
 state_store+0xcf/0xe0
 kobj_attr_store+0x13/0x30
 sysfs_kf_write+0x3f/0x50
 kernfs_fop_write_iter+0x128/0x200
 vfs_write+0x1fd/0x3c0
 ksys_write+0x6f/0xf0
 __x64_sys_write+0x1d/0x30
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

Given that only 4 words memory is needed, avoid the memory allocation in
iommu_suspend().

CC: stable@kernel.org
Fixes: 33e0715710 ("iommu/vt-d: Avoid GFP_ATOMIC where it is not needed")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Ooi, Chin Hao <chin.hao.ooi@intel.com>
Link: https://lore.kernel.org/r/20230921093956.234692-1-rui.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230925120417.55977-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-25 16:10:36 +02:00
Linus Torvalds
0468be89b3 IOMMU Updates for Linux v6.6
Including:
 
 	- Core changes:
 	  - Consolidate probe_device path
 	  - Make the PCI-SAC IOVA allocation trick PCI-only
 
 	- AMD IOMMU:
 	  - Consolidate PPR log handling
 	  - Interrupt handling improvements
 	  - Refcount fixes for amd_iommu_v2 driver
 
 	- Intel VT-d driver:
 	  - Enable idxd device DMA with pasid through iommu dma ops.
 	  - Lift RESV_DIRECT check from VT-d driver to core.
 	  - Miscellaneous cleanups and fixes.
 
 	- ARM-SMMU drivers:
 	  - Device-tree binding updates:
 	    - Add additional compatible strings for Qualcomm SoCs
 	    - Allow ASIDs to be configured in the DT to work around Qualcomm's
 	      broken hypervisor
 	    - Fix clocks for Qualcomm's MSM8998 SoC
 	  - SMMUv2:
 	    - Support for Qualcomm's legacy firmware implementation featured on
 	      at least MSM8956 and MSM8976.
 	    - Match compatible strings for Qualcomm SM6350 and SM6375 SoC variants
 	  - SMMUv3:
 	    - Use 'ida' instead of a bitmap for VMID allocation
 
 	  - Rockchip IOMMU:
 	    - Lift page-table allocation restrictions on newer hardware
 
 	  - Mediatek IOMMU:
 	    - Add MT8188 IOMMU Support
 
 	  - Renesas IOMMU:
 	    - Allow PCIe devices
 
 	- Usual set of cleanups an smaller fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmTx7IMACgkQK/BELZcB
 GuMxUA/+P/wYvAKCbDpXyszIpyCTx37BkeRTBaVqG0vEKLG6439i+PIm3oudQK+6
 0y+1clJi0Ddu0uv1ck90cIEP1YDuKaKdrOVeE7TtlK+6LKYxTyeN+mz4csMIbahI
 6JMrWzrIEPIyMBHzAepQiGDCsmDkrCngPj0WmA7+EQZSSHVYp+TLe6OLzNs74vDF
 zCITkYNq6aKyg/dNJpMRy6VOHvw9PUiwRvm7ko7WONP4VCtpW4g3Jpkerf19zoV2
 s0nwZuGn3o7F0aFOpRJPPKQNfQnNjOjHdxjcsGBafD9qqAk4TLvnZH24njKtPidJ
 P8CiAu//HxhDyUPTgTIrDroVOGVG7s85XO+WesjPkEI3vnNjXy+qEIinQBJ3oIaI
 ppDLSnArEhfSRgt6dXvPCJ/g4+WGS9jNV85GCa7XBtal2Msu8G89NKC97mpmjCkb
 lnGmCF9t7Tkt/fLWxw4GADBN3m2tOib1GQMvPYAF2WM3jH5aRq2UliIRuCHZkzwv
 EF3SiFQQqab6oogU9tF/A1QLUKQ8QfYOdabqL9z2COgF5tS00VC6b/6VTNkKeBHe
 qIiOpI7IWo76tFJule5gRaUth9nVkjpEo6kL9I6rEldOlFJrX6uaHTta6/isY3gx
 vkN98V/OThRUbDwMD122YVKNNjZE2MNsTeptXqB3jHvl3UWiLsQ=
 =RV+G
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:
 "Core changes:

   - Consolidate probe_device path

   - Make the PCI-SAC IOVA allocation trick PCI-only

  AMD IOMMU:

   - Consolidate PPR log handling

   - Interrupt handling improvements

   - Refcount fixes for amd_iommu_v2 driver

  Intel VT-d driver:

   - Enable idxd device DMA with pasid through iommu dma ops

   - Lift RESV_DIRECT check from VT-d driver to core

   - Miscellaneous cleanups and fixes

  ARM-SMMU drivers:

   - Device-tree binding updates:
      - Add additional compatible strings for Qualcomm SoCs
      - Allow ASIDs to be configured in the DT to work around Qualcomm's
        broken hypervisor
      - Fix clocks for Qualcomm's MSM8998 SoC

   - SMMUv2:
      - Support for Qualcomm's legacy firmware implementation featured
        on at least MSM8956 and MSM8976
      - Match compatible strings for Qualcomm SM6350 and SM6375 SoC
        variants

   - SMMUv3:
      - Use 'ida' instead of a bitmap for VMID allocation

   - Rockchip IOMMU:
      - Lift page-table allocation restrictions on newer hardware

   - Mediatek IOMMU:
      - Add MT8188 IOMMU Support

   - Renesas IOMMU:
      - Allow PCIe devices

  .. and the usual set of cleanups an smaller fixes"

* tag 'iommu-updates-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (64 commits)
  iommu: Explicitly include correct DT includes
  iommu/amd: Remove unused declarations
  iommu/arm-smmu-qcom: Add SM6375 SMMUv2
  iommu/arm-smmu-qcom: Add SM6350 DPU compatible
  iommu/arm-smmu-qcom: Add SM6375 DPU compatible
  iommu/arm-smmu-qcom: Sort the compatible list alphabetically
  dt-bindings: arm-smmu: Fix MSM8998 clocks description
  iommu/vt-d: Remove unused extern declaration dmar_parse_dev_scope()
  iommu/vt-d: Fix to convert mm pfn to dma pfn
  iommu/vt-d: Fix to flush cache of PASID directory table
  iommu/vt-d: Remove rmrr check in domain attaching device path
  iommu: Prevent RESV_DIRECT devices from blocking domains
  dmaengine/idxd: Re-enable kernel workqueue under DMA API
  iommu/vt-d: Add set_dev_pasid callback for dma domain
  iommu/vt-d: Prepare for set_dev_pasid callback
  iommu/vt-d: Make prq draining code generic
  iommu/vt-d: Remove pasid_mutex
  iommu/vt-d: Add domain_flush_pasid_iotlb()
  iommu: Move global PASID allocation from SVA to core
  iommu: Generalize PASID 0 for normal DMA w/o PASID
  ...
2023-09-01 16:54:25 -07:00
Joerg Roedel
d8fe59f110 Merge branches 'apple/dart', 'arm/mediatek', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'unisoc', 'x86/vt-d', 'x86/amd' and 'core' into next 2023-08-21 14:18:43 +02:00
Yi Liu
55243393b0 iommu/vt-d: Implement hw_info for iommu capability query
Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.

Link: https://lore.kernel.org/r/20230818101033.4100-6-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-08-18 12:52:15 -03:00
Yanfei Xu
fb5f50a43d iommu/vt-d: Fix to convert mm pfn to dma pfn
For the case that VT-d page is smaller than mm page, converting dma pfn
should be handled in two cases which are for start pfn and for end pfn.
Currently the calculation of end dma pfn is incorrect and the result is
less than real page frame number which is causing the mapping of iova
always misses some page frames.

Rename the mm_to_dma_pfn() to mm_to_dma_pfn_start() and add a new helper
for converting end dma pfn named mm_to_dma_pfn_end().

Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
Link: https://lore.kernel.org/r/20230625082046.979742-1-yanfei.xu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:46:19 +02:00
Lu Baolu
d3aedf94f4 iommu/vt-d: Remove rmrr check in domain attaching device path
The core code now prevents devices with RMRR regions from being assigned
to user space. There is no need to check for this condition in individual
drivers. Remove it to avoid duplicate code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230724060352.113458-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:46:18 +02:00
Lu Baolu
7d0c9da6c1 iommu/vt-d: Add set_dev_pasid callback for dma domain
This allows the upper layers to set a domain to a PASID of a device
if the PASID feature is supported by the IOMMU hardware. The typical
use cases are, for example, kernel DMA with PASID and hardware
assisted mediated device drivers.

The attaching device and pasid information is tracked in a per-domain
list and is used for IOTLB and devTLB invalidation.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-8-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:44:39 +02:00
Lu Baolu
37f900e718 iommu/vt-d: Prepare for set_dev_pasid callback
The domain_flush_pasid_iotlb() helper function is used to flush the IOTLB
entries for a given PASID. Previously, this function assumed that
RID2PASID was only used for the first-level DMA translation. However, with
the introduction of the set_dev_pasid callback, this assumption is no
longer valid.

Add a check before using the RID2PASID for PASID invalidation. This check
ensures that the domain has been attached to a physical device before
using RID2PASID.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-7-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:44:38 +02:00
Lu Baolu
154786235d iommu/vt-d: Make prq draining code generic
Currently draining page requests and responses for a pasid is part of SVA
implementation. This is because the driver only supports attaching an SVA
domain to a device pasid. As we are about to support attaching other types
of domains to a device pasid, the prq draining code becomes generic.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-6-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:44:37 +02:00
Lu Baolu
ac1a3483fe iommu/vt-d: Add domain_flush_pasid_iotlb()
The VT-d spec requires to use PASID-based-IOTLB invalidation descriptor
to invalidate IOTLB and the paging-structure caches for a first-stage
page table. Add a generic helper to do this.

RID2PASID is used if the domain has been attached to a physical device,
otherwise real PASIDs that the domain has been attached to will be used.
The 'real' PASID attachment is handled in the subsequent change.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-4-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:44:37 +02:00
Jacob Pan
4298780126 iommu: Generalize PASID 0 for normal DMA w/o PASID
PCIe Process address space ID (PASID) is used to tag DMA traffic, it
provides finer grained isolation than requester ID (RID).

For each device/RID, 0 is a special PASID for the normal DMA (no
PASID). This is universal across all architectures that supports PASID,
therefore warranted to be reserved globally and declared in the common
header. Consequently, we can avoid the conflict between different PASID
use cases in the generic code. e.g. SVA and DMA API with PASIDs.

This paved away for device drivers to choose global PASID policy while
continue doing normal DMA.

Noting that VT-d could support none-zero RID/NO_PASID, but currently not
used.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230802212427.1497170-2-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09 17:44:34 +02:00
Jason Gunthorpe
6eb4da8cf5 iommu: Have __iommu_probe_device() check for already probed devices
This is a step toward making __iommu_probe_device() self contained.

It should, under proper locking, check if the device is already associated
with an iommu driver and resolve parallel probes. All but one of the
callers open code this test using two different means, but they all
rely on dev->iommu_group.

Currently the bus_iommu_probe()/probe_iommu_group() and
probe_acpi_namespace_devices() rejects already probed devices with an
unlocked read of dev->iommu_group. The OF and ACPI "replay" functions use
device_iommu_mapped() which is the same read without the pointless
refcount.

Move this test into __iommu_probe_device() and put it under the
iommu_probe_device_lock. The store to dev->iommu_group is in
iommu_group_add_device() which is also called under this lock for iommu
driver devices, making it properly locked.

The only path that didn't have this check is the hotplug path triggered by
BUS_NOTIFY_ADD_DEVICE. The only way to get dev->iommu_group assigned
outside the probe path is via iommu_group_add_device(). Today the only
caller is VFIO no-iommu which never associates with an iommu driver. Thus
adding this additional check is safe.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-328044aa278c+45e49-iommu_probe_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-07-14 16:14:12 +02:00
Joerg Roedel
a7a334076d Merge branches 'iommu/fixes', 'arm/smmu', 'ppc/pamu', 'virtio', 'x86/vt-d', 'core' and 'x86/amd' into next 2023-06-19 10:12:42 +02:00
Lu Baolu
b4da4e112a iommu/vt-d: Remove commented-out code
These lines of code were commented out when they were first added in commit
ba39592764 ("Intel IOMMU: Intel IOMMU driver"). We do not want to restore
them because the VT-d spec has deprecated the read/write draining hit.

VT-d spec (section 11.4.2):
"
 Hardware implementation with Major Version 2 or higher (VER_REG), always
 performs required drain without software explicitly requesting a drain in
 IOTLB invalidation. This field is deprecated and hardware  will always
 report it as 1 to maintain backward compatibility with software.
"

Remove the code to make the code cleaner.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20230609060514.15154-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-16 16:38:33 +02:00
Yanfei Xu
3f13f72787 iommu/vt-d: Remove two WARN_ON in domain_context_mapping_one()
Remove the WARN_ON(did == 0) as the domain id 0 is reserved and
set once the domain_ids is allocated. So iommu_init_domains will
never return 0.

Remove the WARN_ON(!table) as this pointer will be accessed in
the following code, if empty "table" really happens, the kernel
will report a NULL pointer reference warning at the first place.

Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
Link: https://lore.kernel.org/r/20230605112659.308981-3-yanfei.xu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-16 16:38:32 +02:00
Yanfei Xu
a0e9911ac1 iommu/vt-d: Handle the failure case of dmar_reenable_qi()
dmar_reenable_qi() may not succeed. Check and return when it fails.

Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
Link: https://lore.kernel.org/r/20230605112659.308981-2-yanfei.xu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-16 16:38:32 +02:00
Suhui
82d9654f92 iommu/vt-d: Remove unnecessary (void*) conversions
No need cast (void*) to (struct root_entry *).

Signed-off-by: Suhui <suhui@nfschina.com>
Link: https://lore.kernel.org/r/20230425033743.75986-1-suhui@nfschina.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-16 16:38:31 +02:00
Robin Murphy
a4fdd97622 iommu: Use flush queue capability
It remains really handy to have distinct DMA domain types within core
code for the sake of default domain policy selection, but we can now
hide that detail from drivers by using the new capability instead.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Jerry Snitselaar <jsnitsel@redhat.com> # amd, intel, smmu-v3
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1c552d99e8ba452bdac48209fa74c0bdd52fd9d9.1683233867.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-22 17:38:45 +02:00
Robin Murphy
4a20ce0ff6 iommu: Add a capability for flush queue support
Passing a special type to domain_alloc to indirectly query whether flush
queues are a worthwhile optimisation with the given driver is a bit
clunky, and looking increasingly anachronistic. Let's put that into an
explicit capability instead.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Jerry Snitselaar <jsnitsel@redhat.com> # amd, intel, smmu-v3
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/f0086a93dbccb92622e1ace775846d81c1c4b174.1683233867.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-22 17:38:44 +02:00
Joerg Roedel
e51b419839 Merge branches 'iommu/fixes', 'arm/allwinner', 'arm/exynos', 'arm/mediatek', 'arm/omap', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'ppc/pamu', 'unisoc', 'x86/vt-d', 'x86/amd', 'core' and 'platform-remove_new' into next 2023-04-14 13:45:50 +02:00
Tina Zhang
cbf2f9e8ba iommu/vt-d: Remove BUG_ON in map/unmap()
Domain map/unmap with invalid parameters shouldn't crash the kernel.
Therefore, using if() replaces the BUG_ON.

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-6-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:52 +02:00
Tina Zhang
998d4c2db3 iommu/vt-d: Remove BUG_ON when domain->pgd is NULL
When performing domain_context_mapping or getting dma_pte of a pfn, the
availability of the domain page table directory is ensured. Therefore,
the domain->pgd checkings are unnecessary.

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-5-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:52 +02:00
Tina Zhang
4a627a2593 iommu/vt-d: Remove BUG_ON in handling iotlb cache invalidation
VT-d iotlb cache invalidation request with unexpected type is considered
as a bug to developers, which can be fixed. So, when such kind of issue
comes out, it needs to be reported through the kernel log, instead of
halting the system. Replacing BUG_ON with warning reporting.

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-4-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:51 +02:00
Tina Zhang
35dc5d8998 iommu/vt-d: Remove BUG_ON on checking valid pfn range
When encountering an unexpected invalid pfn range, the kernel should
attempt recovery and proceed with execution. Therefore, using WARN_ON to
replace BUG_ON to avoid halting the machine.

Besides, one redundant checking is reduced.

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-3-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:51 +02:00
Tina Zhang
b31064f881 iommu/vt-d: Make size of operands same in bitwise operations
This addresses the following issue reported by klocwork tool:

 - operands of different size in bitwise operations

Suggested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406065944.2773296-2-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:50 +02:00
Jacob Pan
a7050fbde3 iommu/vt-d: Use non-privileged mode for all PASIDs
Supervisor Request Enable (SRE) bit in a PASID entry is for permission
checking on DMA requests. When SRE = 0, DMA with supervisor privilege
will be blocked. However, for in-kernel DMA this is not necessary in that
we are targeting kernel memory anyway. There's no need to differentiate
user and kernel for in-kernel DMA.

Let's use non-privileged (user) permission for all PASIDs used in kernel,
it will be consistent with DMA without PASID (RID_PASID) as well.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230331231137.1947675-2-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:49 +02:00
Lu Baolu
7b8aa998d6 iommu/vt-d: Remove unnecessary checks in iopf disabling path
iommu_unregister_device_fault_handler() and iopf_queue_remove_device()
are called after device has stopped issuing new page falut requests and
all outstanding page requests have been drained. They should never fail.
Trigger a warning if it happens unfortunately.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:48 +02:00
Lu Baolu
fbcde5bb92 iommu/vt-d: Move PRI handling to IOPF feature path
PRI is only used for IOPF. With this move, the PCI/PRI feature could be
controlled by the device driver through iommu_dev_enable/disable_feature()
interfaces.

Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:48 +02:00
Lu Baolu
5ae4008055 iommu/vt-d: Move pfsid and ats_qdep calculation to device probe path
They should be part of the per-device iommu private data initialization.

Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:47 +02:00
Lu Baolu
3d4c7cc3d1 iommu/vt-d: Move iopf code from SVA to IOPF enabling path
Generally enabling IOMMU_DEV_FEAT_SVA requires IOMMU_DEV_FEAT_IOPF, but
some devices manage I/O Page Faults themselves instead of relying on the
IOMMU. Move IOPF related code from SVA to IOPF enabling path.

For the device drivers that relies on the IOMMU for IOPF through PCI/PRI,
IOMMU_DEV_FEAT_IOPF must be enabled before and disabled after
IOMMU_DEV_FEAT_SVA.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:47 +02:00
Lu Baolu
a86fb77173 iommu/vt-d: Allow SVA with device-specific IOPF
Currently enabling SVA requires IOPF support from the IOMMU and device
PCI PRI. However, some devices can handle IOPF by itself without ever
sending PCI page requests nor advertising PRI capability.

Allow SVA support with IOPF handled either by IOMMU (PCI PRI) or device
driver (device-specific IOPF). As long as IOPF could be handled, SVA
should continue to work.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230324120234.313643-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-13 12:05:46 +02:00
Jacob Pan
fffaed1e24 iommu/ioasid: Rename INVALID_IOASID
INVALID_IOASID and IOMMU_PASID_INVALID are duplicated. Rename
INVALID_IOASID and consolidate since we are moving away from IOASID
infrastructure.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230322200803.869130-7-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:03:27 +02:00
Jacob Pan
760f41d182 iommu/vt-d: Remove virtual command interface
Virtual command interface was introduced to allow using host PASIDs
inside VMs. It is unused and abandoned due to architectural change.

With this patch, we can safely remove this feature and the related helpers.

Link: https://lore.kernel.org/r/20230210230206.3160144-2-jacob.jun.pan@linux.intel.com
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230322200803.869130-2-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31 10:03:21 +02:00
Lu Baolu
c33fcc13ee iommu: Use sysfs_emit() for sysfs show
Use sysfs_emit() instead of the sprintf() for sysfs entries. sysfs_emit()
knows the maximum of the temporary buffer used for outputting sysfs
content and avoids overrunning the buffer length.

Prefer 'long long' over 'long long int' as suggested by checkpatch.pl.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230322123421.278852-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22 15:47:10 +01:00
Linus Torvalds
143c7bc649 iommufd for 6.3
Some polishing and small fixes for iommufd:
 
 - Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt subsystem
 
 - Use GFP_KERNEL_ACCOUNT inside the iommu_domains
 
 - Support VFIO_NOIOMMU mode with iommufd
 
 - Various typos
 
 - A list corruption bug if HWPTs are used for attach
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY/TgzQAKCRCFwuHvBreF
 Ya3AAP4/WxTJIbDvtTyH3Fae3NxTdO8j8gsUvU1vrRYG83zdnAEAxd1yii7GEO8D
 crkeq9D4FUiPAkFnJ64Exw2FHb060Qg=
 =RABK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "Some polishing and small fixes for iommufd:

   - Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt
     subsystem

   - Use GFP_KERNEL_ACCOUNT inside the iommu_domains

   - Support VFIO_NOIOMMU mode with iommufd

   - Various typos

   - A list corruption bug if HWPTs are used for attach"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd: Do not add the same hwpt to the ioas->hwpt_list twice
  iommufd: Make sure to zero vfio_iommu_type1_info before copying to user
  vfio: Support VFIO_NOIOMMU with iommufd
  iommufd: Add three missing structures in ucmd_buffer
  selftests: iommu: Fix test_cmd_destroy_access() call in user_copy
  iommu: Remove IOMMU_CAP_INTR_REMAP
  irq/s390: Add arch_is_isolated_msi() for s390
  iommu/x86: Replace IOMMU_CAP_INTR_REMAP with IRQ_DOMAIN_FLAG_ISOLATED_MSI
  genirq/msi: Rename IRQ_DOMAIN_MSI_REMAP to IRQ_DOMAIN_ISOLATED_MSI
  genirq/irqdomain: Remove unused irq_domain_check_msi_remap() code
  iommufd: Convert to msi_device_has_isolated_msi()
  vfio/type1: Convert to iommu_group_has_isolated_msi()
  iommu: Add iommu_group_has_isolated_msi()
  genirq/msi: Add msi_device_has_isolated_msi()
2023-02-24 14:34:12 -08:00
Joerg Roedel
bedd29d793 Merge branches 'apple/dart', 'arm/exynos', 'arm/renesas', 'arm/smmu', 'x86/vt-d', 'x86/amd' and 'core' into next 2023-02-18 15:43:04 +01:00
Tina Zhang
257ec29074 iommu/vt-d: Allow to use flush-queue when first level is default
Commit 29b3283972 ("iommu/vt-d: Do not use flush-queue when caching-mode
is on") forced default domains to be strict mode as long as IOMMU
caching-mode is flagged. The reason for doing this is that when vIOMMU
uses VT-d caching mode to synchronize shadowing page tables, the strict
mode shows better performance.

However, this optimization is orthogonal to the first-level page table
because the Intel VT-d architecture does not define the caching mode of
the first-level page table. Refer to VT-d spec, section 6.1, "When the
CM field is reported as Set, any software updates to remapping
structures other than first-stage mapping (including updates to not-
present entries or present entries whose programming resulted in
translation faults) requires explicit invalidation of the caches."
Exclude the first-level page table from this optimization.

Generally using first-stage translation in vIOMMU implies nested
translation enabled in the physical IOMMU. In this case the first-stage
page table is wholly captured by the guest. The vIOMMU only needs to
transfer the cache invalidations on vIOMMU to the physical IOMMU.
Forcing the default domain to strict mode will cause more frequent
cache invalidations, resulting in performance degradation. In a real
performance benchmark test measured by iperf receive, the performance
result on Sapphire Rapids 100Gb NIC shows:
w/ this fix ~51 Gbits/s, w/o this fix ~39.3 Gbits/s.

Theoretically a first-stage IOMMU page table can still be shadowed
in absence of the caching mode, e.g. with host write-protecting guest
IOMMU page table to synchronize changed PTEs with the physical
IOMMU page table. In this case the shadowing overhead is decoupled
from emulating IOTLB invalidation then the overhead of the latter part
is solely decided by the frequency of IOTLB invalidations. Hence
allowing guest default dma domain to be lazy can also benefit the
overall performance by reducing the total VM-exit numbers.

Fixes: 29b3283972 ("iommu/vt-d: Do not use flush-queue when caching-mode is on")
Reported-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230214025618.2292889-1-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-16 14:43:05 +01:00
Jacob Pan
16a75bbe48 iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode
Intel IOMMU driver implements IOTLB flush queue with domain selective
or PASID selective invalidations. In this case there's no need to track
IOVA page range and sync IOTLBs, which may cause significant performance
hit.

This patch adds a check to avoid IOVA gather page and IOTLB sync for
the lazy path.

The performance difference on Sapphire Rapids 100Gb NIC is improved by
the following (as measured by iperf send):

w/o this fix~48 Gbits/s. with this fix ~54 Gbits/s

Cc: <stable@vger.kernel.org>
Fixes: 2a2b8eaa5b ("iommu: Handle freelists when using deferred flushing in iommu drivers")
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20230209175330.1783556-1-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-16 14:43:05 +01:00
Lu Baolu
60b1daa3b1 iommu/vt-d: Fix error handling in sva enable/disable paths
Roll back all previous actions in error paths of intel_iommu_enable_sva()
and intel_iommu_disable_sva().

Fixes: d5b9e4bfe0 ("iommu/vt-d: Report prq to io-pgfault framework")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230208051559.700109-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-16 14:43:04 +01:00
Kan Liang
d8a7c0cf05 iommu/vt-d: Enable IOMMU perfmon support
Register and enable an IOMMU perfmon for each active IOMMU device.

The failure of IOMMU perfmon registration doesn't impact other
functionalities of an IOMMU device.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-8-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:09 +01:00
Kan Liang
dc57875866 iommu/vt-d: Support Enhanced Command Interface
The Enhanced Command Register is to submit command and operand of
enhanced commands to DMA Remapping hardware. It can supports up to 256
enhanced commands.

There is a HW register to indicate the availability of all 256 enhanced
commands. Each bit stands for each command. But there isn't an existing
interface to read/write all 256 bits. Introduce the u64 ecmdcap[4] to
store the existence of each enhanced command. Read 4 times to get all of
them in map_iommu().

Add a helper to facilitate an enhanced command launch. Make sure hardware
complete the command. Also add a helper to facilitate the check of PMU
essentials. These helpers will be used later.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-4-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:05 +01:00
Lu Baolu
d82e6ae67a iommu/vt-d: Remove include/linux/intel-svm.h
There's no need to have a public header for Intel SVA implementation.
The device driver should interact with Intel SVA implementation via
the IOMMU generic APIs.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230109014955.147068-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03 11:06:00 +01:00
Jason Gunthorpe
fd9f2a9122 Merge branch 'iommu-memory-accounting' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/joro/iommu intoiommufd/for-next
Jason Gunthorpe says:

====================
iommufd follows the same design as KVM and uses memory cgroups to limit
the amount of kernel memory a iommufd file descriptor can pin down. The
various internal data structures already use GFP_KERNEL_ACCOUNT to charge
its own memory.

However, one of the biggest consumers of kernel memory is the IOPTEs
stored under the iommu_domain and these allocations are not tracked.

This series is the first step in fixing it.

The iommu driver contract already includes a 'gfp' argument to the
map_pages op, allowing iommufd to specify GFP_KERNEL_ACCOUNT and then
having the driver allocate the IOPTE tables with that flag will capture a
significant amount of the allocations.

Update the iommu_map() API to pass in the GFP argument, and fix all call
sites. Replace iommu_map_atomic().

Audit the "enterprise" iommu drivers to make sure they do the right thing.
Intel and S390 ignore the GFP argument and always use GFP_ATOMIC. This is
problematic for iommufd anyhow, so fix it. AMD and ARM SMMUv2/3 are
already correct.

A follow up series will be needed to capture the allocations made when the
iommu_domain itself is allocated, which will complete the job.
====================

* 'iommu-memory-accounting' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/s390: Use GFP_KERNEL in sleepable contexts
  iommu/s390: Push the gfp parameter to the kmem_cache_alloc()'s
  iommu/intel: Use GFP_KERNEL in sleepable contexts
  iommu/intel: Support the gfp argument to the map_pages op
  iommu/intel: Add a gfp parameter to alloc_pgtable_page()
  iommufd: Use GFP_KERNEL_ACCOUNT for iommu_map()
  iommu/dma: Use the gfp parameter in __iommu_dma_alloc_noncontiguous()
  iommu: Add a gfp parameter to iommu_map_sg()
  iommu: Remove iommu_map_atomic()
  iommu: Add a gfp parameter to iommu_map()

Link: https://lore.kernel.org/linux-iommu/0-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-30 13:54:35 -04:00
Jason Gunthorpe
4951eb2623 iommu/intel: Use GFP_KERNEL in sleepable contexts
These contexts are sleepable, so use the proper annotation. The GFP_ATOMIC
was added mechanically in the prior patches.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/8-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-25 11:52:06 +01:00
Jason Gunthorpe
2d4d767659 iommu/intel: Support the gfp argument to the map_pages op
Flow it down to alloc_pgtable_page() via pfn_to_dma_pte() and
__domain_mapping().

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-25 11:52:06 +01:00
Jason Gunthorpe
2552d3a229 iommu/intel: Add a gfp parameter to alloc_pgtable_page()
This is eventually called by iommufd through intel_iommu_map_pages() and
it should not be forced to atomic. Push the GFP_ATOMIC to all callers.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-25 11:52:05 +01:00
Jason Gunthorpe
f188bdb5f1 iommu/x86: Replace IOMMU_CAP_INTR_REMAP with IRQ_DOMAIN_FLAG_ISOLATED_MSI
On x86 platforms when the HW can support interrupt remapping the iommu
driver creates an irq_domain for the IR hardware and creates a child MSI
irq_domain.

When the global irq_remapping_enabled is set, the IR MSI domain is
assigned to the PCI devices (by intel_irq_remap_add_device(), or
amd_iommu_set_pci_msi_domain()) making those devices have the isolated MSI
property.

Due to how interrupt domains work, setting IRQ_DOMAIN_FLAG_ISOLATED_MSI on
the parent IR domain will cause all struct devices attached to it to
return true from msi_device_has_isolated_msi(). This replaces the
IOMMU_CAP_INTR_REMAP flag as all places using IOMMU_CAP_INTR_REMAP also
call msi_device_has_isolated_msi()

Set the flag and delete the cap.

Link: https://lore.kernel.org/r/7-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-11 16:27:23 -04:00
Linus Torvalds
b8fd76f418 IOMMU Updates for Linux v6.2
Including:
 
 	- Core code:
 	  - map/unmap_pages() cleanup
 	  - SVA and IOPF refactoring
 	  - Clean up and document return codes from device/domain
 	    attachment code
 
 	- AMD driver:
 	  - Rework and extend parsing code for ivrs_ioapic, ivrs_hpet
 	    and ivrs_acpihid command line options
 	  - Some smaller cleanups
 
 	- Intel driver:
 	  - Blocking domain support
 	  - Cleanups
 
 	- S390 driver:
 	  - Fixes and improvements for attach and aperture handling
 
 	- PAMU driver:
 	  - Resource leak fix and cleanup
 
 	- Rockchip driver:
 	  - Page table permission bit fix
 
 	- Mediatek driver:
 	  - Improve safety from invalid dts input
 	  - Smaller fixes and improvements
 
 	- Exynos driver:
 	  - Fix driver initialization sequence
 
 	- Sun50i driver:
 	  - Remove IOMMU_DOMAIN_IDENTITY as it has not been working
 	    forever
 	  - Various other fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmOd1PQACgkQK/BELZcB
 GuO7NxAAiwJUO99pTwvqnByzcC783AuE/fqKHDb9DZaN6Cr0VXSbKEwm8Lc2PC00
 2CTwK/zGhy8BKBQnPiooJ+YOMPjE4yhFIF9jr5ASH5AVWv8EEFpo8zIFKAcF5rh/
 c2Y5RIUwsGXuhR7U3lMTw84r39TZG2eHPwTEU6KvEJ1LCOMyD8IBYrZK2rvpGpem
 3swXUfF5bQGAT8LlIFN7p+qsVs6ZtuD40qre3kerjrBtCPUMlxIIV5TJ8oQTecsk
 vKpD51mEVW+rjUKvqui8NDYuPfT76F2FPS37dfA1F36p8dmsMGSrtWngNm73r546
 AmY8Gui6wKsv4Qn7Mxv49f/WZIXzdRTXOKx/zhYvvGxu7keqQIRIWYcLSxqfaGku
 cqJT401Ws1NHmRpx/t90lMH/anY5+kUMRTQG9Iq5ruLhExskd0SJcffa1i7YIGIe
 lPCTDf7MOXfDudR0Dtp87pGZQBaSkrSzZvb7qZY3Bj83WGZnLPpl6Z3N8KbkGzEO
 zNNvv1CtxZnIPrdOaKvfxQlAKiWKxkPRHuqk1TE8hkoNOe5ZgdOSJP5SeCrZ5tEf
 qljPXvDVF9f8CYw7QlfEDnbLnqDMGZpPAGqKPItbaijQLPZx4Jm4dw6+7i9hETIa
 wJ+1R9iAf+qiR0rlqueALKRaI4DjE8RU8yYSDpn2kn0BUOhWmb8=
 =ZM/m
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:
 "Core code:
   - map/unmap_pages() cleanup
   - SVA and IOPF refactoring
   - Clean up and document return codes from device/domain attachment

  AMD driver:
   - Rework and extend parsing code for ivrs_ioapic, ivrs_hpet and
     ivrs_acpihid command line options
   - Some smaller cleanups

  Intel driver:
   - Blocking domain support
   - Cleanups

  S390 driver:
   - Fixes and improvements for attach and aperture handling

  PAMU driver:
   - Resource leak fix and cleanup

  Rockchip driver:
   - Page table permission bit fix

  Mediatek driver:
   - Improve safety from invalid dts input
   - Smaller fixes and improvements

  Exynos driver:
   - Fix driver initialization sequence

  Sun50i driver:
   - Remove IOMMU_DOMAIN_IDENTITY as it has not been working forever
   - Various other fixes"

* tag 'iommu-updates-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (74 commits)
  iommu/mediatek: Fix forever loop in error handling
  iommu/mediatek: Fix crash on isr after kexec()
  iommu/sun50i: Remove IOMMU_DOMAIN_IDENTITY
  iommu/amd: Fix typo in macro parameter name
  iommu/mediatek: Remove unused "mapping" member from mtk_iommu_data
  iommu/mediatek: Improve safety for mediatek,smi property in larb nodes
  iommu/mediatek: Validate number of phandles associated with "mediatek,larbs"
  iommu/mediatek: Add error path for loop of mm_dts_parse
  iommu/mediatek: Use component_match_add
  iommu/mediatek: Add platform_device_put for recovering the device refcnt
  iommu/fsl_pamu: Fix resource leak in fsl_pamu_probe()
  iommu/vt-d: Use real field for indication of first level
  iommu/vt-d: Remove unnecessary domain_context_mapped()
  iommu/vt-d: Rename domain_add_dev_info()
  iommu/vt-d: Rename iommu_disable_dev_iotlb()
  iommu/vt-d: Add blocking domain support
  iommu/vt-d: Add device_block_translation() helper
  iommu/vt-d: Allocate pasid table in device probe path
  iommu/amd: Check return value of mmu_notifier_register()
  iommu/amd: Fix pci device refcount leak in ppr_notifier()
  ...
2022-12-19 08:34:39 -06:00
Linus Torvalds
08cdc21579 iommufd for 6.2
iommufd is the user API to control the IOMMU subsystem as it relates to
 managing IO page tables that point at user space memory.
 
 It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
 container) which is the VFIO specific interface for a similar idea.
 
 We see a broad need for extended features, some being highly IOMMU device
 specific:
  - Binding iommu_domain's to PASID/SSID
  - Userspace IO page tables, for ARM, x86 and S390
  - Kernel bypassed invalidation of user page tables
  - Re-use of the KVM page table in the IOMMU
  - Dirty page tracking in the IOMMU
  - Runtime Increase/Decrease of IOPTE size
  - PRI support with faults resolved in userspace
 
 Many of these HW features exist to support VM use cases - for instance the
 combination of PASID, PRI and Userspace IO Page Tables allows an
 implementation of DMA Shared Virtual Addressing (vSVA) within a
 guest. Dirty tracking enables VM live migration with SRIOV devices and
 PASID support allow creating "scalable IOV" devices, among other things.
 
 As these features are fundamental to a VM platform they need to be
 uniformly exposed to all the driver families that do DMA into VMs, which
 is currently VFIO and VDPA.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY5ct7wAKCRCFwuHvBreF
 YZZ5AQDciXfcgXLt0UBEmWupNb0f/asT6tk717pdsKm8kAZMNAEAsIyLiKT5HqGl
 s7fAu+CQ1pr9+9NKGevD+frw8Solsw4=
 =jJkd
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd implementation from Jason Gunthorpe:
 "iommufd is the user API to control the IOMMU subsystem as it relates
  to managing IO page tables that point at user space memory.

  It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
  container) which is the VFIO specific interface for a similar idea.

  We see a broad need for extended features, some being highly IOMMU
  device specific:
   - Binding iommu_domain's to PASID/SSID
   - Userspace IO page tables, for ARM, x86 and S390
   - Kernel bypassed invalidation of user page tables
   - Re-use of the KVM page table in the IOMMU
   - Dirty page tracking in the IOMMU
   - Runtime Increase/Decrease of IOPTE size
   - PRI support with faults resolved in userspace

  Many of these HW features exist to support VM use cases - for instance
  the combination of PASID, PRI and Userspace IO Page Tables allows an
  implementation of DMA Shared Virtual Addressing (vSVA) within a guest.
  Dirty tracking enables VM live migration with SRIOV devices and PASID
  support allow creating "scalable IOV" devices, among other things.

  As these features are fundamental to a VM platform they need to be
  uniformly exposed to all the driver families that do DMA into VMs,
  which is currently VFIO and VDPA"

For more background, see the extended explanations in Jason's pull request:

  https://lore.kernel.org/lkml/Y5dzTU8dlmXTbzoJ@nvidia.com/

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (62 commits)
  iommufd: Change the order of MSI setup
  iommufd: Improve a few unclear bits of code
  iommufd: Fix comment typos
  vfio: Move vfio group specific code into group.c
  vfio: Refactor dma APIs for emulated devices
  vfio: Wrap vfio group module init/clean code into helpers
  vfio: Refactor vfio_device open and close
  vfio: Make vfio_device_open() truly device specific
  vfio: Swap order of vfio_device_container_register() and open_device()
  vfio: Set device->group in helper function
  vfio: Create wrappers for group register/unregister
  vfio: Move the sanity check of the group to vfio_create_group()
  vfio: Simplify vfio_create_group()
  iommufd: Allow iommufd to supply /dev/vfio/vfio
  vfio: Make vfio_container optionally compiled
  vfio: Move container related MODULE_ALIAS statements into container.c
  vfio-iommufd: Support iommufd for emulated VFIO devices
  vfio-iommufd: Support iommufd for physical VFIO devices
  vfio-iommufd: Allow iommufd to be used in place of a container fd
  vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent()
  ...
2022-12-14 09:15:43 -08:00
Joerg Roedel
e3eca2e4f6 Merge branches 'arm/allwinner', 'arm/exynos', 'arm/mediatek', 'arm/rockchip', 'arm/smmu', 'ppc/pamu', 's390', 'x86/vt-d', 'x86/amd' and 'core' into next 2022-12-12 12:50:53 +01:00
Jacob Pan
81c95fbaeb iommu/vt-d: Fix buggy QAT device mask
Impacted QAT device IDs that need extra dtlb flush quirk is ranging
from 0x4940 to 0x4943. After bitwise AND device ID with 0xfffc the
result should be 0x4940 instead of 0x494c to identify these devices.

Fixes: e65a6897be ("iommu/vt-d: Add a fix for devices need extra dtlb flush")
Reported-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20221203005610.2927487-1-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-05 14:27:03 +01:00
Jason Gunthorpe
90337f526c Merge tag 'v6.1-rc7' into iommufd.git for-next
Resolve conflicts in drivers/vfio/vfio_main.c by using the iommfd version.
The rc fix was done a different way when iommufd patches reworked this
code.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02 12:04:39 -04:00
Xiongfeng Wang
afca9e19cc iommu/vt-d: Fix PCI device refcount leak in has_external_pci()
for_each_pci_dev() is implemented by pci_get_device(). The comment of
pci_get_device() says that it will increase the reference count for the
returned pci_dev and also decrease the reference count for the input
pci_dev @from if it is not NULL.

If we break for_each_pci_dev() loop with pdev not NULL, we need to call
pci_dev_put() to decrease the reference count. Add the missing
pci_dev_put() before 'return true' to avoid reference count leak.

Fixes: 89a6079df7 ("iommu/vt-d: Force IOMMU on for platform opt in hint")
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Link: https://lore.kernel.org/r/20221121113649.190393-2-wangxiongfeng2@huawei.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-02 11:45:32 +01:00
Jacob Pan
e65a6897be iommu/vt-d: Add a fix for devices need extra dtlb flush
QAT devices on Intel Sapphire Rapids and Emerald Rapids have a defect in
address translation service (ATS). These devices may inadvertently issue
ATS invalidation completion before posted writes initiated with
translated address that utilized translations matching the invalidation
address range, violating the invalidation completion ordering.

This patch adds an extra device TLB invalidation for the affected devices,
it is needed to ensure no more posted writes with translated address
following the invalidation completion. Therefore, the ordering is
preserved and data-corruption is prevented.

Device TLBs are invalidated under the following six conditions:
1. Device driver does DMA API unmap IOVA
2. Device driver unbind a PASID from a process, sva_unbind_device()
3. PASID is torn down, after PASID cache is flushed. e.g. process
exit_mmap() due to crash
4. Under SVA usage, called by mmu_notifier.invalidate_range() where
VM has to free pages that were unmapped
5. userspace driver unmaps a DMA buffer
6. Cache invalidation in vSVA usage (upcoming)

For #1 and #2, device drivers are responsible for stopping DMA traffic
before unmap/unbind. For #3, iommu driver gets mmu_notifier to
invalidate TLB the same way as normal user unmap which will do an extra
invalidation. The dTLB invalidation after PASID cache flush does not
need an extra invalidation.

Therefore, we only need to deal with #4 and #5 in this patch. #1 is also
covered by this patch due to common code path with #5.

Tested-by: Yuzhang Luo <yuzhang.luo@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/20221130062449.1360063-1-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-02 11:45:31 +01:00
Jason Gunthorpe
4989764d8e iommu: Add IOMMU_CAP_ENFORCE_CACHE_COHERENCY
This queries if a domain linked to a device should expect to support
enforce_cache_coherency() so iommufd can negotiate the rules for when a
domain should be shared or not.

For iommufd a device that declares IOMMU_CAP_ENFORCE_CACHE_COHERENCY will
not be attached to a domain that does not support it.

Link: https://lore.kernel.org/r/1-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yu He <yu.he@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-29 16:34:15 -04:00
Lu Baolu
e5b0feb436 iommu/vt-d: Use real field for indication of first level
The dmar_domain uses bit field members to indicate the behaviors. Add
a bit field for using first level and remove the flags member to avoid
duplication.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-22 14:05:22 +01:00
Lu Baolu
b1cf1563f3 iommu/vt-d: Remove unnecessary domain_context_mapped()
The device_domain_info::domain accurately records the domain attached to
the device. It is unnecessary to check whether the context is present in
the attach_dev path. Remove it to make the code neat.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-22 14:05:22 +01:00
Lu Baolu
a8204479f2 iommu/vt-d: Rename domain_add_dev_info()
dmar_domain_attach_device() is more meaningful according to what this
helper does.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-22 14:05:21 +01:00
Lu Baolu
ba502132f5 iommu/vt-d: Rename iommu_disable_dev_iotlb()
Rename iommu_disable_dev_iotlb() to iommu_disable_pci_caps() to pair with
iommu_enable_pci_caps().

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-22 14:05:21 +01:00
Lu Baolu
35a99c54dd iommu/vt-d: Add blocking domain support
The Intel IOMMU hardwares support blocking DMA transactions by clearing
the translation table entries. This implements a real blocking domain to
avoid using an empty UNMANAGED domain. The detach_dev callback of the
domain ops is not used in any path. Remove it to avoid dead code as well.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-22 14:05:21 +01:00
Lu Baolu
c7be17c290 iommu/vt-d: Add device_block_translation() helper
If domain attaching to device fails, the IOMMU driver should bring the
device to blocking DMA state. The upper layer is expected to recover it
by attaching a new domain. Use device_block_translation() in the error
path of dev_attach to make the behavior specific.

The difference between device_block_translation() and the previous
dmar_remove_one_dev_info() is that, in the scalable mode, it is the
RID2PASID entry instead of context entry being cleared. As a result,
enabling PCI capabilities is moved up.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-22 14:05:20 +01:00
Lu Baolu
ec62b44241 iommu/vt-d: Allocate pasid table in device probe path
Whether or not a domain is attached to the device, the pasid table should
always be valid as long as it has been probed. This moves the pasid table
allocation from the domain attaching device path to device probe path and
frees it in the device release path.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-22 14:05:20 +01:00
Tina Zhang
242b0aaeab iommu/vt-d: Preset Access bit for IOVA in FL non-leaf paging entries
The A/D bits are preseted for IOVA over first level(FL) usage for both
kernel DMA (i.e, domain typs is IOMMU_DOMAIN_DMA) and user space DMA
usage (i.e., domain type is IOMMU_DOMAIN_UNMANAGED).

Presetting A bit in FL requires to preset the bit in every related paging
entries, including the non-leaf ones. Otherwise, hardware may treat this
as an error. For example, in a case of ECAP_REG.SMPWC==0, DMA faults might
occur with below DMAR fault messages (wrapped for line length) dumped.

 DMAR: DRHD: handling fault status reg 2
 DMAR: [DMA Read NO_PASID] Request device [aa:00.0] fault addr 0x10c3a6000
    [fault reason 0x90]
    SM: A/D bit update needed in first-level entry when set up in no snoop

Fixes: 289b3b005c ("iommu/vt-d: Preset A/D bits for user space DMA usage")
Cc: stable@vger.kernel.org
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20221113010324.1094483-1-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20221116051544.26540-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-19 10:46:51 +01:00
Joerg Roedel
69e61edebe iommu: Define EINVAL as device/domain incompatibility
This series is to replace the previous EMEDIUMTYPE patch in a VFIO series:
 https://lore.kernel.org/kvm/Yxnt9uQTmbqul5lf@8bytes.org/
 
 The purpose is to regulate all existing ->attach_dev callback functions to
 use EINVAL exclusively for an incompatibility error between a device and a
 domain. This allows VFIO and IOMMUFD to detect such a soft error, and then
 try a different domain with the same device.
 
 Among all the patches, the first two are preparatory changes. And then one
 patch to update kdocs and another three patches for the enforcement
 effort.
 
 Link: https://lore.kernel.org/r/cover.1666042872.git.nicolinc@nvidia.com
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY2JjUQAKCRCFwuHvBreF
 YaFbAP492zvOEaZaRxiK4XcdsU1ZBCovB/2Keh/QIQdb7Ig6hgD/dW7TygTP1+4a
 Oqpcu/6aLeHvhayfZt1142S3e0HuHwU=
 =g5C+
 -----END PGP SIGNATURE-----

Merge tag 'for-joerg' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd into core

iommu: Define EINVAL as device/domain incompatibility

This series is to replace the previous EMEDIUMTYPE patch in a VFIO series:
https://lore.kernel.org/kvm/Yxnt9uQTmbqul5lf@8bytes.org/

The purpose is to regulate all existing ->attach_dev callback functions to
use EINVAL exclusively for an incompatibility error between a device and a
domain. This allows VFIO and IOMMUFD to detect such a soft error, and then
try a different domain with the same device.

Among all the patches, the first two are preparatory changes. And then one
patch to update kdocs and another three patches for the enforcement
effort.

Link: https://lore.kernel.org/r/cover.1666042872.git.nicolinc@nvidia.com
2022-11-03 15:51:48 +01:00
Lu Baolu
757636ed26 iommu: Rename iommu-sva-lib.{c,h}
Rename iommu-sva-lib.c[h] to iommu-sva.c[h] as it contains all code
for SVA implementation in iommu core.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-14-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-03 15:47:54 +01:00
Lu Baolu
1c263576f4 iommu: Remove SVA related callbacks from iommu ops
These ops'es have been deprecated. There's no need for them anymore.
Remove them to avoid dead code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-11-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-03 15:47:51 +01:00
Lu Baolu
eaca8889a1 iommu/vt-d: Add SVA domain support
Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops. This implementation is based on the existing SVA
code. Possible cleanup and refactoring are left for incremental
changes later.

The VT-d driver will also need to support setting a DMA domain to a
PASID of device. Current SVA implementation uses different data
structures to track the domain and device PASID relationship. That's
the reason why we need to check the domain type in remove_dev_pasid
callback. Eventually we'll consolidate the data structures and remove
the need of domain type check.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-03 15:47:49 +01:00
Nicolin Chen
f4a1477357 iommu: Use EINVAL for incompatible device/domain in ->attach_dev
Following the new rules in include/linux/iommu.h kdocs, update all drivers
->attach_dev callback functions to return EINVAL in the failure paths that
are related to domain incompatibility.

Also, drop adjacent error prints to prevent a kernel log spam.

Link: https://lore.kernel.org/r/f52a07f7320da94afe575c9631340d0019a203a7.1666042873.git.nicolinc@nvidia.com
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-01 14:39:59 -03:00
Jerry Snitselaar
620bf9f981 iommu/vt-d: Clean up si_domain in the init_dmars() error path
A splat from kmem_cache_destroy() was seen with a kernel prior to
commit ee2653bbe8 ("iommu/vt-d: Remove domain and devinfo mempool")
when there was a failure in init_dmars(), because the iommu_domain
cache still had objects. While the mempool code is now gone, there
still is a leak of the si_domain memory if init_dmars() fails. So
clean up si_domain in the init_dmars() error path.

Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Fixes: 86080ccc22 ("iommu/vt-d: Allocate si_domain in init_dmars()")
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20221010144842.308890-1-jsnitsel@redhat.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-21 10:49:35 +02:00
Lu Baolu
bf638a6513 iommu/vt-d: Use rcu_lock in get_resv_regions
Commit 5f64ce5411 ("iommu/vt-d: Duplicate iommu_resv_region objects
per device list") converted rcu_lock in get_resv_regions to
dmar_global_lock to allow sleeping in iommu_alloc_resv_region(). This
introduced possible recursive locking if get_resv_regions is called from
within a section where intel_iommu_init() already holds dmar_global_lock.

Especially, after commit 57365a04c9 ("iommu: Move bus setup to IOMMU
device registration"), below lockdep splats could always be seen.

 ============================================
 WARNING: possible recursive locking detected
 6.0.0-rc4+ #325 Tainted: G          I
 --------------------------------------------
 swapper/0/1 is trying to acquire lock:
 ffffffffa8a18c90 (dmar_global_lock){++++}-{3:3}, at:
 intel_iommu_get_resv_regions+0x25/0x270

 but task is already holding lock:
 ffffffffa8a18c90 (dmar_global_lock){++++}-{3:3}, at:
 intel_iommu_init+0x36d/0x6ea

 ...

 Call Trace:
  <TASK>
  dump_stack_lvl+0x48/0x5f
  __lock_acquire.cold.73+0xad/0x2bb
  lock_acquire+0xc2/0x2e0
  ? intel_iommu_get_resv_regions+0x25/0x270
  ? lock_is_held_type+0x9d/0x110
  down_read+0x42/0x150
  ? intel_iommu_get_resv_regions+0x25/0x270
  intel_iommu_get_resv_regions+0x25/0x270
  iommu_create_device_direct_mappings.isra.28+0x8d/0x1c0
  ? iommu_get_dma_cookie+0x6d/0x90
  bus_iommu_probe+0x19f/0x2e0
  iommu_device_register+0xd4/0x130
  intel_iommu_init+0x3e1/0x6ea
  ? iommu_setup+0x289/0x289
  ? rdinit_setup+0x34/0x34
  pci_iommu_init+0x12/0x3a
  do_one_initcall+0x65/0x320
  ? rdinit_setup+0x34/0x34
  ? rcu_read_lock_sched_held+0x5a/0x80
  kernel_init_freeable+0x28a/0x2f3
  ? rest_init+0x1b0/0x1b0
  kernel_init+0x1a/0x130
  ret_from_fork+0x1f/0x30
  </TASK>

This rolls back dmar_global_lock to rcu_lock in get_resv_regions to avoid
the lockdep splat.

Fixes: 57365a04c9 ("iommu: Move bus setup to IOMMU device registration")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20220927053109.4053662-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-21 10:49:34 +02:00
Lu Baolu
0251d0107c iommu: Add gfp parameter to iommu_alloc_resv_region
Add gfp parameter to iommu_alloc_resv_region() for the callers to specify
the memory allocation behavior. Thus iommu_alloc_resv_region() could also
be available in critical contexts.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20220927053109.4053662-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-10-21 10:49:32 +02:00
Joerg Roedel
38713c6028 Merge branches 'apple/dart', 'arm/mediatek', 'arm/omap', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next 2022-09-26 15:52:31 +02:00
Lu Baolu
6ad931a232 iommu/vt-d: Avoid unnecessary global DMA cache invalidation
Some VT-d hardware implementations invalidate all DMA remapping hardware
translation caches as part of SRTP flow. The VT-d spec adds a ESRTPS
(Enhanced Set Root Table Pointer Support, section 11.4.2 in VT-d spec)
capability bit to indicate this. With this bit set, software has no need
to issue the global invalidation request.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220919062523.3438951-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-26 15:52:26 +02:00
Yi Liu
b722cb32f0 iommu/vt-d: Rename cap_5lp_support to cap_fl5lp_support
This renaming better describes it is for first level page table (a.k.a
first stage page table since VT-d spec 3.4).

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220916071326.2223901-1-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-26 15:52:25 +02:00
Lu Baolu
0faa19a151 iommu/vt-d: Decouple PASID & PRI enabling from SVA
Previously the PCI PASID and PRI capabilities are enabled in the path of
iommu device probe only if INTEL_IOMMU_SVM is configured and the device
supports ATS. As we've already decoupled the I/O page fault handler from
SVA, we could also decouple PASID and PRI enabling from it to make room
for growth of new features like kernel DMA with PASID, SIOV and nested
translation.

At the same time, the iommu_enable_dev_iotlb() helper is also called in
iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) path. It's unnecessary
and duplicate. This cleanups this helper to make the code neat.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20220915085814.2261409-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-26 15:52:24 +02:00
Yi Liu
1548978070 iommu/vt-d: Check correct capability for sagaw determination
Check 5-level paging capability for 57 bits address width instead of
checking 1GB large page capability.

Fixes: 53fc7ad6ed ("iommu/vt-d: Correctly calculate sagaw value of IOMMU")
Cc: stable@vger.kernel.org
Reported-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com>
Link: https://lore.kernel.org/r/20220916071212.2223869-2-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-21 10:22:54 +02:00
Lu Baolu
7ebb5f8e00 Revert "iommu/vt-d: Fix possible recursive locking in intel_iommu_init()"
This reverts commit 9cd4f14344.

Some issues were reported on the original commit. Some thunderbolt devices
don't work anymore due to the following DMA fault.

DMAR: DRHD: handling fault status reg 2
DMAR: [INTR-REMAP] Request device [09:00.0] fault index 0x8080
      [fault reason 0x25]
      Blocked a compatibility format interrupt request

Bring it back for now to avoid functional regression.

Fixes: 9cd4f14344 ("iommu/vt-d: Fix possible recursive locking in intel_iommu_init()")
Link: https://lore.kernel.org/linux-iommu/485A6EA5-6D58-42EA-B298-8571E97422DE@getmailspring.com/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216497
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: <stable@vger.kernel.org> # 5.19.x
Reported-and-tested-by: George Hilliard <thirtythreeforty@gmail.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220920081701.3453504-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-21 10:22:54 +02:00
Lu Baolu
9cd4f14344 iommu/vt-d: Fix possible recursive locking in intel_iommu_init()
The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac
("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
is used to protect DMAR related global data from DMAR hotplug operations.

The dmar_global_lock used in the intel_iommu_init() might cause recursive
locking issue, for example, intel_iommu_get_resv_regions() is taking the
dmar_global_lock from within a section where intel_iommu_init() already
holds it via probe_acpi_namespace_devices().

Using dmar_global_lock in intel_iommu_init() could be relaxed since it is
unlikely that any IO board must be hot added before the IOMMU subsystem is
initialized. This eliminates the possible recursive locking issue by moving
down DMAR hotplug support after the IOMMU is initialized and removing the
uses of dmar_global_lock in intel_iommu_init().

Fixes: d5692d4af0 ("iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()")
Reported-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/894db0ccae854b35c73814485569b634237b5538.1657034828.git.robin.murphy@arm.com
Link: https://lore.kernel.org/r/20220718235325.3952426-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-11 08:19:24 +02:00
Joerg Roedel
7f34891b15 Merge branch 'iommu/fixes' into core 2022-09-09 09:27:09 +02:00
Robin Murphy
f2042ed21d iommu/dma: Make header private
Now that dma-iommu.h only contains internal interfaces, make it
private to the IOMMU subsytem.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/b237e06c56a101f77af142a54b629b27aa179d22.1660668998.git.robin.murphy@arm.com
[ joro : re-add stub for iommu_dma_get_resv_regions ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-09 09:26:22 +02:00
Lu Baolu
35bf49e054 iommu/vt-d: Fix lockdep splat due to klist iteration in atomic context
With CONFIG_INTEL_IOMMU_DEBUGFS enabled, below lockdep splat are seen
when an I/O fault occurs on a machine with an Intel IOMMU in it.

 DMAR: DRHD: handling fault status reg 3
 DMAR: [DMA Write NO_PASID] Request device [00:1a.0] fault addr 0x0
       [fault reason 0x05] PTE Write access is not set
 DMAR: Dump dmar0 table entries for IOVA 0x0
 DMAR: root entry: 0x0000000127f42001
 DMAR: context entry: hi 0x0000000000001502, low 0x000000012d8ab001
 ================================
 WARNING: inconsistent lock state
 5.20.0-0.rc0.20220812git7ebfc85e2cd7.10.fc38.x86_64 #1 Not tainted
 --------------------------------
 inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
 rngd/1006 [HC1[1]:SC0[0]:HE0:SE1] takes:
 ff177021416f2d78 (&k->k_lock){?.+.}-{2:2}, at: klist_next+0x1b/0x160
 {HARDIRQ-ON-W} state was registered at:
   lock_acquire+0xce/0x2d0
   _raw_spin_lock+0x33/0x80
   klist_add_tail+0x46/0x80
   bus_add_device+0xee/0x150
   device_add+0x39d/0x9a0
   add_memory_block+0x108/0x1d0
   memory_dev_init+0xe1/0x117
   driver_init+0x43/0x4d
   kernel_init_freeable+0x1c2/0x2cc
   kernel_init+0x16/0x140
   ret_from_fork+0x1f/0x30
 irq event stamp: 7812
 hardirqs last  enabled at (7811): [<ffffffff85000e86>] asm_sysvec_apic_timer_interrupt+0x16/0x20
 hardirqs last disabled at (7812): [<ffffffff84f16894>] irqentry_enter+0x54/0x60
 softirqs last  enabled at (7794): [<ffffffff840ff669>] __irq_exit_rcu+0xf9/0x170
 softirqs last disabled at (7787): [<ffffffff840ff669>] __irq_exit_rcu+0xf9/0x170

The klist iterator functions using spin_*lock_irq*() but the klist
insertion functions using spin_*lock(), combined with the Intel DMAR
IOMMU driver iterating over klists from atomic (hardirq) context, where
pci_get_domain_bus_and_slot() calls into bus_find_device() which iterates
over klists.

As currently there's no plan to fix the klist to make it safe to use in
atomic context, this fixes the lockdep splat by avoid calling
pci_get_domain_bus_and_slot() in the hardirq context.

Fixes: 8ac0b64b97 ("iommu/vt-d: Use pci_get_domain_bus_and_slot() in pgtable_walk()")
Reported-by: Lennert Buytenhek <buytenh@wantstofly.org>
Link: https://lore.kernel.org/linux-iommu/Yvo2dfpEh%2FWC+Wrr@wantstofly.org/
Link: https://lore.kernel.org/linux-iommu/YvyBdPwrTuHHbn5X@wantstofly.org/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220819015949.4795-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07 15:14:57 +02:00
Lu Baolu
a349ffcb4d iommu/vt-d: Fix recursive lock issue in iommu_flush_dev_iotlb()
The per domain spinlock is acquired in iommu_flush_dev_iotlb(), which
is possbile to be called in the interrupt context. For example, the
drm-intel's CI system got completely blocked with below error:

 WARNING: inconsistent lock state
 6.0.0-rc1-CI_DRM_11990-g6590d43d39b9+ #1 Not tainted
 --------------------------------
 inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
 swapper/6/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
 ffff88810440d678 (&domain->lock){+.?.}-{2:2}, at: iommu_flush_dev_iotlb.part.61+0x23/0x80
 {SOFTIRQ-ON-W} state was registered at:
   lock_acquire+0xd3/0x310
   _raw_spin_lock+0x2a/0x40
   domain_update_iommu_cap+0x20b/0x2c0
   intel_iommu_attach_device+0x5bd/0x860
   __iommu_attach_device+0x18/0xe0
   bus_iommu_probe+0x1f3/0x2d0
   bus_set_iommu+0x82/0xd0
   intel_iommu_init+0xe45/0x102a
   pci_iommu_init+0x9/0x31
   do_one_initcall+0x53/0x2f0
   kernel_init_freeable+0x18f/0x1e1
   kernel_init+0x11/0x120
   ret_from_fork+0x1f/0x30
 irq event stamp: 162354
 hardirqs last  enabled at (162354): [<ffffffff81b59274>] _raw_spin_unlock_irqrestore+0x54/0x70
 hardirqs last disabled at (162353): [<ffffffff81b5901b>] _raw_spin_lock_irqsave+0x4b/0x50
 softirqs last  enabled at (162338): [<ffffffff81e00323>] __do_softirq+0x323/0x48e
 softirqs last disabled at (162349): [<ffffffff810c1588>] irq_exit_rcu+0xb8/0xe0
 other info that might help us debug this:
  Possible unsafe locking scenario:
        CPU0
        ----
   lock(&domain->lock);
   <Interrupt>
     lock(&domain->lock);
   *** DEADLOCK ***
 1 lock held by swapper/6/0:

This coverts the spin_lock/unlock() into the irq save/restore varieties
to fix the recursive locking issues.

Fixes: ffd5869d93 ("iommu/vt-d: Replace spin_lock_irqsave() with spin_lock()")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20220817025650.3253959-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07 15:14:56 +02:00
Lu Baolu
53fc7ad6ed iommu/vt-d: Correctly calculate sagaw value of IOMMU
The Intel IOMMU driver possibly selects between the first-level and the
second-level translation tables for DMA address translation. However,
the levels of page-table walks for the 4KB base page size are calculated
from the SAGAW field of the capability register, which is only valid for
the second-level page table. This causes the IOMMU driver to stop working
if the hardware (or the emulated IOMMU) advertises only first-level
translation capability and reports the SAGAW field as 0.

This solves the above problem by considering both the first level and the
second level when calculating the supported page table levels.

Fixes: b802d070a5 ("iommu/vt-d: Use iova over first level")
Cc: stable@vger.kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220817023558.3253263-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07 15:14:56 +02:00
Lu Baolu
0c5f6c0d82 iommu/vt-d: Fix kdump kernels boot failure with scalable mode
The translation table copying code for kdump kernels is currently based
on the extended root/context entry formats of ECS mode defined in older
VT-d v2.5, and doesn't handle the scalable mode formats. This causes
the kexec capture kernel boot failure with DMAR faults if the IOMMU was
enabled in scalable mode by the previous kernel.

The ECS mode has already been deprecated by the VT-d spec since v3.0 and
Intel IOMMU driver doesn't support this mode as there's no real hardware
implementation. Hence this converts ECS checking in copying table code
into scalable mode.

The existing copying code consumes a bit in the context entry as a mark
of copied entry. It needs to work for the old format as well as for the
extended context entries. As it's hard to find such a common bit for both
legacy and scalable mode context entries. This replaces it with a per-
IOMMU bitmap.

Fixes: 7373a8cc38 ("iommu/vt-d: Setup context and enable RID2PASID support")
Cc: stable@vger.kernel.org
Reported-by: Jerry Snitselaar <jsnitsel@redhat.com>
Tested-by: Wen Jin <wen.jin@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220817011035.3250131-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07 15:14:55 +02:00
Robin Murphy
29e932295b iommu: Clean up bus_set_iommu()
Clean up the remaining trivial bus_set_iommu() callsites along
with the implementation. Now drivers only have to know and care
about iommu_device instances, phew!

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> # s390
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> # s390
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/ea383d5f4d74ffe200ab61248e5de6e95846180a.1660572783.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07 14:26:17 +02:00
Robin Murphy
c919739ce4 iommu/vt-d: Handle race between registration and device probe
Currently we rely on registering all our instances before initially
allowing any .probe_device calls via bus_set_iommu(). In preparation for
phasing out the latter, make sure we won't inadvertently return success
for a device associated with a known but not yet registered instance,
otherwise we'll run straight into iommu_group_get_for_dev() trying to
use NULL ops.

That also highlights an issue with intel_iommu_get_resv_regions() taking
dmar_global_lock from within a section where intel_iommu_init() already
holds it, which already exists via probe_acpi_namespace_devices() when
an ANDD device is probed, but gets more obvious with the upcoming change
to iommu_device_register(). Since they are both read locks it manages
not to deadlock in practice, and a more in-depth rework of this locking
is underway, so no attempt is made to address it here.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/579f2692291bcbfc3ac64f7456fcff0d629af131.1660572783.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07 14:25:01 +02:00
Robin Murphy
359ad15763 iommu: Retire iommu_capable()
With all callers now converted to the device-specific version, retire
the old bus-based interface, and give drivers the chance to indicate
accurate per-instance capabilities.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/d8bd8777d06929ad8f49df7fc80e1b9af32a41b5.1660574547.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07 14:16:37 +02:00
Joerg Roedel
c10100a416 Merge branches 'arm/exynos', 'arm/mediatek', 'arm/msm', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next 2022-07-29 12:06:56 +02:00
Lu Baolu
bdb46d1758 iommu/vt-d: Remove global g_iommus array
The g_iommus and g_num_of_iommus is not used anywhere. Remove them to
avoid dead code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220702015610.2849494-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:42 +02:00
Lu Baolu
97a79de99a iommu/vt-d: Remove unnecessary check in intel_iommu_add()
The Intel IOMMU hot-add process starts from dmar_device_hotplug(). It
uses the global dmar_global_lock to synchronize all the hot-add and
hot-remove paths. In the hot-add path, the new IOMMU data structures
are allocated firstly by dmar_parse_one_drhd() and then initialized by
dmar_hp_add_drhd(). All the IOMMU units are allocated and initialized
in the same synchronized path. There is no case where any IOMMU unit
is created and then initialized for multiple times.

This removes the unnecessary check in intel_iommu_add() which is the
last reference place of the global IOMMU array.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220702015610.2849494-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15 10:21:42 +02:00