Merge remote-tracking branch 'drm-misc/drm-misc-next' into msm-next

Backmerge drm-misc-next to pick up some dependencies for drm/msm
patches, in particular:

https://patchwork.freedesktop.org/patch/570219/?series=127251&rev=1
https://patchwork.freedesktop.org/series/123411/

Signed-off-by: Rob Clark <robdclark@chromium.org>
This commit is contained in:
Rob Clark
2023-12-10 10:07:54 -08:00
1031 changed files with 58083 additions and 24250 deletions

View File

@@ -36,8 +36,9 @@ AIC100 DID (0xa100).
AIC100 does not implement FLR (function level reset).
AIC100 implements MSI but does not implement MSI-X. AIC100 requires 17 MSIs to
operate (1 for MHI, 16 for the DMA Bridge).
AIC100 implements MSI but does not implement MSI-X. AIC100 prefers 17 MSIs to
operate (1 for MHI, 16 for the DMA Bridge). Falling back to 1 MSI is possible in
scenarios where reserving 32 MSIs isn't feasible.
As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device
hardware. AIC100 provides 3, 64-bit BARs.
@@ -220,10 +221,14 @@ of the defined channels, and their uses.
+----------------+---------+----------+----------------------------------------+
| QAIC_DEBUG | 18 & 19 | AMSS | Not used. |
+----------------+---------+----------+----------------------------------------+
| QAIC_TIMESYNC | 20 & 21 | SBL/AMSS | Used to synchronize timestamps in the |
| QAIC_TIMESYNC | 20 & 21 | SBL | Used to synchronize timestamps in the |
| | | | device side logs with the host time |
| | | | source. |
+----------------+---------+----------+----------------------------------------+
| QAIC_TIMESYNC | 22 & 23 | AMSS | Used to periodically synchronize |
| _PERIODIC | | | timestamps in the device side logs with|
| | | | the host time source. |
+----------------+---------+----------+----------------------------------------+
DMA Bridge
==========

View File

@@ -10,6 +10,9 @@ accelerator products.
Interrupts
==========
IRQ Storm Mitigation
--------------------
While the AIC100 DMA Bridge hardware implements an IRQ storm mitigation
mechanism, it is still possible for an IRQ storm to occur. A storm can happen
if the workload is particularly quick, and the host is responsive. If the host
@@ -35,6 +38,26 @@ generates 100k IRQs per second (per /proc/interrupts) is reduced to roughly 64
IRQs over 5 minutes while keeping the host system stable, and having the same
workload throughput performance (within run to run noise variation).
Single MSI Mode
---------------
MultiMSI is not well supported on all systems; virtualized ones even less so
(circa 2023). Between hypervisors masking the PCIe MSI capability structure to
large memory requirements for vIOMMUs (required for supporting MultiMSI), it is
useful to be able to fall back to a single MSI when needed.
To support this fallback, we allow the case where only one MSI is able to be
allocated, and share that one MSI between MHI and the DBCs. The device detects
when only one MSI has been configured and directs the interrupts for the DBCs
to the interrupt normally used for MHI. Unfortunately this means that the
interrupt handlers for every DBC and MHI wake up for every interrupt that
arrives; however, the DBC threaded irq handlers only are started when work to be
done is detected (MHI will always start its threaded handler).
If the DBC is configured to force MSI interrupts, this can circumvent the
software IRQ storm mitigation mentioned above. Since the MSI is shared it is
never disabled, allowing each new entry to the FIFO to trigger a new interrupt.
Neural Network Control (NNC) Protocol
=====================================
@@ -70,8 +93,15 @@ commands (does not impact QAIC).
uAPI
====
QAIC creates an accel device per phsyical PCIe device. This accel device exists
for as long as the PCIe device is known to Linux.
The PCIe device may not be in the state to accept requests from userspace at
all times. QAIC will trigger KOBJ_ONLINE/OFFLINE uevents to advertise when the
device can accept requests (ONLINE) and when the device is no longer accepting
requests (OFFLINE) because of a reset or other state transition.
QAIC defines a number of driver specific IOCTLs as part of the userspace API.
This section describes those APIs.
DRM_IOCTL_QAIC_MANAGE
This IOCTL allows userspace to send a NNC request to the QSM. The call will
@@ -178,3 +208,8 @@ overrides this for that call. Default is 5000 (5 seconds).
Sets the polling interval in microseconds (us) when datapath polling is active.
Takes effect at the next polling interval. Default is 100 (100 us).
**timesync_delay_ms (unsigned int)**
Sets the time interval in milliseconds (ms) between two consecutive timesync
operations. Default is 1000 (1000 ms).

View File

@@ -375,9 +375,9 @@ Developer web site of Loongson and LoongArch (Software and Documentation):
Documentation of LoongArch ISA:
https://github.com/loongson/LoongArch-Documentation/releases/latest/download/LoongArch-Vol1-v1.02-CN.pdf (in Chinese)
https://github.com/loongson/LoongArch-Documentation/releases/latest/download/LoongArch-Vol1-v1.10-CN.pdf (in Chinese)
https://github.com/loongson/LoongArch-Documentation/releases/latest/download/LoongArch-Vol1-v1.02-EN.pdf (in English)
https://github.com/loongson/LoongArch-Documentation/releases/latest/download/LoongArch-Vol1-v1.10-EN.pdf (in English)
Documentation of LoongArch ELF psABI:

View File

@@ -77,7 +77,7 @@ Protocol 2.14 BURNT BY INCORRECT COMMIT
Protocol 2.15 (Kernel 5.5) Added the kernel_info and kernel_info.setup_type_max.
============= ============================================================
.. note::
.. note::
The protocol version number should be changed only if the setup header
is changed. There is no need to update the version number if boot_params
or kernel_info are changed. Additionally, it is recommended to use

View File

@@ -153,6 +153,8 @@ NOTE: Some pages, such as DAX pages, cannot be pinned with longterm pins. That's
because DAX pages do not have a separate page cache, and so "pinning" implies
locking down file system blocks, which is not (yet) supported in that way.
.. _mmu-notifier-registration-case:
CASE 3: MMU notifier registration, with or without page faulting hardware
-------------------------------------------------------------------------
Device drivers can pin pages via get_user_pages*(), and register for mmu

View File

@@ -55,6 +55,27 @@ properties:
- port@0
- port@1
vcchdmipll-supply:
description: A 1.8V supply that powers the HDMI PLL.
vcchdmitx-supply:
description: A 1.8V supply that powers the HDMI TX part.
vcclvdspll-supply:
description: A 1.8V supply that powers the LVDS PLL.
vcclvdstx-supply:
description: A 1.8V supply that powers the LVDS TX part.
vccmipirx-supply:
description: A 1.8V supply that powers the MIPI RX part.
vccsysclk-supply:
description: A 1.8V supply that powers the SYSCLK.
vdd-supply:
description: A 1.8V supply that powers the digital part.
required:
- compatible
- reg

View File

@@ -23,6 +23,7 @@ properties:
items:
- enum:
- hannstar,hsd060bhw4
- powkiddy,x55-panel
- const: himax,hx8394
reg: true
@@ -31,6 +32,8 @@ properties:
backlight: true
rotation: true
port: true
vcc-supply:

View File

@@ -16,6 +16,7 @@ properties:
compatible:
items:
- enum:
- ampire,am8001280g
- bananapi,lhr050h41
- feixin,k101-im2byl02
- tdo,tl050hdv35

View File

@@ -21,7 +21,7 @@ properties:
- enum:
- anbernic,rg351v-panel
- anbernic,rg353p-panel
- anbernic,rg353v-panel
- powkiddy,rk2023-panel
- const: newvision,nv3051d
reg: true

View File

@@ -73,6 +73,8 @@ properties:
- auo,t215hvn01
# Shanghai AVIC Optoelectronics 7" 1024x600 color TFT-LCD panel
- avic,tm070ddh03
# BOE BP101WX1-100 10.1" WXGA (1280x800) LVDS panel
- boe,bp101wx1-100
# BOE EV121WXM-N10-1850 12.1" WXGA (1280x800) TFT LCD panel
- boe,ev121wxm-n10-1850
# BOE HV070WSA-100 7.01" WSVGA TFT LCD panel
@@ -144,6 +146,8 @@ properties:
- edt,etmv570g2dhu
# E Ink VB3300-KCA
- eink,vb3300-kca
# Evervision Electronics Co. Ltd. VGG644804 5.7" VGA TFT LCD Panel
- evervision,vgg644804
# Evervision Electronics Co. Ltd. VGG804821 5.0" WVGA TFT LCD Panel
- evervision,vgg804821
# Foxlink Group 5" WVGA TFT LCD panel

View File

@@ -23,6 +23,7 @@ properties:
compatible:
enum:
- ti,am625-dss
- ti,am62a7,dss
- ti,am65x-dss
reg:
@@ -87,6 +88,7 @@ properties:
For AM65x DSS, the OLDI output port node from video port 1.
For AM625 DSS, the internal DPI output port node from video
port 1.
For AM62A7 DSS, the port is tied off inside the SoC.
port@1:
$ref: /schemas/graph.yaml#/properties/port
@@ -108,6 +110,18 @@ properties:
Input memory (from main memory to dispc) bandwidth limit in
bytes per second
allOf:
- if:
properties:
compatible:
contains:
const: ti,am62a7-dss
then:
properties:
ports:
properties:
port@0: false
required:
- compatible
- reg

View File

@@ -29,6 +29,7 @@ properties:
- allwinner,sun50i-a64-mali
- rockchip,rk3036-mali
- rockchip,rk3066-mali
- rockchip,rk3128-mali
- rockchip,rk3188-mali
- rockchip,rk3228-mali
- samsung,exynos4210-mali

View File

@@ -17,6 +17,7 @@ properties:
compatible:
enum:
- brcm,2711-v3d
- brcm,2712-v3d
- brcm,7268-v3d
- brcm,7278-v3d

View File

@@ -0,0 +1,73 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
# Copyright (c) 2023 Imagination Technologies Ltd.
%YAML 1.2
---
$id: http://devicetree.org/schemas/gpu/img,powervr.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Imagination Technologies PowerVR and IMG GPU
maintainers:
- Frank Binns <frank.binns@imgtec.com>
properties:
compatible:
items:
- enum:
- ti,am62-gpu
- const: img,img-axe # IMG AXE GPU model/revision is fully discoverable
reg:
maxItems: 1
clocks:
minItems: 1
maxItems: 3
clock-names:
items:
- const: core
- const: mem
- const: sys
minItems: 1
interrupts:
maxItems: 1
power-domains:
maxItems: 1
required:
- compatible
- reg
- clocks
- clock-names
- interrupts
additionalProperties: false
allOf:
- if:
properties:
compatible:
contains:
const: ti,am62-gpu
then:
properties:
clocks:
maxItems: 1
examples:
- |
#include <dt-bindings/interrupt-controller/irq.h>
#include <dt-bindings/interrupt-controller/arm-gic.h>
#include <dt-bindings/soc/ti,sci_pm_domain.h>
gpu@fd00000 {
compatible = "ti,am62-gpu", "img,img-axe";
reg = <0x0fd00000 0x20000>;
clocks = <&k3_clks 187 0>;
clock-names = "core";
interrupts = <GIC_SPI 86 IRQ_TYPE_LEVEL_HIGH>;
power-domains = <&k3_pds 187 TI_SCI_PD_EXCLUSIVE>;
};

View File

@@ -275,12 +275,12 @@ allOf:
properties:
rx-internal-delay-ps:
description:
RGMII Receive Clock Delay defined in pico seconds.This is used for
RGMII Receive Clock Delay defined in pico seconds. This is used for
controllers that have configurable RX internal delays. If this
property is present then the MAC applies the RX delay.
tx-internal-delay-ps:
description:
RGMII Transmit Clock Delay defined in pico seconds.This is used for
RGMII Transmit Clock Delay defined in pico seconds. This is used for
controllers that have configurable TX internal delays. If this
property is present then the MAC applies the TX delay.

View File

@@ -36,6 +36,7 @@ properties:
- qcom,sm8350-ufshc
- qcom,sm8450-ufshc
- qcom,sm8550-ufshc
- qcom,sm8650-ufshc
- const: qcom,ufshc
- const: jedec,ufs-2.0
@@ -122,6 +123,7 @@ allOf:
- qcom,sm8350-ufshc
- qcom,sm8450-ufshc
- qcom,sm8550-ufshc
- qcom,sm8650-ufshc
then:
properties:
clocks:

View File

@@ -36,7 +36,11 @@ properties:
vdd-supply:
description:
VDD power supply to the hub
3V3 power supply to the hub
vdd2-supply:
description:
1V2 power supply to the hub
peer-hub:
$ref: /schemas/types.yaml#/definitions/phandle
@@ -62,6 +66,7 @@ allOf:
properties:
reset-gpios: false
vdd-supply: false
vdd2-supply: false
peer-hub: false
i2c-bus: false
else:

View File

@@ -521,8 +521,8 @@ examples:
interrupts = <GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 486 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 488 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 489 IRQ_TYPE_LEVEL_HIGH>;
<GIC_SPI 488 IRQ_TYPE_EDGE_BOTH>,
<GIC_SPI 489 IRQ_TYPE_EDGE_BOTH>;
interrupt-names = "hs_phy_irq", "ss_phy_irq",
"dm_hs_phy_irq", "dp_hs_phy_irq";

View File

@@ -41,7 +41,7 @@ examples:
- |
usb {
phys = <&usb2_phy1>, <&usb3_phy1>;
phy-names = "usb";
phy-names = "usb2", "usb3";
#address-cells = <1>;
#size-cells = <0>;

View File

@@ -91,6 +91,10 @@ compatibility checking tool (fsck.erofs), and a debugging tool (dump.erofs):
- git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git
For more information, please also refer to the documentation site:
- https://erofs.docs.kernel.org
Bugs and patches are welcome, please kindly help us and send to the following
linux-erofs mailing list:

View File

@@ -3,9 +3,11 @@ GPU Driver Documentation
========================
.. toctree::
:maxdepth: 3
amdgpu/index
i915
imagination/index
mcde
meson
pl111

View File

@@ -363,6 +363,12 @@ EDID Helper Functions Reference
.. kernel-doc:: drivers/gpu/drm/drm_edid.c
:export:
.. kernel-doc:: include/drm/drm_eld.h
:internal:
.. kernel-doc:: drivers/gpu/drm/drm_eld.c
:export:
SCDC Helper Functions Reference
===============================

View File

@@ -548,6 +548,8 @@ Plane Composition Properties
.. kernel-doc:: drivers/gpu/drm/drm_blend.c
:doc: overview
.. _damage_tracking_properties:
Damage Tracking Properties
--------------------------
@@ -579,6 +581,12 @@ Variable Refresh Properties
.. kernel-doc:: drivers/gpu/drm/drm_connector.c
:doc: Variable refresh properties
Cursor Hotspot Properties
---------------------------
.. kernel-doc:: drivers/gpu/drm/drm_plane.c
:doc: hotspot properties
Existing KMS Properties
-----------------------

View File

@@ -466,6 +466,8 @@ DRM MM Range Allocator Function References
.. kernel-doc:: drivers/gpu/drm/drm_mm.c
:export:
.. _drm_gpuvm:
DRM GPUVM
=========
@@ -481,6 +483,8 @@ Split and Merge
.. kernel-doc:: drivers/gpu/drm/drm_gpuvm.c
:doc: Split and Merge
.. _drm_gpuvm_locking:
Locking
-------
@@ -552,6 +556,12 @@ Overview
.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
:doc: Overview
Flow Control
------------
.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
:doc: Flow Control
Scheduler Function References
-----------------------------

View File

@@ -0,0 +1,582 @@
.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
===============
VM_BIND locking
===============
This document attempts to describe what's needed to get VM_BIND locking right,
including the userptr mmu_notifier locking. It also discusses some
optimizations to get rid of the looping through of all userptr mappings and
external / shared object mappings that is needed in the simplest
implementation. In addition, there is a section describing the VM_BIND locking
required for implementing recoverable pagefaults.
The DRM GPUVM set of helpers
============================
There is a set of helpers for drivers implementing VM_BIND, and this
set of helpers implements much, but not all of the locking described
in this document. In particular, it is currently lacking a userptr
implementation. This document does not intend to describe the DRM GPUVM
implementation in detail, but it is covered in :ref:`its own
documentation <drm_gpuvm>`. It is highly recommended for any driver
implementing VM_BIND to use the DRM GPUVM helpers and to extend it if
common functionality is missing.
Nomenclature
============
* ``gpu_vm``: Abstraction of a virtual GPU address space with
meta-data. Typically one per client (DRM file-private), or one per
execution context.
* ``gpu_vma``: Abstraction of a GPU address range within a gpu_vm with
associated meta-data. The backing storage of a gpu_vma can either be
a GEM object or anonymous or page-cache pages mapped also into the CPU
address space for the process.
* ``gpu_vm_bo``: Abstracts the association of a GEM object and
a VM. The GEM object maintains a list of gpu_vm_bos, where each gpu_vm_bo
maintains a list of gpu_vmas.
* ``userptr gpu_vma or just userptr``: A gpu_vma, whose backing store
is anonymous or page-cache pages as described above.
* ``revalidating``: Revalidating a gpu_vma means making the latest version
of the backing store resident and making sure the gpu_vma's
page-table entries point to that backing store.
* ``dma_fence``: A struct dma_fence that is similar to a struct completion
and which tracks GPU activity. When the GPU activity is finished,
the dma_fence signals. Please refer to the ``DMA Fences`` section of
the :doc:`dma-buf doc </driver-api/dma-buf>`.
* ``dma_resv``: A struct dma_resv (a.k.a reservation object) that is used
to track GPU activity in the form of multiple dma_fences on a
gpu_vm or a GEM object. The dma_resv contains an array / list
of dma_fences and a lock that needs to be held when adding
additional dma_fences to the dma_resv. The lock is of a type that
allows deadlock-safe locking of multiple dma_resvs in arbitrary
order. Please refer to the ``Reservation Objects`` section of the
:doc:`dma-buf doc </driver-api/dma-buf>`.
* ``exec function``: An exec function is a function that revalidates all
affected gpu_vmas, submits a GPU command batch and registers the
dma_fence representing the GPU command's activity with all affected
dma_resvs. For completeness, although not covered by this document,
it's worth mentioning that an exec function may also be the
revalidation worker that is used by some drivers in compute /
long-running mode.
* ``local object``: A GEM object which is only mapped within a
single VM. Local GEM objects share the gpu_vm's dma_resv.
* ``external object``: a.k.a shared object: A GEM object which may be shared
by multiple gpu_vms and whose backing storage may be shared with
other drivers.
Locks and locking order
=======================
One of the benefits of VM_BIND is that local GEM objects share the gpu_vm's
dma_resv object and hence the dma_resv lock. So, even with a huge
number of local GEM objects, only one lock is needed to make the exec
sequence atomic.
The following locks and locking orders are used:
* The ``gpu_vm->lock`` (optionally an rwsem). Protects the gpu_vm's
data structure keeping track of gpu_vmas. It can also protect the
gpu_vm's list of userptr gpu_vmas. With a CPU mm analogy this would
correspond to the mmap_lock. An rwsem allows several readers to walk
the VM tree concurrently, but the benefit of that concurrency most
likely varies from driver to driver.
* The ``userptr_seqlock``. This lock is taken in read mode for each
userptr gpu_vma on the gpu_vm's userptr list, and in write mode during mmu
notifier invalidation. This is not a real seqlock but described in
``mm/mmu_notifier.c`` as a "Collision-retry read-side/write-side
'lock' a lot like a seqcount. However this allows multiple
write-sides to hold it at once...". The read side critical section
is enclosed by ``mmu_interval_read_begin() /
mmu_interval_read_retry()`` with ``mmu_interval_read_begin()``
sleeping if the write side is held.
The write side is held by the core mm while calling mmu interval
invalidation notifiers.
* The ``gpu_vm->resv`` lock. Protects the gpu_vm's list of gpu_vmas needing
rebinding, as well as the residency state of all the gpu_vm's local
GEM objects.
Furthermore, it typically protects the gpu_vm's list of evicted and
external GEM objects.
* The ``gpu_vm->userptr_notifier_lock``. This is an rwsem that is
taken in read mode during exec and write mode during a mmu notifier
invalidation. The userptr notifier lock is per gpu_vm.
* The ``gem_object->gpuva_lock`` This lock protects the GEM object's
list of gpu_vm_bos. This is usually the same lock as the GEM
object's dma_resv, but some drivers protects this list differently,
see below.
* The ``gpu_vm list spinlocks``. With some implementations they are needed
to be able to update the gpu_vm evicted- and external object
list. For those implementations, the spinlocks are grabbed when the
lists are manipulated. However, to avoid locking order violations
with the dma_resv locks, a special scheme is needed when iterating
over the lists.
.. _gpu_vma lifetime:
Protection and lifetime of gpu_vm_bos and gpu_vmas
==================================================
The GEM object's list of gpu_vm_bos, and the gpu_vm_bo's list of gpu_vmas
is protected by the ``gem_object->gpuva_lock``, which is typically the
same as the GEM object's dma_resv, but if the driver
needs to access these lists from within a dma_fence signalling
critical section, it can instead choose to protect it with a
separate lock, which can be locked from within the dma_fence signalling
critical section. Such drivers then need to pay additional attention
to what locks need to be taken from within the loop when iterating
over the gpu_vm_bo and gpu_vma lists to avoid locking-order violations.
The DRM GPUVM set of helpers provide lockdep asserts that this lock is
held in relevant situations and also provides a means of making itself
aware of which lock is actually used: :c:func:`drm_gem_gpuva_set_lock`.
Each gpu_vm_bo holds a reference counted pointer to the underlying GEM
object, and each gpu_vma holds a reference counted pointer to the
gpu_vm_bo. When iterating over the GEM object's list of gpu_vm_bos and
over the gpu_vm_bo's list of gpu_vmas, the ``gem_object->gpuva_lock`` must
not be dropped, otherwise, gpu_vmas attached to a gpu_vm_bo may
disappear without notice since those are not reference-counted. A
driver may implement its own scheme to allow this at the expense of
additional complexity, but this is outside the scope of this document.
In the DRM GPUVM implementation, each gpu_vm_bo and each gpu_vma
holds a reference count on the gpu_vm itself. Due to this, and to avoid circular
reference counting, cleanup of the gpu_vm's gpu_vmas must not be done from the
gpu_vm's destructor. Drivers typically implements a gpu_vm close
function for this cleanup. The gpu_vm close function will abort gpu
execution using this VM, unmap all gpu_vmas and release page-table memory.
Revalidation and eviction of local objects
==========================================
Note that in all the code examples given below we use simplified
pseudo-code. In particular, the dma_resv deadlock avoidance algorithm
as well as reserving memory for dma_resv fences is left out.
Revalidation
____________
With VM_BIND, all local objects need to be resident when the gpu is
executing using the gpu_vm, and the objects need to have valid
gpu_vmas set up pointing to them. Typically, each gpu command buffer
submission is therefore preceded with a re-validation section:
.. code-block:: C
dma_resv_lock(gpu_vm->resv);
// Validation section starts here.
for_each_gpu_vm_bo_on_evict_list(&gpu_vm->evict_list, &gpu_vm_bo) {
validate_gem_bo(&gpu_vm_bo->gem_bo);
// The following list iteration needs the Gem object's
// dma_resv to be held (it protects the gpu_vm_bo's list of
// gpu_vmas, but since local gem objects share the gpu_vm's
// dma_resv, it is already held at this point.
for_each_gpu_vma_of_gpu_vm_bo(&gpu_vm_bo, &gpu_vma)
move_gpu_vma_to_rebind_list(&gpu_vma, &gpu_vm->rebind_list);
}
for_each_gpu_vma_on_rebind_list(&gpu vm->rebind_list, &gpu_vma) {
rebind_gpu_vma(&gpu_vma);
remove_gpu_vma_from_rebind_list(&gpu_vma);
}
// Validation section ends here, and job submission starts.
add_dependencies(&gpu_job, &gpu_vm->resv);
job_dma_fence = gpu_submit(&gpu_job));
add_dma_fence(job_dma_fence, &gpu_vm->resv);
dma_resv_unlock(gpu_vm->resv);
The reason for having a separate gpu_vm rebind list is that there
might be userptr gpu_vmas that are not mapping a buffer object that
also need rebinding.
Eviction
________
Eviction of one of these local objects will then look similar to the
following:
.. code-block:: C
obj = get_object_from_lru();
dma_resv_lock(obj->resv);
for_each_gpu_vm_bo_of_obj(obj, &gpu_vm_bo);
add_gpu_vm_bo_to_evict_list(&gpu_vm_bo, &gpu_vm->evict_list);
add_dependencies(&eviction_job, &obj->resv);
job_dma_fence = gpu_submit(&eviction_job);
add_dma_fence(&obj->resv, job_dma_fence);
dma_resv_unlock(&obj->resv);
put_object(obj);
Note that since the object is local to the gpu_vm, it will share the gpu_vm's
dma_resv lock such that ``obj->resv == gpu_vm->resv``.
The gpu_vm_bos marked for eviction are put on the gpu_vm's evict list,
which is protected by ``gpu_vm->resv``. During eviction all local
objects have their dma_resv locked and, due to the above equality, also
the gpu_vm's dma_resv protecting the gpu_vm's evict list is locked.
With VM_BIND, gpu_vmas don't need to be unbound before eviction,
since the driver must ensure that the eviction blit or copy will wait
for GPU idle or depend on all previous GPU activity. Furthermore, any
subsequent attempt by the GPU to access freed memory through the
gpu_vma will be preceded by a new exec function, with a revalidation
section which will make sure all gpu_vmas are rebound. The eviction
code holding the object's dma_resv while revalidating will ensure a
new exec function may not race with the eviction.
A driver can be implemented in such a way that, on each exec function,
only a subset of vmas are selected for rebind. In this case, all vmas that are
*not* selected for rebind must be unbound before the exec
function workload is submitted.
Locking with external buffer objects
====================================
Since external buffer objects may be shared by multiple gpu_vm's they
can't share their reservation object with a single gpu_vm. Instead
they need to have a reservation object of their own. The external
objects bound to a gpu_vm using one or many gpu_vmas are therefore put on a
per-gpu_vm list which is protected by the gpu_vm's dma_resv lock or
one of the :ref:`gpu_vm list spinlocks <Spinlock iteration>`. Once
the gpu_vm's reservation object is locked, it is safe to traverse the
external object list and lock the dma_resvs of all external
objects. However, if instead a list spinlock is used, a more elaborate
iteration scheme needs to be used.
At eviction time, the gpu_vm_bos of *all* the gpu_vms an external
object is bound to need to be put on their gpu_vm's evict list.
However, when evicting an external object, the dma_resvs of the
gpu_vms the object is bound to are typically not held. Only
the object's private dma_resv can be guaranteed to be held. If there
is a ww_acquire context at hand at eviction time we could grab those
dma_resvs but that could cause expensive ww_mutex rollbacks. A simple
option is to just mark the gpu_vm_bos of the evicted gem object with
an ``evicted`` bool that is inspected before the next time the
corresponding gpu_vm evicted list needs to be traversed. For example, when
traversing the list of external objects and locking them. At that time,
both the gpu_vm's dma_resv and the object's dma_resv is held, and the
gpu_vm_bo marked evicted, can then be added to the gpu_vm's list of
evicted gpu_vm_bos. The ``evicted`` bool is formally protected by the
object's dma_resv.
The exec function becomes
.. code-block:: C
dma_resv_lock(gpu_vm->resv);
// External object list is protected by the gpu_vm->resv lock.
for_each_gpu_vm_bo_on_extobj_list(gpu_vm, &gpu_vm_bo) {
dma_resv_lock(gpu_vm_bo.gem_obj->resv);
if (gpu_vm_bo_marked_evicted(&gpu_vm_bo))
add_gpu_vm_bo_to_evict_list(&gpu_vm_bo, &gpu_vm->evict_list);
}
for_each_gpu_vm_bo_on_evict_list(&gpu_vm->evict_list, &gpu_vm_bo) {
validate_gem_bo(&gpu_vm_bo->gem_bo);
for_each_gpu_vma_of_gpu_vm_bo(&gpu_vm_bo, &gpu_vma)
move_gpu_vma_to_rebind_list(&gpu_vma, &gpu_vm->rebind_list);
}
for_each_gpu_vma_on_rebind_list(&gpu vm->rebind_list, &gpu_vma) {
rebind_gpu_vma(&gpu_vma);
remove_gpu_vma_from_rebind_list(&gpu_vma);
}
add_dependencies(&gpu_job, &gpu_vm->resv);
job_dma_fence = gpu_submit(&gpu_job));
add_dma_fence(job_dma_fence, &gpu_vm->resv);
for_each_external_obj(gpu_vm, &obj)
add_dma_fence(job_dma_fence, &obj->resv);
dma_resv_unlock_all_resv_locks();
And the corresponding shared-object aware eviction would look like:
.. code-block:: C
obj = get_object_from_lru();
dma_resv_lock(obj->resv);
for_each_gpu_vm_bo_of_obj(obj, &gpu_vm_bo)
if (object_is_vm_local(obj))
add_gpu_vm_bo_to_evict_list(&gpu_vm_bo, &gpu_vm->evict_list);
else
mark_gpu_vm_bo_evicted(&gpu_vm_bo);
add_dependencies(&eviction_job, &obj->resv);
job_dma_fence = gpu_submit(&eviction_job);
add_dma_fence(&obj->resv, job_dma_fence);
dma_resv_unlock(&obj->resv);
put_object(obj);
.. _Spinlock iteration:
Accessing the gpu_vm's lists without the dma_resv lock held
===========================================================
Some drivers will hold the gpu_vm's dma_resv lock when accessing the
gpu_vm's evict list and external objects lists. However, there are
drivers that need to access these lists without the dma_resv lock
held, for example due to asynchronous state updates from within the
dma_fence signalling critical path. In such cases, a spinlock can be
used to protect manipulation of the lists. However, since higher level
sleeping locks need to be taken for each list item while iterating
over the lists, the items already iterated over need to be
temporarily moved to a private list and the spinlock released
while processing each item:
.. code block:: C
struct list_head still_in_list;
INIT_LIST_HEAD(&still_in_list);
spin_lock(&gpu_vm->list_lock);
do {
struct list_head *entry = list_first_entry_or_null(&gpu_vm->list, head);
if (!entry)
break;
list_move_tail(&entry->head, &still_in_list);
list_entry_get_unless_zero(entry);
spin_unlock(&gpu_vm->list_lock);
process(entry);
spin_lock(&gpu_vm->list_lock);
list_entry_put(entry);
} while (true);
list_splice_tail(&still_in_list, &gpu_vm->list);
spin_unlock(&gpu_vm->list_lock);
Due to the additional locking and atomic operations, drivers that *can*
avoid accessing the gpu_vm's list outside of the dma_resv lock
might want to avoid also this iteration scheme. Particularly, if the
driver anticipates a large number of list items. For lists where the
anticipated number of list items is small, where list iteration doesn't
happen very often or if there is a significant additional cost
associated with each iteration, the atomic operation overhead
associated with this type of iteration is, most likely, negligible. Note that
if this scheme is used, it is necessary to make sure this list
iteration is protected by an outer level lock or semaphore, since list
items are temporarily pulled off the list while iterating, and it is
also worth mentioning that the local list ``still_in_list`` should
also be considered protected by the ``gpu_vm->list_lock``, and it is
thus possible that items can be removed also from the local list
concurrently with list iteration.
Please refer to the :ref:`DRM GPUVM locking section
<drm_gpuvm_locking>` and its internal
:c:func:`get_next_vm_bo_from_list` function.
userptr gpu_vmas
================
A userptr gpu_vma is a gpu_vma that, instead of mapping a buffer object to a
GPU virtual address range, directly maps a CPU mm range of anonymous-
or file page-cache pages.
A very simple approach would be to just pin the pages using
pin_user_pages() at bind time and unpin them at unbind time, but this
creates a Denial-Of-Service vector since a single user-space process
would be able to pin down all of system memory, which is not
desirable. (For special use-cases and assuming proper accounting pinning might
still be a desirable feature, though). What we need to do in the
general case is to obtain a reference to the desired pages, make sure
we are notified using a MMU notifier just before the CPU mm unmaps the
pages, dirty them if they are not mapped read-only to the GPU, and
then drop the reference.
When we are notified by the MMU notifier that CPU mm is about to drop the
pages, we need to stop GPU access to the pages by waiting for VM idle
in the MMU notifier and make sure that before the next time the GPU
tries to access whatever is now present in the CPU mm range, we unmap
the old pages from the GPU page tables and repeat the process of
obtaining new page references. (See the :ref:`notifier example
<Invalidation example>` below). Note that when the core mm decides to
laundry pages, we get such an unmap MMU notification and can mark the
pages dirty again before the next GPU access. We also get similar MMU
notifications for NUMA accounting which the GPU driver doesn't really
need to care about, but so far it has proven difficult to exclude
certain notifications.
Using a MMU notifier for device DMA (and other methods) is described in
:ref:`the pin_user_pages() documentation <mmu-notifier-registration-case>`.
Now, the method of obtaining struct page references using
get_user_pages() unfortunately can't be used under a dma_resv lock
since that would violate the locking order of the dma_resv lock vs the
mmap_lock that is grabbed when resolving a CPU pagefault. This means
the gpu_vm's list of userptr gpu_vmas needs to be protected by an
outer lock, which in our example below is the ``gpu_vm->lock``.
The MMU interval seqlock for a userptr gpu_vma is used in the following
way:
.. code-block:: C
// Exclusive locking mode here is strictly needed only if there are
// invalidated userptr gpu_vmas present, to avoid concurrent userptr
// revalidations of the same userptr gpu_vma.
down_write(&gpu_vm->lock);
retry:
// Note: mmu_interval_read_begin() blocks until there is no
// invalidation notifier running anymore.
seq = mmu_interval_read_begin(&gpu_vma->userptr_interval);
if (seq != gpu_vma->saved_seq) {
obtain_new_page_pointers(&gpu_vma);
dma_resv_lock(&gpu_vm->resv);
add_gpu_vma_to_revalidate_list(&gpu_vma, &gpu_vm);
dma_resv_unlock(&gpu_vm->resv);
gpu_vma->saved_seq = seq;
}
// The usual revalidation goes here.
// Final userptr sequence validation may not happen before the
// submission dma_fence is added to the gpu_vm's resv, from the POW
// of the MMU invalidation notifier. Hence the
// userptr_notifier_lock that will make them appear atomic.
add_dependencies(&gpu_job, &gpu_vm->resv);
down_read(&gpu_vm->userptr_notifier_lock);
if (mmu_interval_read_retry(&gpu_vma->userptr_interval, gpu_vma->saved_seq)) {
up_read(&gpu_vm->userptr_notifier_lock);
goto retry;
}
job_dma_fence = gpu_submit(&gpu_job));
add_dma_fence(job_dma_fence, &gpu_vm->resv);
for_each_external_obj(gpu_vm, &obj)
add_dma_fence(job_dma_fence, &obj->resv);
dma_resv_unlock_all_resv_locks();
up_read(&gpu_vm->userptr_notifier_lock);
up_write(&gpu_vm->lock);
The code between ``mmu_interval_read_begin()`` and the
``mmu_interval_read_retry()`` marks the read side critical section of
what we call the ``userptr_seqlock``. In reality, the gpu_vm's userptr
gpu_vma list is looped through, and the check is done for *all* of its
userptr gpu_vmas, although we only show a single one here.
The userptr gpu_vma MMU invalidation notifier might be called from
reclaim context and, again, to avoid locking order violations, we can't
take any dma_resv lock nor the gpu_vm->lock from within it.
.. _Invalidation example:
.. code-block:: C
bool gpu_vma_userptr_invalidate(userptr_interval, cur_seq)
{
// Make sure the exec function either sees the new sequence
// and backs off or we wait for the dma-fence:
down_write(&gpu_vm->userptr_notifier_lock);
mmu_interval_set_seq(userptr_interval, cur_seq);
up_write(&gpu_vm->userptr_notifier_lock);
// At this point, the exec function can't succeed in
// submitting a new job, because cur_seq is an invalid
// sequence number and will always cause a retry. When all
// invalidation callbacks, the mmu notifier core will flip
// the sequence number to a valid one. However we need to
// stop gpu access to the old pages here.
dma_resv_wait_timeout(&gpu_vm->resv, DMA_RESV_USAGE_BOOKKEEP,
false, MAX_SCHEDULE_TIMEOUT);
return true;
}
When this invalidation notifier returns, the GPU can no longer be
accessing the old pages of the userptr gpu_vma and needs to redo the
page-binding before a new GPU submission can succeed.
Efficient userptr gpu_vma exec_function iteration
_________________________________________________
If the gpu_vm's list of userptr gpu_vmas becomes large, it's
inefficient to iterate through the complete lists of userptrs on each
exec function to check whether each userptr gpu_vma's saved
sequence number is stale. A solution to this is to put all
*invalidated* userptr gpu_vmas on a separate gpu_vm list and
only check the gpu_vmas present on this list on each exec
function. This list will then lend itself very-well to the spinlock
locking scheme that is
:ref:`described in the spinlock iteration section <Spinlock iteration>`, since
in the mmu notifier, where we add the invalidated gpu_vmas to the
list, it's not possible to take any outer locks like the
``gpu_vm->lock`` or the ``gpu_vm->resv`` lock. Note that the
``gpu_vm->lock`` still needs to be taken while iterating to ensure the list is
complete, as also mentioned in that section.
If using an invalidated userptr list like this, the retry check in the
exec function trivially becomes a check for invalidated list empty.
Locking at bind and unbind time
===============================
At bind time, assuming a GEM object backed gpu_vma, each
gpu_vma needs to be associated with a gpu_vm_bo and that
gpu_vm_bo in turn needs to be added to the GEM object's
gpu_vm_bo list, and possibly to the gpu_vm's external object
list. This is referred to as *linking* the gpu_vma, and typically
requires that the ``gpu_vm->lock`` and the ``gem_object->gpuva_lock``
are held. When unlinking a gpu_vma the same locks should be held,
and that ensures that when iterating over ``gpu_vmas`, either under
the ``gpu_vm->resv`` or the GEM object's dma_resv, that the gpu_vmas
stay alive as long as the lock under which we iterate is not released. For
userptr gpu_vmas it's similarly required that during vma destroy, the
outer ``gpu_vm->lock`` is held, since otherwise when iterating over
the invalidated userptr list as described in the previous section,
there is nothing keeping those userptr gpu_vmas alive.
Locking for recoverable page-fault page-table updates
=====================================================
There are two important things we need to ensure with locking for
recoverable page-faults:
* At the time we return pages back to the system / allocator for
reuse, there should be no remaining GPU mappings and any GPU TLB
must have been flushed.
* The unmapping and mapping of a gpu_vma must not race.
Since the unmapping (or zapping) of GPU ptes is typically taking place
where it is hard or even impossible to take any outer level locks we
must either introduce a new lock that is held at both mapping and
unmapping time, or look at the locks we do hold at unmapping time and
make sure that they are held also at mapping time. For userptr
gpu_vmas, the ``userptr_seqlock`` is held in write mode in the mmu
invalidation notifier where zapping happens. Hence, if the
``userptr_seqlock`` as well as the ``gpu_vm->userptr_notifier_lock``
is held in read mode during mapping, it will not race with the
zapping. For GEM object backed gpu_vmas, zapping will take place under
the GEM object's dma_resv and ensuring that the dma_resv is held also
when populating the page-tables for any gpu_vma pointing to the GEM
object, will similarly ensure we are race-free.
If any part of the mapping is performed asynchronously
under a dma-fence with these locks released, the zapping will need to
wait for that dma-fence to signal under the relevant lock before
starting to modify the page-table.
Since modifying the
page-table structure in a way that frees up page-table memory
might also require outer level locks, the zapping of GPU ptes
typically focuses only on zeroing page-table or page-directory entries
and flushing TLB, whereas freeing of page-table memory is deferred to
unbind or rebind time.

View File

@@ -0,0 +1,13 @@
=======================================
drm/imagination PowerVR Graphics Driver
=======================================
.. kernel-doc:: drivers/gpu/drm/imagination/pvr_drv.c
:doc: PowerVR (Series 6 and later) and IMG Graphics Driver
Contents
========
.. toctree::
:maxdepth: 2
uapi

View File

@@ -0,0 +1,171 @@
====
UAPI
====
The sources associated with this section can be found in ``pvr_drm.h``.
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:doc: PowerVR UAPI
OBJECT ARRAYS
=============
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_obj_array
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: DRM_PVR_OBJ_ARRAY
IOCTLS
======
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:doc: PowerVR IOCTL interface
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: PVR_IOCTL
DEV_QUERY
---------
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:doc: PowerVR IOCTL DEV_QUERY interface
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_dev_query
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_ioctl_dev_query_args
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_dev_query_gpu_info
drm_pvr_dev_query_runtime_info
drm_pvr_dev_query_hwrt_info
drm_pvr_dev_query_quirks
drm_pvr_dev_query_enhancements
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_heap_id
drm_pvr_heap
drm_pvr_dev_query_heap_info
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_static_data_area_usage
drm_pvr_static_data_area
drm_pvr_dev_query_static_data_areas
CREATE_BO
---------
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:doc: PowerVR IOCTL CREATE_BO interface
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_ioctl_create_bo_args
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:doc: Flags for CREATE_BO
GET_BO_MMAP_OFFSET
------------------
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:doc: PowerVR IOCTL GET_BO_MMAP_OFFSET interface
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_ioctl_get_bo_mmap_offset_args
CREATE_VM_CONTEXT and DESTROY_VM_CONTEXT
----------------------------------------
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:doc: PowerVR IOCTL CREATE_VM_CONTEXT and DESTROY_VM_CONTEXT interfaces
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_ioctl_create_vm_context_args
drm_pvr_ioctl_destroy_vm_context_args
VM_MAP and VM_UNMAP
-------------------
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:doc: PowerVR IOCTL VM_MAP and VM_UNMAP interfaces
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_ioctl_vm_map_args
drm_pvr_ioctl_vm_unmap_args
CREATE_CONTEXT and DESTROY_CONTEXT
----------------------------------
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:doc: PowerVR IOCTL CREATE_CONTEXT and DESTROY_CONTEXT interfaces
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_ioctl_create_context_args
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_ctx_priority
drm_pvr_ctx_type
drm_pvr_static_render_context_state
drm_pvr_static_render_context_state_format
drm_pvr_reset_framework
drm_pvr_reset_framework_format
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_ioctl_destroy_context_args
CREATE_FREE_LIST and DESTROY_FREE_LIST
--------------------------------------
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:doc: PowerVR IOCTL CREATE_FREE_LIST and DESTROY_FREE_LIST interfaces
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_ioctl_create_free_list_args
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_ioctl_destroy_free_list_args
CREATE_HWRT_DATASET and DESTROY_HWRT_DATASET
--------------------------------------------
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:doc: PowerVR IOCTL CREATE_HWRT_DATASET and DESTROY_HWRT_DATASET interfaces
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_ioctl_create_hwrt_dataset_args
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_create_hwrt_geom_data_args
drm_pvr_create_hwrt_rt_data_args
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_ioctl_destroy_hwrt_dataset_args
SUBMIT_JOBS
-----------
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:doc: PowerVR IOCTL SUBMIT_JOBS interface
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:doc: Flags for the drm_pvr_sync_op object.
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_ioctl_submit_jobs_args
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:doc: Flags for SUBMIT_JOB ioctl geometry command.
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:doc: Flags for SUBMIT_JOB ioctl fragment command.
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:doc: Flags for SUBMIT_JOB ioctl compute command.
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:doc: Flags for SUBMIT_JOB ioctl transfer command.
.. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_sync_op
drm_pvr_job_type
drm_pvr_hwrt_data_ref
drm_pvr_job
Internal notes
==============
.. kernel-doc:: drivers/gpu/drm/imagination/pvr_device.h
:doc: IOCTL validation helpers
.. kernel-doc:: drivers/gpu/drm/imagination/pvr_device.h
:identifiers: PVR_STATIC_ASSERT_64BIT_ALIGNED PVR_IOCTL_UNION_PADDING_CHECK
pvr_ioctl_union_padding_check

View File

@@ -7,3 +7,4 @@ Misc DRM driver uAPI- and feature implementation guidelines
.. toctree::
drm-vm-bind-async
drm-vm-bind-locking

View File

@@ -70,35 +70,42 @@ When the time comes for Xe, the protection will be lifted on Xe and kept in i915
Xe Pre-Merge Goals - Work-in-Progress
=======================================
Drm_scheduler
-------------
Xe primarily uses Firmware based scheduling (GuC FW). However, it will use
drm_scheduler as the scheduler frontend for userspace submission in order to
resolve syncobj and dma-buf implicit sync dependencies. However, drm_scheduler is
not yet prepared to handle the 1-to-1 relationship between drm_gpu_scheduler and
drm_sched_entity.
Display integration with i915
-----------------------------
In order to share the display code with the i915 driver so that there is maximum
reuse, the i915/display/ code is built twice, once for i915.ko and then for
xe.ko. Currently, the i915/display code in Xe tree is polluted with many 'ifdefs'
depending on the build target. The goal is to refactor both Xe and i915/display
code simultaneously in order to get a clean result before they land upstream, so
that display can already be part of the initial pull request towards drm-next.
Deeper changes to drm_scheduler should *not* be required to get Xe accepted, but
some consensus needs to be reached between Xe and other community drivers that
could also benefit from this work, for coupling FW based/assisted submission such
as the ARMs new Mali GPU driver, and others.
However, display code should not gate the acceptance of Xe in upstream. Xe
patches will be refactored in a way that display code can be removed, if needed,
from the first pull request of Xe towards drm-next. The expectation is that when
both drivers are part of the drm-tip, the introduction of cleaner patches will be
easier and speed up.
As a key measurable result, the patch series introducing Xe itself shall not
depend on any other patch touching drm_scheduler itself that was not yet merged
through drm-misc. This, by itself, already includes the reach of an agreement for
uniform 1 to 1 relationship implementation / usage across drivers.
Xe uAPI high level overview
=============================
ASYNC VM_BIND
-------------
Although having a common DRM level IOCTL for VM_BIND is not a requirement to get
Xe merged, it is mandatory to have a consensus with other drivers and Mesa.
It needs to be clear how to handle async VM_BIND and interactions with userspace
memory fences. Ideally with helper support so people don't get it wrong in all
possible ways.
...Warning: To be done in follow up patches after/when/where the main consensus in various items are individually reached.
As a key measurable result, the benefits of ASYNC VM_BIND and a discussion of
various flavors, error handling and sample API suggestions are documented in
:doc:`The ASYNC VM_BIND document </gpu/drm-vm-bind-async>`.
Xe Pre-Merge Goals - Completed
================================
Drm_exec
--------
Helper to make dma_resv locking for a big number of buffers is getting removed in
the drm_exec series proposed in https://patchwork.freedesktop.org/patch/524376/
If that happens, Xe needs to change and incorporate the changes in the driver.
The goal is to engage with the Community to understand if the best approach is to
move that to the drivers that are using it or if we should keep the helpers in
place waiting for Xe to get merged.
This item ties into the GPUVA, VM_BIND, and even long-running compute support.
As a key measurable result, we need to have a community consensus documented in
this document and the Xe driver prepared for the changes, if necessary.
Userptr integration and vm_bind
-------------------------------
@@ -123,10 +130,45 @@ Documentation should include:
* O(1) complexity under VM_BIND.
The document is now included in the drm documentation :doc:`here </gpu/drm-vm-bind-async>`.
Some parts of userptr like mmu_notifiers should become GPUVA or DRM helpers when
the second driver supporting VM_BIND+userptr appears. Details to be defined when
the time comes.
The DRM GPUVM helpers do not yet include the userptr parts, but discussions
about implementing them are ongoing.
ASYNC VM_BIND
-------------
Although having a common DRM level IOCTL for VM_BIND is not a requirement to get
Xe merged, it is mandatory to have a consensus with other drivers and Mesa.
It needs to be clear how to handle async VM_BIND and interactions with userspace
memory fences. Ideally with helper support so people don't get it wrong in all
possible ways.
As a key measurable result, the benefits of ASYNC VM_BIND and a discussion of
various flavors, error handling and sample API suggestions are documented in
:doc:`The ASYNC VM_BIND document </gpu/drm-vm-bind-async>`.
Drm_scheduler
-------------
Xe primarily uses Firmware based scheduling (GuC FW). However, it will use
drm_scheduler as the scheduler frontend for userspace submission in order to
resolve syncobj and dma-buf implicit sync dependencies. However, drm_scheduler is
not yet prepared to handle the 1-to-1 relationship between drm_gpu_scheduler and
drm_sched_entity.
Deeper changes to drm_scheduler should *not* be required to get Xe accepted, but
some consensus needs to be reached between Xe and other community drivers that
could also benefit from this work, for coupling FW based/assisted submission such
as the ARMs new Mali GPU driver, and others.
As a key measurable result, the patch series introducing Xe itself shall not
depend on any other patch touching drm_scheduler itself that was not yet merged
through drm-misc. This, by itself, already includes the reach of an agreement for
uniform 1 to 1 relationship implementation / usage across drivers.
Long running compute: minimal data structure/scaffolding
--------------------------------------------------------
The generic scheduler code needs to include the handling of endless compute
@@ -139,46 +181,6 @@ this minimal drm/scheduler work, if needed, merged to drm-misc in a way that any
drm driver, including Xe, could re-use and add their own individual needs on top
in a next stage. However, this should not block the initial merge.
This is a non-blocker item since the driver without the support for the long
running compute enabled is not a showstopper.
Display integration with i915
-----------------------------
In order to share the display code with the i915 driver so that there is maximum
reuse, the i915/display/ code is built twice, once for i915.ko and then for
xe.ko. Currently, the i915/display code in Xe tree is polluted with many 'ifdefs'
depending on the build target. The goal is to refactor both Xe and i915/display
code simultaneously in order to get a clean result before they land upstream, so
that display can already be part of the initial pull request towards drm-next.
However, display code should not gate the acceptance of Xe in upstream. Xe
patches will be refactored in a way that display code can be removed, if needed,
from the first pull request of Xe towards drm-next. The expectation is that when
both drivers are part of the drm-tip, the introduction of cleaner patches will be
easier and speed up.
Drm_exec
--------
Helper to make dma_resv locking for a big number of buffers is getting removed in
the drm_exec series proposed in https://patchwork.freedesktop.org/patch/524376/
If that happens, Xe needs to change and incorporate the changes in the driver.
The goal is to engage with the Community to understand if the best approach is to
move that to the drivers that are using it or if we should keep the helpers in
place waiting for Xe to get merged.
This item ties into the GPUVA, VM_BIND, and even long-running compute support.
As a key measurable result, we need to have a community consensus documented in
this document and the Xe driver prepared for the changes, if necessary.
Xe uAPI high level overview
=============================
...Warning: To be done in follow up patches after/when/where the main consensus in various items are individually reached.
Xe Pre-Merge Goals - Completed
================================
Dev_coredump
------------

View File

@@ -337,8 +337,8 @@ connector register/unregister fixes
Level: Intermediate
Remove load/unload callbacks from all non-DRIVER_LEGACY drivers
---------------------------------------------------------------
Remove load/unload callbacks
----------------------------
The load/unload callbacks in struct &drm_driver are very much midlayers, plus
for historical reasons they get the ordering wrong (and we can't fix that)
@@ -347,8 +347,7 @@ between setting up the &drm_driver structure and calling drm_dev_register().
- Rework drivers to no longer use the load/unload callbacks, directly coding the
load/unload sequence into the driver's probe function.
- Once all non-DRIVER_LEGACY drivers are converted, disallow the load/unload
callbacks for all modern drivers.
- Once all drivers are converted, remove the load/unload callbacks.
Contact: Daniel Vetter
@@ -621,6 +620,23 @@ Contact: Javier Martinez Canillas <javierm@redhat.com>
Level: Intermediate
Clean up and document former selftests suites
---------------------------------------------
Some KUnit test suites (drm_buddy, drm_cmdline_parser, drm_damage_helper,
drm_format, drm_framebuffer, drm_dp_mst_helper, drm_mm, drm_plane_helper and
drm_rect) are former selftests suites that have been converted over when KUnit
was first introduced.
These suites were fairly undocumented, and with different goals than what unit
tests can be. Trying to identify what each test in these suites actually test
for, whether that makes sense for a unit test, and either remove it if it
doesn't or document it if it does would be of great help.
Contact: Maxime Ripard <mripard@kernel.org>
Level: Intermediate
Enable trinity for DRM
----------------------
@@ -765,6 +781,29 @@ Contact: Hans de Goede
Level: Advanced
Buffer age or other damage accumulation algorithm for buffer damage
===================================================================
Drivers that do per-buffer uploads, need a buffer damage handling (rather than
frame damage like drivers that do per-plane or per-CRTC uploads), but there is
no support to get the buffer age or any other damage accumulation algorithm.
For this reason, the damage helpers just fallback to a full plane update if the
framebuffer attached to a plane has changed since the last page-flip. Drivers
set &drm_plane_state.ignore_damage_clips to true as indication to
drm_atomic_helper_damage_iter_init() and drm_atomic_helper_damage_iter_next()
helpers that the damage clips should be ignored.
This should be improved to get damage tracking properly working on drivers that
do per-buffer uploads.
More information about damage tracking and references to learning materials can
be found in :ref:`damage_tracking_properties`.
Contact: Javier Martinez Canillas <javierm@redhat.com>
Level: Advanced
Outside DRM
===========

View File

@@ -193,9 +193,23 @@ Review timelines
Generally speaking, the patches get triaged quickly (in less than
48h). But be patient, if your patch is active in patchwork (i.e. it's
listed on the project's patch list) the chances it was missed are close to zero.
Asking the maintainer for status updates on your
patch is a good way to ensure your patch is ignored or pushed to the
bottom of the priority list.
The high volume of development on netdev makes reviewers move on
from discussions relatively quickly. New comments and replies
are very unlikely to arrive after a week of silence. If a patch
is no longer active in patchwork and the thread went idle for more
than a week - clarify the next steps and/or post the next version.
For RFC postings specifically, if nobody responded in a week - reviewers
either missed the posting or have no strong opinions. If the code is ready,
repost as a PATCH.
Emails saying just "ping" or "bump" are considered rude. If you can't figure
out the status of the patch from patchwork or where the discussion has
landed - describe your best guess and ask if it's correct. For example::
I don't understand what the next steps are. Person X seems to be unhappy
with A, should I do B and repost the patches?
.. _Changes requested:

View File

@@ -338,9 +338,9 @@ Loongson与LoongArch的开发者网站软件与文档资源
LoongArch指令集架构的文档
https://github.com/loongson/LoongArch-Documentation/releases/latest/download/LoongArch-Vol1-v1.02-CN.pdf (中文版)
https://github.com/loongson/LoongArch-Documentation/releases/latest/download/LoongArch-Vol1-v1.10-CN.pdf (中文版)
https://github.com/loongson/LoongArch-Documentation/releases/latest/download/LoongArch-Vol1-v1.02-EN.pdf (英文版)
https://github.com/loongson/LoongArch-Documentation/releases/latest/download/LoongArch-Vol1-v1.10-EN.pdf (英文版)
LoongArch的ELF psABI文档

View File

@@ -6503,8 +6503,7 @@ T: git git://anongit.freedesktop.org/drm/drm-misc
F: drivers/gpu/drm/sun4i/sun8i*
DRM DRIVER FOR ARM PL111 CLCD
M: Emma Anholt <emma@anholt.net>
S: Supported
S: Orphan
T: git git://anongit.freedesktop.org/drm/drm-misc
F: drivers/gpu/drm/pl111/
@@ -6619,8 +6618,7 @@ F: Documentation/devicetree/bindings/display/panel/himax,hx8394.yaml
F: drivers/gpu/drm/panel/panel-himax-hx8394.c
DRM DRIVER FOR HX8357D PANELS
M: Emma Anholt <emma@anholt.net>
S: Maintained
S: Orphan
T: git git://anongit.freedesktop.org/drm/drm-misc
F: Documentation/devicetree/bindings/display/himax,hx8357d.txt
F: drivers/gpu/drm/tiny/hx8357d.c
@@ -7213,8 +7211,8 @@ F: Documentation/devicetree/bindings/display/ti/
F: drivers/gpu/drm/omapdrm/
DRM DRIVERS FOR V3D
M: Emma Anholt <emma@anholt.net>
M: Melissa Wen <mwen@igalia.com>
M: Maíra Canal <mcanal@igalia.com>
S: Supported
T: git git://anongit.freedesktop.org/drm/drm-misc
F: Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml
@@ -7222,7 +7220,6 @@ F: drivers/gpu/drm/v3d/
F: include/uapi/drm/v3d_drm.h
DRM DRIVERS FOR VC4
M: Emma Anholt <emma@anholt.net>
M: Maxime Ripard <mripard@kernel.org>
S: Supported
T: git git://github.com/anholt/linux
@@ -7855,6 +7852,7 @@ R: Yue Hu <huyue2@coolpad.com>
R: Jeffle Xu <jefflexu@linux.alibaba.com>
L: linux-erofs@lists.ozlabs.org
S: Maintained
W: https://erofs.docs.kernel.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git
F: Documentation/ABI/testing/sysfs-fs-erofs
F: Documentation/filesystems/erofs.rst
@@ -8950,7 +8948,6 @@ S: Maintained
F: scripts/get_maintainer.pl
GFS2 FILE SYSTEM
M: Bob Peterson <rpeterso@redhat.com>
M: Andreas Gruenbacher <agruenba@redhat.com>
L: gfs2@lists.linux.dev
S: Supported
@@ -10395,6 +10392,17 @@ IMGTEC IR DECODER DRIVER
S: Orphan
F: drivers/media/rc/img-ir/
IMGTEC POWERVR DRM DRIVER
M: Frank Binns <frank.binns@imgtec.com>
M: Donald Robson <donald.robson@imgtec.com>
M: Matt Coster <matt.coster@imgtec.com>
S: Supported
T: git git://anongit.freedesktop.org/drm/drm-misc
F: Documentation/devicetree/bindings/gpu/img,powervr.yaml
F: Documentation/gpu/imagination/
F: drivers/gpu/drm/imagination/
F: include/uapi/drm/pvr_drm.h
IMON SOUNDGRAPH USB IR RECEIVER
M: Sean Young <sean@mess.org>
L: linux-media@vger.kernel.org
@@ -10639,9 +10647,9 @@ M: Rodrigo Vivi <rodrigo.vivi@intel.com>
M: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
L: intel-gfx@lists.freedesktop.org
S: Supported
W: https://01.org/linuxgraphics/
W: https://drm.pages.freedesktop.org/intel-docs/
Q: http://patchwork.freedesktop.org/project/intel-gfx/
B: https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs
B: https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html
C: irc://irc.oftc.net/intel-gfx
T: git git://anongit.freedesktop.org/drm-intel
F: Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
@@ -11025,7 +11033,6 @@ F: drivers/net/wireless/intel/iwlwifi/
INTEL WMI SLIM BOOTLOADER (SBL) FIRMWARE UPDATE DRIVER
M: Jithu Joseph <jithu.joseph@intel.com>
R: Maurice Ma <maurice.ma@intel.com>
S: Maintained
W: https://slimbootloader.github.io/security/firmware-update.html
F: drivers/platform/x86/intel/wmi/sbl-fw-update.c
@@ -13779,7 +13786,6 @@ F: drivers/net/ethernet/mellanox/mlxfw/
MELLANOX HARDWARE PLATFORM SUPPORT
M: Hans de Goede <hdegoede@redhat.com>
M: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
M: Mark Gross <markgross@kernel.org>
M: Vadim Pasternak <vadimp@nvidia.com>
L: platform-driver-x86@vger.kernel.org
S: Supported
@@ -14388,7 +14394,6 @@ F: drivers/platform/surface/surface_gpe.c
MICROSOFT SURFACE HARDWARE PLATFORM SUPPORT
M: Hans de Goede <hdegoede@redhat.com>
M: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
M: Mark Gross <markgross@kernel.org>
M: Maximilian Luz <luzmaximilian@gmail.com>
L: platform-driver-x86@vger.kernel.org
S: Maintained
@@ -14995,6 +15000,7 @@ M: Jakub Kicinski <kuba@kernel.org>
M: Paolo Abeni <pabeni@redhat.com>
L: netdev@vger.kernel.org
S: Maintained
P: Documentation/process/maintainer-netdev.rst
Q: https://patchwork.kernel.org/project/netdevbpf/list/
T: git git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git
T: git git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
@@ -15046,6 +15052,7 @@ M: Jakub Kicinski <kuba@kernel.org>
M: Paolo Abeni <pabeni@redhat.com>
L: netdev@vger.kernel.org
S: Maintained
P: Documentation/process/maintainer-netdev.rst
Q: https://patchwork.kernel.org/project/netdevbpf/list/
B: mailto:netdev@vger.kernel.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git
@@ -15056,6 +15063,7 @@ F: Documentation/networking/
F: Documentation/process/maintainer-netdev.rst
F: Documentation/userspace-api/netlink/
F: include/linux/in.h
F: include/linux/indirect_call_wrapper.h
F: include/linux/net.h
F: include/linux/netdevice.h
F: include/net/
@@ -21769,7 +21777,9 @@ F: Documentation/devicetree/bindings/counter/ti-eqep.yaml
F: drivers/counter/ti-eqep.c
TI ETHERNET SWITCH DRIVER (CPSW)
R: Grygorii Strashko <grygorii.strashko@ti.com>
R: Siddharth Vadapalli <s-vadapalli@ti.com>
R: Ravi Gunasekaran <r-gunasekaran@ti.com>
R: Roger Quadros <rogerq@kernel.org>
L: linux-omap@vger.kernel.org
L: netdev@vger.kernel.org
S: Maintained
@@ -21793,6 +21803,15 @@ F: Documentation/devicetree/bindings/media/i2c/ti,ds90*
F: drivers/media/i2c/ds90*
F: include/media/i2c/ds90*
TI ICSSG ETHERNET DRIVER (ICSSG)
R: MD Danish Anwar <danishanwar@ti.com>
R: Roger Quadros <rogerq@kernel.org>
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
L: netdev@vger.kernel.org
S: Maintained
F: Documentation/devicetree/bindings/net/ti,icss*.yaml
F: drivers/net/ethernet/ti/icssg/*
TI J721E CSI2RX DRIVER
M: Jai Luthra <j-luthra@ti.com>
L: linux-media@vger.kernel.org
@@ -22068,6 +22087,7 @@ F: drivers/watchdog/tqmx86_wdt.c
TRACING
M: Steven Rostedt <rostedt@goodmis.org>
M: Masami Hiramatsu <mhiramat@kernel.org>
R: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
L: linux-kernel@vger.kernel.org
L: linux-trace-kernel@vger.kernel.org
S: Maintained
@@ -23654,7 +23674,6 @@ F: drivers/platform/x86/x86-android-tablets/
X86 PLATFORM DRIVERS
M: Hans de Goede <hdegoede@redhat.com>
M: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
M: Mark Gross <markgross@kernel.org>
L: platform-driver-x86@vger.kernel.org
S: Maintained
Q: https://patchwork.kernel.org/project/platform-driver-x86/list/
@@ -23692,6 +23711,20 @@ F: arch/x86/kernel/dumpstack.c
F: arch/x86/kernel/stacktrace.c
F: arch/x86/kernel/unwind_*.c
X86 TRUST DOMAIN EXTENSIONS (TDX)
M: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
R: Dave Hansen <dave.hansen@linux.intel.com>
L: x86@kernel.org
L: linux-coco@lists.linux.dev
S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/tdx
F: arch/x86/boot/compressed/tdx*
F: arch/x86/coco/tdx/
F: arch/x86/include/asm/shared/tdx.h
F: arch/x86/include/asm/tdx.h
F: arch/x86/virt/vmx/tdx/
F: drivers/virt/coco/tdx-guest
X86 VDSO
M: Andy Lutomirski <luto@kernel.org>
L: linux-kernel@vger.kernel.org
@@ -23872,8 +23905,7 @@ T: git git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
P: Documentation/filesystems/xfs-maintainer-entry-profile.rst
F: Documentation/ABI/testing/sysfs-fs-xfs
F: Documentation/admin-guide/xfs.rst
F: Documentation/filesystems/xfs-delayed-logging-design.rst
F: Documentation/filesystems/xfs-self-describing-metadata.rst
F: Documentation/filesystems/xfs-*
F: fs/xfs/
F: include/uapi/linux/dqblk_xfs.h
F: include/uapi/linux/fsmap.h

View File

@@ -2,7 +2,7 @@
VERSION = 6
PATCHLEVEL = 7
SUBLEVEL = 0
EXTRAVERSION = -rc1
EXTRAVERSION = -rc3
NAME = Hurr durr I'ma ninja sloth
# *DOCUMENTATION*

View File

@@ -484,7 +484,8 @@ static int __init xen_guest_init(void)
* for secondary CPUs as they are brought up.
* For uniformity we use VCPUOP_register_vcpu_info even on cpu0.
*/
xen_vcpu_info = alloc_percpu(struct vcpu_info);
xen_vcpu_info = __alloc_percpu(sizeof(struct vcpu_info),
1 << fls(sizeof(struct vcpu_info) - 1));
if (xen_vcpu_info == NULL)
return -ENOMEM;

View File

@@ -158,7 +158,7 @@ endif
all: $(notdir $(KBUILD_IMAGE))
vmlinuz.efi: Image
Image vmlinuz.efi: vmlinux
$(Q)$(MAKE) $(build)=$(boot) $(boot)/$@

View File

@@ -21,9 +21,22 @@ static inline bool arch_parse_debug_rodata(char *arg)
extern bool rodata_enabled;
extern bool rodata_full;
if (arg && !strcmp(arg, "full")) {
if (!arg)
return false;
if (!strcmp(arg, "full")) {
rodata_enabled = rodata_full = true;
return true;
}
if (!strcmp(arg, "off")) {
rodata_enabled = rodata_full = false;
return true;
}
if (!strcmp(arg, "on")) {
rodata_enabled = true;
rodata_full = true;
rodata_full = false;
return true;
}

View File

@@ -29,8 +29,8 @@ bool can_set_direct_map(void)
*
* KFENCE pool requires page-granular mapping if initialized late.
*/
return (rodata_enabled && rodata_full) || debug_pagealloc_enabled() ||
arm64_kfence_can_set_direct_map();
return rodata_full || debug_pagealloc_enabled() ||
arm64_kfence_can_set_direct_map();
}
static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
@@ -105,8 +105,7 @@ static int change_memory_common(unsigned long addr, int numpages,
* If we are manipulating read-only permissions, apply the same
* change to the linear mapping of the pages that back this VM area.
*/
if (rodata_enabled &&
rodata_full && (pgprot_val(set_mask) == PTE_RDONLY ||
if (rodata_full && (pgprot_val(set_mask) == PTE_RDONLY ||
pgprot_val(clear_mask) == PTE_RDONLY)) {
for (i = 0; i < area->nr_pages; i++) {
__change_memory_common((u64)page_address(area->pages[i]),

View File

@@ -68,6 +68,7 @@ LDFLAGS_vmlinux += -static -n -nostdlib
ifdef CONFIG_AS_HAS_EXPLICIT_RELOCS
cflags-y += $(call cc-option,-mexplicit-relocs)
KBUILD_CFLAGS_KERNEL += $(call cc-option,-mdirect-extern-access)
KBUILD_CFLAGS_KERNEL += $(call cc-option,-fdirect-access-external-data)
KBUILD_AFLAGS_MODULE += $(call cc-option,-fno-direct-access-external-data)
KBUILD_CFLAGS_MODULE += $(call cc-option,-fno-direct-access-external-data)
KBUILD_AFLAGS_MODULE += $(call cc-option,-mno-relax) $(call cc-option,-Wa$(comma)-mno-relax)
@@ -142,6 +143,8 @@ vdso-install-y += arch/loongarch/vdso/vdso.so.dbg
all: $(notdir $(KBUILD_IMAGE))
vmlinuz.efi: vmlinux.efi
vmlinux.elf vmlinux.efi vmlinuz.efi: vmlinux
$(Q)$(MAKE) $(build)=$(boot) $(bootvars-y) $(boot)/$@

View File

@@ -609,8 +609,7 @@
lu32i.d \reg, 0
lu52i.d \reg, \reg, 0
.pushsection ".la_abs", "aw", %progbits
768:
.dword 768b-766b
.dword 766b
.dword \sym
.popsection
#endif

View File

@@ -40,13 +40,13 @@ static __always_inline unsigned long __percpu_##op(void *ptr, \
switch (size) { \
case 4: \
__asm__ __volatile__( \
"am"#asm_op".w" " %[ret], %[val], %[ptr] \n" \
"am"#asm_op".w" " %[ret], %[val], %[ptr] \n" \
: [ret] "=&r" (ret), [ptr] "+ZB"(*(u32 *)ptr) \
: [val] "r" (val)); \
break; \
case 8: \
__asm__ __volatile__( \
"am"#asm_op".d" " %[ret], %[val], %[ptr] \n" \
"am"#asm_op".d" " %[ret], %[val], %[ptr] \n" \
: [ret] "=&r" (ret), [ptr] "+ZB"(*(u64 *)ptr) \
: [val] "r" (val)); \
break; \
@@ -63,7 +63,7 @@ PERCPU_OP(and, and, &)
PERCPU_OP(or, or, |)
#undef PERCPU_OP
static __always_inline unsigned long __percpu_read(void *ptr, int size)
static __always_inline unsigned long __percpu_read(void __percpu *ptr, int size)
{
unsigned long ret;
@@ -100,7 +100,7 @@ static __always_inline unsigned long __percpu_read(void *ptr, int size)
return ret;
}
static __always_inline void __percpu_write(void *ptr, unsigned long val, int size)
static __always_inline void __percpu_write(void __percpu *ptr, unsigned long val, int size)
{
switch (size) {
case 1:
@@ -132,8 +132,7 @@ static __always_inline void __percpu_write(void *ptr, unsigned long val, int siz
}
}
static __always_inline unsigned long __percpu_xchg(void *ptr, unsigned long val,
int size)
static __always_inline unsigned long __percpu_xchg(void *ptr, unsigned long val, int size)
{
switch (size) {
case 1:

View File

@@ -25,7 +25,7 @@ extern void set_merr_handler(unsigned long offset, void *addr, unsigned long len
#ifdef CONFIG_RELOCATABLE
struct rela_la_abs {
long offset;
long pc;
long symvalue;
};

View File

@@ -52,7 +52,7 @@ static inline void __init relocate_absolute(long random_offset)
for (p = begin; (void *)p < end; p++) {
long v = p->symvalue;
uint32_t lu12iw, ori, lu32id, lu52id;
union loongarch_instruction *insn = (void *)p - p->offset;
union loongarch_instruction *insn = (void *)p->pc;
lu12iw = (v >> 12) & 0xfffff;
ori = v & 0xfff;
@@ -102,6 +102,14 @@ static inline __init unsigned long get_random_boot(void)
return hash;
}
static int __init nokaslr(char *p)
{
pr_info("KASLR is disabled.\n");
return 0; /* Print a notice and silence the boot warning */
}
early_param("nokaslr", nokaslr);
static inline __init bool kaslr_disabled(void)
{
char *str;

View File

@@ -58,21 +58,6 @@ static int constant_set_state_oneshot(struct clock_event_device *evt)
return 0;
}
static int constant_set_state_oneshot_stopped(struct clock_event_device *evt)
{
unsigned long timer_config;
raw_spin_lock(&state_lock);
timer_config = csr_read64(LOONGARCH_CSR_TCFG);
timer_config &= ~CSR_TCFG_EN;
csr_write64(timer_config, LOONGARCH_CSR_TCFG);
raw_spin_unlock(&state_lock);
return 0;
}
static int constant_set_state_periodic(struct clock_event_device *evt)
{
unsigned long period;
@@ -92,6 +77,16 @@ static int constant_set_state_periodic(struct clock_event_device *evt)
static int constant_set_state_shutdown(struct clock_event_device *evt)
{
unsigned long timer_config;
raw_spin_lock(&state_lock);
timer_config = csr_read64(LOONGARCH_CSR_TCFG);
timer_config &= ~CSR_TCFG_EN;
csr_write64(timer_config, LOONGARCH_CSR_TCFG);
raw_spin_unlock(&state_lock);
return 0;
}
@@ -161,7 +156,7 @@ int constant_clockevent_init(void)
cd->rating = 320;
cd->cpumask = cpumask_of(cpu);
cd->set_state_oneshot = constant_set_state_oneshot;
cd->set_state_oneshot_stopped = constant_set_state_oneshot_stopped;
cd->set_state_oneshot_stopped = constant_set_state_shutdown;
cd->set_state_periodic = constant_set_state_periodic;
cd->set_state_shutdown = constant_set_state_shutdown;
cd->set_next_event = constant_timer_next_event;

View File

@@ -13,13 +13,13 @@ struct page *dmw_virt_to_page(unsigned long kaddr)
{
return pfn_to_page(virt_to_pfn(kaddr));
}
EXPORT_SYMBOL_GPL(dmw_virt_to_page);
EXPORT_SYMBOL(dmw_virt_to_page);
struct page *tlb_virt_to_page(unsigned long kaddr)
{
return pfn_to_page(pte_pfn(*virt_to_kpte(kaddr)));
}
EXPORT_SYMBOL_GPL(tlb_virt_to_page);
EXPORT_SYMBOL(tlb_virt_to_page);
pgd_t *pgd_alloc(struct mm_struct *mm)
{

View File

@@ -115,9 +115,12 @@ config ARCH_HAS_ILOG2_U64
default n
config GENERIC_BUG
bool
default y
def_bool y
depends on BUG
select GENERIC_BUG_RELATIVE_POINTERS if 64BIT
config GENERIC_BUG_RELATIVE_POINTERS
bool
config GENERIC_HWEIGHT
bool
@@ -140,11 +143,11 @@ config ARCH_MMAP_RND_COMPAT_BITS_MIN
default 8
config ARCH_MMAP_RND_BITS_MAX
default 24 if 64BIT
default 17
default 18 if 64BIT
default 13
config ARCH_MMAP_RND_COMPAT_BITS_MAX
default 17
default 13
# unless you want to implement ACPI on PA-RISC ... ;-)
config PM

View File

@@ -34,7 +34,8 @@ void apply_alternatives(struct alt_instr *start, struct alt_instr *end,
/* Alternative SMP implementation. */
#define ALTERNATIVE(cond, replacement) "!0:" \
".section .altinstructions, \"aw\" !" \
".section .altinstructions, \"a\" !" \
".align 4 !" \
".word (0b-4-.) !" \
".hword 1, " __stringify(cond) " !" \
".word " __stringify(replacement) " !" \
@@ -44,7 +45,8 @@ void apply_alternatives(struct alt_instr *start, struct alt_instr *end,
/* to replace one single instructions by a new instruction */
#define ALTERNATIVE(from, to, cond, replacement)\
.section .altinstructions, "aw" ! \
.section .altinstructions, "a" ! \
.align 4 ! \
.word (from - .) ! \
.hword (to - from)/4, cond ! \
.word replacement ! \
@@ -52,7 +54,8 @@ void apply_alternatives(struct alt_instr *start, struct alt_instr *end,
/* to replace multiple instructions by new code */
#define ALTERNATIVE_CODE(from, num_instructions, cond, new_instr_ptr)\
.section .altinstructions, "aw" ! \
.section .altinstructions, "a" ! \
.align 4 ! \
.word (from - .) ! \
.hword -num_instructions, cond ! \
.word (new_instr_ptr - .) ! \

View File

@@ -574,6 +574,7 @@
*/
#define ASM_EXCEPTIONTABLE_ENTRY(fault_addr, except_addr) \
.section __ex_table,"aw" ! \
.align 4 ! \
.word (fault_addr - .), (except_addr - .) ! \
.previous

View File

@@ -17,24 +17,27 @@
#define PARISC_BUG_BREAK_ASM "break 0x1f, 0x1fff"
#define PARISC_BUG_BREAK_INSN 0x03ffe01f /* PARISC_BUG_BREAK_ASM */
#if defined(CONFIG_64BIT)
#define ASM_WORD_INSN ".dword\t"
#ifdef CONFIG_GENERIC_BUG_RELATIVE_POINTERS
# define __BUG_REL(val) ".word " __stringify(val) " - ."
#else
#define ASM_WORD_INSN ".word\t"
# define __BUG_REL(val) ".word " __stringify(val)
#endif
#ifdef CONFIG_DEBUG_BUGVERBOSE
#define BUG() \
do { \
asm volatile("\n" \
"1:\t" PARISC_BUG_BREAK_ASM "\n" \
"\t.pushsection __bug_table,\"aw\"\n" \
"2:\t" ASM_WORD_INSN "1b, %c0\n" \
"\t.short %c1, %c2\n" \
"\t.org 2b+%c3\n" \
"\t.pushsection __bug_table,\"a\"\n" \
"\t.align 4\n" \
"2:\t" __BUG_REL(1b) "\n" \
"\t" __BUG_REL(%c0) "\n" \
"\t.short %1, %2\n" \
"\t.blockz %3-2*4-2*2\n" \
"\t.popsection" \
: : "i" (__FILE__), "i" (__LINE__), \
"i" (0), "i" (sizeof(struct bug_entry)) ); \
"i" (0), "i" (sizeof(struct bug_entry)) ); \
unreachable(); \
} while(0)
@@ -51,10 +54,12 @@
do { \
asm volatile("\n" \
"1:\t" PARISC_BUG_BREAK_ASM "\n" \
"\t.pushsection __bug_table,\"aw\"\n" \
"2:\t" ASM_WORD_INSN "1b, %c0\n" \
"\t.short %c1, %c2\n" \
"\t.org 2b+%c3\n" \
"\t.pushsection __bug_table,\"a\"\n" \
"\t.align 4\n" \
"2:\t" __BUG_REL(1b) "\n" \
"\t" __BUG_REL(%c0) "\n" \
"\t.short %1, %2\n" \
"\t.blockz %3-2*4-2*2\n" \
"\t.popsection" \
: : "i" (__FILE__), "i" (__LINE__), \
"i" (BUGFLAG_WARNING|(flags)), \
@@ -65,10 +70,11 @@
do { \
asm volatile("\n" \
"1:\t" PARISC_BUG_BREAK_ASM "\n" \
"\t.pushsection __bug_table,\"aw\"\n" \
"2:\t" ASM_WORD_INSN "1b\n" \
"\t.short %c0\n" \
"\t.org 2b+%c1\n" \
"\t.pushsection __bug_table,\"a\"\n" \
"\t.align %2\n" \
"2:\t" __BUG_REL(1b) "\n" \
"\t.short %0\n" \
"\t.blockz %1-4-2\n" \
"\t.popsection" \
: : "i" (BUGFLAG_WARNING|(flags)), \
"i" (sizeof(struct bug_entry)) ); \

View File

@@ -349,15 +349,7 @@ struct pt_regs; /* forward declaration... */
#define ELF_HWCAP 0
/* Masks for stack and mmap randomization */
#define BRK_RND_MASK (is_32bit_task() ? 0x07ffUL : 0x3ffffUL)
#define MMAP_RND_MASK (is_32bit_task() ? 0x1fffUL : 0x3ffffUL)
#define STACK_RND_MASK MMAP_RND_MASK
struct mm_struct;
extern unsigned long arch_randomize_brk(struct mm_struct *);
#define arch_randomize_brk arch_randomize_brk
#define STACK_RND_MASK 0x7ff /* 8MB of VA */
#define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1
struct linux_binprm;

View File

@@ -15,10 +15,12 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
asm_volatile_goto("1:\n\t"
"nop\n\t"
".pushsection __jump_table, \"aw\"\n\t"
".align %1\n\t"
".word 1b - ., %l[l_yes] - .\n\t"
__stringify(ASM_ULONG_INSN) " %c0 - .\n\t"
".popsection\n\t"
: : "i" (&((char *)key)[branch]) : : l_yes);
: : "i" (&((char *)key)[branch]), "i" (sizeof(long))
: : l_yes);
return false;
l_yes:
@@ -30,10 +32,12 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
asm_volatile_goto("1:\n\t"
"b,n %l[l_yes]\n\t"
".pushsection __jump_table, \"aw\"\n\t"
".align %1\n\t"
".word 1b - ., %l[l_yes] - .\n\t"
__stringify(ASM_ULONG_INSN) " %c0 - .\n\t"
".popsection\n\t"
: : "i" (&((char *)key)[branch]) : : l_yes);
: : "i" (&((char *)key)[branch]), "i" (sizeof(long))
: : l_yes);
return false;
l_yes:

View File

@@ -55,7 +55,7 @@
})
#ifdef CONFIG_SMP
# define __lock_aligned __section(".data..lock_aligned")
# define __lock_aligned __section(".data..lock_aligned") __aligned(16)
#endif
#endif /* __PARISC_LDCW_H */

View File

@@ -47,6 +47,8 @@
#ifndef __ASSEMBLY__
struct rlimit;
unsigned long mmap_upper_limit(struct rlimit *rlim_stack);
unsigned long calc_max_stack_size(unsigned long stack_max);
/*

View File

@@ -41,6 +41,7 @@ struct exception_table_entry {
#define ASM_EXCEPTIONTABLE_ENTRY( fault_addr, except_addr )\
".section __ex_table,\"aw\"\n" \
".align 4\n" \
".word (" #fault_addr " - .), (" #except_addr " - .)\n\t" \
".previous\n"

View File

@@ -75,7 +75,6 @@
/* We now return you to your regularly scheduled HPUX. */
#define ENOSYM 215 /* symbol does not exist in executable */
#define ENOTSOCK 216 /* Socket operation on non-socket */
#define EDESTADDRREQ 217 /* Destination address required */
#define EMSGSIZE 218 /* Message too long */
@@ -101,7 +100,6 @@
#define ETIMEDOUT 238 /* Connection timed out */
#define ECONNREFUSED 239 /* Connection refused */
#define EREFUSED ECONNREFUSED /* for HP's NFS apparently */
#define EREMOTERELEASE 240 /* Remote peer released connection */
#define EHOSTDOWN 241 /* Host is down */
#define EHOSTUNREACH 242 /* No route to host */

View File

@@ -383,7 +383,7 @@ show_cpuinfo (struct seq_file *m, void *v)
char cpu_name[60], *p;
/* strip PA path from CPU name to not confuse lscpu */
strlcpy(cpu_name, per_cpu(cpu_data, 0).dev->name, sizeof(cpu_name));
strscpy(cpu_name, per_cpu(cpu_data, 0).dev->name, sizeof(cpu_name));
p = strrchr(cpu_name, '[');
if (p)
*(--p) = 0;

View File

@@ -77,7 +77,7 @@ unsigned long calc_max_stack_size(unsigned long stack_max)
* indicating that "current" should be used instead of a passed-in
* value from the exec bprm as done with arch_pick_mmap_layout().
*/
static unsigned long mmap_upper_limit(struct rlimit *rlim_stack)
unsigned long mmap_upper_limit(struct rlimit *rlim_stack)
{
unsigned long stack_base;

View File

@@ -130,6 +130,7 @@ SECTIONS
RO_DATA(8)
/* unwind info */
. = ALIGN(4);
.PARISC.unwind : {
__start___unwind = .;
*(.PARISC.unwind)

View File

@@ -228,7 +228,6 @@ typedef struct thread_struct thread_struct;
execve_tail(); \
} while (0)
/* Forward declaration, a strange C thing */
struct task_struct;
struct mm_struct;
struct seq_file;

View File

@@ -666,6 +666,7 @@ static int __init ipl_init(void)
&ipl_ccw_attr_group_lpar);
break;
case IPL_TYPE_ECKD:
case IPL_TYPE_ECKD_DUMP:
rc = sysfs_create_group(&ipl_kset->kobj, &ipl_eckd_attr_group);
break;
case IPL_TYPE_FCP:

View File

@@ -279,12 +279,6 @@ static int paicrypt_event_init(struct perf_event *event)
if (IS_ERR(cpump))
return PTR_ERR(cpump);
/* Event initialization sets last_tag to 0. When later on the events
* are deleted and re-added, do not reset the event count value to zero.
* Events are added, deleted and re-added when 2 or more events
* are active at the same time.
*/
event->hw.last_tag = 0;
event->destroy = paicrypt_event_destroy;
if (a->sample_period) {
@@ -318,6 +312,11 @@ static void paicrypt_start(struct perf_event *event, int flags)
{
u64 sum;
/* Event initialization sets last_tag to 0. When later on the events
* are deleted and re-added, do not reset the event count value to zero.
* Events are added, deleted and re-added when 2 or more events
* are active at the same time.
*/
if (!event->hw.last_tag) {
event->hw.last_tag = 1;
sum = paicrypt_getall(event); /* Get current value */

View File

@@ -260,7 +260,6 @@ static int paiext_event_init(struct perf_event *event)
rc = paiext_alloc(a, event);
if (rc)
return rc;
event->hw.last_tag = 0;
event->destroy = paiext_event_destroy;
if (a->sample_period) {

View File

@@ -4660,7 +4660,7 @@ static void intel_pmu_check_hybrid_pmus(struct x86_hybrid_pmu *pmu)
if (pmu->intel_cap.pebs_output_pt_available)
pmu->pmu.capabilities |= PERF_PMU_CAP_AUX_OUTPUT;
else
pmu->pmu.capabilities |= ~PERF_PMU_CAP_AUX_OUTPUT;
pmu->pmu.capabilities &= ~PERF_PMU_CAP_AUX_OUTPUT;
intel_pmu_check_event_constraints(pmu->event_constraints,
pmu->num_counters,

View File

@@ -15,6 +15,7 @@
#include <linux/io.h>
#include <asm/apic.h>
#include <asm/desc.h>
#include <asm/e820/api.h>
#include <asm/sev.h>
#include <asm/ibt.h>
#include <asm/hypervisor.h>
@@ -286,15 +287,31 @@ static int hv_cpu_die(unsigned int cpu)
static int __init hv_pci_init(void)
{
int gen2vm = efi_enabled(EFI_BOOT);
bool gen2vm = efi_enabled(EFI_BOOT);
/*
* For Generation-2 VM, we exit from pci_arch_init() by returning 0.
* The purpose is to suppress the harmless warning:
* A Generation-2 VM doesn't support legacy PCI/PCIe, so both
* raw_pci_ops and raw_pci_ext_ops are NULL, and pci_subsys_init() ->
* pcibios_init() doesn't call pcibios_resource_survey() ->
* e820__reserve_resources_late(); as a result, any emulated persistent
* memory of E820_TYPE_PRAM (12) via the kernel parameter
* memmap=nn[KMG]!ss is not added into iomem_resource and hence can't be
* detected by register_e820_pmem(). Fix this by directly calling
* e820__reserve_resources_late() here: e820__reserve_resources_late()
* depends on e820__reserve_resources(), which has been called earlier
* from setup_arch(). Note: e820__reserve_resources_late() also adds
* any memory of E820_TYPE_PMEM (7) into iomem_resource, and
* acpi_nfit_register_region() -> acpi_nfit_insert_resource() ->
* region_intersects() returns REGION_INTERSECTS, so the memory of
* E820_TYPE_PMEM won't get added twice.
*
* We return 0 here so that pci_arch_init() won't print the warning:
* "PCI: Fatal: No config space access function found"
*/
if (gen2vm)
if (gen2vm) {
e820__reserve_resources_late();
return 0;
}
/* For Generation-1 VM, we'll proceed in pci_arch_init(). */
return 1;

View File

@@ -16,6 +16,9 @@
#include <asm/x86_init.h>
#include <asm/cpufeature.h>
#include <asm/irq_vectors.h>
#include <asm/xen/hypervisor.h>
#include <xen/xen.h>
#ifdef CONFIG_ACPI_APEI
# include <asm/pgtable_types.h>
@@ -127,6 +130,17 @@ static inline void arch_acpi_set_proc_cap_bits(u32 *cap)
if (!cpu_has(c, X86_FEATURE_MWAIT) ||
boot_option_idle_override == IDLE_NOMWAIT)
*cap &= ~(ACPI_PROC_CAP_C_C1_FFH | ACPI_PROC_CAP_C_C2C3_FFH);
if (xen_initial_domain()) {
/*
* When Linux is running as Xen dom0, the hypervisor is the
* entity in charge of the processor power management, and so
* Xen needs to check the OS capabilities reported in the
* processor capabilities buffer matches what the hypervisor
* driver supports.
*/
xen_sanitize_proc_cap_bits(cap);
}
}
static inline bool acpi_has_cpu_in_madt(void)

View File

@@ -100,4 +100,13 @@ static inline void leave_lazy(enum xen_lazy_mode mode)
enum xen_lazy_mode xen_get_lazy_mode(void);
#if defined(CONFIG_XEN_DOM0) && defined(CONFIG_ACPI)
void xen_sanitize_proc_cap_bits(uint32_t *buf);
#else
static inline void xen_sanitize_proc_cap_bits(uint32_t *buf)
{
BUG();
}
#endif
#endif /* _ASM_X86_XEN_HYPERVISOR_H */

View File

@@ -63,6 +63,7 @@ int acpi_fix_pin2_polarity __initdata;
#ifdef CONFIG_X86_LOCAL_APIC
static u64 acpi_lapic_addr __initdata = APIC_DEFAULT_PHYS_BASE;
static bool has_lapic_cpus __initdata;
static bool acpi_support_online_capable;
#endif
@@ -232,6 +233,14 @@ acpi_parse_x2apic(union acpi_subtable_headers *header, const unsigned long end)
if (!acpi_is_processor_usable(processor->lapic_flags))
return 0;
/*
* According to https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#processor-local-x2apic-structure
* when MADT provides both valid LAPIC and x2APIC entries, the APIC ID
* in x2APIC must be equal or greater than 0xff.
*/
if (has_lapic_cpus && apic_id < 0xff)
return 0;
/*
* We need to register disabled CPU as well to permit
* counting disabled CPUs. This allows us to size
@@ -1114,10 +1123,7 @@ static int __init early_acpi_parse_madt_lapic_addr_ovr(void)
static int __init acpi_parse_madt_lapic_entries(void)
{
int count;
int x2count = 0;
int ret;
struct acpi_subtable_proc madt_proc[2];
int count, x2count = 0;
if (!boot_cpu_has(X86_FEATURE_APIC))
return -ENODEV;
@@ -1126,21 +1132,11 @@ static int __init acpi_parse_madt_lapic_entries(void)
acpi_parse_sapic, MAX_LOCAL_APIC);
if (!count) {
memset(madt_proc, 0, sizeof(madt_proc));
madt_proc[0].id = ACPI_MADT_TYPE_LOCAL_APIC;
madt_proc[0].handler = acpi_parse_lapic;
madt_proc[1].id = ACPI_MADT_TYPE_LOCAL_X2APIC;
madt_proc[1].handler = acpi_parse_x2apic;
ret = acpi_table_parse_entries_array(ACPI_SIG_MADT,
sizeof(struct acpi_table_madt),
madt_proc, ARRAY_SIZE(madt_proc), MAX_LOCAL_APIC);
if (ret < 0) {
pr_err("Error parsing LAPIC/X2APIC entries\n");
return ret;
}
count = madt_proc[0].count;
x2count = madt_proc[1].count;
count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC,
acpi_parse_lapic, MAX_LOCAL_APIC);
has_lapic_cpus = count > 0;
x2count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC,
acpi_parse_x2apic, MAX_LOCAL_APIC);
}
if (!count && !x2count) {
pr_err("No LAPIC entries present\n");

View File

@@ -104,8 +104,6 @@ struct cont_desc {
size_t size;
};
static u32 ucode_new_rev;
/*
* Microcode patch container file is prepended to the initrd in cpio
* format. See Documentation/arch/x86/microcode.rst
@@ -442,12 +440,11 @@ static int __apply_microcode_amd(struct microcode_amd *mc)
*
* Returns true if container found (sets @desc), false otherwise.
*/
static bool early_apply_microcode(u32 cpuid_1_eax, void *ucode, size_t size)
static bool early_apply_microcode(u32 cpuid_1_eax, u32 old_rev, void *ucode, size_t size)
{
struct cont_desc desc = { 0 };
struct microcode_amd *mc;
bool ret = false;
u32 rev, dummy;
desc.cpuid_1_eax = cpuid_1_eax;
@@ -457,22 +454,15 @@ static bool early_apply_microcode(u32 cpuid_1_eax, void *ucode, size_t size)
if (!mc)
return ret;
native_rdmsr(MSR_AMD64_PATCH_LEVEL, rev, dummy);
/*
* Allow application of the same revision to pick up SMT-specific
* changes even if the revision of the other SMT thread is already
* up-to-date.
*/
if (rev > mc->hdr.patch_id)
if (old_rev > mc->hdr.patch_id)
return ret;
if (!__apply_microcode_amd(mc)) {
ucode_new_rev = mc->hdr.patch_id;
ret = true;
}
return ret;
return !__apply_microcode_amd(mc);
}
static bool get_builtin_microcode(struct cpio_data *cp, unsigned int family)
@@ -506,9 +496,12 @@ static void __init find_blobs_in_containers(unsigned int cpuid_1_eax, struct cpi
*ret = cp;
}
void __init load_ucode_amd_bsp(unsigned int cpuid_1_eax)
void __init load_ucode_amd_bsp(struct early_load_data *ed, unsigned int cpuid_1_eax)
{
struct cpio_data cp = { };
u32 dummy;
native_rdmsr(MSR_AMD64_PATCH_LEVEL, ed->old_rev, dummy);
/* Needed in load_microcode_amd() */
ucode_cpu_info[0].cpu_sig.sig = cpuid_1_eax;
@@ -517,7 +510,8 @@ void __init load_ucode_amd_bsp(unsigned int cpuid_1_eax)
if (!(cp.data && cp.size))
return;
early_apply_microcode(cpuid_1_eax, cp.data, cp.size);
if (early_apply_microcode(cpuid_1_eax, ed->old_rev, cp.data, cp.size))
native_rdmsr(MSR_AMD64_PATCH_LEVEL, ed->new_rev, dummy);
}
static enum ucode_state load_microcode_amd(u8 family, const u8 *data, size_t size);
@@ -625,10 +619,8 @@ void reload_ucode_amd(unsigned int cpu)
rdmsr(MSR_AMD64_PATCH_LEVEL, rev, dummy);
if (rev < mc->hdr.patch_id) {
if (!__apply_microcode_amd(mc)) {
ucode_new_rev = mc->hdr.patch_id;
pr_info("reload patch_level=0x%08x\n", ucode_new_rev);
}
if (!__apply_microcode_amd(mc))
pr_info_once("reload revision: 0x%08x\n", mc->hdr.patch_id);
}
}
@@ -649,8 +641,6 @@ static int collect_cpu_info_amd(int cpu, struct cpu_signature *csig)
if (p && (p->patch_id == csig->rev))
uci->mc = p->data;
pr_info("CPU%d: patch_level=0x%08x\n", cpu, csig->rev);
return 0;
}
@@ -691,8 +681,6 @@ static enum ucode_state apply_microcode_amd(int cpu)
rev = mc_amd->hdr.patch_id;
ret = UCODE_UPDATED;
pr_info("CPU%d: new patch_level=0x%08x\n", cpu, rev);
out:
uci->cpu_sig.rev = rev;
c->microcode = rev;
@@ -935,11 +923,6 @@ struct microcode_ops * __init init_amd_microcode(void)
pr_warn("AMD CPU family 0x%x not supported\n", c->x86);
return NULL;
}
if (ucode_new_rev)
pr_info_once("microcode updated early to new patch_level=0x%08x\n",
ucode_new_rev);
return &microcode_amd_ops;
}

View File

@@ -41,8 +41,6 @@
#include "internal.h"
#define DRIVER_VERSION "2.2"
static struct microcode_ops *microcode_ops;
bool dis_ucode_ldr = true;
@@ -77,6 +75,8 @@ static u32 final_levels[] = {
0, /* T-101 terminator */
};
struct early_load_data early_data;
/*
* Check the current patch level on this CPU.
*
@@ -155,9 +155,9 @@ void __init load_ucode_bsp(void)
return;
if (intel)
load_ucode_intel_bsp();
load_ucode_intel_bsp(&early_data);
else
load_ucode_amd_bsp(cpuid_1_eax);
load_ucode_amd_bsp(&early_data, cpuid_1_eax);
}
void load_ucode_ap(void)
@@ -828,6 +828,11 @@ static int __init microcode_init(void)
if (!microcode_ops)
return -ENODEV;
pr_info_once("Current revision: 0x%08x\n", (early_data.new_rev ?: early_data.old_rev));
if (early_data.new_rev)
pr_info_once("Updated early from: 0x%08x\n", early_data.old_rev);
microcode_pdev = platform_device_register_simple("microcode", -1, NULL, 0);
if (IS_ERR(microcode_pdev))
return PTR_ERR(microcode_pdev);
@@ -846,8 +851,6 @@ static int __init microcode_init(void)
cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/microcode:online",
mc_cpu_online, mc_cpu_down_prep);
pr_info("Microcode Update Driver: v%s.", DRIVER_VERSION);
return 0;
out_pdev:

View File

@@ -339,16 +339,9 @@ static enum ucode_state __apply_microcode(struct ucode_cpu_info *uci,
static enum ucode_state apply_microcode_early(struct ucode_cpu_info *uci)
{
struct microcode_intel *mc = uci->mc;
enum ucode_state ret;
u32 cur_rev, date;
u32 cur_rev;
ret = __apply_microcode(uci, mc, &cur_rev);
if (ret == UCODE_UPDATED) {
date = mc->hdr.date;
pr_info_once("updated early: 0x%x -> 0x%x, date = %04x-%02x-%02x\n",
cur_rev, mc->hdr.rev, date & 0xffff, date >> 24, (date >> 16) & 0xff);
}
return ret;
return __apply_microcode(uci, mc, &cur_rev);
}
static __init bool load_builtin_intel_microcode(struct cpio_data *cp)
@@ -413,13 +406,17 @@ static int __init save_builtin_microcode(void)
early_initcall(save_builtin_microcode);
/* Load microcode on BSP from initrd or builtin blobs */
void __init load_ucode_intel_bsp(void)
void __init load_ucode_intel_bsp(struct early_load_data *ed)
{
struct ucode_cpu_info uci;
ed->old_rev = intel_get_microcode_revision();
uci.mc = get_microcode_blob(&uci, false);
if (uci.mc && apply_microcode_early(&uci) == UCODE_UPDATED)
ucode_patch_va = UCODE_BSP_LOADED;
ed->new_rev = uci.cpu_sig.rev;
}
void load_ucode_intel_ap(void)

View File

@@ -37,6 +37,12 @@ struct microcode_ops {
use_nmi : 1;
};
struct early_load_data {
u32 old_rev;
u32 new_rev;
};
extern struct early_load_data early_data;
extern struct ucode_cpu_info ucode_cpu_info[];
struct cpio_data find_microcode_in_initrd(const char *path);
@@ -92,14 +98,14 @@ extern bool dis_ucode_ldr;
extern bool force_minrev;
#ifdef CONFIG_CPU_SUP_AMD
void load_ucode_amd_bsp(unsigned int family);
void load_ucode_amd_bsp(struct early_load_data *ed, unsigned int family);
void load_ucode_amd_ap(unsigned int family);
int save_microcode_in_initrd_amd(unsigned int family);
void reload_ucode_amd(unsigned int cpu);
struct microcode_ops *init_amd_microcode(void);
void exit_amd_microcode(void);
#else /* CONFIG_CPU_SUP_AMD */
static inline void load_ucode_amd_bsp(unsigned int family) { }
static inline void load_ucode_amd_bsp(struct early_load_data *ed, unsigned int family) { }
static inline void load_ucode_amd_ap(unsigned int family) { }
static inline int save_microcode_in_initrd_amd(unsigned int family) { return -EINVAL; }
static inline void reload_ucode_amd(unsigned int cpu) { }
@@ -108,12 +114,12 @@ static inline void exit_amd_microcode(void) { }
#endif /* !CONFIG_CPU_SUP_AMD */
#ifdef CONFIG_CPU_SUP_INTEL
void load_ucode_intel_bsp(void);
void load_ucode_intel_bsp(struct early_load_data *ed);
void load_ucode_intel_ap(void);
void reload_ucode_intel(void);
struct microcode_ops *init_intel_microcode(void);
#else /* CONFIG_CPU_SUP_INTEL */
static inline void load_ucode_intel_bsp(void) { }
static inline void load_ucode_intel_bsp(struct early_load_data *ed) { }
static inline void load_ucode_intel_ap(void) { }
static inline void reload_ucode_intel(void) { }
static inline struct microcode_ops *init_intel_microcode(void) { return NULL; }

View File

@@ -262,11 +262,14 @@ static uint32_t __init ms_hyperv_platform(void)
static int hv_nmi_unknown(unsigned int val, struct pt_regs *regs)
{
static atomic_t nmi_cpu = ATOMIC_INIT(-1);
unsigned int old_cpu, this_cpu;
if (!unknown_nmi_panic)
return NMI_DONE;
if (atomic_cmpxchg(&nmi_cpu, -1, raw_smp_processor_id()) != -1)
old_cpu = -1;
this_cpu = raw_smp_processor_id();
if (!atomic_try_cmpxchg(&nmi_cpu, &old_cpu, this_cpu))
return NMI_HANDLED;
return NMI_DONE;

View File

@@ -175,9 +175,6 @@ int x64_setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
frame = get_sigframe(ksig, regs, sizeof(struct rt_sigframe), &fp);
uc_flags = frame_uc_flags(regs);
if (setup_signal_shadow_stack(ksig))
return -EFAULT;
if (!user_access_begin(frame, sizeof(*frame)))
return -EFAULT;
@@ -198,6 +195,9 @@ int x64_setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
return -EFAULT;
}
if (setup_signal_shadow_stack(ksig))
return -EFAULT;
/* Set up registers for signal handler */
regs->di = ksig->sig;
/* In case the signal handler was declared without prototypes */

View File

@@ -425,6 +425,8 @@ void bdev_set_nr_sectors(struct block_device *bdev, sector_t sectors)
void bdev_add(struct block_device *bdev, dev_t dev)
{
if (bdev_stable_writes(bdev))
mapping_set_stable_writes(bdev->bd_inode->i_mapping);
bdev->bd_dev = dev;
bdev->bd_inode->i_rdev = dev;
bdev->bd_inode->i_ino = dev;

View File

@@ -577,6 +577,7 @@ static void blkg_destroy_all(struct gendisk *disk)
struct request_queue *q = disk->queue;
struct blkcg_gq *blkg, *n;
int count = BLKG_DESTROY_BATCH_SIZE;
int i;
restart:
spin_lock_irq(&q->queue_lock);
@@ -602,6 +603,18 @@ restart:
}
}
/*
* Mark policy deactivated since policy offline has been done, and
* the free is scheduled, so future blkcg_deactivate_policy() can
* be bypassed
*/
for (i = 0; i < BLKCG_MAX_POLS; i++) {
struct blkcg_policy *pol = blkcg_policy[i];
if (pol)
__clear_bit(pol->plid, q->blkcg_pols);
}
q->root_blkg = NULL;
spin_unlock_irq(&q->queue_lock);
}

View File

@@ -249,8 +249,6 @@ static inline struct blkcg_gq *blkg_lookup(struct blkcg *blkcg,
{
struct blkcg_gq *blkg;
WARN_ON_ONCE(!rcu_read_lock_held());
if (blkcg == &blkcg_root)
return q->root_blkg;

View File

@@ -2858,11 +2858,8 @@ static struct request *blk_mq_get_new_requests(struct request_queue *q,
};
struct request *rq;
if (unlikely(bio_queue_enter(bio)))
return NULL;
if (blk_mq_attempt_bio_merge(q, bio, nsegs))
goto queue_exit;
return NULL;
rq_qos_throttle(q, bio);
@@ -2878,35 +2875,23 @@ static struct request *blk_mq_get_new_requests(struct request_queue *q,
rq_qos_cleanup(q, bio);
if (bio->bi_opf & REQ_NOWAIT)
bio_wouldblock_error(bio);
queue_exit:
blk_queue_exit(q);
return NULL;
}
static inline struct request *blk_mq_get_cached_request(struct request_queue *q,
struct blk_plug *plug, struct bio **bio, unsigned int nsegs)
/* return true if this @rq can be used for @bio */
static bool blk_mq_can_use_cached_rq(struct request *rq, struct blk_plug *plug,
struct bio *bio)
{
struct request *rq;
enum hctx_type type, hctx_type;
enum hctx_type type = blk_mq_get_hctx_type(bio->bi_opf);
enum hctx_type hctx_type = rq->mq_hctx->type;
if (!plug)
return NULL;
rq = rq_list_peek(&plug->cached_rq);
if (!rq || rq->q != q)
return NULL;
WARN_ON_ONCE(rq_list_peek(&plug->cached_rq) != rq);
if (blk_mq_attempt_bio_merge(q, *bio, nsegs)) {
*bio = NULL;
return NULL;
}
type = blk_mq_get_hctx_type((*bio)->bi_opf);
hctx_type = rq->mq_hctx->type;
if (type != hctx_type &&
!(type == HCTX_TYPE_READ && hctx_type == HCTX_TYPE_DEFAULT))
return NULL;
if (op_is_flush(rq->cmd_flags) != op_is_flush((*bio)->bi_opf))
return NULL;
return false;
if (op_is_flush(rq->cmd_flags) != op_is_flush(bio->bi_opf))
return false;
/*
* If any qos ->throttle() end up blocking, we will have flushed the
@@ -2914,12 +2899,12 @@ static inline struct request *blk_mq_get_cached_request(struct request_queue *q,
* before we throttle.
*/
plug->cached_rq = rq_list_next(rq);
rq_qos_throttle(q, *bio);
rq_qos_throttle(rq->q, bio);
blk_mq_rq_time_init(rq, 0);
rq->cmd_flags = (*bio)->bi_opf;
rq->cmd_flags = bio->bi_opf;
INIT_LIST_HEAD(&rq->queuelist);
return rq;
return true;
}
static void bio_set_ioprio(struct bio *bio)
@@ -2949,7 +2934,7 @@ void blk_mq_submit_bio(struct bio *bio)
struct blk_plug *plug = blk_mq_plug(bio);
const int is_sync = op_is_sync(bio->bi_opf);
struct blk_mq_hw_ctx *hctx;
struct request *rq;
struct request *rq = NULL;
unsigned int nr_segs = 1;
blk_status_t ret;
@@ -2960,20 +2945,36 @@ void blk_mq_submit_bio(struct bio *bio)
return;
}
if (!bio_integrity_prep(bio))
return;
bio_set_ioprio(bio);
rq = blk_mq_get_cached_request(q, plug, &bio, nr_segs);
if (!rq) {
if (!bio)
if (plug) {
rq = rq_list_peek(&plug->cached_rq);
if (rq && rq->q != q)
rq = NULL;
}
if (rq) {
if (!bio_integrity_prep(bio))
return;
rq = blk_mq_get_new_requests(q, plug, bio, nr_segs);
if (unlikely(!rq))
if (blk_mq_attempt_bio_merge(q, bio, nr_segs))
return;
if (blk_mq_can_use_cached_rq(rq, plug, bio))
goto done;
percpu_ref_get(&q->q_usage_counter);
} else {
if (unlikely(bio_queue_enter(bio)))
return;
if (!bio_integrity_prep(bio))
goto fail;
}
rq = blk_mq_get_new_requests(q, plug, bio, nr_segs);
if (unlikely(!rq)) {
fail:
blk_queue_exit(q);
return;
}
done:
trace_block_getrq(bio);
rq_qos_track(q, rq, bio);

View File

@@ -163,38 +163,15 @@ EXPORT_SYMBOL(blk_pre_runtime_resume);
* @q: the queue of the device
*
* Description:
* For historical reasons, this routine merely calls blk_set_runtime_active()
* to do the real work of restarting the queue. It does this regardless of
* whether the device's runtime-resume succeeded; even if it failed the
* Restart the queue of a runtime suspended device. It does this regardless
* of whether the device's runtime-resume succeeded; even if it failed the
* driver or error handler will need to communicate with the device.
*
* This function should be called near the end of the device's
* runtime_resume callback.
* runtime_resume callback to correct queue runtime PM status and re-enable
* peeking requests from the queue.
*/
void blk_post_runtime_resume(struct request_queue *q)
{
blk_set_runtime_active(q);
}
EXPORT_SYMBOL(blk_post_runtime_resume);
/**
* blk_set_runtime_active - Force runtime status of the queue to be active
* @q: the queue of the device
*
* If the device is left runtime suspended during system suspend the resume
* hook typically resumes the device and corrects runtime status
* accordingly. However, that does not affect the queue runtime PM status
* which is still "suspended". This prevents processing requests from the
* queue.
*
* This function can be used in driver's resume hook to correct queue
* runtime PM status and re-enable peeking requests from the queue. It
* should be called before first request is added to the queue.
*
* This function is also called by blk_post_runtime_resume() for
* runtime resumes. It does everything necessary to restart the queue.
*/
void blk_set_runtime_active(struct request_queue *q)
{
int old_status;
@@ -211,4 +188,4 @@ void blk_set_runtime_active(struct request_queue *q)
if (old_status != RPM_ACTIVE)
blk_clear_pm_only(q);
}
EXPORT_SYMBOL(blk_set_runtime_active);
EXPORT_SYMBOL(blk_post_runtime_resume);

View File

@@ -1320,6 +1320,7 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global)
tg_bps_limit(tg, READ), tg_bps_limit(tg, WRITE),
tg_iops_limit(tg, READ), tg_iops_limit(tg, WRITE));
rcu_read_lock();
/*
* Update has_rules[] flags for the updated tg's subtree. A tg is
* considered to have rules if either the tg itself or any of its
@@ -1347,6 +1348,7 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global)
this_tg->latency_target = max(this_tg->latency_target,
parent_tg->latency_target);
}
rcu_read_unlock();
/*
* We're already holding queue_lock and know @tg is valid. Let's

View File

@@ -11,6 +11,7 @@
#include <linux/idr.h>
#include <drm/drm_accel.h>
#include <drm/drm_auth.h>
#include <drm/drm_debugfs.h>
#include <drm/drm_drv.h>
#include <drm/drm_file.h>

View File

@@ -1,16 +1,17 @@
# SPDX-License-Identifier: GPL-2.0-only
config DRM_ACCEL_IVPU
tristate "Intel VPU for Meteor Lake and newer"
tristate "Intel NPU (Neural Processing Unit)"
depends on DRM_ACCEL
depends on X86_64 && !UML
depends on PCI && PCI_MSI
select FW_LOADER
select SHMEM
select DRM_GEM_SHMEM_HELPER
select GENERIC_ALLOCATOR
help
Choose this option if you have a system that has an 14th generation Intel CPU
or newer. VPU stands for Versatile Processing Unit and it's a CPU-integrated
inference accelerator for Computer Vision and Deep Learning applications.
Choose this option if you have a system with an 14th generation
Intel CPU (Meteor Lake) or newer. Intel NPU (formerly called Intel VPU)
is a CPU-integrated inference accelerator for Computer Vision
and Deep Learning applications.
If "M" is selected, the module will be called intel_vpu.

View File

@@ -14,6 +14,7 @@
#include "ivpu_fw.h"
#include "ivpu_fw_log.h"
#include "ivpu_gem.h"
#include "ivpu_hw.h"
#include "ivpu_jsm_msg.h"
#include "ivpu_pm.h"
@@ -115,6 +116,31 @@ static const struct drm_debugfs_info vdev_debugfs_list[] = {
{"reset_pending", reset_pending_show, 0},
};
static ssize_t
dvfs_mode_fops_write(struct file *file, const char __user *user_buf, size_t size, loff_t *pos)
{
struct ivpu_device *vdev = file->private_data;
struct ivpu_fw_info *fw = vdev->fw;
u32 dvfs_mode;
int ret;
ret = kstrtou32_from_user(user_buf, size, 0, &dvfs_mode);
if (ret < 0)
return ret;
fw->dvfs_mode = dvfs_mode;
ivpu_pm_schedule_recovery(vdev);
return size;
}
static const struct file_operations dvfs_mode_fops = {
.owner = THIS_MODULE,
.open = simple_open,
.write = dvfs_mode_fops_write,
};
static int fw_log_show(struct seq_file *s, void *v)
{
struct ivpu_device *vdev = s->private;
@@ -151,6 +177,30 @@ static const struct file_operations fw_log_fops = {
.release = single_release,
};
static ssize_t
fw_profiling_freq_fops_write(struct file *file, const char __user *user_buf,
size_t size, loff_t *pos)
{
struct ivpu_device *vdev = file->private_data;
bool enable;
int ret;
ret = kstrtobool_from_user(user_buf, size, &enable);
if (ret < 0)
return ret;
ivpu_hw_profiling_freq_drive(vdev, enable);
ivpu_pm_schedule_recovery(vdev);
return size;
}
static const struct file_operations fw_profiling_freq_fops = {
.owner = THIS_MODULE,
.open = simple_open,
.write = fw_profiling_freq_fops_write,
};
static ssize_t
fw_trace_destination_mask_fops_write(struct file *file, const char __user *user_buf,
size_t size, loff_t *pos)
@@ -280,6 +330,9 @@ void ivpu_debugfs_init(struct ivpu_device *vdev)
debugfs_create_file("force_recovery", 0200, debugfs_root, vdev,
&ivpu_force_recovery_fops);
debugfs_create_file("dvfs_mode", 0200, debugfs_root, vdev,
&dvfs_mode_fops);
debugfs_create_file("fw_log", 0644, debugfs_root, vdev,
&fw_log_fops);
debugfs_create_file("fw_trace_destination_mask", 0200, debugfs_root, vdev,
@@ -291,4 +344,8 @@ void ivpu_debugfs_init(struct ivpu_device *vdev)
debugfs_create_file("reset_engine", 0200, debugfs_root, vdev,
&ivpu_reset_engine_fops);
if (ivpu_hw_gen(vdev) >= IVPU_HW_40XX)
debugfs_create_file("fw_profiling_freq_drive", 0200,
debugfs_root, vdev, &fw_profiling_freq_fops);
}

View File

@@ -31,8 +31,6 @@
__stringify(DRM_IVPU_DRIVER_MINOR) "."
#endif
static const struct drm_driver driver;
static struct lock_class_key submitted_jobs_xa_lock_class_key;
int ivpu_dbg_mask;
@@ -41,7 +39,7 @@ MODULE_PARM_DESC(dbg_mask, "Driver debug mask. See IVPU_DBG_* macros.");
int ivpu_test_mode;
module_param_named_unsafe(test_mode, ivpu_test_mode, int, 0644);
MODULE_PARM_DESC(test_mode, "Test mode: 0 - normal operation, 1 - fw unit test, 2 - null hw");
MODULE_PARM_DESC(test_mode, "Test mode mask. See IVPU_TEST_MODE_* macros.");
u8 ivpu_pll_min_ratio;
module_param_named(pll_min_ratio, ivpu_pll_min_ratio, byte, 0644);
@@ -93,8 +91,8 @@ static void file_priv_release(struct kref *ref)
ivpu_dbg(vdev, FILE, "file_priv release: ctx %u\n", file_priv->ctx.id);
ivpu_cmdq_release_all(file_priv);
ivpu_bo_remove_all_bos_from_context(&file_priv->ctx);
ivpu_jsm_context_release(vdev, file_priv->ctx.id);
ivpu_bo_remove_all_bos_from_context(vdev, &file_priv->ctx);
ivpu_mmu_user_context_fini(vdev, &file_priv->ctx);
drm_WARN_ON(&vdev->drm, xa_erase_irq(&vdev->context_xa, file_priv->ctx.id) != file_priv);
mutex_destroy(&file_priv->lock);
@@ -317,16 +315,14 @@ static int ivpu_wait_for_ready(struct ivpu_device *vdev)
unsigned long timeout;
int ret;
if (ivpu_test_mode == IVPU_TEST_MODE_FW_TEST)
if (ivpu_test_mode & IVPU_TEST_MODE_FW_TEST)
return 0;
ivpu_ipc_consumer_add(vdev, &cons, IVPU_IPC_CHAN_BOOT_MSG);
ivpu_ipc_consumer_add(vdev, &cons, IVPU_IPC_CHAN_BOOT_MSG, NULL);
timeout = jiffies + msecs_to_jiffies(vdev->timeout.boot);
while (1) {
ret = ivpu_ipc_irq_handler(vdev);
if (ret)
break;
ivpu_ipc_irq_handler(vdev, NULL);
ret = ivpu_ipc_receive(vdev, &cons, &ipc_hdr, NULL, 0);
if (ret != -ETIMEDOUT || time_after_eq(jiffies, timeout))
break;
@@ -362,7 +358,7 @@ int ivpu_boot(struct ivpu_device *vdev)
int ret;
/* Update boot params located at first 4KB of FW memory */
ivpu_fw_boot_params_setup(vdev, vdev->fw->mem->kvaddr);
ivpu_fw_boot_params_setup(vdev, ivpu_bo_vaddr(vdev->fw->mem));
ret = ivpu_hw_boot_fw(vdev);
if (ret) {
@@ -414,7 +410,9 @@ static const struct drm_driver driver = {
.open = ivpu_open,
.postclose = ivpu_postclose,
.gem_prime_import = ivpu_gem_prime_import,
.gem_create_object = ivpu_gem_create_object,
.gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table,
.ioctls = ivpu_drm_ioctls,
.num_ioctls = ARRAY_SIZE(ivpu_drm_ioctls),
@@ -427,6 +425,13 @@ static const struct drm_driver driver = {
.minor = DRM_IVPU_DRIVER_MINOR,
};
static irqreturn_t ivpu_irq_thread_handler(int irq, void *arg)
{
struct ivpu_device *vdev = arg;
return ivpu_ipc_irq_thread_handler(vdev);
}
static int ivpu_irq_init(struct ivpu_device *vdev)
{
struct pci_dev *pdev = to_pci_dev(vdev->drm.dev);
@@ -440,8 +445,8 @@ static int ivpu_irq_init(struct ivpu_device *vdev)
vdev->irq = pci_irq_vector(pdev, 0);
ret = devm_request_irq(vdev->drm.dev, vdev->irq, vdev->hw->ops->irq_handler,
IRQF_NO_AUTOEN, DRIVER_NAME, vdev);
ret = devm_request_threaded_irq(vdev->drm.dev, vdev->irq, vdev->hw->ops->irq_handler,
ivpu_irq_thread_handler, IRQF_NO_AUTOEN, DRIVER_NAME, vdev);
if (ret)
ivpu_err(vdev, "Failed to request an IRQ %d\n", ret);
@@ -533,6 +538,11 @@ static int ivpu_dev_init(struct ivpu_device *vdev)
xa_init_flags(&vdev->context_xa, XA_FLAGS_ALLOC);
xa_init_flags(&vdev->submitted_jobs_xa, XA_FLAGS_ALLOC1);
lockdep_set_class(&vdev->submitted_jobs_xa.xa_lock, &submitted_jobs_xa_lock_class_key);
INIT_LIST_HEAD(&vdev->bo_list);
ret = drmm_mutex_init(&vdev->drm, &vdev->bo_list_lock);
if (ret)
goto err_xa_destroy;
ret = ivpu_pci_init(vdev);
if (ret)
@@ -550,7 +560,7 @@ static int ivpu_dev_init(struct ivpu_device *vdev)
/* Power up early so the rest of init code can access VPU registers */
ret = ivpu_hw_power_up(vdev);
if (ret)
goto err_xa_destroy;
goto err_power_down;
ret = ivpu_mmu_global_context_init(vdev);
if (ret)
@@ -574,20 +584,15 @@ static int ivpu_dev_init(struct ivpu_device *vdev)
ivpu_pm_init(vdev);
ret = ivpu_job_done_thread_init(vdev);
ret = ivpu_boot(vdev);
if (ret)
goto err_ipc_fini;
ret = ivpu_boot(vdev);
if (ret)
goto err_job_done_thread_fini;
ivpu_job_done_consumer_init(vdev);
ivpu_pm_enable(vdev);
return 0;
err_job_done_thread_fini:
ivpu_job_done_thread_fini(vdev);
err_ipc_fini:
ivpu_ipc_fini(vdev);
err_fw_fini:
@@ -612,7 +617,7 @@ static void ivpu_dev_fini(struct ivpu_device *vdev)
ivpu_shutdown(vdev);
if (IVPU_WA(d3hot_after_power_off))
pci_set_power_state(to_pci_dev(vdev->drm.dev), PCI_D3hot);
ivpu_job_done_thread_fini(vdev);
ivpu_job_done_consumer_fini(vdev);
ivpu_pm_cancel_recovery(vdev);
ivpu_ipc_fini(vdev);

View File

@@ -17,9 +17,10 @@
#include <uapi/drm/ivpu_accel.h>
#include "ivpu_mmu_context.h"
#include "ivpu_ipc.h"
#define DRIVER_NAME "intel_vpu"
#define DRIVER_DESC "Driver for Intel Versatile Processing Unit (VPU)"
#define DRIVER_DESC "Driver for Intel NPU (Neural Processing Unit)"
#define DRIVER_DATE "20230117"
#define PCI_DEVICE_ID_MTL 0x7d1d
@@ -88,6 +89,7 @@ struct ivpu_wa_table {
bool d3hot_after_power_off;
bool interrupt_clear_with_0;
bool disable_clock_relinquish;
bool disable_d0i3_msg;
};
struct ivpu_hw_info;
@@ -115,8 +117,11 @@ struct ivpu_device {
struct xarray context_xa;
struct xa_limit context_xa_limit;
struct mutex bo_list_lock; /* Protects bo_list */
struct list_head bo_list;
struct xarray submitted_jobs_xa;
struct task_struct *job_done_thread;
struct ivpu_ipc_consumer job_done_consumer;
atomic64_t unique_id_counter;
@@ -126,6 +131,7 @@ struct ivpu_device {
int tdr;
int reschedule_suspend;
int autosuspend;
int d0i3_entry_msg;
} timeout;
};
@@ -148,9 +154,11 @@ extern u8 ivpu_pll_min_ratio;
extern u8 ivpu_pll_max_ratio;
extern bool ivpu_disable_mmu_cont_pages;
#define IVPU_TEST_MODE_DISABLED 0
#define IVPU_TEST_MODE_FW_TEST 1
#define IVPU_TEST_MODE_NULL_HW 2
#define IVPU_TEST_MODE_FW_TEST BIT(0)
#define IVPU_TEST_MODE_NULL_HW BIT(1)
#define IVPU_TEST_MODE_NULL_SUBMISSION BIT(2)
#define IVPU_TEST_MODE_D0I3_MSG_DISABLE BIT(4)
#define IVPU_TEST_MODE_D0I3_MSG_ENABLE BIT(5)
extern int ivpu_test_mode;
struct ivpu_file_priv *ivpu_file_priv_get(struct ivpu_file_priv *file_priv);

View File

@@ -33,12 +33,17 @@
#define ADDR_TO_L2_CACHE_CFG(addr) ((addr) >> 31)
#define IVPU_FW_CHECK_API(vdev, fw_hdr, name, min_major) \
/* Check if FW API is compatible with the driver */
#define IVPU_FW_CHECK_API_COMPAT(vdev, fw_hdr, name, min_major) \
ivpu_fw_check_api(vdev, fw_hdr, #name, \
VPU_##name##_API_VER_INDEX, \
VPU_##name##_API_VER_MAJOR, \
VPU_##name##_API_VER_MINOR, min_major)
/* Check if API version is lower that the given version */
#define IVPU_FW_CHECK_API_VER_LT(vdev, fw_hdr, name, major, minor) \
ivpu_fw_check_api_ver_lt(vdev, fw_hdr, #name, VPU_##name##_API_VER_INDEX, major, minor)
static char *ivpu_firmware;
module_param_named_unsafe(firmware, ivpu_firmware, charp, 0644);
MODULE_PARM_DESC(firmware, "VPU firmware binary in /lib/firmware/..");
@@ -105,6 +110,19 @@ ivpu_fw_check_api(struct ivpu_device *vdev, const struct vpu_firmware_header *fw
return 0;
}
static bool
ivpu_fw_check_api_ver_lt(struct ivpu_device *vdev, const struct vpu_firmware_header *fw_hdr,
const char *str, int index, u16 major, u16 minor)
{
u16 fw_major = (u16)(fw_hdr->api_version[index] >> 16);
u16 fw_minor = (u16)(fw_hdr->api_version[index]);
if (fw_major < major || (fw_major == major && fw_minor < minor))
return true;
return false;
}
static int ivpu_fw_parse(struct ivpu_device *vdev)
{
struct ivpu_fw_info *fw = vdev->fw;
@@ -164,9 +182,9 @@ static int ivpu_fw_parse(struct ivpu_device *vdev)
ivpu_info(vdev, "Firmware: %s, version: %s", fw->name,
(const char *)fw_hdr + VPU_FW_HEADER_SIZE);
if (IVPU_FW_CHECK_API(vdev, fw_hdr, BOOT, 3))
if (IVPU_FW_CHECK_API_COMPAT(vdev, fw_hdr, BOOT, 3))
return -EINVAL;
if (IVPU_FW_CHECK_API(vdev, fw_hdr, JSM, 3))
if (IVPU_FW_CHECK_API_COMPAT(vdev, fw_hdr, JSM, 3))
return -EINVAL;
fw->runtime_addr = runtime_addr;
@@ -182,6 +200,8 @@ static int ivpu_fw_parse(struct ivpu_device *vdev)
fw->trace_destination_mask = VPU_TRACE_DESTINATION_VERBOSE_TRACING;
fw->trace_hw_component_mask = -1;
fw->dvfs_mode = 0;
ivpu_dbg(vdev, FW_BOOT, "Size: file %lu image %u runtime %u shavenn %u\n",
fw->file->size, fw->image_size, fw->runtime_size, fw->shave_nn_size);
ivpu_dbg(vdev, FW_BOOT, "Address: runtime 0x%llx, load 0x%llx, entry point 0x%llx\n",
@@ -195,6 +215,24 @@ static void ivpu_fw_release(struct ivpu_device *vdev)
release_firmware(vdev->fw->file);
}
/* Initialize workarounds that depend on FW version */
static void
ivpu_fw_init_wa(struct ivpu_device *vdev)
{
const struct vpu_firmware_header *fw_hdr = (const void *)vdev->fw->file->data;
if (IVPU_FW_CHECK_API_VER_LT(vdev, fw_hdr, BOOT, 3, 17) ||
(ivpu_hw_gen(vdev) > IVPU_HW_37XX) ||
(ivpu_test_mode & IVPU_TEST_MODE_D0I3_MSG_DISABLE))
vdev->wa.disable_d0i3_msg = true;
/* Force enable the feature for testing purposes */
if (ivpu_test_mode & IVPU_TEST_MODE_D0I3_MSG_ENABLE)
vdev->wa.disable_d0i3_msg = false;
IVPU_PRINT_WA(disable_d0i3_msg);
}
static int ivpu_fw_update_global_range(struct ivpu_device *vdev)
{
struct ivpu_fw_info *fw = vdev->fw;
@@ -248,7 +286,7 @@ static int ivpu_fw_mem_init(struct ivpu_device *vdev)
if (fw->shave_nn_size) {
fw->mem_shave_nn = ivpu_bo_alloc_internal(vdev, vdev->hw->ranges.shave.start,
fw->shave_nn_size, DRM_IVPU_BO_UNCACHED);
fw->shave_nn_size, DRM_IVPU_BO_WC);
if (!fw->mem_shave_nn) {
ivpu_err(vdev, "Failed to allocate shavenn buffer\n");
ret = -ENOMEM;
@@ -297,6 +335,8 @@ int ivpu_fw_init(struct ivpu_device *vdev)
if (ret)
goto err_fw_release;
ivpu_fw_init_wa(vdev);
ret = ivpu_fw_mem_init(vdev);
if (ret)
goto err_fw_release;
@@ -422,14 +462,31 @@ static void ivpu_fw_boot_params_print(struct ivpu_device *vdev, struct vpu_boot_
boot_params->punit_telemetry_sram_size);
ivpu_dbg(vdev, FW_BOOT, "boot_params.vpu_telemetry_enable = 0x%x\n",
boot_params->vpu_telemetry_enable);
ivpu_dbg(vdev, FW_BOOT, "boot_params.dvfs_mode = %u\n",
boot_params->dvfs_mode);
ivpu_dbg(vdev, FW_BOOT, "boot_params.d0i3_delayed_entry = %d\n",
boot_params->d0i3_delayed_entry);
ivpu_dbg(vdev, FW_BOOT, "boot_params.d0i3_residency_time_us = %lld\n",
boot_params->d0i3_residency_time_us);
ivpu_dbg(vdev, FW_BOOT, "boot_params.d0i3_entry_vpu_ts = %llu\n",
boot_params->d0i3_entry_vpu_ts);
}
void ivpu_fw_boot_params_setup(struct ivpu_device *vdev, struct vpu_boot_params *boot_params)
{
struct ivpu_bo *ipc_mem_rx = vdev->ipc->mem_rx;
/* In case of warm boot we only have to reset the entrypoint addr */
/* In case of warm boot only update variable params */
if (!ivpu_fw_is_cold_boot(vdev)) {
boot_params->d0i3_residency_time_us =
ktime_us_delta(ktime_get_boottime(), vdev->hw->d0i3_entry_host_ts);
boot_params->d0i3_entry_vpu_ts = vdev->hw->d0i3_entry_vpu_ts;
ivpu_dbg(vdev, FW_BOOT, "boot_params.d0i3_residency_time_us = %lld\n",
boot_params->d0i3_residency_time_us);
ivpu_dbg(vdev, FW_BOOT, "boot_params.d0i3_entry_vpu_ts = %llu\n",
boot_params->d0i3_entry_vpu_ts);
boot_params->save_restore_ret_address = 0;
vdev->pm->is_warmboot = true;
wmb(); /* Flush WC buffers after writing save_restore_ret_address */
@@ -442,6 +499,13 @@ void ivpu_fw_boot_params_setup(struct ivpu_device *vdev, struct vpu_boot_params
boot_params->vpu_id = to_pci_dev(vdev->drm.dev)->bus->number;
boot_params->frequency = ivpu_hw_reg_pll_freq_get(vdev);
/*
* This param is a debug firmware feature. It switches default clock
* to higher resolution one for fine-grained and more accurate firmware
* task profiling.
*/
boot_params->perf_clk_frequency = ivpu_hw_profiling_freq_get(vdev);
/*
* Uncached region of VPU address space, covers IPC buffers, job queues
* and log buffers, programmable to L2$ Uncached by VPU MTRR
@@ -493,6 +557,11 @@ void ivpu_fw_boot_params_setup(struct ivpu_device *vdev, struct vpu_boot_params
boot_params->punit_telemetry_sram_base = ivpu_hw_reg_telemetry_offset_get(vdev);
boot_params->punit_telemetry_sram_size = ivpu_hw_reg_telemetry_size_get(vdev);
boot_params->vpu_telemetry_enable = ivpu_hw_reg_telemetry_enable_get(vdev);
boot_params->dvfs_mode = vdev->fw->dvfs_mode;
if (!IVPU_WA(disable_d0i3_msg))
boot_params->d0i3_delayed_entry = 1;
boot_params->d0i3_residency_time_us = 0;
boot_params->d0i3_entry_vpu_ts = 0;
wmb(); /* Flush WC buffers after writing bootparams */

View File

@@ -27,6 +27,7 @@ struct ivpu_fw_info {
u32 trace_level;
u32 trace_destination_mask;
u64 trace_hw_component_mask;
u32 dvfs_mode;
};
int ivpu_fw_init(struct ivpu_device *vdev);

View File

@@ -20,215 +20,18 @@
#include "ivpu_mmu.h"
#include "ivpu_mmu_context.h"
MODULE_IMPORT_NS(DMA_BUF);
static const struct drm_gem_object_funcs ivpu_gem_funcs;
static struct lock_class_key prime_bo_lock_class_key;
static int __must_check prime_alloc_pages_locked(struct ivpu_bo *bo)
static inline void ivpu_dbg_bo(struct ivpu_device *vdev, struct ivpu_bo *bo, const char *action)
{
/* Pages are managed by the underlying dma-buf */
return 0;
}
static void prime_free_pages_locked(struct ivpu_bo *bo)
{
/* Pages are managed by the underlying dma-buf */
}
static int prime_map_pages_locked(struct ivpu_bo *bo)
{
struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
struct sg_table *sgt;
sgt = dma_buf_map_attachment_unlocked(bo->base.import_attach, DMA_BIDIRECTIONAL);
if (IS_ERR(sgt)) {
ivpu_err(vdev, "Failed to map attachment: %ld\n", PTR_ERR(sgt));
return PTR_ERR(sgt);
}
bo->sgt = sgt;
return 0;
}
static void prime_unmap_pages_locked(struct ivpu_bo *bo)
{
dma_buf_unmap_attachment_unlocked(bo->base.import_attach, bo->sgt, DMA_BIDIRECTIONAL);
bo->sgt = NULL;
}
static const struct ivpu_bo_ops prime_ops = {
.type = IVPU_BO_TYPE_PRIME,
.name = "prime",
.alloc_pages = prime_alloc_pages_locked,
.free_pages = prime_free_pages_locked,
.map_pages = prime_map_pages_locked,
.unmap_pages = prime_unmap_pages_locked,
};
static int __must_check shmem_alloc_pages_locked(struct ivpu_bo *bo)
{
int npages = ivpu_bo_size(bo) >> PAGE_SHIFT;
struct page **pages;
pages = drm_gem_get_pages(&bo->base);
if (IS_ERR(pages))
return PTR_ERR(pages);
if (bo->flags & DRM_IVPU_BO_WC)
set_pages_array_wc(pages, npages);
else if (bo->flags & DRM_IVPU_BO_UNCACHED)
set_pages_array_uc(pages, npages);
bo->pages = pages;
return 0;
}
static void shmem_free_pages_locked(struct ivpu_bo *bo)
{
if (ivpu_bo_cache_mode(bo) != DRM_IVPU_BO_CACHED)
set_pages_array_wb(bo->pages, ivpu_bo_size(bo) >> PAGE_SHIFT);
drm_gem_put_pages(&bo->base, bo->pages, true, false);
bo->pages = NULL;
}
static int ivpu_bo_map_pages_locked(struct ivpu_bo *bo)
{
int npages = ivpu_bo_size(bo) >> PAGE_SHIFT;
struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
struct sg_table *sgt;
int ret;
sgt = drm_prime_pages_to_sg(&vdev->drm, bo->pages, npages);
if (IS_ERR(sgt)) {
ivpu_err(vdev, "Failed to allocate sgtable\n");
return PTR_ERR(sgt);
}
ret = dma_map_sgtable(vdev->drm.dev, sgt, DMA_BIDIRECTIONAL, 0);
if (ret) {
ivpu_err(vdev, "Failed to map BO in IOMMU: %d\n", ret);
goto err_free_sgt;
}
bo->sgt = sgt;
return 0;
err_free_sgt:
kfree(sgt);
return ret;
}
static void ivpu_bo_unmap_pages_locked(struct ivpu_bo *bo)
{
struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
dma_unmap_sgtable(vdev->drm.dev, bo->sgt, DMA_BIDIRECTIONAL, 0);
sg_free_table(bo->sgt);
kfree(bo->sgt);
bo->sgt = NULL;
}
static const struct ivpu_bo_ops shmem_ops = {
.type = IVPU_BO_TYPE_SHMEM,
.name = "shmem",
.alloc_pages = shmem_alloc_pages_locked,
.free_pages = shmem_free_pages_locked,
.map_pages = ivpu_bo_map_pages_locked,
.unmap_pages = ivpu_bo_unmap_pages_locked,
};
static int __must_check internal_alloc_pages_locked(struct ivpu_bo *bo)
{
unsigned int i, npages = ivpu_bo_size(bo) >> PAGE_SHIFT;
struct page **pages;
int ret;
pages = kvmalloc_array(npages, sizeof(*bo->pages), GFP_KERNEL);
if (!pages)
return -ENOMEM;
for (i = 0; i < npages; i++) {
pages[i] = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
if (!pages[i]) {
ret = -ENOMEM;
goto err_free_pages;
}
cond_resched();
}
bo->pages = pages;
return 0;
err_free_pages:
while (i--)
put_page(pages[i]);
kvfree(pages);
return ret;
}
static void internal_free_pages_locked(struct ivpu_bo *bo)
{
unsigned int i, npages = ivpu_bo_size(bo) >> PAGE_SHIFT;
if (ivpu_bo_cache_mode(bo) != DRM_IVPU_BO_CACHED)
set_pages_array_wb(bo->pages, ivpu_bo_size(bo) >> PAGE_SHIFT);
for (i = 0; i < npages; i++)
put_page(bo->pages[i]);
kvfree(bo->pages);
bo->pages = NULL;
}
static const struct ivpu_bo_ops internal_ops = {
.type = IVPU_BO_TYPE_INTERNAL,
.name = "internal",
.alloc_pages = internal_alloc_pages_locked,
.free_pages = internal_free_pages_locked,
.map_pages = ivpu_bo_map_pages_locked,
.unmap_pages = ivpu_bo_unmap_pages_locked,
};
static int __must_check ivpu_bo_alloc_and_map_pages_locked(struct ivpu_bo *bo)
{
struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
int ret;
lockdep_assert_held(&bo->lock);
drm_WARN_ON(&vdev->drm, bo->sgt);
ret = bo->ops->alloc_pages(bo);
if (ret) {
ivpu_err(vdev, "Failed to allocate pages for BO: %d", ret);
return ret;
}
ret = bo->ops->map_pages(bo);
if (ret) {
ivpu_err(vdev, "Failed to map pages for BO: %d", ret);
goto err_free_pages;
}
return ret;
err_free_pages:
bo->ops->free_pages(bo);
return ret;
}
static void ivpu_bo_unmap_and_free_pages(struct ivpu_bo *bo)
{
mutex_lock(&bo->lock);
WARN_ON(!bo->sgt);
bo->ops->unmap_pages(bo);
WARN_ON(bo->sgt);
bo->ops->free_pages(bo);
WARN_ON(bo->pages);
mutex_unlock(&bo->lock);
if (bo->ctx)
ivpu_dbg(vdev, BO, "%6s: size %zu has_pages %d dma_mapped %d handle %u ctx %d vpu_addr 0x%llx mmu_mapped %d\n",
action, ivpu_bo_size(bo), (bool)bo->base.pages, (bool)bo->base.sgt,
bo->handle, bo->ctx->id, bo->vpu_addr, bo->mmu_mapped);
else
ivpu_dbg(vdev, BO, "%6s: size %zu has_pages %d dma_mapped %d handle %u (not added to context)\n",
action, ivpu_bo_size(bo), (bool)bo->base.pages, (bool)bo->base.sgt,
bo->handle);
}
/*
@@ -245,21 +48,24 @@ int __must_check ivpu_bo_pin(struct ivpu_bo *bo)
mutex_lock(&bo->lock);
if (!bo->vpu_addr) {
ivpu_err(vdev, "vpu_addr not set for BO ctx_id: %d handle: %d\n",
bo->ctx->id, bo->handle);
ivpu_dbg_bo(vdev, bo, "pin");
if (!bo->ctx) {
ivpu_err(vdev, "vpu_addr not allocated for BO %d\n", bo->handle);
ret = -EINVAL;
goto unlock;
}
if (!bo->sgt) {
ret = ivpu_bo_alloc_and_map_pages_locked(bo);
if (ret)
goto unlock;
}
if (!bo->mmu_mapped) {
ret = ivpu_mmu_context_map_sgt(vdev, bo->ctx, bo->vpu_addr, bo->sgt,
struct sg_table *sgt = drm_gem_shmem_get_pages_sgt(&bo->base);
if (IS_ERR(sgt)) {
ret = PTR_ERR(sgt);
ivpu_err(vdev, "Failed to map BO in IOMMU: %d\n", ret);
goto unlock;
}
ret = ivpu_mmu_context_map_sgt(vdev, bo->ctx, bo->vpu_addr, sgt,
ivpu_bo_is_snooped(bo));
if (ret) {
ivpu_err(vdev, "Failed to map BO in MMU: %d\n", ret);
@@ -281,248 +87,213 @@ ivpu_bo_alloc_vpu_addr(struct ivpu_bo *bo, struct ivpu_mmu_context *ctx,
struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
int ret;
if (!range) {
if (bo->flags & DRM_IVPU_BO_SHAVE_MEM)
range = &vdev->hw->ranges.shave;
else if (bo->flags & DRM_IVPU_BO_DMA_MEM)
range = &vdev->hw->ranges.dma;
else
range = &vdev->hw->ranges.user;
}
mutex_lock(&bo->lock);
mutex_lock(&ctx->lock);
ret = ivpu_mmu_context_insert_node_locked(ctx, range, ivpu_bo_size(bo), &bo->mm_node);
ret = ivpu_mmu_context_insert_node(ctx, range, ivpu_bo_size(bo), &bo->mm_node);
if (!ret) {
bo->ctx = ctx;
bo->vpu_addr = bo->mm_node.start;
list_add_tail(&bo->ctx_node, &ctx->bo_list);
} else {
ivpu_err(vdev, "Failed to add BO to context %u: %d\n", ctx->id, ret);
}
mutex_unlock(&ctx->lock);
ivpu_dbg_bo(vdev, bo, "alloc");
mutex_unlock(&bo->lock);
return ret;
}
static void ivpu_bo_free_vpu_addr(struct ivpu_bo *bo)
static void ivpu_bo_unbind_locked(struct ivpu_bo *bo)
{
struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
struct ivpu_mmu_context *ctx = bo->ctx;
ivpu_dbg(vdev, BO, "remove from ctx: ctx %d vpu_addr 0x%llx allocated %d mmu_mapped %d\n",
ctx->id, bo->vpu_addr, (bool)bo->sgt, bo->mmu_mapped);
lockdep_assert_held(&bo->lock);
mutex_lock(&bo->lock);
ivpu_dbg_bo(vdev, bo, "unbind");
/* TODO: dma_unmap */
if (bo->mmu_mapped) {
drm_WARN_ON(&vdev->drm, !bo->sgt);
ivpu_mmu_context_unmap_sgt(vdev, ctx, bo->vpu_addr, bo->sgt);
drm_WARN_ON(&vdev->drm, !bo->ctx);
drm_WARN_ON(&vdev->drm, !bo->vpu_addr);
drm_WARN_ON(&vdev->drm, !bo->base.sgt);
ivpu_mmu_context_unmap_sgt(vdev, bo->ctx, bo->vpu_addr, bo->base.sgt);
bo->mmu_mapped = false;
}
mutex_lock(&ctx->lock);
list_del(&bo->ctx_node);
bo->vpu_addr = 0;
bo->ctx = NULL;
ivpu_mmu_context_remove_node_locked(ctx, &bo->mm_node);
mutex_unlock(&ctx->lock);
if (bo->ctx) {
ivpu_mmu_context_remove_node(bo->ctx, &bo->mm_node);
bo->vpu_addr = 0;
bo->ctx = NULL;
}
}
static void ivpu_bo_unbind(struct ivpu_bo *bo)
{
mutex_lock(&bo->lock);
ivpu_bo_unbind_locked(bo);
mutex_unlock(&bo->lock);
}
void ivpu_bo_remove_all_bos_from_context(struct ivpu_mmu_context *ctx)
void ivpu_bo_remove_all_bos_from_context(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx)
{
struct ivpu_bo *bo, *tmp;
struct ivpu_bo *bo;
list_for_each_entry_safe(bo, tmp, &ctx->bo_list, ctx_node)
ivpu_bo_free_vpu_addr(bo);
if (drm_WARN_ON(&vdev->drm, !ctx))
return;
mutex_lock(&vdev->bo_list_lock);
list_for_each_entry(bo, &vdev->bo_list, bo_list_node) {
mutex_lock(&bo->lock);
if (bo->ctx == ctx)
ivpu_bo_unbind_locked(bo);
mutex_unlock(&bo->lock);
}
mutex_unlock(&vdev->bo_list_lock);
}
struct drm_gem_object *ivpu_gem_create_object(struct drm_device *dev, size_t size)
{
struct ivpu_bo *bo;
if (size == 0 || !PAGE_ALIGNED(size))
return ERR_PTR(-EINVAL);
bo = kzalloc(sizeof(*bo), GFP_KERNEL);
if (!bo)
return ERR_PTR(-ENOMEM);
bo->base.base.funcs = &ivpu_gem_funcs;
bo->base.pages_mark_dirty_on_put = true; /* VPU can dirty a BO anytime */
INIT_LIST_HEAD(&bo->bo_list_node);
mutex_init(&bo->lock);
return &bo->base.base;
}
static struct ivpu_bo *
ivpu_bo_alloc(struct ivpu_device *vdev, struct ivpu_mmu_context *mmu_context,
u64 size, u32 flags, const struct ivpu_bo_ops *ops,
const struct ivpu_addr_range *range, u64 user_ptr)
ivpu_bo_create(struct ivpu_device *vdev, u64 size, u32 flags)
{
struct drm_gem_shmem_object *shmem;
struct ivpu_bo *bo;
int ret = 0;
if (drm_WARN_ON(&vdev->drm, size == 0 || !PAGE_ALIGNED(size)))
return ERR_PTR(-EINVAL);
switch (flags & DRM_IVPU_BO_CACHE_MASK) {
case DRM_IVPU_BO_CACHED:
case DRM_IVPU_BO_UNCACHED:
case DRM_IVPU_BO_WC:
break;
default:
return ERR_PTR(-EINVAL);
}
bo = kzalloc(sizeof(*bo), GFP_KERNEL);
if (!bo)
return ERR_PTR(-ENOMEM);
shmem = drm_gem_shmem_create(&vdev->drm, size);
if (IS_ERR(shmem))
return ERR_CAST(shmem);
mutex_init(&bo->lock);
bo->base.funcs = &ivpu_gem_funcs;
bo = to_ivpu_bo(&shmem->base);
bo->base.map_wc = flags & DRM_IVPU_BO_WC;
bo->flags = flags;
bo->ops = ops;
bo->user_ptr = user_ptr;
if (ops->type == IVPU_BO_TYPE_SHMEM)
ret = drm_gem_object_init(&vdev->drm, &bo->base, size);
else
drm_gem_private_object_init(&vdev->drm, &bo->base, size);
mutex_lock(&vdev->bo_list_lock);
list_add_tail(&bo->bo_list_node, &vdev->bo_list);
mutex_unlock(&vdev->bo_list_lock);
if (ret) {
ivpu_err(vdev, "Failed to initialize drm object\n");
goto err_free;
}
if (flags & DRM_IVPU_BO_MAPPABLE) {
ret = drm_gem_create_mmap_offset(&bo->base);
if (ret) {
ivpu_err(vdev, "Failed to allocate mmap offset\n");
goto err_release;
}
}
if (mmu_context) {
ret = ivpu_bo_alloc_vpu_addr(bo, mmu_context, range);
if (ret) {
ivpu_err(vdev, "Failed to add BO to context: %d\n", ret);
goto err_release;
}
}
ivpu_dbg(vdev, BO, "create: vpu_addr 0x%llx size %zu flags 0x%x\n",
bo->vpu_addr, bo->base.base.size, flags);
return bo;
}
err_release:
drm_gem_object_release(&bo->base);
err_free:
kfree(bo);
return ERR_PTR(ret);
static int ivpu_bo_open(struct drm_gem_object *obj, struct drm_file *file)
{
struct ivpu_file_priv *file_priv = file->driver_priv;
struct ivpu_device *vdev = file_priv->vdev;
struct ivpu_bo *bo = to_ivpu_bo(obj);
struct ivpu_addr_range *range;
if (bo->flags & DRM_IVPU_BO_SHAVE_MEM)
range = &vdev->hw->ranges.shave;
else if (bo->flags & DRM_IVPU_BO_DMA_MEM)
range = &vdev->hw->ranges.dma;
else
range = &vdev->hw->ranges.user;
return ivpu_bo_alloc_vpu_addr(bo, &file_priv->ctx, range);
}
static void ivpu_bo_free(struct drm_gem_object *obj)
{
struct ivpu_device *vdev = to_ivpu_device(obj->dev);
struct ivpu_bo *bo = to_ivpu_bo(obj);
struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
if (bo->ctx)
ivpu_dbg(vdev, BO, "free: ctx %d vpu_addr 0x%llx allocated %d mmu_mapped %d\n",
bo->ctx->id, bo->vpu_addr, (bool)bo->sgt, bo->mmu_mapped);
else
ivpu_dbg(vdev, BO, "free: ctx (released) allocated %d mmu_mapped %d\n",
(bool)bo->sgt, bo->mmu_mapped);
mutex_lock(&vdev->bo_list_lock);
list_del(&bo->bo_list_node);
mutex_unlock(&vdev->bo_list_lock);
drm_WARN_ON(&vdev->drm, !dma_resv_test_signaled(obj->resv, DMA_RESV_USAGE_READ));
vunmap(bo->kvaddr);
if (bo->ctx)
ivpu_bo_free_vpu_addr(bo);
if (bo->sgt)
ivpu_bo_unmap_and_free_pages(bo);
if (bo->base.import_attach)
drm_prime_gem_destroy(&bo->base, bo->sgt);
drm_gem_object_release(&bo->base);
ivpu_dbg_bo(vdev, bo, "free");
ivpu_bo_unbind(bo);
mutex_destroy(&bo->lock);
kfree(bo);
drm_WARN_ON(obj->dev, bo->base.pages_use_count > 1);
drm_gem_shmem_free(&bo->base);
}
static int ivpu_bo_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
{
struct ivpu_bo *bo = to_ivpu_bo(obj);
struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
ivpu_dbg(vdev, BO, "mmap: ctx %u handle %u vpu_addr 0x%llx size %zu type %s",
bo->ctx->id, bo->handle, bo->vpu_addr, ivpu_bo_size(bo), bo->ops->name);
if (obj->import_attach) {
/* Drop the reference drm_gem_mmap_obj() acquired.*/
drm_gem_object_put(obj);
vma->vm_private_data = NULL;
return dma_buf_mmap(obj->dma_buf, vma, 0);
}
vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND);
vma->vm_page_prot = ivpu_bo_pgprot(bo, vm_get_page_prot(vma->vm_flags));
return 0;
}
static struct sg_table *ivpu_bo_get_sg_table(struct drm_gem_object *obj)
{
struct ivpu_bo *bo = to_ivpu_bo(obj);
loff_t npages = obj->size >> PAGE_SHIFT;
int ret = 0;
mutex_lock(&bo->lock);
if (!bo->sgt)
ret = ivpu_bo_alloc_and_map_pages_locked(bo);
mutex_unlock(&bo->lock);
if (ret)
return ERR_PTR(ret);
return drm_prime_pages_to_sg(obj->dev, bo->pages, npages);
}
static vm_fault_t ivpu_vm_fault(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
struct drm_gem_object *obj = vma->vm_private_data;
struct ivpu_bo *bo = to_ivpu_bo(obj);
loff_t npages = obj->size >> PAGE_SHIFT;
pgoff_t page_offset;
struct page *page;
vm_fault_t ret;
int err;
mutex_lock(&bo->lock);
if (!bo->sgt) {
err = ivpu_bo_alloc_and_map_pages_locked(bo);
if (err) {
ret = vmf_error(err);
goto unlock;
}
}
/* We don't use vmf->pgoff since that has the fake offset */
page_offset = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
if (page_offset >= npages) {
ret = VM_FAULT_SIGBUS;
} else {
page = bo->pages[page_offset];
ret = vmf_insert_pfn(vma, vmf->address, page_to_pfn(page));
}
unlock:
mutex_unlock(&bo->lock);
return ret;
}
static const struct vm_operations_struct ivpu_vm_ops = {
.fault = ivpu_vm_fault,
.open = drm_gem_vm_open,
.close = drm_gem_vm_close,
static const struct dma_buf_ops ivpu_bo_dmabuf_ops = {
.cache_sgt_mapping = true,
.attach = drm_gem_map_attach,
.detach = drm_gem_map_detach,
.map_dma_buf = drm_gem_map_dma_buf,
.unmap_dma_buf = drm_gem_unmap_dma_buf,
.release = drm_gem_dmabuf_release,
.mmap = drm_gem_dmabuf_mmap,
.vmap = drm_gem_dmabuf_vmap,
.vunmap = drm_gem_dmabuf_vunmap,
};
static struct dma_buf *ivpu_bo_export(struct drm_gem_object *obj, int flags)
{
struct drm_device *dev = obj->dev;
struct dma_buf_export_info exp_info = {
.exp_name = KBUILD_MODNAME,
.owner = dev->driver->fops->owner,
.ops = &ivpu_bo_dmabuf_ops,
.size = obj->size,
.flags = flags,
.priv = obj,
.resv = obj->resv,
};
void *sgt;
/*
* Make sure that pages are allocated and dma-mapped before exporting the bo.
* DMA-mapping is required if the bo will be imported to the same device.
*/
sgt = drm_gem_shmem_get_pages_sgt(to_drm_gem_shmem_obj(obj));
if (IS_ERR(sgt))
return sgt;
return drm_gem_dmabuf_export(dev, &exp_info);
}
static const struct drm_gem_object_funcs ivpu_gem_funcs = {
.free = ivpu_bo_free,
.mmap = ivpu_bo_mmap,
.vm_ops = &ivpu_vm_ops,
.get_sg_table = ivpu_bo_get_sg_table,
.open = ivpu_bo_open,
.export = ivpu_bo_export,
.print_info = drm_gem_shmem_object_print_info,
.pin = drm_gem_shmem_object_pin,
.unpin = drm_gem_shmem_object_unpin,
.get_sg_table = drm_gem_shmem_object_get_sg_table,
.vmap = drm_gem_shmem_object_vmap,
.vunmap = drm_gem_shmem_object_vunmap,
.mmap = drm_gem_shmem_object_mmap,
.vm_ops = &drm_gem_shmem_vm_ops,
};
int
ivpu_bo_create_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
int ivpu_bo_create_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
{
struct ivpu_file_priv *file_priv = file->driver_priv;
struct ivpu_device *vdev = file_priv->vdev;
@@ -537,23 +308,20 @@ ivpu_bo_create_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
if (size == 0)
return -EINVAL;
bo = ivpu_bo_alloc(vdev, &file_priv->ctx, size, args->flags, &shmem_ops, NULL, 0);
bo = ivpu_bo_create(vdev, size, args->flags);
if (IS_ERR(bo)) {
ivpu_err(vdev, "Failed to create BO: %pe (ctx %u size %llu flags 0x%x)",
bo, file_priv->ctx.id, args->size, args->flags);
return PTR_ERR(bo);
}
ret = drm_gem_handle_create(file, &bo->base, &bo->handle);
ret = drm_gem_handle_create(file, &bo->base.base, &bo->handle);
if (!ret) {
args->vpu_addr = bo->vpu_addr;
args->handle = bo->handle;
}
drm_gem_object_put(&bo->base);
ivpu_dbg(vdev, BO, "alloc shmem: ctx %u vpu_addr 0x%llx size %zu flags 0x%x\n",
file_priv->ctx.id, bo->vpu_addr, ivpu_bo_size(bo), bo->flags);
drm_gem_object_put(&bo->base.base);
return ret;
}
@@ -563,8 +331,8 @@ ivpu_bo_alloc_internal(struct ivpu_device *vdev, u64 vpu_addr, u64 size, u32 fla
{
const struct ivpu_addr_range *range;
struct ivpu_addr_range fixed_range;
struct iosys_map map;
struct ivpu_bo *bo;
pgprot_t prot;
int ret;
drm_WARN_ON(&vdev->drm, !PAGE_ALIGNED(vpu_addr));
@@ -578,81 +346,42 @@ ivpu_bo_alloc_internal(struct ivpu_device *vdev, u64 vpu_addr, u64 size, u32 fla
range = &vdev->hw->ranges.global;
}
bo = ivpu_bo_alloc(vdev, &vdev->gctx, size, flags, &internal_ops, range, 0);
bo = ivpu_bo_create(vdev, size, flags);
if (IS_ERR(bo)) {
ivpu_err(vdev, "Failed to create BO: %pe (vpu_addr 0x%llx size %llu flags 0x%x)",
bo, vpu_addr, size, flags);
return NULL;
}
ret = ivpu_bo_alloc_vpu_addr(bo, &vdev->gctx, range);
if (ret)
goto err_put;
ret = ivpu_bo_pin(bo);
if (ret)
goto err_put;
if (ivpu_bo_cache_mode(bo) != DRM_IVPU_BO_CACHED)
drm_clflush_pages(bo->pages, ivpu_bo_size(bo) >> PAGE_SHIFT);
if (bo->flags & DRM_IVPU_BO_WC)
set_pages_array_wc(bo->pages, ivpu_bo_size(bo) >> PAGE_SHIFT);
else if (bo->flags & DRM_IVPU_BO_UNCACHED)
set_pages_array_uc(bo->pages, ivpu_bo_size(bo) >> PAGE_SHIFT);
prot = ivpu_bo_pgprot(bo, PAGE_KERNEL);
bo->kvaddr = vmap(bo->pages, ivpu_bo_size(bo) >> PAGE_SHIFT, VM_MAP, prot);
if (!bo->kvaddr) {
ivpu_err(vdev, "Failed to map BO into kernel virtual memory\n");
ret = drm_gem_shmem_vmap(&bo->base, &map);
if (ret)
goto err_put;
}
ivpu_dbg(vdev, BO, "alloc internal: ctx 0 vpu_addr 0x%llx size %zu flags 0x%x\n",
bo->vpu_addr, ivpu_bo_size(bo), flags);
return bo;
err_put:
drm_gem_object_put(&bo->base);
drm_gem_object_put(&bo->base.base);
return NULL;
}
void ivpu_bo_free_internal(struct ivpu_bo *bo)
{
drm_gem_object_put(&bo->base);
}
struct iosys_map map = IOSYS_MAP_INIT_VADDR(bo->base.vaddr);
struct drm_gem_object *ivpu_gem_prime_import(struct drm_device *dev, struct dma_buf *buf)
{
struct ivpu_device *vdev = to_ivpu_device(dev);
struct dma_buf_attachment *attach;
struct ivpu_bo *bo;
attach = dma_buf_attach(buf, dev->dev);
if (IS_ERR(attach))
return ERR_CAST(attach);
get_dma_buf(buf);
bo = ivpu_bo_alloc(vdev, NULL, buf->size, DRM_IVPU_BO_MAPPABLE, &prime_ops, NULL, 0);
if (IS_ERR(bo)) {
ivpu_err(vdev, "Failed to import BO: %pe (size %lu)", bo, buf->size);
goto err_detach;
}
lockdep_set_class(&bo->lock, &prime_bo_lock_class_key);
bo->base.import_attach = attach;
return &bo->base;
err_detach:
dma_buf_detach(buf, attach);
dma_buf_put(buf);
return ERR_CAST(bo);
drm_gem_shmem_vunmap(&bo->base, &map);
drm_gem_object_put(&bo->base.base);
}
int ivpu_bo_info_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
{
struct ivpu_file_priv *file_priv = file->driver_priv;
struct ivpu_device *vdev = to_ivpu_device(dev);
struct drm_ivpu_bo_info *args = data;
struct drm_gem_object *obj;
struct ivpu_bo *bo;
@@ -665,21 +394,12 @@ int ivpu_bo_info_ioctl(struct drm_device *dev, void *data, struct drm_file *file
bo = to_ivpu_bo(obj);
mutex_lock(&bo->lock);
if (!bo->ctx) {
ret = ivpu_bo_alloc_vpu_addr(bo, &file_priv->ctx, NULL);
if (ret) {
ivpu_err(vdev, "Failed to allocate vpu_addr: %d\n", ret);
goto unlock;
}
}
args->flags = bo->flags;
args->mmap_offset = drm_vma_node_offset_addr(&obj->vma_node);
args->vpu_addr = bo->vpu_addr;
args->size = obj->size;
unlock:
mutex_unlock(&bo->lock);
drm_gem_object_put(obj);
return ret;
}
@@ -714,41 +434,41 @@ static void ivpu_bo_print_info(struct ivpu_bo *bo, struct drm_printer *p)
{
unsigned long dma_refcount = 0;
if (bo->base.dma_buf && bo->base.dma_buf->file)
dma_refcount = atomic_long_read(&bo->base.dma_buf->file->f_count);
mutex_lock(&bo->lock);
drm_printf(p, "%5u %6d %16llx %10lu %10u %12lu %14s\n",
bo->ctx->id, bo->handle, bo->vpu_addr, ivpu_bo_size(bo),
kref_read(&bo->base.refcount), dma_refcount, bo->ops->name);
if (bo->base.base.dma_buf && bo->base.base.dma_buf->file)
dma_refcount = atomic_long_read(&bo->base.base.dma_buf->file->f_count);
drm_printf(p, "%-3u %-6d 0x%-12llx %-10lu 0x%-8x %-4u %-8lu",
bo->ctx->id, bo->handle, bo->vpu_addr, bo->base.base.size,
bo->flags, kref_read(&bo->base.base.refcount), dma_refcount);
if (bo->base.base.import_attach)
drm_printf(p, " imported");
if (bo->base.pages)
drm_printf(p, " has_pages");
if (bo->mmu_mapped)
drm_printf(p, " mmu_mapped");
drm_printf(p, "\n");
mutex_unlock(&bo->lock);
}
void ivpu_bo_list(struct drm_device *dev, struct drm_printer *p)
{
struct ivpu_device *vdev = to_ivpu_device(dev);
struct ivpu_file_priv *file_priv;
unsigned long ctx_id;
struct ivpu_bo *bo;
drm_printf(p, "%5s %6s %16s %10s %10s %12s %14s\n",
"ctx", "handle", "vpu_addr", "size", "refcount", "dma_refcount", "type");
drm_printf(p, "%-3s %-6s %-14s %-10s %-10s %-4s %-8s %s\n",
"ctx", "handle", "vpu_addr", "size", "flags", "refs", "dma_refs", "attribs");
mutex_lock(&vdev->gctx.lock);
list_for_each_entry(bo, &vdev->gctx.bo_list, ctx_node)
mutex_lock(&vdev->bo_list_lock);
list_for_each_entry(bo, &vdev->bo_list, bo_list_node)
ivpu_bo_print_info(bo, p);
mutex_unlock(&vdev->gctx.lock);
xa_for_each(&vdev->context_xa, ctx_id, file_priv) {
file_priv = ivpu_file_priv_get_by_ctx_id(vdev, ctx_id);
if (!file_priv)
continue;
mutex_lock(&file_priv->ctx.lock);
list_for_each_entry(bo, &file_priv->ctx.bo_list, ctx_node)
ivpu_bo_print_info(bo, p);
mutex_unlock(&file_priv->ctx.lock);
ivpu_file_priv_put(&file_priv);
}
mutex_unlock(&vdev->bo_list_lock);
}
void ivpu_bo_list_print(struct drm_device *dev)

View File

@@ -6,84 +6,52 @@
#define __IVPU_GEM_H__
#include <drm/drm_gem.h>
#include <drm/drm_gem_shmem_helper.h>
#include <drm/drm_mm.h>
struct dma_buf;
struct ivpu_bo_ops;
struct ivpu_file_priv;
struct ivpu_bo {
struct drm_gem_object base;
const struct ivpu_bo_ops *ops;
struct drm_gem_shmem_object base;
struct ivpu_mmu_context *ctx;
struct list_head ctx_node;
struct list_head bo_list_node;
struct drm_mm_node mm_node;
struct mutex lock; /* Protects: pages, sgt, mmu_mapped */
struct sg_table *sgt;
struct page **pages;
bool mmu_mapped;
void *kvaddr;
struct mutex lock; /* Protects: ctx, mmu_mapped, vpu_addr */
u64 vpu_addr;
u32 handle;
u32 flags;
uintptr_t user_ptr;
u32 job_status;
};
enum ivpu_bo_type {
IVPU_BO_TYPE_SHMEM = 1,
IVPU_BO_TYPE_INTERNAL,
IVPU_BO_TYPE_PRIME,
};
struct ivpu_bo_ops {
enum ivpu_bo_type type;
const char *name;
int (*alloc_pages)(struct ivpu_bo *bo);
void (*free_pages)(struct ivpu_bo *bo);
int (*map_pages)(struct ivpu_bo *bo);
void (*unmap_pages)(struct ivpu_bo *bo);
u32 job_status; /* Valid only for command buffer */
bool mmu_mapped;
};
int ivpu_bo_pin(struct ivpu_bo *bo);
void ivpu_bo_remove_all_bos_from_context(struct ivpu_mmu_context *ctx);
void ivpu_bo_list(struct drm_device *dev, struct drm_printer *p);
void ivpu_bo_list_print(struct drm_device *dev);
void ivpu_bo_remove_all_bos_from_context(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx);
struct ivpu_bo *
ivpu_bo_alloc_internal(struct ivpu_device *vdev, u64 vpu_addr, u64 size, u32 flags);
struct drm_gem_object *ivpu_gem_create_object(struct drm_device *dev, size_t size);
struct ivpu_bo *ivpu_bo_alloc_internal(struct ivpu_device *vdev, u64 vpu_addr, u64 size, u32 flags);
void ivpu_bo_free_internal(struct ivpu_bo *bo);
struct drm_gem_object *ivpu_gem_prime_import(struct drm_device *dev, struct dma_buf *dma_buf);
void ivpu_bo_unmap_sgt_and_remove_from_context(struct ivpu_bo *bo);
int ivpu_bo_create_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
int ivpu_bo_info_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
int ivpu_bo_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
void ivpu_bo_list(struct drm_device *dev, struct drm_printer *p);
void ivpu_bo_list_print(struct drm_device *dev);
static inline struct ivpu_bo *to_ivpu_bo(struct drm_gem_object *obj)
{
return container_of(obj, struct ivpu_bo, base);
return container_of(obj, struct ivpu_bo, base.base);
}
static inline void *ivpu_bo_vaddr(struct ivpu_bo *bo)
{
return bo->kvaddr;
return bo->base.vaddr;
}
static inline size_t ivpu_bo_size(struct ivpu_bo *bo)
{
return bo->base.size;
}
static inline struct page *ivpu_bo_get_page(struct ivpu_bo *bo, u64 offset)
{
if (offset > ivpu_bo_size(bo) || !bo->pages)
return NULL;
return bo->pages[offset / PAGE_SIZE];
return bo->base.base.size;
}
static inline u32 ivpu_bo_cache_mode(struct ivpu_bo *bo)
@@ -96,20 +64,9 @@ static inline bool ivpu_bo_is_snooped(struct ivpu_bo *bo)
return ivpu_bo_cache_mode(bo) == DRM_IVPU_BO_CACHED;
}
static inline pgprot_t ivpu_bo_pgprot(struct ivpu_bo *bo, pgprot_t prot)
{
if (bo->flags & DRM_IVPU_BO_WC)
return pgprot_writecombine(prot);
if (bo->flags & DRM_IVPU_BO_UNCACHED)
return pgprot_noncached(prot);
return prot;
}
static inline struct ivpu_device *ivpu_bo_to_vdev(struct ivpu_bo *bo)
{
return to_ivpu_device(bo->base.dev);
return to_ivpu_device(bo->base.base.dev);
}
static inline void *ivpu_to_cpu_addr(struct ivpu_bo *bo, u32 vpu_addr)

View File

@@ -15,8 +15,11 @@ struct ivpu_hw_ops {
int (*power_down)(struct ivpu_device *vdev);
int (*reset)(struct ivpu_device *vdev);
bool (*is_idle)(struct ivpu_device *vdev);
int (*wait_for_idle)(struct ivpu_device *vdev);
void (*wdt_disable)(struct ivpu_device *vdev);
void (*diagnose_failure)(struct ivpu_device *vdev);
u32 (*profiling_freq_get)(struct ivpu_device *vdev);
void (*profiling_freq_drive)(struct ivpu_device *vdev, bool enable);
u32 (*reg_pll_freq_get)(struct ivpu_device *vdev);
u32 (*reg_telemetry_offset_get)(struct ivpu_device *vdev);
u32 (*reg_telemetry_size_get)(struct ivpu_device *vdev);
@@ -58,6 +61,8 @@ struct ivpu_hw_info {
u32 sku;
u16 config;
int dma_bits;
ktime_t d0i3_entry_host_ts;
u64 d0i3_entry_vpu_ts;
};
extern const struct ivpu_hw_ops ivpu_hw_37xx_ops;
@@ -85,6 +90,11 @@ static inline bool ivpu_hw_is_idle(struct ivpu_device *vdev)
return vdev->hw->ops->is_idle(vdev);
};
static inline int ivpu_hw_wait_for_idle(struct ivpu_device *vdev)
{
return vdev->hw->ops->wait_for_idle(vdev);
};
static inline int ivpu_hw_power_down(struct ivpu_device *vdev)
{
ivpu_dbg(vdev, PM, "HW power down\n");
@@ -104,6 +114,16 @@ static inline void ivpu_hw_wdt_disable(struct ivpu_device *vdev)
vdev->hw->ops->wdt_disable(vdev);
};
static inline u32 ivpu_hw_profiling_freq_get(struct ivpu_device *vdev)
{
return vdev->hw->ops->profiling_freq_get(vdev);
};
static inline void ivpu_hw_profiling_freq_drive(struct ivpu_device *vdev, bool enable)
{
return vdev->hw->ops->profiling_freq_drive(vdev, enable);
};
/* Register indirect accesses */
static inline u32 ivpu_hw_reg_pll_freq_get(struct ivpu_device *vdev)
{

View File

@@ -29,6 +29,7 @@
#define PLL_REF_CLK_FREQ (50 * 1000000)
#define PLL_SIMULATION_FREQ (10 * 1000000)
#define PLL_PROF_CLK_FREQ (38400 * 1000)
#define PLL_DEFAULT_EPP_VALUE 0x80
#define TIM_SAFE_ENABLE 0xf1d0dead
@@ -37,7 +38,7 @@
#define TIMEOUT_US (150 * USEC_PER_MSEC)
#define PWR_ISLAND_STATUS_TIMEOUT_US (5 * USEC_PER_MSEC)
#define PLL_TIMEOUT_US (1500 * USEC_PER_MSEC)
#define IDLE_TIMEOUT_US (500 * USEC_PER_MSEC)
#define IDLE_TIMEOUT_US (5 * USEC_PER_MSEC)
#define ICB_0_IRQ_MASK ((REG_FLD(VPU_37XX_HOST_SS_ICB_STATUS_0, HOST_IPC_FIFO_INT)) | \
(REG_FLD(VPU_37XX_HOST_SS_ICB_STATUS_0, MMU_IRQ_0_INT)) | \
@@ -90,6 +91,7 @@ static void ivpu_hw_timeouts_init(struct ivpu_device *vdev)
vdev->timeout.tdr = 2000;
vdev->timeout.reschedule_suspend = 10;
vdev->timeout.autosuspend = 10;
vdev->timeout.d0i3_entry_msg = 5;
}
static int ivpu_pll_wait_for_cmd_send(struct ivpu_device *vdev)
@@ -502,6 +504,16 @@ static int ivpu_boot_pwr_domain_enable(struct ivpu_device *vdev)
return ret;
}
static int ivpu_boot_pwr_domain_disable(struct ivpu_device *vdev)
{
ivpu_boot_dpu_active_drive(vdev, false);
ivpu_boot_pwr_island_isolation_drive(vdev, true);
ivpu_boot_pwr_island_trickle_drive(vdev, false);
ivpu_boot_pwr_island_drive(vdev, false);
return ivpu_boot_wait_for_pwr_island_status(vdev, 0x0);
}
static void ivpu_boot_no_snoop_enable(struct ivpu_device *vdev)
{
u32 val = REGV_RD32(VPU_37XX_HOST_IF_TCU_PTW_OVERRIDES);
@@ -600,25 +612,17 @@ static int ivpu_hw_37xx_info_init(struct ivpu_device *vdev)
static int ivpu_hw_37xx_reset(struct ivpu_device *vdev)
{
int ret;
u32 val;
int ret = 0;
if (IVPU_WA(punit_disabled))
return 0;
ret = REGB_POLL_FLD(VPU_37XX_BUTTRESS_VPU_IP_RESET, TRIGGER, 0, TIMEOUT_US);
if (ret) {
ivpu_err(vdev, "Timed out waiting for TRIGGER bit\n");
return ret;
if (ivpu_boot_pwr_domain_disable(vdev)) {
ivpu_err(vdev, "Failed to disable power domain\n");
ret = -EIO;
}
val = REGB_RD32(VPU_37XX_BUTTRESS_VPU_IP_RESET);
val = REG_SET_FLD(VPU_37XX_BUTTRESS_VPU_IP_RESET, TRIGGER, val);
REGB_WR32(VPU_37XX_BUTTRESS_VPU_IP_RESET, val);
ret = REGB_POLL_FLD(VPU_37XX_BUTTRESS_VPU_IP_RESET, TRIGGER, 0, TIMEOUT_US);
if (ret)
ivpu_err(vdev, "Timed out waiting for RESET completion\n");
if (ivpu_pll_disable(vdev)) {
ivpu_err(vdev, "Failed to disable PLL\n");
ret = -EIO;
}
return ret;
}
@@ -651,10 +655,6 @@ static int ivpu_hw_37xx_power_up(struct ivpu_device *vdev)
{
int ret;
ret = ivpu_hw_37xx_reset(vdev);
if (ret)
ivpu_warn(vdev, "Failed to reset HW: %d\n", ret);
ret = ivpu_hw_37xx_d0i3_disable(vdev);
if (ret)
ivpu_warn(vdev, "Failed to disable D0I3: %d\n", ret);
@@ -718,15 +718,28 @@ static bool ivpu_hw_37xx_is_idle(struct ivpu_device *vdev)
REG_TEST_FLD(VPU_37XX_BUTTRESS_VPU_STATUS, IDLE, val);
}
static int ivpu_hw_37xx_wait_for_idle(struct ivpu_device *vdev)
{
return REGB_POLL_FLD(VPU_37XX_BUTTRESS_VPU_STATUS, IDLE, 0x1, IDLE_TIMEOUT_US);
}
static void ivpu_hw_37xx_save_d0i3_entry_timestamp(struct ivpu_device *vdev)
{
vdev->hw->d0i3_entry_host_ts = ktime_get_boottime();
vdev->hw->d0i3_entry_vpu_ts = REGV_RD64(VPU_37XX_CPU_SS_TIM_PERF_FREE_CNT);
}
static int ivpu_hw_37xx_power_down(struct ivpu_device *vdev)
{
int ret = 0;
if (!ivpu_hw_37xx_is_idle(vdev) && ivpu_hw_37xx_reset(vdev))
ivpu_err(vdev, "Failed to reset the VPU\n");
ivpu_hw_37xx_save_d0i3_entry_timestamp(vdev);
if (ivpu_pll_disable(vdev)) {
ivpu_err(vdev, "Failed to disable PLL\n");
if (!ivpu_hw_37xx_is_idle(vdev))
ivpu_warn(vdev, "VPU not idle during power down\n");
if (ivpu_hw_37xx_reset(vdev)) {
ivpu_err(vdev, "Failed to reset VPU\n");
ret = -EIO;
}
@@ -756,6 +769,16 @@ static void ivpu_hw_37xx_wdt_disable(struct ivpu_device *vdev)
REGV_WR32(VPU_37XX_CPU_SS_TIM_GEN_CONFIG, val);
}
static u32 ivpu_hw_37xx_profiling_freq_get(struct ivpu_device *vdev)
{
return PLL_PROF_CLK_FREQ;
}
static void ivpu_hw_37xx_profiling_freq_drive(struct ivpu_device *vdev, bool enable)
{
/* Profiling freq - is a debug feature. Unavailable on VPU 37XX. */
}
static u32 ivpu_hw_37xx_pll_to_freq(u32 ratio, u32 config)
{
u32 pll_clock = PLL_REF_CLK_FREQ * ratio;
@@ -867,17 +890,20 @@ static void ivpu_hw_37xx_irq_noc_firewall_handler(struct ivpu_device *vdev)
}
/* Handler for IRQs from VPU core (irqV) */
static u32 ivpu_hw_37xx_irqv_handler(struct ivpu_device *vdev, int irq)
static bool ivpu_hw_37xx_irqv_handler(struct ivpu_device *vdev, int irq, bool *wake_thread)
{
u32 status = REGV_RD32(VPU_37XX_HOST_SS_ICB_STATUS_0) & ICB_0_IRQ_MASK;
if (!status)
return false;
REGV_WR32(VPU_37XX_HOST_SS_ICB_CLEAR_0, status);
if (REG_TEST_FLD(VPU_37XX_HOST_SS_ICB_STATUS_0, MMU_IRQ_0_INT, status))
ivpu_mmu_irq_evtq_handler(vdev);
if (REG_TEST_FLD(VPU_37XX_HOST_SS_ICB_STATUS_0, HOST_IPC_FIFO_INT, status))
ivpu_ipc_irq_handler(vdev);
ivpu_ipc_irq_handler(vdev, wake_thread);
if (REG_TEST_FLD(VPU_37XX_HOST_SS_ICB_STATUS_0, MMU_IRQ_1_INT, status))
ivpu_dbg(vdev, IRQ, "MMU sync complete\n");
@@ -894,17 +920,17 @@ static u32 ivpu_hw_37xx_irqv_handler(struct ivpu_device *vdev, int irq)
if (REG_TEST_FLD(VPU_37XX_HOST_SS_ICB_STATUS_0, NOC_FIREWALL_INT, status))
ivpu_hw_37xx_irq_noc_firewall_handler(vdev);
return status;
return true;
}
/* Handler for IRQs from Buttress core (irqB) */
static u32 ivpu_hw_37xx_irqb_handler(struct ivpu_device *vdev, int irq)
static bool ivpu_hw_37xx_irqb_handler(struct ivpu_device *vdev, int irq)
{
u32 status = REGB_RD32(VPU_37XX_BUTTRESS_INTERRUPT_STAT) & BUTTRESS_IRQ_MASK;
bool schedule_recovery = false;
if (status == 0)
return 0;
if (!status)
return false;
if (REG_TEST_FLD(VPU_37XX_BUTTRESS_INTERRUPT_STAT, FREQ_CHANGE, status))
ivpu_dbg(vdev, IRQ, "FREQ_CHANGE irq: %08x",
@@ -940,23 +966,27 @@ static u32 ivpu_hw_37xx_irqb_handler(struct ivpu_device *vdev, int irq)
if (schedule_recovery)
ivpu_pm_schedule_recovery(vdev);
return status;
return true;
}
static irqreturn_t ivpu_hw_37xx_irq_handler(int irq, void *ptr)
{
struct ivpu_device *vdev = ptr;
u32 ret_irqv, ret_irqb;
bool irqv_handled, irqb_handled, wake_thread = false;
REGB_WR32(VPU_37XX_BUTTRESS_GLOBAL_INT_MASK, 0x1);
ret_irqv = ivpu_hw_37xx_irqv_handler(vdev, irq);
ret_irqb = ivpu_hw_37xx_irqb_handler(vdev, irq);
irqv_handled = ivpu_hw_37xx_irqv_handler(vdev, irq, &wake_thread);
irqb_handled = ivpu_hw_37xx_irqb_handler(vdev, irq);
/* Re-enable global interrupts to re-trigger MSI for pending interrupts */
REGB_WR32(VPU_37XX_BUTTRESS_GLOBAL_INT_MASK, 0x0);
return IRQ_RETVAL(ret_irqb | ret_irqv);
if (wake_thread)
return IRQ_WAKE_THREAD;
if (irqv_handled || irqb_handled)
return IRQ_HANDLED;
return IRQ_NONE;
}
static void ivpu_hw_37xx_diagnose_failure(struct ivpu_device *vdev)
@@ -993,11 +1023,14 @@ const struct ivpu_hw_ops ivpu_hw_37xx_ops = {
.info_init = ivpu_hw_37xx_info_init,
.power_up = ivpu_hw_37xx_power_up,
.is_idle = ivpu_hw_37xx_is_idle,
.wait_for_idle = ivpu_hw_37xx_wait_for_idle,
.power_down = ivpu_hw_37xx_power_down,
.reset = ivpu_hw_37xx_reset,
.boot_fw = ivpu_hw_37xx_boot_fw,
.wdt_disable = ivpu_hw_37xx_wdt_disable,
.diagnose_failure = ivpu_hw_37xx_diagnose_failure,
.profiling_freq_get = ivpu_hw_37xx_profiling_freq_get,
.profiling_freq_drive = ivpu_hw_37xx_profiling_freq_drive,
.reg_pll_freq_get = ivpu_hw_37xx_reg_pll_freq_get,
.reg_telemetry_offset_get = ivpu_hw_37xx_reg_telemetry_offset_get,
.reg_telemetry_size_get = ivpu_hw_37xx_reg_telemetry_size_get,

View File

@@ -240,6 +240,8 @@
#define VPU_37XX_CPU_SS_TIM_GEN_CONFIG 0x06021008u
#define VPU_37XX_CPU_SS_TIM_GEN_CONFIG_WDOG_TO_INT_CLR_MASK BIT_MASK(9)
#define VPU_37XX_CPU_SS_TIM_PERF_FREE_CNT 0x06029000u
#define VPU_37XX_CPU_SS_DOORBELL_0 0x06300000u
#define VPU_37XX_CPU_SS_DOORBELL_0_SET_MASK BIT_MASK(0)

View File

@@ -39,6 +39,7 @@
#define TIMEOUT_US (150 * USEC_PER_MSEC)
#define PWR_ISLAND_STATUS_TIMEOUT_US (5 * USEC_PER_MSEC)
#define PLL_TIMEOUT_US (1500 * USEC_PER_MSEC)
#define IDLE_TIMEOUT_US (5 * USEC_PER_MSEC)
#define WEIGHTS_DEFAULT 0xf711f711u
#define WEIGHTS_ATS_DEFAULT 0x0000f711u
@@ -139,18 +140,21 @@ static void ivpu_hw_timeouts_init(struct ivpu_device *vdev)
vdev->timeout.tdr = 2000000;
vdev->timeout.reschedule_suspend = 1000;
vdev->timeout.autosuspend = -1;
vdev->timeout.d0i3_entry_msg = 500;
} else if (ivpu_is_simics(vdev)) {
vdev->timeout.boot = 50;
vdev->timeout.jsm = 500;
vdev->timeout.tdr = 10000;
vdev->timeout.reschedule_suspend = 10;
vdev->timeout.autosuspend = -1;
vdev->timeout.d0i3_entry_msg = 100;
} else {
vdev->timeout.boot = 1000;
vdev->timeout.jsm = 500;
vdev->timeout.tdr = 2000;
vdev->timeout.reschedule_suspend = 10;
vdev->timeout.autosuspend = 10;
vdev->timeout.d0i3_entry_msg = 5;
}
}
@@ -824,12 +828,6 @@ static int ivpu_hw_40xx_power_up(struct ivpu_device *vdev)
{
int ret;
ret = ivpu_hw_40xx_reset(vdev);
if (ret) {
ivpu_err(vdev, "Failed to reset HW: %d\n", ret);
return ret;
}
ret = ivpu_hw_40xx_d0i3_disable(vdev);
if (ret)
ivpu_warn(vdev, "Failed to disable D0I3: %d\n", ret);
@@ -898,10 +896,23 @@ static bool ivpu_hw_40xx_is_idle(struct ivpu_device *vdev)
REG_TEST_FLD(VPU_40XX_BUTTRESS_VPU_STATUS, IDLE, val);
}
static int ivpu_hw_40xx_wait_for_idle(struct ivpu_device *vdev)
{
return REGB_POLL_FLD(VPU_40XX_BUTTRESS_VPU_STATUS, IDLE, 0x1, IDLE_TIMEOUT_US);
}
static void ivpu_hw_40xx_save_d0i3_entry_timestamp(struct ivpu_device *vdev)
{
vdev->hw->d0i3_entry_host_ts = ktime_get_boottime();
vdev->hw->d0i3_entry_vpu_ts = REGV_RD64(VPU_40XX_CPU_SS_TIM_PERF_EXT_FREE_CNT);
}
static int ivpu_hw_40xx_power_down(struct ivpu_device *vdev)
{
int ret = 0;
ivpu_hw_40xx_save_d0i3_entry_timestamp(vdev);
if (!ivpu_hw_40xx_is_idle(vdev) && ivpu_hw_40xx_reset(vdev))
ivpu_warn(vdev, "Failed to reset the VPU\n");
@@ -933,6 +944,19 @@ static void ivpu_hw_40xx_wdt_disable(struct ivpu_device *vdev)
REGV_WR32(VPU_40XX_CPU_SS_TIM_GEN_CONFIG, val);
}
static u32 ivpu_hw_40xx_profiling_freq_get(struct ivpu_device *vdev)
{
return vdev->hw->pll.profiling_freq;
}
static void ivpu_hw_40xx_profiling_freq_drive(struct ivpu_device *vdev, bool enable)
{
if (enable)
vdev->hw->pll.profiling_freq = PLL_PROFILING_FREQ_HIGH;
else
vdev->hw->pll.profiling_freq = PLL_PROFILING_FREQ_DEFAULT;
}
/* Register indirect accesses */
static u32 ivpu_hw_40xx_reg_pll_freq_get(struct ivpu_device *vdev)
{
@@ -1023,13 +1047,12 @@ static void ivpu_hw_40xx_irq_noc_firewall_handler(struct ivpu_device *vdev)
}
/* Handler for IRQs from VPU core (irqV) */
static irqreturn_t ivpu_hw_40xx_irqv_handler(struct ivpu_device *vdev, int irq)
static bool ivpu_hw_40xx_irqv_handler(struct ivpu_device *vdev, int irq, bool *wake_thread)
{
u32 status = REGV_RD32(VPU_40XX_HOST_SS_ICB_STATUS_0) & ICB_0_IRQ_MASK;
irqreturn_t ret = IRQ_NONE;
if (!status)
return IRQ_NONE;
return false;
REGV_WR32(VPU_40XX_HOST_SS_ICB_CLEAR_0, status);
@@ -1037,7 +1060,7 @@ static irqreturn_t ivpu_hw_40xx_irqv_handler(struct ivpu_device *vdev, int irq)
ivpu_mmu_irq_evtq_handler(vdev);
if (REG_TEST_FLD(VPU_40XX_HOST_SS_ICB_STATUS_0, HOST_IPC_FIFO_INT, status))
ret |= ivpu_ipc_irq_handler(vdev);
ivpu_ipc_irq_handler(vdev, wake_thread);
if (REG_TEST_FLD(VPU_40XX_HOST_SS_ICB_STATUS_0, MMU_IRQ_1_INT, status))
ivpu_dbg(vdev, IRQ, "MMU sync complete\n");
@@ -1054,17 +1077,17 @@ static irqreturn_t ivpu_hw_40xx_irqv_handler(struct ivpu_device *vdev, int irq)
if (REG_TEST_FLD(VPU_40XX_HOST_SS_ICB_STATUS_0, NOC_FIREWALL_INT, status))
ivpu_hw_40xx_irq_noc_firewall_handler(vdev);
return ret;
return true;
}
/* Handler for IRQs from Buttress core (irqB) */
static irqreturn_t ivpu_hw_40xx_irqb_handler(struct ivpu_device *vdev, int irq)
static bool ivpu_hw_40xx_irqb_handler(struct ivpu_device *vdev, int irq)
{
bool schedule_recovery = false;
u32 status = REGB_RD32(VPU_40XX_BUTTRESS_INTERRUPT_STAT) & BUTTRESS_IRQ_MASK;
if (status == 0)
return IRQ_NONE;
if (!status)
return false;
if (REG_TEST_FLD(VPU_40XX_BUTTRESS_INTERRUPT_STAT, FREQ_CHANGE, status))
ivpu_dbg(vdev, IRQ, "FREQ_CHANGE");
@@ -1116,26 +1139,27 @@ static irqreturn_t ivpu_hw_40xx_irqb_handler(struct ivpu_device *vdev, int irq)
if (schedule_recovery)
ivpu_pm_schedule_recovery(vdev);
return IRQ_HANDLED;
return true;
}
static irqreturn_t ivpu_hw_40xx_irq_handler(int irq, void *ptr)
{
bool irqv_handled, irqb_handled, wake_thread = false;
struct ivpu_device *vdev = ptr;
irqreturn_t ret = IRQ_NONE;
REGB_WR32(VPU_40XX_BUTTRESS_GLOBAL_INT_MASK, 0x1);
ret |= ivpu_hw_40xx_irqv_handler(vdev, irq);
ret |= ivpu_hw_40xx_irqb_handler(vdev, irq);
irqv_handled = ivpu_hw_40xx_irqv_handler(vdev, irq, &wake_thread);
irqb_handled = ivpu_hw_40xx_irqb_handler(vdev, irq);
/* Re-enable global interrupts to re-trigger MSI for pending interrupts */
REGB_WR32(VPU_40XX_BUTTRESS_GLOBAL_INT_MASK, 0x0);
if (ret & IRQ_WAKE_THREAD)
if (wake_thread)
return IRQ_WAKE_THREAD;
return ret;
if (irqv_handled || irqb_handled)
return IRQ_HANDLED;
return IRQ_NONE;
}
static void ivpu_hw_40xx_diagnose_failure(struct ivpu_device *vdev)
@@ -1185,11 +1209,14 @@ const struct ivpu_hw_ops ivpu_hw_40xx_ops = {
.info_init = ivpu_hw_40xx_info_init,
.power_up = ivpu_hw_40xx_power_up,
.is_idle = ivpu_hw_40xx_is_idle,
.wait_for_idle = ivpu_hw_40xx_wait_for_idle,
.power_down = ivpu_hw_40xx_power_down,
.reset = ivpu_hw_40xx_reset,
.boot_fw = ivpu_hw_40xx_boot_fw,
.wdt_disable = ivpu_hw_40xx_wdt_disable,
.diagnose_failure = ivpu_hw_40xx_diagnose_failure,
.profiling_freq_get = ivpu_hw_40xx_profiling_freq_get,
.profiling_freq_drive = ivpu_hw_40xx_profiling_freq_drive,
.reg_pll_freq_get = ivpu_hw_40xx_reg_pll_freq_get,
.reg_telemetry_offset_get = ivpu_hw_40xx_reg_telemetry_offset_get,
.reg_telemetry_size_get = ivpu_hw_40xx_reg_telemetry_size_get,

View File

@@ -5,7 +5,7 @@
#include <linux/genalloc.h>
#include <linux/highmem.h>
#include <linux/kthread.h>
#include <linux/pm_runtime.h>
#include <linux/wait.h>
#include "ivpu_drv.h"
@@ -17,19 +17,12 @@
#include "ivpu_pm.h"
#define IPC_MAX_RX_MSG 128
#define IS_KTHREAD() (get_current()->flags & PF_KTHREAD)
struct ivpu_ipc_tx_buf {
struct ivpu_ipc_hdr ipc;
struct vpu_jsm_msg jsm;
};
struct ivpu_ipc_rx_msg {
struct list_head link;
struct ivpu_ipc_hdr *ipc_hdr;
struct vpu_jsm_msg *jsm_msg;
};
static void ivpu_ipc_msg_dump(struct ivpu_device *vdev, char *c,
struct ivpu_ipc_hdr *ipc_hdr, u32 vpu_addr)
{
@@ -139,8 +132,49 @@ static void ivpu_ipc_tx(struct ivpu_device *vdev, u32 vpu_addr)
ivpu_hw_reg_ipc_tx_set(vdev, vpu_addr);
}
void
ivpu_ipc_consumer_add(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons, u32 channel)
static void
ivpu_ipc_rx_msg_add(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons,
struct ivpu_ipc_hdr *ipc_hdr, struct vpu_jsm_msg *jsm_msg)
{
struct ivpu_ipc_info *ipc = vdev->ipc;
struct ivpu_ipc_rx_msg *rx_msg;
lockdep_assert_held(&ipc->cons_lock);
lockdep_assert_irqs_disabled();
rx_msg = kzalloc(sizeof(*rx_msg), GFP_ATOMIC);
if (!rx_msg) {
ivpu_ipc_rx_mark_free(vdev, ipc_hdr, jsm_msg);
return;
}
atomic_inc(&ipc->rx_msg_count);
rx_msg->ipc_hdr = ipc_hdr;
rx_msg->jsm_msg = jsm_msg;
rx_msg->callback = cons->rx_callback;
if (rx_msg->callback) {
list_add_tail(&rx_msg->link, &ipc->cb_msg_list);
} else {
spin_lock(&cons->rx_lock);
list_add_tail(&rx_msg->link, &cons->rx_msg_list);
spin_unlock(&cons->rx_lock);
wake_up(&cons->rx_msg_wq);
}
}
static void
ivpu_ipc_rx_msg_del(struct ivpu_device *vdev, struct ivpu_ipc_rx_msg *rx_msg)
{
list_del(&rx_msg->link);
ivpu_ipc_rx_mark_free(vdev, rx_msg->ipc_hdr, rx_msg->jsm_msg);
atomic_dec(&vdev->ipc->rx_msg_count);
kfree(rx_msg);
}
void ivpu_ipc_consumer_add(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons,
u32 channel, ivpu_ipc_rx_callback_t rx_callback)
{
struct ivpu_ipc_info *ipc = vdev->ipc;
@@ -148,13 +182,15 @@ ivpu_ipc_consumer_add(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons,
cons->channel = channel;
cons->tx_vpu_addr = 0;
cons->request_id = 0;
spin_lock_init(&cons->rx_msg_lock);
cons->aborted = false;
cons->rx_callback = rx_callback;
spin_lock_init(&cons->rx_lock);
INIT_LIST_HEAD(&cons->rx_msg_list);
init_waitqueue_head(&cons->rx_msg_wq);
spin_lock_irq(&ipc->cons_list_lock);
spin_lock_irq(&ipc->cons_lock);
list_add_tail(&cons->link, &ipc->cons_list);
spin_unlock_irq(&ipc->cons_list_lock);
spin_unlock_irq(&ipc->cons_lock);
}
void ivpu_ipc_consumer_del(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons)
@@ -162,18 +198,14 @@ void ivpu_ipc_consumer_del(struct ivpu_device *vdev, struct ivpu_ipc_consumer *c
struct ivpu_ipc_info *ipc = vdev->ipc;
struct ivpu_ipc_rx_msg *rx_msg, *r;
spin_lock_irq(&ipc->cons_list_lock);
spin_lock_irq(&ipc->cons_lock);
list_del(&cons->link);
spin_unlock_irq(&ipc->cons_list_lock);
spin_unlock_irq(&ipc->cons_lock);
spin_lock_irq(&cons->rx_msg_lock);
list_for_each_entry_safe(rx_msg, r, &cons->rx_msg_list, link) {
list_del(&rx_msg->link);
ivpu_ipc_rx_mark_free(vdev, rx_msg->ipc_hdr, rx_msg->jsm_msg);
atomic_dec(&ipc->rx_msg_count);
kfree(rx_msg);
}
spin_unlock_irq(&cons->rx_msg_lock);
spin_lock_irq(&cons->rx_lock);
list_for_each_entry_safe(rx_msg, r, &cons->rx_msg_list, link)
ivpu_ipc_rx_msg_del(vdev, rx_msg);
spin_unlock_irq(&cons->rx_lock);
ivpu_ipc_tx_release(vdev, cons->tx_vpu_addr);
}
@@ -202,52 +234,61 @@ unlock:
return ret;
}
static bool ivpu_ipc_rx_need_wakeup(struct ivpu_ipc_consumer *cons)
{
bool ret;
spin_lock_irq(&cons->rx_lock);
ret = !list_empty(&cons->rx_msg_list) || cons->aborted;
spin_unlock_irq(&cons->rx_lock);
return ret;
}
int ivpu_ipc_receive(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons,
struct ivpu_ipc_hdr *ipc_buf,
struct vpu_jsm_msg *ipc_payload, unsigned long timeout_ms)
struct vpu_jsm_msg *jsm_msg, unsigned long timeout_ms)
{
struct ivpu_ipc_info *ipc = vdev->ipc;
struct ivpu_ipc_rx_msg *rx_msg;
int wait_ret, ret = 0;
wait_ret = wait_event_timeout(cons->rx_msg_wq,
(IS_KTHREAD() && kthread_should_stop()) ||
!list_empty(&cons->rx_msg_list),
msecs_to_jiffies(timeout_ms));
if (drm_WARN_ONCE(&vdev->drm, cons->rx_callback, "Consumer works only in async mode\n"))
return -EINVAL;
if (IS_KTHREAD() && kthread_should_stop())
return -EINTR;
wait_ret = wait_event_timeout(cons->rx_msg_wq,
ivpu_ipc_rx_need_wakeup(cons),
msecs_to_jiffies(timeout_ms));
if (wait_ret == 0)
return -ETIMEDOUT;
spin_lock_irq(&cons->rx_msg_lock);
spin_lock_irq(&cons->rx_lock);
if (cons->aborted) {
spin_unlock_irq(&cons->rx_lock);
return -ECANCELED;
}
rx_msg = list_first_entry_or_null(&cons->rx_msg_list, struct ivpu_ipc_rx_msg, link);
if (!rx_msg) {
spin_unlock_irq(&cons->rx_msg_lock);
spin_unlock_irq(&cons->rx_lock);
return -EAGAIN;
}
list_del(&rx_msg->link);
spin_unlock_irq(&cons->rx_msg_lock);
if (ipc_buf)
memcpy(ipc_buf, rx_msg->ipc_hdr, sizeof(*ipc_buf));
if (rx_msg->jsm_msg) {
u32 size = min_t(int, rx_msg->ipc_hdr->data_size, sizeof(*ipc_payload));
u32 size = min_t(int, rx_msg->ipc_hdr->data_size, sizeof(*jsm_msg));
if (rx_msg->jsm_msg->result != VPU_JSM_STATUS_SUCCESS) {
ivpu_dbg(vdev, IPC, "IPC resp result error: %d\n", rx_msg->jsm_msg->result);
ret = -EBADMSG;
}
if (ipc_payload)
memcpy(ipc_payload, rx_msg->jsm_msg, size);
if (jsm_msg)
memcpy(jsm_msg, rx_msg->jsm_msg, size);
}
ivpu_ipc_rx_mark_free(vdev, rx_msg->ipc_hdr, rx_msg->jsm_msg);
atomic_dec(&ipc->rx_msg_count);
kfree(rx_msg);
ivpu_ipc_rx_msg_del(vdev, rx_msg);
spin_unlock_irq(&cons->rx_lock);
return ret;
}
@@ -260,7 +301,7 @@ ivpu_ipc_send_receive_internal(struct ivpu_device *vdev, struct vpu_jsm_msg *req
struct ivpu_ipc_consumer cons;
int ret;
ivpu_ipc_consumer_add(vdev, &cons, channel);
ivpu_ipc_consumer_add(vdev, &cons, channel, NULL);
ret = ivpu_ipc_send(vdev, &cons, req);
if (ret) {
@@ -285,23 +326,19 @@ consumer_del:
return ret;
}
int ivpu_ipc_send_receive(struct ivpu_device *vdev, struct vpu_jsm_msg *req,
enum vpu_ipc_msg_type expected_resp_type,
struct vpu_jsm_msg *resp, u32 channel,
unsigned long timeout_ms)
int ivpu_ipc_send_receive_active(struct ivpu_device *vdev, struct vpu_jsm_msg *req,
enum vpu_ipc_msg_type expected_resp, struct vpu_jsm_msg *resp,
u32 channel, unsigned long timeout_ms)
{
struct vpu_jsm_msg hb_req = { .type = VPU_JSM_MSG_QUERY_ENGINE_HB };
struct vpu_jsm_msg hb_resp;
int ret, hb_ret;
ret = ivpu_rpm_get(vdev);
if (ret < 0)
return ret;
drm_WARN_ON(&vdev->drm, pm_runtime_status_suspended(vdev->drm.dev));
ret = ivpu_ipc_send_receive_internal(vdev, req, expected_resp_type, resp,
channel, timeout_ms);
ret = ivpu_ipc_send_receive_internal(vdev, req, expected_resp, resp, channel, timeout_ms);
if (ret != -ETIMEDOUT)
goto rpm_put;
return ret;
hb_ret = ivpu_ipc_send_receive_internal(vdev, &hb_req, VPU_JSM_MSG_QUERY_ENGINE_HB_DONE,
&hb_resp, VPU_IPC_CHAN_ASYNC_CMD,
@@ -311,7 +348,21 @@ int ivpu_ipc_send_receive(struct ivpu_device *vdev, struct vpu_jsm_msg *req,
ivpu_pm_schedule_recovery(vdev);
}
rpm_put:
return ret;
}
int ivpu_ipc_send_receive(struct ivpu_device *vdev, struct vpu_jsm_msg *req,
enum vpu_ipc_msg_type expected_resp, struct vpu_jsm_msg *resp,
u32 channel, unsigned long timeout_ms)
{
int ret;
ret = ivpu_rpm_get(vdev);
if (ret < 0)
return ret;
ret = ivpu_ipc_send_receive_active(vdev, req, expected_resp, resp, channel, timeout_ms);
ivpu_rpm_put(vdev);
return ret;
}
@@ -329,35 +380,7 @@ ivpu_ipc_match_consumer(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons
return false;
}
static void
ivpu_ipc_dispatch(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons,
struct ivpu_ipc_hdr *ipc_hdr, struct vpu_jsm_msg *jsm_msg)
{
struct ivpu_ipc_info *ipc = vdev->ipc;
struct ivpu_ipc_rx_msg *rx_msg;
unsigned long flags;
lockdep_assert_held(&ipc->cons_list_lock);
rx_msg = kzalloc(sizeof(*rx_msg), GFP_ATOMIC);
if (!rx_msg) {
ivpu_ipc_rx_mark_free(vdev, ipc_hdr, jsm_msg);
return;
}
atomic_inc(&ipc->rx_msg_count);
rx_msg->ipc_hdr = ipc_hdr;
rx_msg->jsm_msg = jsm_msg;
spin_lock_irqsave(&cons->rx_msg_lock, flags);
list_add_tail(&rx_msg->link, &cons->rx_msg_list);
spin_unlock_irqrestore(&cons->rx_msg_lock, flags);
wake_up(&cons->rx_msg_wq);
}
int ivpu_ipc_irq_handler(struct ivpu_device *vdev)
void ivpu_ipc_irq_handler(struct ivpu_device *vdev, bool *wake_thread)
{
struct ivpu_ipc_info *ipc = vdev->ipc;
struct ivpu_ipc_consumer *cons;
@@ -375,7 +398,7 @@ int ivpu_ipc_irq_handler(struct ivpu_device *vdev)
vpu_addr = ivpu_hw_reg_ipc_rx_addr_get(vdev);
if (vpu_addr == REG_IO_ERROR) {
ivpu_err_ratelimited(vdev, "Failed to read IPC rx addr register\n");
return -EIO;
return;
}
ipc_hdr = ivpu_to_cpu_addr(ipc->mem_rx, vpu_addr);
@@ -405,15 +428,15 @@ int ivpu_ipc_irq_handler(struct ivpu_device *vdev)
}
dispatched = false;
spin_lock_irqsave(&ipc->cons_list_lock, flags);
spin_lock_irqsave(&ipc->cons_lock, flags);
list_for_each_entry(cons, &ipc->cons_list, link) {
if (ivpu_ipc_match_consumer(vdev, cons, ipc_hdr, jsm_msg)) {
ivpu_ipc_dispatch(vdev, cons, ipc_hdr, jsm_msg);
ivpu_ipc_rx_msg_add(vdev, cons, ipc_hdr, jsm_msg);
dispatched = true;
break;
}
}
spin_unlock_irqrestore(&ipc->cons_list_lock, flags);
spin_unlock_irqrestore(&ipc->cons_lock, flags);
if (!dispatched) {
ivpu_dbg(vdev, IPC, "IPC RX msg 0x%x dropped (no consumer)\n", vpu_addr);
@@ -421,7 +444,28 @@ int ivpu_ipc_irq_handler(struct ivpu_device *vdev)
}
}
return 0;
if (wake_thread)
*wake_thread = !list_empty(&ipc->cb_msg_list);
}
irqreturn_t ivpu_ipc_irq_thread_handler(struct ivpu_device *vdev)
{
struct ivpu_ipc_info *ipc = vdev->ipc;
struct ivpu_ipc_rx_msg *rx_msg, *r;
struct list_head cb_msg_list;
INIT_LIST_HEAD(&cb_msg_list);
spin_lock_irq(&ipc->cons_lock);
list_splice_tail_init(&ipc->cb_msg_list, &cb_msg_list);
spin_unlock_irq(&ipc->cons_lock);
list_for_each_entry_safe(rx_msg, r, &cb_msg_list, link) {
rx_msg->callback(vdev, rx_msg->ipc_hdr, rx_msg->jsm_msg);
ivpu_ipc_rx_msg_del(vdev, rx_msg);
}
return IRQ_HANDLED;
}
int ivpu_ipc_init(struct ivpu_device *vdev)
@@ -456,10 +500,10 @@ int ivpu_ipc_init(struct ivpu_device *vdev)
goto err_free_rx;
}
spin_lock_init(&ipc->cons_lock);
INIT_LIST_HEAD(&ipc->cons_list);
spin_lock_init(&ipc->cons_list_lock);
INIT_LIST_HEAD(&ipc->cb_msg_list);
drmm_mutex_init(&vdev->drm, &ipc->lock);
ivpu_ipc_reset(vdev);
return 0;
@@ -472,6 +516,13 @@ err_free_tx:
void ivpu_ipc_fini(struct ivpu_device *vdev)
{
struct ivpu_ipc_info *ipc = vdev->ipc;
drm_WARN_ON(&vdev->drm, ipc->on);
drm_WARN_ON(&vdev->drm, !list_empty(&ipc->cons_list));
drm_WARN_ON(&vdev->drm, !list_empty(&ipc->cb_msg_list));
drm_WARN_ON(&vdev->drm, atomic_read(&ipc->rx_msg_count) > 0);
ivpu_ipc_mem_fini(vdev);
}
@@ -488,16 +539,27 @@ void ivpu_ipc_disable(struct ivpu_device *vdev)
{
struct ivpu_ipc_info *ipc = vdev->ipc;
struct ivpu_ipc_consumer *cons, *c;
unsigned long flags;
struct ivpu_ipc_rx_msg *rx_msg, *r;
drm_WARN_ON(&vdev->drm, !list_empty(&ipc->cb_msg_list));
mutex_lock(&ipc->lock);
ipc->on = false;
mutex_unlock(&ipc->lock);
spin_lock_irqsave(&ipc->cons_list_lock, flags);
list_for_each_entry_safe(cons, c, &ipc->cons_list, link)
spin_lock_irq(&ipc->cons_lock);
list_for_each_entry_safe(cons, c, &ipc->cons_list, link) {
spin_lock(&cons->rx_lock);
if (!cons->rx_callback)
cons->aborted = true;
list_for_each_entry_safe(rx_msg, r, &cons->rx_msg_list, link)
ivpu_ipc_rx_msg_del(vdev, rx_msg);
spin_unlock(&cons->rx_lock);
wake_up(&cons->rx_msg_wq);
spin_unlock_irqrestore(&ipc->cons_list_lock, flags);
}
spin_unlock_irq(&ipc->cons_lock);
drm_WARN_ON(&vdev->drm, atomic_read(&ipc->rx_msg_count) > 0);
}
void ivpu_ipc_reset(struct ivpu_device *vdev)
@@ -505,6 +567,7 @@ void ivpu_ipc_reset(struct ivpu_device *vdev)
struct ivpu_ipc_info *ipc = vdev->ipc;
mutex_lock(&ipc->lock);
drm_WARN_ON(&vdev->drm, ipc->on);
memset(ivpu_bo_vaddr(ipc->mem_tx), 0, ivpu_bo_size(ipc->mem_tx));
memset(ivpu_bo_vaddr(ipc->mem_rx), 0, ivpu_bo_size(ipc->mem_rx));

View File

@@ -42,13 +42,26 @@ struct ivpu_ipc_hdr {
u8 status;
} __packed __aligned(IVPU_IPC_ALIGNMENT);
typedef void (*ivpu_ipc_rx_callback_t)(struct ivpu_device *vdev,
struct ivpu_ipc_hdr *ipc_hdr,
struct vpu_jsm_msg *jsm_msg);
struct ivpu_ipc_rx_msg {
struct list_head link;
struct ivpu_ipc_hdr *ipc_hdr;
struct vpu_jsm_msg *jsm_msg;
ivpu_ipc_rx_callback_t callback;
};
struct ivpu_ipc_consumer {
struct list_head link;
u32 channel;
u32 tx_vpu_addr;
u32 request_id;
bool aborted;
ivpu_ipc_rx_callback_t rx_callback;
spinlock_t rx_msg_lock; /* Protects rx_msg_list */
spinlock_t rx_lock; /* Protects rx_msg_list and aborted */
struct list_head rx_msg_list;
wait_queue_head_t rx_msg_wq;
};
@@ -60,8 +73,9 @@ struct ivpu_ipc_info {
atomic_t rx_msg_count;
spinlock_t cons_list_lock; /* Protects cons_list */
spinlock_t cons_lock; /* Protects cons_list and cb_msg_list */
struct list_head cons_list;
struct list_head cb_msg_list;
atomic_t request_id;
struct mutex lock; /* Lock on status */
@@ -75,19 +89,22 @@ void ivpu_ipc_enable(struct ivpu_device *vdev);
void ivpu_ipc_disable(struct ivpu_device *vdev);
void ivpu_ipc_reset(struct ivpu_device *vdev);
int ivpu_ipc_irq_handler(struct ivpu_device *vdev);
void ivpu_ipc_irq_handler(struct ivpu_device *vdev, bool *wake_thread);
irqreturn_t ivpu_ipc_irq_thread_handler(struct ivpu_device *vdev);
void ivpu_ipc_consumer_add(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons,
u32 channel);
u32 channel, ivpu_ipc_rx_callback_t callback);
void ivpu_ipc_consumer_del(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons);
int ivpu_ipc_receive(struct ivpu_device *vdev, struct ivpu_ipc_consumer *cons,
struct ivpu_ipc_hdr *ipc_buf, struct vpu_jsm_msg *ipc_payload,
struct ivpu_ipc_hdr *ipc_buf, struct vpu_jsm_msg *jsm_msg,
unsigned long timeout_ms);
int ivpu_ipc_send_receive_active(struct ivpu_device *vdev, struct vpu_jsm_msg *req,
enum vpu_ipc_msg_type expected_resp, struct vpu_jsm_msg *resp,
u32 channel, unsigned long timeout_ms);
int ivpu_ipc_send_receive(struct ivpu_device *vdev, struct vpu_jsm_msg *req,
enum vpu_ipc_msg_type expected_resp_type,
struct vpu_jsm_msg *resp, u32 channel,
unsigned long timeout_ms);
enum vpu_ipc_msg_type expected_resp, struct vpu_jsm_msg *resp,
u32 channel, unsigned long timeout_ms);
#endif /* __IVPU_IPC_H__ */

View File

@@ -7,7 +7,6 @@
#include <linux/bitfield.h>
#include <linux/highmem.h>
#include <linux/kthread.h>
#include <linux/pci.h>
#include <linux/module.h>
#include <uapi/drm/ivpu_accel.h>
@@ -24,10 +23,6 @@
#define JOB_ID_CONTEXT_MASK GENMASK(31, 8)
#define JOB_MAX_BUFFER_COUNT 65535
static unsigned int ivpu_tdr_timeout_ms;
module_param_named(tdr_timeout_ms, ivpu_tdr_timeout_ms, uint, 0644);
MODULE_PARM_DESC(tdr_timeout_ms, "Timeout for device hang detection, in milliseconds, 0 - default");
static void ivpu_cmdq_ring_db(struct ivpu_device *vdev, struct ivpu_cmdq *cmdq)
{
ivpu_hw_reg_db_set(vdev, cmdq->db_id);
@@ -196,6 +191,8 @@ static int ivpu_cmdq_push_job(struct ivpu_cmdq *cmdq, struct ivpu_job *job)
entry->batch_buf_addr = job->cmd_buf_vpu_addr;
entry->job_id = job->job_id;
entry->flags = 0;
if (unlikely(ivpu_test_mode & IVPU_TEST_MODE_NULL_SUBMISSION))
entry->flags = VPU_JOB_FLAGS_NULL_SUBMISSION_MASK;
wmb(); /* Ensure that tail is updated after filling entry */
header->tail = next_entry;
wmb(); /* Flush WC buffer for jobq header */
@@ -264,7 +261,7 @@ static void job_release(struct kref *ref)
for (i = 0; i < job->bo_count; i++)
if (job->bos[i])
drm_gem_object_put(&job->bos[i]->base);
drm_gem_object_put(&job->bos[i]->base.base);
dma_fence_put(job->done_fence);
ivpu_file_priv_put(&job->file_priv);
@@ -340,23 +337,12 @@ static int ivpu_job_done(struct ivpu_device *vdev, u32 job_id, u32 job_status)
ivpu_dbg(vdev, JOB, "Job complete: id %3u ctx %2d engine %d status 0x%x\n",
job->job_id, job->file_priv->ctx.id, job->engine_idx, job_status);
ivpu_stop_job_timeout_detection(vdev);
job_put(job);
return 0;
}
static void ivpu_job_done_message(struct ivpu_device *vdev, void *msg)
{
struct vpu_ipc_msg_payload_job_done *payload;
struct vpu_jsm_msg *job_ret_msg = msg;
int ret;
payload = (struct vpu_ipc_msg_payload_job_done *)&job_ret_msg->payload;
ret = ivpu_job_done(vdev, payload->job_id, payload->job_status);
if (ret)
ivpu_err(vdev, "Failed to finish job %d: %d\n", payload->job_id, ret);
}
void ivpu_jobs_abort_all(struct ivpu_device *vdev)
{
struct ivpu_job *job;
@@ -398,11 +384,13 @@ static int ivpu_direct_job_submission(struct ivpu_job *job)
if (ret)
goto err_xa_erase;
ivpu_start_job_timeout_detection(vdev);
ivpu_dbg(vdev, JOB, "Job submitted: id %3u addr 0x%llx ctx %2d engine %d next %d\n",
job->job_id, job->cmd_buf_vpu_addr, file_priv->ctx.id,
job->engine_idx, cmdq->jobq->header.tail);
if (ivpu_test_mode == IVPU_TEST_MODE_NULL_HW) {
if (ivpu_test_mode & IVPU_TEST_MODE_NULL_HW) {
ivpu_job_done(vdev, job->job_id, VPU_JSM_STATUS_SUCCESS);
cmdq->jobq->header.head = cmdq->jobq->header.tail;
wmb(); /* Flush WC buffer for jobq header */
@@ -448,7 +436,7 @@ ivpu_job_prepare_bos_for_submit(struct drm_file *file, struct ivpu_job *job, u32
}
bo = job->bos[CMD_BUF_IDX];
if (!dma_resv_test_signaled(bo->base.resv, DMA_RESV_USAGE_READ)) {
if (!dma_resv_test_signaled(bo->base.base.resv, DMA_RESV_USAGE_READ)) {
ivpu_warn(vdev, "Buffer is already in use\n");
return -EBUSY;
}
@@ -468,7 +456,7 @@ ivpu_job_prepare_bos_for_submit(struct drm_file *file, struct ivpu_job *job, u32
}
for (i = 0; i < buf_count; i++) {
ret = dma_resv_reserve_fences(job->bos[i]->base.resv, 1);
ret = dma_resv_reserve_fences(job->bos[i]->base.base.resv, 1);
if (ret) {
ivpu_warn(vdev, "Failed to reserve fences: %d\n", ret);
goto unlock_reservations;
@@ -477,7 +465,7 @@ ivpu_job_prepare_bos_for_submit(struct drm_file *file, struct ivpu_job *job, u32
for (i = 0; i < buf_count; i++) {
usage = (i == CMD_BUF_IDX) ? DMA_RESV_USAGE_WRITE : DMA_RESV_USAGE_BOOKKEEP;
dma_resv_add_fence(job->bos[i]->base.resv, job->done_fence, usage);
dma_resv_add_fence(job->bos[i]->base.base.resv, job->done_fence, usage);
}
unlock_reservations:
@@ -562,61 +550,36 @@ free_handles:
return ret;
}
static int ivpu_job_done_thread(void *arg)
static void
ivpu_job_done_callback(struct ivpu_device *vdev, struct ivpu_ipc_hdr *ipc_hdr,
struct vpu_jsm_msg *jsm_msg)
{
struct ivpu_device *vdev = (struct ivpu_device *)arg;
struct ivpu_ipc_consumer cons;
struct vpu_jsm_msg jsm_msg;
bool jobs_submitted;
unsigned int timeout;
struct vpu_ipc_msg_payload_job_done *payload;
int ret;
ivpu_dbg(vdev, JOB, "Started %s\n", __func__);
ivpu_ipc_consumer_add(vdev, &cons, VPU_IPC_CHAN_JOB_RET);
while (!kthread_should_stop()) {
timeout = ivpu_tdr_timeout_ms ? ivpu_tdr_timeout_ms : vdev->timeout.tdr;
jobs_submitted = !xa_empty(&vdev->submitted_jobs_xa);
ret = ivpu_ipc_receive(vdev, &cons, NULL, &jsm_msg, timeout);
if (!ret) {
ivpu_job_done_message(vdev, &jsm_msg);
} else if (ret == -ETIMEDOUT) {
if (jobs_submitted && !xa_empty(&vdev->submitted_jobs_xa)) {
ivpu_err(vdev, "TDR detected, timeout %d ms", timeout);
ivpu_hw_diagnose_failure(vdev);
ivpu_pm_schedule_recovery(vdev);
}
}
if (!jsm_msg) {
ivpu_err(vdev, "IPC message has no JSM payload\n");
return;
}
ivpu_ipc_consumer_del(vdev, &cons);
ivpu_jobs_abort_all(vdev);
ivpu_dbg(vdev, JOB, "Stopped %s\n", __func__);
return 0;
}
int ivpu_job_done_thread_init(struct ivpu_device *vdev)
{
struct task_struct *thread;
thread = kthread_run(&ivpu_job_done_thread, (void *)vdev, "ivpu_job_done_thread");
if (IS_ERR(thread)) {
ivpu_err(vdev, "Failed to start job completion thread\n");
return -EIO;
if (jsm_msg->result != VPU_JSM_STATUS_SUCCESS) {
ivpu_err(vdev, "Invalid JSM message result: %d\n", jsm_msg->result);
return;
}
get_task_struct(thread);
wake_up_process(thread);
vdev->job_done_thread = thread;
return 0;
payload = (struct vpu_ipc_msg_payload_job_done *)&jsm_msg->payload;
ret = ivpu_job_done(vdev, payload->job_id, payload->job_status);
if (!ret && !xa_empty(&vdev->submitted_jobs_xa))
ivpu_start_job_timeout_detection(vdev);
}
void ivpu_job_done_thread_fini(struct ivpu_device *vdev)
void ivpu_job_done_consumer_init(struct ivpu_device *vdev)
{
kthread_stop_put(vdev->job_done_thread);
ivpu_ipc_consumer_add(vdev, &vdev->job_done_consumer,
VPU_IPC_CHAN_JOB_RET, ivpu_job_done_callback);
}
void ivpu_job_done_consumer_fini(struct ivpu_device *vdev)
{
ivpu_ipc_consumer_del(vdev, &vdev->job_done_consumer);
}

View File

@@ -59,8 +59,8 @@ int ivpu_submit_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
void ivpu_cmdq_release_all(struct ivpu_file_priv *file_priv);
void ivpu_cmdq_reset_all_contexts(struct ivpu_device *vdev);
int ivpu_job_done_thread_init(struct ivpu_device *vdev);
void ivpu_job_done_thread_fini(struct ivpu_device *vdev);
void ivpu_job_done_consumer_init(struct ivpu_device *vdev);
void ivpu_job_done_consumer_fini(struct ivpu_device *vdev);
void ivpu_jobs_abort_all(struct ivpu_device *vdev);

View File

@@ -4,6 +4,7 @@
*/
#include "ivpu_drv.h"
#include "ivpu_hw.h"
#include "ivpu_ipc.h"
#include "ivpu_jsm_msg.h"
@@ -36,6 +37,17 @@ const char *ivpu_jsm_msg_type_to_str(enum vpu_ipc_msg_type type)
IVPU_CASE_TO_STR(VPU_JSM_MSG_DESTROY_CMD_QUEUE);
IVPU_CASE_TO_STR(VPU_JSM_MSG_SET_CONTEXT_SCHED_PROPERTIES);
IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_REGISTER_DB);
IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_RESUME_CMDQ);
IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_SUSPEND_CMDQ);
IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_RESUME_CMDQ_RSP);
IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_SUSPEND_CMDQ_DONE);
IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_SET_SCHEDULING_LOG);
IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_SET_SCHEDULING_LOG_RSP);
IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_SCHEDULING_LOG_NOTIFICATION);
IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_ENGINE_RESUME);
IVPU_CASE_TO_STR(VPU_JSM_MSG_HWS_RESUME_ENGINE_DONE);
IVPU_CASE_TO_STR(VPU_JSM_MSG_STATE_DUMP);
IVPU_CASE_TO_STR(VPU_JSM_MSG_STATE_DUMP_RSP);
IVPU_CASE_TO_STR(VPU_JSM_MSG_BLOB_DEINIT);
IVPU_CASE_TO_STR(VPU_JSM_MSG_DYNDBG_CONTROL);
IVPU_CASE_TO_STR(VPU_JSM_MSG_JOB_DONE);
@@ -65,6 +77,12 @@ const char *ivpu_jsm_msg_type_to_str(enum vpu_ipc_msg_type type)
IVPU_CASE_TO_STR(VPU_JSM_MSG_SET_CONTEXT_SCHED_PROPERTIES_RSP);
IVPU_CASE_TO_STR(VPU_JSM_MSG_BLOB_DEINIT_DONE);
IVPU_CASE_TO_STR(VPU_JSM_MSG_DYNDBG_CONTROL_RSP);
IVPU_CASE_TO_STR(VPU_JSM_MSG_PWR_D0I3_ENTER);
IVPU_CASE_TO_STR(VPU_JSM_MSG_PWR_D0I3_ENTER_DONE);
IVPU_CASE_TO_STR(VPU_JSM_MSG_DCT_ENABLE);
IVPU_CASE_TO_STR(VPU_JSM_MSG_DCT_ENABLE_DONE);
IVPU_CASE_TO_STR(VPU_JSM_MSG_DCT_DISABLE);
IVPU_CASE_TO_STR(VPU_JSM_MSG_DCT_DISABLE_DONE);
}
#undef IVPU_CASE_TO_STR
@@ -243,3 +261,23 @@ int ivpu_jsm_context_release(struct ivpu_device *vdev, u32 host_ssid)
return ivpu_ipc_send_receive(vdev, &req, VPU_JSM_MSG_SSID_RELEASE_DONE, &resp,
VPU_IPC_CHAN_ASYNC_CMD, vdev->timeout.jsm);
}
int ivpu_jsm_pwr_d0i3_enter(struct ivpu_device *vdev)
{
struct vpu_jsm_msg req = { .type = VPU_JSM_MSG_PWR_D0I3_ENTER };
struct vpu_jsm_msg resp;
int ret;
if (IVPU_WA(disable_d0i3_msg))
return 0;
req.payload.pwr_d0i3_enter.send_response = 1;
ret = ivpu_ipc_send_receive_active(vdev, &req, VPU_JSM_MSG_PWR_D0I3_ENTER_DONE,
&resp, VPU_IPC_CHAN_GEN_CMD,
vdev->timeout.d0i3_entry_msg);
if (ret)
return ret;
return ivpu_hw_wait_for_idle(vdev);
}

View File

@@ -22,4 +22,5 @@ int ivpu_jsm_trace_get_capability(struct ivpu_device *vdev, u32 *trace_destinati
int ivpu_jsm_trace_set_config(struct ivpu_device *vdev, u32 trace_level, u32 trace_destination_mask,
u64 trace_hw_component_mask);
int ivpu_jsm_context_release(struct ivpu_device *vdev, u32 host_ssid);
int ivpu_jsm_pwr_d0i3_enter(struct ivpu_device *vdev);
#endif

View File

@@ -230,7 +230,12 @@
(REG_FLD(IVPU_MMU_REG_GERROR, MSI_PRIQ_ABT)) | \
(REG_FLD(IVPU_MMU_REG_GERROR, MSI_ABT)))
static char *ivpu_mmu_event_to_str(u32 cmd)
#define IVPU_MMU_CERROR_NONE 0x0
#define IVPU_MMU_CERROR_ILL 0x1
#define IVPU_MMU_CERROR_ABT 0x2
#define IVPU_MMU_CERROR_ATC_INV_SYNC 0x3
static const char *ivpu_mmu_event_to_str(u32 cmd)
{
switch (cmd) {
case IVPU_MMU_EVT_F_UUT:
@@ -276,6 +281,22 @@ static char *ivpu_mmu_event_to_str(u32 cmd)
}
}
static const char *ivpu_mmu_cmdq_err_to_str(u32 err)
{
switch (err) {
case IVPU_MMU_CERROR_NONE:
return "No CMDQ Error";
case IVPU_MMU_CERROR_ILL:
return "Illegal command";
case IVPU_MMU_CERROR_ABT:
return "External abort on CMDQ read";
case IVPU_MMU_CERROR_ATC_INV_SYNC:
return "Sync failed to complete ATS invalidation";
default:
return "Unknown CMDQ Error";
}
}
static void ivpu_mmu_config_check(struct ivpu_device *vdev)
{
u32 val_ref;
@@ -479,10 +500,7 @@ static int ivpu_mmu_cmdq_sync(struct ivpu_device *vdev)
u64 val;
int ret;
val = FIELD_PREP(IVPU_MMU_CMD_OPCODE, CMD_SYNC) |
FIELD_PREP(IVPU_MMU_CMD_SYNC_0_CS, 0x2) |
FIELD_PREP(IVPU_MMU_CMD_SYNC_0_MSH, 0x3) |
FIELD_PREP(IVPU_MMU_CMD_SYNC_0_MSI_ATTR, 0xf);
val = FIELD_PREP(IVPU_MMU_CMD_OPCODE, CMD_SYNC);
ret = ivpu_mmu_cmdq_cmd_write(vdev, "SYNC", val, 0);
if (ret)
@@ -492,8 +510,15 @@ static int ivpu_mmu_cmdq_sync(struct ivpu_device *vdev)
REGV_WR32(IVPU_MMU_REG_CMDQ_PROD, q->prod);
ret = ivpu_mmu_cmdq_wait_for_cons(vdev);
if (ret)
ivpu_err(vdev, "Timed out waiting for consumer: %d\n", ret);
if (ret) {
u32 err;
val = REGV_RD32(IVPU_MMU_REG_CMDQ_CONS);
err = REG_GET_FLD(IVPU_MMU_REG_CMDQ_CONS, ERR, val);
ivpu_err(vdev, "Timed out waiting for MMU consumer: %d, error: %s\n", ret,
ivpu_mmu_cmdq_err_to_str(err));
}
return ret;
}
@@ -750,9 +775,12 @@ int ivpu_mmu_init(struct ivpu_device *vdev)
ivpu_dbg(vdev, MMU, "Init..\n");
drmm_mutex_init(&vdev->drm, &mmu->lock);
ivpu_mmu_config_check(vdev);
ret = drmm_mutex_init(&vdev->drm, &mmu->lock);
if (ret)
return ret;
ret = ivpu_mmu_structs_alloc(vdev);
if (ret)
return ret;

View File

@@ -5,6 +5,9 @@
#include <linux/bitfield.h>
#include <linux/highmem.h>
#include <linux/set_memory.h>
#include <drm/drm_cache.h>
#include "ivpu_drv.h"
#include "ivpu_hw.h"
@@ -39,12 +42,57 @@
#define IVPU_MMU_ENTRY_MAPPED (IVPU_MMU_ENTRY_FLAG_AF | IVPU_MMU_ENTRY_FLAG_USER | \
IVPU_MMU_ENTRY_FLAG_NG | IVPU_MMU_ENTRY_VALID)
static void *ivpu_pgtable_alloc_page(struct ivpu_device *vdev, dma_addr_t *dma)
{
dma_addr_t dma_addr;
struct page *page;
void *cpu;
page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
if (!page)
return NULL;
set_pages_array_wc(&page, 1);
dma_addr = dma_map_page(vdev->drm.dev, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
if (dma_mapping_error(vdev->drm.dev, dma_addr))
goto err_free_page;
cpu = vmap(&page, 1, VM_MAP, pgprot_writecombine(PAGE_KERNEL));
if (!cpu)
goto err_dma_unmap_page;
*dma = dma_addr;
return cpu;
err_dma_unmap_page:
dma_unmap_page(vdev->drm.dev, dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
err_free_page:
put_page(page);
return NULL;
}
static void ivpu_pgtable_free_page(struct ivpu_device *vdev, u64 *cpu_addr, dma_addr_t dma_addr)
{
struct page *page;
if (cpu_addr) {
page = vmalloc_to_page(cpu_addr);
vunmap(cpu_addr);
dma_unmap_page(vdev->drm.dev, dma_addr & ~IVPU_MMU_ENTRY_FLAGS_MASK, PAGE_SIZE,
DMA_BIDIRECTIONAL);
set_pages_array_wb(&page, 1);
put_page(page);
}
}
static int ivpu_mmu_pgtable_init(struct ivpu_device *vdev, struct ivpu_mmu_pgtable *pgtable)
{
dma_addr_t pgd_dma;
pgtable->pgd_dma_ptr = dma_alloc_coherent(vdev->drm.dev, IVPU_MMU_PGTABLE_SIZE, &pgd_dma,
GFP_KERNEL);
pgtable->pgd_dma_ptr = ivpu_pgtable_alloc_page(vdev, &pgd_dma);
if (!pgtable->pgd_dma_ptr)
return -ENOMEM;
@@ -53,13 +101,6 @@ static int ivpu_mmu_pgtable_init(struct ivpu_device *vdev, struct ivpu_mmu_pgtab
return 0;
}
static void ivpu_mmu_pgtable_free(struct ivpu_device *vdev, u64 *cpu_addr, dma_addr_t dma_addr)
{
if (cpu_addr)
dma_free_coherent(vdev->drm.dev, IVPU_MMU_PGTABLE_SIZE, cpu_addr,
dma_addr & ~IVPU_MMU_ENTRY_FLAGS_MASK);
}
static void ivpu_mmu_pgtables_free(struct ivpu_device *vdev, struct ivpu_mmu_pgtable *pgtable)
{
int pgd_idx, pud_idx, pmd_idx;
@@ -84,19 +125,19 @@ static void ivpu_mmu_pgtables_free(struct ivpu_device *vdev, struct ivpu_mmu_pgt
pte_dma_ptr = pgtable->pte_ptrs[pgd_idx][pud_idx][pmd_idx];
pte_dma = pgtable->pmd_ptrs[pgd_idx][pud_idx][pmd_idx];
ivpu_mmu_pgtable_free(vdev, pte_dma_ptr, pte_dma);
ivpu_pgtable_free_page(vdev, pte_dma_ptr, pte_dma);
}
kfree(pgtable->pte_ptrs[pgd_idx][pud_idx]);
ivpu_mmu_pgtable_free(vdev, pmd_dma_ptr, pmd_dma);
ivpu_pgtable_free_page(vdev, pmd_dma_ptr, pmd_dma);
}
kfree(pgtable->pmd_ptrs[pgd_idx]);
kfree(pgtable->pte_ptrs[pgd_idx]);
ivpu_mmu_pgtable_free(vdev, pud_dma_ptr, pud_dma);
ivpu_pgtable_free_page(vdev, pud_dma_ptr, pud_dma);
}
ivpu_mmu_pgtable_free(vdev, pgtable->pgd_dma_ptr, pgtable->pgd_dma);
ivpu_pgtable_free_page(vdev, pgtable->pgd_dma_ptr, pgtable->pgd_dma);
}
static u64*
@@ -108,7 +149,7 @@ ivpu_mmu_ensure_pud(struct ivpu_device *vdev, struct ivpu_mmu_pgtable *pgtable,
if (pud_dma_ptr)
return pud_dma_ptr;
pud_dma_ptr = dma_alloc_wc(vdev->drm.dev, IVPU_MMU_PGTABLE_SIZE, &pud_dma, GFP_KERNEL);
pud_dma_ptr = ivpu_pgtable_alloc_page(vdev, &pud_dma);
if (!pud_dma_ptr)
return NULL;
@@ -131,7 +172,7 @@ err_free_pmd_ptrs:
kfree(pgtable->pmd_ptrs[pgd_idx]);
err_free_pud_dma_ptr:
ivpu_mmu_pgtable_free(vdev, pud_dma_ptr, pud_dma);
ivpu_pgtable_free_page(vdev, pud_dma_ptr, pud_dma);
return NULL;
}
@@ -145,7 +186,7 @@ ivpu_mmu_ensure_pmd(struct ivpu_device *vdev, struct ivpu_mmu_pgtable *pgtable,
if (pmd_dma_ptr)
return pmd_dma_ptr;
pmd_dma_ptr = dma_alloc_wc(vdev->drm.dev, IVPU_MMU_PGTABLE_SIZE, &pmd_dma, GFP_KERNEL);
pmd_dma_ptr = ivpu_pgtable_alloc_page(vdev, &pmd_dma);
if (!pmd_dma_ptr)
return NULL;
@@ -160,7 +201,7 @@ ivpu_mmu_ensure_pmd(struct ivpu_device *vdev, struct ivpu_mmu_pgtable *pgtable,
return pmd_dma_ptr;
err_free_pmd_dma_ptr:
ivpu_mmu_pgtable_free(vdev, pmd_dma_ptr, pmd_dma);
ivpu_pgtable_free_page(vdev, pmd_dma_ptr, pmd_dma);
return NULL;
}
@@ -174,7 +215,7 @@ ivpu_mmu_ensure_pte(struct ivpu_device *vdev, struct ivpu_mmu_pgtable *pgtable,
if (pte_dma_ptr)
return pte_dma_ptr;
pte_dma_ptr = dma_alloc_wc(vdev->drm.dev, IVPU_MMU_PGTABLE_SIZE, &pte_dma, GFP_KERNEL);
pte_dma_ptr = ivpu_pgtable_alloc_page(vdev, &pte_dma);
if (!pte_dma_ptr)
return NULL;
@@ -249,38 +290,6 @@ static void ivpu_mmu_context_unmap_page(struct ivpu_mmu_context *ctx, u64 vpu_ad
ctx->pgtable.pte_ptrs[pgd_idx][pud_idx][pmd_idx][pte_idx] = IVPU_MMU_ENTRY_INVALID;
}
static void
ivpu_mmu_context_flush_page_tables(struct ivpu_mmu_context *ctx, u64 vpu_addr, size_t size)
{
struct ivpu_mmu_pgtable *pgtable = &ctx->pgtable;
u64 end_addr = vpu_addr + size;
/* Align to PMD entry (2 MB) */
vpu_addr &= ~(IVPU_MMU_PTE_MAP_SIZE - 1);
while (vpu_addr < end_addr) {
int pgd_idx = FIELD_GET(IVPU_MMU_PGD_INDEX_MASK, vpu_addr);
u64 pud_end = (pgd_idx + 1) * (u64)IVPU_MMU_PUD_MAP_SIZE;
while (vpu_addr < end_addr && vpu_addr < pud_end) {
int pud_idx = FIELD_GET(IVPU_MMU_PUD_INDEX_MASK, vpu_addr);
u64 pmd_end = (pud_idx + 1) * (u64)IVPU_MMU_PMD_MAP_SIZE;
while (vpu_addr < end_addr && vpu_addr < pmd_end) {
int pmd_idx = FIELD_GET(IVPU_MMU_PMD_INDEX_MASK, vpu_addr);
clflush_cache_range(pgtable->pte_ptrs[pgd_idx][pud_idx][pmd_idx],
IVPU_MMU_PGTABLE_SIZE);
vpu_addr += IVPU_MMU_PTE_MAP_SIZE;
}
clflush_cache_range(pgtable->pmd_ptrs[pgd_idx][pud_idx],
IVPU_MMU_PGTABLE_SIZE);
}
clflush_cache_range(pgtable->pud_ptrs[pgd_idx], IVPU_MMU_PGTABLE_SIZE);
}
clflush_cache_range(pgtable->pgd_dma_ptr, IVPU_MMU_PGTABLE_SIZE);
}
static int
ivpu_mmu_context_map_pages(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx,
u64 vpu_addr, dma_addr_t dma_addr, size_t size, u64 prot)
@@ -327,6 +336,9 @@ ivpu_mmu_context_map_sgt(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx,
u64 prot;
u64 i;
if (drm_WARN_ON(&vdev->drm, !ctx))
return -EINVAL;
if (!IS_ALIGNED(vpu_addr, IVPU_MMU_PAGE_SIZE))
return -EINVAL;
@@ -349,10 +361,11 @@ ivpu_mmu_context_map_sgt(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx,
mutex_unlock(&ctx->lock);
return ret;
}
ivpu_mmu_context_flush_page_tables(ctx, vpu_addr, size);
vpu_addr += size;
}
/* Ensure page table modifications are flushed from wc buffers to memory */
wmb();
mutex_unlock(&ctx->lock);
ret = ivpu_mmu_invalidate_tlb(vdev, ctx->id);
@@ -369,8 +382,8 @@ ivpu_mmu_context_unmap_sgt(struct ivpu_device *vdev, struct ivpu_mmu_context *ct
int ret;
u64 i;
if (!IS_ALIGNED(vpu_addr, IVPU_MMU_PAGE_SIZE))
ivpu_warn(vdev, "Unaligned vpu_addr: 0x%llx\n", vpu_addr);
if (drm_WARN_ON(&vdev->drm, !ctx))
return;
mutex_lock(&ctx->lock);
@@ -378,10 +391,11 @@ ivpu_mmu_context_unmap_sgt(struct ivpu_device *vdev, struct ivpu_mmu_context *ct
size_t size = sg_dma_len(sg) + sg->offset;
ivpu_mmu_context_unmap_pages(ctx, vpu_addr, size);
ivpu_mmu_context_flush_page_tables(ctx, vpu_addr, size);
vpu_addr += size;
}
/* Ensure page table modifications are flushed from wc buffers to memory */
wmb();
mutex_unlock(&ctx->lock);
ret = ivpu_mmu_invalidate_tlb(vdev, ctx->id);
@@ -390,28 +404,34 @@ ivpu_mmu_context_unmap_sgt(struct ivpu_device *vdev, struct ivpu_mmu_context *ct
}
int
ivpu_mmu_context_insert_node_locked(struct ivpu_mmu_context *ctx,
const struct ivpu_addr_range *range,
u64 size, struct drm_mm_node *node)
ivpu_mmu_context_insert_node(struct ivpu_mmu_context *ctx, const struct ivpu_addr_range *range,
u64 size, struct drm_mm_node *node)
{
lockdep_assert_held(&ctx->lock);
int ret;
WARN_ON(!range);
mutex_lock(&ctx->lock);
if (!ivpu_disable_mmu_cont_pages && size >= IVPU_MMU_CONT_PAGES_SIZE) {
if (!drm_mm_insert_node_in_range(&ctx->mm, node, size, IVPU_MMU_CONT_PAGES_SIZE, 0,
range->start, range->end, DRM_MM_INSERT_BEST))
return 0;
ret = drm_mm_insert_node_in_range(&ctx->mm, node, size, IVPU_MMU_CONT_PAGES_SIZE, 0,
range->start, range->end, DRM_MM_INSERT_BEST);
if (!ret)
goto unlock;
}
return drm_mm_insert_node_in_range(&ctx->mm, node, size, IVPU_MMU_PAGE_SIZE, 0,
range->start, range->end, DRM_MM_INSERT_BEST);
ret = drm_mm_insert_node_in_range(&ctx->mm, node, size, IVPU_MMU_PAGE_SIZE, 0,
range->start, range->end, DRM_MM_INSERT_BEST);
unlock:
mutex_unlock(&ctx->lock);
return ret;
}
void
ivpu_mmu_context_remove_node_locked(struct ivpu_mmu_context *ctx, struct drm_mm_node *node)
ivpu_mmu_context_remove_node(struct ivpu_mmu_context *ctx, struct drm_mm_node *node)
{
lockdep_assert_held(&ctx->lock);
mutex_lock(&ctx->lock);
drm_mm_remove_node(node);
mutex_unlock(&ctx->lock);
}
static int
@@ -421,7 +441,6 @@ ivpu_mmu_context_init(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx, u3
int ret;
mutex_init(&ctx->lock);
INIT_LIST_HEAD(&ctx->bo_list);
ret = ivpu_mmu_pgtable_init(vdev, &ctx->pgtable);
if (ret) {

Some files were not shown because too many files have changed in this diff Show More