Userptr should not need the kernel for a userspace memcpy, userspace
needs to call memcpy directly.
Specifically, disable i915_gem_pwrite_ioctl() and i915_gem_pread_ioctl().
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-12-maarten.lankhorst@linux.intel.com
The rationale for this change is roughly as follows:
1. The functionality can be done entirely in userspace with a
combination of mmap + memcpy
2. The only reason anyone in userspace is still using it is because
someone implemented bo_subdata that way in libdrm ages ago and
they're all too lazy to write the 5 lines of code to do a map.
3. This falls cleanly into the category of things which will only get
more painful with local memory support.
These ioctls aren't used much anymore by "real" userspace drivers.
Vulkan has never used them and neither has the iris GL driver. The old
i965 GL driver does use PWRITE for glBufferSubData but it only supports
up through Gen11; Gen12 was never enabled in i965. The compute driver
has never used PREAD/PWRITE. The only remaining user is the media
driver which uses it exactly twice and they're easily removed [1] so
expecting them to drop it going forward is reasonable.
IGT changes which handle this kernel change have also been submitted [2].
[1] https://github.com/intel/media-driver/pull/1160
[2] https://patchwork.freedesktop.org/series/81384/
v2 (Jason Ekstrand):
- Improved commit message with the status of all usermode drivers
- A more future-proof platform check
v3 (Jason Ekstrand):
- Drop the HAS_LMEM checks as they're already covered by the version
checks
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210317234014.2271006-4-jason@jlekstrand.net
Push the hibernate pm routines next to the suspend pm routines in
gem/i915_gem_pm.c. This has the side-effect of putting the wbinvd()
abusers next to each other.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 30d2bfd093 ("drm/i915/gem: Almagamate clflushes on freeze")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210123145543.10533-1-chris@chris-wilson.co.uk
(cherry picked from commit 6d8f02207420e76db693a00ccb44792474e297fc)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Give obj->mm.quirked a name much more reflective of its purpose
(i915_gem_object_has_tiling_quirk) and move it from the obj->mm field as
it doesn't denote a quirk of the backing store, but a quirk in the
object in its treatment of the backing pages, similar to tiling modes.
Then instead of abusing the pinned status of the buffer to protect it
from the shrinker, we can instead hide the buffer from the shrinker so
it is never considered for being swapped.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210119214336.1463-4-chris@chris-wilson.co.uk
When flushing objects larger than the CPU cache it is preferrable to use
a single wbinvd() rather than overlapping clflush(). At runtime, we
avoid wbinvd() due to its system-wide latencies, but during
singlethreaded suspend, no one will observe the imposed latency and we
can opt for the faster wbinvd to clear all objects in a single hit.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210119214336.1463-2-chris@chris-wilson.co.uk
In preparation for gem_create_ext break out the gem_create uAPI, so that
we don't clutter i915_gem.c once we start adding various extensions
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20210114182402.840247-1-matthew.auld@intel.com
The free_list and worker was introduced in commit 5f09a9c8ab ("drm/i915:
Allow contexts to be unreferenced locklessly"), but subsequently made
redundant by the removal of the last sleeping lock in commit 2935ed5339
("drm/i915: Remove logical HW ID"). As we can now free the GEM context
immediately from any context, remove the deferral of the free_list
v2: Lift removing the context from the global list into close().
Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201215152138.8158-1-chris@chris-wilson.co.uk
Move the specialised interactions with the physical GEM object from the
pread/pwrite ioctl handler into the phys backend.
Currently, if one is able to exhaust the entire aperture and then try to
pwrite into an object not backed by struct page, we accidentally invoked
the phys pwrite handler on a non-phys object; calamitous.
Fixes: c6790dc223 ("drm/i915: Wean off drm_pci_alloc/drm_pci_free")
Testcase: igt/gem_pwrite/exhaustion
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20201105154934.16022-2-chris@chris-wilson.co.uk
As there are more and more complicated interactions between the different
backing stores and userspace, push the control into the backends rather
than accumulate them all inside the ioctl handlers.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201105154934.16022-1-chris@chris-wilson.co.uk
As a preparation step for full object locking and wait/wound handling
during pin and object mapping, ensure that we always pass the ww context
in i915_gem_execbuffer.c to i915_vma_pin, use lockdep to ensure this
happens.
This also requires changing the order of eb_parse slightly, to ensure
we pass ww at a point where we could still handle -EDEADLK safely.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-15-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Now that we changed execbuf submission slightly to allow us to do all
pinning in one place, we can now simply add ww versions on top of
struct_mutex. All we have to do is a separate path for -EDEADLK
handling, which needs to unpin all gem bo's before dropping the lock,
then starting over.
This finally allows us to do parallel submission, but because not
all of the pinning code uses the ww ctx yet, we cannot completely
drop struct_mutex yet.
Changes since v1:
- Keep struct_mutex for now. :(
Changes since v2:
- Make sure we always lock the ww context in slowpath.
Changes since v3:
- Don't call __eb_unreserve_vma in eb_move_to_gpu now; this can be
done on normal unlock path.
- Unconditionally release vmas and context.
Changes since v4:
- Rebased on top of struct_mutex reduction.
Changes since v5:
- Remove training wheels.
Changes since v6:
- Fix accidentally broken -ENOSPC handling.
Changes since v7:
- Handle gt buffer pool better.
Changes since v8:
- Properly clear variables, to make -EDEADLK handling not BUG.
Change since v9:
- Fix unpinning fence on pnv and below.
Changes since v10:
- Make relocation gpu chaining working again.
Changes since v11:
- Remove relocation chaining, pain to make it work.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-9-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Execbuffer submission will perform its own WW locking, and we
cannot rely on the implicit lock there.
This also makes it clear that the GVT code will get a lockdep splat when
multiple batchbuffer shadows need to be performed in the same instance,
fix that up.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-7-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
i915_gem_ww_ctx is used to lock all gem bo's for pinning and memory
eviction. We don't use it yet, but lets start adding the definition
first.
To use it, we have to pass a non-NULL ww to gem_object_lock, and don't
unlock directly. It is done in i915_gem_ww_ctx_fini.
Changes since v1:
- Change ww_ctx and obj order in locking functions (Jonas Lahtinen)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-6-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
I915_GEM_THROTTLE dates back to the time before contexts where there was
just a single engine, and therefore a single timeline and request list
globally. That request list was in execution/retirement order, and so
walking it to find a particular aged request made sense and could be
split per file.
That is no more. We now have many timelines with a file, as many as the
user wants to construct (essentially per-engine, per-context). Each of
those run independently and so make the single list futile. Remove the
disordered list, and iterate over all the timelines to find a request to
wait on in each to satisfy the criteria that the CPU is no more than 20ms
ahead of its oldest request.
It should go without saying that the I915_GEM_THROTTLE ioctl is no
longer used as the primary means of throttling, so it makes sense to push
the complication into the ioctl where it only impacts upon its few
irregular users, rather than the execbuf/retire where everybody has to
pay the cost. Fortunately, the few users do not create vast amount of
contexts, so the loops over contexts/engines should be concise.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728152010.30701-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
If we find ourselves trying to reuse a misplaced but active vma, we
currently try to discard it to avoid having to wait to unbind it
(upsetting the current user fo the vma). An alternative to marking it as
a dicarded vma and keeping it in both the obj->vma.list and
obj->vma.tree, is to simply remove it from the lookup rbtree.
While it remains in the list of vma, it will be unbound under eviction
pressure and freed along with the object. We will never reuse it again
for new instances. As before, with no pruning, the list may continually
grow, but eventually we will have the most constrained version of the
ggtt view that meets all requirements -- so the list of vma should not
grow without bound.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2012
Fixes: 9bdcaa5e3a ("drm/i915: Discard a misplaced GGTT vma")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200611180421.23262-1-chris@chris-wilson.co.uk
As a last minute addition, I added an assertion to make sure that the
new i915_vma view would be equal to the discard. However, the positive
encouragement from CI only goes to show that we rarely take this path,
and it wasn't until the post-merge run did we hit the assert -- because
it compared the wrong view. Fixup the copy'n'paste error and compare
against both the old view and the expected new view.
Fixes: 9bdcaa5e3a ("drm/i915: Discard a misplaced GGTT vma")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200605184844.24644-1-chris@chris-wilson.co.uk
Across the many users of the GGTT vma (internal objects, mmapings,
display etc), we may end up with conflicting requirements for the
placement. Currently, we try to resolve the conflict by unbinding the
vma and rebinding it to match the new constraints; over time we will end
up with a GGTT that matches the most strict constraints over all
concurrent users. However, this causes a problem if the vma is currently
in use as we must wait until it is idle before moving it. But there is
no restriction on the number of views we may use (apart from the limited
size of the GGTT itself), and so if the active vma does not meet our
requirements, try and build a new one!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200605165258.1483-1-chris@chris-wilson.co.uk
We cached the number of vma bound to the object in order to speed up
shrinker decisions. This has been superseded by being more proactive in
removing objects we cannot shrink from the shrinker lists, and so we can
drop the clumsy attempt at atomically counting the bind count and
comparing it to the number of pinned mappings of the object. This will
only get more clumsier with asynchronous binding and unbinding.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200401223924.16667-1-chris@chris-wilson.co.uk
If we must revoke the fence because the VMA is no longer present, or
because the fence no longer applies, ensure that we do and convert it
into an error if we try but cannot.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200401210104.15907-3-chris@chris-wilson.co.uk
The #include has been splattered all over the place, but there are
precious few places, all .c files, that actually need it.
v2: remove leftover double newlines
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200225133131.3301-1-jani.nikula@intel.com
Commit 4f2a572eda ("drm/i915/userptr: Never allow userptr into the
mappable GGTT") made I915_GEM_MMAP_GTT IOCTLs to fail when attempted
on a userptr object in order to protect from a lockdep splat. Later
on, new mapping types were introduced by commit cc662126b4
("drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET"). Those new mapping
types suffer from the same lockdep splat issue but they now succeed
when tried on top of a userptr object. Fix it.
v2: Don't play with the -ENODEV driver response (Chris)
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200204162302.1299516-1-chris@chris-wilson.co.uk
drm_pci_alloc and drm_pci_free are just very thin wrappers around
dma_alloc_coherent, with a note that we should be removing them.
Furthermore since
commit de09d31dd3
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date: Fri Jan 15 16:51:42 2016 -0800
page-flags: define PG_reserved behavior on compound pages
As far as I can see there's no users of PG_reserved on compound pages.
Let's use PF_NO_COMPOUND here.
drm_pci_alloc has been declared broken since it mixes GFP_COMP and
SetPageReserved. Avoid this conflict by weaning ourselves off using the
abstraction and using the dma functions directly.
Reported-by: Taketo Kabe
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1027
Fixes: de09d31dd3 ("page-flags: define PG_reserved behavior on compound pages")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v4.5+
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200202153934.3899472-1-chris@chris-wilson.co.uk
On Braswell and Broxton (also known as Valleyview and Apollolake), we
need to serialise updates of the GGTT using the big stop_machine()
hammer. This has the side effect of appearing to lockdep as a possible
reclaim (since it uses the cpuhp mutex and that is tainted by per-cpu
allocations). However, we want to use vm->mutex (including ggtt->mutex)
from within the shrinker and so must avoid such possible taints. For this
purpose, we introduced the asynchronous vma binding and we can apply it
to the PIN_GLOBAL so long as take care to add the necessary waits for
the worker afterwards.
Closes: https://gitlab.freedesktop.org/drm/intel/issues/211
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200130181710.2030251-3-chris@chris-wilson.co.uk
The i915_ggtt now sits beneath gt/ outside of the auspices of gem/ and
should be given a fresh name to reflect that. We also want to give it a
name that reflects its role in the system suspend/resume, with the
intention of pulling together all the GGTT operations (e.g. restoring
the fence registers once they are pulled under gt/intel_ggtt_detiler.c)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Rreviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200130181710.2030251-2-chris@chris-wilson.co.uk
When LMEM is supported, dumb buffer preferred to be created from LMEM.
v2:
Parameters are reshuffled. [Chris]
v3:
s/region_id/mem_type
v4:
use the i915_gem_object_create_region [chris]
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200104191043.2207314-2-chris@chris-wilson.co.uk
Start introducing a kref on i915_vma in order to protect the vma unbind
(i915_gem_object_unbind) from a parallel destruction (i915_vma_parked).
Later, we will use the refcount to manage all access and turn i915_vma
into a first class container.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Acked-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191222210256.2066451-2-chris@chris-wilson.co.uk
Begin pulling the GT setup underneath a single GT umbrella; let intel_gt
take ownership of its engines! As hinted, the complication is the
lifetime of the probed engine versus the active lifetime of the GT
backends. We need to detect the engine layout early and keep it until
the end so that we can sanitize state on takeover and release.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191222120752.1368352-1-chris@chris-wilson.co.uk
As the GEM global context setup is now independent of the GT state
(although GT does currently still depend upon the global
i915->kernel_context), we can move its init earlier, leaving the gt init
ready to be extracted.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191221200109.1202310-1-chris@chris-wilson.co.uk
Allocate only an internal intel_context for the kernel_context, forgoing
a global GEM context for internal use as we only require a separate
address space (for our own protection).
Now having weaned GT from requiring ce->gem_context, we can stop
referencing it entirely. This also means we no longer have to create random
and unnecessary GEM contexts for internal use.
GEM contexts are now entirely for tracking GEM clients, and intel_context
the execution environment on the GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191221160324.1073045-1-chris@chris-wilson.co.uk
Keep the intel_context as being the primary state for i915_request, with
the GEM context a backpointer from the low level state for the rarer
cases we need client information. Our goal is to remove such references
to clients from the backend, and leave the HW submission agnostic to
client interfaces and self-contained.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-1-chris@chris-wilson.co.uk
Since obj->frontbuffer is no longer protected by the struct_mutex, as we
are processing the execbuf, it may be removed. Mark the
intel_frontbuffer as rcu protected, and so acquire a reference to
the struct as we track activity upon it.
Closes: https://gitlab.freedesktop.org/drm/intel/issues/827
Fixes: 8e7cb1799b ("drm/i915: Extract intel_frontbuffer active tracking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191218104043.3539458-1-chris@chris-wilson.co.uk
Before we signal the fence to indicate completion, ensure the pwrite
through the indirect GGTT is coherent (as best as we know) in memory.
Any listeners to the fence may start immediately and sample from the
backing store prior to the writes being posted, thus seeing stale data.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191206105527.1130413-1-chris@chris-wilson.co.uk
If we cannot handle a vma within the unbind loop, try to flush the
pending events (i915_vma_parked, i915_vm_release) and try again. This
avoids a round trip to userspace that is not guaranteed to make forward
progress, as the events we wait upon require being idle.
References: cb6c3d45f9 ("drm/i915/gem: Avoid parking the vma as we unbind")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191204123556.3740002-1-chris@chris-wilson.co.uk
This is really just an alias of mmap_gtt. The 'mmap offset' nomenclature
comes from the value returned by this ioctl which is the offset into the
device fd which userpace uses with mmap(2).
mmap_gtt was our initial mmap_offset implementation, this extends
our CPU mmap support to allow additional fault handlers that depends on
the object's backing pages.
Note that we multiplex mmap_gtt and mmap_offset through the same ioctl,
and use the zero extending behaviour of drm to differentiate between
them, when we inspect the flags.
To support multiple mmap types on an object we need to support multiple
mmap_offsets for an object (each offset in the global device address
space corresponding to a unique instance of the object for a file + mmap
type). As we drop the simplified drm core idea of a single mmap_offset,
we need to provide replacement hooks for the dumb mmap interface as
well.
Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1675
Testcase: igt/gem_mmap_offset
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191204120032.3682839-1-chris@chris-wilson.co.uk
In order to avoid keeping a reference on the i915_vma (which is long
overdue!) we have to coordinate all the possible lifetimes and only use
the vma while we know it is alive. In this episode, we are reminded that
while idle, the closed vma are destroyed. So if the GT idles while we are
working with the vma, the vma itself becomes invalid.
First class i915_vma here we come, but in the meantime keep piling on
the straw.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191203155032.3137263-1-chris@chris-wilson.co.uk
Since retirement may be running in a worker on another CPU, it may be
skipped in the local intel_gt_wait_for_idle(). To ensure the state is
consistent for our sanity checks upon load, serialise with the remote
retirer by waiting on the timeline->mutex.
Outside of this use case, e.g. on suspend or module unload, we expect the
slack to be picked up by intel_gt_pm_wait_for_idle() and so prefer to
put the special case serialisation with retirement in its single user,
for now at least.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121071044.97798-2-chris@chris-wilson.co.uk
Backmerge to get dfce90259d ("Backmerge i915 security patches from
commit 'ea0b163b13ff' into drm-next") and thus 100d46bd72 ("Merge
Intel Gen8/Gen9 graphics fixes from Jon Bloomfield.").
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
This backmerges the branch that ended up in Linus' tree. It removes
all the changes for the rc6 patches from Linus' tree in favour of
a patch that is based on a large refactor that occured.
Otherwise it all looks good.
Signed-off-by: Dave Airlie <airlied@redhat.com>
For Gen7, the original cmdparser motive was to permit limited
use of register read/write instructions in unprivileged BB's.
This worked by copying the user supplied bb to a kmd owned
bb, and running it in secure mode, from the ggtt, only if
the scanner finds no unsafe commands or registers.
For Gen8+ we can't use this same technique because running bb's
from the ggtt also disables access to ppgtt space. But we also
do not actually require 'secure' execution since we are only
trying to reduce the available command/register set. Instead we
will copy the user buffer to a kmd owned read-only bb in ppgtt,
and run in the usual non-secure mode.
Note that ro pages are only supported by ppgtt (not ggtt), but
luckily that's exactly what we need.
Add the required paths to map the shadow buffer to ppgtt ro for Gen8+
v2: IS_GEN7/IS_GEN (Mika)
v3: rebase
v4: rebase
v5: rebase
Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
Assume all responsibility for operating on the HW to sanitize the GT
state upon load/resume in intel_gt_sanitize() itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191101141009.15581-1-chris@chris-wilson.co.uk
(cherry picked from commit 797a615357)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Our timelines are currently contained within an intel_gt, and we only
need to perform list/spinlock initialisation, so we can pull the
intel_timelines_init() into our intel_gt_init_early().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191101130406.4142-1-chris@chris-wilson.co.uk
Commit 50d84418f5 ("drm/i915: Add i915 to i915_inject_probe_failure")
introduced new functions unfortunately named incompatibly with rules
established by commit f2db53f14d ("drm/i915: Replace "_load" with
"_probe" consequently"). Fix it for consistency.
Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Michał Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191029102036.6326-2-janusz.krzysztofik@linux.intel.com
i915_irq.c is large. One reason for this is that has a large chunk of
the GT render power management stashed away in it. Extract that logic
out of i915_irq.c and intel_pm.c and put it under one roof.
Based on a patch by Chris Wilson.
Signed-off-by: Andi Shyti <andi.shyti@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191024211642.7688-1-chris@chris-wilson.co.uk
Again we wish to operate on the engines, which are owned by the
intel_gt. As such it is easier, and much more consistent, to pass the
intel_gt parameter.
v2: Unexport i915_gem_load_power_context()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191022141935.15733-1-chris@chris-wilson.co.uk
With the last user, i915_vma_parked(), retired, there are no more users
of the per-gt pm notifications and we can remove the unused
infrastructure.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191021183236.21790-2-chris@chris-wilson.co.uk
Convert shmem to an intel_memory_region.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191018090751.28295-2-matthew.auld@intel.com
Now that i915_ggtt knows everything about its own paths to perform mmio,
we can use that as our primary backpointer for individual fence
registers. This reduces the amount of pointer dancing we have to perform
on the common paths, but more importantly finishes our fence register
encapsulation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191016143234.4075-1-chris@chris-wilson.co.uk
Now that we record the default "goldenstate" context, we do not need to
emit the mocs registers at the start of each context and can simply do
mmio before the first context and capture the registers as part of its
default image. As a consequence, this means that we repeat the mmio
after each engine reset, fixing up any platform and registers that were
zapped by the reset (for those platforms with global not context-saved
settings).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111723
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111645
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Reviewed-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191016090749.7092-1-chris@chris-wilson.co.uk
UAPI Changes:
-Colorspace: Expose different prop values for DP vs. HDMI (Gwan-gyeong Mun)
-fourcc: Add DRM_FORMAT_MOD_ARM_16X16_BLOCK_U_INTERLEAVED (Raymond)
-not_actually: s/ENOTSUPP/EOPNOTSUPP/ in drm_edid and drm_mipi_dbi. This should
not reach userspace, but adding here to specifically call that out (Daniel)
-i810: Prevent underflow in dispatch ioctls (Dan)
-komeda: Add ACLK sysfs attribute (Mihail)
-v3d: Allow userspace to clean up after render jobs (Iago)
Cross-subsystem Changes:
-MAINTAINERS:
-Add Alyssa & Steven as panfrost reviewers (Rob)
-Add Jernej as DE2 reviewer (Maxime)
-Add Chen-Yu as Allwinner maintainer (Maxime)
-staging: Make some stack arrays static const (Colin)
Core Changes:
-ttm: Allow drivers to specify their vma manager (to use gem mgr) (Gerd)
-docs: Various fixes in connector/encoder/bridge docs (Daniel, Lyude, Laurent)
-connector: Allow more than 3 possible encoders for a connector (José)
-dp_cec: Allow a connector to be associated with a cec device (Dariusz)
-various: Fix some compile/sparse warnings (Ville)
-mm: Ensure mm node removals are properly serialised (Chris)
-panel: Specify the type of panel for drm_panels for later use (Laurent)
-panel: Use drm_panel_init to init device and funcs (Laurent)
-mst: Refactors and cleanups in anticipation of suspend/resume support (Lyude)
-vram:
-Add lazy unmapping for gem bo's (Thomas)
-Unify and rationalize vram mm and gem vram (Thomas)
-Expose vmap and vunmap for gem vram objects (Thomas)
-Allow objects to be pinned at the top of vram to avoid fragmentation (Thomas)
Driver Changes:
-various: Include drm_bridge.h instead of relying on drm_crtc.h (Boris)
-ast/mgag200: Refactor show_cursor(), move cursor to top of video mem (Thomas)
-komeda:
-Add error event printing (behind CONFIG) and reg dump support (Lowry)
-Add suspend/resume support (Lowry)
-Workaround D71 shadow registers not flushing on disable (Lowry)
-meson: Add suspend/resume support (Neil)
-omap: Miscellaneous refactors and improvements (Tomi/Jyri)
-panfrost/shmem: Silence lockdep by using mutex_trylock (Rob)
-panfrost: Miscellaneous small fixes (Rob/Steven)
-sti: Fix warnings (Benjamin/Linus)
-sun4i:
-Add vcc-dsi regulator to sun6i_mipi_dsi (Jagan)
-A few patches to figure out the DRQ/start delay calc on dsi (Jagan/Icenowy)
-virtio:
-Add module param to switch resource reuse workaround on/off (Gerd)
-Avoid calling vmexit while holding spinlock (Gerd)
-Use gem shmem helpers instead of ttm (Gerd)
-Accommodate command buffer allocations too big for cma (David)
Cc: Rob Herring <robh@kernel.org>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Dariusz Marcinkiewicz <darekm@google.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Raymond Smith <raymond.smith@arm.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Mihail Atanassov <Mihail.Atanassov@arm.com>
Cc: Lowry Li <Lowry.Li@arm.com>
Cc: Neil Armstrong <narmstrong@baylibre.com>
Cc: Jyri Sarha <jsarha@ti.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Benjamin Gaignard <benjamin.gaignard@st.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Jagan Teki <jagan@amarulasolutions.com>
Cc: Icenowy Zheng <icenowy@aosc.io>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: David Riley <davidriley@chromium.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEHF6rntfJ3enn8gh8cywAJXLcr3kFAl2d9h8ACgkQcywAJXLc
r3ms5gf9HIFpqwJ16CqaRukSnpcBcDoYUM8DGrOic+vw2bw14BQwFqvEOqrCkKL4
V6h/OCJlNFPtOcc1LvU/jeXxYf4AQWh/2qZeg+oee7HAGX5x8Y3f08GsEjO8+55t
QvSVxCKVti04M1ErPRfKrM7KPVE+IC+KdY26nO8Bf5zDGeCAkiPIDrdh2aZGMRdC
Eer0DJ96cgWW9LrhseCdj5nKwcR78DlbWa79zuPAss4LaBBbXqThNXYYzg/mZMKB
+VYgzs48tGYKK1NXXJ6biVI3brHrM52bqv5JpIncD5HepF1oIartWOMnbAO7MAqh
h/tgJWxL+4bnl9aqY87by1BtyVgl3w==
=kaOE
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2019-10-09-2' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for 5.5:
UAPI Changes:
-Colorspace: Expose different prop values for DP vs. HDMI (Gwan-gyeong Mun)
-fourcc: Add DRM_FORMAT_MOD_ARM_16X16_BLOCK_U_INTERLEAVED (Raymond)
-not_actually: s/ENOTSUPP/EOPNOTSUPP/ in drm_edid and drm_mipi_dbi. This should
not reach userspace, but adding here to specifically call that out (Daniel)
-i810: Prevent underflow in dispatch ioctls (Dan)
-komeda: Add ACLK sysfs attribute (Mihail)
-v3d: Allow userspace to clean up after render jobs (Iago)
Cross-subsystem Changes:
-MAINTAINERS:
-Add Alyssa & Steven as panfrost reviewers (Rob)
-Add Jernej as DE2 reviewer (Maxime)
-Add Chen-Yu as Allwinner maintainer (Maxime)
-staging: Make some stack arrays static const (Colin)
Core Changes:
-ttm: Allow drivers to specify their vma manager (to use gem mgr) (Gerd)
-docs: Various fixes in connector/encoder/bridge docs (Daniel, Lyude, Laurent)
-connector: Allow more than 3 possible encoders for a connector (José)
-dp_cec: Allow a connector to be associated with a cec device (Dariusz)
-various: Fix some compile/sparse warnings (Ville)
-mm: Ensure mm node removals are properly serialised (Chris)
-panel: Specify the type of panel for drm_panels for later use (Laurent)
-panel: Use drm_panel_init to init device and funcs (Laurent)
-mst: Refactors and cleanups in anticipation of suspend/resume support (Lyude)
-vram:
-Add lazy unmapping for gem bo's (Thomas)
-Unify and rationalize vram mm and gem vram (Thomas)
-Expose vmap and vunmap for gem vram objects (Thomas)
-Allow objects to be pinned at the top of vram to avoid fragmentation (Thomas)
Driver Changes:
-various: Include drm_bridge.h instead of relying on drm_crtc.h (Boris)
-ast/mgag200: Refactor show_cursor(), move cursor to top of video mem (Thomas)
-komeda:
-Add error event printing (behind CONFIG) and reg dump support (Lowry)
-Add suspend/resume support (Lowry)
-Workaround D71 shadow registers not flushing on disable (Lowry)
-meson: Add suspend/resume support (Neil)
-omap: Miscellaneous refactors and improvements (Tomi/Jyri)
-panfrost/shmem: Silence lockdep by using mutex_trylock (Rob)
-panfrost: Miscellaneous small fixes (Rob/Steven)
-sti: Fix warnings (Benjamin/Linus)
-sun4i:
-Add vcc-dsi regulator to sun6i_mipi_dsi (Jagan)
-A few patches to figure out the DRQ/start delay calc on dsi (Jagan/Icenowy)
-virtio:
-Add module param to switch resource reuse workaround on/off (Gerd)
-Avoid calling vmexit while holding spinlock (Gerd)
-Use gem shmem helpers instead of ttm (Gerd)
-Accommodate command buffer allocations too big for cma (David)
Cc: Rob Herring <robh@kernel.org>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Dariusz Marcinkiewicz <darekm@google.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Raymond Smith <raymond.smith@arm.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Mihail Atanassov <Mihail.Atanassov@arm.com>
Cc: Lowry Li <Lowry.Li@arm.com>
Cc: Neil Armstrong <narmstrong@baylibre.com>
Cc: Jyri Sarha <jsarha@ti.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Benjamin Gaignard <benjamin.gaignard@st.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Jagan Teki <jagan@amarulasolutions.com>
Cc: Icenowy Zheng <icenowy@aosc.io>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: David Riley <davidriley@chromium.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
# gpg: Signature made Thu 10 Oct 2019 01:00:47 AM AEST
# gpg: using RSA key 732C002572DCAF79
# gpg: Can't check signature: public key not found
# Conflicts:
# drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
# drivers/gpu/drm/i915/i915_drv.c
# drivers/gpu/drm/i915/i915_gem.c
# drivers/gpu/drm/i915/i915_gem_gtt.c
# drivers/gpu/drm/i915/i915_vma.c
From: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20191009150825.GA227673@art_vandelay
Keep track of the GEM contexts underneath i915->gem.contexts and assign
them their own lock for the purposes of list management.
v2: Focus on lock tracking; ctx->vm is protected by ctx->mutex
v3: Correct split with removal of logical HW ID
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-15-chris@chris-wilson.co.uk
wait_for_timelines is essentially the same loop as retiring requests
(with an extra timeout), so merge the two into one routine.
v2: i915_retire_requests_timeout and keep VT'd w/a as !interruptible
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-10-chris@chris-wilson.co.uk
We don't need to hold struct_mutex now for retiring requests, so drop it
from i915_retire_requests() and i915_gem_wait_for_idle(), finally
removing I915_WAIT_LOCKED for good.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-8-chris@chris-wilson.co.uk
Forgo the struct_mutex serialisation for i915_active, and interpose its
own mutex handling for active/retire.
This is a multi-layered sleight-of-hand. First, we had to ensure that no
active/retire callbacks accidentally inverted the mutex ordering rules,
nor assumed that they were themselves serialised by struct_mutex. More
challenging though, is the rule over updating elements of the active
rbtree. Instead of the whole i915_active now being serialised by
struct_mutex, allocations/rotations of the tree are serialised by the
i915_active.mutex and individual nodes are serialised by the caller
using the i915_timeline.mutex (we need to use nested spinlocks to
interact with the dma_fence callback lists).
The pain point here is that instead of a single mutex around execbuf, we
now have to take a mutex for active tracker (one for each vma, context,
etc) and a couple of spinlocks for each fence update. The improvement in
fine grained locking allowing for multiple concurrent clients
(eventually!) should be worth it in typical loads.
v2: Add some comments that barely elucidate anything :(
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-6-chris@chris-wilson.co.uk
Replace the struct_mutex requirement for pinning the i915_vma with the
local vm->mutex instead. Note that the vm->mutex is tainted by the
shrinker (we require unbinding from inside fs-reclaim) and so we cannot
allocate while holding that mutex. Instead we have to preallocate
workers to do allocate and apply the PTE updates after we have we
reserved their slot in the drm_mm (using fences to order the PTE writes
with the GPU work and with later unbind).
In adding the asynchronous vma binding, one subtle requirement is to
avoid coupling the binding fence into the backing object->resv. That is
the asynchronous binding only applies to the vma timeline itself and not
to the pages as that is a more global timeline (the binding of one vma
does not need to be ordered with another vma, nor does the implicit GEM
fencing depend on a vma, only on writes to the backing store). Keeping
the vma binding distinct from the backing store timelines is verified by
a number of async gem_exec_fence and gem_exec_schedule tests. The way we
do this is quite simple, we keep the fence for the vma binding separate
and only wait on it as required, and never add it to the obj->resv
itself.
Another consequence in reducing the locking around the vma is the
destruction of the vma is no longer globally serialised by struct_mutex.
A natural solution would be to add a kref to i915_vma, but that requires
decoupling the reference cycles, possibly by introducing a new
i915_mm_pages object that is own by both obj->mm and vma->pages.
However, we have not taken that route due to the overshadowing lmem/ttm
discussions, and instead play a series of complicated games with
trylocks to (hopefully) ensure that only one destruction path is called!
v2: Add some commentary, and some helpers to reduce patch churn.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-4-chris@chris-wilson.co.uk
A straightforward conversion of assignment and checking of the boolean
state flags (allocated, scanned) into non-atomic bitops. The caller
remains responsible for all locking around the drm_mm and its nodes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191003210100.22250-4-chris@chris-wilson.co.uk
We're currently using scratch presence as a way of identifying that we
entered wedged state at driver initialization time.
Let's use a separate flag rather than rely on scratch.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190926133142.2838-1-chris@chris-wilson.co.uk
Code in i915_gem_init_hw is all about GT init so move it to intel_gt.c
renaming to intel_gt_init_hw.
Existing intel_gt_init_hw is renamed to intel_gt_init_hw_early since it
is currently called from driver probe.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190910143823.10686-2-tvrtko.ursulin@linux.intel.com
Refactor the GT power management interface to work through the GT now
that it is under the control of gt/
Based on a patch by Chris Wilson.
Signed-off-by: Andi Shyti <andi.shyti@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190905111403.10071-1-andi.shyti@intel.com
Our fence management is lazy, very lazy. If the user marks an object as
untiled, we do not immediately flush the fence but merely mark it as
dirty. On the next use we have to remember to check and remove the fence,
by which time we hope it is idle and we do not have to wait.
v2: Throw away the old fence on the next ggtt_pin.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111468
Fixes: 1f7fd484ff ("drm/i915: Replace i915_vma_put_fence()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823153944.20630-1-chris@chris-wilson.co.uk
(cherry picked from commit 636e83f2f2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Sadly lockdep records when the irqs are re-enabled and then marks up the
fake lock as being irq-unsafe. Our hand is forced and so we must mark up
the entire fake lock critical section as irq-off.
Hopefully this is the last tweak required!
v2: Not quite, we need to mark the timeline spinlock as irqsafe. That
was a genuine bug being hidden by the earlier lockdep splat.
Fixes: d67739268c ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823132700.25286-2-chris@chris-wilson.co.uk
(cherry picked from commit 6dcb85a0ad)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Our fence management is lazy, very lazy. If the user marks an object as
untiled, we do not immediately flush the fence but merely mark it as
dirty. On the next use we have to remember to check and remove the fence,
by which time we hope it is idle and we do not have to wait.
v2: Throw away the old fence on the next ggtt_pin.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111468
Fixes: 1f7fd484ff ("drm/i915: Replace i915_vma_put_fence()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823153944.20630-1-chris@chris-wilson.co.uk
Sadly lockdep records when the irqs are re-enabled and then marks up the
fake lock as being irq-unsafe. Our hand is forced and so we must mark up
the entire fake lock critical section as irq-off.
Hopefully this is the last tweak required!
v2: Not quite, we need to mark the timeline spinlock as irqsafe. That
was a genuine bug being hidden by the earlier lockdep splat.
Fixes: d67739268c ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823132700.25286-2-chris@chris-wilson.co.uk
Avoid calling i915_vma_put_fence() by using our alternate paths that
bind a secondary vma avoiding the original fenced vma. For the few
instances where we need to release the fence (i.e. on binding when the
GGTT range becomes invalid), replace the put_fence with a revoke_fence.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190822061557.18402-1-chris@chris-wilson.co.uk
We need the rename of reservation_object to dma_resv.
The solution on this merge came from linux-next:
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Wed, 14 Aug 2019 12:48:39 +1000
Subject: [PATCH] drm: fix up fallout from "dma-buf: rename reservation_object to dma_resv"
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
drivers/gpu/drm/i915/gt/intel_engine_pool.c | 8 ++++----
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pool.c b/drivers/gpu/drm/i915/gt/intel_engine_pool.c
index 03d90b49584a..4cd54c569911 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pool.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pool.c
@@ -43,12 +43,12 @@ static int pool_active(struct i915_active *ref)
{
struct intel_engine_pool_node *node =
container_of(ref, typeof(*node), active);
- struct reservation_object *resv = node->obj->base.resv;
+ struct dma_resv *resv = node->obj->base.resv;
int err;
- if (reservation_object_trylock(resv)) {
- reservation_object_add_excl_fence(resv, NULL);
- reservation_object_unlock(resv);
+ if (dma_resv_trylock(resv)) {
+ dma_resv_add_excl_fence(resv, NULL);
+ dma_resv_unlock(resv);
}
err = i915_gem_object_pin_pages(node->obj);
which is a simplified version from a previous one which had:
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
When under severe stress for GTT mappable space, the LRU eviction model
falls off a cliff. We spend all our time scanning the much larger
non-mappable area searching for something within the mappable zone we can
evict. Turn this on its head by only using the full vma for the object if
it is already pinned in the mappable zone or there is sufficient *free*
space to accommodate it (prioritizing speedy reuse). If there is not,
immediately fall back to using small chunks (tilerow for GTT mmap, single
pages for pwrite/relocation) and using random eviction before doing a full
search.
Testcase: igt/gem_concurrent_blt
References: https://bugs.freedesktop.org/show_bug.cgi?id=110848
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190821123234.19194-1-chris@chris-wilson.co.uk
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- dma-buf: add reservation_object_fences helper, relax
reservation_object_add_shared_fence, remove
reservation_object seq number (and then
restored)
- dma-fence: Shrinkage of the dma_fence structure,
Merge dma_fence_signal and dma_fence_signal_locked,
Store the timestamp in struct dma_fence in a union with
cb_list
Driver Changes:
- More dt-bindings YAML conversions
- More removal of drmP.h includes
- dw-hdmi: Support get_eld and various i2s improvements
- gm12u320: Few fixes
- meson: Global cleanup
- panfrost: Few refactors, Support for GPU heap allocations
- sun4i: Support for DDC enable GPIO
- New panels: TI nspire, NEC NL8048HL11, LG Philips LB035Q02,
Sharp LS037V7DW01, Sony ACX565AKM, Toppoly TD028TTEC1
Toppoly TD043MTEA1
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCXVqvpwAKCRDj7w1vZxhR
xa3RAQDzAnt5zeesAxX4XhRJzHoCEwj2PJj9Re6xMJ9PlcfcvwD+OS+bcB6jfiXV
Ug9IBd/DqjlmD9G9MxFxfSV946rksAw=
=8uv4
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2019-08-19' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for 5.4:
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- dma-buf: add reservation_object_fences helper, relax
reservation_object_add_shared_fence, remove
reservation_object seq number (and then
restored)
- dma-fence: Shrinkage of the dma_fence structure,
Merge dma_fence_signal and dma_fence_signal_locked,
Store the timestamp in struct dma_fence in a union with
cb_list
Driver Changes:
- More dt-bindings YAML conversions
- More removal of drmP.h includes
- dw-hdmi: Support get_eld and various i2s improvements
- gm12u320: Few fixes
- meson: Global cleanup
- panfrost: Few refactors, Support for GPU heap allocations
- sun4i: Support for DDC enable GPIO
- New panels: TI nspire, NEC NL8048HL11, LG Philips LB035Q02,
Sharp LS037V7DW01, Sony ACX565AKM, Toppoly TD028TTEC1
Toppoly TD043MTEA1
Signed-off-by: Dave Airlie <airlied@redhat.com>
[airlied: fixup dma_resv rename fallout]
From: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190819141923.7l2adietcr2pioct@flea
Let's wait with decision about importance of uC failure to
hardware initialization step.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190817131144.26884-4-michal.wajdeczko@intel.com
Move the active tracking for the frontbuffer operations out of the
i915_gem_object and into its own first class (refcounted) object. In the
process of detangling, we switch from low level request tracking to the
easier i915_active -- with the plan that this avoids any potential
atomic callbacks as the frontbuffer tracking wishes to sleep as it
flushes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190816074635.26062-1-chris@chris-wilson.co.uk
Be more consistent with the naming of the other DMA-buf objects.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/323401/
Since commit 6ca9a2beb5 ("drm/i915: Unwind i915_gem_init() failure")
we believed that we correctly handle all errors encountered during
GuC initialization, including special one that indicates request to
run driver with disabled GPU submission (-EIO).
Unfortunately since commit 121981fafe ("drm/i915/guc: Combine
enable_guc_loading|submission modparams") we stopped using that
error code to avoid unwanted fallback to execlist submission mode.
In result any GuC initialization failure was treated as non-recoverable
error leading to driver load abort, so we could not even read related
GuC error log to investigate cause of the problem.
For now always return -EIO on any uC hardware related failure.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190811195132.9660-5-michal.wajdeczko@intel.com
We keep a global seed for the legacy BSD round-robin selector, but in
our testing of multiple simultaneous client workloads, a random seed
spreads the load more evenly. (As even as an initial round-robin selector
can be!) Removing the global is one less variable we have to find a home
for!
We can simulate multi-client (both same and mixed workloads) using
igt/gem_wsim to work out optimal strategies and then compare our
simulation with the actual transcoder on multi-engine machines. This
fixed round-robin turns out to be one of the worst methods.
No user is advised to use this method; the current suggestion is to use
a virtual engine for agnostic batches, randomised submission or using
the busyness tracking to select the most idle engine at the time of
dispatch. At the present time, intel-media is explicit, but libva still
seems to use it, with the exception of batches that must execute on vcs0.
Oh well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190809091010.23281-2-chris@chris-wilson.co.uk
As we need to acquire a mutex to serialise the final
intel_wakeref_put, we need to ensure that we are in process context at
that time. However, we want to allow operation on the intel_wakeref from
inside timer and other hardirq context, which means that need to defer
that final put to a workqueue.
Inside the final wakeref puts, we are safe to operate in any context, as
we are simply marking up the HW and state tracking for the potential
sleep. It's only the serialisation with the potential sleeping getting
that requires careful wait avoidance. This allows us to retain the
immediate processing as before (we only need to sleep over the same
races as the current mutex_lock).
v2: Add a selftest to ensure we exercise the code while lockdep watches.
v3: That test was extremely loud and complained about many things!
v4: Not a whale!
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111295
References: https://bugs.freedesktop.org/show_bug.cgi?id=111245
References: https://bugs.freedesktop.org/show_bug.cgi?id=111256
Fixes: 18398904ca ("drm/i915: Only recover active engines")
Fixes: 51fbd8de87 ("drm/i915/pmu: Atomically acquire the gt_pm wakeref")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190808202758.10453-1-chris@chris-wilson.co.uk
Ignore the central i915->kernel_context for allocating an engine, as
that GEM context is being phased out. For internal clients, we just need
the per-engine logical state, so allocate it at the point of use.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190808110612.23539-1-chris@chris-wilson.co.uk