2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

2357 Commits

Author SHA1 Message Date
Chris Wilson
c1e63f6df3 drm/i915: Warn if we hit the timeout for wait-for-idle
Hitting the timeout and finding that all engines are actually idle is
indicative of an interrupt delivery problem. This problem is an issue
that we need to fix, so make sure we log it and provide the GEM trace.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180808105101.913-1-chris@chris-wilson.co.uk
2018-08-08 17:08:06 +01:00
Chris Wilson
ec5b65a97c drm/i915: Don't disable the GPU for older gen on wedging
If we issue a device level GPU reset on the older gen, it will disable
key components of the GMCH and the display engine. The purpose of
wedging is to simply prevent further GEM usage without disabling KMS, so
we need to be careful when we do issue the reset on wedging.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180726085033.4044-3-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2018-07-26 12:53:34 +01:00
Chris Wilson
7ed43df720 drm/i915: Restore sane defaults for KMS on GEM error load
If we fail during GEM initialisation, we scrub the HW state by
performing a device level GPU resuet. However, we want to leave the
system in a usable state (with functioning KMS but no GEM) so after
scrubbing the HW state, we need to restore some sane defaults and
re-enable the low-level common parts of the GPU (such as the GMCH).

v2: Restore GTT entries.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180726085033.4044-2-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2018-07-26 12:53:28 +01:00
Chris Wilson
d899aceb60 drm/i915: Mark up object tiling-and-stride getters as const
For that little bit of defense against a tired programmer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180725155447.11909-1-chris@chris-wilson.co.uk
2018-07-26 12:31:02 +01:00
Chris Wilson
3970c65c2b drm/i915: Skip repeated calls to i915_gem_set_wedged()
If we already wedged, i915_gem_set_wedged() becomes a complicated no-op.

References: https://bugs.freedesktop.org/show_bug.cgi?id=107343
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180723145335.24579-1-chris@chris-wilson.co.uk
2018-07-25 08:06:33 +01:00
Chris Wilson
900ccf30f9 drm/i915: Only force GGTT coherency w/a on required chipsets
Not all chipsets have an internal buffer delaying the visibility of
writes via the GGTT being visible by other physical paths, but we use a
very heavy workaround for all. We only need to apply that workarounds to
the chipsets we know suffer from the delay and the resulting coherency
issue.

Similarly, the same inconsistent coherency fouls up our ABI promise that
a write into a mmap_gtt is immediately visible to others. Since the HW
has made that a lie, let userspace know when that contract is broken.
(Not that userspace would want to use mmap_gtt on those chipsets for
other performance reasons...)

Testcase: igt/drv_selftest/live_coherency
Testcase: igt/gem_mmap_gtt/coherency
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100587
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180720101910.11153-1-chris@chris-wilson.co.uk
2018-07-20 16:53:55 +01:00
Chris Wilson
01f8f33e99 drm/i915: Always retire residual requests before suspend
If the driver is wedged, we skip idling the GPU. However, we may still
have a few requests still not retired following the wedging (since they
will be waiting for a background worker trying to acquire struct_mutex).
As we hold the struct_mutex, always do a quick request retirement in
order to flush the wedged path.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107257
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180717084121.28185-1-chris@chris-wilson.co.uk
2018-07-18 14:49:55 +01:00
Chris Wilson
a8bd3b884d drm/i915: Flush chipset caches after GGTT writes
Our I915g (early gen3, the oldest machine we have in the farm) is still
reporting occasional incoherency performing the following operations:

  1) write through GGTT (indirect write into memory)
  2) write through either CPU or WC (direct write into memory)
  3) read from GGTT (indirect read)

Instead of reporting the value from (2), the read from GGTT reports the
earlier value written via the GGTT. We have made sure that the writes are
flushed from the CPU (commit 3a32497f0d ("drm/i915/selftests: Provide
full mb() around clflush") and commit add00e6d89 ("drm/i915: Flush the
WCB following a WC write")), but still see the error, just less
frequently. The only remaining cache that might be affected here is a
chipset cache, so flush that as well.

Testcase: igt/drv_selftest/live_coherency #gdg
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180717092655.28417-1-chris@chris-wilson.co.uk
2018-07-17 17:32:52 +01:00
Chris Wilson
f8c1cce36c drm/i915: Reject attempted pwrites into a read-only object
If the user created a read-only object, they should not be allowed to
circumvent the write protection using the pwrite ioctl.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180712185315.3288-5-chris@chris-wilson.co.uk
2018-07-13 16:15:38 +01:00
Chris Wilson
3e977ac617 drm/i915: Prevent writing into a read-only object via a GGTT mmap
If the user has created a read-only object, they should not be allowed
to circumvent the write protection by using a GGTT mmapping. Deny it.

Also most machines do not support read-only GGTT PTEs, so again we have
to reject attempted writes. Fortunately, this is known a priori, so we
can at least reject in the call to create the mmap (with a sanity check
in the fault handler).

v2: Check the vma->vm_flags during mmap() to allow readonly access.
v3: Remove VM_MAYWRITE to curtail mprotect()

Testcase: igt/gem_userptr_blits/readonly_mmap*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com> #v1
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180712185315.3288-4-chris@chris-wilson.co.uk
2018-07-13 16:14:04 +01:00
Michał Winiarski
60c0a66ee9 drm/i915: Tidy error handling in i915_gem_init_hw
Let's reorder things so that we can do onion teardown rather than double
goto.

References: b96f6ebfd0 ("drm/i915: Correctly handle error path in i915_gem_init_hw")
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180712124810.25241-1-michal.winiarski@intel.com
2018-07-12 15:15:21 +01:00
Chris Wilson
8bcf9f7034 drm/i915: Flush the residual parking on emergency shutdown
On unwinding following a critical failure inside GEM init, we also need
to be sure to flush the workers before unloading the module.

Testcase: igt/drv_module_reload/basic-reload-inject
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180710094421.16223-1-chris@chris-wilson.co.uk
2018-07-10 13:59:00 +01:00
Chris Wilson
bf06112f86 drm/i915: Tidy i915_gem_suspend()
In the next patch, we will make a fairly minor change to flush
outstanding resets before suspend. In order to keep churn to a minimum
in that functional patch, we fix up the comments and coding style now.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180709130208.11730-7-chris@chris-wilson.co.uk
2018-07-10 12:22:44 +01:00
Chris Wilson
2621cefaa4 drm/i915: Provide a timeout to i915_gem_wait_for_idle() on setup
With a broken GPU we expect it to fail during the initial
GPU setup where do a couple of context switches to record the defaults.
This is a task that takes a few milliseconds even on the slowest of
devices, but we may have to wait 60s for hangcheck to give in and
declare the machine inoperable. In this a case where any gpu hang is
unacceptable, both from a timeliness and practical standpoint.

We can therefore set a timeout on our wait-for-idle that is shorter than
the hangcheck (which may be up to 60s for a declaring a wedged driver)
and so detect the broken GPU much more quickly during driver load (and
so prevent stalling userspace for ages).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-2-chris@chris-wilson.co.uk
2018-07-09 13:56:42 +01:00
Chris Wilson
ec625fb932 drm/i915: Provide a timeout to i915_gem_wait_for_idle()
Usually we have no idea about the upper bound we need to wait to catch
up with userspace when idling the device, but in a few situations we
know the system was idle beforehand and can provide a short timeout in
order to very quickly catch a failure, long before hangcheck kicks in.

In the following patches, we will use the timeout to curtain two overly
long waits, where we know we can expect the GPU to complete within a
reasonable time or declare it broken.

In particular, with a broken GPU we expect it to fail during the initial
GPU setup where do a couple of context switches to record the defaults.
This is a task that takes a few milliseconds even on the slowest of
devices, but we may have to wait 60s for hangcheck to give in and
declare the machine inoperable. In this a case where any gpu hang is
unacceptable, both from a timeliness and practical standpoint.

The other improvement is that in selftests, we do not need to arm an
independent timer to inject a wedge, as we can just limit the timeout on
the wait directly.

v2: Include the timeout parameter in the trace.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-1-chris@chris-wilson.co.uk
2018-07-09 13:55:41 +01:00
Chris Wilson
890fd185d5 drm/i915: Replace nested subclassing with explicit subclasses
In the next patch, we will want a third distinct class of timeline that
may overlap with the current pair of client and engine timeline classes.
Rather than use the ad hoc markup of SINGLE_DEPTH_NESTING, initialise
the different timeline classes with an explicit subclass.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180706210710.16251-1-chris@chris-wilson.co.uk
2018-07-07 08:09:43 +01:00
Chris Wilson
6dd7526f6f drm/i915: Export i915_request_skip()
In the next patch, we will want to start skipping requests on failing to
complete their payloads. So export the utility function current used to
make requests inoperable following a failed gpu reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180706103947.15919-2-chris@chris-wilson.co.uk
2018-07-06 18:22:36 +01:00
Chris Wilson
add00e6d89 drm/i915: Flush the WCB following a WC write
If we have just completed a WC write, we must ensure that the WCB (Write
Combining Buffer) is flushed out to main memory before we can expect to
see the results. This is especially important when mixing WC with GTT as
the physical paths are different and cachelines are not naturally flushed.

Testcase: igt/drv_selftests/live_coherency #gdg
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180706115402.18547-1-chris@chris-wilson.co.uk
2018-07-06 14:05:23 +01:00
Gustavo A. R. Silva
f0d759f038 drm/i915: Mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 141432
Addresses-Coverity-ID: 141433
Addresses-Coverity-ID: 141434
Addresses-Coverity-ID: 141435
Addresses-Coverity-ID: 141436
Addresses-Coverity-ID: 1357360
Addresses-Coverity-ID: 1357403
Addresses-Coverity-ID: 1357433
Addresses-Coverity-ID: 1392622
Addresses-Coverity-ID: 1415273
Addresses-Coverity-ID: 1435752
Addresses-Coverity-ID: 1441500
Addresses-Coverity-ID: 1454596
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628223541.GA17665@embeddedor.com
2018-07-05 16:40:51 +03:00
Chris Wilson
7e7367d3bc drm/i915: Try GGTT mmapping whole object as partial
If the whole object is already pinned by HW for use as scanout, we will
fail to move it to the mappable region and so must resort to using a
partial VMA covering the whole object.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104513
Fixes: aa136d9d72 ("drm/i915: Convert partial ggtt vma to full ggtt if it spans the entire object")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180630090509.469-1-chris@chris-wilson.co.uk
2018-07-02 17:36:09 +01:00
Michal Wajdeczko
f7dc0157e4 drm/i915/uc: Fetch GuC/HuC firmwares from guc/huc specific init
We're fetching GuC/HuC firmwares directly from uc level during
init_early stage but this breaks guc/huc struct isolation and
also strict SW-only initialization rule for init_early. Move fw
fetching to init phase and do it separately per guc/huc struct.

v2: don't forget to move wopcm_init - Michele
v3: fetch in init_misc phase - Michal

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com> #2
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628141522.62788-2-michal.wajdeczko@intel.com
2018-06-28 22:51:33 +01:00
Chris Wilson
a61b47f672 drm/i915: Wait for engines to idle before retiring
In the next^W forthcoming patch, we will start to defer retiring the
request from the engine list if it is still active on the submission
backend. To preserve the semantics that after wait-for-idle completes
the system is idle and fully retired, we need to therefore wait for the
backends to idle before calling i915_retire_requests().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180627115334.16282-1-chris@chris-wilson.co.uk
2018-06-27 19:00:22 +01:00
Chris Wilson
bcc2661e32 drm/i915: Only show debug for state changes when banning
Since we trigger 10,000s of hangs and resets during selftesting, we emit
many, many thousands of lines of useless debug messages. Reduce the
frequency by only logging a change in state of a guilty context.

Fixes: 14921f3cef ("drm/i915: Fix context ban and hang accounting for client")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180618073135.10849-1-chris@chris-wilson.co.uk
2018-06-18 14:43:47 +01:00
Chris Wilson
4fdd5b4e9a drm/i915: Fix fallout of fake reset along resume
commit b2209e62a4 ("drm/i915/execlists: Reset the CSB head tracking on
reset/sanitization") and commit 1288786b18 ("drm/i915: Move GEM sanitize
from resume_early to resume") show the conflicting requirements on the
code. We must reset the GPU before trashing live state on a fast resume
(hibernation debug, or error paths), but we must only reset our state
tracking iff the GPU is reset (or power cycled). This is tricky if we
are disabling GPU reset to simulate broken hardware; we reset our state
tracking but the GPU is left intact and recovers from its stale state.

v2: Again without the assertion for forcewake, no longer required since
commit b3ee09a4de ("drm/i915/ringbuffer: Fix context restore upon reset")
as the contexts are reset from the CS ensuring everything is powered up.

Fixes: b2209e62a4 ("drm/i915/execlists: Reset the CSB head tracking on reset/sanitization")
Fixes: 1288786b18 ("drm/i915: Move GEM sanitize from resume_early to resume")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180616202534.18767-1-chris@chris-wilson.co.uk
2018-06-18 10:14:54 +01:00
Chris Wilson
042ed2dbe5 drm/i915: Be irqsafe inside reset
As we want to be able to call i915_reset_engine and co from a softirq or
timer context, we need to be irqsafe at all times. So we have to forgo
the simple spin_lock_irq for the full spin_lock_irqsave.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180615093137.14270-1-chris@chris-wilson.co.uk
2018-06-15 19:50:36 +01:00
Mika Kuoppala
14921f3cef drm/i915: Fix context ban and hang accounting for client
If client is smart or lucky enough to create a new context
after each hang, our context banning mechanism will never
catch up, and as a result of that it will be saved from
client banning. This can result in a never ending streak of
gpu hangs caused by bad or malicious client, preventing
access from other legit gpu clients.

Fix this by always incrementing per client ban score if
it hangs in short successions regardless of context ban
scoring. The exception are non bannable contexts. They remain
detached from client ban scoring mechanism.

v2: xchg timestamp, tidyup (Chris)
v3: comment, bannable & banned together (Chris)

Fixes: b083a0870c ("drm/i915: Add per client max context ban limit")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180615104429.31477-1-mika.kuoppala@linux.intel.com
2018-06-15 16:36:21 +03:00
Chris Wilson
697b9a8714 drm/i915: Make closing request flush mandatory
For symmetry, simplicity and ensuring the request is always truly idle
upon its completion, always emit the closing flush prior to emitting the
request breadcrumb. Previously, we would only emit the flush if we had
started a user batch, but this just leaves all the other paths open to
speculation (do they affect the GPU caches or not?) With mm switching, a
key requirement is that the GPU is flushed and invalidated before hand,
so for absolute safety, we want that closing flush be mandatory.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180612105135.4459-1-chris@chris-wilson.co.uk
2018-06-14 08:16:12 +01:00
Chris Wilson
acd1c1e621 drm/i915: Refactor unsettting obj->mm.pages
As i915_gem_object_phys_attach() wants to play dirty and mess around
with obj->mm.pages itself (replacing the shmemfs with a DMA allocation),
refactor the gubbins so into i915_gem_object_unset_pages() that we don't
have to duplicate all the secrets.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180611075532.26534-1-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/152871104647.1718.8796913290418060204@jlahtine-desk.ger.corp.intel.com
2018-06-11 11:00:04 +01:00
Chris Wilson
51c18bf7fd drm/i915: Squash GEM load failure message (again)
Due to a silent conflict (silent because we are trying to fix the CI
test that is meant to exercising these failures!) between commit
51e645b665 ("drm/i915: Mark the GPU as wedged without error on fault
injection") and commit 8571a05a9d ("drm/i915: Use GEM suspend when
aborting initialisation"), we failed to actually squash the error
message after injecting the load failure.

Rearrange the code to export i915_load_failure() for better logging of
real errors (and quiet logging of injected errors).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180609111058.2660-1-chris@chris-wilson.co.uk
2018-06-11 10:01:03 +01:00
Chris Wilson
51e645b665 drm/i915: Mark the GPU as wedged without error on fault injection
If we have been instructed (by CI) to inject a fault to load the module
with a wedged GPU, do so quietly less we upset CI.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180607134558.31150-1-chris@chris-wilson.co.uk
2018-06-08 10:36:10 +01:00
Chris Wilson
521370106d drm/i915: Change i915_gem_fault() to return vm_fault_t
In preparation for vm_fault_t becoming a distinct type, convert the
fault handler (i915_gem_fault()) over to the new interface.

Based on a patch by Souptick Joarder

References: 1c8f422059 ("mm: change return type to vm_fault_t")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180606214520.20220-1-chris@chris-wilson.co.uk
2018-06-07 08:46:04 +01:00
Chris Wilson
8571a05a9d drm/i915: Use GEM suspend when aborting initialisation
As part of our GEM initialisation now, we send a request to the hardware
in order to record the initial GPU state. This coupled with deferred
idle workers, makes aborting on error tricky. We already have the
mechanism in place to wait on the GPU and cancel all the deferred
workers for suspend, so let's reuse it during the error teardown. It is
already used in places for later init error handling, but doing so at
this point is slightly ugly due to the mutex dance (it's ok, the module
load is still single threaded).

Testcase: igt/drv_module_reload/basic-reload-inject
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180606145441.4460-1-chris@chris-wilson.co.uk
2018-06-07 08:42:36 +01:00
Chris Wilson
82ad6443a5 drm/i915/gtt: Rename i915_hw_ppgtt base member
In the near future, I want to subclass gen6_hw_ppgtt as it contains a
few specialised members and I wish to add more. To avoid the ugliness of
using ppgtt->base.base, rename the i915_hw_ppgtt base member
(i915_address_space) as vm, which is our common shorthand for an
i915_address_space local.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180605153758.18422-1-chris@chris-wilson.co.uk
2018-06-05 21:11:20 +01:00
Chris Wilson
420980ca79 drm/i915: Swap magics and use SZ_1M
Since the kernel provides SZ_1M, use it in preference of 1 << 20.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180605135746.8020-1-chris@chris-wilson.co.uk
2018-06-05 16:16:41 +01:00
Michal Wajdeczko
b96f6ebfd0 drm/i915: Correctly handle error path in i915_gem_init_hw
In function gem_init_hw() we are calling uc_init_hw() but in case
of error later in function, we missed to call matching uc_fini_hw()

v2: pulled out from the series

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180605122443.23776-1-michal.wajdeczko@intel.com
2018-06-05 15:16:08 +01:00
Changbin Du
52b2416ceb drm/i915: Add new vGPU cap info bit VGT_CAPS_HUGE_GTT
This adds a new vGPU cap info bit VGT_CAPS_HUGE_GTT, which is to detect
whether the host supports shadowing of huge gtt pages. If host does
support it, remove the page sizes restriction for vGPU.

Signed-off-by: Changbin Du <changbin.du@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1525770425-5373-1-git-send-email-changbin.du@intel.com
2018-06-05 16:57:01 +03:00
Michal Wajdeczko
8979187a8c drm/i915: Move i915_gem_fini to i915_gem.c
We should keep i915_gem_init/fini functions together for easier
tracking of their symmetry.

v2: rebased, pulled out from the series

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180604090032.20840-1-michal.wajdeczko@intel.com
2018-06-04 15:00:01 +01:00
Chris Wilson
95c778daec drm/i915: Apply the full CPU domain markup before freezing
Let's not take any chances by using a shortcut to mark the objects as in
the CPU domain upon freezing (all pages will be written to disk and so
on restore all objects will start from the CPU domain). Currently, we
simply mark the objects as being in the CPU domain, bypassing the
flushes. Let's call the full domain transfer function so that we have
less special case code (and symmetry with the suspend path) even though
it will be mostly redundant.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180601144125.18026-2-chris@chris-wilson.co.uk
2018-06-01 21:42:15 +01:00
Chris Wilson
9776f47253 drm/i915: Flush all writes before suspend
As we have already suspended the device, this should be a no-op except
for marking that all writes are indeed complete. The downside is that
we then have to walk all the lists of objects for what should be a no-op
(in some cases they will be mmio read to ensure the GGTT writes are
indeed flushed, and clflushes to ensure that cpu writes are in memory).

It seems prudent and the safer course for us to ensure all writes are
flushed to memory before suspend.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180601144125.18026-1-chris@chris-wilson.co.uk
2018-06-01 21:42:14 +01:00
Chris Wilson
1934f5deaf drm/i915: Assert we idle in the kernel context
Now that we always switch to the kernel context upon idling, we can
make that assertion.

References: 4dfacb0bcb ("drm/i915: Switch to kernel context before idling at runtime")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180531224057.6036-1-chris@chris-wilson.co.uk
2018-06-01 13:38:40 +01:00
Chris Wilson
ec92ad00a3 drm/i915: Only sanitize GEM from late suspend
During testing we encounter a conflict between SUSPEND_TEST_DEVICES and
disabling reset (gem_eio/suspend). This results in the device continuing
on without being reset, but since it has gone through HW sanitization to
account for the suspend/resume cycle, we have to assume the device has
been reset to its defaults. A simple way around this is to skip the
sanitize phase for SUSPEND_TEST_DEVICES by moving it to suspend-late.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180531082246.9763-4-chris@chris-wilson.co.uk
2018-05-31 19:29:54 +01:00
Chris Wilson
c3160da9a6 drm/i915: After reset on sanitization, reset the engine backends
As we reset the GPU on suspend/resume, we also do need to reset the
engine state tracking so call into the engine backends. This is
especially important so that we can also sanitize the state tracking
across resume.

References: https://bugs.freedesktop.org/show_bug.cgi?id=106702
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180531082246.9763-3-chris@chris-wilson.co.uk
2018-05-31 19:29:53 +01:00
Chris Wilson
0606035fca drm/i915: "Race-to-idle" after switching to the kernel context
During suspend we want to flush out all active contexts and their
rendering. To do so we queue a request from the kernel's context, once
we know that request is done, we know the GPU is completely idle. To
speed up that switch bump the GPU clocks.

Switching to the kernel context prior to idling is also used to enforce
a barrier before changing OA properties, and when evicting active
rendering from the global GTT. All cases where we do want to
race-to-idle.

v2: Limit the boosting to only the switch before suspend.
v3: Limit it to the wait-for-idle on suspend.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Tested-by: David Weinehall <david.weinehall@linux.intel.com> #v1
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180531082246.9763-2-chris@chris-wilson.co.uk
2018-05-31 19:29:52 +01:00
Chris Wilson
4dfacb0bcb drm/i915: Switch to kernel context before idling at runtime
We can reduce our exposure to random neutrinos by resting on the kernel
context having flushed out the user contexts to system memory and
beyond. The corollary is that we then we require two passes through the
idle handler to go to sleep, which on a truly idle system involves an
extra pass through the slow and irregular retire work handler.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180531082246.9763-1-chris@chris-wilson.co.uk
2018-05-31 19:29:50 +01:00
Chris Wilson
cc7cc53435 drm/i915: Remove stale asserts from i915_gem_find_active_request()
Since we use i915_gem_find_active_request() from inside
intel_engine_dump() and may call that at any time, we do not guarantee
that the engine is paused nor that the signal kthreads and irq handler
are suspended, so we cannot assert that the breadcrumb doesn't advance
and that the irq hasn't happened on another CPU signaling the request we
believe to be idle.

The second assert removed (that request->engine == engine) remains
valid, but is now more rigorously checked during retirement.

Fixes: f636edb214 ("drm/i915: Make i915_engine_info pretty printer to standalone")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180529132922.6831-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2018-05-30 12:11:10 +01:00
Chris Wilson
09a4c02e58 drm/i915: Look for an active kernel context before switching
We were not very carefully checking to see if an older request on the
engine was an earlier switch-to-kernel-context before deciding to emit a
new switch. The end result would be that we could get into a permanent
loop of trying to emit a new request to perform the switch simply to
flush the existing switch.

What we need is a means of tracking the completion of each timeline
versus the kernel context, that is to detect if a more recent request
has been submitted that would result in a switch away from the kernel
context. To realise this, we need only to look in our syncmap on the
kernel context and check that we have synchronized against all active
rings.

v2: Since all ringbuffer clients currently share the same timeline, we do
have to use the gem_context to distinguish clients.

As a bonus, include all the tracing used to debug the death inside
suspend.

v3: Test, test, test. Construct a selftest to exercise and assert the
expected behaviour that multiple switch-to-contexts do not emit
redundant requests.

Reported-by: Mika Kuoppala <mika.kuoppala@intel.com>
Fixes: a89d1f921c ("drm/i915: Split i915_gem_timeline into individual timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180524081135.15278-1-chris@chris-wilson.co.uk
2018-05-24 15:51:45 +01:00
Chris Wilson
1fc44d9b1a drm/i915: Store a pointer to intel_context in i915_request
To ease the frequent and ugly pointer dance of
&request->gem_context->engine[request->engine->id] during request
submission, store that pointer as request->hw_context. One major
advantage that we will exploit later is that this decouples the logical
context state from the engine itself.

v2: Set mock_context->ops so we don't crash and burn in selftests.
    Cleanups from Tvrtko.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-3-chris@chris-wilson.co.uk
2018-05-18 09:35:22 +01:00
Chris Wilson
4e0d64dba8 drm/i915: Move request->ctx aside
In the next patch, we want to store the intel_context pointer inside
i915_request, as it is frequently access via a convoluted dance when
submitting the request to hw. Having two context pointers inside
i915_request leads to confusion so first rename the existing
i915_gem_context pointer to i915_request.gem_context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-1-chris@chris-wilson.co.uk
2018-05-18 09:35:17 +01:00
Chris Wilson
3f6e982230 drm/i915: Stop parking the signaler around reset
We cannot call kthread_park() from softirq context, so let's avoid it
entirely during the reset. We wanted to suspend the signaler so that it
would not mark a request as complete at the same time as we marked it as
being in error. Instead of parking the signaling, stop the engine from
advancing so that the GPU doesn't emit the breadcrumb for our chosen
"guilty" request.

v2: Refactor setting STOP_RING so that we don't have the same code thrice

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michałt Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-8-chris@chris-wilson.co.uk
2018-05-16 20:20:39 +01:00
Chris Wilson
5adfb772f8 drm/i915: Move engine reset prepare/finish to backends
In preparation to more carefully handling incomplete preemption during
reset by execlists, we move the existing code wholesale to the backends
under a couple of new reset vfuncs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-4-chris@chris-wilson.co.uk
2018-05-16 20:20:35 +01:00
Chris Wilson
f351d087d8 drm/i915: Only sync tasklets once for recursive reset preparation
When setting up reset, we may need to recursively prepare an engine. In
which case we should only synchronously flush the tasklets on the outer
most call, the inner calls will then be inside an atomic section where
the tasklet will never be run (and so the sync will never complete).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-2-chris@chris-wilson.co.uk
2018-05-16 20:20:33 +01:00
Chris Wilson
b8444cf85b drm/i915: Remove tasklet flush before disable
The idea was to try and let the existing tasklet run to completion
before we began the reset, but it involves a racy check against anything
else that tries to run the tasklet. Rather than acknowledge and ignore
the race, let it be and don't try and be too clever.

The tasklet will resume execution after reset (after spinning a bit
during reset), but before we allow it to resume we will have cleared all
the pending state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-1-chris@chris-wilson.co.uk
2018-05-16 20:20:32 +01:00
Chris Wilson
0c591a40af drm/i915: Mark up nested spinlocks
When we process the outstanding requests upon banning a context, we need
to acquire both the engine and the client's timeline, nesting the locks.
This requires explicit markup as the two timelines are now of the same
class, since commit a89d1f921c ("drm/i915: Split i915_gem_timeline into
individual timelines").

Testcase: igt/gem_eio/banned
Fixes: a89d1f921c ("drm/i915: Split i915_gem_timeline into individual timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180512084957.9829-1-chris@chris-wilson.co.uk
2018-05-14 11:49:09 +01:00
Chris Wilson
4f6d8fcf1a drm/i915: Flush submission tasklet after bumping priority
When called from process context tasklet_schedule() defers itself to
ksoftirqd. From experience this may cause unacceptable latencies of over
200ms in executing the submission tasklet, our goal is to reprioritise
the HW execution queue and trigger HW preemption immediately, so disable
bh over the call to schedule and force the tasklet to run afterwards if
scheduled.

v2: Keep rcu_read_lock() around for PREEMPT_RCU

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180507135731.10587-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2018-05-08 14:58:48 +01:00
Chris Wilson
3365e2268b drm/i915: Lazily unbind vma on close
When userspace is passing around swapbuffers using DRI, we frequently
have to open and close the same object in the foreign address space.
This shows itself as the same object being rebound at roughly 30fps
(with a second object also being rebound at 30fps), which involves us
having to rewrite the page tables and maintain the drm_mm range manager
every time.

However, since the object still exists and it is only the local handle
that disappears, if we are lazy and do not unbind the VMA immediately
when the local user closes the object but defer it until the GPU is
idle, then we can reuse the same VMA binding. We still have to be
careful to mark the handle and lookup tables as closed to maintain the
uABI, just allowing the underlying VMA to be resurrected if the user is
able to access the same object from the same context again.

If the object itself is destroyed (neither userspace keeping a handle to
it), the VMA will be reaped immediately as usual.

In the future, this will be even more useful as instantiating a new VMA
for use on the GPU will become heavier. A nuisance indeed, so nip it in
the bud.

v2: s/__i915_vma_final_close/i915_vma_destroy/ etc.
v3: Leave a hint as to why we deferred the unbind on close.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180503195115.22309-1-chris@chris-wilson.co.uk
2018-05-04 07:26:56 +01:00
Chris Wilson
a89d1f921c drm/i915: Split i915_gem_timeline into individual timelines
We need to move to a more flexible timeline that doesn't assume one
fence context per engine, and so allow for a single timeline to be used
across a combination of engines. This means that preallocating a fence
context per engine is now a hindrance, and so we want to introduce the
singular timeline. From the code perspective, this has the notable
advantage of clearing up a lot of mirky semantics and some clumsy
pointer chasing.

By splitting the timeline up into a single entity rather than an array
of per-engine timelines, we can realise the goal of the previous patch
of tracking the timeline alongside the ring.

v2: Tweak wait_for_idle to stop the compiling thinking that ret may be
uninitialised.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-2-chris@chris-wilson.co.uk
2018-05-02 23:57:18 +01:00
Chris Wilson
65fcb8064d drm/i915: Move timeline from GTT to ring
In the future, we want to move a request between engines. To achieve
this, we first realise that we have two timelines in effect here. The
first runs through the GTT is required for ordering vma access, which is
tracked currently by engine. The second is implied by sequential
execution of commands inside the ringbuffer. This timeline is one that
maps to userspace's expectations when submitting requests (i.e. given the
same context, batch A is executed before batch B). As the rings's
timelines map to userspace and the GTT timeline an implementation
detail, move the timeline from the GTT into the ring itself (per-context
in logical-ring-contexts/execlists, or a global per-engine timeline for
the shared ringbuffers in legacy submission.

The two timelines are still assumed to be equivalent at the moment (no
migrating requests between engines yet) and so we can simply move from
one to the other without adding extra ordering.

v2: Reinforce that one isn't allowed to mix the engine execution
timeline with the client timeline from userspace (on the ring).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-1-chris@chris-wilson.co.uk
2018-05-02 23:57:13 +01:00
Chris Wilson
643b450a59 drm/i915: Only track live rings for retiring
We don't need to track every ring for its lifetime as they are managed
by the contexts/engines. What we do want to track are the live rings so
that we can sporadically clean up requests if userspace falls behind. We
can simply restrict the gt->rings list to being only gt->live_rings.

v2: s/live/active/ for consistency with gt.active_requests

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-4-chris@chris-wilson.co.uk
2018-04-30 16:01:20 +01:00
Chris Wilson
b887d61546 drm/i915: Retire requests along rings
In the next patch, rings are the central timeline as requests may jump
between engines. Therefore in the future as we retire in order along the
engine timeline, we may retire out-of-order within a ring (as the ring now
occurs along multiple engines), leading to much hilarity in miscomputing
the position of ring->head.

As an added bonus, retiring along the ring reduces the penalty of having
one execlists client do cleanup for another (old legacy submission
shares a ring between all clients). The downside is that slow and
irregular (off the critical path) process of cleaning up stale requests
after userspace becomes a modicum less efficient.

In the long run, it will become apparent that the ordered
ring->request_list matches the ring->timeline, a fun challenge for the
future will be unifying the two lists to avoid duplication!

v2: We need both engine-order and ring-order processing to maintain our
knowledge of where individual rings have completed upto as well as
knowing what was last executing on any engine. And finally by decoupling
retiring the contexts on the engine and the timelines along the rings,
we do have to keep a reference to the context on each request
(previously it was guaranteed by the context being pinned).

v3: Not just a reference to the context, but we need to keep it pinned
as we manipulate the rings; i.e. we need a pin for both the manipulation
of the engine state during its retirements, and a separate pin for the
manipulation of the ring state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-3-chris@chris-wilson.co.uk
2018-04-30 16:01:18 +01:00
Chris Wilson
ab82a0635c drm/i915: Wrap engine->context_pin() and engine->context_unpin()
Make life easier in upcoming patches by moving the context_pin and
context_unpin vfuncs into inline helpers.

v2: Fixup mock_engine to mark the context as pinned on use.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-2-chris@chris-wilson.co.uk
2018-04-30 16:01:13 +01:00
Chris Wilson
7f961d799f drm/i915: Compile out engine debug for release
The majority of the engine state dumping is too voluminous to be useful
outside of a controlled setup, though a few do accompany severe errors.
Keep the debug dumps next to the errors, but hide the others behind a CI
compile flag. This becomes more useful when adding more dumps to latency
sensitive paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180426103219.22181-1-chris@chris-wilson.co.uk
2018-04-26 15:13:35 +01:00
Chris Wilson
b7268c5eed drm/i915: Pack params to engine->schedule() into a struct
Today we only want to pass along the priority to engine->schedule(), but
in the future we want to have much more control over the various aspects
of the GPU during a context's execution, for example controlling the
frequency allowed. As we need an ever growing number of parameters for
scheduling, move those into a struct for convenience.

v2: Move the anonymous struct into its own function for legibility and
ye olde gcc.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180418184052.7129-3-chris@chris-wilson.co.uk
2018-04-18 21:09:11 +01:00
Oscar Mateo
59b449d5c8 drm/i915: Split out functions for different kinds of workarounds
There are different kind of workarounds (those that modify registers that
live in the context image, those that modify global registers, those that
whitelist registers, etc...) and they have different requirements in terms
of where they are applied and how. Also, by splitting them apart, it should
be easier to decide where a new workaround should go.

v2:
  - Add multiple MISSING_CASE
  - Rebased

v3:
  - Rename mmio_workarounds to gt_workarounds (Chris, Mika)
  - Create empty placeholders for BDW and CHV GT WAs
  - Rebased

v4: Rebased

v5:
 - Rebased
 - FORCE_TO_NONPRIV register exists since BDW, so make a path
   for it to achieve universality, even if empty (Chris)

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[ickle: appease checkpatch]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1523376767-18480-2-git-send-email-oscar.mateo@intel.com
2018-04-11 22:47:46 +01:00
Chris Wilson
3834dc1f0e drm/i915: Don't fiddle with rps/rc6 across GPU reset
Resetting the GPU doesn't affect the RPS/RC6 state, so we can stop
forcibly reloading the registers.

Ville suggested this many moons ago, I said at that time that sanitizing
was no harm and meant that our bookkeeping was kept consistent with the
HW. However, in a forthcoming series, we want to split rps/rc6 GT
powermanagement and one of the key simplifications is the control of
when we enable it. Performing a crude sanitize in the middle of
i915_gem_reset() is then a huge wart.

Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180410133354.13425-1-chris@chris-wilson.co.uk
2018-04-10 17:14:16 +01:00
Chris Wilson
d0667e9ce5 drm/i915: Pass the set of guilty engines to i915_reset()
Currently, we rely on inspecting the hangcheck state from within the
i915_reset() routines to determine which engines were guilty of the
hang. This is problematic for cases where we want to run
i915_handle_error() and call i915_reset() independently of hangcheck.
Instead of relying on the indirect parameter passing, turn it into an
explicit parameter providing the set of stalled engines which then are
treated as guilty until proven innocent.

While we are removing the implicit stalled parameter, also make the
reason into an explicit parameter to i915_reset(). We still need a
back-channel for i915_handle_error() to hand over the task to the locked
waiter, but let's keep that its own channel rather than incriminate
another.

This leaves stalled/seqno as being private to hangcheck, with no more
nefarious snooping by reset, be it whole-device or per-engine. \o/

The only real issue now is that this makes it crystal clear that we
don't actually do any testing of hangcheck per se in
drv_selftest/live_hangcheck, merely of resets!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180406220354.18911-2-chris@chris-wilson.co.uk
2018-04-06 23:51:40 +01:00
Chris Wilson
bba0869b18 drm/i915: Treat i915_reset_engine() as guilty until proven innocent
If we are resetting just one engine, we know it has stalled. So we can
pass the stalled parameter directly to i915_gem_reset_engine(), which
alleviates the necessity to poke at the generic engine->hangcheck.stalled
magic variable, leaving that under control of hangcheck as its name
implies. Other than simplifying by removing the indirect parameter along
this path, this allows us to introduce new reset mechanisms that run
independently of hangcheck.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180406220354.18911-1-chris@chris-wilson.co.uk
2018-04-06 23:43:47 +01:00
Chris Wilson
e4d2006f8f drm/i915: Split out parking from the idle worker for reuse
We will want to park GEM before disengaging the drive^W^W^W unwedging.
Since we already do the work for idling, expose the guts as a new
function that we can then reuse.

v2: Just skip if already parked; makes it more forgiving to use by
future callers.
v3: Extract mark_busy, rename it to i915_gem_unpark and place it next to
i915_gem_park so that we can evaluate it for symmetry more easily.
Calling GEM from inside i915_request looks to be a bit of a layering
violation, for the moment I am imaging them as being notify_cb.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> #v1
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180406155144.27791-1-chris@chris-wilson.co.uk
2018-04-06 20:07:13 +01:00
Michal Wajdeczko
a0de908d44 drm/i915: Reorder early initialization
In upcoming patch, we want to perform more actions in early
initialization of the uC. This reordering will help resolve
new dependencies that will be introduced by future patch.

v2: s/i915_gem_load_init/i915_gem_init_early (Chris)
v3: s/i915_gem_load_cleanup/i915_gem_cleanup_early (Michal)

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180323123451.59244-1-michal.wajdeczko@intel.com
2018-03-23 17:03:24 +00:00
Chris Wilson
ac697ae801 drm/i915: Stop engines when declaring the machine wedged
If we fail to reset the GPU, we declare the machine wedged. However, the
GPU may well still be running in the background with an in-flight
request. So despite our efforts in cleaning up the request queue and
faking the breadcrumb in the HWSP, the GPU may eventually write the
in-flght seqno there breaking all of our assumptions and throwing the
driver into a deep turmoil, wedging beyond wedged.

To avoid this we ideally want to reset the GPU. Since that has already
failed, make sure the rings have the stop bit set instead. This is part
of the normal GPU reset sequence, but that is actually disabled by
igt/gem_eio to force the wedged state. If we assume the worst, we must
poke at the bit again before we give up.

v2: Move the intel_gpu_reset() from set-wedged in the reset error path
into i915_gem_set_wedged() itself. Even if the reset fails (e.g. if it is
disabled by gem_eio), it still tries to make sure the engines are
stopped. For i915_gem_set_wedged() callers from outside of i915_reset(),
this should make sure the GPU is disabled while the driver is marked as
being wedged.

Testcase: igt/gem_eio
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180315151015.22741-1-chris@chris-wilson.co.uk
2018-03-16 10:16:08 +00:00
Chris Wilson
d9b13c4dde drm/i915: Trace GEM steps between submit and wedging
We still have an odd race with wedging/unwedging as shown by igt/gem_eio
that defies expectations. Add some more trace_printks to try and
visualize the flow over the precipice.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180315131451.4060-1-chris@chris-wilson.co.uk
2018-03-16 10:16:07 +00:00
Jackie Li
f08e2035cc drm/i915/guc: Check the locking status of GuC WOPCM registers
GuC WOPCM registers are write-once registers. Current driver code accesses
these registers without checking the accessibility to these registers which
will lead to unpredictable driver behaviors if these registers were touch
by other components (such as faulty BIOS code).

This patch moves the GuC WOPCM registers updating code into intel_wopcm.c
and adds check before and after the update to GuC WOPCM registers so that
we can make sure the driver is in a known state after writing to these
write-once registers.

v6:
 - Made sure module reloading won't bug the kernel while doing
   locking status checking

v7:
 - Fixed patch format issues

v8:
 - Fixed coding style issue on register lock bit macro definition (Sagar)

v9:
 - Avoided to use redundant !! to cast uint to bool (Chris)
 - Return error code instead of GEM_BUG_ON for locked with invalid register
   values case (Sagar)
 - Updated guc_wopcm_hw_init to use guc_wopcm as first parameter (Michal)
 - Added code to set and validate the HuC_LOADING_AGENT_GUC bit in GuC
   WOPCM offset register based on the presence of HuC firmware (Michal)
 - Use bit fields instead of macros for GuC WOPCM flags (Michal)

v10:
 - Refined variable names, removed redundant comments (Joonas)
 - Introduced lockable_reg to handle the write once register write and
   propagate the write error to caller (Joonas)
 - Used lockable_reg abstraction to avoid locking bit check on generic
   i915_reg_t (Michal)
 - Added log message for error paths (Michal)
 - Removed hw_updated flag and only relies on real hardware status

v11:
 - Replaced lockable_reg with simplified function (Michal)
 - Used new macros for locking bits of WOPCM size/offset registers instead
   of using BIT(0) directly (Michal)
 - use intel_wopcm_init_hw() called from intel_gem_init_hw() to do GuC
   WOPCM register setup instead of calling from intel_uc_init_hw() (Michal)

v12:
 - Updated function kernel-doc to align with code changes (Michal)
 - Updated code to use wopcm pointer directly (Michal)

v13:
 - Updated the ordering of s-o-b/cc/r-b tags (Sagar)

BSpec: 10875, 10833

Signed-off-by: Jackie Li <yaodong.li@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> (v11)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v12)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1520987574-19351-5-git-send-email-yaodong.li@intel.com
2018-03-14 15:35:37 +02:00
Jackie Li
6b0478fb72 drm/i915: Implement dynamic GuC WOPCM offset and size calculation
Hardware may have specific restrictions on GuC WOPCM offset and size. On
Gen9, the value of the GuC WOPCM size register needs to be larger than the
value of GuC WOPCM offset register + a Gen9 specific offset (144KB) for
reserved GuC WOPCM. Fail to enforce such a restriction on GuC WOPCM size
will lead to GuC firmware execution failures. On the other hand, with
current static GuC WOPCM offset and size values (512KB for both offset and
size), the GuC WOPCM size verification will fail on Gen9 even if it can be
fixed by lowering the GuC WOPCM offset by calculating its value based on
HuC firmware size (which is likely less than 200KB on Gen9), so that we can
have a GuC WOPCM size value which is large enough to pass the GuC WOPCM
size check.

This patch updates the reserved GuC WOPCM size for RC6 context on Gen9 to
24KB to strictly align with the Gen9 GuC WOPCM layout. It also adds support
to verify the GuC WOPCM size aganist the Gen9 hardware restrictions. To
meet all above requirements, let's provide dynamic partitioning of the
WOPCM that will be based on platform specific HuC/GuC firmware sizes.

v2:
 - Removed intel_wopcm_init (Ville/Sagar/Joonas)
 - Renamed and Moved the intel_wopcm_partition into intel_guc (Sagar)
 - Removed unnecessary function calls (Joonas)
 - Init GuC WOPCM partition as soon as firmware fetching is completed

v3:
 - Fixed indentation issues (Chris)
 - Removed layering violation code (Chris/Michal)
 - Created separat files for GuC wopcm code  (Michal)
 - Used inline function to avoid code duplication (Michal)

v4:
 - Preset the GuC WOPCM top during early GuC init (Chris)
 - Fail intel_uc_init_hw() as soon as GuC WOPCM partitioning failed

v5:
 - Moved GuC DMA WOPCM register updating code into intel_wopcm.c
 - Took care of the locking status before writing to GuC DMA
   Write-Once registers. (Joonas)

v6:
 - Made sure the GuC WOPCM size to be multiple of 4K (4K aligned)

v8:
 - Updated comments and fixed naming issues (Sagar/Joonas)
 - Updated commit message to include more description about the hardware
   restriction on GuC WOPCM size (Sagar)

v9:
 - Minor changes variable names and code comments (Sagar)
 - Added detailed GuC WOPCM layout drawing (Sagar/Michal)
 - Refined macro definitions to be reader friendly (Michal)
 - Removed redundent check to valid flag (Michal)
 - Unified first parameter for exported GuC WOPCM functions (Michal)
 - Refined the name and parameter list of hardware restriction checking
   functions (Michal)

v10:
 - Used shorter function name for internal functions (Joonas)
 - Moved init-ealry function into c file (Joonas)
 - Consolidated and removed redundant size checks (Joonas/Michal)
 - Removed unnecessary unlikely() from code which is only called once
   during boot (Joonas)
 - More fixes to kernel-doc format and content (Michal)
 - Avoided the use of PAGE_MASK for 4K pages (Michal)
 - Added error log messages to error paths (Michal)

v11:
 - Replaced intel_guc_wopcm with more generic intel_wopcm and attached
   intel_wopcm to drm_i915_private instead intel_guc (Michal)
 - dynamic calculation of GuC non-wopcm memory start (a.k.a WOPCM Top
   offset from GuC WOPCM base) (Michal)
 - Moved WOPCM marco definitions into .c source file (Michal)
 - Exported WOPCM layout diagram as kernel-doc (Michal)

v12:
 - Updated naming, function kernel-doc to align with new changes (Michal)

v13:
 - Updated the ordering of s-o-b/cc/r-b tags (Sagar)
 - Corrected one tense error in comment (Sagar)
 - Corrected typos and removed spurious comments (Joonas)

Bspec: 12690

Signed-off-by: Jackie Li <yaodong.li@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Spotswood <john.a.spotswood@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> (v8)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v9)
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> (v11)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v12)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1520987574-19351-2-git-send-email-yaodong.li@intel.com
2018-03-14 15:35:33 +02:00
Chris Wilson
629820fcd0 drm/i915: Show GEM_TRACE when detecting a failed GPU idle
If we timeout waiting for the GPU to idle, something went seriously
wrong. We currently dump the engine state, but we can also dump the
ftrace buffer showing our last operations (when available).

In passing, note that since commit 559e040f1f ("drm/i915: Show the GPU
state when declaring wedged", we now show the engine state twice, once
in detecting the failed idle and then again on declaring wedged.

v2: ftrace_dump() takes a parameter specifying whether to dump all cpu
buffers or the local cpu's.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180309101114.1138-1-chris@chris-wilson.co.uk
2018-03-13 21:41:09 +00:00
Dhinakaran Pandiyan
07bcd99b80 drm/i915/frontbuffer: Pull frontbuffer_flush out of gem_obj_pin_to_display
i915_gem_obj_pin_to_display() calls frontbuffer_flush with origin set to
DIRTYFB. The callers however are at a vantage point to decide if hardware
frontbuffer tracking can do the flush for us. For example, legacy cursor
updates, like flips, write to MMIO registers, which then triggers PSR flush
by the hardware. Moving frontbuffer_flush out will enable us to skip a
software initiated flush by setting origin to FLIP. Thanks to Chris for the
idea.

v2:
Rebased due to Ville adding intel_plane_pin_fb().
Minor code reordering as fb_obj_flush doesn't need struct_mutex (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307033420.3086-1-dhinakaran.pandiyan@intel.com
2018-03-13 13:49:39 -07:00
Michal Wajdeczko
c37d572820 drm/i915/uc: Sanitize uC together with GEM
Instead of dancing around uC on reset/suspend/resume scenarios,
explicitly sanitize uC when we sanitize GEM to force uC reload
and start from known beginning.

v2: don't forget about reset path (Daniele)
    sanitize uc before gem initiated full reset (Daniele)
v3: drop redundant disable_communication in init_hw (Daniele)

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180312130308.22952-3-michal.wajdeczko@intel.com
2018-03-12 22:06:19 +00:00
Chris Wilson
68ad361285 drm/i915: Only call tasklet_kill() on the first prepare_reset
tasklet_kill() will spin waiting for the current tasklet to be executed.
However, if tasklet_disable() has been called, then the tasklet is never
executed but permanently put back onto the runlist until
tasklet_enable() is called. Ergo, we cannot use tasklet_kill() inside a
disable/enable pair. This is the case when we call set-wedge from inside
i915_reset(), and another request was submitted to us concurrent to the
reset.

Fixes: 963ddd63c3 ("drm/i915: Suspend submission tasklets around wedging")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-6-chris@chris-wilson.co.uk
2018-03-09 14:13:35 +00:00
Chris Wilson
47650db02d drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
Similar to the staging around handling of engine->submit_request, we
need to stop adding to the execlists->queue prior to calling
engine->cancel_requests. cancel_requests will move requests from the
queue onto the timeline, so if we add a request onto the queue after that
point, it will be lost.

Fixes: af7a8ffad9 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-5-chris@chris-wilson.co.uk
2018-03-09 14:13:34 +00:00
Chris Wilson
2d4ecace3a drm/i915: Finish the wait-for-wedge by retiring all the inflight requests
Before we reset the GPU after marking the device as wedged, we wait for
all the remaining requests to be completed (and marked as EIO).
Afterwards, we should flush the request lists so the next batch start
with the driver in an idle state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-1-chris@chris-wilson.co.uk
2018-03-09 14:13:25 +00:00
Chris Wilson
fa73055b84 drm/i915: Only prune fences after wait-for-all
Currently, we only allow ourselves to prune the fences so long as
all the waits completed (i.e. all the fences we checked were signaled),
and that the reservation snapshot did not change across the wait.
However, if we only waited for a subset of the reservation object, i.e.
just waiting for the last writer to complete as opposed to all readers
as well, then we would erroneously conclude we could prune the fences as
indeed although all of our waits were successful, they did not represent
the totality of the reservation object.

v2: We only need to check the shared fences due to construction (i.e.
all of the shared fences will be later than the exclusive fence, if
any).

Fixes: e54ca97747 ("drm/i915: Remove completed fences after a wait")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307171303.29466-1-chris@chris-wilson.co.uk
2018-03-08 18:03:36 +00:00
Michal Wajdeczko
7cfca4afd6 drm/i915/uc: Introduce intel_uc_suspend|resume
We want to use higher level 'uc' functions as the main entry points to
the GuC/HuC code to hide some details and keep code layered.

While here, move call to disable_guc_interrupts after sending suspend
action to the GuC to allow it work also with CTB as comm mechanism.

v2: update commit msg (Sagar)

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302111550.21328-1-michal.wajdeczko@intel.com
2018-03-02 23:11:12 +00:00
Chris Wilson
963ddd63c3 drm/i915: Suspend submission tasklets around wedging
After staring hard at sequences like

[   28.199013]  systemd-1       2..s. 26062228us : execlists_submission_tasklet: rcs0 cs-irq head=0 [0?], tail=1 [1?]
[   28.199095]  systemd-1       2..s. 26062229us : execlists_submission_tasklet: rcs0 csb[1]: status=0x00000018:0x00000000, active=0x1
[   28.199177]  systemd-1       2..s. 26062230us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.1, seqno=3, prio=-1024
[   28.199258]  systemd-1       2..s. 26062231us : execlists_submission_tasklet: rcs0 completed ctx=0
[   28.199340]  gem_eio-829     1..s1 26066853us : execlists_submission_tasklet: rcs0 in[0]:  ctx=1.1, seqno=1, prio=0
[   28.199421]   <idle>-0       2..s. 26066863us : execlists_submission_tasklet: rcs0 cs-irq head=1 [1?], tail=2 [2?]
[   28.199503]   <idle>-0       2..s. 26066865us : execlists_submission_tasklet: rcs0 csb[2]: status=0x00000001:0x00000000, active=0x1
[   28.199585]  gem_eio-829     1..s1 26067077us : execlists_submission_tasklet: rcs0 in[1]:  ctx=3.1, seqno=2, prio=0
[   28.199667]  gem_eio-829     1..s1 26067078us : execlists_submission_tasklet: rcs0 in[0]:  ctx=1.2, seqno=1, prio=0
[   28.199749]   <idle>-0       2..s. 26067084us : execlists_submission_tasklet: rcs0 cs-irq head=2 [2?], tail=3 [3?]
[   28.199830]   <idle>-0       2..s. 26067085us : execlists_submission_tasklet: rcs0 csb[3]: status=0x00008002:0x00000001, active=0x1
[   28.199912]   <idle>-0       2..s. 26067086us : execlists_submission_tasklet: rcs0 out[0]: ctx=1.2, seqno=1, prio=0
[   28.199994]  gem_eio-829     2..s. 28246084us : execlists_submission_tasklet: rcs0 cs-irq head=3 [3?], tail=4 [4?]
[   28.200096]  gem_eio-829     2..s. 28246088us : execlists_submission_tasklet: rcs0 csb[4]: status=0x00000014:0x00000001, active=0x5
[   28.200178]  gem_eio-829     2..s. 28246089us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.0, seqno=0, prio=0
[   28.200260]  gem_eio-829     2..s. 28246127us : execlists_submission_tasklet: execlists_submission_tasklet:886 GEM_BUG_ON(buf[2 * head + 1] != port->context_id)

the conclusion is that the only place where the ports are reset to zero,
is from engine->cancel_requests called during i915_gem_set_wedged().

The race is horrible as it results from calling set-wedged on active HW
(the GPU reset failed) and as such we need to be careful as the HW state
changes beneath us. Fortunately, it's the same scary conditions as
affect normal reset, so we can reuse the same machinery to disable state
tracking as we clobber it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104945
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Fixes: af7a8ffad9 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302113324.23189-2-chris@chris-wilson.co.uk
2018-03-02 23:11:11 +00:00
Chris Wilson
ffed7bd236 drm/i915: Replace open-coded wait-for loop
Now that we can pass arbitrary commands into the base __wait_for()
macro, we can reimplement the open-coded wait-for inside
i915_gem_idle_work_handler() using the new macro. This means that instead
of using ktime, we now use jiffies, and benefit from the exponential sleep
backoff that allows a fast response if the HW settles quickly.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180301103338.5380-1-chris@chris-wilson.co.uk
2018-03-01 17:42:58 +00:00
Chris Wilson
e61e0f51ba drm/i915: Rename drm_i915_gem_request to i915_request
We want to de-emphasize the link between the request (dependency,
execution and fence tracking) from GEM and so rename the struct from
drm_i915_gem_request to i915_request. That is we may implement the GEM
user interface on top of requests, but they are an abstraction for
tracking execution rather than an implementation detail of GEM. (Since
they are not tied to HW, we keep the i915 prefix as opposed to intel.)

In short, the spatch:
@@

@@
- struct drm_i915_gem_request
+ struct i915_request

A corollary to contracting the type name, we also harmonise on using
'rq' shorthand for local variables where space if of the essence and
repetition makes 'request' unwieldy. For globals and struct members,
'request' is still much preferred for its clarity.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180221095636.6649-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2018-02-21 20:57:22 +00:00
Chris Wilson
5935485f8e drm/i915: Move the policy for placement of the GGTT vma into the caller
Currently we make the unilateral decision inside
i915_gem_object_pin_to_display() where the VMA should resided (inside
the fence and mappable region or above?). This is not our decision to
make as it impacts on how the display engine can use the resulting
scanout object, and it would rather instruct us where to place the VMA so
that it can enable the features it wants. As such, make the pin flags an
argument to i915_gem_object_pin_to_display() and control them from
intel_pin_and_fence_fb_obj()

Whilst taking control of the mapping for ourselves, start tracking how
we use it to avoid trying to free a fence we never claimed:

<3>[  227.151869] GEM_BUG_ON(vma->fence->pin_count <= 0)
<4>[  227.152064] ------------[ cut here ]------------
<2>[  227.152068] kernel BUG at drivers/gpu/drm/i915/i915_vma.h:391!
<4>[  227.152084] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
<0>[  227.152092] Dumping ftrace buffer:
<0>[  227.152099]    (ftrace buffer empty)
<4>[  227.152102] Modules linked in: i915 snd_hda_codec_analog snd_hda_codec_generic coretemp snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm lpc_ich e1000e mei_me mei prime_numbers
<4>[  227.152131] CPU: 1 PID: 1587 Comm: kworker/u16:49 Tainted: G     U           4.16.0-rc1-gbab67b2f6177-kasan_7+ #1
<4>[  227.152134] Hardware name: Dell Inc. OptiPlex 755                 /0PU052, BIOS A08 02/19/2008
<4>[  227.152236] Workqueue: events_unbound intel_atomic_commit_work [i915]
<4>[  227.152292] RIP: 0010:intel_unpin_fb_vma+0x23a/0x2a0 [i915]
<4>[  227.152295] RSP: 0018:ffff88005aad7b68 EFLAGS: 00010286
<4>[  227.152300] RAX: 0000000000000026 RBX: ffff88005c359580 RCX: 0000000000000000
<4>[  227.152304] RDX: 0000000000000026 RSI: ffffffff8707d840 RDI: ffffed000b55af63
<4>[  227.152307] RBP: ffff880056817e58 R08: 0000000000000001 R09: 0000000000000000
<4>[  227.152311] R10: ffff88005aad7b88 R11: 0000000000000000 R12: ffff8800568184d0
<4>[  227.152314] R13: ffff880065b5ab08 R14: 0000000000000000 R15: dffffc0000000000
<4>[  227.152318] FS:  0000000000000000(0000) GS:ffff88006ac40000(0000) knlGS:0000000000000000
<4>[  227.152322] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  227.152325] CR2: 00007f5fb25550a8 CR3: 0000000068c78000 CR4: 00000000000006e0
<4>[  227.152328] Call Trace:
<4>[  227.152385]  intel_cleanup_plane_fb+0x6b/0xd0 [i915]
<4>[  227.152395]  drm_atomic_helper_cleanup_planes+0x166/0x280
<4>[  227.152452]  intel_atomic_commit_tail+0x159d/0x3380 [i915]
<4>[  227.152463]  ? process_one_work+0x66e/0x1460
<4>[  227.152516]  ? skl_update_crtcs+0x9c0/0x9c0 [i915]
<4>[  227.152523]  ? lock_acquire+0x13d/0x390
<4>[  227.152527]  ? lock_acquire+0x13d/0x390
<4>[  227.152534]  process_one_work+0x71a/0x1460
<4>[  227.152540]  ? __schedule+0x815/0x1e20
<4>[  227.152547]  ? pwq_dec_nr_in_flight+0x2b0/0x2b0
<4>[  227.152553]  ? _raw_spin_lock_irq+0xa/0x40
<4>[  227.152559]  worker_thread+0xdf/0xf60
<4>[  227.152569]  ? process_one_work+0x1460/0x1460
<4>[  227.152573]  kthread+0x2cf/0x3c0
<4>[  227.152578]  ? _kthread_create_on_node+0xa0/0xa0
<4>[  227.152583]  ret_from_fork+0x3a/0x50
<4>[  227.152591] Code: c6 00 11 86 c0 48 c7 c7 e0 bd 85 c0 e8 60 e7 a9 c4 0f ff e9 1f fe ff ff 48 c7 c6 40 10 86 c0 48 c7 c7 e0 ca 85 c0 e8 2b 95 bd c4 <0f> 0b 48 89 ef e8 4c 44 e8 c4 e9 ef fd ff ff e8 42 44 e8 c4 e9
<1>[  227.152720] RIP: intel_unpin_fb_vma+0x23a/0x2a0 [i915] RSP: ffff88005aad7b68

v2: i915_vma_pin_fence() is a no-op if a fence isn't required, so check
vma->fence as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180220134208.24988-2-chris@chris-wilson.co.uk
2018-02-20 19:03:59 +00:00
Chris Wilson
ac87a6fd36 drm/i915: Also check view->type for a normal GGTT view
We cannot simply use !view as shorthand for all normal GGTT views as a
few callers will always populate a i915_ggtt_view struct and set the
type to NORMAL instead. So check for (!view || view->type == NORMAL)
inside i915_gem_object_ggtt_pin().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180220134208.24988-1-chris@chris-wilson.co.uk
2018-02-20 19:03:59 +00:00
Chris Wilson
c9c7047154 drm/i915: Track number of pending freed objects
During igt, we frequently call into the driver to reset both HW and
driver state (idling the device, waiting for it to become idle and
freeing off old objects) to ensure that we start each test/subtest/pass
from known state. This process incurs an RCU barrier or two to ensure
that any such pending frees are indeed flushed before we return.
However, unconditionally waiting on the RCU barrier adds needless delay
to many callers, which adds up to several seconds when repeated thousands
of times. We can skip the rcu_barrier() if by tracking how many outstanding
frees we have, we know there are none.

The same path is used along suspend, where we may be able to save the
unconditional RCU barrier.

To put it into perspective with a completely meaningless
microbenchmark, igt/gem_sync/idle is improved from 50ms to 30us on bdw.

v2: Remove the extra synchronize_rcu() inside i915_drop_caches_set()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180219220631.25001-1-chris@chris-wilson.co.uk
2018-02-20 09:10:41 +00:00
Christian König
c0a51fd07b drm: move read_domains and write_domain into i915
i915 is the only driver using those fields in the drm_gem_object
structure, so they only waste memory for all other drivers.

Move the fields into drm_i915_gem_object instead and patch the i915 code
with the following sed commands:

sed -i "s/obj->base.read_domains/obj->read_domains/g" drivers/gpu/drm/i915/*.c drivers/gpu/drm/i915/*/*.c
sed -i "s/obj->base.write_domain/obj->write_domain/g" drivers/gpu/drm/i915/*.c drivers/gpu/drm/i915/*/*.c

Change is only compile tested.

v2: move fields around as suggested by Chris.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180216124338.9087-1-christian.koenig@amd.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2018-02-16 14:12:48 +00:00
Tvrtko Ursulin
c56b89f16d drm/i915: Use INTEL_GEN everywhere
Coccinelle patch:

 @@
 identifier p;
 @@
 -INTEL_INFO(p)->gen
 +INTEL_GEN(p)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180208130606.15556-12-tvrtko.ursulin@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180209215847.6660-1-chris@chris-wilson.co.uk
2018-02-09 22:29:02 +00:00
Chris Wilson
0d73e7a095 drm/i915: Mark the device as wedged from the beginning of set-wedged
Reduce the window of opportunity for set-wedged being called
concurrently with reset (after i915_reset() has performed the
i915_gem_unset_wedged()) by moving the set_bit(I915_WEDGED) to before we
complete the inflight requests. When i915_reset() is being blocked on a
request, such completion may allow it to start and beginning resetting
the GPU before i915_gem_set_wedged() has finished (and so before
set-wedge will have marked the device as wedged). As such,
i915_gem_init_hw() may see a wedged device even from inside
i915_reset().

References: 36703e79a9 ("drm/i915: Break modeset deadlocks on reset")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180207151350.20883-1-chris@chris-wilson.co.uk
2018-02-08 11:44:27 +00:00
Daniele Ceraolo Spurio
ce1599a40d drm/i915: do not stop engines on sanitize if i915.reset=0
Since commit 5896a5c8c9 (drm/i915: Always stop the rings before a
missing GPU reset) we attempt to stop the engines during gem_sanitize
even if reset=0 and nothing bad happened on the gpu.
The specs says that the STOP_RINGS bit needs to be cleared to resume
normal operation, but for some reason the value of the bit seems to be
changing without us writing to it (maybe rc6 entry/exit?), so normal
operation resumes correctly. However, it still feels incorrect to stop
the engines if there hasn't been any issue so skip the whole reset
call in gem_sanitize if i915.reset=0

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180207212440.13438-1-daniele.ceraolospurio@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2018-02-08 07:34:32 +00:00
Chris Wilson
3fed180812 drm/i915: Move the scheduler feature bits into the purview of the engines
Rather than having the high level ioctl interface guess the underlying
implementation details, having the implementation declare what
capabilities it exports. We define an intel_driver_caps, similar to the
intel_device_info, which instead of trying to describe the HW gives
details on what the driver itself supports. This is then populated by
the engine backend for the new scheduler capability field for use
elsewhere.

v2: Use caps.scheduler for validating CONTEXT_PARAM_SET_PRIORITY (Mika)
    One less assumption of engine[RCS] \o/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180207210544.26351-2-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
2018-02-08 07:30:11 +00:00
Chris Wilson
8177e11252 drm/i915: Tidy up some error messages around reset failure
On blb and pnv, we are seeing sporadic

  i915 0000:00:02.0: Resetting chip after gpu hang
  [drm:intel_gpu_reset [i915]] rcs0: timed out on STOP_RING
  [drm:i915_reset [i915]] *ERROR* Failed hw init on reset -5

which notably lack the actual root cause of the error. Ostensibly it
should be the init_ring_common() that failed, but it's error paths are
covered by DRM_ERROR.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180207111545.17078-1-chris@chris-wilson.co.uk
2018-02-07 13:12:32 +00:00
Chris Wilson
01b8fdc522 drm/i915: Skip post-reset request emission if the engine is not idle
Since commit 7b6da818d8 ("drm/i915: Restore the kernel context after a
GPU reset on an idle engine") we submit a request following the engine
reset. The intent is that we don't submit a request if the engine is
busy (as it will restart active by itself) but we only checked to see if
there were remaining requests in flight on the hardware and skipped
checking to see if there were any ready requests that would be
immediately submitted on restart (the same time as our new request would
be). Having convinced the engine to appear idle in the previous patch,
we can use intel_engine_is_idle() as a better test to only submit a new
request if there are no pending requests.

As it happens, this is tripping up igt/drv_selftest/live_hangcheck in CI
as we overfill the kernel_context ringbuffer trigger an infinite
recursion from within the reset.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104786
References: 7b6da818d8 ("drm/i915: Restore the kernel context after a GPU reset on an idle engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180205152431.12163-4-chris@chris-wilson.co.uk
2018-02-05 15:27:26 +00:00
Chris Wilson
24eae08d44 drm/i915: Remove unbannable context spam from reset
During testing, we trigger a lot of resets on an unbannable context
leading to massive amounts of irrelevant debug spam. Remove the
ban_score accounting and message for the unbannable context so that we
improve the signal:noise in the log messages for when the unexpected
occurs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180205092201.19476-7-chris@chris-wilson.co.uk
2018-02-05 13:24:45 +00:00
Chris Wilson
559e040f1f drm/i915: Show the GPU state when declaring wedged
Dump each engine state when i915_gem_set_wedged() is called to give us
some more clues as to why we had to terminate the GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180205092201.19476-5-chris@chris-wilson.co.uk
2018-02-05 13:23:40 +00:00
Chris Wilson
9e519bc8b9 drm/i915: Add some newlines to intel_engine_dump() headers
The headers should be on a separate line for consistency, so add the
missing trailing newline in a few intel_engine_dump() callers.

Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180205100618.11001-1-chris@chris-wilson.co.uk
2018-02-05 10:59:59 +00:00
Chris Wilson
889230489b drm/i915: Always run hangcheck while the GPU is busy
Previously, we relied on only running the hangcheck while somebody was
waiting on the GPU, in order to minimise the amount of time hangcheck
had to run. (If nobody was watching the GPU, nobody would notice if the
GPU wasn't responding -- eventually somebody would care and so kick
hangcheck into action.) However, this falls apart from around commit
4680816be3 ("drm/i915: Wait first for submission, before waiting for
request completion"), as not all waiters declare themselves to hangcheck
and so we could switch off hangcheck and miss GPU hangs even when
waiting under the struct_mutex.

If we enable hangcheck from the first request submission, and let it run
until the GPU is idle again, we forgo all the complexity involved with
only enabling around waiters. We just have to remember to be careful that
we do not declare a GPU hang when idly waiting for the next request to
be come ready, as we will run hangcheck continuously even when the
engines are stalled waiting for external events. This should be true
already as we should only be tracking requests submitted to hardware for
execution as an indicator that the engine is busy.

Fixes: 4680816be3 ("drm/i915: Wait first for submission, before waiting for request completion"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104840
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180129144104.3921-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2018-01-31 10:10:43 +00:00
Sagar Arun Kamble
70deeaddc6 drm/i915/guc: Fix lockdep due to log relay channel handling under struct_mutex
This patch fixes lockdep issue due to circular locking dependency of
struct_mutex, i_mutex_key, mmap_sem, relay_channels_mutex.
For GuC log relay channel we create debugfs file that requires i_mutex_key
lock and we are doing that under struct_mutex. So we introduced newer
dependency as:
    &dev->struct_mutex --> &sb->s_type->i_mutex_key#3 --> &mm->mmap_sem
However, there is dependency from mmap_sem to struct_mutex. Hence we
separate the relay create/destroy operation from under struct_mutex.
Also added runtime check of relay buffer status.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

======================================================
WARNING: possible circular locking dependency detected
4.15.0-rc6-CI-Patchwork_7614+ #1 Not tainted
------------------------------------------------------
debugfs_test/1388 is trying to acquire lock:
 (&dev->struct_mutex){+.+.}, at: [<00000000d5e1d915>] i915_mutex_lock_interruptible+0x47/0x130 [i915]

but task is already holding lock:
 (&mm->mmap_sem){++++}, at: [<0000000029a9c131>] __do_page_fault+0x106/0x560

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&mm->mmap_sem){++++}:
       _copy_to_user+0x1e/0x70
       filldir+0x8c/0xf0
       dcache_readdir+0xeb/0x160
       iterate_dir+0xdc/0x140
       SyS_getdents+0xa0/0x130
       entry_SYSCALL_64_fastpath+0x1c/0x89

-> #2 (&sb->s_type->i_mutex_key#3){++++}:
       start_creating+0x59/0x110
       __debugfs_create_file+0x2e/0xe0
       relay_create_buf_file+0x62/0x80
       relay_late_setup_files+0x84/0x250
       guc_log_late_setup+0x4f/0x110 [i915]
       i915_guc_log_register+0x32/0x40 [i915]
       i915_driver_load+0x7b6/0x1720 [i915]
       i915_pci_probe+0x2e/0x90 [i915]
       pci_device_probe+0x9c/0x120
       driver_probe_device+0x2a3/0x480
       __driver_attach+0xd9/0xe0
       bus_for_each_dev+0x57/0x90
       bus_add_driver+0x168/0x260
       driver_register+0x52/0xc0
       do_one_initcall+0x39/0x150
       do_init_module+0x56/0x1ef
       load_module+0x231c/0x2d70
       SyS_finit_module+0xa5/0xe0
       entry_SYSCALL_64_fastpath+0x1c/0x89

-> #1 (relay_channels_mutex){+.+.}:
       relay_open+0x12c/0x2b0
       intel_guc_log_runtime_create+0xab/0x230 [i915]
       intel_guc_init+0x81/0x120 [i915]
       intel_uc_init+0x29/0xa0 [i915]
       i915_gem_init+0x182/0x530 [i915]
       i915_driver_load+0xaa9/0x1720 [i915]
       i915_pci_probe+0x2e/0x90 [i915]
       pci_device_probe+0x9c/0x120
       driver_probe_device+0x2a3/0x480
       __driver_attach+0xd9/0xe0
       bus_for_each_dev+0x57/0x90
       bus_add_driver+0x168/0x260
       driver_register+0x52/0xc0
       do_one_initcall+0x39/0x150
       do_init_module+0x56/0x1ef
       load_module+0x231c/0x2d70
       SyS_finit_module+0xa5/0xe0
       entry_SYSCALL_64_fastpath+0x1c/0x89

-> #0 (&dev->struct_mutex){+.+.}:
       __mutex_lock+0x81/0x9b0
       i915_mutex_lock_interruptible+0x47/0x130 [i915]
       i915_gem_fault+0x201/0x790 [i915]
       __do_fault+0x15/0x70
       __handle_mm_fault+0x677/0xdc0
       handle_mm_fault+0x14f/0x2f0
       __do_page_fault+0x2d1/0x560
       page_fault+0x4c/0x60

other info that might help us debug this:

Chain exists of:
  &dev->struct_mutex --> &sb->s_type->i_mutex_key#3 --> &mm->mmap_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&mm->mmap_sem);
                               lock(&sb->s_type->i_mutex_key#3);
                               lock(&mm->mmap_sem);
  lock(&dev->struct_mutex);

 *** DEADLOCK ***

1 lock held by debugfs_test/1388:
 #0:  (&mm->mmap_sem){++++}, at: [<0000000029a9c131>] __do_page_fault+0x106/0x560

stack backtrace:
CPU: 2 PID: 1388 Comm: debugfs_test Not tainted 4.15.0-rc6-CI-Patchwork_7614+ #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J4205-ITX, BIOS P1.10 09/29/2016
Call Trace:
 dump_stack+0x5f/0x86
 print_circular_bug.isra.18+0x1d0/0x2c0
 __lock_acquire+0x14ae/0x1b60
 ? lock_acquire+0xaf/0x200
 lock_acquire+0xaf/0x200
 ? i915_mutex_lock_interruptible+0x47/0x130 [i915]
 __mutex_lock+0x81/0x9b0
 ? i915_mutex_lock_interruptible+0x47/0x130 [i915]
 ? i915_mutex_lock_interruptible+0x47/0x130 [i915]
 ? i915_mutex_lock_interruptible+0x47/0x130 [i915]
 i915_mutex_lock_interruptible+0x47/0x130 [i915]
 ? __pm_runtime_resume+0x4f/0x80
 i915_gem_fault+0x201/0x790 [i915]
 __do_fault+0x15/0x70
 ? _raw_spin_unlock+0x29/0x40
 __handle_mm_fault+0x677/0xdc0
 handle_mm_fault+0x14f/0x2f0
 __do_page_fault+0x2d1/0x560
 ? page_fault+0x36/0x60
 page_fault+0x4c/0x60

v2: Added lock protection to guc->log.runtime.relay_chan (Chris)
    Fixed locking inside guc_flush_logs uncovered by new lockdep.

v3: Locking guc_read_update_log_buffer entirely with relay_lock. (Chris)
    Prepared intel_guc_init_early. Moved relay_lock inside relay_create
    relay_destroy, relay_file_create, guc_read_update_log_buffer. (Michal)
    Removed struct_mutex lock around guc_log_flush and removed usage
    of guc_log_has_relay() from runtime_create path as it needs
    struct_mutex lock.

v4: Handle NULL relay sub buffer pointer earlier in read_update_log_buffer
    (Chris). Fixed comment suffix **/. (Michal)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104693
Testcase: igt/debugfs_test/read_all_entries # with enable_guc=1 and guc_log_level=1
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1516808821-3638-3-git-send-email-sagar.a.kamble@intel.com
2018-01-24 19:44:04 +00:00
Chris Wilson
84a1074920 drm/i915: Shrink the GEM kmem_caches upon idling
When we finally decide the gpu is idle, that is a good time to shrink
our kmem_caches.

v3: Defer until an rcu grace period after we idle.
v4: Think about epoch wraparound and how likely that is.
v5: Use I915_EPOCH_INVALID magic.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180124113608.14909-2-chris@chris-wilson.co.uk
2018-01-24 15:28:37 +00:00
Chris Wilson
e9af4ea2b9 drm/i915: Avoid waitboosting on the active request
Watching a light workload on Baytrail (running glxgears and a 1080p
decode), instead of the system remaining at low frequency, the glxgears
would regularly trigger waitboosting after which it would have to spend
a few seconds throttling back down. In this case, the waitboosting is
counter productive as the minimal wait for glxgears doesn't prevent it
from functioning correctly and delivering frames on time. In this case,
glxgears happens to almost always be waiting on the current request,
which we already expect to complete quickly (see i915_spin_request) and
so avoiding the waitboost on the active request and spinning instead
provides the best latency without overcommitting to upclocking.
However, if the system falls behind we still force the waitboost.
Similarly, we will also trigger upclocking if we detect the system is
not delivering frames on time - again using a mechanism that tries to
detect a miss and not preemptively upclock.

v2: Also skip boosting for after missed vblank if the desired request is
already active.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180118131609.16574-1-chris@chris-wilson.co.uk
2018-01-18 17:14:30 +00:00
Chris Wilson
2ef1e729c7 drm/i915: Rewrite some comments around RCU-deferred object free
Tvrtko noticed that the comments describing the interaction of RCU and
the deferred worker for freeing drm_i915_gem_object were a little
confusing, so attempt to bring some sense to them.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180115205759.13884-1-chris@chris-wilson.co.uk
2018-01-16 10:47:39 +00:00
Chris Wilson
beacbd1615 drm/i915: Use our singlethreaded wq for freeing objects
As freeing the objects require serialisation on struct_mutex, we should
prefer to use our singlethreaded driver wq that is dedicated to work
requiring struct_mutex (hence serialised).The benefit should be less
clutter on the system wq, allowing it to make progress even when the
driver/struct_mutex is heavily contended.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180115122846.15193-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2018-01-15 20:33:01 +00:00
Sagar Arun Kamble
da943b5ab0 drm/i915/guc: Add uc_fini_wq in gem_init unwind path
While moving code around for solving lockdep issue for GuC log relay,
spotted that uc_fini_wq is not being called in failure path in gem_init.
Missed in the below commit. Add it.

v2: Removed GEM_BUG_ON(!HAS_GUC()) from intel_uc_fini_wq as init happens
only based on enable_guc module parameter and does not consider has_guc
capability. (Michal)

Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Fixes: 3176ff49bc ("drm/i915/guc: Move GuC workqueue allocations outside of the mutex")
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1515588857-10283-1-git-send-email-sagar.a.kamble@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2018-01-10 14:03:10 +00:00
Chris Wilson
c218ee03b9 drm/i915: Don't adjust priority on an already signaled fence
When we retire a signaled fence, we free the dependency tree. However,
we skip clearing the list so that if we then try to adjust the priority
of the signaled fence, we may walk the list of freed dependencies.

[ 3083.156757] ==================================================================
[ 3083.156806] BUG: KASAN: use-after-free in execlists_schedule+0x199/0x660 [i915]
[ 3083.156810] Read of size 8 at addr ffff8806bf20f400 by task Xorg/831

[ 3083.156815] CPU: 0 PID: 831 Comm: Xorg Not tainted 4.15.0-rc6-no-psn+ #1
[ 3083.156817] Hardware name: Notebook                         N24_25BU/N24_25BU, BIOS 5.12 02/17/2017
[ 3083.156818] Call Trace:
[ 3083.156823]  dump_stack+0x5c/0x7a
[ 3083.156827]  print_address_description+0x6b/0x290
[ 3083.156830]  kasan_report+0x28f/0x380
[ 3083.156872]  ? execlists_schedule+0x199/0x660 [i915]
[ 3083.156914]  execlists_schedule+0x199/0x660 [i915]
[ 3083.156956]  ? intel_crtc_atomic_check+0x146/0x4e0 [i915]
[ 3083.156997]  ? execlists_submit_request+0xe0/0xe0 [i915]
[ 3083.157038]  ? i915_vma_misplaced.part.4+0x25/0xb0 [i915]
[ 3083.157079]  ? __i915_vma_do_pin+0x7c8/0xc80 [i915]
[ 3083.157121]  ? intel_atomic_state_alloc+0x44/0x60 [i915]
[ 3083.157130]  ? drm_atomic_helper_page_flip+0x3e/0xb0 [drm_kms_helper]
[ 3083.157145]  ? drm_mode_page_flip_ioctl+0x7d2/0x850 [drm]
[ 3083.157159]  ? drm_ioctl_kernel+0xa7/0xf0 [drm]
[ 3083.157172]  ? drm_ioctl+0x45b/0x560 [drm]
[ 3083.157211]  i915_gem_object_wait_priority+0x14c/0x2c0 [i915]
[ 3083.157251]  ? i915_gem_get_aperture_ioctl+0x150/0x150 [i915]
[ 3083.157290]  ? i915_vma_pin_fence+0x1d8/0x320 [i915]
[ 3083.157331]  ? intel_pin_and_fence_fb_obj+0x175/0x250 [i915]
[ 3083.157372]  ? intel_rotation_info_size+0x60/0x60 [i915]
[ 3083.157413]  ? intel_link_compute_m_n+0x80/0x80 [i915]
[ 3083.157428]  ? drm_dev_printk+0x1b0/0x1b0 [drm]
[ 3083.157443]  ? drm_dev_printk+0x1b0/0x1b0 [drm]
[ 3083.157485]  intel_prepare_plane_fb+0x2f8/0x5a0 [i915]
[ 3083.157527]  ? intel_crtc_get_vblank_counter+0x80/0x80 [i915]
[ 3083.157536]  drm_atomic_helper_prepare_planes+0xa0/0x1c0 [drm_kms_helper]
[ 3083.157587]  intel_atomic_commit+0x12e/0x4e0 [i915]
[ 3083.157605]  drm_atomic_helper_page_flip+0xa2/0xb0 [drm_kms_helper]
[ 3083.157621]  drm_mode_page_flip_ioctl+0x7d2/0x850 [drm]
[ 3083.157638]  ? drm_mode_cursor2_ioctl+0x10/0x10 [drm]
[ 3083.157652]  ? drm_lease_owner+0x1a/0x30 [drm]
[ 3083.157668]  ? drm_mode_cursor2_ioctl+0x10/0x10 [drm]
[ 3083.157681]  drm_ioctl_kernel+0xa7/0xf0 [drm]
[ 3083.157696]  drm_ioctl+0x45b/0x560 [drm]
[ 3083.157711]  ? drm_mode_cursor2_ioctl+0x10/0x10 [drm]
[ 3083.157725]  ? drm_getstats+0x20/0x20 [drm]
[ 3083.157729]  ? timerqueue_del+0x49/0x80
[ 3083.157732]  ? __remove_hrtimer+0x62/0xb0
[ 3083.157735]  ? hrtimer_try_to_cancel+0x173/0x210
[ 3083.157738]  do_vfs_ioctl+0x13b/0x880
[ 3083.157741]  ? ioctl_preallocate+0x140/0x140
[ 3083.157744]  ? _raw_spin_unlock_irq+0xe/0x30
[ 3083.157746]  ? do_setitimer+0x234/0x370
[ 3083.157750]  ? SyS_setitimer+0x19e/0x1b0
[ 3083.157752]  ? SyS_alarm+0x140/0x140
[ 3083.157755]  ? __rcu_read_unlock+0x66/0x80
[ 3083.157757]  ? __fget+0xc4/0x100
[ 3083.157760]  SyS_ioctl+0x74/0x80
[ 3083.157763]  entry_SYSCALL_64_fastpath+0x1a/0x7d
[ 3083.157765] RIP: 0033:0x7f6135d0c6a7
[ 3083.157767] RSP: 002b:00007fff01451888 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[ 3083.157769] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f6135d0c6a7
[ 3083.157771] RDX: 00007fff01451950 RSI: 00000000c01864b0 RDI: 000000000000000c
[ 3083.157772] RBP: 00007f613076f600 R08: 0000000000000001 R09: 0000000000000000
[ 3083.157773] R10: 0000000000000060 R11: 0000000000003246 R12: 0000000000000000
[ 3083.157774] R13: 0000000000000060 R14: 000000000000001b R15: 0000000000000060

[ 3083.157779] Allocated by task 831:
[ 3083.157783]  kmem_cache_alloc+0xc0/0x200
[ 3083.157822]  i915_gem_request_await_dma_fence+0x2c4/0x5d0 [i915]
[ 3083.157861]  i915_gem_request_await_object+0x321/0x370 [i915]
[ 3083.157900]  i915_gem_do_execbuffer+0x1165/0x19c0 [i915]
[ 3083.157937]  i915_gem_execbuffer2+0x1ad/0x550 [i915]
[ 3083.157950]  drm_ioctl_kernel+0xa7/0xf0 [drm]
[ 3083.157962]  drm_ioctl+0x45b/0x560 [drm]
[ 3083.157964]  do_vfs_ioctl+0x13b/0x880
[ 3083.157966]  SyS_ioctl+0x74/0x80
[ 3083.157968]  entry_SYSCALL_64_fastpath+0x1a/0x7d

[ 3083.157971] Freed by task 831:
[ 3083.157973]  kmem_cache_free+0x77/0x220
[ 3083.158012]  i915_gem_request_retire+0x72c/0xa70 [i915]
[ 3083.158051]  i915_gem_request_alloc+0x1e9/0x8b0 [i915]
[ 3083.158089]  i915_gem_do_execbuffer+0xa96/0x19c0 [i915]
[ 3083.158127]  i915_gem_execbuffer2+0x1ad/0x550 [i915]
[ 3083.158140]  drm_ioctl_kernel+0xa7/0xf0 [drm]
[ 3083.158153]  drm_ioctl+0x45b/0x560 [drm]
[ 3083.158155]  do_vfs_ioctl+0x13b/0x880
[ 3083.158156]  SyS_ioctl+0x74/0x80
[ 3083.158158]  entry_SYSCALL_64_fastpath+0x1a/0x7d

[ 3083.158162] The buggy address belongs to the object at ffff8806bf20f400
                which belongs to the cache i915_dependency of size 64
[ 3083.158166] The buggy address is located 0 bytes inside of
                64-byte region [ffff8806bf20f400, ffff8806bf20f440)
[ 3083.158168] The buggy address belongs to the page:
[ 3083.158171] page:00000000d43decc4 count:1 mapcount:0 mapping:          (null) index:0x0
[ 3083.158174] flags: 0x17ffe0000000100(slab)
[ 3083.158179] raw: 017ffe0000000100 0000000000000000 0000000000000000 0000000180200020
[ 3083.158182] raw: ffffea001afc16c0 0000000500000005 ffff880731b881c0 0000000000000000
[ 3083.158184] page dumped because: kasan: bad access detected

[ 3083.158187] Memory state around the buggy address:
[ 3083.158190]  ffff8806bf20f300: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3083.158192]  ffff8806bf20f380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3083.158195] >ffff8806bf20f400: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3083.158196]                    ^
[ 3083.158199]  ffff8806bf20f480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3083.158201]  ffff8806bf20f500: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3083.158203] ==================================================================

Reported-by: Alexandru Chirvasitu <achirvasub@gmail.com>
Reported-by: Mike Keehan <mike@keehan.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104436
Fixes: 1f181225f8 ("drm/i915/execlists: Keep request->priority for its lifetime")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Alexandru Chirvasitu <achirvasub@gmail.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Tested-by: Alexandru Chirvasitu <achirvasub@gmail.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180106105618.13532-1-chris@chris-wilson.co.uk
2018-01-08 09:16:02 +00:00
Chris Wilson
fcb1de54e2 drm/i915: Add a strong mb to resetting the has-CS-interrupt bit
After a reset, the state of the CSB registers are scrubbed and not valid
until a powercontext is reloaded. We only know when a powercontext has
been reloaded once we see a CS-interrupt, before then we must ignore the
CSB registers within the execlists_submission_tasklet. However, glk is
sporadically dying with an illegal CSB pointer value (both in the HSWP
and mmio) suggesting that it is running with the CS-interrupt bit set
before the powercontext has been reloaded. Make sure the clearing of
that bit is serialised on reset with the re-enabling of the tasklet.

References: https://bugs.freedesktop.org/show_bug.cgi?id=104262
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171219090110.11153-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-12-19 12:43:14 +00:00
Matthew Auld
b65a9b9821 drm/i915: prefer i915_gem_object_has_pages()
We have an existing helper for testing obj->mm.pages, so use it.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171218103855.25274-1-matthew.auld@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-12-18 11:53:29 +00:00
Chris Wilson
7b6da818d8 drm/i915: Restore the kernel context after a GPU reset on an idle engine
As part of the system requirement for powersaving is that we always have
a context loaded. Upon boot and resume, we load the kernel_context to
ensure that some valid state is set before powersaving kicks in, we
should do so after a full GPU reset as well. We only need to do so for
an idle engine, as any active engines will restart by executing the
stuck request, loading its context. For the idle engine, we create a
new request to load the kernel_context instead.

For whatever reason, perfoming a dummy execute on the idle engine after
reset papers over a subsequent GPU hang in rare circumstances, even on
machines not using contexts (e.g. Pineview).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104259
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104261
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171216000334.8197-1-chris@chris-wilson.co.uk
2017-12-16 09:37:35 +00:00
Michał Winiarski
61b5c1587d drm/i915/guc: Extract guc_init from guc_init_hw
After GPU reset, GuC HW needs to be reinitialized (with FW reload).
Unfortunately, we're doing some extra work there (mostly allocating stuff),
work that can be moved to guc_init and called once at driver load time.

As a side effect we're no longer hitting an assert in
i915_ggtt_enable_guc on suspend/resume.

v2: Do not duplicate disable_communication / reset_guc_interrupts
v3: Add proper teardown after rebase

References: 04f7b24ecc ("drm/i915/guc: Assert that we switch between known ggtt->invalidate functions")
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-3-michal.winiarski@intel.com
2017-12-14 08:06:56 +00:00
Michał Winiarski
3176ff49bc drm/i915/guc: Move GuC workqueue allocations outside of the mutex
This gets rid of the following lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
4.15.0-rc2-CI-Patchwork_7428+ #1 Not tainted
------------------------------------------------------
debugfs_test/1351 is trying to acquire lock:
 (&dev->struct_mutex){+.+.}, at: [<000000009d90d1a3>] i915_mutex_lock_interruptible+0x47/0x130 [i915]

but task is already holding lock:
 (&mm->mmap_sem){++++}, at: [<000000005df01c1e>] __do_page_fault+0x106/0x560

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #6 (&mm->mmap_sem){++++}:
       __might_fault+0x63/0x90
       _copy_to_user+0x1e/0x70
       filldir+0x8c/0xf0
       dcache_readdir+0xeb/0x160
       iterate_dir+0xe6/0x150
       SyS_getdents+0xa0/0x130
       entry_SYSCALL_64_fastpath+0x1c/0x89

-> #5 (&sb->s_type->i_mutex_key#5){++++}:
       lockref_get+0x9/0x20

-> #4 ((completion)&req.done){+.+.}:
       wait_for_common+0x54/0x210
       devtmpfs_create_node+0x130/0x150
       device_add+0x5ad/0x5e0
       device_create_groups_vargs+0xd4/0xe0
       device_create+0x35/0x40
       msr_device_create+0x22/0x40
       cpuhp_invoke_callback+0xc5/0xbf0
       cpuhp_thread_fun+0x167/0x210
       smpboot_thread_fn+0x17f/0x270
       kthread+0x173/0x1b0
       ret_from_fork+0x24/0x30

-> #3 (cpuhp_state-up){+.+.}:
       cpuhp_issue_call+0x132/0x1c0
       __cpuhp_setup_state_cpuslocked+0x12f/0x2a0
       __cpuhp_setup_state+0x3a/0x50
       page_writeback_init+0x3a/0x5c
       start_kernel+0x393/0x3e2
       secondary_startup_64+0xa5/0xb0

-> #2 (cpuhp_state_mutex){+.+.}:
       __mutex_lock+0x81/0x9b0
       __cpuhp_setup_state_cpuslocked+0x4b/0x2a0
       __cpuhp_setup_state+0x3a/0x50
       page_alloc_init+0x1f/0x26
       start_kernel+0x139/0x3e2
       secondary_startup_64+0xa5/0xb0

-> #1 (cpu_hotplug_lock.rw_sem){++++}:
       cpus_read_lock+0x34/0xa0
       apply_workqueue_attrs+0xd/0x40
       __alloc_workqueue_key+0x2c7/0x4e1
       intel_guc_submission_init+0x10c/0x650 [i915]
       intel_uc_init_hw+0x29e/0x460 [i915]
       i915_gem_init_hw+0xca/0x290 [i915]
       i915_gem_init+0x115/0x3a0 [i915]
       i915_driver_load+0x9a8/0x16c0 [i915]
       i915_pci_probe+0x2e/0x90 [i915]
       pci_device_probe+0x9c/0x120
       driver_probe_device+0x2a3/0x480
       __driver_attach+0xd9/0xe0
       bus_for_each_dev+0x57/0x90
       bus_add_driver+0x168/0x260
       driver_register+0x52/0xc0
       do_one_initcall+0x39/0x150
       do_init_module+0x56/0x1ef
       load_module+0x231c/0x2d70
       SyS_finit_module+0xa5/0xe0
       entry_SYSCALL_64_fastpath+0x1c/0x89

-> #0 (&dev->struct_mutex){+.+.}:
       lock_acquire+0xaf/0x200
       __mutex_lock+0x81/0x9b0
       i915_mutex_lock_interruptible+0x47/0x130 [i915]
       i915_gem_fault+0x201/0x760 [i915]
       __do_fault+0x15/0x70
       __handle_mm_fault+0x85b/0xe40
       handle_mm_fault+0x14f/0x2f0
       __do_page_fault+0x2d1/0x560
       page_fault+0x22/0x30

other info that might help us debug this:

Chain exists of:
  &dev->struct_mutex --> &sb->s_type->i_mutex_key#5 --> &mm->mmap_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&mm->mmap_sem);
                               lock(&sb->s_type->i_mutex_key#5);
                               lock(&mm->mmap_sem);
  lock(&dev->struct_mutex);

 *** DEADLOCK ***

1 lock held by debugfs_test/1351:
 #0:  (&mm->mmap_sem){++++}, at: [<000000005df01c1e>] __do_page_fault+0x106/0x560

stack backtrace:
CPU: 2 PID: 1351 Comm: debugfs_test Not tainted 4.15.0-rc2-CI-Patchwork_7428+ #1
Hardware name:                  /NUC6i5SYB, BIOS SYSKLi35.86A.0057.2017.0119.1758 01/19/2017
Call Trace:
 dump_stack+0x5f/0x86
 print_circular_bug+0x230/0x3b0
 check_prev_add+0x439/0x7b0
 ? lockdep_init_map_crosslock+0x20/0x20
 ? unwind_get_return_address+0x16/0x30
 ? __lock_acquire+0x1385/0x15a0
 __lock_acquire+0x1385/0x15a0
 lock_acquire+0xaf/0x200
 ? i915_mutex_lock_interruptible+0x47/0x130 [i915]
 __mutex_lock+0x81/0x9b0
 ? i915_mutex_lock_interruptible+0x47/0x130 [i915]
 ? i915_mutex_lock_interruptible+0x47/0x130 [i915]
 ? i915_mutex_lock_interruptible+0x47/0x130 [i915]
 i915_mutex_lock_interruptible+0x47/0x130 [i915]
 ? __pm_runtime_resume+0x4f/0x80
 i915_gem_fault+0x201/0x760 [i915]
 __do_fault+0x15/0x70
 __handle_mm_fault+0x85b/0xe40
 handle_mm_fault+0x14f/0x2f0
 __do_page_fault+0x2d1/0x560
 page_fault+0x22/0x30
RIP: 0033:0x7f98d6f49116
RSP: 002b:00007ffd6ffc3278 EFLAGS: 00010283
RAX: 00007f98d39a2bc0 RBX: 0000000000000000 RCX: 0000000000001680
RDX: 0000000000001680 RSI: 00007ffd6ffc3400 RDI: 00007f98d39a2bc0
RBP: 00007ffd6ffc33a0 R08: 0000000000000000 R09: 00000000000005a0
R10: 000055e847c2a830 R11: 0000000000000002 R12: 0000000000000001
R13: 000055e847c1d040 R14: 00007ffd6ffc3400 R15: 00007f98d6752ba0

v2: Init preempt_work unconditionally (Chris)
v3: Mention that we need the enable_guc=1 for lockdep splat (Chris)

Testcase: igt/debugfs_test/read_all_entries # with i915.enable_guc=1
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-2-michal.winiarski@intel.com
2017-12-14 08:06:54 +00:00
Chris Wilson
6ca9a2beb5 drm/i915: Unwind i915_gem_init() failure
Since Michal introduced new user controllable errors other than -EIO
during i915_gem_init(), we need to actually unwind on the error path as
we have to abort the module load (and we expect to do so cleanly!).

As we now teardown key state and then mark the driver as wedged (on
EIO), we have to be careful to not allow ourselves to resume and
unwedge, thus attempting to use the uninitialised driver.

v2: Try not to free driver state for the suppressed EIO
v3: Use load-fault-injection to test both error/recovery paths.

References: 8620eb1dbb ("drm/i915/uc: Don't use -EIO to report missing firmware")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213134347.4608-1-chris@chris-wilson.co.uk
2017-12-13 18:53:35 +00:00
Chris Wilson
d7dc4131eb drm/i915: Don't check #active_requests from i915_gem_wait_for_idle()
i915_gem_wait_for_idle() is called from inside the shrinker, to ensure
that we drain the last resources from the GPU in dire circumstances (OOM).
As we may allocate whilst building a request, it is then possible to hit
the shrinker with a request under construction, and so we must account
for the incomplete request whilst waiting. In particular, we
preincrement (in reserve_engine) the i915->gt.active_requests counter
and mark the GPU as busy, therefore we can not use that counter for
shortcircuiting the wait-for-idle.

[  950.859024] GEM_BUG_ON(i915->gt.active_requests)
[  950.859041] WARNING: CPU: 2 PID: 2178 at drivers/gpu/drm/i915/i915_gem.c:3615 i915_gem_wait_for_idle.part.56+0x166/0x4e0
[  950.859041] Modules linked in: ccm tun fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_security iptable_raw arc4 iwldvm mac80211 snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_codec btusb snd_hda_core btrtl btbcm iwlwifi snd_hwdep btintel bluetooth snd_seq snd_seq_device snd_pcm ecdh_generic x86_pkg_temp_thermal tpm_infineon coretemp tpm_tis crc32_pclmul wmi_bmof crc32c_intel iTCO_wdt hp_wmi snd_timer iTCO_vendor_support sparse_keymap tpm_tis_core mei_me cfg80211
[  950.859082]  snd joydev tpm mei rfkill pcspkr wmi soundcore lpc_ich hp_accel lis3lv02d input_polldev binfmt_misc e1000e ptp serio_raw pps_core
[  950.859094] CPU: 2 PID: 2178 Comm: gem_exec_nop Tainted: G     U           4.15.0-rc2+ #900
[  950.859102] Hardware name: Hewlett-Packard HP ProBook 6360b/1620, BIOS 68SCF Ver. B.42 12/29/2010
[  950.859107] task: c5119cb4 task.stack: f3ccb8d8
[  950.859112] EIP: i915_gem_wait_for_idle.part.56+0x166/0x4e0
[  950.859113] EFLAGS: 00010296 CPU: 2
[  950.859114] EAX: 00000024 EBX: f36c1888 ECX: f777a044 EDX: 00000007
[  950.859115] ESI: f36c1888 EDI: edd53958 EBP: edd53970 ESP: edd53938
[  950.859116]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  950.859117] CR0: 80050033 CR2: b7f39000 CR3: 2f2b3000 CR4: 000406d0
[  950.859118] Call Trace:
[  950.859125]  ? drm_printk+0x70/0x70
[  950.859129]  i915_gem_wait_for_idle+0x18/0x30
[  950.859133]  i915_gem_shrink+0x360/0x410
[  950.859138]  ? vmpressure+0xa8/0xf0
[  950.859142]  ? ktime_get+0x4a/0x100
[  950.859147]  i915_gem_shrink_all+0x21/0x40
[  950.859151]  i915_gem_shrinker_oom+0x23/0x130
[  950.859156]  notifier_call_chain+0x4e/0x70
[  950.859160]  __blocking_notifier_call_chain+0x2f/0x60
[  950.859164]  blocking_notifier_call_chain+0x11/0x20
[  950.859169]  out_of_memory+0x207/0x280
[  950.859174]  __alloc_pages_nodemask+0xd47/0xe60
[  950.859179]  new_slab+0x32d/0x450
[  950.859183]  ___slab_alloc.constprop.81+0x358/0x4e0
[  950.859189]  ? i915_sw_fence_await_dma_fence+0x53/0x160
[  950.859193]  ? __slab_free+0x1fe/0x310
[  950.859197]  ? native_sched_clock+0x1e/0xc0
[  950.859201]  ? i915_gem_request_alloc+0xcf/0x510
[  950.859205]  ? sched_clock+0x9/0x10
[  950.859209]  __slab_alloc.constprop.80+0x29/0x40
[  950.859212]  ? __slab_alloc.constprop.80+0x29/0x40
[  950.859216]  kmem_cache_alloc_trace+0x160/0x1a0
[  950.859220]  ? i915_sw_fence_await_dma_fence+0x53/0x160
[  950.859224]  i915_sw_fence_await_dma_fence+0x53/0x160
[  950.859229]  i915_gem_request_await_dma_fence+0x1eb/0x390
[  950.859233]  i915_gem_request_await_object+0xee/0x230
[  950.859239]  i915_gem_do_execbuffer+0xc16/0x1200
[  950.859246]  ? irqtime_account_irq+0x3e/0xc0
[  950.859251]  ? irq_exit+0x4f/0xb0
[  950.859257]  ? smp_apic_timer_interrupt+0x5f/0x110
[  950.859261]  ? apic_timer_interrupt+0x35/0x3c
[  950.859266]  i915_gem_execbuffer2_ioctl+0x212/0x440
[  950.859270]  ? apic_timer_interrupt+0x35/0x3c
[  950.859274]  ? i915_gem_do_execbuffer+0x1200/0x1200
[  950.859279]  ? insn_get_seg_base+0x1b/0x50
[  950.859283]  ? i915_gem_do_execbuffer+0x1200/0x1200
[  950.859287]  drm_ioctl_kernel+0x51/0xa0
[  950.859291]  drm_ioctl+0x2a3/0x350
[  950.859294]  ? i915_gem_do_execbuffer+0x1200/0x1200
[  950.859300]  ? sched_clock+0x9/0x10
[  950.859303]  ? drm_getunique+0x70/0x70
[  950.859308]  do_vfs_ioctl+0x7d/0x640
[  950.859311]  ? native_sched_clock+0x1e/0xc0
[  950.859315]  ? sched_clock+0x9/0x10
[  950.859319]  ? sched_clock_cpu+0x13/0x120
[  950.859323]  SyS_ioctl+0x4e/0x80
[  950.859326]  do_fast_syscall_32+0x75/0x250
[  950.859331]  ? irq_exit+0x4f/0xb0
[  950.859334]  entry_SYSENTER_32+0x47/0x71
[  950.859338] EIP: 0xb7f81d11
[  950.859339] EFLAGS: 00000296 CPU: 2
[  950.859340] EAX: ffffffda EBX: 00000003 ECX: 40406469 EDX: bfde4c20
[  950.859340] ESI: 00000003 EDI: 40406469 EBP: 00000003 ESP: bfde4b38
[  950.859341]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[  950.859343] Code: e8 30 60 01 00 83 c4 10 83 c3 04 39 f3 75 e0 8b 45 d8 8b 80 14 37 00 00 85 c0 74 13 68 dd 33 e4 c0 68 49 6f e3 c0 e8 4a 55 be ff <0f> ff 5e 5f b8 fe ff ff 3f bb 0a 00 00 00 e8 b7 14 c4 ff 8b 15

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171212132148.8124-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-12-13 11:15:38 +00:00
Chris Wilson
59e4b19d62 drm/i915: Dump the engine state before declaring wedged from wait_for_engines()
If wait_for_engines() fails and we resort to declaring the HW wedged,
dump the engine state for debugging.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171211194135.27095-2-chris@chris-wilson.co.uk
2017-12-12 21:07:41 +00:00
Chris Wilson
ee42c00e1c drm/i915: Bump timeout for wait_for_engines()
Extract the timeout we use in i915_gem_idle_work_handler() and reuse it
for wait_for_engines() in i915_gem_wait_for_idle(). It too has the same
problem in sometimes having to wait for an extended period before the HW
settles, so make use of the same timeout.

References: 5427f20785 ("drm/i915: Bump wait-times for the final CS interrupt before parking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171211194135.27095-1-chris@chris-wilson.co.uk
2017-12-12 21:07:41 +00:00
Matthew Auld
73ebd50303 drm/i915: make mappable struct resource centric
Now that we are using struct resource to track the stolen region, it is
more convenient if we track the mappable region in a resource as well.

v2: prefer iomap and gmadr naming scheme
    prefer DEFINE_RES_MEM

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171211151822.20953-8-matthew.auld@intel.com
2017-12-12 12:30:21 +02:00
Tvrtko Ursulin
b68763741a drm/i915: Restore GT performance in headless mode with DMC loaded
It seems that the DMC likes to transition between the DC states a lot when
there are no connected displays (no active power domains) during command
submission.

This activity on DC states has a negative impact on the performance of the
chip with huge latencies observed in the interrupt handlers and elsewhere.
Simple tests like igt/gem_latency -n 0 are slowed down by a factor of
eight.

Work around it by introducing a new power domain named,
POWER_DOMAIN_GT_IRQ, associtated with the "DC off" power well, which is
held for the duration of command submission activity.

CNL has the same problem which will be addressed as a follow-up. Doing
that requires a fix for a DC6 context corruption problem in the CNL DMC
firmware which is yet to be released.

v2:
 * Add commit text as comment in i915_gem_mark_busy. (Chris Wilson)
 * Protect macro body with braces. (Jani Nikula)

v3:
 * Add dedicated power domain for clarity. (Chris, Imre)
 * Commit message and comment text updates.
 * Apply to all big-core GEN9 parts apart for Skylake which is pending DMC
   firmware release.

v4:
 * Power domain should be inner to device runtime pm. (Chris)
 * Simplify NEEDS_CSR_GT_PERF_WA macro. (Chris)
 * Handle async DMC loading by moving the GT_IRQ power domain logic into
   intel_runtime_pm. (Daniel, Chris)
 * Include small core GEN9 as well. (Imre)

v5
 * Special handling for async DMC load is not needed since on failure the
   power domain reference is kept permanently taken. (Imre)

v6:
 * Drop the NEEDS_CSR_GT_PERF_WA macro since all firmwares have now been
   deployed. (Imre, Chris)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100572
Testcase: igt/gem_exec_nop/headless
Cc: Imre Deak <imre.deak@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v5)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[Imre: Add note about applying the WA on CNL as a follow-up]
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171205132854.26380-1-tvrtko.ursulin@linux.intel.com
2017-12-08 12:23:07 +02:00
Chris Wilson
e2189dd078 drm/i915: Refactor common list iteration over GGTT vma
In quite a few places, we have a list iteration over the vma on an
object that only want to inspect GGTT vma. By construction, these are
placed at the start of the list, so we have copied that knowledge into
many callsites. Pull that knowledge back to i915_vma.h and provide a
for_each_ggtt_vma() to tidy up the code.

v2: Add a backreference from vma_create() to remind ourselves why we put
ggtt vma at the head of the obj->vma_list (and ppgtt vma at the tail).
v3: Fixup s/vma/V/

Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171207211407.31549-1-chris@chris-wilson.co.uk
2017-12-07 23:26:55 +00:00
Chris Wilson
7125397b82 drm/i915: Track GGTT writes on the vma
As writes through the GTT and GGTT PTE updates do not share the same
path, they are not strictly ordered and so we must explicitly flush the
indirect writes prior to modifying the PTE. We do track outstanding GGTT
writes on the object itself, but since the object may have multiple GGTT
vma, that is overly coarse as we can track and flush individual vma as
required.

Whilst here, update the GGTT flushing behaviour for Cannonlake.

v2: Hard-code ring offset to allow use during unload (after RCS may have
been freed, or never existed!)

References: https://bugs.freedesktop.org/show_bug.cgi?id=104002
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171206124914.19960-2-chris@chris-wilson.co.uk
2017-12-07 14:01:59 +00:00
Chris Wilson
010e3e68cd drm/i915: Remove vma from object on destroy, not close
Originally we translated from the object to the vma by walking
obj->vma_list to find the matching vm (for user lookups). Now we process
user lookups using the rbtree, and we only use obj->vma_list itself for
maintaining state (e.g. ensuring that all vma are flushed or rebound).
As such maintenance needs to go on beyond the user's awareness of the
vma, defer removal of the vma from the obj->vma_list from i915_vma_close()
to i915_vma_destroy()

Fixes: 5888fc9eac ("drm/i915: Flush pending GTT writes before unbinding")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104155
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171206124914.19960-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-12-07 14:00:20 +00:00
Chris Wilson
5888fc9eac drm/i915: Flush pending GTT writes before unbinding
From the shrinker paths, we want to relinquish the GPU and GGTT access to
the object, releasing the backing storage back to the system for
swapout. As a part of that process we would unpin the pages, marking
them for access by the CPU (for the swapout/swapin). However, if that
process was interrupted after unbind the vma, we missed a flush of the
inflight GGTT writes before we made that GTT space available again for
reuse, with the prospect that we would redirect them to another page.

The bug dates back to the introduction of multiple GGTT vma, but the
code itself dates to commit 02bef8f98d ("drm/i915: Unbind closed vma
for i915_gem_object_unbind()").

Fixes: 02bef8f98d ("drm/i915: Unbind closed vma for i915_gem_object_unbind()")
Fixes: c5ad54cf7d ("drm/i915: Use partial view in mmap fault handler")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171204132513.7303-1-chris@chris-wilson.co.uk
2017-12-05 21:50:56 +00:00
Chris Wilson
ecf73eb2d2 drm/i915: Skip switch-to-kernel-context on suspend when wedged
If the HW is already wedged, attempting to submit a request will
generate an -EIO. If we tried this during suspend, we would abort
whereas all we want to do is to go sleep and throw away the corrupt
state.

Fixes: 5ab57c7020 ("drm/i915: Flush logical context image out to memory upon suspend")
Testcase: igt/gem_eio/suspend
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171130102951.14965-1-chris@chris-wilson.co.uk
2017-11-30 11:54:01 +00:00
Chris Wilson
d02a1d8308 drm/i915: Rename i915_gem_timelines_mark_idle
The kerneldoc markup for i915_gem_timelines_mark_idle() was incorrect,
so take the opportunity to also convert it from the "mark_idle" to "park"
naming scheme.

drivers/gpu/drm/i915/i915_gem_timeline.c:120: warning: No description found for parameter 'i915'

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171127123054.20966-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-11-27 16:37:15 +00:00
Chris Wilson
ee48700dd5 drm/i915: Call i915_gem_init_userptr() before taking struct_mutex
We don't need struct_mutex to initialise userptr (it just allocates a
workqueue for itself etc), but we do need struct_mutex later on in
i915_gem_init() in order to feed requests onto the HW.

This should break the chain

[  385.697902] ======================================================
[  385.697907] WARNING: possible circular locking dependency detected
[  385.697913] 4.14.0-CI-Patchwork_7234+ #1 Tainted: G     U
[  385.697917] ------------------------------------------------------
[  385.697922] perf_pmu/2631 is trying to acquire lock:
[  385.697927]  (&mm->mmap_sem){++++}, at: [<ffffffff811bfe1e>] __might_fault+0x3e/0x90
[  385.697941]
               but task is already holding lock:
[  385.697946]  (&cpuctx_mutex){+.+.}, at: [<ffffffff8116fe8c>] perf_event_ctx_lock_nested+0xbc/0x1d0
[  385.697957]
               which lock already depends on the new lock.

[  385.697963]
               the existing dependency chain (in reverse order) is:
[  385.697970]
               -> #4 (&cpuctx_mutex){+.+.}:
[  385.697980]        __mutex_lock+0x86/0x9b0
[  385.697985]        perf_event_init_cpu+0x5a/0x90
[  385.697991]        perf_event_init+0x178/0x1a4
[  385.697997]        start_kernel+0x27f/0x3f1
[  385.698003]        verify_cpu+0x0/0xfb
[  385.698006]
               -> #3 (pmus_lock){+.+.}:
[  385.698015]        __mutex_lock+0x86/0x9b0
[  385.698020]        perf_event_init_cpu+0x21/0x90
[  385.698025]        cpuhp_invoke_callback+0xca/0xc00
[  385.698030]        _cpu_up+0xa7/0x170
[  385.698035]        do_cpu_up+0x57/0x70
[  385.698039]        smp_init+0x62/0xa6
[  385.698044]        kernel_init_freeable+0x97/0x193
[  385.698050]        kernel_init+0xa/0x100
[  385.698055]        ret_from_fork+0x27/0x40
[  385.698058]
               -> #2 (cpu_hotplug_lock.rw_sem){++++}:
[  385.698068]        cpus_read_lock+0x39/0xa0
[  385.698073]        apply_workqueue_attrs+0x12/0x50
[  385.698078]        __alloc_workqueue_key+0x1d8/0x4d8
[  385.698134]        i915_gem_init_userptr+0x5f/0x80 [i915]
[  385.698176]        i915_gem_init+0x7c/0x390 [i915]
[  385.698213]        i915_driver_load+0x99e/0x15c0 [i915]
[  385.698250]        i915_pci_probe+0x33/0x90 [i915]
[  385.698256]        pci_device_probe+0xa1/0x130
[  385.698262]        driver_probe_device+0x293/0x440
[  385.698267]        __driver_attach+0xde/0xe0
[  385.698272]        bus_for_each_dev+0x5c/0x90
[  385.698277]        bus_add_driver+0x16d/0x260
[  385.698282]        driver_register+0x57/0xc0
[  385.698287]        do_one_initcall+0x3e/0x160
[  385.698292]        do_init_module+0x5b/0x1fa
[  385.698297]        load_module+0x2374/0x2dc0
[  385.698302]        SyS_finit_module+0xaa/0xe0
[  385.698307]        entry_SYSCALL_64_fastpath+0x1c/0xb1
[  385.698311]
               -> #1 (&dev->struct_mutex){+.+.}:
[  385.698320]        __mutex_lock+0x86/0x9b0
[  385.698361]        i915_mutex_lock_interruptible+0x4c/0x130 [i915]
[  385.698403]        i915_gem_fault+0x206/0x760 [i915]
[  385.698409]        __do_fault+0x1a/0x70
[  385.698413]        __handle_mm_fault+0x7c4/0xdb0
[  385.698417]        handle_mm_fault+0x154/0x300
[  385.698440]        __do_page_fault+0x2d6/0x570
[  385.698445]        page_fault+0x22/0x30
[  385.698449]
               -> #0 (&mm->mmap_sem){++++}:
[  385.698459]        lock_acquire+0xaf/0x200
[  385.698464]        __might_fault+0x68/0x90
[  385.698470]        _copy_to_user+0x1e/0x70
[  385.698475]        perf_read+0x1aa/0x290
[  385.698480]        __vfs_read+0x23/0x120
[  385.698484]        vfs_read+0xa3/0x150
[  385.698488]        SyS_read+0x45/0xb0
[  385.698493]        entry_SYSCALL_64_fastpath+0x1c/0xb1
[  385.698497]
               other info that might help us debug this:

[  385.698505] Chain exists of:
                 &mm->mmap_sem --> pmus_lock --> &cpuctx_mutex

[  385.698517]  Possible unsafe locking scenario:

[  385.698522]        CPU0                    CPU1
[  385.698526]        ----                    ----
[  385.698529]   lock(&cpuctx_mutex);
[  385.698553]                                lock(pmus_lock);
[  385.698558]                                lock(&cpuctx_mutex);
[  385.698564]   lock(&mm->mmap_sem);
[  385.698568]
                *** DEADLOCK ***

[  385.698574] 1 lock held by perf_pmu/2631:
[  385.698578]  #0:  (&cpuctx_mutex){+.+.}, at: [<ffffffff8116fe8c>] perf_event_ctx_lock_nested+0xbc/0x1d0
[  385.698589]
               stack backtrace:
[  385.698595] CPU: 3 PID: 2631 Comm: perf_pmu Tainted: G     U          4.14.0-CI-Patchwork_7234+ #1
[  385.698602] Hardware name:                  /NUC6CAYB, BIOS AYAPLCEL.86A.0040.2017.0619.1722 06/19/2017
[  385.698609] Call Trace:
[  385.698615]  dump_stack+0x5f/0x86
[  385.698621]  print_circular_bug.isra.18+0x1d0/0x2c0
[  385.698627]  __lock_acquire+0x19c3/0x1b60
[  385.698634]  ? generic_exec_single+0x77/0xe0
[  385.698640]  ? lock_acquire+0xaf/0x200
[  385.698644]  lock_acquire+0xaf/0x200
[  385.698650]  ? __might_fault+0x3e/0x90
[  385.698655]  __might_fault+0x68/0x90
[  385.698660]  ? __might_fault+0x3e/0x90
[  385.698665]  _copy_to_user+0x1e/0x70
[  385.698670]  perf_read+0x1aa/0x290
[  385.698675]  __vfs_read+0x23/0x120
[  385.698682]  ? __fget+0x101/0x1f0
[  385.698686]  vfs_read+0xa3/0x150
[  385.698691]  SyS_read+0x45/0xb0
[  385.698696]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  385.698701] RIP: 0033:0x7ff1c46876ed
[  385.698705] RSP: 002b:00007fff13552f90 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
[  385.698712] RAX: ffffffffffffffda RBX: ffffc90000647ff0 RCX: 00007ff1c46876ed
[  385.698718] RDX: 0000000000000010 RSI: 00007fff13552fa0 RDI: 0000000000000005
[  385.698723] RBP: 000056063d300580 R08: 0000000000000000 R09: 0000000000000060
[  385.698729] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000046
[  385.698734] R13: 00007fff13552c6f R14: 00007ff1c6279d00 R15: 00007ff1c6279a40

Testcase: igt/perf_pmu
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171122172621.16158-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-11-22 17:40:37 +00:00
Tvrtko Ursulin
feff0dc6cd drm/i915/pmu: Suspend sampling when GPU is idle
If only a subset of events is enabled we can afford to suspend
the sampling timer when the GPU is idle and so save some cycles
and power.

v2: Rebase and limit timer even more.
v3: Rebase.
v4: Rebase.
v5: Skip action if perf PMU failed to register.
v6: Checkpatch cleanup.
v7:
 * Add a common helper to start the timer if needed. (Chris Wilson)
 * Add comment explaining bitwise logic in pmu_needs_timer.
v8: Fix some comments styles. (Chris Wilson)
v9: Rebase.
v10: Move function declarations to i915_pmu.h.
v11: Rename functions to i915_pmu_gt_(un)parked. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-3-tvrtko.ursulin@linux.intel.com
2017-11-22 11:25:00 +00:00
Chris Wilson
93c6e966b4 drm/i915: Remove i915.semaphores modparam
Having disabled the broken semaphores on Sandybridge, there is no need
for a modparam any more, so remove it in favour of a simple
HAS_LEGACY_SEMAPHORES() guard.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-5-chris@chris-wilson.co.uk
2017-11-20 21:59:09 +00:00
Chris Wilson
0da715ee60 drm/i915: Disable semaphores on Sandybridge
I should have admitted defeat long ago as there has been a rare but
persistent error on Sandybridge where semaphore signaling did not
propagate to the waiter, leading to a GPU hang.

With the work on fence signaling for v4.9, the impact of using CPU driven
signaling was greatly reduced wrt to the latency of GPU semaphores,
though without logical rings support, the benefit of reordering work to
avoid bubbles is not realised (i.e. as it stands fence signaling is just
a slower, more costly version of HW semaphores; but works more
consistently). As a rough indicator of the difference,

with semaphores:
Sequential (3 engines, 1 processes): average 5.470us per cycle [expected 4.988us]
w/o semaphores:
Sequential (3 engines, 1 processes): average 15.771us per cycle [expected 4.923us]

In comparison, v3.4:
with semaphores:
Sequential (3 engines, 1 processes): average 16.066us per cycle [expected 11.842us]
w/o semaphores:
Sequential (3 engines, 1 processes): average 23.460us per cycle [expected 11.839us]

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54226 #and 100+ dupes
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-3-chris@chris-wilson.co.uk
2017-11-20 21:59:08 +00:00
Chris Wilson
79e6770cb1 drm/i915: Remove obsolete ringbuffer emission for gen8+
Since removing the module parameter to force selection of ringbuffer
emission for gen8, the code is defunct. Remove it.

To put the difference into perspective, a couple of microbenchmarks
(bdw i7-5557u, 20170324):
                                        ring          execlists
exec continuous nops on all rings:   1.491us            2.223us
exec sequential nops on each ring:  12.508us           53.682us
single nop + sync:                   9.272us           30.291us

vblank_mode=0 glxgears:            ~11000fps           ~9000fps

Since the earlier submission, gen8 ringbuffer submission has fallen
further and further behind in features. So while ringbuffer may hold the
throughput crown, in terms of interactive latency, execlists is much
better. Alas, we have no convenient metrics for such, other than
demonstrating things we can do with execlists but can not using
legacy ringbuffer submission.

We have made a few improvements to lowlevel execlists throughput,
and ringbuffer currently panics on boot! (bdw i7-5557u, 20171026):

                                        ring          execlists
exec continuous nops on all rings:       n/a            1.921us
exec sequential nops on each ring:       n/a           44.621us
single nop + sync:                       n/a           21.953us

vblank_mode=0 glxgears:                  n/a          ~18500fps

References: https://bugs.freedesktop.org/show_bug.cgi?id=87725
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Once-upon-a-time-Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-2-chris@chris-wilson.co.uk
2017-11-20 21:54:58 +00:00
Chris Wilson
fb5c551ad5 drm/i915: Remove i915.enable_execlists module parameter
Execlists and legacy ringbuffer submission are no longer feature
comparable (execlists now offer greater functionality that should
overcome their performance hit) and obsoletes the unsafe module
parameter, i.e. comparing the two modes of execution is no longer
useful, so remove the debug tool.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #i915_perf.c
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-1-chris@chris-wilson.co.uk
2017-11-20 21:53:59 +00:00
Chris Wilson
3fef5cda97 drm/i915: Automatic i915_switch_context for legacy
During request construction, after pinning the context we know whether
or not we have to emit a context switch. So move this common operation
from every caller into i915_gem_request_alloc() itself.

v2: Always submit the request if we emitted some commands during request
construction, as typically it also involves changes in global state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120102002.22254-2-chris@chris-wilson.co.uk
2017-11-20 15:56:16 +00:00
Sagar Arun Kamble
c6dce8f140 drm/i915: Update execlists tasklet naming
intel_lrc_irq_handler and i915_guc_irq_handler are HW submission related
tasklet functions. Name them with "submission_tasklet" suffix and
remove intel/i915 prefix as they are static. Also rename irq_tasklet
as just tasklet for clarity.

v2: s/_bh/_tasklet (Chris)

Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1510839162-25197-2-git-send-email-sagar.a.kamble@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-16 15:01:31 +00:00
Chris Wilson
7469c62cb6 drm/i915: Resume GuC before using GEM
Resuming GEM presumes it can talk to hw, in particular to ensure the
kernel context is loaded upon resume for powersaving. If the GuC is
still asleep at this point, we upset the HW. Rearrange the resume such
that we restore the original order of init-hw, resume-guc, use-gem.

Fixes: 37cd33006d ("drm/i915: Remove redundant intel_autoenable_gt_powersave()")
References: a1c4199414 ("drm/i915/guc: Add host2guc notification for suspend and resume")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Alex Dai <yu.dai@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171114130300.25677-2-chris@chris-wilson.co.uk
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
2017-11-14 23:50:48 +00:00
Tina Zhang
a03f395ad7 drm/i915: Introduce GEM proxy
GEM proxy is a kind of GEM, whose backing physical memory is pinned
and produced by guest VM and is used by host as read only. With GEM
proxy, host is able to access guest physical memory through GEM object
interface. As GEM proxy is such a special kind of GEM, a new flag
I915_GEM_OBJECT_IS_PROXY is introduced to ban host from changing the
backing storage of GEM proxy.

v3:
- update "Reviewed-by". (Joonas)

v2:
- return -ENXIO when pin and map pages of GEM proxy to kernel space.
  (Chris)

Here are the histories of this patch in "Dma-buf support for Gvt-g"
patch-set:

v14:
- return -ENXIO when gem proxy object is banned by ioctl.
  (Chris) (Daniel)

v13:
- add comments to GEM proxy. (Chris)
- don't ban GEM proxy in i915_gem_sw_finish_ioctl. (Chris)
- check GEM proxy bar after finishing i915_gem_object_wait. (Chris)
- remove GEM proxy bar in i915_gem_madvise_ioctl.

v6:
- add gem proxy barrier in the following ioctls. (Chris)
  i915_gem_set_caching_ioctl
  i915_gem_set_domain_ioctl
  i915_gem_sw_finish_ioctl
  i915_gem_set_tiling_ioctl
  i915_gem_madvise_ioctl

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v1
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1510555798-21079-2-git-send-email-tina.zhang@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171114102513.22269-2-chris@chris-wilson.co.uk
2017-11-14 12:26:36 +00:00
Tina Zhang
274b2462a0 drm/i915: Object w/o backing storage is banned by -ENXIO
-ENXIO should be returned when operations are banned from changing
backing storage of objects without backing storage.

v4:
- update "Reviewed-by". (Joonas)

v3:
- separate this patch from "Introduce GEM proxy" patch-set. (Joonas)

v2:
- update the patch description and subject to just mention objects w/o
  backing storage, instead of "GEM proxy". (Joonas)

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1510555798-21079-1-git-send-email-tina.zhang@intel.com
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171114102513.22269-1-chris@chris-wilson.co.uk
2017-11-14 12:23:42 +00:00
Chris Wilson
37cd33006d drm/i915: Remove redundant intel_autoenable_gt_powersave()
Now that we always execute a context switch upon module load, there is
no need to queue a delayed task for doing so. The purpose of the delayed
task is to enable GT powersaving, for which we need the HW state to be
valid (i.e. having loaded a context and initialised basic state). We
used to defer this operation as historically it was slow (due to slow
register polling, fixed with commit 1758b90e38 ("drm/i915: Use a hybrid
scheme for fast register waits")) but now we have a requirement to save
the default HW state.

v2: Load the kernel context (to provide the power context) upon resume.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171112112738.1463-3-chris@chris-wilson.co.uk
2017-11-12 12:46:55 +00:00
Chris Wilson
9c52d1c816 drm/i915/selftests: Yet another forgotten mock_i915->mm initialiser
Move all of the i915->mm initialisation to a private function that can
be reused by the mock i915 device to save forgetting any more steps.

For example,
<7>[ 1542.046332] [IGT] drv_selftest: starting subtest mock_objects
<4>[ 1542.123924] Setting dangerous option mock_selftests - tainting kernel
<6>[ 1542.167941] i915: Performing mock selftests with st_random_seed=0x246f5ab5 st_timeout=1000
<4>[ 1542.178012] INFO: trying to register non-static key.
<4>[ 1542.178027] the code is fine but needs lockdep annotation.
<4>[ 1542.178032] turning off the locking correctness validator.
<4>[ 1542.178041] CPU: 3 PID: 6008 Comm: kworker/3:7 Tainted: G     U          4.14.0-rc8-CI-CI_DRM_3332+ #1
<4>[ 1542.178049] Hardware name:                  /NUC6CAYB, BIOS AYAPLCEL.86A.0040.2017.0619.1722 06/19/2017
<4>[ 1542.178144] Workqueue: events __i915_gem_free_work [i915]
<4>[ 1542.178152] Call Trace:
<4>[ 1542.178163]  dump_stack+0x68/0x9f
<4>[ 1542.178170]  register_lock_class+0x3fd/0x580
<4>[ 1542.178177]  ? unwind_next_frame+0x14/0x20
<4>[ 1542.178184]  ? __save_stack_trace+0x73/0xd0
<4>[ 1542.178191]  __lock_acquire+0xa4/0x1b00
<4>[ 1542.178254]  ? __i915_gem_free_work+0x28/0xa0 [i915]
<4>[ 1542.178261]  ? __lock_acquire+0x4ab/0x1b00
<4>[ 1542.178268]  lock_acquire+0xb0/0x200
<4>[ 1542.178273]  ? lock_acquire+0xb0/0x200
<4>[ 1542.178336]  ? __i915_gem_free_work+0x28/0xa0 [i915]
<4>[ 1542.178344]  _raw_spin_lock+0x32/0x50
<4>[ 1542.178405]  ? __i915_gem_free_work+0x28/0xa0 [i915]
<4>[ 1542.178468]  __i915_gem_free_work+0x28/0xa0 [i915]
<4>[ 1542.178476]  process_one_work+0x221/0x650
<4>[ 1542.178483]  worker_thread+0x4e/0x3c0
<4>[ 1542.178489]  kthread+0x114/0x150
<4>[ 1542.178494]  ? process_one_work+0x650/0x650
<4>[ 1542.178499]  ? kthread_create_on_node+0x40/0x40
<4>[ 1542.178506]  ret_from_fork+0x27/0x40

v2: Fish out i915->mm.object_stat_lock which was being inited over in
i915_drv.c (Matthew)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110232447.21618-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2017-11-10 23:42:49 +00:00
Chris Wilson
a0a8b1cf93 drm/i915: Kerneldoc typo s/rps/rps_client/
Update the kerneldoc parameter name to match the real parameter name.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171109140644.10805-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2017-11-10 17:36:07 +00:00
Chris Wilson
d2b4b97933 drm/i915: Record the default hw state after reset upon load
Take a copy of the HW state after a reset upon module loading by
executing a context switch from a blank context to the kernel context,
thus saving the default hw state over the blank context image.
We can then use the default hw state to initialise any future context,
ensuring that each starts with the default view of hw state.

v2: Unmap our default state from the GTT after stealing it from the
context. This should stop us from accidentally overwriting it via the
GTT (and frees up some precious GTT space).

Testcase: igt/gem_ctx_isolation
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-7-chris@chris-wilson.co.uk
2017-11-10 17:23:10 +00:00
Chris Wilson
cc6a818ad6 drm/i915: Move intel_init_clock_gating() to i915_gem_init()
Despite its name intel_init_clock_gating applies both display clock gating
workarounds; GT mmio workarounds and the occasional GT power context
workaround. Worse, sometimes it includes a context register workaround
which we need to apply before we record the default HW state for all
contexts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-4-chris@chris-wilson.co.uk
2017-11-10 17:20:27 +00:00
Chris Wilson
f58d13d571 drm/i915: Move GT powersaving init to i915_gem_init()
GT powersaving is tightly coupled to the request infrastructure. To
avoid complications with the order of initialisation in the next patch
(where we want to send requests to hw during GEM init) move the
powersaving initialisation into the purview of i915_gem_init().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-3-chris@chris-wilson.co.uk
2017-11-10 17:20:26 +00:00
Chris Wilson
ae6c457478 drm/i915: Force the switch to the i915->kernel_context
In the next few patches, we will have a hard requirement that we emit a
context-switch to the perma-pinned i915->kernel_context (so that we can
save the HW state using that context-switch). As the first context
itself may be classed as a kernel context, we want to be explicit in our
comparison. For an extra-layer of finesse, we can check the last
unretired context on the engine; as well as the last retired context
when idle.

v2: verbose verbosity
v3: Always force the switch, even when the engine is idle, and update
the assert that this happens before suspend.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v1
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-2-chris@chris-wilson.co.uk
2017-11-10 17:20:25 +00:00
Chris Wilson
f991c492aa drm/i915: Lock llist_del_first() vs llist_del_all()
An oversight in commit 87701b4b55 ("drm/i915: Only free the oldest
stale object before a fresh allocation") was that not only do we have to
serialise concurrent users of llist_del_first(), but we also have to
lock llist_del_first() vs llist_del_all().

From llist.h,

 * This can be summarized as follows:
 *
 *           |   add    | del_first |  del_all
 * add       |    -     |     -     |     -
 * del_first |          |     L     |     L
 * del_all   |          |           |     -
 *
 * Where, a particular row's operation can happen concurrently with a column's
 * operation, with "-" being no lock needed, while "L" being lock is needed.

This should hopefully explain:

<4>[   89.287106] general protection fault: 0000 [#1] PREEMPT SMP
<4>[   89.287126] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp i915 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 mii mei_me mei snd_pcm prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
<4>[   89.287226] CPU: 2 PID: 23 Comm: ksoftirqd/2 Tainted: G     U          4.14.0-rc8-CI-CI_DRM_3315+ #1
<4>[   89.287247] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[   89.287270] task: ffff88017ab34ec0 task.stack: ffffc90000128000
<4>[   89.287290] RIP: 0010:llist_add_batch+0x4/0x20
<4>[   89.287301] RSP: 0018:ffffc9000012bdb8 EFLAGS: 00010296
<4>[   89.287314] RAX: ffffffff811017ad RBX: 6e468801a1560000 RCX: ef3e53fceecdeb81
<4>[   89.287330] RDX: 6e468801a1566130 RSI: ffff880103d73d98 RDI: ffff880103d73d98
<4>[   89.287346] RBP: ffffc9000012bdb8 R08: ffff88017ab35780 R09: 0000000000000000
<4>[   89.287361] R10: ffffc9000012bd68 R11: 00000000abb18c3d R12: ffffffffa01369e0
<4>[   89.287377] R13: ffff88017fd1b8f8 R14: ffff88017ab34ec0 R15: 000000000000000a
<4>[   89.287393] FS:  0000000000000000(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
<4>[   89.287411] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[   89.287424] CR2: 00007ff0c0755018 CR3: 000000016df9b000 CR4: 00000000003406e0
<4>[   89.287440] Call Trace:
<4>[   89.287511]  __i915_gem_free_object_rcu+0x20/0x40 [i915]
<4>[   89.287527]  rcu_process_callbacks+0x27a/0x730
<4>[   89.287544]  __do_softirq+0xc0/0x4ae
<4>[   89.287559]  ? smpboot_thread_fn+0x2d/0x280
<4>[   89.287571]  run_ksoftirqd+0x1f/0x70
<4>[   89.287582]  smpboot_thread_fn+0x18a/0x280
<4>[   89.287595]  kthread+0x114/0x150
<4>[   89.287605]  ? sort_range+0x30/0x30
<4>[   89.287615]  ? kthread_create_on_node+0x40/0x40
<4>[   89.287628]  ret_from_fork+0x27/0x40
<4>[   89.287641] Code: 0d 48 83 ea 01 4c 89 c1 48 83 fa ff 74 12 48 23 0c d7 74 ed 48 c1 e2 06 48 0f bd c9 48 8d 04 0a 5d c3 90 90 90 90 90 55 48 89 e5 <48> 8b 0a 48 89 0e 48 89 c8 f0 48 0f b1 3a 48 39 c1 75 ed 48 85
<1>[   89.287774] RIP: llist_add_batch+0x4/0x20 RSP: ffffc9000012bdb8
<4>[   89.287826] ---[ end trace e775d15174d8ae02 ]---

(Lockless lists are only easy (and lockless) when only using
llist_add/llist_del_all!)

Fixes: 87701b4b55 ("drm/i915: Only free the oldest stale object before
a fresh allocation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171106111508.11941-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2017-11-06 12:05:02 +00:00
Chris Wilson
136109c67f drm/i915: Set up mocs tables before restarting the engines
After a reset, we may immediately begin executing requests on restarting
the engines. Ergo this has to be last step with all re-initialisation
completed beforehand. The mocs setup was after we started executing the
requests; do it earlier!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171102131430.22328-1-chris@chris-wilson.co.uk
2017-11-03 10:06:53 +00:00
Chris Wilson
680273879d drm/i915: Move parking-while-active warning to intel_engines_park()
We will want to break this down to give detailed per-engine warnings as
to why we still think we are active as we attempt to park the engines.
For the first step, just move the warning verbatim from the idle-worker
to intel_engines_park().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171027110617.31745-3-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2017-11-01 15:09:12 +00:00
Chris Wilson
bea6e987c1 drm/i915: Hold rcu_read_lock when iterating over the radixtree (objects)
Kasan spotted

    [IGT] gem_tiled_pread_pwrite: exiting, ret=0
    ==================================================================
    BUG: KASAN: use-after-free in __i915_gem_object_reset_page_iter+0x15c/0x170 [i915]
    Read of size 8 at addr ffff8801359da310 by task kworker/3:2/182

    CPU: 3 PID: 182 Comm: kworker/3:2 Tainted: G     U          4.14.0-rc6-CI-Custom_3340+ #1
    Hardware name: Intel Corp. Geminilake/GLK RVP1 DDR4 (05), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
    Workqueue: events __i915_gem_free_work [i915]
    Call Trace:
     dump_stack+0x68/0xa0
     print_address_description+0x78/0x290
     ? __i915_gem_object_reset_page_iter+0x15c/0x170 [i915]
     kasan_report+0x23d/0x350
     __asan_report_load8_noabort+0x19/0x20
     __i915_gem_object_reset_page_iter+0x15c/0x170 [i915]
     ? i915_gem_object_truncate+0x100/0x100 [i915]
     ? lock_acquire+0x380/0x380
     __i915_gem_object_put_pages+0x30d/0x530 [i915]
     __i915_gem_free_objects+0x551/0xbd0 [i915]
     ? lock_acquire+0x13e/0x380
     __i915_gem_free_work+0x4e/0x70 [i915]
     process_one_work+0x6f6/0x1590
     ? pwq_dec_nr_in_flight+0x2b0/0x2b0
     worker_thread+0xe6/0xe90
     ? pci_mmcfg_check_reserved+0x110/0x110
     kthread+0x309/0x410
     ? process_one_work+0x1590/0x1590
     ? kthread_create_on_node+0xb0/0xb0
     ret_from_fork+0x27/0x40

    Allocated by task 1801:
     save_stack_trace+0x1b/0x20
     kasan_kmalloc+0xee/0x190
     kasan_slab_alloc+0x12/0x20
     kmem_cache_alloc+0xdc/0x2e0
     radix_tree_node_alloc.constprop.12+0x48/0x330
     __radix_tree_create+0x274/0x480
     __radix_tree_insert+0xa2/0x610
     i915_gem_object_get_sg+0x224/0x670 [i915]
     i915_gem_object_get_page+0xb5/0x1c0 [i915]
     i915_gem_pread_ioctl+0x822/0xf60 [i915]
     drm_ioctl_kernel+0x13f/0x1c0
     drm_ioctl+0x6cf/0x980
     do_vfs_ioctl+0x184/0xf30
     SyS_ioctl+0x41/0x70
     entry_SYSCALL_64_fastpath+0x1c/0xb1

    Freed by task 37:
     save_stack_trace+0x1b/0x20
     kasan_slab_free+0xaf/0x190
     kmem_cache_free+0xbf/0x340
     radix_tree_node_rcu_free+0x79/0x90
     rcu_process_callbacks+0x46d/0xf40
     __do_softirq+0x21c/0x8d3

    The buggy address belongs to the object at ffff8801359da0f0
    which belongs to the cache radix_tree_node of size 576
    The buggy address is located 544 bytes inside of
    576-byte region [ffff8801359da0f0, ffff8801359da330)
    The buggy address belongs to the page:
    page:ffffea0004d67600 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
    flags: 0x8000000000008100(slab|head)
    raw: 8000000000008100 0000000000000000 0000000000000000 0000000100110011
    raw: ffffea0004b52920 ffffea0004b38020 ffff88015b416a80 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
     ffff8801359da200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
     ffff8801359da280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    >ffff8801359da300: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
			     ^
     ffff8801359da380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
     ffff8801359da400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    ==================================================================
    Disabling lock debugging due to kernel taint

which looks like the slab containing the radixtree iter was freed as we
traversed the tree, taking the rcu read lock across the loop should
prevent that (deferring all the frees until the end).

Reported-by: Tomi Sarvela <tomi.p.sarvela@intel.com>
Fixes: 96d7763452 ("drm/i915: Use a radixtree for random access to the object's backing storage")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171026130032.10677-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
2017-10-27 12:38:45 +01:00
Michał Winiarski
c41937fd99 drm/i915/guc: Preemption! With GuC
Pretty similar to what we have on execlists.
We're reusing most of the GEM code, however, due to GuC quirks we need a
couple of extra bits.
Preemption is implemented as GuC action, and actions can be pretty slow.
Because of that, we're using a mutex to serialize them. Since we're
requesting preemption from the tasklet, the task of creating a workitem
and wrapping it in GuC action is delegated to a worker.

To distinguish that preemption has finished, we're using additional
piece of HWSP, and since we're not getting context switch interrupts,
we're also adding a user interrupt.

The fact that our special preempt context has completed unfortunately
doesn't mean that we're ready to submit new work. We also need to wait
for GuC to finish its own processing.

v2: Don't compile out the wait for GuC, handle workqueue flush on reset,
no need for ordered workqueue, put on a reviewer hat when looking at my own
patches (Chris)
Move struct work around in intel_guc, move user interruput outside of
conditional (Michał)
Keep ring around rather than chase though intel_context

v3: Extract WA for flushing ggtt writes to a helper (Chris)
Keep work_struct in intel_guc rather than engine (Michał)
Use ordered workqueue for inject_preempt worker to avoid GuC quirks.

v4: Drop now unused INTEL_GUC_PREEMPT_OPTION_IMMEDIATE (Daniele)
Drop stray newlines, use container_of for intel_guc in worker,
check for presence of workqueue when flushing it, rather than
enable_guc_submission modparam, reorder preempt postprocessing (Chris)

v5: Make wq NULL after destroying it

v6: Swap struct guc_preempt_work members (Michał)

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171026133558.19580-1-michal.winiarski@intel.com
2017-10-26 21:35:21 +01:00
Michał Winiarski
9bdc3573a5 drm/i915/guc: Initialize GuC before restarting engines
Now that we're handling request resubmission the same way as regular
submission (from the tasklet), we can move GuC initialization earlier,
before restarting the engines. This way, we're no longer being in the
state of flux during engine restart - we're already in user requested
submission mode.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171025172519.10670-5-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-10-25 19:41:04 +01:00
Chris Wilson
aba5e27858 drm/i915: Add a hook for making the engines idle (parking) and unparking
In the next patch, we will want to install a callback when the engines
(GT as a whole) become idle and similarly when they first become busy.
To enable that callback, first rename intel_engines_mark_idle() to
intel_engines_park() and provide the companion intel_engines_unpark().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171025143943.7661-2-chris@chris-wilson.co.uk
2017-10-25 18:47:30 +01:00
Chris Wilson
ff320d6e72 drm/i915: Synchronize irq before parking each engine
When we park the engine (upon idling), we kill the irq tasklet. However,
to be sure that it is not restarted by a spurious interrupt after doing so,
flush the interrupt handler before parking. As we only park the engines
when we believe the system is idle, there should not be any interrupts to
distrub us; so flushing the final in-flight interrupt should be sufficient.
(However, we are still dependent on the HW behaving in an orderly and
timely fashion, which we shall endeavour to improve upon later.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171023213237.26536-2-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2017-10-24 15:47:05 +01:00
Chris Wilson
5427f20785 drm/i915: Bump wait-times for the final CS interrupt before parking
In the idle worker we drop the prolonged GT wakeref used to cover such
essentials as interrupt delivery. (When a CS interrupt arrives, we also
assert that the GT is awake.) However, it turns out that 10ms is not
long enough to be assured that the last CS interrupt has been delivered,
so bump that to 200ms, and move the entirety of that wait to before we
take the struct_mutex to avoid blocking. As this is now a potentially
long wait, restore the earlier behaviour of bailing out early when a new
request arrives.

v2: Break out the repeated check for new requests into its own little
helper to try and improve the self-commentary.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171023213237.26536-1-chris@chris-wilson.co.uk
2017-10-24 15:44:48 +01:00
Chris Wilson
8bd8181590 drm/i915: Skip waking the device to service pwrite
If the device is in runtime suspend, resuming takes time and reduces our
powersaving. If this was for a small write into an object, that resume
will take longer than any savings in using the indirect GGTT access to
avoid the cpu cache.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171019063733.31620-1-chris@chris-wilson.co.uk
2017-10-19 13:56:38 +01:00
Chris Wilson
a6d65e451c drm/i915: Report -EFAULT before pwrite fast path into shmemfs
When pwriting into shmemfs, the fast path pagecache_write does not
notice when it is writing to beyond the end of the truncated shmemfs
inode. Report -EFAULT directly when we try to use pwrite into the
!I915_MADV_WILLNEED object.

Fixes: 7c55e2c577 ("drm/i915: Use pagecache write to prepopulate shmemfs from pwrite-ioctl")
Testcase: igt/gem_madvise/dontneed-before-pwrite
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171016202732.25459-1-chris@chris-wilson.co.uk
2017-10-17 14:07:47 +01:00
Chris Wilson
6f74b36b92 drm/i915: Skip HW reinitialisation on resume if still wedged
If we fail to recover the HW state upon resume (i.e. our attempt to
clear the wedged bit and reset during i915_gem_sanitize() fails), then
skip the HW restart inside i915_gem_init_hw(). We will ultimately do the
HW restart when successfully unwedging and resetting the HW later,
but attempting to restore a wedged device upon resume is risky as the HW
is in an unknown state.

v2: Suppress the error message when detecting the already wedged HW.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103240
Testcase: igt/gem_eio/in-flight-suspend
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171015143725.27764-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2017-10-16 20:44:19 +01:00
Chris Wilson
cc731f5a3b drm/i915: Trim struct_mutex hold duration for i915_gem_free_objects
We free objects in bulk after they wait for their RCU grace period.
Currently, we take struct_mutex and unbind all the objects. This can lead
to a long lock duration during which time those objects have their pages
unfreeable (i.e. the shrinker is prevented from reaping those pages). If
we only process a single object under the struct_mutex and then free the
pages, the number of objects locked away from the shrinker is minimal
and we allow regular clients better access to struct_mutex if they need
it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171013202621.7276-9-chris@chris-wilson.co.uk
2017-10-16 20:44:19 +01:00
Chris Wilson
87701b4b55 drm/i915: Only free the oldest stale object before a fresh allocation
Inspired by Tvrtko's critique of the reaping of the stale contexts
before allocating a new one, also limit the freed object reaping to the
oldest stale object before allocating a fresh object. Unlike contexts,
objects may have radically different sizes of backing storage, but
similar to contexts, while we want to prevent starvation due to
excessive freed lists, we also do not want to delay fresh allocations
for too long. Only freeing the oldest on the freed object list before
each allocation is a reasonable compromise.

v2: Only a single consumer of llist_del_first() is allowed (although
multiple llist_add are still allowed in parallel). Unlike
i915_gem_context, i915_gem_flush_free_objects() is itself not serialized
and so we need to add our own spinlock. Otherwise KASAN eventually spots
a use-after-free for the race on *first->next.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171013202621.7276-8-chris@chris-wilson.co.uk
2017-10-16 20:44:19 +01:00
Chris Wilson
f2123818ff drm/i915: Move dev_priv->mm.[un]bound_list to its own lock
Remove the struct_mutex requirement around dev_priv->mm.bound_list and
dev_priv->mm.unbound_list by giving it its own spinlock. This reduces
one more requirement for struct_mutex and in the process gives us
slightly more accurate unbound_list tracking, which should improve the
shrinker - but the drawback is that we drop the retirement before
counting so i915_gem_object_is_active() may be stale and lead us to
underestimate the number of objects that may be shrunk (see commit
bed50aea61 ("drm/i915/shrinker: Flush active on objects before
counting")).

v2: Crosslink the spinlock to the lists it protects, and btw this
changes s/obj->global_link/obj->mm.link/
v3: Fix decoupling of old links in i915_gem_object_attach_phys()
v3.1: Fix the fix, only unlink if it was linked
v3.2: Use a local for to_i915(obj->base.dev)->mm.obj_lock

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171016114037.5556-1-chris@chris-wilson.co.uk
2017-10-16 20:44:19 +01:00
Chris Wilson
bd3d2252f9 drm/i915: Rename obj->pin_display to obj->pin_global
In the next patch, we want to extend use of the global pin counter for
semi-permanent pinning of context/ring objects. Given that we plan to
extend the usage to encompass a disparate set of objects, we want a name
that reflects both and should entail less confusion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171013202621.7276-2-chris@chris-wilson.co.uk
2017-10-16 20:44:19 +01:00
Chris Wilson
f1fa4f442c drm/i915: Refactor testing obj->mm.pages
Since we occasionally stuff an error pointer into obj->mm.pages for a
semi-permanent or even permanent failure, we have to be more careful and
not just test against NULL when deciding if the object has a complete
set of its concurrent pages.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171013202621.7276-1-chris@chris-wilson.co.uk
2017-10-16 20:44:19 +01:00
Chris Wilson
5d031f4e16 drm/i915: Stop asserting on set-wedged vs nop_submit_request ordering
Since the removal of the stop_machine(), it is allowed and expected for
the nop_submit_request() and nop_complete_submit_request() to run in
parallel to the i915_gem_set_wedged() processing. As such we can no
longer assert that i915_gem_set_wedged() has completed inside the
stop_machine prior to the individual nop_submit_request execution.

Fixes: af7a8ffad9 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171012204019.3557-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-10-13 20:57:29 +01:00
Daniel Vetter
af7a8ffad9 drm/i915: Use rcu instead of stop_machine in set_wedged
stop_machine is not really a locking primitive we should use, except
when the hw folks tell us the hw is broken and that's the only way to
work around it.

This patch tries to address the locking abuse of stop_machine() from

commit 20e4933c47
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 22 14:41:21 2016 +0000

    drm/i915: Stop the machine as we install the wedged submit_request handler

Chris said parts of the reasons for going with stop_machine() was that
it's no overhead for the fast-path. But these callbacks use irqsave
spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.

To stay as close as possible to the stop_machine semantics we first
update all the submit function pointers to the nop handler, then call
synchronize_rcu() to make sure no new requests can be submitted. This
should give us exactly the huge barrier we want.

I pondered whether we should annotate engine->submit_request as __rcu
and use rcu_assign_pointer and rcu_dereference on it. But the reason
behind those is to make sure the compiler/cpu barriers are there for
when you have an actual data structure you point at, to make sure all
the writes are seen correctly on the read side. But we just have a
function pointer, and .text isn't changed, so no need for these
barriers and hence no need for annotations.

Unfortunately there's a complication with the call to
intel_engine_init_global_seqno:

- Without stop_machine we must hold the corresponding spinlock.

- Without stop_machine we must ensure that all requests are marked as
  having failed with dma_fence_set_error() before we call it. That
  means we need to split the nop request submission into two phases,
  both synchronized with rcu:

  1. Only stop submitting the requests to hw and mark them as failed.

  2. After all pending requests in the scheduler/ring are suitably
  marked up as failed and we can force complete them all, also force
  complete by calling intel_engine_init_global_seqno().

This should fix the followwing lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G     U
------------------------------------------------------
kworker/3:4/562 is trying to acquire lock:
 (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40

but task is already holding lock:
 (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #6 (&dev->struct_mutex){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __mutex_lock+0x86/0x9b0
       mutex_lock_interruptible_nested+0x1b/0x20
       i915_mutex_lock_interruptible+0x51/0x130 [i915]
       i915_gem_fault+0x209/0x650 [i915]
       __do_fault+0x1e/0x80
       __handle_mm_fault+0xa08/0xed0
       handle_mm_fault+0x156/0x300
       __do_page_fault+0x2c5/0x570
       do_page_fault+0x28/0x250
       page_fault+0x22/0x30

-> #5 (&mm->mmap_sem){++++}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __might_fault+0x68/0x90
       _copy_to_user+0x23/0x70
       filldir+0xa5/0x120
       dcache_readdir+0xf9/0x170
       iterate_dir+0x69/0x1a0
       SyS_getdents+0xa5/0x140
       entry_SYSCALL_64_fastpath+0x1c/0xb1

-> #4 (&sb->s_type->i_mutex_key#5){++++}:
       down_write+0x3b/0x70
       handle_create+0xcb/0x1e0
       devtmpfsd+0x139/0x180
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

-> #3 ((complete)&req.done){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       wait_for_common+0x58/0x210
       wait_for_completion+0x1d/0x20
       devtmpfs_create_node+0x13d/0x160
       device_add+0x5eb/0x620
       device_create_groups_vargs+0xe0/0xf0
       device_create+0x3a/0x40
       msr_device_create+0x2b/0x40
       cpuhp_invoke_callback+0xc9/0xbf0
       cpuhp_thread_fun+0x17b/0x240
       smpboot_thread_fn+0x18a/0x280
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

-> #2 (cpuhp_state-up){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       cpuhp_issue_call+0x133/0x1c0
       __cpuhp_setup_state_cpuslocked+0x139/0x2a0
       __cpuhp_setup_state+0x46/0x60
       page_writeback_init+0x43/0x67
       pagecache_init+0x3d/0x42
       start_kernel+0x3a8/0x3fc
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x6d/0x70
       verify_cpu+0x0/0xfb

-> #1 (cpuhp_state_mutex){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __mutex_lock+0x86/0x9b0
       mutex_lock_nested+0x1b/0x20
       __cpuhp_setup_state_cpuslocked+0x53/0x2a0
       __cpuhp_setup_state+0x46/0x60
       page_alloc_init+0x28/0x30
       start_kernel+0x145/0x3fc
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x6d/0x70
       verify_cpu+0x0/0xfb

-> #0 (cpu_hotplug_lock.rw_sem){++++}:
       check_prev_add+0x430/0x840
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       cpus_read_lock+0x3d/0xb0
       stop_machine+0x1c/0x40
       i915_gem_set_wedged+0x1a/0x20 [i915]
       i915_reset+0xb9/0x230 [i915]
       i915_reset_device+0x1f6/0x260 [i915]
       i915_handle_error+0x2d8/0x430 [i915]
       hangcheck_declare_hang+0xd3/0xf0 [i915]
       i915_hangcheck_elapsed+0x262/0x2d0 [i915]
       process_one_work+0x233/0x660
       worker_thread+0x4e/0x3b0
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

other info that might help us debug this:

Chain exists of:
  cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&dev->struct_mutex);
                               lock(&mm->mmap_sem);
                               lock(&dev->struct_mutex);
  lock(cpu_hotplug_lock.rw_sem);

 *** DEADLOCK ***

3 locks held by kworker/3:4/562:
 #0:  ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
 #1:  ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
 #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]

stack backtrace:
CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G     U          4.14.0-rc3-CI-CI_DRM_3179+ #1
Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
Workqueue: events_long i915_hangcheck_elapsed [i915]
Call Trace:
 dump_stack+0x68/0x9f
 print_circular_bug+0x235/0x3c0
 ? lockdep_init_map_crosslock+0x20/0x20
 check_prev_add+0x430/0x840
 ? irq_work_queue+0x86/0xe0
 ? wake_up_klogd+0x53/0x70
 __lock_acquire+0x1420/0x15e0
 ? __lock_acquire+0x1420/0x15e0
 ? lockdep_init_map_crosslock+0x20/0x20
 lock_acquire+0xb0/0x200
 ? stop_machine+0x1c/0x40
 ? i915_gem_object_truncate+0x50/0x50 [i915]
 cpus_read_lock+0x3d/0xb0
 ? stop_machine+0x1c/0x40
 stop_machine+0x1c/0x40
 i915_gem_set_wedged+0x1a/0x20 [i915]
 i915_reset+0xb9/0x230 [i915]
 i915_reset_device+0x1f6/0x260 [i915]
 ? gen8_gt_irq_ack+0x170/0x170 [i915]
 ? work_on_cpu_safe+0x60/0x60
 i915_handle_error+0x2d8/0x430 [i915]
 ? vsnprintf+0xd1/0x4b0
 ? scnprintf+0x3a/0x70
 hangcheck_declare_hang+0xd3/0xf0 [i915]
 ? intel_runtime_pm_put+0x56/0xa0 [i915]
 i915_hangcheck_elapsed+0x262/0x2d0 [i915]
 process_one_work+0x233/0x660
 worker_thread+0x4e/0x3b0
 kthread+0x152/0x190
 ? process_one_work+0x660/0x660
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x27/0x40
Setting dangerous option reset - tainting kernel
i915 0000:00:02.0: Resetting chip after gpu hang
Setting dangerous option reset - tainting kernel
i915 0000:00:02.0: Resetting chip after gpu hang

v2: Have 1 global synchronize_rcu() barrier across all engines, and
improve commit message.

v3: We need to protect the seqno update with the timeline spinlock (in
set_wedged) to avoid racing with other updates of the seqno, like we
already do in nop_submit_request (Chris).

v4: Use two-phase sequence to plug the race Chris spotted where we can
complete requests before they're marked up with -EIO.

v5: Review from Chris:
- simplify nop_submit_request.
- Add comment to rcu_read_lock section.
- Align comments with the new style.

v6: Remove unused variable to appease CI.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171011091019.1425-1-daniel.vetter@ffwll.ch
2017-10-11 17:51:21 +02:00
Sagar Arun Kamble
562d9bae08 drm/i915: Name structure in dev_priv that contains RPS/RC6 state as "gt_pm"
Prepared substructure rps for RPS related state. autoenable_work is
used for RC6 too hence it is defined outside rps structure. As we do
this lot many functions are refactored to use intel_rps *rps to access
rps related members. Hence renamed intel_rps_client pointer variables
to rps_client in various functions.

v2: Rebase.

v3: s/pm/gt_pm (Chris)
Refactored access to rps structure by declaring struct intel_rps * in
many functions.

Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com> #1
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1507360055-19948-9-git-send-email-sagar.a.kamble@intel.com
Acked-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171010213010.7415-8-chris@chris-wilson.co.uk
2017-10-11 08:56:59 +01:00
Matthew Auld
84e8978e62 drm/i915: s/sg_mask/sg_page_sizes/
It's a little unclear what the sg_mask actually is, so prefer the more
meaningful name of sg_page_sizes.

Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171009110024.29114-1-matthew.auld@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-10-09 17:07:29 +01:00
Chris Wilson
43ae70d97c drm/i915: Early rejection of mappable GGTT pin attempts for large bo
Currently, we reject attempting to pin a large bo into the mappable
aperture, but only after trying to create the vma. Under debug kernels,
repeatedly creating and freeing that vma for an oversized bo consumes
one-third of the runtime for pwrite/pread tests as it is spent on
kmalloc/kfree tracking. If we move the rejection to before creating that
vma, we lose some accuracy of checking against the fence_size as opposed
to object size, though the fence can never be smaller than the object.
Note that the vma creation itself will reject an attempt to create a vma
larger than the GTT so we can remove one redundant test.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171009084401.29090-7-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-10-09 17:07:29 +01:00
Chris Wilson
a3259ca9f8 drm/i915: Avoid evicting user fault mappable vma for pread/pwrite
Both pread/pwrite GTT paths provide a fast fallback in case we cannot
map the whole object at a time. Currently, we use the fallback for very
large objects and for active objects that would require remapping, but
we can also add active fault mappable objects to the list that we want
to avoid evicting. The rationale is that such fault mappable objects are
in active use and to evict requires tearing down the CPU PTE and forcing
a page fault on the next access; more costly, and intefers with other
processes, than our per-page GTT fallback.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171009084401.29090-6-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-10-09 17:07:29 +01:00
Chris Wilson
a65adaf8a8 drm/i915: Track user GTT faulting per-vma
We don't wish to refault the entire object (other vma) when unbinding
one partial vma. To do this track which vma have been faulted into the
user's address space.

v2: Use a local vma_offset to tidy up a multiline unmap_mapping_range().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171009084401.29090-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-10-09 17:07:29 +01:00
Chris Wilson
3bd4073524 drm/i915: Consolidate get_fence with pin_fence
Following the pattern now used for obj->mm.pages, use just pin_fence and
unpin_fence to control access to the fence registers. I.e. instead of
calling get_fence(); pin_fence(), we now just need to call pin_fence().
This will make it easier to reduce the locking requirements around
fence registers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171009084401.29090-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-10-09 17:07:29 +01:00
Chris Wilson
1749d90ff6 drm/i915: Hold forcewake for the duration of reset+restart
Resetting the engine requires us to hold the forcewake wakeref to
prevent RC6 trying to happen in the middle of the reset sequence. The
consequence of an unwanted RC6 event in the middle is that random state
is then saved to the powercontext and restored later, which may
overwrite the mmio state we need to preserve (e.g. PD_DIR_BASE in the
legacy ringbuffer reset_ring_common()).

This was noticed in the live_hangcheck selftests when Haswell would
sporadically fail to restart during igt_reset_queue().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171009110301.21705-3-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2017-10-09 17:07:28 +01:00
Matthew Auld
da9fe3f31a drm/i915: disable platform support for vGPU huge gtt pages
Currently gvt gtt handling doesn't support huge page entries, so disable
for now.

v2: remove useless 48b PPGTT check

Suggested-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006145041.21673-20-matthew.auld@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006221833.32439-19-chris@chris-wilson.co.uk
2017-10-07 10:12:03 +01:00
Matthew Auld
4049866f09 drm/i915/selftests: huge page tests
v2: mock test page support configurations and add MI_STORE_DWORD test

v3: run all mockable huge page tests on all platforms via the mock_device

v4: add pin_update regression test
    various improvements suggested by Chris

v5: fix issues reported by kbuild
    test single sg spanning multiple page sizes
    don't explode when running the live-tests through the appgtt

v6: lots of improvements from Chris

v7: run on each engine for igt_write_huge
    add simple tmpfs fallback test

v8: size_t is bad
    don't break the i386 build

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006145041.21673-18-matthew.auld@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006221833.32439-17-chris@chris-wilson.co.uk
2017-10-07 10:12:00 +01:00
Matthew Auld
a5c0816626 drm/i915: introduce page_size members
In preparation for supporting huge gtt pages for the ppgtt, we introduce
page size members for gem objects.  We fill in the page sizes by
scanning the sg table.

v2: pass the sg_mask to set_pages

v3: calculate the sg_mask inline with populating the sg_table where
possible, and pass to set_pages along with the pages.

v4: bunch of improvements from Joonas

v5: fix num_pages blunder
    introduce i915_sg_page_sizes helper

v6: prefer GEM_BUG_ON(sizes == 0)

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006145041.21673-7-matthew.auld@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006221833.32439-6-chris@chris-wilson.co.uk
2017-10-07 10:11:48 +01:00
Matthew Auld
b91b09eea7 drm/i915: push set_pages down to the callers
Each backend is now responsible for calling __i915_gem_object_set_pages
upon successfully gathering its backing storage. This eliminates the
inconsistency between the async and sync paths, which stands out even
more when we start throwing around an sg_mask in a later patch.

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006145041.21673-6-matthew.auld@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006221833.32439-5-chris@chris-wilson.co.uk
2017-10-07 10:11:45 +01:00
Matthew Auld
465c403cb5 drm/i915: introduce simple gemfs
Not a fully blown gemfs, just our very own tmpfs kernel mount. Doing so
moves us away from the shmemfs shm_mnt, and gives us the much needed
flexibility to do things like set our own mount options, namely huge=
which should allow us to enable the use of transparent-huge-pages for
our shmem backed objects.

v2: various improvements suggested by Joonas

v3: move gemfs instance to i915.mm and simplify now that we have
file_setup_with_mnt

v4: fallback to tmpfs shm_mnt upon failure to setup gemfs

v5: make tmpfs fallback kinder

v5: better gemfs failure message
    flags variable

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006145041.21673-3-matthew.auld@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006221833.32439-2-chris@chris-wilson.co.uk
2017-10-07 10:11:41 +01:00
Chris Wilson
8d550824c6 drm/i915: Order two completing nop_submit_request
If two nop's (requests in-flight following a wedged device) complete at
the same time, the global_seqno value written to the HWSP is undefined
as the two threads are not serialized.

v2: Use irqsafe spinlock. We expect the callback may be called from
inside another irq spinlock, so we can't unconditionally restore irqs.

Fixes: ce1135c7de ("drm/i915: Complete requests in nop_submit_request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006115617.18432-1-chris@chris-wilson.co.uk
2017-10-06 18:23:20 +01:00
Chris Wilson
7c26240e8a drm/i915: Try harder to finish the idle-worker
If a worker requeues itself, it may switch to a different kworker pool,
which flush_work() considers as complete. To be strict, we then need to
keep flushing the work until it is no longer pending.

References: https://bugs.freedesktop.org/show_bug.cgi?id=102456
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006104038.22337-1-chris@chris-wilson.co.uk
2017-10-06 17:49:46 +01:00
Sagar Arun Kamble
269e6ea953 drm/i915: Move i915_gem_restore_fences to i915_gem_resume
i915_gem_restore_fences is GEM resumption task hence it is moved to
i915_gem_resume from i915_restore_state.

Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1506661116-12106-1-git-send-email-sagar.a.kamble@intel.com
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-29 12:30:17 +01:00
Jani Nikula
32f35b8634 Merge drm-upstream/drm-next into drm-intel-next-queued
Need MST sideband message transaction to power up/down nodes.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-09-28 15:56:49 +03:00
Dave Airlie
9afafdbfbf Merge tag 'drm-intel-next-2017-09-07' of git://anongit.freedesktop.org/git/drm-intel into drm-next
Getting started with v4.15 features:

- Cannonlake workarounds (Rodrigo, Oscar)
- Infoframe refactoring and fixes to enable infoframes for DP (Ville)
- VBT definition updates (Jani)
- Sparse warning fixes (Ville, Chris)
- Crtc state usage fixes and cleanups (Ville)
- DP vswing, pre-emph and buffer translation refactoring and fixes (Rodrigo)
- Prevent IPS from interfering with CRC capture (Ville, Marta)
- Enable Mesa to advertise ARB_timer_query (Nanley)
- Refactor GT number into intel_device_info (Lionel)
- Avoid eDP DP AUX CH timeouts harder (Manasi)
- CDCLK check improvements (Ville)
- Restore GPU clock boost on missed pageflip vblanks (Chris)
- Fence register reservation API for vGPU (Changbin)
- First batch of CCS fixes (Ville)
- Finally, numerous GEM fixes, cleanups and improvements (Chris)

* tag 'drm-intel-next-2017-09-07' of git://anongit.freedesktop.org/git/drm-intel: (100 commits)
  drm/i915: Update DRIVER_DATE to 20170907
  drm/i915/cnl: WaThrottleEUPerfToAvoidTDBackPressure:cnl(pre-prod)
  drm/i915: Lift has-pinned-pages assert to caller of ____i915_gem_object_get_pages
  drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk
  drm/i915/cnl: Allow the reg_read ioctl to read the RCS TIMESTAMP register
  drm/i915: Move device_info.has_snoop into the static tables
  drm/i915: Disable MI_STORE_DATA_IMM for i915g/i915gm
  drm/i915: Re-enable GTT following a device reset
  drm/i915/cnp: Wa 1181: Fix Backlight issue
  drm/i915: Annotate user relocs with __user
  drm/i915: Constify load detect mode
  drm/i915/perf: Remove __user from u64 in drm_i915_perf_oa_config
  drm/i915: Silence sparse by using gfp_t
  drm/i915: io unmap functions want __iomem
  drm/i915: Add __rcu to radix tree slot pointer
  drm/i915: Wake up the device for the fbdev setup
  drm/i915: Add interface to reserve fence registers for vGPU
  drm/i915: Use correct path to trace include
  drm/i915: Fix the missing PPAT cache attributes on CNL
  drm/i915: Fix enum pipe vs. enum transcoder for the PCH transcoder
  ...
2017-09-28 07:12:44 +10:00
Mika Kuoppala
b620e87021 drm/i915: Make own struct for execlist items
Engine's execlist related items have been increasing to
a point where a separate struct is warranted. Carve execlist
specific items to a dedicated struct to add clarity.

v2: add kerneldoc and fix whitespace (Joonas, Chris)
v3: csb_mmio changes, rebase
v4: s/\b(el|execlist)\b/execlists/ (Joonas)

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> (v3)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-1-mika.kuoppala@intel.com
2017-09-25 11:33:23 +03:00
Michal Wajdeczko
4f044a88a8 drm/i915: Rename global i915 to i915_modparams
Our global struct with params is named exactly the same way
as new preferred name for the drm_i915_private function parameter.
To avoid such name reuse lets use different name for the global.

v5: pure rename
v6: fix

Credits-to: Coccinelle

@@
identifier n;
@@
(
-	i915.n
+	i915_modparams.n
)

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ville Syrjala <ville.syrjala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170919193846.38060-1-michal.wajdeczko@intel.com
2017-09-22 14:50:36 +03:00
Chris Wilson
27a5f61b37 drm/i915: Cancel all ready but queued requests when wedging
When wedging the hw, we want to mark all in-flight requests as -EIO.
This is made slightly more complex by execlists who store the ready but
not yet submitted-to-hw requests on a private queue (an rbtree
priolist). Call into execlists to cancel not only the ELSP tracking for
the submitted requests, but also the queue of unsubmitted requests.

v2: Move the majority of engine_set_wedged to the backends (both legacy
ringbuffer and execlists handling their own lists).

Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Testcase: igt/gem_eio/in-flight-contexts
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170915173100.26470-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-09-18 10:59:55 +01:00
Michal Hocko
0ee931c4e3 mm: treewide: remove GFP_TEMPORARY allocation flag
GFP_TEMPORARY was introduced by commit e12ba74d8f ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  It's
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation.  As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag.  How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.

The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing caller provide a way to reclaim the allocated memory.  So
this is rather misleading and hard to evaluate for any benefits.

I have checked some random users and none of them has added the flag
with a specific justification.  I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring.  This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.

I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL.  Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.

I can see reasons we might want some gfp flag to reflect shorterm
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.

This was been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
seems to be a heuristic without any measured advantage for most (if not
all) its current users.  The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers.  So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.

[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org

[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
  Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-13 18:53:16 -07:00
Linus Torvalds
7d95565612 Merge tag 'drm-intel-next-fixes-2017-09-07' of git://anongit.freedesktop.org/git/drm-intel
Pull i916 drm fixes from Rodrigo Vivi:
 "Since Dave is on paternity leave we are sending drm/i915 fixes for
  v4.14-rc1 directly to you as he had asked us to do.

  The most critical ones are the GPU reset fix for gen2-4 and GVT fix
  for a regression that is blocking gvt init to work on your tree.

  The rest is general fixes for patches coming from drm-next"

Acked-by: Dave Airlie <airlied@redhat.com>

* tag 'drm-intel-next-fixes-2017-09-07' of git://anongit.freedesktop.org/git/drm-intel:
  drm/i915: Re-enable GTT following a device reset
  drm/i915: Annotate user relocs with __user
  drm/i915: Silence sparse by using gfp_t
  drm/i915: Add __rcu to radix tree slot pointer
  drm/i915: Fix the missing PPAT cache attributes on CNL
  drm/i915/gvt: Remove one duplicated MMIO
  drm/i915: Fix enum pipe vs. enum transcoder for the PCH transcoder
  drm/i915: Make i2c lock ops static
  drm/i915: Make i9xx_load_ycbcr_conversion_matrix() static
  drm/i915/edp: Increase T12 panel delay to 900 ms to fix DP AUX CH timeouts
  drm/i915: Ignore duplicate VMA stored within the per-object handle LUT
  drm/i915: Skip fence alignemnt check for the CCS plane
  drm/i915: Treat fb->offsets[] as a raw byte offset instead of a linear offset
  drm/i915: Always wake the device to flush the GTT
  drm/i915: Recreate vmapping even when the object is pinned
  drm/i915: Quietly cancel FBC activation if CRTC is turned off before worker
2017-09-07 14:37:25 -07:00
Chris Wilson
c5ba5b2465 drm/i915: Apply the GTT write flush for all !llc machines
We also see the delayed GTT write issue on i915g/i915gm, so let's
presume that it is a universal problem for all !llc machines, and that we
just haven't yet noticed on g33, gen4 and gen5 machines.

v2: Use a register that exists on all platforms

Testcase: igt/gem_mmap_gtt/coherency # i915gm
References: https://bugs.freedesktop.org/show_bug.cgi?id=102577
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170907184520.5032-1-chris@chris-wilson.co.uk
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2017-09-07 21:46:42 +01:00
Ville Syrjälä
750fae2324 i915: Fix obj size vs. alignment for drm_pci_alloc()
drm_pci_alloc() refuses to cooperate if the passed alignment exceeds the
object size. So round up the obj size to the next power of two as well
to make this actually work.

Obviously things work just fine as long as the size was a power of two
to begin with. However kms_cursor_crc doesn't always use power of two
sizes so we hit a failure when we try to allocate the phys memory.

Testcase: igt/kms_cursor_crc
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170907143203.13055-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-07 20:12:36 +03:00
Tvrtko Ursulin
5602452e4c drm/i915: Use __sg_alloc_table_from_pages for userptr allocations
With the addition of __sg_alloc_table_from_pages we can control
the maximum coalescing size and eliminate a separate path for
allocating backing store here.

Similar to 871dfbd67d ("drm/i915: Allow compaction upto
SWIOTLB max segment size") this enables more compact sg lists to
be created and so has a beneficial effect on workloads with many
and/or large objects of this class.

v2:
 * Rename helper to i915_sg_segment_size and fix swiotlb override.
 * Commit message update.

v3:
 * Actually include the swiotlb override fix.

v4:
 * Regroup parameters a bit. (Chris Wilson)

v5:
 * Rebase for swiotlb_max_segment.
 * Add DMA map failure handling as in abb0deacb5
   ("drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping").

v6: Handle swiotlb_max_segment() returning 1. (Joonas Lahtinen)

v7: Rebase.
v8: Commit spelling fix.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803091417.23677-1-tvrtko.ursulin@linux.intel.com
2017-09-07 10:48:29 +01:00
Chris Wilson
912d572d63 drm/i915: wire up shrinkctl->nr_scanned
shrink_slab() allows us to report back the number of objects we
successfully scanned (out of the target shrinkctl->nr_to_scan).  As
report the number of pages owned by each GEM object as a separate item
to the shrinker, we cannot precisely control the number of shrinker
objects we scan on each pass; and indeed may free more than requested.
If we fail to tell the shrinker about the number of objects we process,
it will continue to hold a grudge against us as any objects left
unscanned are added to the next reclaim -- and so we will keep on
"unfairly" shrinking our own slab in comparison to other slabs.

Link: http://lkml.kernel.org/r/20170822135325.9191-2-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Shaohua Li <shli@fb.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:25 -07:00
Chris Wilson
88c880bbde drm/i915: Lift has-pinned-pages assert to caller of ____i915_gem_object_get_pages
i915_gem_object_attach_phys() is trying to swap out its shmemfs pages
for a new set of physically contiguous pages, but unfortunately triggers
an assert inside get-pages.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102561
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170906135220.13508-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2017-09-06 22:12:10 +01:00
Ville Syrjälä
6910d85250 drm/i915: Add __rcu to radix tree slot pointer
radix_tree_for_each_slot() wants an __rcu annotated pointer for the
slot. So let's add the annotation.

Fixes the following sparse warnings:
i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces)
i915_gem.c:2217:9:    expected void **slot
i915_gem.c:2217:9:    got void [noderef] <asn:4>**
i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces)
i915_gem.c:2217:9:    expected void **slot
i915_gem.c:2217:9:    got void [noderef] <asn:4>**
i915_gem.c:2217:9: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:2217:9:    expected void [noderef] <asn:4>**slot
i915_gem.c:2217:9:    got void **slot
i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces)
i915_gem.c:2217:9:    expected void **slot
i915_gem.c:2217:9:    got void [noderef] <asn:4>**

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: 96d7763452 ("drm/i915: Use a radixtree for random access to the object's backing storage")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170901171252.31025-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit c23aa71bcf)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-09-05 10:00:14 -07:00
Ville Syrjälä
afe722bee4 drm/i915: io unmap functions want __iomem
Don't cast away the __iomem from the io_mapping functions so that
sparse won't be so unhappy when we pass the pointer to the unmap
functions. Instead let's move the cast to where we actually use the
pointer.

Fixes the following sparse warnings:
i915_gem.c:1022:33: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:1022:33:    expected void [noderef] <asn:2>*vaddr
i915_gem.c:1022:33:    got void *[assigned] vaddr
i915_gem.c:1027:34: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:1027:34:    expected void [noderef] <asn:2>*vaddr
i915_gem.c:1027:34:    got void *[assigned] vaddr
i915_gem.c:1199:33: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:1199:33:    expected void [noderef] <asn:2>*vaddr
i915_gem.c:1199:33:    got void *[assigned] vaddr
i915_gem.c:1204:34: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:1204:34:    expected void [noderef] <asn:2>*vaddr
i915_gem.c:1204:34:    got void *[assigned] vaddr

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170901171252.31025-2-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-04 19:31:55 +03:00
Ville Syrjälä
c23aa71bcf drm/i915: Add __rcu to radix tree slot pointer
radix_tree_for_each_slot() wants an __rcu annotated pointer for the
slot. So let's add the annotation.

Fixes the following sparse warnings:
i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces)
i915_gem.c:2217:9:    expected void **slot
i915_gem.c:2217:9:    got void [noderef] <asn:4>**
i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces)
i915_gem.c:2217:9:    expected void **slot
i915_gem.c:2217:9:    got void [noderef] <asn:4>**
i915_gem.c:2217:9: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:2217:9:    expected void [noderef] <asn:4>**slot
i915_gem.c:2217:9:    got void **slot
i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces)
i915_gem.c:2217:9:    expected void **slot
i915_gem.c:2217:9:    got void [noderef] <asn:4>**

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: 96d7763452 ("drm/i915: Use a radixtree for random access to the object's backing storage")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170901171252.31025-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-04 19:31:40 +03:00
Chris Wilson
fa3722f649 drm/i915: Ignore duplicate VMA stored within the per-object handle LUT
By using drm_gem_flink/drm_gem_open on an object using the same fd, it
is possible for a client to create multiple handles pointing to the same
object (tied to the same contexts and VMA), as exemplified by
igt::gem_handle_to_libdrm_bo(). Since this duplication has been possible
since forever, we cannot assume that the handle:(fpriv, object) is
unique and so must handle the multiple users of a single VMA.

v2: Added commentary noise.

Testcase: igt/gem_close
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102355
Fixes: d1b48c1e71 ("drm/i915: Replace execbuf vma ht with an idr")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-3-chris@chris-wilson.co.uk
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
(cherry-picked from commit 3ffff01749)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-08-30 12:17:22 -07:00
Chris Wilson
0168bdfc9c drm/i915: Always wake the device to flush the GTT
Since we hold the device wakeref when writing through the GTT (otherwise
the writes would fail), we presumed that before the device sleeps those
writes would naturally be flushed and that we wouldn't need our mmio
read trick. However, that presumption seems false and a sleepy bxt seems
to require us to always manually flush the GTT writes prior to direct
access.

Fixes: e2a2aa36a5 ("drm/i915: Check we have an wake device before flushing GTT writes")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170829192546.1087-1-chris@chris-wilson.co.uk
(cherry picked from commit b69a784f5e)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-08-30 12:08:14 -07:00
Chris Wilson
3b24e7e810 drm/i915: Recreate vmapping even when the object is pinned
Sometimes we know we are the only user of the bo, but since we take a
protective pin_pages early on, an attempt to change the vmap on the
object is denied because it is busy. i915_gem_object_pin_map() cannot
tell from our single pin_count if the operation is safe. Instead we must
pass that information down from the caller in the manner of
I915_MAP_OVERRIDE.

This issue has existed from the introduction of the mapping, but was
never noticed as the only place where this conflict might happen is for
cached kernel buffers (such as allocated by i915_gem_batch_pool_get()).
Until recently there was only a single user (the cmdparser) so no
conflicts ever occurred. However, we now use it to allocate batches for
different operations (using MAP_WC on !llc for writes) in addition to the
existing shadow batch (using MAP_WB for reads).

We could either keep both mappings cached, or use a different write
mechanism if we detect a MAP_WB already exists (i.e. clflush
afterwards), but as we haven't seen this issue in the wild (it requires
hitting the GPU reloc path in addition to the cmdparser) for simplicity
just allow the mappings to be recreated.

v2: Include the i915_MAP_OVERRIDE bit in the enum so the compiler knows
about all the valid values.

Fixes: 7dd4f6729f ("drm/i915: Async GPU relocation processing")
Testcase: igt/gem_lut_handle # byt, completely by accident
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170828104631.8606-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit a575c67617)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-08-30 12:08:11 -07:00
Chris Wilson
b69a784f5e drm/i915: Always wake the device to flush the GTT
Since we hold the device wakeref when writing through the GTT (otherwise
the writes would fail), we presumed that before the device sleeps those
writes would naturally be flushed and that we wouldn't need our mmio
read trick. However, that presumption seems false and a sleepy bxt seems
to require us to always manually flush the GTT writes prior to direct
access.

Fixes: e2a2aa36a5 ("drm/i915: Check we have an wake device before flushing GTT writes")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170829192546.1087-1-chris@chris-wilson.co.uk
2017-08-30 09:10:35 +01:00
Chris Wilson
fc692bd31b drm/i915: Discard the request queue if we fail to sleep before suspend
If we fail to clear the outstanding request queue before suspending,
mark those requests as lost.

References: https://bugs.freedesktop.org/show_bug.cgi?id=102037
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170826110935.10237-3-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-08-29 16:56:39 +01:00
Chris Wilson
f36325f378 drm/i915: Clear wedged status upon resume
When we wake up from suspend, the device has been powered down and
should come back afresh. We should be able to safely remove the wedged
status from the previous session and start afresh.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170826110935.10237-2-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-08-29 16:56:38 +01:00
Chris Wilson
cad9946c2a drm/i915: Always sanity check engine state upon idling
When we do a locked idle we know that afterwards all requests have been
completed and the engines have been cleared of tasks. For whatever
reason, this doesn't always happen and we may go into a suspend with
ELSP still full, and this causes an issue upon resume as we get very,
very confused.

If the engines refuse to idle, mark the device as wedged. In the process
we get rid of the maybe unused open-coded version of wait_for_engines
reported by Nick Desaulniers and Matthias Kaehlcke.

v2: Suppress the -EIO before suspend, but keep it for seqno wrap.

References: https://bugs.freedesktop.org/show_bug.cgi?id=101891
References: https://bugs.freedesktop.org/show_bug.cgi?id=102456
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20170826110935.10237-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2017-08-29 16:56:04 +01:00
Chris Wilson
a575c67617 drm/i915: Recreate vmapping even when the object is pinned
Sometimes we know we are the only user of the bo, but since we take a
protective pin_pages early on, an attempt to change the vmap on the
object is denied because it is busy. i915_gem_object_pin_map() cannot
tell from our single pin_count if the operation is safe. Instead we must
pass that information down from the caller in the manner of
I915_MAP_OVERRIDE.

This issue has existed from the introduction of the mapping, but was
never noticed as the only place where this conflict might happen is for
cached kernel buffers (such as allocated by i915_gem_batch_pool_get()).
Until recently there was only a single user (the cmdparser) so no
conflicts ever occurred. However, we now use it to allocate batches for
different operations (using MAP_WC on !llc for writes) in addition to the
existing shadow batch (using MAP_WB for reads).

We could either keep both mappings cached, or use a different write
mechanism if we detect a MAP_WB already exists (i.e. clflush
afterwards), but as we haven't seen this issue in the wild (it requires
hitting the GPU reloc path in addition to the cmdparser) for simplicity
just allow the mappings to be recreated.

v2: Include the i915_MAP_OVERRIDE bit in the enum so the compiler knows
about all the valid values.

Fixes: 7dd4f6729f ("drm/i915: Async GPU relocation processing")
Testcase: igt/gem_lut_handle # byt, completely by accident
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170828104631.8606-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-08-29 10:39:08 +01:00
Chris Wilson
3ffff01749 drm/i915: Ignore duplicate VMA stored within the per-object handle LUT
By using drm_gem_flink/drm_gem_open on an object using the same fd, it
is possible for a client to create multiple handles pointing to the same
object (tied to the same contexts and VMA), as exemplified by
igt::gem_handle_to_libdrm_bo(). Since this duplication has been possible
since forever, we cannot assume that the handle:(fpriv, object) is
unique and so must handle the multiple users of a single VMA.

v2: Added commentary noise.

Testcase: igt/gem_close
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102355
Fixes: d1b48c1e71 ("drm/i915: Replace execbuf vma ht with an idr")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-3-chris@chris-wilson.co.uk
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-08-24 15:28:05 +01:00
Chris Wilson
67b4804025 drm/i915: Assert that the handle->vma lut is empty on object close
Make sure that we are not leaking an entry in the ctx->handles_lut by
asserting that the object was removed prior to being freed. This should
be enforced by all such handles being removed by i915_gem_close_object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-2-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-08-24 15:22:46 +01:00
Chris Wilson
432295d7b9 drm/i915: Assert the context is not closed on object-close
During the context-close, we should be decoupling all the vma from the
object so that upon object-closing we shouldn't see any vma from the
already closed contexts. So include a check upon closing the object that
the context is still open.

v2: Eek, the fpriv check is required for shared objects. Double eek, BAT
passed?

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-08-24 15:22:21 +01:00
Chris Wilson
d1b48c1e71 drm/i915: Replace execbuf vma ht with an idr
This was the competing idea long ago, but it was only with the rewrite
of the idr as an radixtree and using the radixtree directly ourselves,
along with the realisation that we can store the vma directly in the
radixtree and only need a list for the reverse mapping, that made the
patch performant enough to displace using a hashtable. Though the vma ht
is fast and doesn't require any extra allocation (as we can embed the node
inside the vma), it does require a thread for resizing and serialization
and will have the occasional slow lookup. That is hairy enough to
investigate alternatives and favour them if equivalent in peak performance.
One advantage of allocating an indirection entry is that we can support a
single shared bo between many clients, something that was done on a
first-come first-serve basis for shared GGTT vma previously. To offset
the extra allocations, we create yet another kmem_cache for them.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-5-chris@chris-wilson.co.uk
2017-08-18 11:59:02 +01:00
Chris Wilson
b805014897 drm/i915: Handle full s64 precision for wait-ioctl
The wait-ioctl is optionally supplied a timeout with nanosecond
precision in a s64 field. We use nsecs_to_jiffies64() to convert that
into the jiffies consumed by the scheduler, but internally
nsecs_to_jiffies64() does not guard against overflow (as it's purpose is
for use by the scheduler and not drivers!). So we must guard against the
overflow ourselves, and in the process note that we may then return
much earlier than the timeout selected by the user, so don't report
ETIME unless we do hit the timeout. (Woe betold us though if the user
waits for a year (32bit) and the request is still not complete!)

v2: Refine overflow detection (to not include an overffow itself)

Reported-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20170811105731.9482-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-08-15 16:25:25 +01:00
Chris Wilson
b8f55be644 drm/i915: Split obj->cache_coherent to track r/w
Another month, another story in the cache coherency saga. This time, we
come to the realisation that i915_gem_object_is_coherent() has been
reporting whether we can read from the target without requiring a cache
invalidate; but we were using it in places for testing whether we could
write into the object without requiring a cache flush. So split the
tracking into two, one to decide before reads, one after writes.

See commit e27ab73d17 ("drm/i915: Mark CPU cache as dirty on every
transition for CPU writes") for the previous entry in this saga.

v2: Be verbose
v3: Remove unused function (i915_gem_object_is_coherent)
v4: Fix inverted coherency check prior to execbuf (from v2)
v5: Add comment for nasty code where we are optimising on gcc's behalf.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101109
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101555
Testcase: igt/kms_mmap_write_crc
Testcase: igt/kms_pwrite_crc
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Tested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170811111116.10373-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-08-15 15:46:57 +01:00
Chris Wilson
8fb6a5df46 drm/i915: Call the unlocked version of i915_gem_object_get_pages()
When we hold for the lock for swapping out the shmem pages for the
physically contiguous pages, we have to call the unlocked version of
get_pages!

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101934
Fixes: 35d23516946e ("drm/i915: Make i915_gem_object_phys_attach() use obj->mm.lock more appropriately")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170726181602.23527-2-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:56:11 +02:00
Chris Wilson
8eeb79060b drm/i915: Move i915_gem_object_phys_attach()
Prevent a forward declaration in the next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170726181602.23527-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:56:11 +02:00
Chris Wilson
6ea1d55d31 drm/i915: Make i915_gem_object_phys_attach() use obj->mm.lock more appropriately
Actually transferring from shmemfs to the physically contiguous set of
pages should be wholly guarded by its obj->mm.lock!

v2: Remember to free the old pages.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170726160038.29487-2-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:56:10 +02:00
Chris Wilson
d1d1ebf412 drm/i915: Don't touch fence->error when resetting an innocent request
If the request has been completed before the reset took effect, we don't
need to mark it up as being a victim. Touching fence->error after the
fence has been signaled is detected by dma_fence_set_error() and
triggers a BUG:

[  231.743133] kernel BUG at ./include/linux/dma-fence.h:434!
[  231.743156] invalid opcode: 0000 [#1] SMP KASAN
[  231.743172] Modules linked in: i915 drm_kms_helper drm iptable_nat nf_nat_ipv4 nf_nat x86_pkg_temp_thermal iosf_mbi i2c_algo_bit cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea fb font fbdev [last unloaded: drm]
[  231.743221] CPU: 2 PID: 20 Comm: kworker/2:0 Tainted: G     U          4.13.0-rc1+ #52
[  231.743236] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.01 03/11/2011
[  231.743363] Workqueue: events_long i915_hangcheck_elapsed [i915]
[  231.743382] task: ffff8801f42e9780 task.stack: ffff8801f42f8000
[  231.743489] RIP: 0010:i915_gem_reset_engine+0x45a/0x460 [i915]
[  231.743505] RSP: 0018:ffff8801f42ff770 EFLAGS: 00010202
[  231.743521] RAX: 0000000000000007 RBX: ffff8801bf6b1880 RCX: ffffffffa02881a6
[  231.743537] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff8801bf6b18c8
[  231.743551] RBP: ffff8801f42ff7c8 R08: 0000000000000001 R09: 0000000000000000
[  231.743566] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801edb02d00
[  231.743581] R13: ffff8801e19d4200 R14: 000000000000001d R15: ffff8801ce2a4000
[  231.743599] FS:  0000000000000000(0000) GS:ffff8801f5a80000(0000) knlGS:0000000000000000
[  231.743614] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  231.743629] CR2: 00007f0ebd1add10 CR3: 0000000002621000 CR4: 00000000000406e0
[  231.743643] Call Trace:
[  231.743752]  i915_gem_reset+0x6c/0x150 [i915]
[  231.743853]  i915_reset+0x175/0x210 [i915]
[  231.743958]  i915_reset_device+0x33b/0x350 [i915]
[  231.744061]  ? valleyview_pipestat_irq_handler+0xe0/0xe0 [i915]
[  231.744081]  ? trace_hardirqs_off_caller+0x70/0x110
[  231.744102]  ? _raw_spin_unlock_irqrestore+0x46/0x50
[  231.744120]  ? find_held_lock+0x119/0x150
[  231.744138]  ? mark_lock+0x6d/0x850
[  231.744241]  ? gen8_gt_irq_ack+0x1f0/0x1f0 [i915]
[  231.744262]  ? work_on_cpu_safe+0x60/0x60
[  231.744284]  ? rcu_read_lock_sched_held+0x57/0xa0
[  231.744400]  ? gen6_read32+0x2ba/0x320 [i915]
[  231.744506]  i915_handle_error+0x382/0x5f0 [i915]
[  231.744611]  ? gen6_rps_reset_ei+0x20/0x20 [i915]
[  231.744630]  ? vsnprintf+0x128/0x8e0
[  231.744649]  ? pointer+0x6b0/0x6b0
[  231.744667]  ? debug_check_no_locks_freed+0x1a0/0x1a0
[  231.744688]  ? scnprintf+0x92/0xe0
[  231.744706]  ? snprintf+0xb0/0xb0
[  231.744820]  hangcheck_declare_hang+0x15a/0x1a0 [i915]
[  231.744932]  ? engine_stuck+0x440/0x440 [i915]
[  231.744951]  ? rcu_read_lock_sched_held+0x57/0xa0
[  231.745062]  ? gen6_read32+0x2ba/0x320 [i915]
[  231.745173]  ? gen6_read16+0x320/0x320 [i915]
[  231.745284]  ? intel_engine_get_active_head+0x91/0x170 [i915]
[  231.745401]  i915_hangcheck_elapsed+0x3d8/0x400 [i915]
[  231.745424]  process_one_work+0x3e8/0xac0
[  231.745444]  ? pwq_dec_nr_in_flight+0x110/0x110
[  231.745464]  ? do_raw_spin_lock+0x8e/0x120
[  231.745484]  worker_thread+0x8d/0x720
[  231.745506]  kthread+0x19e/0x1f0
[  231.745524]  ? process_one_work+0xac0/0xac0
[  231.745541]  ? kthread_create_on_node+0xa0/0xa0
[  231.745560]  ret_from_fork+0x27/0x40
[  231.745581] Code: 8b 7d c8 e8 49 0d 02 e1 49 8b 7f 38 48 8b 75 b8 48 83 c7 10 e8 b8 89 be e1 e9 95 fc ff ff 4c 89 e7 e8 4b b9 ff ff e9 30 ff ff ff <0f> 0b 0f 1f 40 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fe
[  231.745767] RIP: i915_gem_reset_engine+0x45a/0x460 [i915] RSP: ffff8801f42ff770

At first glance this looks to be related to commit c64992e035
("drm/i915: Look for active requests earlier in the reset path"), but it
could easily happen before as well. On the other hand, we no longer
logged victims due to the active_request being dropped earlier.

v2: Be trickier to unwind the incomplete request as we cannot rely on
request retirement for the lockless per-engine reset.
v3: Reprobe the active request at the time of the reset.

Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Fixes: c64992e035 ("drm/i915: Look for active requests earlier in the reset path")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-15-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v1
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:57 +02:00
Chris Wilson
77b25a972b drm/i915: Make i915_gem_context_mark_guilty() safe for unlocked updates
Since we make call i915_gem_context_mark_guilty() concurrently when
resetting different engines in parallel, we need to make sure that our
updates are safe for the unlocked access.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-12-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:47 +02:00
Chris Wilson
ed454f2cd6 drm/i915: Clear engine irq posted following a reset
When the GPU is reset, we want to discard all pending notifications as
either we have manually completed them, or they are no longer
applicable. Make sure we do reset the engine->irq_posted prior to
re-enabling the engine (e.g. the interrupt tasklets) in
i915_gem_reset_finish_engine().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-11-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:47 +02:00
Chris Wilson
bf2eac3bee drm/i915: Assert that machine is wedged for nop_submit_request
We should only ever do nop_submit_request when the machine is wedged, so
assert it is so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-10-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:46 +02:00
Chris Wilson
3d7adbbf49 drm/i915: Wake up waiters after setting the WEDGED bit
After setting the WEDGED bit, make sure that we do wake up waiters as
they may not be waiting for a request completion yet, just for its
execution.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-9-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:46 +02:00
Chris Wilson
5e32d7482e drm/i915: Clear execlist port[] before updating seqno on wedging
When we wedge the device, we clear out the in-flight requests and
advance the breadcrumb to indicate they are complete. However, the
breadcrumb advance includes an assert that the engine is idle, so that
advancement needs to be the last step to ensure we pass our own sanity
checks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-7-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:45 +02:00
Daniel Vetter
64282ea2d2 Merge airlied/drm-next into drm-intel-next-queued
Resync with upstream to avoid git getting too badly confused. Also, we
have a conflict with the drm_vblank_cleanup removal, which cannot be
resolved by simply taking our side. Bake that in properly.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:33:49 +02:00
Daniel Vetter
8b5d27b911 drm/i915: Remove intel_flip_work infrastructure
This gets rid of all the interactions between the legacy flip code and
the modeset code. Yay!

This highlights an ommission in the atomic paths, where we fail to
apply a boost to the pending rendering when we miss the target vblank.
But the existing code is still dead and can be removed.

v2: Note that the boosting doesn't work in atomic (Chris).

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170720175754.30751-7-daniel.vetter@ffwll.ch
2017-07-20 22:45:36 +02:00
Dave Airlie
2d62c799f8 Merge tag 'drm-intel-next-2017-07-17' of git://anongit.freedesktop.org/git/drm-intel into drm-next
2nd round of 4.14 features:

- prep for deferred fbdev setup
- refactor fixed 16.16 computations and skl+ wm code (Mahesh Kumar)
- more cnl paches (Rodrigo, Imre et al)
- tighten context cleanup and handling (Chris Wilson)
- fix interlaced handling on skl+ (Mahesh Kumar)
- small bits as usual

* tag 'drm-intel-next-2017-07-17' of git://anongit.freedesktop.org/git/drm-intel: (84 commits)
  drm/i915: Update DRIVER_DATE to 20170717
  drm/i915: Protect against deferred fbdev setup
  drm/i915/fbdev: Always forward hotplug events
  drm/i915/skl+: unify cpp value in WM calculation
  drm/i915/skl+: WM calculation don't require height
  drm/i915: Addition wrapper for fixed16.16 operation
  drm/i915: cleanup fixed-point wrappers naming
  drm/i915: Always perform internal fixed16 division in 64 bits
  drm/i915: take-out common clamping code of fixed16 wrappers
  drm/i915/cnl: Add missing type case.
  drm/i915/cnl: Add max allowed Cannonlake DC.
  drm/i915: Make DP-MST connector info work
  drm/i915/cnl: Get DDI clock based on PLLs.
  drm/i915/cnl: Inherit RPS stuff from previous platforms.
  drm/i915/cnl: Gen10 render context size.
  drm/i915/cnl: Don't trust VBT's alternate pin for port D for now.
  drm/i915: Fix the kernel panic when using aliasing ppgtt
  drm/i915/cnl: Cannonlake color init.
  drm/i915/cnl: Add force wake for gen10+.
  x86/gpu: CNL uses the same GMS values as SKL
  ...
2017-07-20 11:31:43 +10:00
Michal Hocko
dbb329561a drm/i915: use __GFP_RETRY_MAYFAIL
Commit 24f8e00a8a ("drm/i915: Prefer to report ENOMEM rather than
incur the oom for gfx allocations") has tried to remove disruptive OOM
killer because the userspace should be able to cope with allocation
failures.

At the time only __GFP_NORETRY could achieve that and it turned out that
this would fail the allocations just too easily.  So "drm/i915: Remove
__GFP_NORETRY from our buffer allocator" removed it and hoped for a
better solution.  __GFP_RETRY_MAYFAIL is that solution.  It will keep
retrying the allocation until there is no more progress and we would go
OOM.  Instead we fail the allocation and let the caller to deal with it.

Link: http://lkml.kernel.org/r/20170623085345.11304-6-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Alex Belits <alex.belits@cavium.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: NeilBrown <neilb@suse.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:04 -07:00
Chris Wilson
7b92c1bd05 drm/i915: Avoid keeping waitboost active for signaling threads
Once a client has requested a waitboost, we keep that waitboost active
until all clients are no longer waiting. This is because we don't
distinguish which waiter deserves the boost. However, with the advent of
fence signaling, the signaler threads appear as waiters to the RPS
interrupt handler. So instead of using a single boolean to track when to
keep the waitboost active, use a counter of all outstanding waitboosted
requests.

At this point, I have removed all vestiges of the rate limiting on
clients. Whilst this means that compositors should remain more fluid,
it also means that boosts are more prevalent. See commit b29c19b645
("drm/i915: Boost RPS frequency for CPU stalls") for a longer discussion
on the pros and cons of both approaches.

A drawback of this implementation is that it requires constant request
submission to keep the waitboost trimmed (as it is now cancelled when the
request is completed). This will be fine for a busy system, but near
idle the boosts may be kept for longer than desired (effectively tens of
vblanks worstcase) and there is a reliance on rc6 instead.

v2: Remove defunct rps.client_lock

Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170628123548.9236-1-chris@chris-wilson.co.uk
2017-06-28 15:23:12 +01:00
Chris Wilson
13f8458f9a drm/i915: Drop flushing of the object free list/worker from i915_gem_suspend
i915_gem_suspend() is called from all of our finalization paths
(suspend, hibernate, unload). i915_gem_drain_freed_objects() adds an
arbitrary delay as it uses an rcu_barrier() to ensure that there are no
more freed objects in flight, and this delay causes a large amount of
variability in suspend timings. For S3 suspend, we do not need to free
pages as doing so does not impact at all upon the system in its
suspended state, unlike S4 hibernation where we do want the hibernation
image to be as small as possible. Therefore we can forgo waiting inside
i915_gem_suspend(), so long as we ensure that we do cleanup before
unload (see i915_gem_load_cleanup()) and prefer to reap our objects
prior to hibernation (see i915_gem_freeze()).

Removing the rcu_barrier() from i915_gem_suspend() improves S3 latency
by about 30ms on Skylake (ymmv).

Reported-by: David Weinehall <david.weinehall@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170627173731.11566-1-chris@chris-wilson.co.uk
Tested-by: David Weinehall <david.weinehall@linux.intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
2017-06-28 12:15:20 +01:00
Chris Wilson
36703e79a9 drm/i915: Break modeset deadlocks on reset
Trying to do a modeset from within a reset is fraught with danger. We
can fall into a cyclic deadlock where the modeset is waiting on a
previous modeset that is waiting on a request, and since the GPU hung
that request completion is waiting on the reset. As modesetting doesn't
allow its locks to be broken and restarted, or for its *own* reset
mechanism to take over the display, we have to do something very
evil instead. If we detect that we are stuck waiting to prepare the
display reset (by using a very simple timeout), resort to cancelling all
in-flight requests and throwing the user data into /dev/null, which is
marginally better than the driver locking up and keeping that data to
itself.

This is not a fix; this is just a workaround that unbreaks machines
until we can resolve the deadlock in a way that doesn't lose data!

v2: Move the retirement from set-wegded to the i915_reset() error path,
after which we no longer any delayed worker cleanup for
i915_handle_error()
v3: C abuse for syntactic sugar
v4: Cover all waits with the timeout to catch more driver breakage

References: https://bugs.freedesktop.org/show_bug.cgi?id=99093
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170622105625.16952-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-06-23 13:41:55 +01:00
Chris Wilson
4ee056f418 drm/i915: Cancel pending execlist tasklet upon wedging
Highly unlikely, but if the stop_machine() did suspend the tasklet, we
want to make sure that when it wakes it finds there is nothing to do.
Otherwise, it will loudly complain that the ELSP port tracking no longer
matches the hardware, and we will be mightly confused.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170621124804.4529-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-06-21 15:15:33 +01:00
Michel Thierry
a1ef70e144 drm/i915: Add support for per engine reset recovery
This change implements support for per-engine reset as an initial, less
intrusive hang recovery option to be attempted before falling back to the
legacy full GPU reset recovery mode if necessary. This is only supported
from Gen8 onwards.

Hangchecker determines which engines are hung and invokes error handler to
recover from it. Error handler schedules recovery for each of those engines
that are hung. The recovery procedure is as follows,
 - identifies the request that caused the hang and it is dropped
 - force engine to idle: this is done by issuing a reset request
 - reset the engine
 - re-init the engine to resume submissions.

If engine reset fails then we fall back to heavy weight full gpu reset
which resets all engines and reinitiazes complete state of HW and SW.

v2: Rebase.
v3: s/*engine_reset*/*reset_engine*/; freeze engine and irqs before
calling i915_gem_reset_engine (Chris).
v4: Rebase, modify i915_gem_reset_prepare to use a ring mask and
reuse the function for reset_engine.
v5: intel_reset_engine_start/cancel instead of request/unrequest_reset.
v6: Clean up reset_engine function to not require mutex, i.e. no need to call
revoke/restore_fences and _retire_requests (Chris).
v7: Remove leftovers from v5, i.e. no need to disable irq, hold
forcewake or wakeup the handoff bit (Chris).
v8: engine_retire_requests should be (and it was) static; explain that
we have to re-init the engine after reset, which is why the init_hw call
is needed; check reset-in-progress flag (Chris).
v9: Rebase, include code to pass the active request to gem_reset_engine
(as it is already done in full reset). Remove unnecessary
intel_reset_engine_start/cancel, these are executed as part of the
reset.
v10: Rebase, use the right I915_RESET_ENGINE flag.
v11: Fixup to call reset_finish_engine even on error.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-6-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-6-chris@chris-wilson.co.uk
2017-06-20 21:00:16 +01:00
Michel Thierry
c64992e035 drm/i915: Look for active requests earlier in the reset path
And store the active request so that we only search for it once.

v2: Check for request completion inside _prepare_engine, don't use
ECANCELED, remove unnecessary null checks (Chris).

v3: Capture active requests during reset_prepare and store it the
engine hangcheck obj.

v4: Rename commit, change i915_gem_reset_request to just confirm the
active_request is still incomplete, instead of engine_stalled (Chris).

v5: With style; pass the active request to gem_reset_engine, keep single
return in reset_prepare_engine (Chris).

v6: Moved before reset-engine code appears (Chris)

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v5)
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-2-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-3-chris@chris-wilson.co.uk
2017-06-20 21:00:03 +01:00
Chris Wilson
829a0af29f drm/i915: Group all the global context information together
Create a substruct to hold all the global context state under
drm_i915_private.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-1-chris@chris-wilson.co.uk
2017-06-20 17:13:40 +01:00
Chris Wilson
7dd4f6729f drm/i915: Async GPU relocation processing
If the user requires patching of their batch or auxiliary buffers, we
currently make the alterations on the cpu. If they are active on the GPU
at the time, we wait under the struct_mutex for them to finish executing
before we rewrite the contents. This happens if shared relocation trees
are used between different contexts with separate address space (and the
buffers then have different addresses in each), the 3D state will need
to be adjusted between execution on each context. However, we don't need
to use the CPU to do the relocation patching, as we could queue commands
to the GPU to perform it and use fences to serialise the operation with
the current activity and future - so the operation on the GPU appears
just as atomic as performing it immediately. Performing the relocation
rewrites on the GPU is not free, in terms of pure throughput, the number
of relocations/s is about halved - but more importantly so is the time
under the struct_mutex.

v2: Break out the request/batch allocation for clearer error flow.
v3: A few asserts to ensure rq ordering is maintained

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-16 16:54:05 +01:00
Chris Wilson
8a2421bd0d drm/i915: Wait upon userptr get-user-pages within execbuffer
This simply hides the EAGAIN caused by userptr when userspace causes
resource contention. However, it is quite beneficial with highly
contended userptr users as we avoid repeating the setup costs and
kernel-user context switches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-06-16 16:54:05 +01:00
Chris Wilson
4ff4b44cbb drm/i915: Store a direct lookup from object handle to vma
The advent of full-ppgtt lead to an extra indirection between the object
and its binding. That extra indirection has a noticeable impact on how
fast we can convert from the user handles to our internal vma for
execbuffer. In order to bypass the extra indirection, we use a
resizable hashtable to jump from the object to the per-ctx vma.
rhashtable was considered but we don't need the online resizing feature
and the extra complexity proved to undermine its usefulness. Instead, we
simply reallocate the hastable on demand in a background task and
serialize it before iterating.

In non-full-ppgtt modes, multiple files and multiple contexts can share
the same vma. This leads to having multiple possible handle->vma links,
so we only use the first to establish the fast path. The majority of
buffers are not shared and so we should still be able to realise
speedups with multiple clients.

v2: Prettier names, more magic.
v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-16 16:54:04 +01:00
Chris Wilson
7fc92e96c3 drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirty
For ease of use (i.e. avoiding a few checks and function calls), store
the object's cache coherency next to the cache is dirty bit.

Specifically this patch aims to reduce the frequency of no-op calls to
i915_gem_object_clflush() to counter-act the increase of such calls for
GPU only objects in the previous patch.

v2: Replace cache_dirty & ~cache_coherent with cache_dirty &&
!cache_coherent as gcc generates much better code for the latter
(Tvrtko)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Dongwon Kim <dongwon.kim@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170616105455.16977-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-06-16 14:52:27 +01:00
Chris Wilson
e27ab73d17 drm/i915: Mark CPU cache as dirty on every transition for CPU writes
Currently, we only mark the CPU cache as dirty if we skip a clflush.
This leads to some confusion where we have to ask if the object is in
the write domain or missed a clflush. If we always mark the cache as
dirty, this becomes a much simply question to answer.

The goal remains to do as few clflushes as required and to do them as
late as possible, in the hope of deferring the work to a kthread and not
block the caller (e.g. execbuf, flips).

v2: Always call clflush before GPU execution when the cache_dirty flag
is set. This may cause some extra work on llc systems that migrate dirty
buffers back and forth - but we do try to limit that by only setting
cache_dirty at the end of the gpu sequence.

v3: Always mark the cache as dirty upon a level change, as we need to
invalidate any stale cachelines due to external writes.

Reported-by: Dongwon Kim <dongwon.kim@intel.com>
Fixes: a6a7cc4b7d ("drm/i915: Always flush the dirty CPU cache when pinning the scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Dongwon Kim <dongwon.kim@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615123850.26843-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-06-16 14:50:52 +01:00
Chris Wilson
0f6ab55d7a drm/i915: Only restrict noreclaim in the early shrink passes
In our first pass, we do not want to use reclaim at all as we want to
solely reap the i915 buffer caches (its purgeable pages). But we don't
mind it initiates IO or pulls via the FS (but it shouldn't anyway as we
say no to reclaim!). Just drop the GFP_IO constraint for simplicity.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-14 10:53:37 +01:00
Chris Wilson
eaf4180155 drm/i915: Remove __GFP_NORETRY from our buffer allocator
I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
struggles with handling reclaim of our dirty buffers and relies on
reclaim via kswapd. As a result, a single pass of direct reclaim is
unreliable when i915 occupies the majority of available memory, and the
only means of effectively waiting on kswapd to amke progress is by not
setting the __GFP_NORETRY flag and lopping. That leaves us with the
dilemma of invoking the oomkiller instead of propagating the allocation
failure back to userspace where it can be handled more gracefully (one
hopes).  In the future we may have __GFP_MAYFAIL to allow repeats up until
we genuinely run out of memory and the oomkiller would have been invoked.
Until then, let the oomkiller wreck havoc.

v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL
v3: Update comments that direct reclaim only appears to be ignoring our
dirty buffers!

Fixes: 24f8e00a8a ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Testcase: igt/gem_tiled_swapping
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Michal Hocko <mhocko@suse.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-14 10:53:09 +01:00
Chris Wilson
4846bf0ca8 drm/i915: Encourage our shrinker more when our shmemfs allocations fails
Commit 24f8e00a8a ("drm/i915: Prefer to report ENOMEM rather than
incur the oom for gfx allocations") made the bold decision to try and
avoid the oomkiller by reporting -ENOMEM to userspace if our allocation
failed after attempting to free enough buffer objects. In short, it
appears we were giving up too easily (even before we start wondering if
one pass of reclaim is as strong as we would like). Part of the problem
is that if we only shrink just enough pages for our expected allocation,
the likelihood of those pages becoming available to us is less than 100%
To counter-act that we ask for twice the number of pages to be made
available. Furthermore, we allow the shrinker to pull pages from the
active list in later passes.

v2: Be a little more cautious in paging out gfx buffers, and leave that
to a more balanced approach from shrink_slab(). Important when combined
with "drm/i915: Start writeback from the shrinker" as anything shrunk is
immediately swapped out and so should be more conservative.

Fixes: 24f8e00a8a ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-1-chris@chris-wilson.co.uk
2017-06-14 10:51:50 +01:00
Weinan Li
ff8f797557 drm/i915: return the correct usable aperture size under gvt environment
I915_GEM_GET_APERTURE ioctl is used to probe aperture size from userspace.
In gvt environment, each vm only use the ballooned part of aperture, so we
should return the correct available aperture size exclude the reserved part
by balloon.

v2: add 'reserved' in struct i915_address_space to record the reserved size
in ggtt (Chris)

v3: remain aper_size as total, adjust aper_available_size exclude reserved
and pinned. UMD driver need to adjust the max allocation size according to
the available aperture size but not total size. KMD return the correct
usable aperture size any time (Chris, Joonas)

v4: decrease reserved in deballoon (Joonas)

v5: add onion teardown in balloon, add vgt_deballoon_space (Joonas)

v6: change title name (Zhenyu)

v7: code style refine (Joonas)

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1496198152-14175-1-git-send-email-weinan.z.li@intel.com
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-06-02 14:28:46 +01:00
Chris Wilson
863e9fde1a drm/i915: Short-circuit i915_gem_wait_for_idle() if already idle
If the device is asleep (no GT wakeref), we know the GPU is already idle.
If we add an early return, we can avoid touching registers and checking
hw state outside of the assumed GT wakelock. This prevents causing such
errors whilst debugging:

[ 2613.401647] RPM wakelock ref not held during HW access
[ 2613.401684] ------------[ cut here ]------------
[ 2613.401720] WARNING: CPU: 5 PID: 7739 at drivers/gpu/drm/i915/intel_drv.h:1787 gen6_read32+0x21f/0x2b0 [i915]
[ 2613.401731] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek coretemp snd_hda_codec_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mii mei_me lpc_ich mei prime_numbers [last unloaded: i915]
[ 2613.401823] CPU: 5 PID: 7739 Comm: drv_missed_irq Tainted: G     U          4.12.0-rc2-CI-CI_DRM_421+ #1
[ 2613.401825] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
[ 2613.401840] task: ffff880409e3a740 task.stack: ffffc900084dc000
[ 2613.401861] RIP: 0010:gen6_read32+0x21f/0x2b0 [i915]
[ 2613.401863] RSP: 0018:ffffc900084dfce8 EFLAGS: 00010292
[ 2613.401869] RAX: 000000000000002a RBX: ffff8804016a8000 RCX: 0000000000000006
[ 2613.401871] RDX: 0000000000000006 RSI: ffffffff81cbf2d9 RDI: ffffffff81c9e3a7
[ 2613.401874] RBP: ffffc900084dfd18 R08: ffff880409e3afc8 R09: 0000000000000000
[ 2613.401877] R10: 000000008a1c483f R11: 0000000000000000 R12: 000000000000209c
[ 2613.401879] R13: 0000000000000001 R14: ffff8804016a8000 R15: ffff8804016ac150
[ 2613.401882] FS:  00007f39ef3dd8c0(0000) GS:ffff88041fb40000(0000) knlGS:0000000000000000
[ 2613.401885] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2613.401887] CR2: 00000000023717c8 CR3: 00000002e7b34000 CR4: 00000000001406e0
[ 2613.401889] Call Trace:
[ 2613.401912]  intel_engine_is_idle+0x76/0x90 [i915]
[ 2613.401931]  i915_gem_wait_for_idle+0xe6/0x1e0 [i915]
[ 2613.401951]  fault_irq_set+0x40/0x90 [i915]
[ 2613.401970]  i915_ring_test_irq_set+0x42/0x50 [i915]
[ 2613.401976]  simple_attr_write+0xc7/0xe0
[ 2613.401981]  full_proxy_write+0x4f/0x70
[ 2613.401987]  __vfs_write+0x23/0x120
[ 2613.401992]  ? rcu_read_lock_sched_held+0x75/0x80
[ 2613.401996]  ? rcu_sync_lockdep_assert+0x2a/0x50
[ 2613.401999]  ? __sb_start_write+0xfa/0x1f0
[ 2613.402004]  vfs_write+0xc5/0x1d0
[ 2613.402008]  ? trace_hardirqs_on_caller+0xe7/0x1c0
[ 2613.402013]  SyS_write+0x44/0xb0
[ 2613.402020]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 2613.402022] RIP: 0033:0x7f39eded6670
[ 2613.402025] RSP: 002b:00007fffdcdcb1a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 2613.402030] RAX: ffffffffffffffda RBX: ffffffff81470203 RCX: 00007f39eded6670
[ 2613.402033] RDX: 0000000000000001 RSI: 000000000041bc33 RDI: 0000000000000006
[ 2613.402036] RBP: ffffc900084dff88 R08: 00007f39ef3dd8c0 R09: 0000000000000001
[ 2613.402038] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000041bc33
[ 2613.402041] R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
[ 2613.402046]  ? __this_cpu_preempt_check+0x13/0x20
[ 2613.402052] Code: 01 9b fa e0 0f ff e9 28 fe ff ff 80 3d 6a dd 0e 00 00 0f 85 29 fe ff ff 48 c7 c7 48 19 29 a0 c6 05 56 dd 0e 00 01 e8 da 9a fa e0 <0f> ff e9 0f fe ff ff b9 01 00 00 00 ba 01 00 00 00 44 89 e6 48
[ 2613.402199] ---[ end trace 31f0cfa93ab632bf ]---

Fixes: 25112b64b3 ("drm/i915: Wait for all engines to be idle as part of i915_gem_wait_for_idle()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170530121334.17364-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-05-30 17:18:28 +01:00
Dave Airlie
a82256bc02 Merge tag 'drm-intel-next-2017-05-29' of git://anongit.freedesktop.org/git/drm-intel into drm-next
More stuff for 4.13:

- skl+ wm fixes from Mahesh Kumar
- some refactor and tests for i915_sw_fence (Chris)
- tune execlist/scheduler code (Chris)
- g4x,g33 gpu reset improvements (Chris, Mika)
- guc code cleanup (Michal Wajdeczko, Michał Winiarski)
- dp aux backlight improvements (Puthikorn Voravootivat)
- buffer based guc/host communication (Michal Wajdeczko)

* tag 'drm-intel-next-2017-05-29' of git://anongit.freedesktop.org/git/drm-intel: (253 commits)
  drm/i915: Update DRIVER_DATE to 20170529
  drm/i915: Keep the forcewake timer alive for 1ms past the most recent use
  drm/i915/guc: capture GuC logs if FW fails to load
  drm/i915/guc: Introduce buffer based cmd transport
  drm/i915/guc: Disable send function on fini
  drm: Add definition for eDP backlight frequency
  drm/i915: Drop AUX backlight enable check for backlight control
  drm/i915: Consolidate #ifdef CONFIG_INTEL_IOMMU
  drm/i915: Only GGTT vma may be pinned and prevent shrinking
  drm/i915: Serialize GTT/Aperture accesses on BXT
  drm/i915: Convert i915_gem_object_ops->flags values to use BIT()
  drm/i915/selftests: Silence compiler warning in igt_ctx_exec
  drm/i915/guc: Skip port assign on first iteration of GuC dequeue
  drm/i915: Remove misleading comment in request_alloc
  drm/i915/g33: Improve reset reliability
  Revert "drm/i915: Restore lost "Initialized i915" welcome message"
  drm/i915/huc: Update GLK HuC version
  drm/i915: Check for allocation failure
  drm/i915/guc: Remove action status and statistics from debugfs
  drm/i915/g4x: Improve gpu reset reliability
  ...
2017-05-30 15:25:28 +10:00
Chris Wilson
80debff8d9 drm/i915: Consolidate #ifdef CONFIG_INTEL_IOMMU
We depend on intel_iommu_gfx_mapped for various workarounds, but that is
only available under an #ifdef CONFIG_INTEL_IOMMU. Refactor all the
cut-and-paste ifdefs to a common routine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170525121612.2190-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-05-25 21:51:49 +01:00
Michal Hocko
2098105ec6 drm: drop drm_[cm]alloc* helpers
Now that drm_[cm]alloc* helpers are simple one line wrappers around
kvmalloc_array and drm_free_large is just kvfree alias we can drop
them and replace by their native forms.

This shouldn't introduce any functional change.

Changes since v1
- fix typo in drivers/gpu//drm/etnaviv/etnaviv_gem.c - noticed by 0day
  build robot

Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Michal Hocko <mhocko@suse.com>drm: drop drm_[cm]alloc* helpers
[danvet: Fixup vgem which grew another user very recently.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517122312.GK18247@dhcp22.suse.cz
2017-05-18 17:22:39 +02:00
Chris Wilson
c5cf9a9147 drm/i915: Create a kmem_cache to allocate struct i915_priolist from
The i915_priolist are allocated within an atomic context on a path where
we wish to minimise latency. If we use a dedicated kmem_cache, we have
the advantage of a local freelist from which to service new requests
that should keep the latency impact of an allocation small. Though
currently we expect the majority of requests to be at default priority
(and so hit the preallocate priolist), once userspace starts using
priorities they are likely to use many fine grained policies improving
the utilisation of a private slab.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-9-chris@chris-wilson.co.uk
2017-05-17 13:38:12 +01:00
Chris Wilson
6c067579e6 drm/i915: Split execlist priority queue into rbtree + linked list
All the requests at the same priority are executed in FIFO order. They
do not need to be stored in the rbtree themselves, as they are a simple
list within a level. If we move the requests at one priority into a list,
we can then reduce the rbtree to the set of priorities. This should keep
the height of the rbtree small, as the number of active priorities can not
exceed the number of active requests and should be typically only a few.

Currently, we have ~2k possible different priority levels, that may
increase to allow even more fine grained selection. Allocating those in
advance seems a waste (and may be impossible), so we opt for allocating
upon first use, and freeing after its requests are depleted. To avoid
the possibility of an allocation failure causing us to lose a request,
we preallocate the default priority (0) and bump any request to that
priority if we fail to allocate it the appropriate plist. Having a
request (that is ready to run, so not leading to corruption) execute
out-of-order is better than leaking the request (and its dependency
tree) entirely.

There should be a benefit to reducing execlists_dequeue() to principally
using a simple list (and reducing the frequency of both rbtree iteration
and balancing on erase) but for typical workloads, request coalescing
should be small enough that we don't notice any change. The main gain is
from improving PI calls to schedule, and the explicit list within a
level should make request unwinding simpler (we just need to insert at
the head of the list rather than the tail and not have to make the
rbtree search more complicated).

v2: Avoid use-after-free when deleting a depleted priolist

v3: Michał found the solution to handling the allocation failure
gracefully. If we disable all priority scheduling following the
allocation failure, those requests will be executed in fifo and we will
ensure that this request and its dependencies are in strict fifo (even
when it doesn't realise it is only a single list). Normal scheduling is
restored once we know the device is idle, until the next failure!
Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com>

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
2017-05-17 13:38:09 +01:00
Chris Wilson
77f0d0e925 drm/i915/execlists: Pack the count into the low bits of the port.request
add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187)
function                                     old     new   delta
execlists_submit_ports                       262     471    +209
port_assign.isra                               -     136    +136
capture                                     6344    6359     +15
reset_common_ring                            438     452     +14
execlists_submit_request                     228     238     +10
gen8_init_common_ring                        334     341      +7
intel_engine_is_idle                         106     105      -1
i915_engine_info                            2314    2290     -24
__i915_gem_set_wedged_BKL                    485     411     -74
intel_lrc_irq_handler                       1789    1604    -185
execlists_update_context                     294       -    -294

The most important change there is the improve to the
intel_lrc_irq_handler and excclist_submit_ports (net improvement since
execlists_update_context is now inlined).

v2: Use the port_api() for guc as well (even though currently we do not
pack any counters in there, yet) and hide all port->request_count inside
the helpers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-5-chris@chris-wilson.co.uk
2017-05-17 13:38:06 +01:00
Chris Wilson
0ce8178808 drm/i915: Redefine ptr_pack_bits() and friends
Rebrand the current (pointer | bits) pack/unpack utility macros as
explicit bit twiddling for PAGE_SIZE so that we can use the more
flexible underlying macros for different bits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-4-chris@chris-wilson.co.uk
2017-05-17 13:38:04 +01:00
Chris Wilson
991bfc64db drm/i915: Make ptr_unpack_bits() more function-like
ptr_unpack_bits() is a function-like macro, as such it is meant to be
replaceable by a function. In this case, we should be passing in the
out-param as a pointer.

Bizarrely this does affect code generation:

function                                     old     new   delta
i915_gem_object_pin_map                      409     389     -20

An improvement(?) in this case, but one can't help wonder what
strict-aliasing optimisations we are preventing.

The generated code looks identical in using ptr_unpack_bits (no extra
motions to stack, the pointer and bits appear to be kept in registers),
the difference appears to be code ordering and with a reorder it is able
to use smaller forward jumps.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-3-chris@chris-wilson.co.uk
2017-05-17 13:38:03 +01:00
Linus Torvalds
de4d195308 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes are:

   - Debloat RCU headers

   - Parallelize SRCU callback handling (plus overlapping patches)

   - Improve the performance of Tree SRCU on a CPU-hotplug stress test

   - Documentation updates

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
  rcu: Open-code the rcu_cblist_n_lazy_cbs() function
  rcu: Open-code the rcu_cblist_n_cbs() function
  rcu: Open-code the rcu_cblist_empty() function
  rcu: Separately compile large rcu_segcblist functions
  srcu: Debloat the <linux/rcu_segcblist.h> header
  srcu: Adjust default auto-expediting holdoff
  srcu: Specify auto-expedite holdoff time
  srcu: Expedite first synchronize_srcu() when idle
  srcu: Expedited grace periods with reduced memory contention
  srcu: Make rcutorture writer stalls print SRCU GP state
  srcu: Exact tracking of srcu_data structures containing callbacks
  srcu: Make SRCU be built by default
  srcu: Fix Kconfig botch when SRCU not selected
  rcu: Make non-preemptive schedule be Tasks RCU quiescent state
  srcu: Expedite srcu_schedule_cbs_snp() callback invocation
  srcu: Parallelize callback handling
  kvm: Move srcu_struct fields to end of struct kvm
  rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
  rcu: Use true/false in assignment to bool
  rcu: Use bool value directly
  ...
2017-05-10 10:30:46 -07:00
Chris Wilson
4797948071 drm/i915: Squash repeated awaits on the same fence
Track the latest fence waited upon on each context, and only add a new
asynchronous wait if the new fence is more recent than the recorded
fence for that context. This requires us to filter out unordered
timelines, which are noted by DMA_FENCE_NO_CONTEXT. However, in the
absence of a universal identifier, we have to use our own
i915->mm.unordered_timeline token.

v2: Throw around the debug crutches
v3: Inline the likely case of the pre-allocation cache being full.
v4: Drop the pre-allocation support, we can lose the most recent fence
in case of allocation failure -- it just means we may emit more awaits
than strictly necessary but will not break.
v5: Trim allocation size for leaf nodes, they only need an array of u32
not pointers.
v6: Create mock_timeline to tidy selftest writing
v7: s/intel_timeline_sync_get/intel_timeline_sync_is_later/ (Tvrtko)
v8: Prune the stale sync points when we idle.
v9: Include a small benchmark in the kselftests
v10: Separate the idr implementation into its own compartment. (Tvrkto)
v11: Refactor igt_sync kselftests to avoid deep nesting (Tvrkto)
v12: __sync_leaf_idx() to assert that p->height is 0 when checking leaves
v13: kselftests to investigate struct i915_syncmap itself (Tvrtko)
v14: Foray into ascii art graphs
v15: Take into account that the random lookup/insert does 2 prng calls,
not 1, when benchmarking, and use for_each_set_bit() (Tvrtko)
v16: Improved ascii art

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170503093924.5320-4-chris@chris-wilson.co.uk
2017-05-03 11:08:48 +01:00
Chris Wilson
9431282832 drm/i915: Mark up clflushes as belonging to an unordered timeline
2 clflushes on two different objects are not ordered, and so do not
belong to the same timeline (context). Either we use a unique context
for each, or we reserve a special global context to mean unordered.
Ideally, we would reserve 0 to mean unordered (DMA_FENCE_NO_CONTEXT) to
have the same semantics everywhere.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170503093924.5320-1-chris@chris-wilson.co.uk
2017-05-03 11:08:45 +01:00
Dave Airlie
73ba2d5c2b Merge tag 'drm-intel-next-fixes-2017-04-27' of git://anongit.freedesktop.org/git/drm-intel into drm-next
drm/i915 and gvt fixes for drm-next/v4.12

* tag 'drm-intel-next-fixes-2017-04-27' of git://anongit.freedesktop.org/git/drm-intel:
  drm/i915: Confirm the request is still active before adding it to the await
  drm/i915: Avoid busy-spinning on VLV_GLTC_PW_STATUS mmio
  drm/i915/selftests: Allocate inode/file dynamically
  drm/i915: Fix system hang with EI UP masked on Haswell
  drm/i915: checking for NULL instead of IS_ERR() in mock selftests
  drm/i915: Perform link quality check unconditionally during long pulse
  drm/i915: Fix use after free in lpe_audio_platdev_destroy()
  drm/i915: Use the right mapping_gfp_mask for final shmem allocation
  drm/i915: Make legacy cursor updates more unsynced
  drm/i915: Apply a cond_resched() to the saturated signaler
  drm/i915: Park the signaler before sleeping
  drm/i915/gvt: fix a bounds check in ring_id_to_context_switch_event()
  drm/i915/gvt: Fix PTE write flush for taking runtime pm properly
  drm/i915/gvt: remove some debug messages in scheduler timer handler
  drm/i915/gvt: add mmio init for virtual display
  drm/i915/gvt: use directly assignment for structure copying
  drm/i915/gvt: remove redundant ring id check which cause significant CPU misprediction
  drm/i915/gvt: remove redundant platform check for mocs load/restore
  drm/i915/gvt: Align render mmio list to cacheline
  drm/i915/gvt: cleanup some too chatty scheduler message
2017-04-29 05:50:27 +10:00
Joonas Lahtinen
ea117b8ddc drm/i915: Reset ILK during GEM sanitization
ILK should survive a reset without display corruption.

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-28 12:11:59 +03:00
Joonas Lahtinen
f2e4d76ec2 drm/i915: Eliminate HAS_HW_CONTEXTS
HAS_HW_CONTEXTS is misleading condition for GPU reset and CCID,
replace it with Gen specific (to be updated in next patches).

HAS_HW_CONTEXTS in i915_l3_write is bogus because each HAS_L3_DPF
match also has .has_hw_contexts = 1 set.

This leads to us being able to get rid of the property completely.

v2:
- Keep the checks at Gen6 for no functional change (Ville)

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
2017-04-28 12:11:59 +03:00
Chris Wilson
bdb57b8dca drm/i915: Use the right mapping_gfp_mask for final shmem allocation
Many sightings report the greater prevalence of allocation failures.
This is all due to the incorrect use of mapping_gfp_constraint(), so
remove it in favour of just querying the mapping_gfp_mask() which are
the exact gfp_t we wanted in the first place.

We still do expect a higher chance of reporting ENOMEM, as that is the
intention of using __GFP_NORETRY -- to fail rather than oom after having
reclaimed from our bo caches, and having done a direct|kswapd reclaim
pass.

Reported-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100594
Fixes: 24f8e00a8a ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170405221514.23251-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit b268d9fe0f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-04-26 16:28:08 +03:00
Ingo Molnar
58d30c36d4 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

 - Documentation updates.

 - Miscellaneous fixes.

 - Parallelize SRCU callback handling (plus overlapping patches).

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-23 11:12:44 +02:00
Dave Airlie
856ee92e86 Linux 4.11-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJY881cAAoJEHm+PkMAQRiGG4UH+wa2z6Qet36Uc4nXFZuSMYrO
 ErUWs1QpTDDv4a+LE4fgyMvM3j9XqtpfQLy1n70jfD14IqPBhHe4gytasAf+8lg1
 YvddFx0Yl3sygVu3dDBNigWeVDbfwepW59coN0vI5nrMo+wrei8aVIWcFKOxdMuO
 n72u9vuhrkEnLJuQk7SF+t4OQob9McXE3s7QgyRopmlKhKo7mh8On7K2BRI5uluL
 t0j5kZM0a43EUT5rq9xR8f5pgtyfTMG/FO2MuzZn43MJcZcyfmnOP/cTSIvAKA5U
 1i12lxlokYhURNUe+S6jm8A47TrqSRSJxaQJZRlfGJksZ0LJa8eUaLDCviBQEoE=
 =6QWZ
 -----END PGP SIGNATURE-----

Merge tag 'v4.11-rc7' into drm-next

Backmerge Linux 4.11-rc7 from Linus tree, to fix some
conflicts that were causing problems with the rerere cache
in drm-tip.
2017-04-19 11:07:14 +10:00
Paul E. McKenney
5f0d5a3ae7 mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU
A group of Linux kernel hackers reported chasing a bug that resulted
from their assumption that SLAB_DESTROY_BY_RCU provided an existence
guarantee, that is, that no block from such a slab would be reallocated
during an RCU read-side critical section.  Of course, that is not the
case.  Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
slab of blocks.

However, there is a phrase for this, namely "type safety".  This commit
therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
to avoid future instances of this sort of confusion.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
[ paulmck: Add comments mentioning the old name, as requested by Eric
  Dumazet, in order to help people familiar with the old name find
  the new one. ]
Acked-by: David Rientjes <rientjes@google.com>
2017-04-18 11:42:36 -07:00
Chris Wilson
e22d8e3c69 drm/i915: Treat WC a separate cache domain
When discussing a new WC mmap, we based the interface upon the
assumption that GTT was fully coherent. How naive! Commits 3b5724d702
("drm/i915: Wait for writes through the GTT to land before reading
back") and ed4596ea99 ("drm/i915/guc: WA to address the Ringbuffer
coherency issue") demonstrate that writes through the GTT are indeed
delayed and may be overtaken by direct WC access. To be safe, if
userspace is mixing WC mmaps with other potential GTT access (pwrite,
GTT mmaps) it should use set_domain(WC).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96563
Testcase: igt/gem_pwrite/small-gtt*
Testcase: igt/drv_selftest/coherency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170412110111.26626-2-chris@chris-wilson.co.uk
2017-04-12 12:35:17 +01:00
Chris Wilson
ef74921bc6 drm/i915: Combine write_domain flushes to a single function
In the next patch, we will introduce a new cache domain for
differentiating between GTT access and direct WC access. This will
require us to include WC in our write_domain flushes. Rather than
duplicate a third function, combine the existing two into one and
flushing WC writes will then be automatically handled as well.

v2: Be smarter and clearer by passing in the write domains to flush (Joonas)
v3: One missed ~ in v2 conversion

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170412110111.26626-1-chris@chris-wilson.co.uk
2017-04-12 12:35:16 +01:00
Chris Wilson
1d39f28170 drm/i915: Rename intel_engine_cs.exec_id to uabi_id
We want to refer to the index of the engine consistently throughout the
userspace ABI. We already have such an index through the execbuffer
engine specifier, that needs to be able to refer to each engine
specifically, so rename it the index to uabi_id to reflect its
generality beyond execbuf.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170411124306.15448-1-chris@chris-wilson.co.uk
2017-04-11 14:31:39 +01:00
Sagar Arun Kamble
63987bfebd drm/i915: Suspend GuC prior to GPU Reset during GEM suspend
i915 is currently doing a full GPU reset at the end of
i915_gem_suspend() followed by GuC suspend in i915_drm_suspend(). This
GPU reset clobbers the GuC, causing the suspend request to then fail,
leaving the GuC in an undefined state. We need to tell the GuC to
suspend before we do the direct intel_gpu_reset().

v2: Commit message update. (Chris, Daniele)

Fixes: 1c777c5d1d ("drm/i915/hsw: Fix GPU hang during resume from S3-devices state")
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1491387710-20553-1-git-send-email-sagar.a.kamble@intel.com
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit fd08923384)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-04-11 13:25:12 +03:00
Chris Wilson
f2be9d6833 drm/i915: Insert cond_resched() into i915_gem_free_objects
As we may have very many objects to free, check to see if the task needs
to be rescheduled whilst freeing them.

Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170407102552.5781-4-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-04-07 13:37:41 +01:00
Chris Wilson
5ad08be7e3 drm/i915: Break up long runs of freeing objects
Before freeing the next batch of objects from the worker, check if the
worker's timeslice has expired and if so, defer the next batch to the
next invocation of the worker.

Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170407102552.5781-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-04-07 13:37:41 +01:00
Joonas Lahtinen
e92075ff7d drm/i915: Simplify shrinker locking
By using the same structure for both interruptible and
uninterruptible locking in shrinker code, combined with the
information that mm.interruptible is only being written to, the
code can be greatly simplified.

Also removing the i915_gem_ prefix from the locking functions so
that nobody in their wildest dreams considers exporting them.

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1491562175-27680-1-git-send-email-joonas.lahtinen@linux.intel.com
2017-04-07 14:33:39 +03:00
Chris Wilson
17b93c40e7 drm/i915: Drain any freed objects prior to hibernation
As we call into the shrinker during freeze, we may have freed more
objects since we idled during i915_gem_suspend. Make sure we flush the
i915_gem_free_objects worker prior to saving the unwanted pages into the
hibernation image.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170407102552.5781-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-04-07 12:26:00 +01:00
Chris Wilson
d0aa301ae5 drm/i915: The shrinker already acquires struct_mutex, so call it unlocked
The shrinker is prepared to be called unlocked (and at other times with
struct_mutex held for DIRECT_RECLAIM) so we can skip acquiring the
struct_mutex prior to calling the shrinker during freeze. This improves
our ability to shrink as we can be more aggressive when we know the
caller isn't holding struct_mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170407102552.5781-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-04-07 12:25:11 +01:00
Chris Wilson
b268d9fe0f drm/i915: Use the right mapping_gfp_mask for final shmem allocation
Many sightings report the greater prevalence of allocation failures.
This is all due to the incorrect use of mapping_gfp_constraint(), so
remove it in favour of just querying the mapping_gfp_mask() which are
the exact gfp_t we wanted in the first place.

We still do expect a higher chance of reporting ENOMEM, as that is the
intention of using __GFP_NORETRY -- to fail rather than oom after having
reclaimed from our bo caches, and having done a direct|kswapd reclaim
pass.

Reported-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100594
Fixes: 24f8e00a8a ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170405221514.23251-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-04-06 13:02:02 +01:00
Sagar Arun Kamble
fd08923384 drm/i915: Suspend GuC prior to GPU Reset during GEM suspend
i915 is currently doing a full GPU reset at the end of
i915_gem_suspend() followed by GuC suspend in i915_drm_suspend(). This
GPU reset clobbers the GuC, causing the suspend request to then fail,
leaving the GuC in an undefined state. We need to tell the GuC to
suspend before we do the direct intel_gpu_reset().

v2: Commit message update. (Chris, Daniele)

Fixes: 1c777c5d1d ("drm/i915/hsw: Fix GPU hang during resume from S3-devices state")
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1491387710-20553-1-git-send-email-sagar.a.kamble@intel.com
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-06 11:10:01 +01:00
Tvrtko Ursulin
7a3ee5deb4 drm/i915: Remove user-triggerable WARN from i915_gem_object_create
Since this can be triggered by simply attempting a huge object,
a WARN_ON is not appropriate.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330163130.24141-1-tvrtko.ursulin@linux.intel.com
2017-04-04 09:39:13 +01:00
Chris Wilson
25112b64b3 drm/i915: Wait for all engines to be idle as part of i915_gem_wait_for_idle()
Make i915_gem_wait_for_idle() be a little heavier in order to try and
guarantee that the GPU is indeed idle (by checking each engine
individually is idle, i.e. all writes are complete and the rings
stopped) after waiting for in-flight requests to be completed.

v2: And return the final error.
v3: Break the wait_for() out from under the WARN -- the macro expansion
is hideous and unreadable in the warning message
v4: If wait_for_engine() fails the result is catastrophic, mark the
device as wedged and wait for the repair team.

References: https://bugs.freedesktop.org/show_bug.cgi?id=98836
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-4-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-31 12:07:17 +01:00
Chris Wilson
72022a705e drm/i915: Move retire-requests into i915_gem_wait_for_idle()
As we now distinguish everywhere that can call
i915_gem_retire_requests() following a successful wait_for_idle, we can
remove the duplication by moving that call into i915_gem_wait_for_idle()
itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-31 12:03:46 +01:00
Chris Wilson
8490ae207f drm/i915: Suppress busy status for engines if wedged
If the driver is wedged, HW state may be very inconsistent and
report that it is still busy, even though we have stopped using it. This
can lead to a double *ERROR* rather than a graceful cleanup after
wedging.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-2-chris@chris-wilson.co.uk
2017-03-30 17:58:44 +01:00
Chris Wilson
2c170af76c drm/i915: Do request retirement before marking engines as wedged
As we declare an engine as wedged, we mark all of its active requests as
in error. However, we don't want to mark successfully completed requests
as in error, which requires us to retire those requests first.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-1-chris@chris-wilson.co.uk
2017-03-30 17:56:21 +01:00
Oscar Mateo
b8991403ea drm/i915/guc: Take enable_guc_loading check out of GEM core code
The should happen as soon as possible, but always within the logic that
depends on it (and not interrupting the top-level driver control flow).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1490720027-23234-1-git-send-email-oscar.mateo@intel.com
2017-03-30 13:11:40 +03:00
Chris Wilson
e2a2aa36a5 drm/i915: Check we have an wake device before flushing GTT writes
We can assume that if the device is asleep then all pending GTT writes
will have been posted, and so we can defer the flush from
i915_gem_object_flush_gtt_write_domain()

[ 1957.462568] WARNING: CPU: 0 PID: 6132 at drivers/gpu/drm/i915/intel_drv.h:1742 fwtable_read32+0x123/0x150 [i915]
[ 1957.462582] RPM wakelock ref not held during HW access
[ 1957.462583] Modules linked in: i915 intel_gtt drm_kms_helper prime_numbers
[ 1957.462607] CPU: 0 PID: 6132 Comm: gem_concurrent_ Tainted: G     U          4.11.0-rc1+ #464
[ 1957.462619] Hardware name:                  /        , BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
[ 1957.462630] Call Trace:
[ 1957.462646]  dump_stack+0x4d/0x6f
[ 1957.462657]  __warn+0xc1/0xe0
[ 1957.462667]  warn_slowpath_fmt+0x4a/0x50
[ 1957.462709]  fwtable_read32+0x123/0x150 [i915]
[ 1957.462750]  i915_gem_object_flush_gtt_write_domain+0x43/0x70 [i915]
[ 1957.462791]  i915_gem_object_set_to_cpu_domain+0x46/0xa0 [i915]
[ 1957.462831]  i915_gem_set_domain_ioctl+0x15d/0x220 [i915]
[ 1957.462843]  drm_ioctl+0x1d7/0x440
[ 1957.462885]  ? i915_gem_obj_prepare_shmem_write+0x1d0/0x1d0 [i915]
[ 1957.462896]  ? pick_next_task_fair+0x436/0x440
[ 1957.462906]  ? mntput+0x1f/0x30
[ 1957.462915]  do_vfs_ioctl+0x8f/0x5c0
[ 1957.462925]  ? __schedule+0x16f/0x5f0
[ 1957.462935]  ? ____fput+0x9/0x10
[ 1957.462943]  SyS_ioctl+0x3c/0x70
[ 1957.462952]  entry_SYSCALL_64_fastpath+0x17/0x98
[ 1957.462961] RIP: 0033:0x7fc542179ca7
[ 1957.462968] RSP: 002b:00007ffeef12ff98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1957.462982] RAX: ffffffffffffffda RBX: 00007ffeef1301d0 RCX: 00007fc542179ca7
[ 1957.462990] RDX: 00007ffeef12ffd0 RSI: 00000000400c645f RDI: 0000000000000003
[ 1957.462999] RBP: 0000000000000003 R08: 000055f433bc7c40 R09: 000000000000002c
[ 1957.463006] R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000018
[ 1957.463015] R13: 000055f432c89d20 R14: 000055f432c87690 R15: 0000000000000000

Fixes: 3b5724d702 ("drm/i915: Wait for writes through the GTT to land before reading back")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170323150053.28582-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-27 12:48:44 +01:00
Oscar Mateo
3950bf3dbf drm/i915/guc: Add onion teardown to the GuC setup
Starting with intel_guc_loader, down to intel_guc_submission
and finally to intel_guc_log.

v2:
  - Null execbuf client outside guc_client_free (Daniele)
  - Assert if things try to get allocated twice (Daniele/Joonas)
  - Null guc->log.buf_addr when destroyed (Daniele)
  - Newline between returning success and error labels (Joonas)
  - Remove some unnecessary comments (Joonas)
  - Keep guc_log_create_extras naming convention (Joonas)
  - Helper function guc_log_has_extras (Joonas)
  - No need for separate relay_channel create/destroy. It's just another extra.
  - No need to nullify guc->log.flush_wq when destroyed (Joonas)
  - Hoist the check for has_extras out of guc_log_create_extras (Joonas)
  - Try to do i915_guc_log_register/unregister calls (kind of) symmetric (Daniele)
  - Make sure initel_guc_fini is not called before init is ever called (Daniele)

v3:
  - Remove unnecessary parenthesis (Joonas)
  - Check for logs enabled on debugfs registration
  - Rebase on top of Tvrtko's "Fix request re-submission after reset"

v4:
  - Rebased
  - Comment around enabling/disabling interrupts inside GuC logging (Joonas)

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-23 14:57:36 +02:00
Chris Wilson
40149f00fb drm/i915: Actually pass the reclaim gfp_t along to shmemfs!
Words cannot describe the embarrassment at creating a new gfp_t relaim to
only prevent the oomkiller but allow direct|kswapd reclaim, and then not
use it in the shmem_read_mapping_page_gfp().

Fixes: 24f8e00a8a ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170322223447.7493-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-23 09:30:20 +00:00
Chris Wilson
24f8e00a8a drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations
Since gfx allocations tend to be large, unmovable and disposable, report
the allocation failure back to userspace as an ENOMEM rather than incur
the oomkiller. We have already tried to make room by purging our own
cached gfx objects, and the oomkiller doesn't attribute ownership of gfx
objects so will likely pick the wrong candidate. Instead, let userspace
see the ENOMEM.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170322110521.29930-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-03-22 19:26:46 +00:00
Chris Wilson
54ec12af2f drm/i915: Skip force-wake for uncached mmio flush of GGTT writes
The trick of using an uncached mmio read to ensure that the GGTT writes
are flushed does not require us to do the forcewake dance, so avoid it
in the hope of reducing the frequency that we do keep the device forced
awake.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170318104257.694-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-20 10:45:49 +00:00
Chris Wilson
be062fa427 drm/i915: Initialise i915_gem_object_create_from_data() directly
Use pagecache_write to avoid shmemfs clearing the pages prior to us
immediately overwriting them with our data.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170317194648.12468-2-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2017-03-17 22:55:56 +00:00
Chris Wilson
ce8ff099c4 drm/i915: i915_gem_object_create_from_data() doesn't require struct_mutex
Both object creation and backing storage page allocation do not require
struct_mutex, so do not require the caller to take it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170317194648.12468-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2017-03-17 22:54:06 +00:00
Chris Wilson
2e8f9d3229 drm/i915: Restore engine->submit_request before unwedging
When we wedge the device, we override engine->submit_request with a nop
to ensure that all in-flight requests are marked in error. However, igt
would like to unwedge the device to test -EIO handling. This requires us
to flush those in-flight requests and restore the original
engine->submit_request.

v2: Use a vfunc to unify enabling request submission to engines
v3: Split new vfunc to a separate patch.
v4: Make the wait interruptible -- the third party fences we wait upon
may be indefinitely broken, so allow the reset to be aborted.

Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Testcase: igt/gem_eio
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v3
Link: http://patchwork.freedesktop.org/patch/msgid/20170316171305.12972-3-chris@chris-wilson.co.uk
2017-03-16 17:17:14 +00:00
Chris Wilson
8c185ecaf4 drm/i915: Split I915_RESET_IN_PROGRESS into two flags
I915_RESET_IN_PROGRESS is being used for both signaling the requirement
to i915_mutex_lock_interruptible() to avoid taking the struct_mutex and
to instruct a waiter (already holding the struct_mutex) to perform the
reset. To allow for a little more coordination, split these two meaning
into a couple of distinct flags. I915_RESET_BACKOFF tells
i915_mutex_lock_interruptible() not to acquire the mutex and
I915_RESET_HANDOFF tells the waiter to call i915_reset().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170316171305.12972-1-chris@chris-wilson.co.uk
2017-03-16 17:17:10 +00:00
Arkadiusz Hiler
6cd5a72c35 drm/i915/guc: Simplify intel_guc_init_hw()
Current version of intel_guc_init_hw() does a lot:
 - cares about submission
 - loads huc
 - implement WA

This change offloads some of the logic to intel_uc_init_hw(), which now
cares about the above.

v2: rename guc_hw_reset and fix typo in define name (M. Wajdeczko)
v3: rename once again
v4: remove spurious comments and add some style (J. Lahtinen)
v5: flow changes, got rid of dead checks (M. Wajdeczko)
v6: rebase
v7: rebase & onion teardown (J. Lahtinen)

Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-15 14:26:30 +02:00
Arkadiusz Hiler
882d1db09b drm/i915/uc: Rename intel_?uc_{setup, load}() to _init_hw()
GuC historically has two "startup" functions called _init() and _setup()

Then HuC came with it's _init() and _load().

This commit renames intel_guc_setup() and intel_huc_load() to
*uc_init_hw() as they called from the i915_gem_init_hw().

The aim is to be consistent in that entry points called during
particular driver init phases (e.g. init_hw) are all suffixed by that
phase. When reading the leaf functions, it should be clear at what stage
during the driver load it is called and therefore what operations are
legal at that point.

Also, since the functions start with intel_guc and intel_huc they take
appropiate structure.

v2: commit message update (Chris Wilson)
v3: change taken parameters to be more "semantic" (M. Wajdeczko)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-15 14:26:30 +02:00
Kenneth Graunke
0f5418e564 drm/i915: Drop support for I915_EXEC_CONSTANTS_* execbuf parameters.
This patch makes the I915_PARAM_HAS_EXEC_CONSTANTS getparam return 0
(indicating the optional feature is not supported), and makes execbuf
always return -EINVAL if the flags are used.

Apparently, no userspace ever shipped which used this optional feature:
I checked the git history of Mesa, xf86-video-intel, libva, and Beignet,
and there were zero commits showing a use of these flags.  Kernel commit
72bfa19c8d apparently introduced the feature prematurely.  According
to Chris, the intention was to use this in cairo-drm, but "the use was
broken for gen6", so I don't think it ever happened.

'relative_constants_mode' has always been tracked per-device, but this
has actually been wrong ever since hardware contexts were introduced, as
the INSTPM register is saved (and automatically restored) as part of the
render ring context. The software per-device value could therefore get
out of sync with the hardware per-context value.  This meant that using
them is actually unsafe: a client which tried to use them could damage
the state of other clients, causing the GPU to interpret their BO
offsets as absolute pointers, leading to bogus memory reads.

These flags were also never ported to execlist mode, making them no-ops
on Gen9+ (which requires execlists), and Gen8 in the default mode.

On Gen8+, userspace can write these registers directly, achieving the
same effect.  On Gen6-7.5, it likely makes sense to extend the command
parser to support them.  I don't think anyone wants this on Gen4-5.

Based on a patch by Dave Gordon.

v3: Return -ENODEV for the getparam, as this is what we do for other
    obsolete features.  Suggested by Chris Wilson.

Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92448
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215093446.21291-1-kenneth@whitecape.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170313170433.26843-1-chris@chris-wilson.co.uk
(cherry picked from commit ef0f411f51)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-03-14 12:28:25 +02:00
Chris Wilson
4565bf58d4 drm/i915: Disable engine->irq_tasklet around resets
When we restart the engines, and we have active requests, a request on
the first engine may complete and queue a request to the second engine
before we try to restart the second engine. That queueing of the
request may race with the engine to restart, and so may corrupt the
current state. Disabling the engine->irq_tasklet prevents the two paths
from writing into ELSP simultaneously (and modifyin the execlists_port[]
at the same time).

Include fixup 1d309634bc ("drm/i915: Kill the tasklet then disable")

Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Testcase: igt/gem_exec_fence/await-hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-3-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/20170313165958.13970-2-chris@chris-wilson.co.uk
(cherry picked from commit 1f7b847d72)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-03-14 12:26:44 +02:00
Chris Wilson
da9a796f54 drm/i915: Split GEM resetting into 3 phases
Currently we do a reset prepare/finish around the call to reset the GPU,
but it looks like we need a later stage after the hw has been
reinitialised to allow GEM to restart itself. Start by splitting the 2
GEM phases into 3:

  prepare - before the reset, check if GEM recovered, then stop GEM

  reset - after the reset, update GEM bookkeeping

  finish - after the re-initialisation following the reset, restart GEM

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-2-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/20170313165958.13970-1-chris@chris-wilson.co.uk
(cherry picked from commit d802709313)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-03-14 12:11:49 +02:00
Chris Wilson
7f5f95d8ac drm/i915: Move whole object to CPU domain for coherent shmem access
If the object is coherent, we can simply update the cache domain on the
whole object rather than calculate the before/after clflushes. The
advantage is that we then get correct tracking of ellided flushes when
changing coherency later.

Testcase: igt/gem_pwrite_snooped
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170310000942.11661-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-13 11:16:09 +00:00
Chris Wilson
4e6fdafa7a drm/i915: Use pagecache write to prepopulate shmemfs from pwrite-ioctl
Before we instantiate/pin the backing store for our use, we
can prepopulate the shmemfs filp efficiently using a write into the
pagecache. We avoid the penalty of instantiating all the pages, important
if the user is just writing to a few and never uses the object on the GPU,
and using a direct write into shmemfs allows it to avoid the cost of
retrieving a page (mostly the clear-before-use, but in theory we could
curtail swapin) before it is overwritten.

This can be extended later to provide additional specialisation for
other backends (other than shmemfs). For now it provides a defense
against very large write-only allocations from exhausting all of system
memory.

v2: Smelling fixes.

Fixes: fe115628d5 ("drm/i915: Implement pwrite without struct-mutex")
References: https://bugs.freedesktop.org/show_bug.cgi?id=99107
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170307120338.7277-2-chris@chris-wilson.co.uk
(cherry picked from commit 7c55e2c577)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-03-09 10:46:07 +02:00
Chris Wilson
0d9dc306e1 drm/i915: Store a permanent error in obj->mm.pages
Once the object has been truncated, it is unrecoverable. To facilitate
detection of this state store the error in obj->mm.pages.

This is required for the next patch which should be applied to v4.10
(via stable), so we also need to mark this patch for backporting. In
that regard, let's consider this to be a fix/improvement too.

v2: Avoid dereferencing the ERR_PTR when freeing the object.

Fixes: 1233e2db19 ("drm/i915: Move object backing storage manipulation to its own locking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Link: http://patchwork.freedesktop.org/patch/msgid/20170307132031.32461-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit 4e5462ee84)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-03-09 10:45:58 +02:00
Chris Wilson
89cf83d4e0 drm/i915: Squelch any ktime/jiffie rounding errors for wait-ioctl
We wait upon jiffies, but report the time elapsed using a
high-resolution timer. This discrepancy can lead to us timing out the
wait prior to us reporting the elapsed time as complete.

This restores the squelching lost in commit e95433c73a ("drm/i915:
Rearrange i915_wait_request() accounting with callers").

Fixes: e95433c73a ("drm/i915: Rearrange i915_wait_request() accounting with callers")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170216125441.30923-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit c1d2061b28)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-03-09 10:42:44 +02:00
Chris Wilson
03d1cac6ee drm/i915: Avoiding recursing on ww_mutex inside shrinker
We have to avoid taking ww_mutex inside the shrinker as we use it as a
plain mutex type and so need to avoid recursive deadlocks:

[  602.771969] =================================
[  602.771970] [ INFO: inconsistent lock state ]
[  602.771973] 4.10.0gpudebug+ #122 Not tainted
[  602.771974] ---------------------------------
[  602.771975] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[  602.771978] kswapd0/40 [HC0[0]:SC0[0]:HE1:SE1] takes:
[  602.771979]  (reservation_ww_class_mutex){+.+.?.}, at: [<ffffffffa054680a>] i915_gem_object_wait+0x39a/0x410 [i915]
[  602.772020] {RECLAIM_FS-ON-W} state was registered at:
[  602.772024]   mark_held_locks+0x76/0x90
[  602.772026]   lockdep_trace_alloc+0xb8/0xc0
[  602.772028]   __kmalloc_track_caller+0x5d/0x130
[  602.772031]   krealloc+0x89/0xb0
[  602.772033]   reservation_object_reserve_shared+0xaf/0xd0
[  602.772055]   i915_gem_do_execbuffer.isra.35+0x1413/0x18b0 [i915]
[  602.772075]   i915_gem_execbuffer2+0x10e/0x1d0 [i915]
[  602.772078]   drm_ioctl+0x291/0x480
[  602.772079]   do_vfs_ioctl+0x695/0x6f0
[  602.772081]   SyS_ioctl+0x3c/0x70
[  602.772084]   entry_SYSCALL_64_fastpath+0x18/0xad
[  602.772085] irq event stamp: 5197423
[  602.772088] hardirqs last  enabled at (5197423): [<ffffffff8116751d>] kfree+0xdd/0x170
[  602.772091] hardirqs last disabled at (5197422): [<ffffffff811674f9>] kfree+0xb9/0x170
[  602.772095] softirqs last  enabled at (5190992): [<ffffffff8107bfe1>] __do_softirq+0x221/0x280
[  602.772097] softirqs last disabled at (5190575): [<ffffffff8107c294>] irq_exit+0x64/0xc0
[  602.772099]
               other info that might help us debug this:
[  602.772100]  Possible unsafe locking scenario:

[  602.772101]        CPU0
[  602.772101]        ----
[  602.772102]   lock(reservation_ww_class_mutex);
[  602.772104]   <Interrupt>
[  602.772105]     lock(reservation_ww_class_mutex);
[  602.772107]
                *** DEADLOCK ***

[  602.772109] 2 locks held by kswapd0/40:
[  602.772110]  #0:  (shrinker_rwsem){++++..}, at: [<ffffffff811337b5>] shrink_slab.constprop.62+0x35/0x280
[  602.772116]  #1:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffffa0553957>] i915_gem_shrinker_lock+0x27/0x60 [i915]
[  602.772141]
               stack backtrace:
[  602.772144] CPU: 2 PID: 40 Comm: kswapd0 Not tainted 4.10.0gpudebug+ #122
[  602.772145] Hardware name: LENOVO 42433ZG/42433ZG, BIOS 8AET64WW (1.44 ) 07/26/2013
[  602.772147] Call Trace:
[  602.772151]  dump_stack+0x68/0xa1
[  602.772153]  print_usage_bug+0x1d4/0x1f0
[  602.772155]  mark_lock+0x390/0x530
[  602.772157]  ? print_irq_inversion_bug+0x200/0x200
[  602.772159]  __lock_acquire+0x405/0x1260
[  602.772181]  ? i915_gem_object_wait+0x39a/0x410 [i915]
[  602.772183]  lock_acquire+0x60/0x80
[  602.772205]  ? i915_gem_object_wait+0x39a/0x410 [i915]
[  602.772207]  mutex_lock_nested+0x69/0x760
[  602.772229]  ? i915_gem_object_wait+0x39a/0x410 [i915]
[  602.772231]  ? kfree+0xdd/0x170
[  602.772253]  ? i915_gem_object_wait+0x163/0x410 [i915]
[  602.772255]  ? trace_hardirqs_on_caller+0x18d/0x1c0
[  602.772256]  ? trace_hardirqs_on+0xd/0x10
[  602.772278]  i915_gem_object_wait+0x39a/0x410 [i915]
[  602.772300]  i915_gem_object_unbind+0x5e/0x130 [i915]
[  602.772323]  i915_gem_shrink+0x22d/0x3d0 [i915]
[  602.772347]  i915_gem_shrinker_scan+0x3f/0x80 [i915]
[  602.772349]  shrink_slab.constprop.62+0x1ad/0x280
[  602.772352]  shrink_node+0x52/0x80
[  602.772355]  kswapd+0x427/0x5c0
[  602.772358]  kthread+0x122/0x130
[  602.772360]  ? try_to_free_pages+0x270/0x270
[  602.772362]  ? kthread_stop+0x70/0x70
[  602.772365]  ret_from_fork+0x2e/0x40

v2: Add commentary about the pruning being opportunistic

Reported-by: Jan Nordholz <jckn@gmx.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99977#c10
Fixes: e54ca97747 ("drm/i915: Remove completed fences after a wait")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170308132629.7987-1-chris@chris-wilson.co.uk
2017-03-08 20:42:17 +00:00
Daniel Vetter
7ffe939dd9 Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued
Backmerge drm-next to get at all the good stuff in drm-misc. We need
that because:

- drm_connector_list_iter conversion for i915 needs the core patches.
- Maarten's patches to use the new atomic state iterators also need
  the core patches.
- We need the new link status property to complete the DP retraining
  work, merging through 2 branches wasn't a good idea and we had to
  partially backtrack.
- Chris needs reservation_object_trylock and we want to roll out
  kref_read everywhere.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2017-03-08 10:54:45 +01:00
Dave Airlie
2e16101780 Merge tag 'drm-intel-next-2017-03-06' of git://anongit.freedesktop.org/git/drm-intel into drm-next
4 weeks worth of stuff since I was traveling&lazy:

- lspcon improvements (Imre)
- proper atomic state for cdclk handling (Ville)
- gpu reset improvements (Chris)
- lots and lots of polish around fences, requests, waiting and
  everything related all over (both gem and modeset code), from Chris
- atomic by default on gen5+ minus byt/bsw (Maarten did the patch to
  flip the default, really this is a massive joint team effort)
- moar power domains, now 64bit (Ander)
- big pile of in-kernel unit tests for various gem subsystems (Chris),
  including simple mock objects for i915 device and and the ggtt
  manager.
- i915_gpu_info in debugfs, for taking a snapshot of the current gpu
  state. Same thing as i915_error_state, but useful if the kernel didn't
  notice something is stick. From Chris.
- bxt dsi fixes (Umar Shankar)
- bxt w/a updates (Jani)
- no more struct_mutex for gem object unreference (Chris)
- some execlist refactoring (Tvrtko)
- color manager support for glk (Ander)
- improve the power-well sync code to better take over from the
  firmware (Imre)
- gem tracepoint polish (Tvrtko)
- lots of glk fixes all around (Ander)
- ctx switch improvements (Chris)
- glk dsi support&fixes (Deepak M)
- dsi fixes for vlv and clanups, lots of them (Hans de Goede)
- switch to i915.ko types in lots of our internal modeset code (Ander)
- byt/bsw atomic wm update code, yay (Ville)

* tag 'drm-intel-next-2017-03-06' of git://anongit.freedesktop.org/git/drm-intel: (432 commits)
  drm/i915: Update DRIVER_DATE to 20170306
  drm/i915: Don't use enums for hardware engine id
  drm/i915: Split breadcrumbs spinlock into two
  drm/i915: Refactor wakeup of the next breadcrumb waiter
  drm/i915: Take reference for signaling the request from hardirq
  drm/i915: Add FIFO underrun tracepoints
  drm/i915: Add cxsr toggle tracepoint
  drm/i915: Add VLV/CHV watermark/FIFO programming tracepoints
  drm/i915: Add plane update/disable tracepoints
  drm/i915: Kill level 0 wm hack for VLV/CHV
  drm/i915: Workaround VLV/CHV sprite1->sprite0 enable underrun
  drm/i915: Sanitize VLV/CHV watermarks properly
  drm/i915: Only use update_wm_{pre,post} for pre-ilk platforms
  drm/i915: Nuke crtc->wm.cxsr_allowed
  drm/i915: Compute proper intermediate wms for vlv/cvh
  drm/i915: Skip useless watermark/FIFO related work on VLV/CHV when not needed
  drm/i915: Compute vlv/chv wms the atomic way
  drm/i915: Compute VLV/CHV FIFO sizes based on the PM2 watermarks
  drm/i915: Plop vlv/chv fifo sizes into crtc state
  drm/i915: Plop vlv wm state into crtc_state
  ...
2017-03-08 12:41:47 +10:00
Chris Wilson
7c55e2c577 drm/i915: Use pagecache write to prepopulate shmemfs from pwrite-ioctl
Before we instantiate/pin the backing store for our use, we
can prepopulate the shmemfs filp efficiently using a write into the
pagecache. We avoid the penalty of instantiating all the pages, important
if the user is just writing to a few and never uses the object on the GPU,
and using a direct write into shmemfs allows it to avoid the cost of
retrieving a page (mostly the clear-before-use, but in theory we could
curtail swapin) before it is overwritten.

This can be extended later to provide additional specialisation for
other backends (other than shmemfs). For now it provides a defense
against very large write-only allocations from exhausting all of system
memory.

v2: Smelling fixes.

Fixes: fe115628d5 ("drm/i915: Implement pwrite without struct-mutex")
References: https://bugs.freedesktop.org/show_bug.cgi?id=99107
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170307120338.7277-2-chris@chris-wilson.co.uk
2017-03-07 21:26:59 +00:00
Chris Wilson
4e5462ee84 drm/i915: Store a permanent error in obj->mm.pages
Once the object has been truncated, it is unrecoverable. To facilitate
detection of this state store the error in obj->mm.pages.

This is required for the next patch which should be applied to v4.10
(via stable), so we also need to mark this patch for backporting. In
that regard, let's consider this to be a fix/improvement too.

v2: Avoid dereferencing the ERR_PTR when freeing the object.

Fixes: 1233e2db19 ("drm/i915: Move object backing storage manipulation to its own locking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Link: http://patchwork.freedesktop.org/patch/msgid/20170307132031.32461-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-07 21:25:38 +00:00
Chris Wilson
0542524944 drm/i915: Generalise wait for execlists to be idle
The code to check for execlists completion is generic, so move it to
intel_engine_cs.c, where we can reuse the new intel_engine_is_idle().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170303121947.20482-2-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-03 13:08:15 +00:00
Chris Wilson
c8659efac5 drm/i915: Drop spinlocks around adding to the client request list
Adding to the tail of the client request list as the only other user is
in the throttle ioctl that iterates forwards over the list. It only
needs protection against deletion of a request as it reads it, it simply
won't see a new request added to the end of the list, or it would be too
early and rejected. We can further reduce the number of spinlocks
required when throttling by removing stale requests from the client_list
as we throttle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170302122525.19675-1-chris@chris-wilson.co.uk
2017-03-02 22:33:41 +00:00
Chris Wilson
c998e8a0f4 drm/i915: Hold rpm during GEM suspend in driver unload/suspend
i915_gem_suspend() tries to access the device to ensure it is idle and
all writes from the device are flushed to memory. It assumed is already
held the runtime pm wakeref, but we should explicitly acquire it for our
access to be safe.

[  619.926287] WARNING: CPU: 3 PID: 9353 at drivers/gpu/drm/i915/intel_drv.h:1750 gen6_write32+0x23e/0x2a0 [i915]
[  619.926300] RPM wakelock ref not held during HW access
[  619.926311] Modules linked in: vgem x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_codec coretemp snd_hwdep crct10dif_pclmul snd_hda_core crc32_pclmul snd_pcm mei_me mei lpc_ich ghash_clmulni_intel i915(-) sdhci_pci sdhci mmc_core e1000e ptp pps_core prime_numbers [last unloaded: snd_hda_intel]
[  619.926578] CPU: 3 PID: 9353 Comm: drv_module_relo Tainted: G     U          4.10.0-CI-Trybot_609+ #1
[  619.926585] Hardware name: LENOVO 42962WU/42962WU, BIOS 8DET56WW (1.26 ) 12/01/2011
[  619.926592] Call Trace:
[  619.926609]  dump_stack+0x67/0x92
[  619.926625]  __warn+0xc6/0xe0
[  619.926640]  warn_slowpath_fmt+0x4a/0x50
[  619.926726]  gen6_write32+0x23e/0x2a0 [i915]
[  619.926801]  gen6_mm_switch+0x38/0x70 [i915]
[  619.926871]  i915_switch_context+0xec/0xa10 [i915]
[  619.926942]  i915_gem_switch_to_kernel_context+0x13c/0x2b0 [i915]
[  619.927019]  i915_gem_suspend+0x2b/0x180 [i915]
[  619.927079]  i915_driver_unload+0x22/0x200 [i915]
[  619.927093]  ? __this_cpu_preempt_check+0x13/0x20
[  619.927105]  ? trace_hardirqs_on_caller+0xe7/0x200
[  619.927118]  ? trace_hardirqs_on+0xd/0x10
[  619.927128]  ? _raw_spin_unlock_irqrestore+0x3d/0x60
[  619.927192]  i915_pci_remove+0x14/0x20 [i915]
[  619.927205]  pci_device_remove+0x34/0xb0
[  619.927219]  device_release_driver_internal+0x158/0x210
[  619.927234]  driver_detach+0x3b/0x80
[  619.927245]  bus_remove_driver+0x53/0xd0
[  619.927256]  driver_unregister+0x27/0x50
[  619.927267]  pci_unregister_driver+0x25/0xa0
[  619.927351]  i915_exit+0x1a/0xb1a [i915]
[  619.927362]  SyS_delete_module+0x193/0x1e0
[  619.927378]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  619.927386] RIP: 0033:0x7f82b46c5d37
[  619.927393] RSP: 002b:00007ffdb6f610d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
[  619.927408] RAX: ffffffffffffffda RBX: ffffffff81481ff3 RCX: 00007f82b46c5d37
[  619.927415] RDX: 0000000000000001 RSI: 0000000000000800 RDI: 000000000224f558
[  619.927422] RBP: ffffc90001187f88 R08: 0000000000000000 R09: 00007ffdb6f61100
[  619.927428] R10: 000000000224f4e0 R11: 0000000000000246 R12: 0000000000000000
[  619.927435] R13: 00007ffdb6f612b0 R14: 0000000000000000 R15: 0000000000000000
[  619.927451]  ? __this_cpu_preempt_check+0x13/0x20

or

[  641.646590] WARNING: CPU: 1 PID: 8913 at drivers/gpu/drm/i915/intel_drv.h:1750 intel_runtime_pm_get_noresume+0x8b/0x90 [i915]
[  641.646595] RPM wakelock ref not held during HW access
[  641.646600] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec snd_hwdep crct10dif_pclmul snd_hda_core crc32_pclmul ghash_clmulni_intel snd_pcm mei_me mei i915(-) r8169 mii prime_numbers i2c_hid [last unloaded: snd_hda_intel]
[  641.646825] CPU: 1 PID: 8913 Comm: drv_module_relo Tainted: G     U          4.10.0-CI-Trybot_609+ #1
[  641.646836] Hardware name: TOSHIBA SATELLITE P50-C/06F4                            , BIOS 1.20 10/08/2015
[  641.646843] Call Trace:
[  641.646857]  dump_stack+0x67/0x92
[  641.646869]  __warn+0xc6/0xe0
[  641.646880]  warn_slowpath_fmt+0x4a/0x50
[  641.646893]  ? __this_cpu_preempt_check+0x13/0x20
[  641.646904]  ? trace_hardirqs_on_caller+0xe7/0x200
[  641.646957]  intel_runtime_pm_get_noresume+0x8b/0x90 [i915]
[  641.647022]  __i915_add_request+0x423/0x540 [i915]
[  641.647080]  i915_gem_switch_to_kernel_context+0x148/0x2b0 [i915]
[  641.647145]  i915_gem_suspend+0x2b/0x180 [i915]
[  641.647189]  i915_driver_unload+0x22/0x200 [i915]
[  641.647200]  ? __this_cpu_preempt_check+0x13/0x20
[  641.647210]  ? trace_hardirqs_on_caller+0xe7/0x200
[  641.647220]  ? trace_hardirqs_on+0xd/0x10
[  641.647231]  ? _raw_spin_unlock_irqrestore+0x3d/0x60
[  641.647276]  i915_pci_remove+0x14/0x20 [i915]
[  641.647293]  pci_device_remove+0x34/0xb0
[  641.647307]  device_release_driver_internal+0x158/0x210
[  641.647321]  driver_detach+0x3b/0x80
[  641.647330]  bus_remove_driver+0x53/0xd0
[  641.647338]  driver_unregister+0x27/0x50
[  641.647348]  pci_unregister_driver+0x25/0xa0
[  641.647415]  i915_exit+0x1a/0xb1a [i915]
[  641.647429]  SyS_delete_module+0x193/0x1e0
[  641.647444]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  641.647453] RIP: 0033:0x7fc622bd2d37
[  641.647463] RSP: 002b:00007ffff8ffb5c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
[  641.647475] RAX: ffffffffffffffda RBX: ffffffff81481ff3 RCX: 00007fc622bd2d37
[  641.647480] RDX: 0000000000000001 RSI: 0000000000000800 RDI: 0000000000d49118
[  641.647485] RBP: ffffc90000997f88 R08: 0000000000000000 R09: 00007ffff8ffb5f0
[  641.647491] R10: 0000000000d490a0 R11: 0000000000000246 R12: 0000000000000000
[  641.647498] R13: 00007ffff8ffb7a0 R14: 0000000000000000 R15: 0000000000000000
[  641.647510]  ? __this_cpu_preempt_check+0x13/0x20

v2: Keep holding rpm until the end to cover i915_gem_sanitize() as well.

Fixes: 5ab57c7020 ("drm/i915: Flush logical context image out to memory upon suspend")
Fixes: 1c777c5d1d ("drm/i915/hsw: Fix GPU hang during resume from S3-devices state")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170302083029.19576-1-chris@chris-wilson.co.uk
Reviewed-by: Imre Deak <imre.deak@intel.com>
Cc: <stable@vger.kernel.org> # v4.9+
2017-03-02 12:44:08 +00:00
Chris Wilson
67b807a892 drm/i915: Delay disabling the user interrupt for breadcrumbs
A significant cost in setting up a wait is the overhead of enabling the
interrupt. As we disable the interrupt whenever the queue of waiters is
empty, if we are frequently waiting on alternating batches, we end up
re-enabling the interrupt on a frequent basis. We do want to disable the
interrupt during normal operations as under high load it may add several
thousand interrupts/s - we have been known in the past to occupy whole
cores with our interrupt handler after accidentally leaving user
interrupts enabled. As a compromise, leave the interrupt enabled until
the next IRQ, or the system is idle. This gives a small window for a
waiter to keep the interrupt active and not be delayed by having to
re-enable the interrupt.

v2: Restore hangcheck/missed-irq detection for continuations
v3: Be more careful restoring the hangcheck timer after reset
v4: Be more careful restoring the fake irq after reset (if required!)
v5: Redo changes to intel_engine_wakeup()
v6: Factor out __intel_engine_wakeup()
v7: Improve commentary for declaring a missed wakeup

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170227205850.2828-4-chris@chris-wilson.co.uk
2017-02-27 21:57:23 +00:00
Dave Jiang
11bac80004 mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.

[arnd@arndb.de: fix ARM build]
  Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Kenneth Graunke
ef0f411f51 drm/i915: Drop support for I915_EXEC_CONSTANTS_* execbuf parameters.
This patch makes the I915_PARAM_HAS_EXEC_CONSTANTS getparam return 0
(indicating the optional feature is not supported), and makes execbuf
always return -EINVAL if the flags are used.

Apparently, no userspace ever shipped which used this optional feature:
I checked the git history of Mesa, xf86-video-intel, libva, and Beignet,
and there were zero commits showing a use of these flags.  Kernel commit
72bfa19c8d apparently introduced the feature prematurely.  According
to Chris, the intention was to use this in cairo-drm, but "the use was
broken for gen6", so I don't think it ever happened.

'relative_constants_mode' has always been tracked per-device, but this
has actually been wrong ever since hardware contexts were introduced, as
the INSTPM register is saved (and automatically restored) as part of the
render ring context. The software per-device value could therefore get
out of sync with the hardware per-context value.  This meant that using
them is actually unsafe: a client which tried to use them could damage
the state of other clients, causing the GPU to interpret their BO
offsets as absolute pointers, leading to bogus memory reads.

These flags were also never ported to execlist mode, making them no-ops
on Gen9+ (which requires execlists), and Gen8 in the default mode.

On Gen8+, userspace can write these registers directly, achieving the
same effect.  On Gen6-7.5, it likely makes sense to extend the command
parser to support them.  I don't think anyone wants this on Gen4-5.

Based on a patch by Dave Gordon.

v3: Return -ENODEV for the getparam, as this is what we do for other
    obsolete features.  Suggested by Chris Wilson.

Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92448
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215093446.21291-1-kenneth@whitecape.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-02-24 17:51:05 +00:00
Chris Wilson
754c9fd576 drm/i915: Protect the request->global_seqno with the engine->timeline lock
A request is assigned a global seqno only when it is on the hardware
execution queue. The global seqno can be used to maintain a list of
requests on the same engine in retirement order, for example for
constructing a priority queue for waiting. Prior to its execution, or
if it is subsequently removed in the event of preemption, its global
seqno is zero. As both insertion and removal from the execution queue
may operate in IRQ context, it is not guarded by the usual struct_mutex
BKL. Instead those relying on the global seqno must be prepared for its
value to change between reads. Only when the request is complete can
the global seqno be stable (due to the memory barriers on submitting
the commands to the hardware to write the breadcrumb, if the HWS shows
that it has passed the global seqno and the global seqno is unchanged
after the read, it is indeed complete).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-9-chris@chris-wilson.co.uk
2017-02-23 14:49:32 +00:00
Dave Airlie
94000cc329 Linux 4.10-rc8
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJYoM2fAAoJEHm+PkMAQRiGr9MH/izEAMri7rJ0QMc3ejt+WmD0
 8pkZw3+MVn71z6cIEgpzk4QkEWJd5rfhkETCeCp7qQ9V6cDW1FDE9+0OmPjiphDt
 nnzKs7t7skEBwH5Mq5xygmIfkv+Z0QGHZ20gfQWY3F56Uxo+ARF88OBHBLKhqx3v
 98C7YbMFLKBslKClA78NUEIdx0UfBaRqerlERx0Lfl9aoOrbBS6WI3iuREiylpih
 9o7HTrwaGKkU4Kd6NdgJP2EyWPsd1LGalxBBjeDSpm5uokX6ALTdNXDZqcQscHjE
 RmTqJTGRdhSThXOpNnvUJvk9L442yuNRrVme/IqLpxMdHPyjaXR3FGSIDb2SfjY=
 =VMy8
 -----END PGP SIGNATURE-----

Merge tag 'v4.10-rc8' into drm-next

Linux 4.10-rc8

Backmerge Linus rc8 to fix some conflicts, but also
to avoid pulling it in via a fixes pull from someone.
2017-02-23 12:10:12 +10:00
Chris Wilson
d59b21ec6f drm/i915: Remove 'retire' parameter from intel_fb_obj_flush
Setting retire=true is identical to using origin=ORIGIN_CS, so make the
same simplification to intel_fb_obj_flush() as already employed for
intel_fb_obj_invalidate().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-6-chris@chris-wilson.co.uk
2017-02-22 12:12:17 +00:00
Chris Wilson
57822dc6b9 drm/i915: Perform object clflushing asynchronously
Flushing the cachelines for an object is slow, can be as much as 100ms
for a large framebuffer. We currently do this under the struct_mutex BKL
on execution or on pageflip. But now with the ability to add fences to
obj->resv for both flips and execbuf (and we naturally wait on the fence
before CPU access), we can move the clflush operation to a workqueue and
signal a fence for completion, thereby doing the work asynchronously and
not blocking the driver or its clients.

v2: Introduce i915_gem_clflush.h and use a new name, split out some
extras into separate patches.

Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-5-chris@chris-wilson.co.uk
2017-02-22 12:12:15 +00:00
Chris Wilson
f6aaba4dfb drm/i915: Skip clflushes for all non-page backed objects
Generalise the skip for physical and stolen objects by skipping anything
we do not have a valid address for inside the sg.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-4-chris@chris-wilson.co.uk
2017-02-22 12:12:14 +00:00
Chris Wilson
5a97bcc69c drm/i915: Amalgamate flushing of display objects
We have three different paths by which userspace wants to flush the
display plane (i.e. objects with obj->pin_display). Use a common helper
to identify those paths and to simplify a later change.

v2: Include the conditional in the name, i915_gem_object_flush_if_display

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-3-chris@chris-wilson.co.uk
2017-02-22 12:12:13 +00:00
Chris Wilson
e59dc17211 drm/i915: Move cpu_cache_is_coherent() to header
For use in the next patch, take the current is-coherent helper and add
it to i915_gem_object.h

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-2-chris@chris-wilson.co.uk
2017-02-22 12:12:12 +00:00
Chris Wilson
208b84a375 drm/i915: Remove change_domain tracepoint
The change_domain tracepoint has been inaccurate for a few years - it
doesn't fully capture the domains, especially with userspace bypassing
them. It is defunct, misleading and time to be removed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-1-chris@chris-wilson.co.uk
2017-02-22 12:12:11 +00:00
Chris Wilson
e54ca97747 drm/i915: Remove completed fences after a wait
If we wait upon the full (i.e. all shared fences, or upon an exclusive
fence) reservation object successfully, we know that all fences beneath
it have been signaled, so long as no new fences were added whilst we
slept. If the reservation_object remains the same, as detected by its
seqcount, we can then reap all the fences upon completion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170217151304.16665-6-chris@chris-wilson.co.uk
2017-02-17 15:31:15 +00:00
Chris Wilson
581ab1fe52 drm/i915: Unwind conversion to i915_gem_phys_ops on failure
The physical object is treated as permanently pinned. If we fail to take
this initial pin during i915_gem_object_attach_phys() we need to revert
it back to an ordinary shmemfs object before reporting the failure.

v2: git-add

Reported-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215163900.11606-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-16 20:31:13 +00:00
Chris Wilson
c1d2061b28 drm/i915: Squelch any ktime/jiffie rounding errors for wait-ioctl
We wait upon jiffies, but report the time elapsed using a
high-resolution timer. This discrepancy can lead to us timing out the
wait prior to us reporting the elapsed time as complete.

This restores the squelching lost in commit e95433c73a ("drm/i915:
Rearrange i915_wait_request() accounting with callers").

Fixes: e95433c73a ("drm/i915: Rearrange i915_wait_request() accounting with callers")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170216125441.30923-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-02-16 20:31:13 +00:00
Chris Wilson
636deb5b22 drm/i915: Pass timeout==0 on to i915_gem_object_wait_fence()
The i915_gem_object_wait_fence() uses an incoming timeout=0 to query
whether the current fence is busy or idle, without waiting. This can be
used by the wait-ioctl to implement a busy query.

Fixes: e95433c73a ("drm/i915: Rearrange i915_wait_request() accounting with callers")
Testcase: igt/gem_wait/basic-busy-write-all
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170212215344.16600-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit d892e9398e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-02-16 11:59:13 +02:00
Chris Wilson
ec62ed3e1d drm/i915: Restore context and pd for ringbuffer submission after reset
Following a reset, the context and page directory registers are lost.
However, the queue of requests that we resubmit after the reset may
depend upon them - the registers are restored from a context image, but
that restore may be inhibited and may simply be absent from the request
if it was in the middle of a sequence using the same context. If we
prime the CCID/PD registers with the first request in the queue (even
for the hung request), we prevent invalid memory access for the
following requests (and continually hung engines).

v2: Magic BIT(8), reserved for future use but still appears unused.
v3: Some commentary on handling innocent vs guilty requests
v4: Add a wait for PD_BASE fetch. The reload appears to be instant on my
Ivybridge, but this bit probably exists for a reason.

Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170207152437.4252-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
(cherry picked from commit c0dcb203fb)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-02-16 11:59:11 +02:00
Chris Wilson
d892e9398e drm/i915: Pass timeout==0 on to i915_gem_object_wait_fence()
The i915_gem_object_wait_fence() uses an incoming timeout=0 to query
whether the current fence is busy or idle, without waiting. This can be
used by the wait-ioctl to implement a busy query.

Fixes: e95433c73a ("drm/i915: Rearrange i915_wait_request() accounting with callers")
Testcase: igt/gem_wait/basic-busy-write-all
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170212215344.16600-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-02-14 09:38:42 +00:00
Chris Wilson
170594502c drm/i915: Test coherency of and barriers between cache domains
Write into an object using WB, WC, GTT, and GPU paths and make sure that
our internal API is sufficient to ensure coherent reads and writes.

v2: Avoid invalid free upon allocation error

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-21-chris@chris-wilson.co.uk
2017-02-13 20:45:45 +00:00
Chris Wilson
8335fd65ce drm/i915: Add selftests for object allocation, phys
The phys object is a rarely used device (only very old machines require
a chunk of physically contiguous pages for a few hardware interactions).
As such, it is not exercised by CI and to combat that we want to add a
test that exercises the phys object on all platforms.

v2: Always set err on error paths and not rely on inheriting the err.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-17-chris@chris-wilson.co.uk
2017-02-13 20:45:42 +00:00
Chris Wilson
44653988ef drm/i915: Create a fake object for testing huge allocations
We would like to be able to exercise huge allocations even on memory
constrained devices. To do this we create an object that allocates only
a few pages and remaps them across its whole range - each page is reused
multiple times. We can therefore pretend we are rendering into a much
larger object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-9-chris@chris-wilson.co.uk
2017-02-13 20:45:34 +00:00
Chris Wilson
66d9cb5d80 drm/i915: Mock the GEM device for self-testing
A simulacrum of drm_i915_private to let us pretend interactions with the
device.

v2: Tidy init error paths

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-6-chris@chris-wilson.co.uk
2017-02-13 20:45:28 +00:00
Chris Wilson
935a2f776a drm/i915: Add some selftests for sg_table manipulation
Start exercising the scattergather lists, especially looking at
iteration after coalescing.

v2: Comment on the peculiarity of table construction (i.e. why this
sg_table might be interesting).
v3: Added one __func__ to identify expect_pfn_sg()
v4: Loop until we have crossed the chain boundary (forcing sg_table to
do multiple allocations) before squelching a potential ENOMEM from oom.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-2-chris@chris-wilson.co.uk
2017-02-13 20:45:23 +00:00
Chris Wilson
2ae557388d drm/i915: Clear the last_retired_context following a hang/reset
Following a hang and reset, we know that the engine is idle and all
context state has been saved or lost. Consequently, we know that the
engine is no longer referencing the last context and we can relinquish
our tracking.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Link: http://patchwork.freedesktop.org/patch/msgid/20170212172002.23072-5-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-13 11:19:12 +00:00
Chris Wilson
fe3288b5da drm/i915: Park the breadcrumbs signaler across a GPU reset
The signal threads may be running concurrently with the GPU reset. The
completion from the GPU run asynchronous with the reset and two threads
may see different snapshots of the state, and the signaler may mark a
request as complete as we try to reset it. We don't tolerate 2 different
views of the same state and complain if we try to mark a request as
failed if it is already complete. Disable the signal threads during
reset to prevent this conflict (even though the conflict implies that
the state we resetting to is invalid, we have already made our
decision!).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99733
References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170212172002.23072-4-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-13 11:18:28 +00:00
Chris Wilson
1d309634bc drm/i915: Kill the tasklet then disable
Disabling the tasklet leaves it if scheduled on the ready to run list
until it is re-enabled. This will leave the ksoftird thread spinning
until satisfied. To prevent this situation on starting the GPU reset, we
want to kill the tasklet first and then disable. The same problem will
arise when a tasklet is scheduled from another device, so a better
solution is required for the general case.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 1f7b847d72 ("drm/i915: Disable engine->irq_tasklet around resets")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170212172002.23072-3-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-13 11:18:23 +00:00
Chris Wilson
c00122f33f drm/i915: Assert that the active request hasn't been signaled
As the request is not complete, it should not be signaled. Assert that
this is true before we process the request for a reset.

References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170212172002.23072-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-13 11:06:35 +00:00
Chris Wilson
8c12d12159 drm/i915: Move the irq_barrier for reset earlier into reset_prepare
When updating the bookkeeping following the reset, we need the seqno to
be coherent on the CPU prior to trusting its result for deciding whether
any request is completed. We need the irq_barrier before we start making
these decisions, i.e. in reset_prepare.

References: https://bugs.freedesktop.org/show_bug.cgi?id=99733
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170210185214.23463-1-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
2017-02-10 21:10:51 +00:00
Chris Wilson
c4d4c1c66b drm/i915: Flush the freed object queue on device release
As dmabufs may live beyond the PCI device removal, we need to flush the
freed object worker on device release, and include a warning in case
there is a leak.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170210163523.17533-3-chris@chris-wilson.co.uk
2017-02-10 19:10:05 +00:00
Daniel Vetter
51a831a772 Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued
Chris Wilson needs the new drm_driver->release callback to make sure
the shiny new dma-buf testcases don't oops the driver on unload.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2017-02-10 16:27:24 +01:00
Chris Wilson
1f7b847d72 drm/i915: Disable engine->irq_tasklet around resets
When we restart the engines, and we have active requests, a request on
the first engine may complete and queue a request to the second engine
before we try to restart the second engine. That queueing of the
request may race with the engine to restart, and so may corrupt the
current state. Disabling the engine->irq_tasklet prevents the two paths
from writing into ELSP simultaneously (and modifyin the execlists_port[]
at the same time).

Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Testcase: igt/gem_exec_fence/await-hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-3-chris@chris-wilson.co.uk
2017-02-08 16:24:42 +00:00
Chris Wilson
d802709313 drm/i915: Split GEM resetting into 3 phases
Currently we do a reset prepare/finish around the call to reset the GPU,
but it looks like we need a later stage after the hw has been
reinitialised to allow GEM to restart itself. Start by splitting the 2
GEM phases into 3:

  prepare - before the reset, check if GEM recovered, then stop GEM

  reset - after the reset, update GEM bookkeeping

  finish - after the re-initialisation following the reset, restart GEM

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-2-chris@chris-wilson.co.uk
2017-02-08 16:24:42 +00:00
Chris Wilson
20a8a74ad9 drm/i915: Move calling engine->init_hw() to its own function
Just a simple refactor to move a loop and some support code out of
i915_gem_init_hw(). This is in preparation for avoiding a race between
the tasklet writing to ELSP whilst simultaneously being written by
engine->init_hw() following a GPU reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-1-chris@chris-wilson.co.uk
2017-02-08 16:24:42 +00:00
Chris Wilson
519d524981 drm/i915: i915_gem_shrink_all() needs an awake device
Since to unbind an object, we may need a powered up device to access the
GTT entries, we only shrink bound objects if awake. Callers to
i915_gem_shrink_all() had to take this into account and take the rpm
wakeref, but we can move this wakeref into the shrink_all itself for
convenience and making the function live up to its name.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208104710.18089-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-08 16:24:42 +00:00
Chris Wilson
83bf6d55c1 drm/i915: Remove overzealous fence warn on runtime suspend
The goal of the WARN was to catch when we are still actively using the
fence as we go into the runtime suspend. However, the reg->pin_count is
too coarse as it does not distinguish between exclusive ownership of the
fence register from activity.

I've not improved on the WARN, nor have we captured this WARN in an
exact igt, but it is showing up regularly in the wild:

[ 1915.935332] WARNING: CPU: 1 PID: 10861 at drivers/gpu/drm/i915/i915_gem.c:2022 i915_gem_runtime_suspend+0x116/0x130 [i915]
[ 1915.935383] WARN_ON(reg->pin_count)[ 1915.935399] Modules linked in:
 snd_hda_intel i915 drm_kms_helper vgem netconsole scsi_transport_iscsi fuse vfat fat x86_pkg_temp_thermal coretemp intel_cstate intel_uncore snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mei_me mei serio_raw intel_rapl_perf intel_pch_thermal soundcore wmi acpi_pad i2c_algo_bit syscopyarea sysfillrect sysimgblt fb_sys_fops drm r8169 mii video [last unloaded: drm_kms_helper]
[ 1915.935785] CPU: 1 PID: 10861 Comm: kworker/1:0 Tainted: G     U  W       4.9.0-rc5+ #170
[ 1915.935799] Hardware name: LENOVO 80MX/Lenovo E31-80, BIOS DCCN34WW(V2.03) 12/01/2015
[ 1915.935822] Workqueue: pm pm_runtime_work
[ 1915.935845]  ffffc900044fbbf0 ffffffffac3220bc ffffc900044fbc40 0000000000000000
[ 1915.935890]  ffffc900044fbc30 ffffffffac059bcb 000007e6044fbc60 ffff8801626e3198
[ 1915.935937]  ffff8801626e0000 0000000000000002 ffffffffc05e5d4e 0000000000000000
[ 1915.935985] Call Trace:
[ 1915.936013]  [<ffffffffac3220bc>] dump_stack+0x4f/0x73
[ 1915.936038]  [<ffffffffac059bcb>] __warn+0xcb/0xf0
[ 1915.936060]  [<ffffffffac059c4f>] warn_slowpath_fmt+0x5f/0x80
[ 1915.936158]  [<ffffffffc052d916>] i915_gem_runtime_suspend+0x116/0x130 [i915]
[ 1915.936251]  [<ffffffffc04f1c74>] intel_runtime_suspend+0x64/0x280 [i915]
[ 1915.936277]  [<ffffffffac0926f1>] ? dequeue_entity+0x241/0xbc0
[ 1915.936298]  [<ffffffffac36bb85>] pci_pm_runtime_suspend+0x55/0x180
[ 1915.936317]  [<ffffffffac36bb30>] ? pci_pm_runtime_resume+0xa0/0xa0
[ 1915.936339]  [<ffffffffac4514e2>] __rpm_callback+0x32/0x70
[ 1915.936356]  [<ffffffffac451544>] rpm_callback+0x24/0x80
[ 1915.936375]  [<ffffffffac36bb30>] ? pci_pm_runtime_resume+0xa0/0xa0
[ 1915.936392]  [<ffffffffac45222d>] rpm_suspend+0x12d/0x680
[ 1915.936415]  [<ffffffffac69f6d7>] ? _raw_spin_unlock_irq+0x17/0x30
[ 1915.936435]  [<ffffffffac0810b8>] ? finish_task_switch+0x88/0x220
[ 1915.936455]  [<ffffffffac4534bf>] pm_runtime_work+0x6f/0xb0
[ 1915.936477]  [<ffffffffac074353>] process_one_work+0x1f3/0x4d0
[ 1915.936501]  [<ffffffffac074678>] worker_thread+0x48/0x4e0
[ 1915.936523]  [<ffffffffac074630>] ? process_one_work+0x4d0/0x4d0
[ 1915.936542]  [<ffffffffac074630>] ? process_one_work+0x4d0/0x4d0
[ 1915.936559]  [<ffffffffac07a2c9>] kthread+0xd9/0xf0
[ 1915.936580]  [<ffffffffac07a1f0>] ? kthread_park+0x60/0x60
[ 1915.936600]  [<ffffffffac69fe62>] ret_from_fork+0x22/0x30

In the case the register is pinned, it should be present and we will
need to invalidate them to be restored upon resume as we cannot expect
the owner of the pin to call get_fence prior to use after resume.

Fixes: 7c108fd8fe ("drm/i915: Move fence cancellation to runtime suspend")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98804
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Imre Deak <imre.deak@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20170203125717.8431-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit e0ec3ec698)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-02-08 13:27:27 +02:00
Chris Wilson
e3818697e1 drm/i915: Flush untouched framebuffers before display on !llc
On a non-llc system, the objects are created with .cache_level =
CACHE_NONE and so the transition to uncached for scanout is a no-op.
However, if the object was never written to, it will still be in the CPU
domain (having been zeroed out by shmemfs). Those cachelines need to be
flushed prior to display.

Reported-and-tested-by: Vito Caputo
Fixes: a6a7cc4b7d ("drm/i915: Always flush the dirty CPU cache when pinning the scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20170109111932.6342-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit 69aeafeae9)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-02-08 13:10:24 +02:00
Chris Wilson
c0dcb203fb drm/i915: Restore context and pd for ringbuffer submission after reset
Following a reset, the context and page directory registers are lost.
However, the queue of requests that we resubmit after the reset may
depend upon them - the registers are restored from a context image, but
that restore may be inhibited and may simply be absent from the request
if it was in the middle of a sequence using the same context. If we
prime the CCID/PD registers with the first request in the queue (even
for the hung request), we prevent invalid memory access for the
following requests (and continually hung engines).

v2: Magic BIT(8), reserved for future use but still appears unused.
v3: Some commentary on handling innocent vs guilty requests
v4: Add a wait for PD_BASE fetch. The reload appears to be instant on my
Ivybridge, but this bit probably exists for a reason.

Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170207152437.4252-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-07 21:34:58 +00:00
Chris Wilson
e0ec3ec698 drm/i915: Remove overzealous fence warn on runtime suspend
The goal of the WARN was to catch when we are still actively using the
fence as we go into the runtime suspend. However, the reg->pin_count is
too coarse as it does not distinguish between exclusive ownership of the
fence register from activity.

I've not improved on the WARN, nor have we captured this WARN in an
exact igt, but it is showing up regularly in the wild:

[ 1915.935332] WARNING: CPU: 1 PID: 10861 at drivers/gpu/drm/i915/i915_gem.c:2022 i915_gem_runtime_suspend+0x116/0x130 [i915]
[ 1915.935383] WARN_ON(reg->pin_count)[ 1915.935399] Modules linked in:
 snd_hda_intel i915 drm_kms_helper vgem netconsole scsi_transport_iscsi fuse vfat fat x86_pkg_temp_thermal coretemp intel_cstate intel_uncore snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mei_me mei serio_raw intel_rapl_perf intel_pch_thermal soundcore wmi acpi_pad i2c_algo_bit syscopyarea sysfillrect sysimgblt fb_sys_fops drm r8169 mii video [last unloaded: drm_kms_helper]
[ 1915.935785] CPU: 1 PID: 10861 Comm: kworker/1:0 Tainted: G     U  W       4.9.0-rc5+ #170
[ 1915.935799] Hardware name: LENOVO 80MX/Lenovo E31-80, BIOS DCCN34WW(V2.03) 12/01/2015
[ 1915.935822] Workqueue: pm pm_runtime_work
[ 1915.935845]  ffffc900044fbbf0 ffffffffac3220bc ffffc900044fbc40 0000000000000000
[ 1915.935890]  ffffc900044fbc30 ffffffffac059bcb 000007e6044fbc60 ffff8801626e3198
[ 1915.935937]  ffff8801626e0000 0000000000000002 ffffffffc05e5d4e 0000000000000000
[ 1915.935985] Call Trace:
[ 1915.936013]  [<ffffffffac3220bc>] dump_stack+0x4f/0x73
[ 1915.936038]  [<ffffffffac059bcb>] __warn+0xcb/0xf0
[ 1915.936060]  [<ffffffffac059c4f>] warn_slowpath_fmt+0x5f/0x80
[ 1915.936158]  [<ffffffffc052d916>] i915_gem_runtime_suspend+0x116/0x130 [i915]
[ 1915.936251]  [<ffffffffc04f1c74>] intel_runtime_suspend+0x64/0x280 [i915]
[ 1915.936277]  [<ffffffffac0926f1>] ? dequeue_entity+0x241/0xbc0
[ 1915.936298]  [<ffffffffac36bb85>] pci_pm_runtime_suspend+0x55/0x180
[ 1915.936317]  [<ffffffffac36bb30>] ? pci_pm_runtime_resume+0xa0/0xa0
[ 1915.936339]  [<ffffffffac4514e2>] __rpm_callback+0x32/0x70
[ 1915.936356]  [<ffffffffac451544>] rpm_callback+0x24/0x80
[ 1915.936375]  [<ffffffffac36bb30>] ? pci_pm_runtime_resume+0xa0/0xa0
[ 1915.936392]  [<ffffffffac45222d>] rpm_suspend+0x12d/0x680
[ 1915.936415]  [<ffffffffac69f6d7>] ? _raw_spin_unlock_irq+0x17/0x30
[ 1915.936435]  [<ffffffffac0810b8>] ? finish_task_switch+0x88/0x220
[ 1915.936455]  [<ffffffffac4534bf>] pm_runtime_work+0x6f/0xb0
[ 1915.936477]  [<ffffffffac074353>] process_one_work+0x1f3/0x4d0
[ 1915.936501]  [<ffffffffac074678>] worker_thread+0x48/0x4e0
[ 1915.936523]  [<ffffffffac074630>] ? process_one_work+0x4d0/0x4d0
[ 1915.936542]  [<ffffffffac074630>] ? process_one_work+0x4d0/0x4d0
[ 1915.936559]  [<ffffffffac07a2c9>] kthread+0xd9/0xf0
[ 1915.936580]  [<ffffffffac07a1f0>] ? kthread_park+0x60/0x60
[ 1915.936600]  [<ffffffffac69fe62>] ret_from_fork+0x22/0x30

In the case the register is pinned, it should be present and we will
need to invalidate them to be restored upon resume as we cannot expect
the owner of the pin to call get_fence prior to use after resume.

Fixes: 7c108fd8fe ("drm/i915: Move fence cancellation to runtime suspend")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98804
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Imre Deak <imre.deak@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20170203125717.8431-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-02-07 14:37:37 +00:00
Chris Wilson
4e64e5539d drm: Improve drm_mm search (and fix topdown allocation) with rbtrees
The drm_mm range manager claimed to support top-down insertion, but it
was neither searching for the top-most hole that could fit the
allocation request nor fitting the request to the hole correctly.

In order to search the range efficiently, we create a secondary index
for the holes using either their size or their address. This index
allows us to find the smallest hole or the hole at the bottom or top of
the range efficiently, whilst keeping the hole stack to rapidly service
evictions.

v2: Search for holes both high and low. Rename flags to mode.
v3: Discover rb_entry_safe() and use it!
v4: Kerneldoc for enum drm_mm_insert_mode.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Sinclair Yeh <syeh@vmware.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com> # vmwgfx
Reviewed-by: Lucas Stach <l.stach@pengutronix.de> #etnaviv
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170202210438.28702-1-chris@chris-wilson.co.uk
2017-02-03 11:10:32 +01:00
Chris Wilson
69aeafeae9 drm/i915: Flush untouched framebuffers before display on !llc
On a non-llc system, the objects are created with .cache_level =
CACHE_NONE and so the transition to uncached for scanout is a no-op.
However, if the object was never written to, it will still be in the CPU
domain (having been zeroed out by shmemfs). Those cachelines need to be
flushed prior to display.

Reported-and-tested-by: Vito Caputo
Fixes: a6a7cc4b7d ("drm/i915: Always flush the dirty CPU cache when pinning the scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20170109111932.6342-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-02-01 10:54:17 +00:00
Chris Wilson
241455172b drm/i915: Reset the gpu on takeover
The GPU may be in an unknown state following resume and module load. The
previous occupant may have left contexts loaded, or other dangerous
state, which can cause an immediate GPU hang for us. The only save
course of action is to reset the GPU prior to using it - similarly to
how we reset the GPU prior to unload (before a second user may be
affected by our leftover state).

We need to reset the GPU very early in our load/resume sequence so that
any stale HW pointers are revoked prior to any resource allocations we
make (that may conflict).

A reset should only be a couple of milliseconds on a slow device, a cost
we should easily be able to absorb into our initialisation times.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170124110135.6418-2-chris@chris-wilson.co.uk
2017-01-24 12:29:48 +00:00
Chris Wilson
e0216b762a drm/i915: Treat an error from i915_vma_instance() as unlikely
When pinning into the global GTT, an error from creating the VMA is
unlikely, so mark it so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170119192659.31789-4-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-21 10:32:21 +00:00
Chris Wilson
befedbb7e2 drm/i915: Use common LRU inactive vma bumping for unpin_from_display
Now that i915_gem_object_bump_inactive_ggtt() exists, also make use of
it for the LRU bumping from i915_gem_object_unpin_from_display()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170119192659.31789-2-chris@chris-wilson.co.uk
Reviewed-by: Jonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-21 10:31:44 +00:00
Chris Wilson
d65415df07 drm/i915: Do an unlocked wait before set-cache-level ioctl
Since a change in cache level is likely to trigger an unbind, avoid
waiting under the mutex by preemptively doing an unlocked wait.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170119082211.21257-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-01-21 09:19:17 +00:00
Chris Wilson
718659a630 drm/i915: Rename some warts in the VMA API
Whilst writing testcases to exercise the VMA API, some oddities came to
light, such as i915_gem_obj_lookup_or_create(). Joonas suggested
i915_vma_instance() as a neat replacement, so rename them, move them to
i915_vma.c and add some kerneldoc as a sugary bonus.

s/i915_gem_obj_to_vma/i915_vma_lookup/
s/i915_gem_obj_lookup_or_create_vma/i915_vma_instance/

Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-2-chris@chris-wilson.co.uk
2017-01-19 10:15:26 +00:00
Mika Kuoppala
71895a0858 drm/i915: Add comment how we treat hung contexts
Explain in a comment how and why we treat hung context like we do.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-7-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-01-18 10:47:32 +00:00
Chris Wilson
0e178aef8f drm/i915: Detect a failed GPU reset+recovery
If we can't recover the GPU after the reset, mark it as wedged to cancel
the outstanding tasks and to prevent new users from trying to use the
broken GPU.

v2: Check the same ring is hung again before declaring the reset broken.
v3: use engine_stalled (Mika)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-6-git-send-email-mika.kuoppala@intel.com
2017-01-18 10:47:26 +00:00
Mika Kuoppala
61da536204 drm/i915: Tidy up engine reset logic
Split engine reset for engine and request specific parts.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-5-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-01-18 10:44:51 +00:00
Mika Kuoppala
bf2f04366c drm/i915: Introduce engine_stalled helper
Move the engine stalled/pardoned check into a helper function.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-4-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-01-18 10:44:50 +00:00
Mika Kuoppala
211b12afe6 drm/i915: Cleanup request skip decision
Since we now only skip banned contexts, preventing the skip of default
contexts is no longer sensible. For a similar argument as before
'commit 7ec73b7e36 ("drm/i915: Only skip requests once a context is banned")'
we end up with an inconsistent API if we only mark future execbufs from
the default ctx as banned but fail to mark those currently executing
as failed.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-3-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-01-18 10:44:50 +00:00
Mika Kuoppala
36193acd54 drm/i915: Introduce engine_skip_context
Add a new function for skipping all pending requests
for a context in order to make engine reset flow more
readable.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-2-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-01-18 10:44:47 +00:00
Chris Wilson
4c96554365 drm/i915: Move engine reset preparation to i915_gem_reset_prepare()
Now that we have prepare/finish routines for the GEM reset, move the
disabling of the engine->irq_tasklet into them to reduce repetition. The
device irq enable/disable is split out to ensure it is run first and
last always (even if the GPU reset fails).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-1-git-send-email-mika.kuoppala@intel.com
2017-01-18 10:44:20 +00:00
Chris Wilson
f131e3562e drm/i915: Skip switch to kernel context if already done
Some engines are never user or already sitting idle in the kernel
context and for those we can skip flushing the current context for
i915_gem_switch_to_kernel_context(). We used to perform this
optimisation but that was removed for convenience of converting over to
multiple timelines and handling the pending request queues.

From the perspective of writing selftests, reducing the number of
background operations on the engines makes defining assertions easier.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170114162334.10271-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-16 12:21:20 +00:00
Chris Wilson
47a8e3f6ae drm/i915: Eliminate superfluous i915_ggtt_view_normal
Since commit 058d88c433 ("drm/i915: Track pinned VMA"), there is only
one user of i915_ggtt_view_normal rodate. Just treat NULL as no special
view in pin_to_display() like everywhere else.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170114002827.31315-7-chris@chris-wilson.co.uk
2017-01-14 16:18:36 +00:00
Chris Wilson
8bab1193c1 drm/i915: Convert i915_ggtt_view to use an anonymous union
Reading the ggtt_views is much more pleasant without the extra
characters from specifying the union (i.e. ggtt_view.partial rather than
ggtt_view.params.partial). To make this work inside i915_vma_compare()
with only a single memcmp requires us to ensure that there are no
uninitialised bytes within each branch of the union (we make sure the
structs are packed) and we need to store the size of each branch.

v4: Rewrite changelog and add comments explaining the assert.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170114002827.31315-5-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-01-14 16:18:03 +00:00
Chris Wilson
3bf4d57519 drm/i915: Stop clearing i915_ggtt_view
As we now use a compact memcmp in i915_vma_compare(), we can forgo
clearing the entire view and only set the precise parameters used in
this view.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170114002827.31315-4-chris@chris-wilson.co.uk
2017-01-14 16:17:45 +00:00
Chris Wilson
e4621b73b6 drm/i915: Fix phys pwrite for struct_mutex-less operation
Since commit fe115628d5 ("drm/i915: Implement pwrite without
struct-mutex") the lowlevel pwrite calls are now called without the
protection of struct_mutex, but pwrite_phys was still asserting that it
held the struct_mutex and later tried to drop and relock it.

Fixes: fe115628d5 ("drm/i915: Implement pwrite without struct-mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152240.5793-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
(cherry picked from commit 10466d2a59)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-01-12 10:15:44 +02:00
Chris Wilson
f51455d442 drm/i915: Replace 4096 with PAGE_SIZE or I915_GTT_PAGE_SIZE
Start converting over from the byte count to its semantic macro, either
we want to allocate the size of a physical page in main memory or we
want the size of a virtual page in the GTT. 4096 could mean either, but
PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve
code comprehension and future changes. In the future, we may want to use
variable GTT page sizes and so have the challenge of knowing which
hardcoded values were used to represent a physical page vs the virtual
page.

v2: Look for a few more 4096s to convert, discover IS_ALIGNED().
v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep
bdw stolen w/a as 4096 until we know better.
v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page
sizes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110144734.26052-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-10 20:54:32 +00:00
Chris Wilson
2a20d6f858 drm/i915: Rename i915_gem_engine_cleanup() to engine_set_wedged()
It has been some time since i915_gem_engine_cleanup was only called from
the module unload path, and now it is only called when the GPU is
wedged. Mika complained that the name is confusing, especially in light
of the existence of i915_gem_cleanup_engines().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-5-chris@chris-wilson.co.uk
2017-01-10 20:49:32 +00:00
Chris Wilson
3cd9442f66 drm/i915: Mark all incomplete requests as -EIO when wedged
Similarly to a normal reset, after we mark the GPU as wedged (completely
fubar and no more requests can be executed), set the error status on all
the in flight requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-4-chris@chris-wilson.co.uk
2017-01-10 20:49:31 +00:00
Chris Wilson
3c1b284759 drm/i915: Set an error status for a resubmitted request
Let userspace know if its request was resubmitted due to it being
executed at the time of a global reset. In this case, the reset was for
a guilty request on another engine, and this request was an innocent
victim that will be re-executed upon restarting. However, since it was
running at the time of the reset, we can not guarantee that it suffered
no ill-effects from the reset (e.g. some context state may be lost, or
some self-modifying fragment shaders will be restarted from the final
state not their initial state), to let userspace know that it has been
corrupted set a special value on the fence->error, -EAGAIN.

If the request does hang on resubmission, the error will be overwritten
with -EIO.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-3-chris@chris-wilson.co.uk
2017-01-10 20:49:30 +00:00
Chris Wilson
c0d5f32c50 drm/i915: Set guilty-flag on fence after detecting a hang
The struct dma_fence carries a status field exposed to userspace by
sync_file. This is inspected after the fence is signaled and can convey
whether or not the request completed successfully, or in our case if we
detected a hang during the request (signaled via -EIO in
SYNC_IOC_FILE_INFO).

v2: Mark all cancelled requests as failed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-2-chris@chris-wilson.co.uk
2017-01-10 20:49:30 +00:00
Chris Wilson
2edc6e0df1 drm/i915: Consolidate reset_request()
Always reset the requests of the guilty context, including the hung
request that we tell the hardware to skip. This should help if the
reprogram fails entirely, but more importantly makes the guilty path
more uniform (and simplifies the subsequent patch to tweak the cancelled
requests).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-1-chris@chris-wilson.co.uk
2017-01-10 20:49:29 +00:00
Daniel Vetter
05adb1850a Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued
Pull in latest drm-next from Dave Airlie to get at all the drm-misc
goodies, specifically:
- dma_fence error state handling rework (Chris needs that for error
  recovery)
- crc support locking changes (Tomeu's i915 crc patches need that).

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2017-01-10 17:03:46 +01:00
Chris Wilson
8201c1fad4 drm/i915: Clip the partial view against the object not vma
The VMA is later clipped against the vm_area_struct before insertion of
the faulting PTE so we are free to create the partial view as we desire.
If we use the object as the extents rather than the area, this partial
can then be used for other areas.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110095633.6612-2-chris@chris-wilson.co.uk
2017-01-10 11:59:24 +00:00
Chris Wilson
2d4281bb93 drm/i915: Extract compute_partial_view()
In order to reuse the partial view for selftesting, extract the common
function for computing the view.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110095633.6612-1-chris@chris-wilson.co.uk
2017-01-10 11:59:24 +00:00
Chris Wilson
91d4e0aa92 drm/i915: Move ggtt fence/alignment to i915_gem_tiling.c
Rename i915_gem_get_ggtt_size() and i915_gem_get_ggtt_alignment() to
i915_gem_fence_size() and i915_gem_fence_alignment() respectively to
better match usage. Similarly move the pair of functions into
i915_gem_tiling.c next to the fence restrictions.

Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-6-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-10 08:12:22 +00:00
Chris Wilson
944397f04f drm/i915: Store required fence size/alignment for GGTT vma
The fence size/alignment is a combination of the vma size plus object
tiling parameters. Those parameters are rarely changed, making the fence
size/alignemnt roughly constant for the lifetime of the VMA. We can
simplify subsequent calculations by precalculating the size/alignment
required for GGTT vma taking fencing into account (with an update if we
do change the tiling or stride).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-4-chris@chris-wilson.co.uk
2017-01-10 08:12:21 +00:00
Chris Wilson
5b30694b47 drm/i915: Align GGTT sizes to a fence tile row
Ensure the view occupies the full tile row so that reads/writes into the
VMA do not escape (via fenced detiling) into neighbouring objects - we
will pad the object with scratch pages to satisfy the fence. This
applies the lazy-tiling we employed on gen2/3 to gen4+.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-2-chris@chris-wilson.co.uk
2017-01-10 08:12:20 +00:00
Chris Wilson
6649a0b650 drm/i915: Extract tile_row_size for fencing
Computing the tile row size of a tiled object (for use with fence
registers) is repeated, so extract it to a common helper.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-10 08:12:09 +00:00
Dave Airlie
5c37daf5dd Merge tag 'drm-intel-next-2017-01-09' of git://anongit.freedesktop.org/git/drm-intel into drm-next
More 4.11 stuff, holidays edition (i.e. not much):

- docs and cleanups for shared dpll code (Ander)
- some kerneldoc work (Chris)
- fbc by default on gen9+ too, yeah! (Paulo)
- fixes, polish and other small things all over gem code (Chris)
- and a few small things on top

Plus a backmerge, because Dave was enjoying time off too.

* tag 'drm-intel-next-2017-01-09' of git://anongit.freedesktop.org/git/drm-intel: (275 commits)
  drm/i915: Update DRIVER_DATE to 20170109
  drm/i915: Drain freed objects for mmap space exhaustion
  drm/i915: Purge loose pages if we run out of DMA remap space
  drm/i915: Fix phys pwrite for struct_mutex-less operation
  drm/i915: Simplify testing for am-I-the-kernel-context?
  drm/i915: Use range_overflows()
  drm/i915: Use fixed-sized types for stolen
  drm/i915: Use phys_addr_t for the address of stolen memory
  drm/i915: Consolidate checks for memcpy-from-wc support
  drm/i915: Only skip requests once a context is banned
  drm/i915: Move a few more utility macros to i915_utils.h
  drm/i915: Clear ret before unbinding in i915_gem_evict_something()
  drm/i915/guc: Exclude the upper end of the Global GTT for the GuC
  drm/i915: Move a few utility macros into a separate header
  drm/i915/execlists: Reorder execlists register enabling
  drm/i915: Assert that we do create the deferred context
  drm/i915: Assert all timeline requests are gone before fini
  drm/i915: Revoke fenced GTT mmapings across GPU reset
  drm/i915: enable FBC on gen9+ too
  drm/i915: actually drive the BDW reserved IDs
  ...
2017-01-10 08:02:09 +10:00
Linus Torvalds
2fd8774c79 Merge branch 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb fixes from Konrad Rzeszutek Wilk:
 "This has one fix to make i915 work when using Xen SWIOTLB, and a
  feature from Geert to aid in debugging of devices that can't do DMA
  outside the 32-bit address space.

  The feature from Geert is on top of v4.10 merge window commit
  (specifically you pulling my previous branch), as his changes were
  dependent on the Documentation/ movement patches.

  I figured it would just easier than me trying than to cherry-pick the
  Documentation patches to satisfy git.

  The patches have been soaking since 12/20, albeit I updated the last
  patch due to linux-next catching an compiler error and adding an
  Tested-and-Reported-by tag"

* 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  swiotlb: Export swiotlb_max_segment to users
  swiotlb: Add swiotlb=noforce debug option
  swiotlb: Convert swiotlb_force from int to enum
  x86, swiotlb: Simplify pci_swiotlb_detect_override()
2017-01-06 10:53:21 -08:00
Konrad Rzeszutek Wilk
7453c549f5 swiotlb: Export swiotlb_max_segment to users
So they can figure out what is the optimal number of pages
that can be contingously stitched together without fear of
bounce buffer.

We also expose an mechanism for sub-users of SWIOTLB API, such
as Xen-SWIOTLB to set the max segment value. And lastly
if swiotlb=force is set (which mandates we bounce buffer everything)
we set max_segment so at least we can bounce buffer one 4K page
instead of a giant 512KB one for which we may not have space.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-and-Tested-by: Juergen Gross <jgross@suse.com>
2017-01-06 13:00:01 -05:00
Chris Wilson
b42a13d918 drm/i915: Drain freed objects for mmap space exhaustion
As we now use a deferred free queue for objects, simply retiring the
active objects is not enough to immediately free them and recover their
mmap space - we must now also drain the freed object list.

Fixes: fbbd37b36f ("drm/i915: Move object release to a freelist + worker"
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152240.5793-3-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-01-06 16:54:47 +00:00
Chris Wilson
10466d2a59 drm/i915: Fix phys pwrite for struct_mutex-less operation
Since commit fe115628d5 ("drm/i915: Implement pwrite without
struct-mutex") the lowlevel pwrite calls are now called without the
protection of struct_mutex, but pwrite_phys was still asserting that it
held the struct_mutex and later tried to drop and relock it.

Fixes: fe115628d5 ("drm/i915: Implement pwrite without struct-mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152240.5793-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-01-06 16:53:44 +00:00
Chris Wilson
984ff29f74 drm/i915: Simplify testing for am-I-the-kernel-context?
The kernel context (dev_priv->kernel_context) is unique in that it is
not associated with any user filp - it is the only one with
ctx->file_priv == NULL. This is a simpler test than comparing it against
dev_priv->kernel_context which involves some pointer dancing.

In checking that this is true, we notice that the gvt context is
allocating itself a i915_hw_ppgtt it doesn't use and not flagging that
its file_priv should be invalid.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152013.24684-5-chris@chris-wilson.co.uk
2017-01-06 16:02:12 +00:00
Chris Wilson
7ec73b7e36 drm/i915: Only skip requests once a context is banned
If we skip before banning, we have an inconsistent interface between
execbuf still queueing valid request but those requests already queued
being cancelled. If we only cancel the pending requests once we stop
accepting new requests, the user interface is more consistent.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: <stable@vger.kernel.org> # v4.9+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170105170059.344-1-chris@chris-wilson.co.uk
2017-01-05 21:06:20 +00:00
Chris Wilson
40b326eefe drm/i915: Move a few utility macros into a separate header
In order to defeat some circular dependencies between headers to allow use
of e.g. range_overflows() in a header, move the simple independent macros
into their own header.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170105153023.30575-4-chris@chris-wilson.co.uk
2017-01-05 15:34:45 +00:00
Chris Wilson
b1ed35d917 drm/i915: Revoke fenced GTT mmapings across GPU reset
The fence registers are clobbered by a GPU reset. If there is concurrent
user access to a fenced region via a GTT mmaping, the access will not be
fenced during the reset (until we restore the fences afterwards). In order
to prevent invalid access during the reset, before we clobber the fences
first we must invalidate the GTT mmapings. Access to the mmap will then
be forced to fault in the page, and in handling the fault, i915_gem_fault()
will take the struct_mutex and wait upon the reset to complete.

v2: Fix up commentary.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99274
Testcase: igt/gem_mmap_gtt/hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170104145110.1486-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-01-04 18:44:30 +00:00
Daniel Vetter
a402eae64d Linux 4.10-rc2
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJYaYNlAAoJEHm+PkMAQRiGtCUH/18PMUJpHqRKjxL3Yscw+QZC
 RmGlD/hwBRLUSgiTCfURNGKP4QZv2kQW7BGsGC72oL01lmxozsU72ixUIO+wXzDY
 K2b0OOKGZZWzFtaVm7Qs+5JhHAEKZcT046mLD8sjJuqkrFAhmNLKdwHjihKBEkm9
 J3s2tpdXdN0x/Uyga/GY9khEYIrvLPeBoKSz+JXcQKdC0iq3/+PMpWnN47QCNScr
 7azojkJkj/rs2cqVdOi7Wbh6PSqIvPsl8E3qJefpaVJF/IQaU1pFdy5g8kYm4V7T
 fr6HgIbuN4EQWdN/5cgKrUdpQyV7D8iYx02klk4R8WgfS0QMYoUcsg+XsTd02TI=
 =OhGe
 -----END PGP SIGNATURE-----

Merge tag 'v4.10-rc2' into drm-intel-next-queued

Backmerge Linux 4.10-rc2 to resync with our -fixes cherry-picks. I've
done the backmerge directly because Dave is on vacation.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2017-01-04 11:35:18 +01:00
Chris Wilson
2471eb5fb6 drm/i915: Prevent timeline updates whilst performing reset
As the fence may be signaled concurrently from an interrupt on another
device, it is possible for the list of requests on the timeline to be
modified as we walk it. Take both (the context's timeline and the global
timeline) locks to prevent such modifications.

Fixes: 80b204bce8 ("drm/i915: Enable multiple timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-10-chris@chris-wilson.co.uk
(cherry picked from commit 00c25e3f40)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-01-03 11:41:57 +02:00
Chris Wilson
64d1461ce0 drm/i915: Silence allocation failure during sg_trim()
As trimming the sg table is merely an optimisation that gracefully fails
if we cannot allocate a new table, we do not need to report the failure
either.

Fixes: 0c40ce130e ("drm/i915: Trim the object sg table")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-4-chris@chris-wilson.co.uk
(cherry picked from commit 8bfc478fa4)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-01-03 11:41:48 +02:00
Chris Wilson
c3f923b554 drm/i915: Don't clflush before release phys object
When we teardown the backing storage for the phys object, we copy from
the coherent contiguous block back to the shmemfs object, clflushing as
we go. Trying to clflush the invalid sg beforehand just oops and would
be redundant (due to it already being coherent, and clflushed
afterwards).

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-3-chris@chris-wilson.co.uk
(cherry picked from commit e5facdf964)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-01-03 11:41:38 +02:00
Chris Wilson
6095868a27 drm/i915: Complete kerneldoc for struct i915_gem_context
The existing kerneldoc was outdated, so time for a refresh.

v2: Use single line kdoc, mention functions for manipulation

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161231112012.29263-3-chris@chris-wilson.co.uk
2016-12-31 11:41:47 +00:00
Chris Wilson
00c25e3f40 drm/i915: Prevent timeline updates whilst performing reset
As the fence may be signaled concurrently from an interrupt on another
device, it is possible for the list of requests on the timeline to be
modified as we walk it. Take both (the context's timeline and the global
timeline) locks to prevent such modifications.

Fixes: 80b204bce8 ("drm/i915: Enable multiple timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-10-chris@chris-wilson.co.uk
2016-12-23 16:08:10 +00:00
Chris Wilson
8bfc478fa4 drm/i915: Silence allocation failure during sg_trim()
As trimming the sg table is merely an optimisation that gracefully fails
if we cannot allocate a new table, we do not need to report the failure
either.

Fixes: 0c40ce130e ("drm/i915: Trim the object sg table")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-4-chris@chris-wilson.co.uk
2016-12-23 16:07:32 +00:00
Chris Wilson
e5facdf964 drm/i915: Don't clflush before release phys object
When we teardown the backing storage for the phys object, we copy from
the coherent contiguous block back to the shmemfs object, clflushing as
we go. Trying to clflush the invalid sg beforehand just oops and would
be redundant (due to it already being coherent, and clflushed
afterwards).

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-3-chris@chris-wilson.co.uk
2016-12-23 16:07:20 +00:00
Chris Wilson
bdeb978506 drm/i915: Repeat flush of idle work during suspend
The idle work handler is self-arming - if it detects that it needs to
run again it will queue itself from its work handler. Take greater care
when trying to drain the idle work, and double check that it is flushed.

The free worker has a similar issue where it is armed by an RCU task
which may be running concurrently with us.

This should hopefully help with the sporadic WARN_ON(dev_priv->gt.awake)
from i915_gem_suspend.

v2: Reuse drain_freed_objects.
v3: Don't try to flush the freed objects from the shrinker, as it may be
underneath the struct_mutex already.
v4: do while and comment upon the excess rcu_barrier in drain_freed_objects

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-2-chris@chris-wilson.co.uk
2016-12-23 16:07:11 +00:00
Chris Wilson
28f412e02b drm/i915: Break after walking all GGTT vma in bump_inactive_ggtt
Since commit db6c2b4151 ("drm/i915: Store the vma in an rbtree under
the object") the vma are once again sorted into GGTT first, then ppGTT
so that the typical case of walking the GGTT vma can stop as soon as we
find a non-ppGTT. Apply that optimisation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-1-chris@chris-wilson.co.uk
2016-12-23 16:06:56 +00:00
Dave Airlie
4a401ceeef Merge tag 'drm-intel-next-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes
First set of i915 fixes for code in next.

* tag 'drm-intel-next-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-intel:
  drm/i915: skip the first 4k of stolen memory on everything >= gen8
  drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping
  drm/i915: Fix use after free in logical_render_ring_init
  drm/i915: disable PSR by default on HSW/BDW
  drm/i915: Fix setting of boost freq tunable
  drm/i915: tune down the fast link training vs boot fail
  drm/i915: Reorder phys backing storage release
  drm/i915/gen9: Fix PCODE polling during SAGV disabling
  drm/i915/gen9: Fix PCODE polling during CDCLK change notification
  drm/i915/dsi: Fix chv_exec_gpio disabling the GPIOs it is setting
  drm/i915/dsi: Fix swapping of MIPI_SEQ_DEASSERT_RESET / MIPI_SEQ_ASSERT_RESET
  drm/i915/dsi: Do not clear DPOUNIT_CLOCK_GATE_DISABLE from vlv_init_display_clock_gating
  drm/i915: drop the struct_mutex when wedged or trying to reset
2016-12-23 05:28:02 +10:00
Chris Wilson
abb0deacb5 drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping
If we at first do not succeed with attempting to remap our physical
pages using a coalesced scattergather list, try again with one
scattergather entry per page. This should help with swiotlb as it uses a
limited buffer size and only searches for contiguous chunks within its
buffer aligned up to the next boundary - i.e. we may prematurely cause a
failure as we are unable to utilize the unused space between large
chunks and trigger an error such as:

	 i915 0000:00:02.0: swiotlb buffer is full (sz: 1630208 bytes)

Reported-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Fixes: 871dfbd67d ("drm/i915: Allow compaction upto SWIOTLB max segment size")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit d766ef5300)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-12-20 16:30:47 +02:00
Chris Wilson
057f803ff1 drm/i915: Reorder phys backing storage release
In commit a4f5ea64f0 ("drm/i915: Refactor object page API"), I
reordered the object->pages teardown to be more friendly wrt to a
separate obj->mm.lock. However, I overlooked the phys object and left it
with a dangling use-after-free of its phys_handle. Move the allocation
of the phys handle to get_pages and it release to put_pages to prevent
the invalid access and to improve symmetry.

v2: Add commentary about always aligning to page size.

Testcase: igt/drv_selftest/objects
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: a4f5ea64f0 ("drm/i915: Refactor object page API")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161207133411.8028-1-chris@chris-wilson.co.uk
(cherry picked from commit dbb4351bab)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-12-20 16:29:26 +02:00
Chris Wilson
c2dc6cc946 drm/i915: Add a test that we terminate the trimmed sgtable as expected
In commit 0c40ce130e ("drm/i915: Trim the object sg table"), we expect
to copy exactly orig_st->nents across and allocate the table thusly.
The copy loop should therefore end with the new_sg being NULL.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-2-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-12-20 12:31:52 +00:00
Chris Wilson
d766ef5300 drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping
If we at first do not succeed with attempting to remap our physical
pages using a coalesced scattergather list, try again with one
scattergather entry per page. This should help with swiotlb as it uses a
limited buffer size and only searches for contiguous chunks within its
buffer aligned up to the next boundary - i.e. we may prematurely cause a
failure as we are unable to utilize the unused space between large
chunks and trigger an error such as:

	 i915 0000:00:02.0: swiotlb buffer is full (sz: 1630208 bytes)

Reported-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Fixes: 871dfbd67d ("drm/i915: Allow compaction upto SWIOTLB max segment size")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-12-20 12:30:56 +00:00
Chris Wilson
e8a9c58fcd drm/i915: Unify active context tracking between legacy/execlists/guc
The requests conversion introduced a nasty bug where we could generate a
new request in the middle of constructing a request if we needed to idle
the system in order to evict space for a context. The request to idle
would be executed (and waited upon) before the current one, creating a
minor havoc in the seqno accounting, as we will consider the current
request to already be completed (prior to deferred seqno assignment) but
ring->last_retired_head would have been updated and still could allow
us to overwrite the current request before execution.

We also employed two different mechanisms to track the active context
until it was switched out. The legacy method allowed for waiting upon an
active context (it could forcibly evict any vma, including context's),
but the execlists method took a step backwards by pinning the vma for
the entire active lifespan of the context (the only way to evict was to
idle the entire GPU, not individual contexts). However, to circumvent
the tricky issue of locking (i.e. we cannot take struct_mutex at the
time of i915_gem_request_submit(), where we would want to move the
previous context onto the active tracker and unpin it), we take the
execlists approach and keep the contexts pinned until retirement.
The benefit of the execlists approach, more important for execlists than
legacy, was the reduction in work in pinning the context for each
request - as the context was kept pinned until idle, it could short
circuit the pinning for all active contexts.

We introduce new engine vfuncs to pin and unpin the context
respectively. The context is pinned at the start of the request, and
only unpinned when the following request is retired (this ensures that
the context is idle and coherent in main memory before we unpin it). We
move the engine->last_context tracking into the retirement itself
(rather than during request submission) in order to allow the submission
to be reordered or unwound without undue difficultly.

And finally an ulterior motive for unifying context handling was to
prepare for mock requests.

v2: Rename to last_retired_context, split out legacy_context tracking
for MI_SET_CONTEXT.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
2016-12-18 16:18:50 +00:00
Matthew Auld
966d5bf5eb drm/i915: convert to using range_overflows
Convert some of the obvious hand-rolled ranged overflow sanity checks to
our shiny new range_overflows macro.

Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161213203222.32564-4-matthew.auld@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-12-16 21:22:12 +00:00
Jan Kara
1a29d85eb0 mm: use vmf->address instead of of vmf->virtual_address
Every single user of vmf->virtual_address typed that entry to unsigned
long before doing anything with it so the type of virtual_address does
not really provide us any additional safety.  Just use masked
vmf->address which already has the appropriate type.

Link: http://lkml.kernel.org/r/1479460644-25076-3-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:09 -08:00
Chris Wilson
dbb4351bab drm/i915: Reorder phys backing storage release
In commit a4f5ea64f0 ("drm/i915: Refactor object page API"), I
reordered the object->pages teardown to be more friendly wrt to a
separate obj->mm.lock. However, I overlooked the phys object and left it
with a dangling use-after-free of its phys_handle. Move the allocation
of the phys handle to get_pages and it release to put_pages to prevent
the invalid access and to improve symmetry.

v2: Add commentary about always aligning to page size.

Testcase: igt/drv_selftest/objects
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: a4f5ea64f0 ("drm/i915: Refactor object page API")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161207133411.8028-1-chris@chris-wilson.co.uk
2016-12-12 12:24:36 +00:00
Jani Nikula
73f67aa8cc drm/i915: distinguish G33 and Pineview from each other
Pineview deserves to use its own platform enum (which was already added,
unused, previously). IS_G33() no longer matches Pineview, and gets
replaced by IS_G33() || IS_PINEVIEW() or equivalent. Pineview is no
longer an outlier among platform definitions.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1481143689-19672-1-git-send-email-jani.nikula@intel.com
2016-12-07 23:28:33 +02:00
Jani Nikula
c0f86832e3 drm/i915: rename BROADWATER and CRESTLINE to I965G and I965GM, respectively
Add more consistency to our naming. Pineview remains the outlier. Keep
using code names for gen5+.

v2: rebased

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1481105584-23033-1-git-send-email-jani.nikula@intel.com
2016-12-07 15:18:33 +02:00
Chris Wilson
85fd4f58d7 drm/i915: Mark all non-vma being inserted into the address spaces
We need to distinguish between full i915_vma structs and simple
drm_mm_nodes when considering eviction (i.e. we must be careful not to
treat a mere drm_mm_node as a much larger i915_vma causing memory
corruption, if we are lucky). To do this, color these not-a-vma with -1
(I915_COLOR_UNEVICTABLE).

v2...v200: New name for -1.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-1-chris@chris-wilson.co.uk
2016-12-05 20:49:17 +00:00
Chris Wilson
ce1135c7de drm/i915: Complete requests in nop_submit_request
Since the submit/execute split in commit d55ac5bf97 ("drm/i915: Defer
transfer onto execution timeline to actual hw submission") the
global seqno advance was deferred until the submit_request callback.
After wedging the GPU, we were installing a nop_submit_request handler
(to avoid waking up the dead hw) but I had missed converting this over
to the new scheme. Under the new scheme, we have to explicitly call
i915_gem_submit_request() from the submit_request handler to mark the
request as on the hardware. If we don't the request is always pending,
and any waiter will continue to wait indefinitely and hangcheck will not
be able to resolve the lockup.

References: https://bugs.freedesktop.org/show_bug.cgi?id=98748
Testcase: igt/gem_eio/in-flight
Fixes: d55ac5bf97 ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-3-chris@chris-wilson.co.uk
(cherry picked from commit 3dcf93f7f2)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-12-05 10:38:01 +02:00
Tvrtko Ursulin
cb15d9f8c3 drm/i915: More GEM init dev_priv cleanup
Simplifies the code to pass the right parameter in.

v2: Commit message. (Joonas Lahtinen)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-12-01 18:01:16 +00:00
Tvrtko Ursulin
bf9e8429ab drm/i915: Make various init functions take dev_priv
Like GEM init, GUC init, MOCS init and context creation.

Enables them to lose dev_priv locals.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-12-01 18:01:15 +00:00
Tvrtko Ursulin
12d79d7828 drm/i915: Make GEM object create and create from data take dev_priv
Makes all GEM object constructors consistent.

v2: Fix compilation in GVT code.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)
2016-12-01 18:01:08 +00:00
Tvrtko Ursulin
187685cb90 drm/i915: Make GEM object alloc/free and stolen created take dev_priv
Where it is more appropriate and also to be consistent with
the direction of the driver.

v2: Leave out object alloc/free inlining. (Joonas Lahtinen)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-12-01 18:00:15 +00:00
Chris Wilson
49d73912cb drm/i915: Convert vm->dev backpointer to vm->i915
99% of the time we access i915_address_space->dev we want the i915
device and not the drm device, so let's store the drm_i915_private
backpointer instead. The only real complication here are the inlines
in i915_vma.h where drm_i915_private is not yet defined and so we have
to choose an alternate path for our asserts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161129095008.32622-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-11-29 11:38:00 +00:00
Chris Wilson
20e4933c47 drm/i915: Stop the machine as we install the wedged submit_request handler
In order to prevent a race between the old callback submitting an
incomplete request and i915_gem_set_wedged() installing its nop handler,
we must ensure that the swap occurs when the machine is idle
(stop_machine).

v2: move context lost from out of BKL.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-4-chris@chris-wilson.co.uk
2016-11-22 17:42:19 +00:00