2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

2220 Commits

Author SHA1 Message Date
Ville Syrjälä
c23aa71bcf drm/i915: Add __rcu to radix tree slot pointer
radix_tree_for_each_slot() wants an __rcu annotated pointer for the
slot. So let's add the annotation.

Fixes the following sparse warnings:
i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces)
i915_gem.c:2217:9:    expected void **slot
i915_gem.c:2217:9:    got void [noderef] <asn:4>**
i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces)
i915_gem.c:2217:9:    expected void **slot
i915_gem.c:2217:9:    got void [noderef] <asn:4>**
i915_gem.c:2217:9: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:2217:9:    expected void [noderef] <asn:4>**slot
i915_gem.c:2217:9:    got void **slot
i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces)
i915_gem.c:2217:9:    expected void **slot
i915_gem.c:2217:9:    got void [noderef] <asn:4>**

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: 96d7763452 ("drm/i915: Use a radixtree for random access to the object's backing storage")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170901171252.31025-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-04 19:31:40 +03:00
Chris Wilson
fa3722f649 drm/i915: Ignore duplicate VMA stored within the per-object handle LUT
By using drm_gem_flink/drm_gem_open on an object using the same fd, it
is possible for a client to create multiple handles pointing to the same
object (tied to the same contexts and VMA), as exemplified by
igt::gem_handle_to_libdrm_bo(). Since this duplication has been possible
since forever, we cannot assume that the handle:(fpriv, object) is
unique and so must handle the multiple users of a single VMA.

v2: Added commentary noise.

Testcase: igt/gem_close
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102355
Fixes: d1b48c1e71 ("drm/i915: Replace execbuf vma ht with an idr")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-3-chris@chris-wilson.co.uk
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
(cherry-picked from commit 3ffff01749)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-08-30 12:17:22 -07:00
Chris Wilson
0168bdfc9c drm/i915: Always wake the device to flush the GTT
Since we hold the device wakeref when writing through the GTT (otherwise
the writes would fail), we presumed that before the device sleeps those
writes would naturally be flushed and that we wouldn't need our mmio
read trick. However, that presumption seems false and a sleepy bxt seems
to require us to always manually flush the GTT writes prior to direct
access.

Fixes: e2a2aa36a5 ("drm/i915: Check we have an wake device before flushing GTT writes")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170829192546.1087-1-chris@chris-wilson.co.uk
(cherry picked from commit b69a784f5e)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-08-30 12:08:14 -07:00
Chris Wilson
3b24e7e810 drm/i915: Recreate vmapping even when the object is pinned
Sometimes we know we are the only user of the bo, but since we take a
protective pin_pages early on, an attempt to change the vmap on the
object is denied because it is busy. i915_gem_object_pin_map() cannot
tell from our single pin_count if the operation is safe. Instead we must
pass that information down from the caller in the manner of
I915_MAP_OVERRIDE.

This issue has existed from the introduction of the mapping, but was
never noticed as the only place where this conflict might happen is for
cached kernel buffers (such as allocated by i915_gem_batch_pool_get()).
Until recently there was only a single user (the cmdparser) so no
conflicts ever occurred. However, we now use it to allocate batches for
different operations (using MAP_WC on !llc for writes) in addition to the
existing shadow batch (using MAP_WB for reads).

We could either keep both mappings cached, or use a different write
mechanism if we detect a MAP_WB already exists (i.e. clflush
afterwards), but as we haven't seen this issue in the wild (it requires
hitting the GPU reloc path in addition to the cmdparser) for simplicity
just allow the mappings to be recreated.

v2: Include the i915_MAP_OVERRIDE bit in the enum so the compiler knows
about all the valid values.

Fixes: 7dd4f6729f ("drm/i915: Async GPU relocation processing")
Testcase: igt/gem_lut_handle # byt, completely by accident
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170828104631.8606-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit a575c67617)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-08-30 12:08:11 -07:00
Chris Wilson
b69a784f5e drm/i915: Always wake the device to flush the GTT
Since we hold the device wakeref when writing through the GTT (otherwise
the writes would fail), we presumed that before the device sleeps those
writes would naturally be flushed and that we wouldn't need our mmio
read trick. However, that presumption seems false and a sleepy bxt seems
to require us to always manually flush the GTT writes prior to direct
access.

Fixes: e2a2aa36a5 ("drm/i915: Check we have an wake device before flushing GTT writes")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170829192546.1087-1-chris@chris-wilson.co.uk
2017-08-30 09:10:35 +01:00
Chris Wilson
fc692bd31b drm/i915: Discard the request queue if we fail to sleep before suspend
If we fail to clear the outstanding request queue before suspending,
mark those requests as lost.

References: https://bugs.freedesktop.org/show_bug.cgi?id=102037
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170826110935.10237-3-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-08-29 16:56:39 +01:00
Chris Wilson
f36325f378 drm/i915: Clear wedged status upon resume
When we wake up from suspend, the device has been powered down and
should come back afresh. We should be able to safely remove the wedged
status from the previous session and start afresh.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170826110935.10237-2-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-08-29 16:56:38 +01:00
Chris Wilson
cad9946c2a drm/i915: Always sanity check engine state upon idling
When we do a locked idle we know that afterwards all requests have been
completed and the engines have been cleared of tasks. For whatever
reason, this doesn't always happen and we may go into a suspend with
ELSP still full, and this causes an issue upon resume as we get very,
very confused.

If the engines refuse to idle, mark the device as wedged. In the process
we get rid of the maybe unused open-coded version of wait_for_engines
reported by Nick Desaulniers and Matthias Kaehlcke.

v2: Suppress the -EIO before suspend, but keep it for seqno wrap.

References: https://bugs.freedesktop.org/show_bug.cgi?id=101891
References: https://bugs.freedesktop.org/show_bug.cgi?id=102456
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20170826110935.10237-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2017-08-29 16:56:04 +01:00
Chris Wilson
a575c67617 drm/i915: Recreate vmapping even when the object is pinned
Sometimes we know we are the only user of the bo, but since we take a
protective pin_pages early on, an attempt to change the vmap on the
object is denied because it is busy. i915_gem_object_pin_map() cannot
tell from our single pin_count if the operation is safe. Instead we must
pass that information down from the caller in the manner of
I915_MAP_OVERRIDE.

This issue has existed from the introduction of the mapping, but was
never noticed as the only place where this conflict might happen is for
cached kernel buffers (such as allocated by i915_gem_batch_pool_get()).
Until recently there was only a single user (the cmdparser) so no
conflicts ever occurred. However, we now use it to allocate batches for
different operations (using MAP_WC on !llc for writes) in addition to the
existing shadow batch (using MAP_WB for reads).

We could either keep both mappings cached, or use a different write
mechanism if we detect a MAP_WB already exists (i.e. clflush
afterwards), but as we haven't seen this issue in the wild (it requires
hitting the GPU reloc path in addition to the cmdparser) for simplicity
just allow the mappings to be recreated.

v2: Include the i915_MAP_OVERRIDE bit in the enum so the compiler knows
about all the valid values.

Fixes: 7dd4f6729f ("drm/i915: Async GPU relocation processing")
Testcase: igt/gem_lut_handle # byt, completely by accident
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170828104631.8606-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-08-29 10:39:08 +01:00
Chris Wilson
3ffff01749 drm/i915: Ignore duplicate VMA stored within the per-object handle LUT
By using drm_gem_flink/drm_gem_open on an object using the same fd, it
is possible for a client to create multiple handles pointing to the same
object (tied to the same contexts and VMA), as exemplified by
igt::gem_handle_to_libdrm_bo(). Since this duplication has been possible
since forever, we cannot assume that the handle:(fpriv, object) is
unique and so must handle the multiple users of a single VMA.

v2: Added commentary noise.

Testcase: igt/gem_close
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102355
Fixes: d1b48c1e71 ("drm/i915: Replace execbuf vma ht with an idr")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-3-chris@chris-wilson.co.uk
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-08-24 15:28:05 +01:00
Chris Wilson
67b4804025 drm/i915: Assert that the handle->vma lut is empty on object close
Make sure that we are not leaking an entry in the ctx->handles_lut by
asserting that the object was removed prior to being freed. This should
be enforced by all such handles being removed by i915_gem_close_object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-2-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-08-24 15:22:46 +01:00
Chris Wilson
432295d7b9 drm/i915: Assert the context is not closed on object-close
During the context-close, we should be decoupling all the vma from the
object so that upon object-closing we shouldn't see any vma from the
already closed contexts. So include a check upon closing the object that
the context is still open.

v2: Eek, the fpriv check is required for shared objects. Double eek, BAT
passed?

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-08-24 15:22:21 +01:00
Chris Wilson
d1b48c1e71 drm/i915: Replace execbuf vma ht with an idr
This was the competing idea long ago, but it was only with the rewrite
of the idr as an radixtree and using the radixtree directly ourselves,
along with the realisation that we can store the vma directly in the
radixtree and only need a list for the reverse mapping, that made the
patch performant enough to displace using a hashtable. Though the vma ht
is fast and doesn't require any extra allocation (as we can embed the node
inside the vma), it does require a thread for resizing and serialization
and will have the occasional slow lookup. That is hairy enough to
investigate alternatives and favour them if equivalent in peak performance.
One advantage of allocating an indirection entry is that we can support a
single shared bo between many clients, something that was done on a
first-come first-serve basis for shared GGTT vma previously. To offset
the extra allocations, we create yet another kmem_cache for them.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-5-chris@chris-wilson.co.uk
2017-08-18 11:59:02 +01:00
Chris Wilson
b805014897 drm/i915: Handle full s64 precision for wait-ioctl
The wait-ioctl is optionally supplied a timeout with nanosecond
precision in a s64 field. We use nsecs_to_jiffies64() to convert that
into the jiffies consumed by the scheduler, but internally
nsecs_to_jiffies64() does not guard against overflow (as it's purpose is
for use by the scheduler and not drivers!). So we must guard against the
overflow ourselves, and in the process note that we may then return
much earlier than the timeout selected by the user, so don't report
ETIME unless we do hit the timeout. (Woe betold us though if the user
waits for a year (32bit) and the request is still not complete!)

v2: Refine overflow detection (to not include an overffow itself)

Reported-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20170811105731.9482-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-08-15 16:25:25 +01:00
Chris Wilson
b8f55be644 drm/i915: Split obj->cache_coherent to track r/w
Another month, another story in the cache coherency saga. This time, we
come to the realisation that i915_gem_object_is_coherent() has been
reporting whether we can read from the target without requiring a cache
invalidate; but we were using it in places for testing whether we could
write into the object without requiring a cache flush. So split the
tracking into two, one to decide before reads, one after writes.

See commit e27ab73d17 ("drm/i915: Mark CPU cache as dirty on every
transition for CPU writes") for the previous entry in this saga.

v2: Be verbose
v3: Remove unused function (i915_gem_object_is_coherent)
v4: Fix inverted coherency check prior to execbuf (from v2)
v5: Add comment for nasty code where we are optimising on gcc's behalf.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101109
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101555
Testcase: igt/kms_mmap_write_crc
Testcase: igt/kms_pwrite_crc
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Tested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170811111116.10373-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-08-15 15:46:57 +01:00
Chris Wilson
8fb6a5df46 drm/i915: Call the unlocked version of i915_gem_object_get_pages()
When we hold for the lock for swapping out the shmem pages for the
physically contiguous pages, we have to call the unlocked version of
get_pages!

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101934
Fixes: 35d23516946e ("drm/i915: Make i915_gem_object_phys_attach() use obj->mm.lock more appropriately")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170726181602.23527-2-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:56:11 +02:00
Chris Wilson
8eeb79060b drm/i915: Move i915_gem_object_phys_attach()
Prevent a forward declaration in the next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170726181602.23527-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:56:11 +02:00
Chris Wilson
6ea1d55d31 drm/i915: Make i915_gem_object_phys_attach() use obj->mm.lock more appropriately
Actually transferring from shmemfs to the physically contiguous set of
pages should be wholly guarded by its obj->mm.lock!

v2: Remember to free the old pages.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170726160038.29487-2-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:56:10 +02:00
Chris Wilson
d1d1ebf412 drm/i915: Don't touch fence->error when resetting an innocent request
If the request has been completed before the reset took effect, we don't
need to mark it up as being a victim. Touching fence->error after the
fence has been signaled is detected by dma_fence_set_error() and
triggers a BUG:

[  231.743133] kernel BUG at ./include/linux/dma-fence.h:434!
[  231.743156] invalid opcode: 0000 [#1] SMP KASAN
[  231.743172] Modules linked in: i915 drm_kms_helper drm iptable_nat nf_nat_ipv4 nf_nat x86_pkg_temp_thermal iosf_mbi i2c_algo_bit cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea fb font fbdev [last unloaded: drm]
[  231.743221] CPU: 2 PID: 20 Comm: kworker/2:0 Tainted: G     U          4.13.0-rc1+ #52
[  231.743236] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.01 03/11/2011
[  231.743363] Workqueue: events_long i915_hangcheck_elapsed [i915]
[  231.743382] task: ffff8801f42e9780 task.stack: ffff8801f42f8000
[  231.743489] RIP: 0010:i915_gem_reset_engine+0x45a/0x460 [i915]
[  231.743505] RSP: 0018:ffff8801f42ff770 EFLAGS: 00010202
[  231.743521] RAX: 0000000000000007 RBX: ffff8801bf6b1880 RCX: ffffffffa02881a6
[  231.743537] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff8801bf6b18c8
[  231.743551] RBP: ffff8801f42ff7c8 R08: 0000000000000001 R09: 0000000000000000
[  231.743566] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801edb02d00
[  231.743581] R13: ffff8801e19d4200 R14: 000000000000001d R15: ffff8801ce2a4000
[  231.743599] FS:  0000000000000000(0000) GS:ffff8801f5a80000(0000) knlGS:0000000000000000
[  231.743614] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  231.743629] CR2: 00007f0ebd1add10 CR3: 0000000002621000 CR4: 00000000000406e0
[  231.743643] Call Trace:
[  231.743752]  i915_gem_reset+0x6c/0x150 [i915]
[  231.743853]  i915_reset+0x175/0x210 [i915]
[  231.743958]  i915_reset_device+0x33b/0x350 [i915]
[  231.744061]  ? valleyview_pipestat_irq_handler+0xe0/0xe0 [i915]
[  231.744081]  ? trace_hardirqs_off_caller+0x70/0x110
[  231.744102]  ? _raw_spin_unlock_irqrestore+0x46/0x50
[  231.744120]  ? find_held_lock+0x119/0x150
[  231.744138]  ? mark_lock+0x6d/0x850
[  231.744241]  ? gen8_gt_irq_ack+0x1f0/0x1f0 [i915]
[  231.744262]  ? work_on_cpu_safe+0x60/0x60
[  231.744284]  ? rcu_read_lock_sched_held+0x57/0xa0
[  231.744400]  ? gen6_read32+0x2ba/0x320 [i915]
[  231.744506]  i915_handle_error+0x382/0x5f0 [i915]
[  231.744611]  ? gen6_rps_reset_ei+0x20/0x20 [i915]
[  231.744630]  ? vsnprintf+0x128/0x8e0
[  231.744649]  ? pointer+0x6b0/0x6b0
[  231.744667]  ? debug_check_no_locks_freed+0x1a0/0x1a0
[  231.744688]  ? scnprintf+0x92/0xe0
[  231.744706]  ? snprintf+0xb0/0xb0
[  231.744820]  hangcheck_declare_hang+0x15a/0x1a0 [i915]
[  231.744932]  ? engine_stuck+0x440/0x440 [i915]
[  231.744951]  ? rcu_read_lock_sched_held+0x57/0xa0
[  231.745062]  ? gen6_read32+0x2ba/0x320 [i915]
[  231.745173]  ? gen6_read16+0x320/0x320 [i915]
[  231.745284]  ? intel_engine_get_active_head+0x91/0x170 [i915]
[  231.745401]  i915_hangcheck_elapsed+0x3d8/0x400 [i915]
[  231.745424]  process_one_work+0x3e8/0xac0
[  231.745444]  ? pwq_dec_nr_in_flight+0x110/0x110
[  231.745464]  ? do_raw_spin_lock+0x8e/0x120
[  231.745484]  worker_thread+0x8d/0x720
[  231.745506]  kthread+0x19e/0x1f0
[  231.745524]  ? process_one_work+0xac0/0xac0
[  231.745541]  ? kthread_create_on_node+0xa0/0xa0
[  231.745560]  ret_from_fork+0x27/0x40
[  231.745581] Code: 8b 7d c8 e8 49 0d 02 e1 49 8b 7f 38 48 8b 75 b8 48 83 c7 10 e8 b8 89 be e1 e9 95 fc ff ff 4c 89 e7 e8 4b b9 ff ff e9 30 ff ff ff <0f> 0b 0f 1f 40 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fe
[  231.745767] RIP: i915_gem_reset_engine+0x45a/0x460 [i915] RSP: ffff8801f42ff770

At first glance this looks to be related to commit c64992e035
("drm/i915: Look for active requests earlier in the reset path"), but it
could easily happen before as well. On the other hand, we no longer
logged victims due to the active_request being dropped earlier.

v2: Be trickier to unwind the incomplete request as we cannot rely on
request retirement for the lockless per-engine reset.
v3: Reprobe the active request at the time of the reset.

Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Fixes: c64992e035 ("drm/i915: Look for active requests earlier in the reset path")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-15-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v1
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:57 +02:00
Chris Wilson
77b25a972b drm/i915: Make i915_gem_context_mark_guilty() safe for unlocked updates
Since we make call i915_gem_context_mark_guilty() concurrently when
resetting different engines in parallel, we need to make sure that our
updates are safe for the unlocked access.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-12-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:47 +02:00
Chris Wilson
ed454f2cd6 drm/i915: Clear engine irq posted following a reset
When the GPU is reset, we want to discard all pending notifications as
either we have manually completed them, or they are no longer
applicable. Make sure we do reset the engine->irq_posted prior to
re-enabling the engine (e.g. the interrupt tasklets) in
i915_gem_reset_finish_engine().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-11-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:47 +02:00
Chris Wilson
bf2eac3bee drm/i915: Assert that machine is wedged for nop_submit_request
We should only ever do nop_submit_request when the machine is wedged, so
assert it is so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-10-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:46 +02:00
Chris Wilson
3d7adbbf49 drm/i915: Wake up waiters after setting the WEDGED bit
After setting the WEDGED bit, make sure that we do wake up waiters as
they may not be waiting for a request completion yet, just for its
execution.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-9-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:46 +02:00
Chris Wilson
5e32d7482e drm/i915: Clear execlist port[] before updating seqno on wedging
When we wedge the device, we clear out the in-flight requests and
advance the breadcrumb to indicate they are complete. However, the
breadcrumb advance includes an assert that the engine is idle, so that
advancement needs to be the last step to ensure we pass our own sanity
checks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-7-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:45 +02:00
Daniel Vetter
64282ea2d2 Merge airlied/drm-next into drm-intel-next-queued
Resync with upstream to avoid git getting too badly confused. Also, we
have a conflict with the drm_vblank_cleanup removal, which cannot be
resolved by simply taking our side. Bake that in properly.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:33:49 +02:00
Daniel Vetter
8b5d27b911 drm/i915: Remove intel_flip_work infrastructure
This gets rid of all the interactions between the legacy flip code and
the modeset code. Yay!

This highlights an ommission in the atomic paths, where we fail to
apply a boost to the pending rendering when we miss the target vblank.
But the existing code is still dead and can be removed.

v2: Note that the boosting doesn't work in atomic (Chris).

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170720175754.30751-7-daniel.vetter@ffwll.ch
2017-07-20 22:45:36 +02:00
Dave Airlie
2d62c799f8 Merge tag 'drm-intel-next-2017-07-17' of git://anongit.freedesktop.org/git/drm-intel into drm-next
2nd round of 4.14 features:

- prep for deferred fbdev setup
- refactor fixed 16.16 computations and skl+ wm code (Mahesh Kumar)
- more cnl paches (Rodrigo, Imre et al)
- tighten context cleanup and handling (Chris Wilson)
- fix interlaced handling on skl+ (Mahesh Kumar)
- small bits as usual

* tag 'drm-intel-next-2017-07-17' of git://anongit.freedesktop.org/git/drm-intel: (84 commits)
  drm/i915: Update DRIVER_DATE to 20170717
  drm/i915: Protect against deferred fbdev setup
  drm/i915/fbdev: Always forward hotplug events
  drm/i915/skl+: unify cpp value in WM calculation
  drm/i915/skl+: WM calculation don't require height
  drm/i915: Addition wrapper for fixed16.16 operation
  drm/i915: cleanup fixed-point wrappers naming
  drm/i915: Always perform internal fixed16 division in 64 bits
  drm/i915: take-out common clamping code of fixed16 wrappers
  drm/i915/cnl: Add missing type case.
  drm/i915/cnl: Add max allowed Cannonlake DC.
  drm/i915: Make DP-MST connector info work
  drm/i915/cnl: Get DDI clock based on PLLs.
  drm/i915/cnl: Inherit RPS stuff from previous platforms.
  drm/i915/cnl: Gen10 render context size.
  drm/i915/cnl: Don't trust VBT's alternate pin for port D for now.
  drm/i915: Fix the kernel panic when using aliasing ppgtt
  drm/i915/cnl: Cannonlake color init.
  drm/i915/cnl: Add force wake for gen10+.
  x86/gpu: CNL uses the same GMS values as SKL
  ...
2017-07-20 11:31:43 +10:00
Michal Hocko
dbb329561a drm/i915: use __GFP_RETRY_MAYFAIL
Commit 24f8e00a8a ("drm/i915: Prefer to report ENOMEM rather than
incur the oom for gfx allocations") has tried to remove disruptive OOM
killer because the userspace should be able to cope with allocation
failures.

At the time only __GFP_NORETRY could achieve that and it turned out that
this would fail the allocations just too easily.  So "drm/i915: Remove
__GFP_NORETRY from our buffer allocator" removed it and hoped for a
better solution.  __GFP_RETRY_MAYFAIL is that solution.  It will keep
retrying the allocation until there is no more progress and we would go
OOM.  Instead we fail the allocation and let the caller to deal with it.

Link: http://lkml.kernel.org/r/20170623085345.11304-6-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Alex Belits <alex.belits@cavium.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: NeilBrown <neilb@suse.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:04 -07:00
Chris Wilson
7b92c1bd05 drm/i915: Avoid keeping waitboost active for signaling threads
Once a client has requested a waitboost, we keep that waitboost active
until all clients are no longer waiting. This is because we don't
distinguish which waiter deserves the boost. However, with the advent of
fence signaling, the signaler threads appear as waiters to the RPS
interrupt handler. So instead of using a single boolean to track when to
keep the waitboost active, use a counter of all outstanding waitboosted
requests.

At this point, I have removed all vestiges of the rate limiting on
clients. Whilst this means that compositors should remain more fluid,
it also means that boosts are more prevalent. See commit b29c19b645
("drm/i915: Boost RPS frequency for CPU stalls") for a longer discussion
on the pros and cons of both approaches.

A drawback of this implementation is that it requires constant request
submission to keep the waitboost trimmed (as it is now cancelled when the
request is completed). This will be fine for a busy system, but near
idle the boosts may be kept for longer than desired (effectively tens of
vblanks worstcase) and there is a reliance on rc6 instead.

v2: Remove defunct rps.client_lock

Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170628123548.9236-1-chris@chris-wilson.co.uk
2017-06-28 15:23:12 +01:00
Chris Wilson
13f8458f9a drm/i915: Drop flushing of the object free list/worker from i915_gem_suspend
i915_gem_suspend() is called from all of our finalization paths
(suspend, hibernate, unload). i915_gem_drain_freed_objects() adds an
arbitrary delay as it uses an rcu_barrier() to ensure that there are no
more freed objects in flight, and this delay causes a large amount of
variability in suspend timings. For S3 suspend, we do not need to free
pages as doing so does not impact at all upon the system in its
suspended state, unlike S4 hibernation where we do want the hibernation
image to be as small as possible. Therefore we can forgo waiting inside
i915_gem_suspend(), so long as we ensure that we do cleanup before
unload (see i915_gem_load_cleanup()) and prefer to reap our objects
prior to hibernation (see i915_gem_freeze()).

Removing the rcu_barrier() from i915_gem_suspend() improves S3 latency
by about 30ms on Skylake (ymmv).

Reported-by: David Weinehall <david.weinehall@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170627173731.11566-1-chris@chris-wilson.co.uk
Tested-by: David Weinehall <david.weinehall@linux.intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
2017-06-28 12:15:20 +01:00
Chris Wilson
36703e79a9 drm/i915: Break modeset deadlocks on reset
Trying to do a modeset from within a reset is fraught with danger. We
can fall into a cyclic deadlock where the modeset is waiting on a
previous modeset that is waiting on a request, and since the GPU hung
that request completion is waiting on the reset. As modesetting doesn't
allow its locks to be broken and restarted, or for its *own* reset
mechanism to take over the display, we have to do something very
evil instead. If we detect that we are stuck waiting to prepare the
display reset (by using a very simple timeout), resort to cancelling all
in-flight requests and throwing the user data into /dev/null, which is
marginally better than the driver locking up and keeping that data to
itself.

This is not a fix; this is just a workaround that unbreaks machines
until we can resolve the deadlock in a way that doesn't lose data!

v2: Move the retirement from set-wegded to the i915_reset() error path,
after which we no longer any delayed worker cleanup for
i915_handle_error()
v3: C abuse for syntactic sugar
v4: Cover all waits with the timeout to catch more driver breakage

References: https://bugs.freedesktop.org/show_bug.cgi?id=99093
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170622105625.16952-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-06-23 13:41:55 +01:00
Chris Wilson
4ee056f418 drm/i915: Cancel pending execlist tasklet upon wedging
Highly unlikely, but if the stop_machine() did suspend the tasklet, we
want to make sure that when it wakes it finds there is nothing to do.
Otherwise, it will loudly complain that the ELSP port tracking no longer
matches the hardware, and we will be mightly confused.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170621124804.4529-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-06-21 15:15:33 +01:00
Michel Thierry
a1ef70e144 drm/i915: Add support for per engine reset recovery
This change implements support for per-engine reset as an initial, less
intrusive hang recovery option to be attempted before falling back to the
legacy full GPU reset recovery mode if necessary. This is only supported
from Gen8 onwards.

Hangchecker determines which engines are hung and invokes error handler to
recover from it. Error handler schedules recovery for each of those engines
that are hung. The recovery procedure is as follows,
 - identifies the request that caused the hang and it is dropped
 - force engine to idle: this is done by issuing a reset request
 - reset the engine
 - re-init the engine to resume submissions.

If engine reset fails then we fall back to heavy weight full gpu reset
which resets all engines and reinitiazes complete state of HW and SW.

v2: Rebase.
v3: s/*engine_reset*/*reset_engine*/; freeze engine and irqs before
calling i915_gem_reset_engine (Chris).
v4: Rebase, modify i915_gem_reset_prepare to use a ring mask and
reuse the function for reset_engine.
v5: intel_reset_engine_start/cancel instead of request/unrequest_reset.
v6: Clean up reset_engine function to not require mutex, i.e. no need to call
revoke/restore_fences and _retire_requests (Chris).
v7: Remove leftovers from v5, i.e. no need to disable irq, hold
forcewake or wakeup the handoff bit (Chris).
v8: engine_retire_requests should be (and it was) static; explain that
we have to re-init the engine after reset, which is why the init_hw call
is needed; check reset-in-progress flag (Chris).
v9: Rebase, include code to pass the active request to gem_reset_engine
(as it is already done in full reset). Remove unnecessary
intel_reset_engine_start/cancel, these are executed as part of the
reset.
v10: Rebase, use the right I915_RESET_ENGINE flag.
v11: Fixup to call reset_finish_engine even on error.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-6-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-6-chris@chris-wilson.co.uk
2017-06-20 21:00:16 +01:00
Michel Thierry
c64992e035 drm/i915: Look for active requests earlier in the reset path
And store the active request so that we only search for it once.

v2: Check for request completion inside _prepare_engine, don't use
ECANCELED, remove unnecessary null checks (Chris).

v3: Capture active requests during reset_prepare and store it the
engine hangcheck obj.

v4: Rename commit, change i915_gem_reset_request to just confirm the
active_request is still incomplete, instead of engine_stalled (Chris).

v5: With style; pass the active request to gem_reset_engine, keep single
return in reset_prepare_engine (Chris).

v6: Moved before reset-engine code appears (Chris)

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v5)
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-2-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-3-chris@chris-wilson.co.uk
2017-06-20 21:00:03 +01:00
Chris Wilson
829a0af29f drm/i915: Group all the global context information together
Create a substruct to hold all the global context state under
drm_i915_private.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-1-chris@chris-wilson.co.uk
2017-06-20 17:13:40 +01:00
Chris Wilson
7dd4f6729f drm/i915: Async GPU relocation processing
If the user requires patching of their batch or auxiliary buffers, we
currently make the alterations on the cpu. If they are active on the GPU
at the time, we wait under the struct_mutex for them to finish executing
before we rewrite the contents. This happens if shared relocation trees
are used between different contexts with separate address space (and the
buffers then have different addresses in each), the 3D state will need
to be adjusted between execution on each context. However, we don't need
to use the CPU to do the relocation patching, as we could queue commands
to the GPU to perform it and use fences to serialise the operation with
the current activity and future - so the operation on the GPU appears
just as atomic as performing it immediately. Performing the relocation
rewrites on the GPU is not free, in terms of pure throughput, the number
of relocations/s is about halved - but more importantly so is the time
under the struct_mutex.

v2: Break out the request/batch allocation for clearer error flow.
v3: A few asserts to ensure rq ordering is maintained

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-16 16:54:05 +01:00
Chris Wilson
8a2421bd0d drm/i915: Wait upon userptr get-user-pages within execbuffer
This simply hides the EAGAIN caused by userptr when userspace causes
resource contention. However, it is quite beneficial with highly
contended userptr users as we avoid repeating the setup costs and
kernel-user context switches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-06-16 16:54:05 +01:00
Chris Wilson
4ff4b44cbb drm/i915: Store a direct lookup from object handle to vma
The advent of full-ppgtt lead to an extra indirection between the object
and its binding. That extra indirection has a noticeable impact on how
fast we can convert from the user handles to our internal vma for
execbuffer. In order to bypass the extra indirection, we use a
resizable hashtable to jump from the object to the per-ctx vma.
rhashtable was considered but we don't need the online resizing feature
and the extra complexity proved to undermine its usefulness. Instead, we
simply reallocate the hastable on demand in a background task and
serialize it before iterating.

In non-full-ppgtt modes, multiple files and multiple contexts can share
the same vma. This leads to having multiple possible handle->vma links,
so we only use the first to establish the fast path. The majority of
buffers are not shared and so we should still be able to realise
speedups with multiple clients.

v2: Prettier names, more magic.
v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-16 16:54:04 +01:00
Chris Wilson
7fc92e96c3 drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirty
For ease of use (i.e. avoiding a few checks and function calls), store
the object's cache coherency next to the cache is dirty bit.

Specifically this patch aims to reduce the frequency of no-op calls to
i915_gem_object_clflush() to counter-act the increase of such calls for
GPU only objects in the previous patch.

v2: Replace cache_dirty & ~cache_coherent with cache_dirty &&
!cache_coherent as gcc generates much better code for the latter
(Tvrtko)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Dongwon Kim <dongwon.kim@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170616105455.16977-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-06-16 14:52:27 +01:00
Chris Wilson
e27ab73d17 drm/i915: Mark CPU cache as dirty on every transition for CPU writes
Currently, we only mark the CPU cache as dirty if we skip a clflush.
This leads to some confusion where we have to ask if the object is in
the write domain or missed a clflush. If we always mark the cache as
dirty, this becomes a much simply question to answer.

The goal remains to do as few clflushes as required and to do them as
late as possible, in the hope of deferring the work to a kthread and not
block the caller (e.g. execbuf, flips).

v2: Always call clflush before GPU execution when the cache_dirty flag
is set. This may cause some extra work on llc systems that migrate dirty
buffers back and forth - but we do try to limit that by only setting
cache_dirty at the end of the gpu sequence.

v3: Always mark the cache as dirty upon a level change, as we need to
invalidate any stale cachelines due to external writes.

Reported-by: Dongwon Kim <dongwon.kim@intel.com>
Fixes: a6a7cc4b7d ("drm/i915: Always flush the dirty CPU cache when pinning the scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Dongwon Kim <dongwon.kim@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615123850.26843-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-06-16 14:50:52 +01:00
Chris Wilson
0f6ab55d7a drm/i915: Only restrict noreclaim in the early shrink passes
In our first pass, we do not want to use reclaim at all as we want to
solely reap the i915 buffer caches (its purgeable pages). But we don't
mind it initiates IO or pulls via the FS (but it shouldn't anyway as we
say no to reclaim!). Just drop the GFP_IO constraint for simplicity.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-14 10:53:37 +01:00
Chris Wilson
eaf4180155 drm/i915: Remove __GFP_NORETRY from our buffer allocator
I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
struggles with handling reclaim of our dirty buffers and relies on
reclaim via kswapd. As a result, a single pass of direct reclaim is
unreliable when i915 occupies the majority of available memory, and the
only means of effectively waiting on kswapd to amke progress is by not
setting the __GFP_NORETRY flag and lopping. That leaves us with the
dilemma of invoking the oomkiller instead of propagating the allocation
failure back to userspace where it can be handled more gracefully (one
hopes).  In the future we may have __GFP_MAYFAIL to allow repeats up until
we genuinely run out of memory and the oomkiller would have been invoked.
Until then, let the oomkiller wreck havoc.

v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL
v3: Update comments that direct reclaim only appears to be ignoring our
dirty buffers!

Fixes: 24f8e00a8a ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Testcase: igt/gem_tiled_swapping
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Michal Hocko <mhocko@suse.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-14 10:53:09 +01:00
Chris Wilson
4846bf0ca8 drm/i915: Encourage our shrinker more when our shmemfs allocations fails
Commit 24f8e00a8a ("drm/i915: Prefer to report ENOMEM rather than
incur the oom for gfx allocations") made the bold decision to try and
avoid the oomkiller by reporting -ENOMEM to userspace if our allocation
failed after attempting to free enough buffer objects. In short, it
appears we were giving up too easily (even before we start wondering if
one pass of reclaim is as strong as we would like). Part of the problem
is that if we only shrink just enough pages for our expected allocation,
the likelihood of those pages becoming available to us is less than 100%
To counter-act that we ask for twice the number of pages to be made
available. Furthermore, we allow the shrinker to pull pages from the
active list in later passes.

v2: Be a little more cautious in paging out gfx buffers, and leave that
to a more balanced approach from shrink_slab(). Important when combined
with "drm/i915: Start writeback from the shrinker" as anything shrunk is
immediately swapped out and so should be more conservative.

Fixes: 24f8e00a8a ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-1-chris@chris-wilson.co.uk
2017-06-14 10:51:50 +01:00
Weinan Li
ff8f797557 drm/i915: return the correct usable aperture size under gvt environment
I915_GEM_GET_APERTURE ioctl is used to probe aperture size from userspace.
In gvt environment, each vm only use the ballooned part of aperture, so we
should return the correct available aperture size exclude the reserved part
by balloon.

v2: add 'reserved' in struct i915_address_space to record the reserved size
in ggtt (Chris)

v3: remain aper_size as total, adjust aper_available_size exclude reserved
and pinned. UMD driver need to adjust the max allocation size according to
the available aperture size but not total size. KMD return the correct
usable aperture size any time (Chris, Joonas)

v4: decrease reserved in deballoon (Joonas)

v5: add onion teardown in balloon, add vgt_deballoon_space (Joonas)

v6: change title name (Zhenyu)

v7: code style refine (Joonas)

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1496198152-14175-1-git-send-email-weinan.z.li@intel.com
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-06-02 14:28:46 +01:00
Chris Wilson
863e9fde1a drm/i915: Short-circuit i915_gem_wait_for_idle() if already idle
If the device is asleep (no GT wakeref), we know the GPU is already idle.
If we add an early return, we can avoid touching registers and checking
hw state outside of the assumed GT wakelock. This prevents causing such
errors whilst debugging:

[ 2613.401647] RPM wakelock ref not held during HW access
[ 2613.401684] ------------[ cut here ]------------
[ 2613.401720] WARNING: CPU: 5 PID: 7739 at drivers/gpu/drm/i915/intel_drv.h:1787 gen6_read32+0x21f/0x2b0 [i915]
[ 2613.401731] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek coretemp snd_hda_codec_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mii mei_me lpc_ich mei prime_numbers [last unloaded: i915]
[ 2613.401823] CPU: 5 PID: 7739 Comm: drv_missed_irq Tainted: G     U          4.12.0-rc2-CI-CI_DRM_421+ #1
[ 2613.401825] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
[ 2613.401840] task: ffff880409e3a740 task.stack: ffffc900084dc000
[ 2613.401861] RIP: 0010:gen6_read32+0x21f/0x2b0 [i915]
[ 2613.401863] RSP: 0018:ffffc900084dfce8 EFLAGS: 00010292
[ 2613.401869] RAX: 000000000000002a RBX: ffff8804016a8000 RCX: 0000000000000006
[ 2613.401871] RDX: 0000000000000006 RSI: ffffffff81cbf2d9 RDI: ffffffff81c9e3a7
[ 2613.401874] RBP: ffffc900084dfd18 R08: ffff880409e3afc8 R09: 0000000000000000
[ 2613.401877] R10: 000000008a1c483f R11: 0000000000000000 R12: 000000000000209c
[ 2613.401879] R13: 0000000000000001 R14: ffff8804016a8000 R15: ffff8804016ac150
[ 2613.401882] FS:  00007f39ef3dd8c0(0000) GS:ffff88041fb40000(0000) knlGS:0000000000000000
[ 2613.401885] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2613.401887] CR2: 00000000023717c8 CR3: 00000002e7b34000 CR4: 00000000001406e0
[ 2613.401889] Call Trace:
[ 2613.401912]  intel_engine_is_idle+0x76/0x90 [i915]
[ 2613.401931]  i915_gem_wait_for_idle+0xe6/0x1e0 [i915]
[ 2613.401951]  fault_irq_set+0x40/0x90 [i915]
[ 2613.401970]  i915_ring_test_irq_set+0x42/0x50 [i915]
[ 2613.401976]  simple_attr_write+0xc7/0xe0
[ 2613.401981]  full_proxy_write+0x4f/0x70
[ 2613.401987]  __vfs_write+0x23/0x120
[ 2613.401992]  ? rcu_read_lock_sched_held+0x75/0x80
[ 2613.401996]  ? rcu_sync_lockdep_assert+0x2a/0x50
[ 2613.401999]  ? __sb_start_write+0xfa/0x1f0
[ 2613.402004]  vfs_write+0xc5/0x1d0
[ 2613.402008]  ? trace_hardirqs_on_caller+0xe7/0x1c0
[ 2613.402013]  SyS_write+0x44/0xb0
[ 2613.402020]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 2613.402022] RIP: 0033:0x7f39eded6670
[ 2613.402025] RSP: 002b:00007fffdcdcb1a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 2613.402030] RAX: ffffffffffffffda RBX: ffffffff81470203 RCX: 00007f39eded6670
[ 2613.402033] RDX: 0000000000000001 RSI: 000000000041bc33 RDI: 0000000000000006
[ 2613.402036] RBP: ffffc900084dff88 R08: 00007f39ef3dd8c0 R09: 0000000000000001
[ 2613.402038] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000041bc33
[ 2613.402041] R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
[ 2613.402046]  ? __this_cpu_preempt_check+0x13/0x20
[ 2613.402052] Code: 01 9b fa e0 0f ff e9 28 fe ff ff 80 3d 6a dd 0e 00 00 0f 85 29 fe ff ff 48 c7 c7 48 19 29 a0 c6 05 56 dd 0e 00 01 e8 da 9a fa e0 <0f> ff e9 0f fe ff ff b9 01 00 00 00 ba 01 00 00 00 44 89 e6 48
[ 2613.402199] ---[ end trace 31f0cfa93ab632bf ]---

Fixes: 25112b64b3 ("drm/i915: Wait for all engines to be idle as part of i915_gem_wait_for_idle()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170530121334.17364-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-05-30 17:18:28 +01:00
Dave Airlie
a82256bc02 Merge tag 'drm-intel-next-2017-05-29' of git://anongit.freedesktop.org/git/drm-intel into drm-next
More stuff for 4.13:

- skl+ wm fixes from Mahesh Kumar
- some refactor and tests for i915_sw_fence (Chris)
- tune execlist/scheduler code (Chris)
- g4x,g33 gpu reset improvements (Chris, Mika)
- guc code cleanup (Michal Wajdeczko, Michał Winiarski)
- dp aux backlight improvements (Puthikorn Voravootivat)
- buffer based guc/host communication (Michal Wajdeczko)

* tag 'drm-intel-next-2017-05-29' of git://anongit.freedesktop.org/git/drm-intel: (253 commits)
  drm/i915: Update DRIVER_DATE to 20170529
  drm/i915: Keep the forcewake timer alive for 1ms past the most recent use
  drm/i915/guc: capture GuC logs if FW fails to load
  drm/i915/guc: Introduce buffer based cmd transport
  drm/i915/guc: Disable send function on fini
  drm: Add definition for eDP backlight frequency
  drm/i915: Drop AUX backlight enable check for backlight control
  drm/i915: Consolidate #ifdef CONFIG_INTEL_IOMMU
  drm/i915: Only GGTT vma may be pinned and prevent shrinking
  drm/i915: Serialize GTT/Aperture accesses on BXT
  drm/i915: Convert i915_gem_object_ops->flags values to use BIT()
  drm/i915/selftests: Silence compiler warning in igt_ctx_exec
  drm/i915/guc: Skip port assign on first iteration of GuC dequeue
  drm/i915: Remove misleading comment in request_alloc
  drm/i915/g33: Improve reset reliability
  Revert "drm/i915: Restore lost "Initialized i915" welcome message"
  drm/i915/huc: Update GLK HuC version
  drm/i915: Check for allocation failure
  drm/i915/guc: Remove action status and statistics from debugfs
  drm/i915/g4x: Improve gpu reset reliability
  ...
2017-05-30 15:25:28 +10:00
Chris Wilson
80debff8d9 drm/i915: Consolidate #ifdef CONFIG_INTEL_IOMMU
We depend on intel_iommu_gfx_mapped for various workarounds, but that is
only available under an #ifdef CONFIG_INTEL_IOMMU. Refactor all the
cut-and-paste ifdefs to a common routine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170525121612.2190-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-05-25 21:51:49 +01:00
Michal Hocko
2098105ec6 drm: drop drm_[cm]alloc* helpers
Now that drm_[cm]alloc* helpers are simple one line wrappers around
kvmalloc_array and drm_free_large is just kvfree alias we can drop
them and replace by their native forms.

This shouldn't introduce any functional change.

Changes since v1
- fix typo in drivers/gpu//drm/etnaviv/etnaviv_gem.c - noticed by 0day
  build robot

Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Michal Hocko <mhocko@suse.com>drm: drop drm_[cm]alloc* helpers
[danvet: Fixup vgem which grew another user very recently.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517122312.GK18247@dhcp22.suse.cz
2017-05-18 17:22:39 +02:00
Chris Wilson
c5cf9a9147 drm/i915: Create a kmem_cache to allocate struct i915_priolist from
The i915_priolist are allocated within an atomic context on a path where
we wish to minimise latency. If we use a dedicated kmem_cache, we have
the advantage of a local freelist from which to service new requests
that should keep the latency impact of an allocation small. Though
currently we expect the majority of requests to be at default priority
(and so hit the preallocate priolist), once userspace starts using
priorities they are likely to use many fine grained policies improving
the utilisation of a private slab.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-9-chris@chris-wilson.co.uk
2017-05-17 13:38:12 +01:00
Chris Wilson
6c067579e6 drm/i915: Split execlist priority queue into rbtree + linked list
All the requests at the same priority are executed in FIFO order. They
do not need to be stored in the rbtree themselves, as they are a simple
list within a level. If we move the requests at one priority into a list,
we can then reduce the rbtree to the set of priorities. This should keep
the height of the rbtree small, as the number of active priorities can not
exceed the number of active requests and should be typically only a few.

Currently, we have ~2k possible different priority levels, that may
increase to allow even more fine grained selection. Allocating those in
advance seems a waste (and may be impossible), so we opt for allocating
upon first use, and freeing after its requests are depleted. To avoid
the possibility of an allocation failure causing us to lose a request,
we preallocate the default priority (0) and bump any request to that
priority if we fail to allocate it the appropriate plist. Having a
request (that is ready to run, so not leading to corruption) execute
out-of-order is better than leaking the request (and its dependency
tree) entirely.

There should be a benefit to reducing execlists_dequeue() to principally
using a simple list (and reducing the frequency of both rbtree iteration
and balancing on erase) but for typical workloads, request coalescing
should be small enough that we don't notice any change. The main gain is
from improving PI calls to schedule, and the explicit list within a
level should make request unwinding simpler (we just need to insert at
the head of the list rather than the tail and not have to make the
rbtree search more complicated).

v2: Avoid use-after-free when deleting a depleted priolist

v3: Michał found the solution to handling the allocation failure
gracefully. If we disable all priority scheduling following the
allocation failure, those requests will be executed in fifo and we will
ensure that this request and its dependencies are in strict fifo (even
when it doesn't realise it is only a single list). Normal scheduling is
restored once we know the device is idle, until the next failure!
Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com>

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
2017-05-17 13:38:09 +01:00
Chris Wilson
77f0d0e925 drm/i915/execlists: Pack the count into the low bits of the port.request
add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187)
function                                     old     new   delta
execlists_submit_ports                       262     471    +209
port_assign.isra                               -     136    +136
capture                                     6344    6359     +15
reset_common_ring                            438     452     +14
execlists_submit_request                     228     238     +10
gen8_init_common_ring                        334     341      +7
intel_engine_is_idle                         106     105      -1
i915_engine_info                            2314    2290     -24
__i915_gem_set_wedged_BKL                    485     411     -74
intel_lrc_irq_handler                       1789    1604    -185
execlists_update_context                     294       -    -294

The most important change there is the improve to the
intel_lrc_irq_handler and excclist_submit_ports (net improvement since
execlists_update_context is now inlined).

v2: Use the port_api() for guc as well (even though currently we do not
pack any counters in there, yet) and hide all port->request_count inside
the helpers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-5-chris@chris-wilson.co.uk
2017-05-17 13:38:06 +01:00
Chris Wilson
0ce8178808 drm/i915: Redefine ptr_pack_bits() and friends
Rebrand the current (pointer | bits) pack/unpack utility macros as
explicit bit twiddling for PAGE_SIZE so that we can use the more
flexible underlying macros for different bits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-4-chris@chris-wilson.co.uk
2017-05-17 13:38:04 +01:00
Chris Wilson
991bfc64db drm/i915: Make ptr_unpack_bits() more function-like
ptr_unpack_bits() is a function-like macro, as such it is meant to be
replaceable by a function. In this case, we should be passing in the
out-param as a pointer.

Bizarrely this does affect code generation:

function                                     old     new   delta
i915_gem_object_pin_map                      409     389     -20

An improvement(?) in this case, but one can't help wonder what
strict-aliasing optimisations we are preventing.

The generated code looks identical in using ptr_unpack_bits (no extra
motions to stack, the pointer and bits appear to be kept in registers),
the difference appears to be code ordering and with a reorder it is able
to use smaller forward jumps.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-3-chris@chris-wilson.co.uk
2017-05-17 13:38:03 +01:00
Linus Torvalds
de4d195308 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes are:

   - Debloat RCU headers

   - Parallelize SRCU callback handling (plus overlapping patches)

   - Improve the performance of Tree SRCU on a CPU-hotplug stress test

   - Documentation updates

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
  rcu: Open-code the rcu_cblist_n_lazy_cbs() function
  rcu: Open-code the rcu_cblist_n_cbs() function
  rcu: Open-code the rcu_cblist_empty() function
  rcu: Separately compile large rcu_segcblist functions
  srcu: Debloat the <linux/rcu_segcblist.h> header
  srcu: Adjust default auto-expediting holdoff
  srcu: Specify auto-expedite holdoff time
  srcu: Expedite first synchronize_srcu() when idle
  srcu: Expedited grace periods with reduced memory contention
  srcu: Make rcutorture writer stalls print SRCU GP state
  srcu: Exact tracking of srcu_data structures containing callbacks
  srcu: Make SRCU be built by default
  srcu: Fix Kconfig botch when SRCU not selected
  rcu: Make non-preemptive schedule be Tasks RCU quiescent state
  srcu: Expedite srcu_schedule_cbs_snp() callback invocation
  srcu: Parallelize callback handling
  kvm: Move srcu_struct fields to end of struct kvm
  rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
  rcu: Use true/false in assignment to bool
  rcu: Use bool value directly
  ...
2017-05-10 10:30:46 -07:00
Chris Wilson
4797948071 drm/i915: Squash repeated awaits on the same fence
Track the latest fence waited upon on each context, and only add a new
asynchronous wait if the new fence is more recent than the recorded
fence for that context. This requires us to filter out unordered
timelines, which are noted by DMA_FENCE_NO_CONTEXT. However, in the
absence of a universal identifier, we have to use our own
i915->mm.unordered_timeline token.

v2: Throw around the debug crutches
v3: Inline the likely case of the pre-allocation cache being full.
v4: Drop the pre-allocation support, we can lose the most recent fence
in case of allocation failure -- it just means we may emit more awaits
than strictly necessary but will not break.
v5: Trim allocation size for leaf nodes, they only need an array of u32
not pointers.
v6: Create mock_timeline to tidy selftest writing
v7: s/intel_timeline_sync_get/intel_timeline_sync_is_later/ (Tvrtko)
v8: Prune the stale sync points when we idle.
v9: Include a small benchmark in the kselftests
v10: Separate the idr implementation into its own compartment. (Tvrkto)
v11: Refactor igt_sync kselftests to avoid deep nesting (Tvrkto)
v12: __sync_leaf_idx() to assert that p->height is 0 when checking leaves
v13: kselftests to investigate struct i915_syncmap itself (Tvrtko)
v14: Foray into ascii art graphs
v15: Take into account that the random lookup/insert does 2 prng calls,
not 1, when benchmarking, and use for_each_set_bit() (Tvrtko)
v16: Improved ascii art

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170503093924.5320-4-chris@chris-wilson.co.uk
2017-05-03 11:08:48 +01:00
Chris Wilson
9431282832 drm/i915: Mark up clflushes as belonging to an unordered timeline
2 clflushes on two different objects are not ordered, and so do not
belong to the same timeline (context). Either we use a unique context
for each, or we reserve a special global context to mean unordered.
Ideally, we would reserve 0 to mean unordered (DMA_FENCE_NO_CONTEXT) to
have the same semantics everywhere.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170503093924.5320-1-chris@chris-wilson.co.uk
2017-05-03 11:08:45 +01:00
Dave Airlie
73ba2d5c2b Merge tag 'drm-intel-next-fixes-2017-04-27' of git://anongit.freedesktop.org/git/drm-intel into drm-next
drm/i915 and gvt fixes for drm-next/v4.12

* tag 'drm-intel-next-fixes-2017-04-27' of git://anongit.freedesktop.org/git/drm-intel:
  drm/i915: Confirm the request is still active before adding it to the await
  drm/i915: Avoid busy-spinning on VLV_GLTC_PW_STATUS mmio
  drm/i915/selftests: Allocate inode/file dynamically
  drm/i915: Fix system hang with EI UP masked on Haswell
  drm/i915: checking for NULL instead of IS_ERR() in mock selftests
  drm/i915: Perform link quality check unconditionally during long pulse
  drm/i915: Fix use after free in lpe_audio_platdev_destroy()
  drm/i915: Use the right mapping_gfp_mask for final shmem allocation
  drm/i915: Make legacy cursor updates more unsynced
  drm/i915: Apply a cond_resched() to the saturated signaler
  drm/i915: Park the signaler before sleeping
  drm/i915/gvt: fix a bounds check in ring_id_to_context_switch_event()
  drm/i915/gvt: Fix PTE write flush for taking runtime pm properly
  drm/i915/gvt: remove some debug messages in scheduler timer handler
  drm/i915/gvt: add mmio init for virtual display
  drm/i915/gvt: use directly assignment for structure copying
  drm/i915/gvt: remove redundant ring id check which cause significant CPU misprediction
  drm/i915/gvt: remove redundant platform check for mocs load/restore
  drm/i915/gvt: Align render mmio list to cacheline
  drm/i915/gvt: cleanup some too chatty scheduler message
2017-04-29 05:50:27 +10:00
Joonas Lahtinen
ea117b8ddc drm/i915: Reset ILK during GEM sanitization
ILK should survive a reset without display corruption.

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-28 12:11:59 +03:00
Joonas Lahtinen
f2e4d76ec2 drm/i915: Eliminate HAS_HW_CONTEXTS
HAS_HW_CONTEXTS is misleading condition for GPU reset and CCID,
replace it with Gen specific (to be updated in next patches).

HAS_HW_CONTEXTS in i915_l3_write is bogus because each HAS_L3_DPF
match also has .has_hw_contexts = 1 set.

This leads to us being able to get rid of the property completely.

v2:
- Keep the checks at Gen6 for no functional change (Ville)

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
2017-04-28 12:11:59 +03:00
Chris Wilson
bdb57b8dca drm/i915: Use the right mapping_gfp_mask for final shmem allocation
Many sightings report the greater prevalence of allocation failures.
This is all due to the incorrect use of mapping_gfp_constraint(), so
remove it in favour of just querying the mapping_gfp_mask() which are
the exact gfp_t we wanted in the first place.

We still do expect a higher chance of reporting ENOMEM, as that is the
intention of using __GFP_NORETRY -- to fail rather than oom after having
reclaimed from our bo caches, and having done a direct|kswapd reclaim
pass.

Reported-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100594
Fixes: 24f8e00a8a ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170405221514.23251-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit b268d9fe0f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-04-26 16:28:08 +03:00
Ingo Molnar
58d30c36d4 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

 - Documentation updates.

 - Miscellaneous fixes.

 - Parallelize SRCU callback handling (plus overlapping patches).

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-23 11:12:44 +02:00
Dave Airlie
856ee92e86 Linux 4.11-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJY881cAAoJEHm+PkMAQRiGG4UH+wa2z6Qet36Uc4nXFZuSMYrO
 ErUWs1QpTDDv4a+LE4fgyMvM3j9XqtpfQLy1n70jfD14IqPBhHe4gytasAf+8lg1
 YvddFx0Yl3sygVu3dDBNigWeVDbfwepW59coN0vI5nrMo+wrei8aVIWcFKOxdMuO
 n72u9vuhrkEnLJuQk7SF+t4OQob9McXE3s7QgyRopmlKhKo7mh8On7K2BRI5uluL
 t0j5kZM0a43EUT5rq9xR8f5pgtyfTMG/FO2MuzZn43MJcZcyfmnOP/cTSIvAKA5U
 1i12lxlokYhURNUe+S6jm8A47TrqSRSJxaQJZRlfGJksZ0LJa8eUaLDCviBQEoE=
 =6QWZ
 -----END PGP SIGNATURE-----

Merge tag 'v4.11-rc7' into drm-next

Backmerge Linux 4.11-rc7 from Linus tree, to fix some
conflicts that were causing problems with the rerere cache
in drm-tip.
2017-04-19 11:07:14 +10:00
Paul E. McKenney
5f0d5a3ae7 mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU
A group of Linux kernel hackers reported chasing a bug that resulted
from their assumption that SLAB_DESTROY_BY_RCU provided an existence
guarantee, that is, that no block from such a slab would be reallocated
during an RCU read-side critical section.  Of course, that is not the
case.  Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
slab of blocks.

However, there is a phrase for this, namely "type safety".  This commit
therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
to avoid future instances of this sort of confusion.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
[ paulmck: Add comments mentioning the old name, as requested by Eric
  Dumazet, in order to help people familiar with the old name find
  the new one. ]
Acked-by: David Rientjes <rientjes@google.com>
2017-04-18 11:42:36 -07:00
Chris Wilson
e22d8e3c69 drm/i915: Treat WC a separate cache domain
When discussing a new WC mmap, we based the interface upon the
assumption that GTT was fully coherent. How naive! Commits 3b5724d702
("drm/i915: Wait for writes through the GTT to land before reading
back") and ed4596ea99 ("drm/i915/guc: WA to address the Ringbuffer
coherency issue") demonstrate that writes through the GTT are indeed
delayed and may be overtaken by direct WC access. To be safe, if
userspace is mixing WC mmaps with other potential GTT access (pwrite,
GTT mmaps) it should use set_domain(WC).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96563
Testcase: igt/gem_pwrite/small-gtt*
Testcase: igt/drv_selftest/coherency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170412110111.26626-2-chris@chris-wilson.co.uk
2017-04-12 12:35:17 +01:00
Chris Wilson
ef74921bc6 drm/i915: Combine write_domain flushes to a single function
In the next patch, we will introduce a new cache domain for
differentiating between GTT access and direct WC access. This will
require us to include WC in our write_domain flushes. Rather than
duplicate a third function, combine the existing two into one and
flushing WC writes will then be automatically handled as well.

v2: Be smarter and clearer by passing in the write domains to flush (Joonas)
v3: One missed ~ in v2 conversion

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170412110111.26626-1-chris@chris-wilson.co.uk
2017-04-12 12:35:16 +01:00
Chris Wilson
1d39f28170 drm/i915: Rename intel_engine_cs.exec_id to uabi_id
We want to refer to the index of the engine consistently throughout the
userspace ABI. We already have such an index through the execbuffer
engine specifier, that needs to be able to refer to each engine
specifically, so rename it the index to uabi_id to reflect its
generality beyond execbuf.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170411124306.15448-1-chris@chris-wilson.co.uk
2017-04-11 14:31:39 +01:00
Sagar Arun Kamble
63987bfebd drm/i915: Suspend GuC prior to GPU Reset during GEM suspend
i915 is currently doing a full GPU reset at the end of
i915_gem_suspend() followed by GuC suspend in i915_drm_suspend(). This
GPU reset clobbers the GuC, causing the suspend request to then fail,
leaving the GuC in an undefined state. We need to tell the GuC to
suspend before we do the direct intel_gpu_reset().

v2: Commit message update. (Chris, Daniele)

Fixes: 1c777c5d1d ("drm/i915/hsw: Fix GPU hang during resume from S3-devices state")
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1491387710-20553-1-git-send-email-sagar.a.kamble@intel.com
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit fd08923384)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-04-11 13:25:12 +03:00
Chris Wilson
f2be9d6833 drm/i915: Insert cond_resched() into i915_gem_free_objects
As we may have very many objects to free, check to see if the task needs
to be rescheduled whilst freeing them.

Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170407102552.5781-4-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-04-07 13:37:41 +01:00
Chris Wilson
5ad08be7e3 drm/i915: Break up long runs of freeing objects
Before freeing the next batch of objects from the worker, check if the
worker's timeslice has expired and if so, defer the next batch to the
next invocation of the worker.

Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170407102552.5781-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-04-07 13:37:41 +01:00
Joonas Lahtinen
e92075ff7d drm/i915: Simplify shrinker locking
By using the same structure for both interruptible and
uninterruptible locking in shrinker code, combined with the
information that mm.interruptible is only being written to, the
code can be greatly simplified.

Also removing the i915_gem_ prefix from the locking functions so
that nobody in their wildest dreams considers exporting them.

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1491562175-27680-1-git-send-email-joonas.lahtinen@linux.intel.com
2017-04-07 14:33:39 +03:00
Chris Wilson
17b93c40e7 drm/i915: Drain any freed objects prior to hibernation
As we call into the shrinker during freeze, we may have freed more
objects since we idled during i915_gem_suspend. Make sure we flush the
i915_gem_free_objects worker prior to saving the unwanted pages into the
hibernation image.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170407102552.5781-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-04-07 12:26:00 +01:00
Chris Wilson
d0aa301ae5 drm/i915: The shrinker already acquires struct_mutex, so call it unlocked
The shrinker is prepared to be called unlocked (and at other times with
struct_mutex held for DIRECT_RECLAIM) so we can skip acquiring the
struct_mutex prior to calling the shrinker during freeze. This improves
our ability to shrink as we can be more aggressive when we know the
caller isn't holding struct_mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170407102552.5781-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-04-07 12:25:11 +01:00
Chris Wilson
b268d9fe0f drm/i915: Use the right mapping_gfp_mask for final shmem allocation
Many sightings report the greater prevalence of allocation failures.
This is all due to the incorrect use of mapping_gfp_constraint(), so
remove it in favour of just querying the mapping_gfp_mask() which are
the exact gfp_t we wanted in the first place.

We still do expect a higher chance of reporting ENOMEM, as that is the
intention of using __GFP_NORETRY -- to fail rather than oom after having
reclaimed from our bo caches, and having done a direct|kswapd reclaim
pass.

Reported-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100594
Fixes: 24f8e00a8a ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170405221514.23251-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-04-06 13:02:02 +01:00
Sagar Arun Kamble
fd08923384 drm/i915: Suspend GuC prior to GPU Reset during GEM suspend
i915 is currently doing a full GPU reset at the end of
i915_gem_suspend() followed by GuC suspend in i915_drm_suspend(). This
GPU reset clobbers the GuC, causing the suspend request to then fail,
leaving the GuC in an undefined state. We need to tell the GuC to
suspend before we do the direct intel_gpu_reset().

v2: Commit message update. (Chris, Daniele)

Fixes: 1c777c5d1d ("drm/i915/hsw: Fix GPU hang during resume from S3-devices state")
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1491387710-20553-1-git-send-email-sagar.a.kamble@intel.com
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-06 11:10:01 +01:00
Tvrtko Ursulin
7a3ee5deb4 drm/i915: Remove user-triggerable WARN from i915_gem_object_create
Since this can be triggered by simply attempting a huge object,
a WARN_ON is not appropriate.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330163130.24141-1-tvrtko.ursulin@linux.intel.com
2017-04-04 09:39:13 +01:00
Chris Wilson
25112b64b3 drm/i915: Wait for all engines to be idle as part of i915_gem_wait_for_idle()
Make i915_gem_wait_for_idle() be a little heavier in order to try and
guarantee that the GPU is indeed idle (by checking each engine
individually is idle, i.e. all writes are complete and the rings
stopped) after waiting for in-flight requests to be completed.

v2: And return the final error.
v3: Break the wait_for() out from under the WARN -- the macro expansion
is hideous and unreadable in the warning message
v4: If wait_for_engine() fails the result is catastrophic, mark the
device as wedged and wait for the repair team.

References: https://bugs.freedesktop.org/show_bug.cgi?id=98836
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-4-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-31 12:07:17 +01:00
Chris Wilson
72022a705e drm/i915: Move retire-requests into i915_gem_wait_for_idle()
As we now distinguish everywhere that can call
i915_gem_retire_requests() following a successful wait_for_idle, we can
remove the duplication by moving that call into i915_gem_wait_for_idle()
itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-31 12:03:46 +01:00
Chris Wilson
8490ae207f drm/i915: Suppress busy status for engines if wedged
If the driver is wedged, HW state may be very inconsistent and
report that it is still busy, even though we have stopped using it. This
can lead to a double *ERROR* rather than a graceful cleanup after
wedging.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-2-chris@chris-wilson.co.uk
2017-03-30 17:58:44 +01:00
Chris Wilson
2c170af76c drm/i915: Do request retirement before marking engines as wedged
As we declare an engine as wedged, we mark all of its active requests as
in error. However, we don't want to mark successfully completed requests
as in error, which requires us to retire those requests first.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-1-chris@chris-wilson.co.uk
2017-03-30 17:56:21 +01:00
Oscar Mateo
b8991403ea drm/i915/guc: Take enable_guc_loading check out of GEM core code
The should happen as soon as possible, but always within the logic that
depends on it (and not interrupting the top-level driver control flow).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1490720027-23234-1-git-send-email-oscar.mateo@intel.com
2017-03-30 13:11:40 +03:00
Chris Wilson
e2a2aa36a5 drm/i915: Check we have an wake device before flushing GTT writes
We can assume that if the device is asleep then all pending GTT writes
will have been posted, and so we can defer the flush from
i915_gem_object_flush_gtt_write_domain()

[ 1957.462568] WARNING: CPU: 0 PID: 6132 at drivers/gpu/drm/i915/intel_drv.h:1742 fwtable_read32+0x123/0x150 [i915]
[ 1957.462582] RPM wakelock ref not held during HW access
[ 1957.462583] Modules linked in: i915 intel_gtt drm_kms_helper prime_numbers
[ 1957.462607] CPU: 0 PID: 6132 Comm: gem_concurrent_ Tainted: G     U          4.11.0-rc1+ #464
[ 1957.462619] Hardware name:                  /        , BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
[ 1957.462630] Call Trace:
[ 1957.462646]  dump_stack+0x4d/0x6f
[ 1957.462657]  __warn+0xc1/0xe0
[ 1957.462667]  warn_slowpath_fmt+0x4a/0x50
[ 1957.462709]  fwtable_read32+0x123/0x150 [i915]
[ 1957.462750]  i915_gem_object_flush_gtt_write_domain+0x43/0x70 [i915]
[ 1957.462791]  i915_gem_object_set_to_cpu_domain+0x46/0xa0 [i915]
[ 1957.462831]  i915_gem_set_domain_ioctl+0x15d/0x220 [i915]
[ 1957.462843]  drm_ioctl+0x1d7/0x440
[ 1957.462885]  ? i915_gem_obj_prepare_shmem_write+0x1d0/0x1d0 [i915]
[ 1957.462896]  ? pick_next_task_fair+0x436/0x440
[ 1957.462906]  ? mntput+0x1f/0x30
[ 1957.462915]  do_vfs_ioctl+0x8f/0x5c0
[ 1957.462925]  ? __schedule+0x16f/0x5f0
[ 1957.462935]  ? ____fput+0x9/0x10
[ 1957.462943]  SyS_ioctl+0x3c/0x70
[ 1957.462952]  entry_SYSCALL_64_fastpath+0x17/0x98
[ 1957.462961] RIP: 0033:0x7fc542179ca7
[ 1957.462968] RSP: 002b:00007ffeef12ff98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1957.462982] RAX: ffffffffffffffda RBX: 00007ffeef1301d0 RCX: 00007fc542179ca7
[ 1957.462990] RDX: 00007ffeef12ffd0 RSI: 00000000400c645f RDI: 0000000000000003
[ 1957.462999] RBP: 0000000000000003 R08: 000055f433bc7c40 R09: 000000000000002c
[ 1957.463006] R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000018
[ 1957.463015] R13: 000055f432c89d20 R14: 000055f432c87690 R15: 0000000000000000

Fixes: 3b5724d702 ("drm/i915: Wait for writes through the GTT to land before reading back")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170323150053.28582-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-27 12:48:44 +01:00
Oscar Mateo
3950bf3dbf drm/i915/guc: Add onion teardown to the GuC setup
Starting with intel_guc_loader, down to intel_guc_submission
and finally to intel_guc_log.

v2:
  - Null execbuf client outside guc_client_free (Daniele)
  - Assert if things try to get allocated twice (Daniele/Joonas)
  - Null guc->log.buf_addr when destroyed (Daniele)
  - Newline between returning success and error labels (Joonas)
  - Remove some unnecessary comments (Joonas)
  - Keep guc_log_create_extras naming convention (Joonas)
  - Helper function guc_log_has_extras (Joonas)
  - No need for separate relay_channel create/destroy. It's just another extra.
  - No need to nullify guc->log.flush_wq when destroyed (Joonas)
  - Hoist the check for has_extras out of guc_log_create_extras (Joonas)
  - Try to do i915_guc_log_register/unregister calls (kind of) symmetric (Daniele)
  - Make sure initel_guc_fini is not called before init is ever called (Daniele)

v3:
  - Remove unnecessary parenthesis (Joonas)
  - Check for logs enabled on debugfs registration
  - Rebase on top of Tvrtko's "Fix request re-submission after reset"

v4:
  - Rebased
  - Comment around enabling/disabling interrupts inside GuC logging (Joonas)

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-23 14:57:36 +02:00
Chris Wilson
40149f00fb drm/i915: Actually pass the reclaim gfp_t along to shmemfs!
Words cannot describe the embarrassment at creating a new gfp_t relaim to
only prevent the oomkiller but allow direct|kswapd reclaim, and then not
use it in the shmem_read_mapping_page_gfp().

Fixes: 24f8e00a8a ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170322223447.7493-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-23 09:30:20 +00:00
Chris Wilson
24f8e00a8a drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations
Since gfx allocations tend to be large, unmovable and disposable, report
the allocation failure back to userspace as an ENOMEM rather than incur
the oomkiller. We have already tried to make room by purging our own
cached gfx objects, and the oomkiller doesn't attribute ownership of gfx
objects so will likely pick the wrong candidate. Instead, let userspace
see the ENOMEM.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170322110521.29930-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-03-22 19:26:46 +00:00
Chris Wilson
54ec12af2f drm/i915: Skip force-wake for uncached mmio flush of GGTT writes
The trick of using an uncached mmio read to ensure that the GGTT writes
are flushed does not require us to do the forcewake dance, so avoid it
in the hope of reducing the frequency that we do keep the device forced
awake.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170318104257.694-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-20 10:45:49 +00:00
Chris Wilson
be062fa427 drm/i915: Initialise i915_gem_object_create_from_data() directly
Use pagecache_write to avoid shmemfs clearing the pages prior to us
immediately overwriting them with our data.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170317194648.12468-2-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2017-03-17 22:55:56 +00:00
Chris Wilson
ce8ff099c4 drm/i915: i915_gem_object_create_from_data() doesn't require struct_mutex
Both object creation and backing storage page allocation do not require
struct_mutex, so do not require the caller to take it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170317194648.12468-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2017-03-17 22:54:06 +00:00
Chris Wilson
2e8f9d3229 drm/i915: Restore engine->submit_request before unwedging
When we wedge the device, we override engine->submit_request with a nop
to ensure that all in-flight requests are marked in error. However, igt
would like to unwedge the device to test -EIO handling. This requires us
to flush those in-flight requests and restore the original
engine->submit_request.

v2: Use a vfunc to unify enabling request submission to engines
v3: Split new vfunc to a separate patch.
v4: Make the wait interruptible -- the third party fences we wait upon
may be indefinitely broken, so allow the reset to be aborted.

Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Testcase: igt/gem_eio
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v3
Link: http://patchwork.freedesktop.org/patch/msgid/20170316171305.12972-3-chris@chris-wilson.co.uk
2017-03-16 17:17:14 +00:00
Chris Wilson
8c185ecaf4 drm/i915: Split I915_RESET_IN_PROGRESS into two flags
I915_RESET_IN_PROGRESS is being used for both signaling the requirement
to i915_mutex_lock_interruptible() to avoid taking the struct_mutex and
to instruct a waiter (already holding the struct_mutex) to perform the
reset. To allow for a little more coordination, split these two meaning
into a couple of distinct flags. I915_RESET_BACKOFF tells
i915_mutex_lock_interruptible() not to acquire the mutex and
I915_RESET_HANDOFF tells the waiter to call i915_reset().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170316171305.12972-1-chris@chris-wilson.co.uk
2017-03-16 17:17:10 +00:00
Arkadiusz Hiler
6cd5a72c35 drm/i915/guc: Simplify intel_guc_init_hw()
Current version of intel_guc_init_hw() does a lot:
 - cares about submission
 - loads huc
 - implement WA

This change offloads some of the logic to intel_uc_init_hw(), which now
cares about the above.

v2: rename guc_hw_reset and fix typo in define name (M. Wajdeczko)
v3: rename once again
v4: remove spurious comments and add some style (J. Lahtinen)
v5: flow changes, got rid of dead checks (M. Wajdeczko)
v6: rebase
v7: rebase & onion teardown (J. Lahtinen)

Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-15 14:26:30 +02:00
Arkadiusz Hiler
882d1db09b drm/i915/uc: Rename intel_?uc_{setup, load}() to _init_hw()
GuC historically has two "startup" functions called _init() and _setup()

Then HuC came with it's _init() and _load().

This commit renames intel_guc_setup() and intel_huc_load() to
*uc_init_hw() as they called from the i915_gem_init_hw().

The aim is to be consistent in that entry points called during
particular driver init phases (e.g. init_hw) are all suffixed by that
phase. When reading the leaf functions, it should be clear at what stage
during the driver load it is called and therefore what operations are
legal at that point.

Also, since the functions start with intel_guc and intel_huc they take
appropiate structure.

v2: commit message update (Chris Wilson)
v3: change taken parameters to be more "semantic" (M. Wajdeczko)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-15 14:26:30 +02:00
Kenneth Graunke
0f5418e564 drm/i915: Drop support for I915_EXEC_CONSTANTS_* execbuf parameters.
This patch makes the I915_PARAM_HAS_EXEC_CONSTANTS getparam return 0
(indicating the optional feature is not supported), and makes execbuf
always return -EINVAL if the flags are used.

Apparently, no userspace ever shipped which used this optional feature:
I checked the git history of Mesa, xf86-video-intel, libva, and Beignet,
and there were zero commits showing a use of these flags.  Kernel commit
72bfa19c8d apparently introduced the feature prematurely.  According
to Chris, the intention was to use this in cairo-drm, but "the use was
broken for gen6", so I don't think it ever happened.

'relative_constants_mode' has always been tracked per-device, but this
has actually been wrong ever since hardware contexts were introduced, as
the INSTPM register is saved (and automatically restored) as part of the
render ring context. The software per-device value could therefore get
out of sync with the hardware per-context value.  This meant that using
them is actually unsafe: a client which tried to use them could damage
the state of other clients, causing the GPU to interpret their BO
offsets as absolute pointers, leading to bogus memory reads.

These flags were also never ported to execlist mode, making them no-ops
on Gen9+ (which requires execlists), and Gen8 in the default mode.

On Gen8+, userspace can write these registers directly, achieving the
same effect.  On Gen6-7.5, it likely makes sense to extend the command
parser to support them.  I don't think anyone wants this on Gen4-5.

Based on a patch by Dave Gordon.

v3: Return -ENODEV for the getparam, as this is what we do for other
    obsolete features.  Suggested by Chris Wilson.

Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92448
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215093446.21291-1-kenneth@whitecape.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170313170433.26843-1-chris@chris-wilson.co.uk
(cherry picked from commit ef0f411f51)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-03-14 12:28:25 +02:00
Chris Wilson
4565bf58d4 drm/i915: Disable engine->irq_tasklet around resets
When we restart the engines, and we have active requests, a request on
the first engine may complete and queue a request to the second engine
before we try to restart the second engine. That queueing of the
request may race with the engine to restart, and so may corrupt the
current state. Disabling the engine->irq_tasklet prevents the two paths
from writing into ELSP simultaneously (and modifyin the execlists_port[]
at the same time).

Include fixup 1d309634bc ("drm/i915: Kill the tasklet then disable")

Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Testcase: igt/gem_exec_fence/await-hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-3-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/20170313165958.13970-2-chris@chris-wilson.co.uk
(cherry picked from commit 1f7b847d72)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-03-14 12:26:44 +02:00
Chris Wilson
da9a796f54 drm/i915: Split GEM resetting into 3 phases
Currently we do a reset prepare/finish around the call to reset the GPU,
but it looks like we need a later stage after the hw has been
reinitialised to allow GEM to restart itself. Start by splitting the 2
GEM phases into 3:

  prepare - before the reset, check if GEM recovered, then stop GEM

  reset - after the reset, update GEM bookkeeping

  finish - after the re-initialisation following the reset, restart GEM

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-2-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/20170313165958.13970-1-chris@chris-wilson.co.uk
(cherry picked from commit d802709313)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-03-14 12:11:49 +02:00
Chris Wilson
7f5f95d8ac drm/i915: Move whole object to CPU domain for coherent shmem access
If the object is coherent, we can simply update the cache domain on the
whole object rather than calculate the before/after clflushes. The
advantage is that we then get correct tracking of ellided flushes when
changing coherency later.

Testcase: igt/gem_pwrite_snooped
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170310000942.11661-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-13 11:16:09 +00:00
Chris Wilson
4e6fdafa7a drm/i915: Use pagecache write to prepopulate shmemfs from pwrite-ioctl
Before we instantiate/pin the backing store for our use, we
can prepopulate the shmemfs filp efficiently using a write into the
pagecache. We avoid the penalty of instantiating all the pages, important
if the user is just writing to a few and never uses the object on the GPU,
and using a direct write into shmemfs allows it to avoid the cost of
retrieving a page (mostly the clear-before-use, but in theory we could
curtail swapin) before it is overwritten.

This can be extended later to provide additional specialisation for
other backends (other than shmemfs). For now it provides a defense
against very large write-only allocations from exhausting all of system
memory.

v2: Smelling fixes.

Fixes: fe115628d5 ("drm/i915: Implement pwrite without struct-mutex")
References: https://bugs.freedesktop.org/show_bug.cgi?id=99107
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170307120338.7277-2-chris@chris-wilson.co.uk
(cherry picked from commit 7c55e2c577)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-03-09 10:46:07 +02:00
Chris Wilson
0d9dc306e1 drm/i915: Store a permanent error in obj->mm.pages
Once the object has been truncated, it is unrecoverable. To facilitate
detection of this state store the error in obj->mm.pages.

This is required for the next patch which should be applied to v4.10
(via stable), so we also need to mark this patch for backporting. In
that regard, let's consider this to be a fix/improvement too.

v2: Avoid dereferencing the ERR_PTR when freeing the object.

Fixes: 1233e2db19 ("drm/i915: Move object backing storage manipulation to its own locking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Link: http://patchwork.freedesktop.org/patch/msgid/20170307132031.32461-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit 4e5462ee84)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-03-09 10:45:58 +02:00
Chris Wilson
89cf83d4e0 drm/i915: Squelch any ktime/jiffie rounding errors for wait-ioctl
We wait upon jiffies, but report the time elapsed using a
high-resolution timer. This discrepancy can lead to us timing out the
wait prior to us reporting the elapsed time as complete.

This restores the squelching lost in commit e95433c73a ("drm/i915:
Rearrange i915_wait_request() accounting with callers").

Fixes: e95433c73a ("drm/i915: Rearrange i915_wait_request() accounting with callers")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170216125441.30923-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit c1d2061b28)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-03-09 10:42:44 +02:00
Chris Wilson
03d1cac6ee drm/i915: Avoiding recursing on ww_mutex inside shrinker
We have to avoid taking ww_mutex inside the shrinker as we use it as a
plain mutex type and so need to avoid recursive deadlocks:

[  602.771969] =================================
[  602.771970] [ INFO: inconsistent lock state ]
[  602.771973] 4.10.0gpudebug+ #122 Not tainted
[  602.771974] ---------------------------------
[  602.771975] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[  602.771978] kswapd0/40 [HC0[0]:SC0[0]:HE1:SE1] takes:
[  602.771979]  (reservation_ww_class_mutex){+.+.?.}, at: [<ffffffffa054680a>] i915_gem_object_wait+0x39a/0x410 [i915]
[  602.772020] {RECLAIM_FS-ON-W} state was registered at:
[  602.772024]   mark_held_locks+0x76/0x90
[  602.772026]   lockdep_trace_alloc+0xb8/0xc0
[  602.772028]   __kmalloc_track_caller+0x5d/0x130
[  602.772031]   krealloc+0x89/0xb0
[  602.772033]   reservation_object_reserve_shared+0xaf/0xd0
[  602.772055]   i915_gem_do_execbuffer.isra.35+0x1413/0x18b0 [i915]
[  602.772075]   i915_gem_execbuffer2+0x10e/0x1d0 [i915]
[  602.772078]   drm_ioctl+0x291/0x480
[  602.772079]   do_vfs_ioctl+0x695/0x6f0
[  602.772081]   SyS_ioctl+0x3c/0x70
[  602.772084]   entry_SYSCALL_64_fastpath+0x18/0xad
[  602.772085] irq event stamp: 5197423
[  602.772088] hardirqs last  enabled at (5197423): [<ffffffff8116751d>] kfree+0xdd/0x170
[  602.772091] hardirqs last disabled at (5197422): [<ffffffff811674f9>] kfree+0xb9/0x170
[  602.772095] softirqs last  enabled at (5190992): [<ffffffff8107bfe1>] __do_softirq+0x221/0x280
[  602.772097] softirqs last disabled at (5190575): [<ffffffff8107c294>] irq_exit+0x64/0xc0
[  602.772099]
               other info that might help us debug this:
[  602.772100]  Possible unsafe locking scenario:

[  602.772101]        CPU0
[  602.772101]        ----
[  602.772102]   lock(reservation_ww_class_mutex);
[  602.772104]   <Interrupt>
[  602.772105]     lock(reservation_ww_class_mutex);
[  602.772107]
                *** DEADLOCK ***

[  602.772109] 2 locks held by kswapd0/40:
[  602.772110]  #0:  (shrinker_rwsem){++++..}, at: [<ffffffff811337b5>] shrink_slab.constprop.62+0x35/0x280
[  602.772116]  #1:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffffa0553957>] i915_gem_shrinker_lock+0x27/0x60 [i915]
[  602.772141]
               stack backtrace:
[  602.772144] CPU: 2 PID: 40 Comm: kswapd0 Not tainted 4.10.0gpudebug+ #122
[  602.772145] Hardware name: LENOVO 42433ZG/42433ZG, BIOS 8AET64WW (1.44 ) 07/26/2013
[  602.772147] Call Trace:
[  602.772151]  dump_stack+0x68/0xa1
[  602.772153]  print_usage_bug+0x1d4/0x1f0
[  602.772155]  mark_lock+0x390/0x530
[  602.772157]  ? print_irq_inversion_bug+0x200/0x200
[  602.772159]  __lock_acquire+0x405/0x1260
[  602.772181]  ? i915_gem_object_wait+0x39a/0x410 [i915]
[  602.772183]  lock_acquire+0x60/0x80
[  602.772205]  ? i915_gem_object_wait+0x39a/0x410 [i915]
[  602.772207]  mutex_lock_nested+0x69/0x760
[  602.772229]  ? i915_gem_object_wait+0x39a/0x410 [i915]
[  602.772231]  ? kfree+0xdd/0x170
[  602.772253]  ? i915_gem_object_wait+0x163/0x410 [i915]
[  602.772255]  ? trace_hardirqs_on_caller+0x18d/0x1c0
[  602.772256]  ? trace_hardirqs_on+0xd/0x10
[  602.772278]  i915_gem_object_wait+0x39a/0x410 [i915]
[  602.772300]  i915_gem_object_unbind+0x5e/0x130 [i915]
[  602.772323]  i915_gem_shrink+0x22d/0x3d0 [i915]
[  602.772347]  i915_gem_shrinker_scan+0x3f/0x80 [i915]
[  602.772349]  shrink_slab.constprop.62+0x1ad/0x280
[  602.772352]  shrink_node+0x52/0x80
[  602.772355]  kswapd+0x427/0x5c0
[  602.772358]  kthread+0x122/0x130
[  602.772360]  ? try_to_free_pages+0x270/0x270
[  602.772362]  ? kthread_stop+0x70/0x70
[  602.772365]  ret_from_fork+0x2e/0x40

v2: Add commentary about the pruning being opportunistic

Reported-by: Jan Nordholz <jckn@gmx.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99977#c10
Fixes: e54ca97747 ("drm/i915: Remove completed fences after a wait")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170308132629.7987-1-chris@chris-wilson.co.uk
2017-03-08 20:42:17 +00:00
Daniel Vetter
7ffe939dd9 Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued
Backmerge drm-next to get at all the good stuff in drm-misc. We need
that because:

- drm_connector_list_iter conversion for i915 needs the core patches.
- Maarten's patches to use the new atomic state iterators also need
  the core patches.
- We need the new link status property to complete the DP retraining
  work, merging through 2 branches wasn't a good idea and we had to
  partially backtrack.
- Chris needs reservation_object_trylock and we want to roll out
  kref_read everywhere.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2017-03-08 10:54:45 +01:00
Dave Airlie
2e16101780 Merge tag 'drm-intel-next-2017-03-06' of git://anongit.freedesktop.org/git/drm-intel into drm-next
4 weeks worth of stuff since I was traveling&lazy:

- lspcon improvements (Imre)
- proper atomic state for cdclk handling (Ville)
- gpu reset improvements (Chris)
- lots and lots of polish around fences, requests, waiting and
  everything related all over (both gem and modeset code), from Chris
- atomic by default on gen5+ minus byt/bsw (Maarten did the patch to
  flip the default, really this is a massive joint team effort)
- moar power domains, now 64bit (Ander)
- big pile of in-kernel unit tests for various gem subsystems (Chris),
  including simple mock objects for i915 device and and the ggtt
  manager.
- i915_gpu_info in debugfs, for taking a snapshot of the current gpu
  state. Same thing as i915_error_state, but useful if the kernel didn't
  notice something is stick. From Chris.
- bxt dsi fixes (Umar Shankar)
- bxt w/a updates (Jani)
- no more struct_mutex for gem object unreference (Chris)
- some execlist refactoring (Tvrtko)
- color manager support for glk (Ander)
- improve the power-well sync code to better take over from the
  firmware (Imre)
- gem tracepoint polish (Tvrtko)
- lots of glk fixes all around (Ander)
- ctx switch improvements (Chris)
- glk dsi support&fixes (Deepak M)
- dsi fixes for vlv and clanups, lots of them (Hans de Goede)
- switch to i915.ko types in lots of our internal modeset code (Ander)
- byt/bsw atomic wm update code, yay (Ville)

* tag 'drm-intel-next-2017-03-06' of git://anongit.freedesktop.org/git/drm-intel: (432 commits)
  drm/i915: Update DRIVER_DATE to 20170306
  drm/i915: Don't use enums for hardware engine id
  drm/i915: Split breadcrumbs spinlock into two
  drm/i915: Refactor wakeup of the next breadcrumb waiter
  drm/i915: Take reference for signaling the request from hardirq
  drm/i915: Add FIFO underrun tracepoints
  drm/i915: Add cxsr toggle tracepoint
  drm/i915: Add VLV/CHV watermark/FIFO programming tracepoints
  drm/i915: Add plane update/disable tracepoints
  drm/i915: Kill level 0 wm hack for VLV/CHV
  drm/i915: Workaround VLV/CHV sprite1->sprite0 enable underrun
  drm/i915: Sanitize VLV/CHV watermarks properly
  drm/i915: Only use update_wm_{pre,post} for pre-ilk platforms
  drm/i915: Nuke crtc->wm.cxsr_allowed
  drm/i915: Compute proper intermediate wms for vlv/cvh
  drm/i915: Skip useless watermark/FIFO related work on VLV/CHV when not needed
  drm/i915: Compute vlv/chv wms the atomic way
  drm/i915: Compute VLV/CHV FIFO sizes based on the PM2 watermarks
  drm/i915: Plop vlv/chv fifo sizes into crtc state
  drm/i915: Plop vlv wm state into crtc_state
  ...
2017-03-08 12:41:47 +10:00
Chris Wilson
7c55e2c577 drm/i915: Use pagecache write to prepopulate shmemfs from pwrite-ioctl
Before we instantiate/pin the backing store for our use, we
can prepopulate the shmemfs filp efficiently using a write into the
pagecache. We avoid the penalty of instantiating all the pages, important
if the user is just writing to a few and never uses the object on the GPU,
and using a direct write into shmemfs allows it to avoid the cost of
retrieving a page (mostly the clear-before-use, but in theory we could
curtail swapin) before it is overwritten.

This can be extended later to provide additional specialisation for
other backends (other than shmemfs). For now it provides a defense
against very large write-only allocations from exhausting all of system
memory.

v2: Smelling fixes.

Fixes: fe115628d5 ("drm/i915: Implement pwrite without struct-mutex")
References: https://bugs.freedesktop.org/show_bug.cgi?id=99107
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170307120338.7277-2-chris@chris-wilson.co.uk
2017-03-07 21:26:59 +00:00
Chris Wilson
4e5462ee84 drm/i915: Store a permanent error in obj->mm.pages
Once the object has been truncated, it is unrecoverable. To facilitate
detection of this state store the error in obj->mm.pages.

This is required for the next patch which should be applied to v4.10
(via stable), so we also need to mark this patch for backporting. In
that regard, let's consider this to be a fix/improvement too.

v2: Avoid dereferencing the ERR_PTR when freeing the object.

Fixes: 1233e2db19 ("drm/i915: Move object backing storage manipulation to its own locking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Link: http://patchwork.freedesktop.org/patch/msgid/20170307132031.32461-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-07 21:25:38 +00:00
Chris Wilson
0542524944 drm/i915: Generalise wait for execlists to be idle
The code to check for execlists completion is generic, so move it to
intel_engine_cs.c, where we can reuse the new intel_engine_is_idle().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170303121947.20482-2-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-03 13:08:15 +00:00
Chris Wilson
c8659efac5 drm/i915: Drop spinlocks around adding to the client request list
Adding to the tail of the client request list as the only other user is
in the throttle ioctl that iterates forwards over the list. It only
needs protection against deletion of a request as it reads it, it simply
won't see a new request added to the end of the list, or it would be too
early and rejected. We can further reduce the number of spinlocks
required when throttling by removing stale requests from the client_list
as we throttle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170302122525.19675-1-chris@chris-wilson.co.uk
2017-03-02 22:33:41 +00:00
Chris Wilson
c998e8a0f4 drm/i915: Hold rpm during GEM suspend in driver unload/suspend
i915_gem_suspend() tries to access the device to ensure it is idle and
all writes from the device are flushed to memory. It assumed is already
held the runtime pm wakeref, but we should explicitly acquire it for our
access to be safe.

[  619.926287] WARNING: CPU: 3 PID: 9353 at drivers/gpu/drm/i915/intel_drv.h:1750 gen6_write32+0x23e/0x2a0 [i915]
[  619.926300] RPM wakelock ref not held during HW access
[  619.926311] Modules linked in: vgem x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_codec coretemp snd_hwdep crct10dif_pclmul snd_hda_core crc32_pclmul snd_pcm mei_me mei lpc_ich ghash_clmulni_intel i915(-) sdhci_pci sdhci mmc_core e1000e ptp pps_core prime_numbers [last unloaded: snd_hda_intel]
[  619.926578] CPU: 3 PID: 9353 Comm: drv_module_relo Tainted: G     U          4.10.0-CI-Trybot_609+ #1
[  619.926585] Hardware name: LENOVO 42962WU/42962WU, BIOS 8DET56WW (1.26 ) 12/01/2011
[  619.926592] Call Trace:
[  619.926609]  dump_stack+0x67/0x92
[  619.926625]  __warn+0xc6/0xe0
[  619.926640]  warn_slowpath_fmt+0x4a/0x50
[  619.926726]  gen6_write32+0x23e/0x2a0 [i915]
[  619.926801]  gen6_mm_switch+0x38/0x70 [i915]
[  619.926871]  i915_switch_context+0xec/0xa10 [i915]
[  619.926942]  i915_gem_switch_to_kernel_context+0x13c/0x2b0 [i915]
[  619.927019]  i915_gem_suspend+0x2b/0x180 [i915]
[  619.927079]  i915_driver_unload+0x22/0x200 [i915]
[  619.927093]  ? __this_cpu_preempt_check+0x13/0x20
[  619.927105]  ? trace_hardirqs_on_caller+0xe7/0x200
[  619.927118]  ? trace_hardirqs_on+0xd/0x10
[  619.927128]  ? _raw_spin_unlock_irqrestore+0x3d/0x60
[  619.927192]  i915_pci_remove+0x14/0x20 [i915]
[  619.927205]  pci_device_remove+0x34/0xb0
[  619.927219]  device_release_driver_internal+0x158/0x210
[  619.927234]  driver_detach+0x3b/0x80
[  619.927245]  bus_remove_driver+0x53/0xd0
[  619.927256]  driver_unregister+0x27/0x50
[  619.927267]  pci_unregister_driver+0x25/0xa0
[  619.927351]  i915_exit+0x1a/0xb1a [i915]
[  619.927362]  SyS_delete_module+0x193/0x1e0
[  619.927378]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  619.927386] RIP: 0033:0x7f82b46c5d37
[  619.927393] RSP: 002b:00007ffdb6f610d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
[  619.927408] RAX: ffffffffffffffda RBX: ffffffff81481ff3 RCX: 00007f82b46c5d37
[  619.927415] RDX: 0000000000000001 RSI: 0000000000000800 RDI: 000000000224f558
[  619.927422] RBP: ffffc90001187f88 R08: 0000000000000000 R09: 00007ffdb6f61100
[  619.927428] R10: 000000000224f4e0 R11: 0000000000000246 R12: 0000000000000000
[  619.927435] R13: 00007ffdb6f612b0 R14: 0000000000000000 R15: 0000000000000000
[  619.927451]  ? __this_cpu_preempt_check+0x13/0x20

or

[  641.646590] WARNING: CPU: 1 PID: 8913 at drivers/gpu/drm/i915/intel_drv.h:1750 intel_runtime_pm_get_noresume+0x8b/0x90 [i915]
[  641.646595] RPM wakelock ref not held during HW access
[  641.646600] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec snd_hwdep crct10dif_pclmul snd_hda_core crc32_pclmul ghash_clmulni_intel snd_pcm mei_me mei i915(-) r8169 mii prime_numbers i2c_hid [last unloaded: snd_hda_intel]
[  641.646825] CPU: 1 PID: 8913 Comm: drv_module_relo Tainted: G     U          4.10.0-CI-Trybot_609+ #1
[  641.646836] Hardware name: TOSHIBA SATELLITE P50-C/06F4                            , BIOS 1.20 10/08/2015
[  641.646843] Call Trace:
[  641.646857]  dump_stack+0x67/0x92
[  641.646869]  __warn+0xc6/0xe0
[  641.646880]  warn_slowpath_fmt+0x4a/0x50
[  641.646893]  ? __this_cpu_preempt_check+0x13/0x20
[  641.646904]  ? trace_hardirqs_on_caller+0xe7/0x200
[  641.646957]  intel_runtime_pm_get_noresume+0x8b/0x90 [i915]
[  641.647022]  __i915_add_request+0x423/0x540 [i915]
[  641.647080]  i915_gem_switch_to_kernel_context+0x148/0x2b0 [i915]
[  641.647145]  i915_gem_suspend+0x2b/0x180 [i915]
[  641.647189]  i915_driver_unload+0x22/0x200 [i915]
[  641.647200]  ? __this_cpu_preempt_check+0x13/0x20
[  641.647210]  ? trace_hardirqs_on_caller+0xe7/0x200
[  641.647220]  ? trace_hardirqs_on+0xd/0x10
[  641.647231]  ? _raw_spin_unlock_irqrestore+0x3d/0x60
[  641.647276]  i915_pci_remove+0x14/0x20 [i915]
[  641.647293]  pci_device_remove+0x34/0xb0
[  641.647307]  device_release_driver_internal+0x158/0x210
[  641.647321]  driver_detach+0x3b/0x80
[  641.647330]  bus_remove_driver+0x53/0xd0
[  641.647338]  driver_unregister+0x27/0x50
[  641.647348]  pci_unregister_driver+0x25/0xa0
[  641.647415]  i915_exit+0x1a/0xb1a [i915]
[  641.647429]  SyS_delete_module+0x193/0x1e0
[  641.647444]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  641.647453] RIP: 0033:0x7fc622bd2d37
[  641.647463] RSP: 002b:00007ffff8ffb5c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
[  641.647475] RAX: ffffffffffffffda RBX: ffffffff81481ff3 RCX: 00007fc622bd2d37
[  641.647480] RDX: 0000000000000001 RSI: 0000000000000800 RDI: 0000000000d49118
[  641.647485] RBP: ffffc90000997f88 R08: 0000000000000000 R09: 00007ffff8ffb5f0
[  641.647491] R10: 0000000000d490a0 R11: 0000000000000246 R12: 0000000000000000
[  641.647498] R13: 00007ffff8ffb7a0 R14: 0000000000000000 R15: 0000000000000000
[  641.647510]  ? __this_cpu_preempt_check+0x13/0x20

v2: Keep holding rpm until the end to cover i915_gem_sanitize() as well.

Fixes: 5ab57c7020 ("drm/i915: Flush logical context image out to memory upon suspend")
Fixes: 1c777c5d1d ("drm/i915/hsw: Fix GPU hang during resume from S3-devices state")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170302083029.19576-1-chris@chris-wilson.co.uk
Reviewed-by: Imre Deak <imre.deak@intel.com>
Cc: <stable@vger.kernel.org> # v4.9+
2017-03-02 12:44:08 +00:00
Chris Wilson
67b807a892 drm/i915: Delay disabling the user interrupt for breadcrumbs
A significant cost in setting up a wait is the overhead of enabling the
interrupt. As we disable the interrupt whenever the queue of waiters is
empty, if we are frequently waiting on alternating batches, we end up
re-enabling the interrupt on a frequent basis. We do want to disable the
interrupt during normal operations as under high load it may add several
thousand interrupts/s - we have been known in the past to occupy whole
cores with our interrupt handler after accidentally leaving user
interrupts enabled. As a compromise, leave the interrupt enabled until
the next IRQ, or the system is idle. This gives a small window for a
waiter to keep the interrupt active and not be delayed by having to
re-enable the interrupt.

v2: Restore hangcheck/missed-irq detection for continuations
v3: Be more careful restoring the hangcheck timer after reset
v4: Be more careful restoring the fake irq after reset (if required!)
v5: Redo changes to intel_engine_wakeup()
v6: Factor out __intel_engine_wakeup()
v7: Improve commentary for declaring a missed wakeup

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170227205850.2828-4-chris@chris-wilson.co.uk
2017-02-27 21:57:23 +00:00
Dave Jiang
11bac80004 mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.

[arnd@arndb.de: fix ARM build]
  Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Kenneth Graunke
ef0f411f51 drm/i915: Drop support for I915_EXEC_CONSTANTS_* execbuf parameters.
This patch makes the I915_PARAM_HAS_EXEC_CONSTANTS getparam return 0
(indicating the optional feature is not supported), and makes execbuf
always return -EINVAL if the flags are used.

Apparently, no userspace ever shipped which used this optional feature:
I checked the git history of Mesa, xf86-video-intel, libva, and Beignet,
and there were zero commits showing a use of these flags.  Kernel commit
72bfa19c8d apparently introduced the feature prematurely.  According
to Chris, the intention was to use this in cairo-drm, but "the use was
broken for gen6", so I don't think it ever happened.

'relative_constants_mode' has always been tracked per-device, but this
has actually been wrong ever since hardware contexts were introduced, as
the INSTPM register is saved (and automatically restored) as part of the
render ring context. The software per-device value could therefore get
out of sync with the hardware per-context value.  This meant that using
them is actually unsafe: a client which tried to use them could damage
the state of other clients, causing the GPU to interpret their BO
offsets as absolute pointers, leading to bogus memory reads.

These flags were also never ported to execlist mode, making them no-ops
on Gen9+ (which requires execlists), and Gen8 in the default mode.

On Gen8+, userspace can write these registers directly, achieving the
same effect.  On Gen6-7.5, it likely makes sense to extend the command
parser to support them.  I don't think anyone wants this on Gen4-5.

Based on a patch by Dave Gordon.

v3: Return -ENODEV for the getparam, as this is what we do for other
    obsolete features.  Suggested by Chris Wilson.

Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92448
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215093446.21291-1-kenneth@whitecape.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-02-24 17:51:05 +00:00
Chris Wilson
754c9fd576 drm/i915: Protect the request->global_seqno with the engine->timeline lock
A request is assigned a global seqno only when it is on the hardware
execution queue. The global seqno can be used to maintain a list of
requests on the same engine in retirement order, for example for
constructing a priority queue for waiting. Prior to its execution, or
if it is subsequently removed in the event of preemption, its global
seqno is zero. As both insertion and removal from the execution queue
may operate in IRQ context, it is not guarded by the usual struct_mutex
BKL. Instead those relying on the global seqno must be prepared for its
value to change between reads. Only when the request is complete can
the global seqno be stable (due to the memory barriers on submitting
the commands to the hardware to write the breadcrumb, if the HWS shows
that it has passed the global seqno and the global seqno is unchanged
after the read, it is indeed complete).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-9-chris@chris-wilson.co.uk
2017-02-23 14:49:32 +00:00
Dave Airlie
94000cc329 Linux 4.10-rc8
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJYoM2fAAoJEHm+PkMAQRiGr9MH/izEAMri7rJ0QMc3ejt+WmD0
 8pkZw3+MVn71z6cIEgpzk4QkEWJd5rfhkETCeCp7qQ9V6cDW1FDE9+0OmPjiphDt
 nnzKs7t7skEBwH5Mq5xygmIfkv+Z0QGHZ20gfQWY3F56Uxo+ARF88OBHBLKhqx3v
 98C7YbMFLKBslKClA78NUEIdx0UfBaRqerlERx0Lfl9aoOrbBS6WI3iuREiylpih
 9o7HTrwaGKkU4Kd6NdgJP2EyWPsd1LGalxBBjeDSpm5uokX6ALTdNXDZqcQscHjE
 RmTqJTGRdhSThXOpNnvUJvk9L442yuNRrVme/IqLpxMdHPyjaXR3FGSIDb2SfjY=
 =VMy8
 -----END PGP SIGNATURE-----

Merge tag 'v4.10-rc8' into drm-next

Linux 4.10-rc8

Backmerge Linus rc8 to fix some conflicts, but also
to avoid pulling it in via a fixes pull from someone.
2017-02-23 12:10:12 +10:00
Chris Wilson
d59b21ec6f drm/i915: Remove 'retire' parameter from intel_fb_obj_flush
Setting retire=true is identical to using origin=ORIGIN_CS, so make the
same simplification to intel_fb_obj_flush() as already employed for
intel_fb_obj_invalidate().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-6-chris@chris-wilson.co.uk
2017-02-22 12:12:17 +00:00
Chris Wilson
57822dc6b9 drm/i915: Perform object clflushing asynchronously
Flushing the cachelines for an object is slow, can be as much as 100ms
for a large framebuffer. We currently do this under the struct_mutex BKL
on execution or on pageflip. But now with the ability to add fences to
obj->resv for both flips and execbuf (and we naturally wait on the fence
before CPU access), we can move the clflush operation to a workqueue and
signal a fence for completion, thereby doing the work asynchronously and
not blocking the driver or its clients.

v2: Introduce i915_gem_clflush.h and use a new name, split out some
extras into separate patches.

Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-5-chris@chris-wilson.co.uk
2017-02-22 12:12:15 +00:00
Chris Wilson
f6aaba4dfb drm/i915: Skip clflushes for all non-page backed objects
Generalise the skip for physical and stolen objects by skipping anything
we do not have a valid address for inside the sg.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-4-chris@chris-wilson.co.uk
2017-02-22 12:12:14 +00:00
Chris Wilson
5a97bcc69c drm/i915: Amalgamate flushing of display objects
We have three different paths by which userspace wants to flush the
display plane (i.e. objects with obj->pin_display). Use a common helper
to identify those paths and to simplify a later change.

v2: Include the conditional in the name, i915_gem_object_flush_if_display

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-3-chris@chris-wilson.co.uk
2017-02-22 12:12:13 +00:00
Chris Wilson
e59dc17211 drm/i915: Move cpu_cache_is_coherent() to header
For use in the next patch, take the current is-coherent helper and add
it to i915_gem_object.h

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-2-chris@chris-wilson.co.uk
2017-02-22 12:12:12 +00:00
Chris Wilson
208b84a375 drm/i915: Remove change_domain tracepoint
The change_domain tracepoint has been inaccurate for a few years - it
doesn't fully capture the domains, especially with userspace bypassing
them. It is defunct, misleading and time to be removed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-1-chris@chris-wilson.co.uk
2017-02-22 12:12:11 +00:00
Chris Wilson
e54ca97747 drm/i915: Remove completed fences after a wait
If we wait upon the full (i.e. all shared fences, or upon an exclusive
fence) reservation object successfully, we know that all fences beneath
it have been signaled, so long as no new fences were added whilst we
slept. If the reservation_object remains the same, as detected by its
seqcount, we can then reap all the fences upon completion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170217151304.16665-6-chris@chris-wilson.co.uk
2017-02-17 15:31:15 +00:00
Chris Wilson
581ab1fe52 drm/i915: Unwind conversion to i915_gem_phys_ops on failure
The physical object is treated as permanently pinned. If we fail to take
this initial pin during i915_gem_object_attach_phys() we need to revert
it back to an ordinary shmemfs object before reporting the failure.

v2: git-add

Reported-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215163900.11606-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-16 20:31:13 +00:00
Chris Wilson
c1d2061b28 drm/i915: Squelch any ktime/jiffie rounding errors for wait-ioctl
We wait upon jiffies, but report the time elapsed using a
high-resolution timer. This discrepancy can lead to us timing out the
wait prior to us reporting the elapsed time as complete.

This restores the squelching lost in commit e95433c73a ("drm/i915:
Rearrange i915_wait_request() accounting with callers").

Fixes: e95433c73a ("drm/i915: Rearrange i915_wait_request() accounting with callers")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170216125441.30923-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-02-16 20:31:13 +00:00
Chris Wilson
636deb5b22 drm/i915: Pass timeout==0 on to i915_gem_object_wait_fence()
The i915_gem_object_wait_fence() uses an incoming timeout=0 to query
whether the current fence is busy or idle, without waiting. This can be
used by the wait-ioctl to implement a busy query.

Fixes: e95433c73a ("drm/i915: Rearrange i915_wait_request() accounting with callers")
Testcase: igt/gem_wait/basic-busy-write-all
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170212215344.16600-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit d892e9398e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-02-16 11:59:13 +02:00
Chris Wilson
ec62ed3e1d drm/i915: Restore context and pd for ringbuffer submission after reset
Following a reset, the context and page directory registers are lost.
However, the queue of requests that we resubmit after the reset may
depend upon them - the registers are restored from a context image, but
that restore may be inhibited and may simply be absent from the request
if it was in the middle of a sequence using the same context. If we
prime the CCID/PD registers with the first request in the queue (even
for the hung request), we prevent invalid memory access for the
following requests (and continually hung engines).

v2: Magic BIT(8), reserved for future use but still appears unused.
v3: Some commentary on handling innocent vs guilty requests
v4: Add a wait for PD_BASE fetch. The reload appears to be instant on my
Ivybridge, but this bit probably exists for a reason.

Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170207152437.4252-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
(cherry picked from commit c0dcb203fb)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-02-16 11:59:11 +02:00
Chris Wilson
d892e9398e drm/i915: Pass timeout==0 on to i915_gem_object_wait_fence()
The i915_gem_object_wait_fence() uses an incoming timeout=0 to query
whether the current fence is busy or idle, without waiting. This can be
used by the wait-ioctl to implement a busy query.

Fixes: e95433c73a ("drm/i915: Rearrange i915_wait_request() accounting with callers")
Testcase: igt/gem_wait/basic-busy-write-all
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170212215344.16600-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-02-14 09:38:42 +00:00
Chris Wilson
170594502c drm/i915: Test coherency of and barriers between cache domains
Write into an object using WB, WC, GTT, and GPU paths and make sure that
our internal API is sufficient to ensure coherent reads and writes.

v2: Avoid invalid free upon allocation error

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-21-chris@chris-wilson.co.uk
2017-02-13 20:45:45 +00:00
Chris Wilson
8335fd65ce drm/i915: Add selftests for object allocation, phys
The phys object is a rarely used device (only very old machines require
a chunk of physically contiguous pages for a few hardware interactions).
As such, it is not exercised by CI and to combat that we want to add a
test that exercises the phys object on all platforms.

v2: Always set err on error paths and not rely on inheriting the err.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-17-chris@chris-wilson.co.uk
2017-02-13 20:45:42 +00:00
Chris Wilson
44653988ef drm/i915: Create a fake object for testing huge allocations
We would like to be able to exercise huge allocations even on memory
constrained devices. To do this we create an object that allocates only
a few pages and remaps them across its whole range - each page is reused
multiple times. We can therefore pretend we are rendering into a much
larger object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-9-chris@chris-wilson.co.uk
2017-02-13 20:45:34 +00:00
Chris Wilson
66d9cb5d80 drm/i915: Mock the GEM device for self-testing
A simulacrum of drm_i915_private to let us pretend interactions with the
device.

v2: Tidy init error paths

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-6-chris@chris-wilson.co.uk
2017-02-13 20:45:28 +00:00
Chris Wilson
935a2f776a drm/i915: Add some selftests for sg_table manipulation
Start exercising the scattergather lists, especially looking at
iteration after coalescing.

v2: Comment on the peculiarity of table construction (i.e. why this
sg_table might be interesting).
v3: Added one __func__ to identify expect_pfn_sg()
v4: Loop until we have crossed the chain boundary (forcing sg_table to
do multiple allocations) before squelching a potential ENOMEM from oom.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-2-chris@chris-wilson.co.uk
2017-02-13 20:45:23 +00:00
Chris Wilson
2ae557388d drm/i915: Clear the last_retired_context following a hang/reset
Following a hang and reset, we know that the engine is idle and all
context state has been saved or lost. Consequently, we know that the
engine is no longer referencing the last context and we can relinquish
our tracking.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Link: http://patchwork.freedesktop.org/patch/msgid/20170212172002.23072-5-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-13 11:19:12 +00:00
Chris Wilson
fe3288b5da drm/i915: Park the breadcrumbs signaler across a GPU reset
The signal threads may be running concurrently with the GPU reset. The
completion from the GPU run asynchronous with the reset and two threads
may see different snapshots of the state, and the signaler may mark a
request as complete as we try to reset it. We don't tolerate 2 different
views of the same state and complain if we try to mark a request as
failed if it is already complete. Disable the signal threads during
reset to prevent this conflict (even though the conflict implies that
the state we resetting to is invalid, we have already made our
decision!).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99733
References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170212172002.23072-4-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-13 11:18:28 +00:00
Chris Wilson
1d309634bc drm/i915: Kill the tasklet then disable
Disabling the tasklet leaves it if scheduled on the ready to run list
until it is re-enabled. This will leave the ksoftird thread spinning
until satisfied. To prevent this situation on starting the GPU reset, we
want to kill the tasklet first and then disable. The same problem will
arise when a tasklet is scheduled from another device, so a better
solution is required for the general case.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 1f7b847d72 ("drm/i915: Disable engine->irq_tasklet around resets")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170212172002.23072-3-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-13 11:18:23 +00:00
Chris Wilson
c00122f33f drm/i915: Assert that the active request hasn't been signaled
As the request is not complete, it should not be signaled. Assert that
this is true before we process the request for a reset.

References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170212172002.23072-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-13 11:06:35 +00:00
Chris Wilson
8c12d12159 drm/i915: Move the irq_barrier for reset earlier into reset_prepare
When updating the bookkeeping following the reset, we need the seqno to
be coherent on the CPU prior to trusting its result for deciding whether
any request is completed. We need the irq_barrier before we start making
these decisions, i.e. in reset_prepare.

References: https://bugs.freedesktop.org/show_bug.cgi?id=99733
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170210185214.23463-1-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
2017-02-10 21:10:51 +00:00
Chris Wilson
c4d4c1c66b drm/i915: Flush the freed object queue on device release
As dmabufs may live beyond the PCI device removal, we need to flush the
freed object worker on device release, and include a warning in case
there is a leak.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170210163523.17533-3-chris@chris-wilson.co.uk
2017-02-10 19:10:05 +00:00
Daniel Vetter
51a831a772 Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued
Chris Wilson needs the new drm_driver->release callback to make sure
the shiny new dma-buf testcases don't oops the driver on unload.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2017-02-10 16:27:24 +01:00
Chris Wilson
1f7b847d72 drm/i915: Disable engine->irq_tasklet around resets
When we restart the engines, and we have active requests, a request on
the first engine may complete and queue a request to the second engine
before we try to restart the second engine. That queueing of the
request may race with the engine to restart, and so may corrupt the
current state. Disabling the engine->irq_tasklet prevents the two paths
from writing into ELSP simultaneously (and modifyin the execlists_port[]
at the same time).

Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Testcase: igt/gem_exec_fence/await-hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-3-chris@chris-wilson.co.uk
2017-02-08 16:24:42 +00:00
Chris Wilson
d802709313 drm/i915: Split GEM resetting into 3 phases
Currently we do a reset prepare/finish around the call to reset the GPU,
but it looks like we need a later stage after the hw has been
reinitialised to allow GEM to restart itself. Start by splitting the 2
GEM phases into 3:

  prepare - before the reset, check if GEM recovered, then stop GEM

  reset - after the reset, update GEM bookkeeping

  finish - after the re-initialisation following the reset, restart GEM

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-2-chris@chris-wilson.co.uk
2017-02-08 16:24:42 +00:00
Chris Wilson
20a8a74ad9 drm/i915: Move calling engine->init_hw() to its own function
Just a simple refactor to move a loop and some support code out of
i915_gem_init_hw(). This is in preparation for avoiding a race between
the tasklet writing to ELSP whilst simultaneously being written by
engine->init_hw() following a GPU reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-1-chris@chris-wilson.co.uk
2017-02-08 16:24:42 +00:00
Chris Wilson
519d524981 drm/i915: i915_gem_shrink_all() needs an awake device
Since to unbind an object, we may need a powered up device to access the
GTT entries, we only shrink bound objects if awake. Callers to
i915_gem_shrink_all() had to take this into account and take the rpm
wakeref, but we can move this wakeref into the shrink_all itself for
convenience and making the function live up to its name.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208104710.18089-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-08 16:24:42 +00:00
Chris Wilson
83bf6d55c1 drm/i915: Remove overzealous fence warn on runtime suspend
The goal of the WARN was to catch when we are still actively using the
fence as we go into the runtime suspend. However, the reg->pin_count is
too coarse as it does not distinguish between exclusive ownership of the
fence register from activity.

I've not improved on the WARN, nor have we captured this WARN in an
exact igt, but it is showing up regularly in the wild:

[ 1915.935332] WARNING: CPU: 1 PID: 10861 at drivers/gpu/drm/i915/i915_gem.c:2022 i915_gem_runtime_suspend+0x116/0x130 [i915]
[ 1915.935383] WARN_ON(reg->pin_count)[ 1915.935399] Modules linked in:
 snd_hda_intel i915 drm_kms_helper vgem netconsole scsi_transport_iscsi fuse vfat fat x86_pkg_temp_thermal coretemp intel_cstate intel_uncore snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mei_me mei serio_raw intel_rapl_perf intel_pch_thermal soundcore wmi acpi_pad i2c_algo_bit syscopyarea sysfillrect sysimgblt fb_sys_fops drm r8169 mii video [last unloaded: drm_kms_helper]
[ 1915.935785] CPU: 1 PID: 10861 Comm: kworker/1:0 Tainted: G     U  W       4.9.0-rc5+ #170
[ 1915.935799] Hardware name: LENOVO 80MX/Lenovo E31-80, BIOS DCCN34WW(V2.03) 12/01/2015
[ 1915.935822] Workqueue: pm pm_runtime_work
[ 1915.935845]  ffffc900044fbbf0 ffffffffac3220bc ffffc900044fbc40 0000000000000000
[ 1915.935890]  ffffc900044fbc30 ffffffffac059bcb 000007e6044fbc60 ffff8801626e3198
[ 1915.935937]  ffff8801626e0000 0000000000000002 ffffffffc05e5d4e 0000000000000000
[ 1915.935985] Call Trace:
[ 1915.936013]  [<ffffffffac3220bc>] dump_stack+0x4f/0x73
[ 1915.936038]  [<ffffffffac059bcb>] __warn+0xcb/0xf0
[ 1915.936060]  [<ffffffffac059c4f>] warn_slowpath_fmt+0x5f/0x80
[ 1915.936158]  [<ffffffffc052d916>] i915_gem_runtime_suspend+0x116/0x130 [i915]
[ 1915.936251]  [<ffffffffc04f1c74>] intel_runtime_suspend+0x64/0x280 [i915]
[ 1915.936277]  [<ffffffffac0926f1>] ? dequeue_entity+0x241/0xbc0
[ 1915.936298]  [<ffffffffac36bb85>] pci_pm_runtime_suspend+0x55/0x180
[ 1915.936317]  [<ffffffffac36bb30>] ? pci_pm_runtime_resume+0xa0/0xa0
[ 1915.936339]  [<ffffffffac4514e2>] __rpm_callback+0x32/0x70
[ 1915.936356]  [<ffffffffac451544>] rpm_callback+0x24/0x80
[ 1915.936375]  [<ffffffffac36bb30>] ? pci_pm_runtime_resume+0xa0/0xa0
[ 1915.936392]  [<ffffffffac45222d>] rpm_suspend+0x12d/0x680
[ 1915.936415]  [<ffffffffac69f6d7>] ? _raw_spin_unlock_irq+0x17/0x30
[ 1915.936435]  [<ffffffffac0810b8>] ? finish_task_switch+0x88/0x220
[ 1915.936455]  [<ffffffffac4534bf>] pm_runtime_work+0x6f/0xb0
[ 1915.936477]  [<ffffffffac074353>] process_one_work+0x1f3/0x4d0
[ 1915.936501]  [<ffffffffac074678>] worker_thread+0x48/0x4e0
[ 1915.936523]  [<ffffffffac074630>] ? process_one_work+0x4d0/0x4d0
[ 1915.936542]  [<ffffffffac074630>] ? process_one_work+0x4d0/0x4d0
[ 1915.936559]  [<ffffffffac07a2c9>] kthread+0xd9/0xf0
[ 1915.936580]  [<ffffffffac07a1f0>] ? kthread_park+0x60/0x60
[ 1915.936600]  [<ffffffffac69fe62>] ret_from_fork+0x22/0x30

In the case the register is pinned, it should be present and we will
need to invalidate them to be restored upon resume as we cannot expect
the owner of the pin to call get_fence prior to use after resume.

Fixes: 7c108fd8fe ("drm/i915: Move fence cancellation to runtime suspend")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98804
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Imre Deak <imre.deak@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20170203125717.8431-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit e0ec3ec698)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-02-08 13:27:27 +02:00
Chris Wilson
e3818697e1 drm/i915: Flush untouched framebuffers before display on !llc
On a non-llc system, the objects are created with .cache_level =
CACHE_NONE and so the transition to uncached for scanout is a no-op.
However, if the object was never written to, it will still be in the CPU
domain (having been zeroed out by shmemfs). Those cachelines need to be
flushed prior to display.

Reported-and-tested-by: Vito Caputo
Fixes: a6a7cc4b7d ("drm/i915: Always flush the dirty CPU cache when pinning the scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20170109111932.6342-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit 69aeafeae9)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-02-08 13:10:24 +02:00
Chris Wilson
c0dcb203fb drm/i915: Restore context and pd for ringbuffer submission after reset
Following a reset, the context and page directory registers are lost.
However, the queue of requests that we resubmit after the reset may
depend upon them - the registers are restored from a context image, but
that restore may be inhibited and may simply be absent from the request
if it was in the middle of a sequence using the same context. If we
prime the CCID/PD registers with the first request in the queue (even
for the hung request), we prevent invalid memory access for the
following requests (and continually hung engines).

v2: Magic BIT(8), reserved for future use but still appears unused.
v3: Some commentary on handling innocent vs guilty requests
v4: Add a wait for PD_BASE fetch. The reload appears to be instant on my
Ivybridge, but this bit probably exists for a reason.

Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170207152437.4252-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-07 21:34:58 +00:00
Chris Wilson
e0ec3ec698 drm/i915: Remove overzealous fence warn on runtime suspend
The goal of the WARN was to catch when we are still actively using the
fence as we go into the runtime suspend. However, the reg->pin_count is
too coarse as it does not distinguish between exclusive ownership of the
fence register from activity.

I've not improved on the WARN, nor have we captured this WARN in an
exact igt, but it is showing up regularly in the wild:

[ 1915.935332] WARNING: CPU: 1 PID: 10861 at drivers/gpu/drm/i915/i915_gem.c:2022 i915_gem_runtime_suspend+0x116/0x130 [i915]
[ 1915.935383] WARN_ON(reg->pin_count)[ 1915.935399] Modules linked in:
 snd_hda_intel i915 drm_kms_helper vgem netconsole scsi_transport_iscsi fuse vfat fat x86_pkg_temp_thermal coretemp intel_cstate intel_uncore snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mei_me mei serio_raw intel_rapl_perf intel_pch_thermal soundcore wmi acpi_pad i2c_algo_bit syscopyarea sysfillrect sysimgblt fb_sys_fops drm r8169 mii video [last unloaded: drm_kms_helper]
[ 1915.935785] CPU: 1 PID: 10861 Comm: kworker/1:0 Tainted: G     U  W       4.9.0-rc5+ #170
[ 1915.935799] Hardware name: LENOVO 80MX/Lenovo E31-80, BIOS DCCN34WW(V2.03) 12/01/2015
[ 1915.935822] Workqueue: pm pm_runtime_work
[ 1915.935845]  ffffc900044fbbf0 ffffffffac3220bc ffffc900044fbc40 0000000000000000
[ 1915.935890]  ffffc900044fbc30 ffffffffac059bcb 000007e6044fbc60 ffff8801626e3198
[ 1915.935937]  ffff8801626e0000 0000000000000002 ffffffffc05e5d4e 0000000000000000
[ 1915.935985] Call Trace:
[ 1915.936013]  [<ffffffffac3220bc>] dump_stack+0x4f/0x73
[ 1915.936038]  [<ffffffffac059bcb>] __warn+0xcb/0xf0
[ 1915.936060]  [<ffffffffac059c4f>] warn_slowpath_fmt+0x5f/0x80
[ 1915.936158]  [<ffffffffc052d916>] i915_gem_runtime_suspend+0x116/0x130 [i915]
[ 1915.936251]  [<ffffffffc04f1c74>] intel_runtime_suspend+0x64/0x280 [i915]
[ 1915.936277]  [<ffffffffac0926f1>] ? dequeue_entity+0x241/0xbc0
[ 1915.936298]  [<ffffffffac36bb85>] pci_pm_runtime_suspend+0x55/0x180
[ 1915.936317]  [<ffffffffac36bb30>] ? pci_pm_runtime_resume+0xa0/0xa0
[ 1915.936339]  [<ffffffffac4514e2>] __rpm_callback+0x32/0x70
[ 1915.936356]  [<ffffffffac451544>] rpm_callback+0x24/0x80
[ 1915.936375]  [<ffffffffac36bb30>] ? pci_pm_runtime_resume+0xa0/0xa0
[ 1915.936392]  [<ffffffffac45222d>] rpm_suspend+0x12d/0x680
[ 1915.936415]  [<ffffffffac69f6d7>] ? _raw_spin_unlock_irq+0x17/0x30
[ 1915.936435]  [<ffffffffac0810b8>] ? finish_task_switch+0x88/0x220
[ 1915.936455]  [<ffffffffac4534bf>] pm_runtime_work+0x6f/0xb0
[ 1915.936477]  [<ffffffffac074353>] process_one_work+0x1f3/0x4d0
[ 1915.936501]  [<ffffffffac074678>] worker_thread+0x48/0x4e0
[ 1915.936523]  [<ffffffffac074630>] ? process_one_work+0x4d0/0x4d0
[ 1915.936542]  [<ffffffffac074630>] ? process_one_work+0x4d0/0x4d0
[ 1915.936559]  [<ffffffffac07a2c9>] kthread+0xd9/0xf0
[ 1915.936580]  [<ffffffffac07a1f0>] ? kthread_park+0x60/0x60
[ 1915.936600]  [<ffffffffac69fe62>] ret_from_fork+0x22/0x30

In the case the register is pinned, it should be present and we will
need to invalidate them to be restored upon resume as we cannot expect
the owner of the pin to call get_fence prior to use after resume.

Fixes: 7c108fd8fe ("drm/i915: Move fence cancellation to runtime suspend")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98804
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Imre Deak <imre.deak@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20170203125717.8431-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-02-07 14:37:37 +00:00
Chris Wilson
4e64e5539d drm: Improve drm_mm search (and fix topdown allocation) with rbtrees
The drm_mm range manager claimed to support top-down insertion, but it
was neither searching for the top-most hole that could fit the
allocation request nor fitting the request to the hole correctly.

In order to search the range efficiently, we create a secondary index
for the holes using either their size or their address. This index
allows us to find the smallest hole or the hole at the bottom or top of
the range efficiently, whilst keeping the hole stack to rapidly service
evictions.

v2: Search for holes both high and low. Rename flags to mode.
v3: Discover rb_entry_safe() and use it!
v4: Kerneldoc for enum drm_mm_insert_mode.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Sinclair Yeh <syeh@vmware.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com> # vmwgfx
Reviewed-by: Lucas Stach <l.stach@pengutronix.de> #etnaviv
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170202210438.28702-1-chris@chris-wilson.co.uk
2017-02-03 11:10:32 +01:00
Chris Wilson
69aeafeae9 drm/i915: Flush untouched framebuffers before display on !llc
On a non-llc system, the objects are created with .cache_level =
CACHE_NONE and so the transition to uncached for scanout is a no-op.
However, if the object was never written to, it will still be in the CPU
domain (having been zeroed out by shmemfs). Those cachelines need to be
flushed prior to display.

Reported-and-tested-by: Vito Caputo
Fixes: a6a7cc4b7d ("drm/i915: Always flush the dirty CPU cache when pinning the scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20170109111932.6342-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-02-01 10:54:17 +00:00
Chris Wilson
241455172b drm/i915: Reset the gpu on takeover
The GPU may be in an unknown state following resume and module load. The
previous occupant may have left contexts loaded, or other dangerous
state, which can cause an immediate GPU hang for us. The only save
course of action is to reset the GPU prior to using it - similarly to
how we reset the GPU prior to unload (before a second user may be
affected by our leftover state).

We need to reset the GPU very early in our load/resume sequence so that
any stale HW pointers are revoked prior to any resource allocations we
make (that may conflict).

A reset should only be a couple of milliseconds on a slow device, a cost
we should easily be able to absorb into our initialisation times.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170124110135.6418-2-chris@chris-wilson.co.uk
2017-01-24 12:29:48 +00:00
Chris Wilson
e0216b762a drm/i915: Treat an error from i915_vma_instance() as unlikely
When pinning into the global GTT, an error from creating the VMA is
unlikely, so mark it so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170119192659.31789-4-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-21 10:32:21 +00:00
Chris Wilson
befedbb7e2 drm/i915: Use common LRU inactive vma bumping for unpin_from_display
Now that i915_gem_object_bump_inactive_ggtt() exists, also make use of
it for the LRU bumping from i915_gem_object_unpin_from_display()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170119192659.31789-2-chris@chris-wilson.co.uk
Reviewed-by: Jonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-21 10:31:44 +00:00
Chris Wilson
d65415df07 drm/i915: Do an unlocked wait before set-cache-level ioctl
Since a change in cache level is likely to trigger an unbind, avoid
waiting under the mutex by preemptively doing an unlocked wait.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170119082211.21257-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-01-21 09:19:17 +00:00
Chris Wilson
718659a630 drm/i915: Rename some warts in the VMA API
Whilst writing testcases to exercise the VMA API, some oddities came to
light, such as i915_gem_obj_lookup_or_create(). Joonas suggested
i915_vma_instance() as a neat replacement, so rename them, move them to
i915_vma.c and add some kerneldoc as a sugary bonus.

s/i915_gem_obj_to_vma/i915_vma_lookup/
s/i915_gem_obj_lookup_or_create_vma/i915_vma_instance/

Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-2-chris@chris-wilson.co.uk
2017-01-19 10:15:26 +00:00
Mika Kuoppala
71895a0858 drm/i915: Add comment how we treat hung contexts
Explain in a comment how and why we treat hung context like we do.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-7-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-01-18 10:47:32 +00:00
Chris Wilson
0e178aef8f drm/i915: Detect a failed GPU reset+recovery
If we can't recover the GPU after the reset, mark it as wedged to cancel
the outstanding tasks and to prevent new users from trying to use the
broken GPU.

v2: Check the same ring is hung again before declaring the reset broken.
v3: use engine_stalled (Mika)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-6-git-send-email-mika.kuoppala@intel.com
2017-01-18 10:47:26 +00:00
Mika Kuoppala
61da536204 drm/i915: Tidy up engine reset logic
Split engine reset for engine and request specific parts.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-5-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-01-18 10:44:51 +00:00
Mika Kuoppala
bf2f04366c drm/i915: Introduce engine_stalled helper
Move the engine stalled/pardoned check into a helper function.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-4-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-01-18 10:44:50 +00:00
Mika Kuoppala
211b12afe6 drm/i915: Cleanup request skip decision
Since we now only skip banned contexts, preventing the skip of default
contexts is no longer sensible. For a similar argument as before
'commit 7ec73b7e36 ("drm/i915: Only skip requests once a context is banned")'
we end up with an inconsistent API if we only mark future execbufs from
the default ctx as banned but fail to mark those currently executing
as failed.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-3-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-01-18 10:44:50 +00:00
Mika Kuoppala
36193acd54 drm/i915: Introduce engine_skip_context
Add a new function for skipping all pending requests
for a context in order to make engine reset flow more
readable.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-2-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-01-18 10:44:47 +00:00
Chris Wilson
4c96554365 drm/i915: Move engine reset preparation to i915_gem_reset_prepare()
Now that we have prepare/finish routines for the GEM reset, move the
disabling of the engine->irq_tasklet into them to reduce repetition. The
device irq enable/disable is split out to ensure it is run first and
last always (even if the GPU reset fails).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-1-git-send-email-mika.kuoppala@intel.com
2017-01-18 10:44:20 +00:00
Chris Wilson
f131e3562e drm/i915: Skip switch to kernel context if already done
Some engines are never user or already sitting idle in the kernel
context and for those we can skip flushing the current context for
i915_gem_switch_to_kernel_context(). We used to perform this
optimisation but that was removed for convenience of converting over to
multiple timelines and handling the pending request queues.

From the perspective of writing selftests, reducing the number of
background operations on the engines makes defining assertions easier.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170114162334.10271-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-16 12:21:20 +00:00
Chris Wilson
47a8e3f6ae drm/i915: Eliminate superfluous i915_ggtt_view_normal
Since commit 058d88c433 ("drm/i915: Track pinned VMA"), there is only
one user of i915_ggtt_view_normal rodate. Just treat NULL as no special
view in pin_to_display() like everywhere else.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170114002827.31315-7-chris@chris-wilson.co.uk
2017-01-14 16:18:36 +00:00
Chris Wilson
8bab1193c1 drm/i915: Convert i915_ggtt_view to use an anonymous union
Reading the ggtt_views is much more pleasant without the extra
characters from specifying the union (i.e. ggtt_view.partial rather than
ggtt_view.params.partial). To make this work inside i915_vma_compare()
with only a single memcmp requires us to ensure that there are no
uninitialised bytes within each branch of the union (we make sure the
structs are packed) and we need to store the size of each branch.

v4: Rewrite changelog and add comments explaining the assert.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170114002827.31315-5-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-01-14 16:18:03 +00:00
Chris Wilson
3bf4d57519 drm/i915: Stop clearing i915_ggtt_view
As we now use a compact memcmp in i915_vma_compare(), we can forgo
clearing the entire view and only set the precise parameters used in
this view.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170114002827.31315-4-chris@chris-wilson.co.uk
2017-01-14 16:17:45 +00:00
Chris Wilson
e4621b73b6 drm/i915: Fix phys pwrite for struct_mutex-less operation
Since commit fe115628d5 ("drm/i915: Implement pwrite without
struct-mutex") the lowlevel pwrite calls are now called without the
protection of struct_mutex, but pwrite_phys was still asserting that it
held the struct_mutex and later tried to drop and relock it.

Fixes: fe115628d5 ("drm/i915: Implement pwrite without struct-mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152240.5793-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
(cherry picked from commit 10466d2a59)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-01-12 10:15:44 +02:00
Chris Wilson
f51455d442 drm/i915: Replace 4096 with PAGE_SIZE or I915_GTT_PAGE_SIZE
Start converting over from the byte count to its semantic macro, either
we want to allocate the size of a physical page in main memory or we
want the size of a virtual page in the GTT. 4096 could mean either, but
PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve
code comprehension and future changes. In the future, we may want to use
variable GTT page sizes and so have the challenge of knowing which
hardcoded values were used to represent a physical page vs the virtual
page.

v2: Look for a few more 4096s to convert, discover IS_ALIGNED().
v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep
bdw stolen w/a as 4096 until we know better.
v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page
sizes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110144734.26052-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-10 20:54:32 +00:00
Chris Wilson
2a20d6f858 drm/i915: Rename i915_gem_engine_cleanup() to engine_set_wedged()
It has been some time since i915_gem_engine_cleanup was only called from
the module unload path, and now it is only called when the GPU is
wedged. Mika complained that the name is confusing, especially in light
of the existence of i915_gem_cleanup_engines().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-5-chris@chris-wilson.co.uk
2017-01-10 20:49:32 +00:00
Chris Wilson
3cd9442f66 drm/i915: Mark all incomplete requests as -EIO when wedged
Similarly to a normal reset, after we mark the GPU as wedged (completely
fubar and no more requests can be executed), set the error status on all
the in flight requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-4-chris@chris-wilson.co.uk
2017-01-10 20:49:31 +00:00
Chris Wilson
3c1b284759 drm/i915: Set an error status for a resubmitted request
Let userspace know if its request was resubmitted due to it being
executed at the time of a global reset. In this case, the reset was for
a guilty request on another engine, and this request was an innocent
victim that will be re-executed upon restarting. However, since it was
running at the time of the reset, we can not guarantee that it suffered
no ill-effects from the reset (e.g. some context state may be lost, or
some self-modifying fragment shaders will be restarted from the final
state not their initial state), to let userspace know that it has been
corrupted set a special value on the fence->error, -EAGAIN.

If the request does hang on resubmission, the error will be overwritten
with -EIO.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-3-chris@chris-wilson.co.uk
2017-01-10 20:49:30 +00:00
Chris Wilson
c0d5f32c50 drm/i915: Set guilty-flag on fence after detecting a hang
The struct dma_fence carries a status field exposed to userspace by
sync_file. This is inspected after the fence is signaled and can convey
whether or not the request completed successfully, or in our case if we
detected a hang during the request (signaled via -EIO in
SYNC_IOC_FILE_INFO).

v2: Mark all cancelled requests as failed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-2-chris@chris-wilson.co.uk
2017-01-10 20:49:30 +00:00
Chris Wilson
2edc6e0df1 drm/i915: Consolidate reset_request()
Always reset the requests of the guilty context, including the hung
request that we tell the hardware to skip. This should help if the
reprogram fails entirely, but more importantly makes the guilty path
more uniform (and simplifies the subsequent patch to tweak the cancelled
requests).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-1-chris@chris-wilson.co.uk
2017-01-10 20:49:29 +00:00
Daniel Vetter
05adb1850a Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued
Pull in latest drm-next from Dave Airlie to get at all the drm-misc
goodies, specifically:
- dma_fence error state handling rework (Chris needs that for error
  recovery)
- crc support locking changes (Tomeu's i915 crc patches need that).

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2017-01-10 17:03:46 +01:00
Chris Wilson
8201c1fad4 drm/i915: Clip the partial view against the object not vma
The VMA is later clipped against the vm_area_struct before insertion of
the faulting PTE so we are free to create the partial view as we desire.
If we use the object as the extents rather than the area, this partial
can then be used for other areas.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110095633.6612-2-chris@chris-wilson.co.uk
2017-01-10 11:59:24 +00:00
Chris Wilson
2d4281bb93 drm/i915: Extract compute_partial_view()
In order to reuse the partial view for selftesting, extract the common
function for computing the view.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110095633.6612-1-chris@chris-wilson.co.uk
2017-01-10 11:59:24 +00:00
Chris Wilson
91d4e0aa92 drm/i915: Move ggtt fence/alignment to i915_gem_tiling.c
Rename i915_gem_get_ggtt_size() and i915_gem_get_ggtt_alignment() to
i915_gem_fence_size() and i915_gem_fence_alignment() respectively to
better match usage. Similarly move the pair of functions into
i915_gem_tiling.c next to the fence restrictions.

Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-6-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-10 08:12:22 +00:00
Chris Wilson
944397f04f drm/i915: Store required fence size/alignment for GGTT vma
The fence size/alignment is a combination of the vma size plus object
tiling parameters. Those parameters are rarely changed, making the fence
size/alignemnt roughly constant for the lifetime of the VMA. We can
simplify subsequent calculations by precalculating the size/alignment
required for GGTT vma taking fencing into account (with an update if we
do change the tiling or stride).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-4-chris@chris-wilson.co.uk
2017-01-10 08:12:21 +00:00
Chris Wilson
5b30694b47 drm/i915: Align GGTT sizes to a fence tile row
Ensure the view occupies the full tile row so that reads/writes into the
VMA do not escape (via fenced detiling) into neighbouring objects - we
will pad the object with scratch pages to satisfy the fence. This
applies the lazy-tiling we employed on gen2/3 to gen4+.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-2-chris@chris-wilson.co.uk
2017-01-10 08:12:20 +00:00
Chris Wilson
6649a0b650 drm/i915: Extract tile_row_size for fencing
Computing the tile row size of a tiled object (for use with fence
registers) is repeated, so extract it to a common helper.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-10 08:12:09 +00:00
Dave Airlie
5c37daf5dd Merge tag 'drm-intel-next-2017-01-09' of git://anongit.freedesktop.org/git/drm-intel into drm-next
More 4.11 stuff, holidays edition (i.e. not much):

- docs and cleanups for shared dpll code (Ander)
- some kerneldoc work (Chris)
- fbc by default on gen9+ too, yeah! (Paulo)
- fixes, polish and other small things all over gem code (Chris)
- and a few small things on top

Plus a backmerge, because Dave was enjoying time off too.

* tag 'drm-intel-next-2017-01-09' of git://anongit.freedesktop.org/git/drm-intel: (275 commits)
  drm/i915: Update DRIVER_DATE to 20170109
  drm/i915: Drain freed objects for mmap space exhaustion
  drm/i915: Purge loose pages if we run out of DMA remap space
  drm/i915: Fix phys pwrite for struct_mutex-less operation
  drm/i915: Simplify testing for am-I-the-kernel-context?
  drm/i915: Use range_overflows()
  drm/i915: Use fixed-sized types for stolen
  drm/i915: Use phys_addr_t for the address of stolen memory
  drm/i915: Consolidate checks for memcpy-from-wc support
  drm/i915: Only skip requests once a context is banned
  drm/i915: Move a few more utility macros to i915_utils.h
  drm/i915: Clear ret before unbinding in i915_gem_evict_something()
  drm/i915/guc: Exclude the upper end of the Global GTT for the GuC
  drm/i915: Move a few utility macros into a separate header
  drm/i915/execlists: Reorder execlists register enabling
  drm/i915: Assert that we do create the deferred context
  drm/i915: Assert all timeline requests are gone before fini
  drm/i915: Revoke fenced GTT mmapings across GPU reset
  drm/i915: enable FBC on gen9+ too
  drm/i915: actually drive the BDW reserved IDs
  ...
2017-01-10 08:02:09 +10:00
Linus Torvalds
2fd8774c79 Merge branch 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb fixes from Konrad Rzeszutek Wilk:
 "This has one fix to make i915 work when using Xen SWIOTLB, and a
  feature from Geert to aid in debugging of devices that can't do DMA
  outside the 32-bit address space.

  The feature from Geert is on top of v4.10 merge window commit
  (specifically you pulling my previous branch), as his changes were
  dependent on the Documentation/ movement patches.

  I figured it would just easier than me trying than to cherry-pick the
  Documentation patches to satisfy git.

  The patches have been soaking since 12/20, albeit I updated the last
  patch due to linux-next catching an compiler error and adding an
  Tested-and-Reported-by tag"

* 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  swiotlb: Export swiotlb_max_segment to users
  swiotlb: Add swiotlb=noforce debug option
  swiotlb: Convert swiotlb_force from int to enum
  x86, swiotlb: Simplify pci_swiotlb_detect_override()
2017-01-06 10:53:21 -08:00
Konrad Rzeszutek Wilk
7453c549f5 swiotlb: Export swiotlb_max_segment to users
So they can figure out what is the optimal number of pages
that can be contingously stitched together without fear of
bounce buffer.

We also expose an mechanism for sub-users of SWIOTLB API, such
as Xen-SWIOTLB to set the max segment value. And lastly
if swiotlb=force is set (which mandates we bounce buffer everything)
we set max_segment so at least we can bounce buffer one 4K page
instead of a giant 512KB one for which we may not have space.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-and-Tested-by: Juergen Gross <jgross@suse.com>
2017-01-06 13:00:01 -05:00
Chris Wilson
b42a13d918 drm/i915: Drain freed objects for mmap space exhaustion
As we now use a deferred free queue for objects, simply retiring the
active objects is not enough to immediately free them and recover their
mmap space - we must now also drain the freed object list.

Fixes: fbbd37b36f ("drm/i915: Move object release to a freelist + worker"
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152240.5793-3-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-01-06 16:54:47 +00:00
Chris Wilson
10466d2a59 drm/i915: Fix phys pwrite for struct_mutex-less operation
Since commit fe115628d5 ("drm/i915: Implement pwrite without
struct-mutex") the lowlevel pwrite calls are now called without the
protection of struct_mutex, but pwrite_phys was still asserting that it
held the struct_mutex and later tried to drop and relock it.

Fixes: fe115628d5 ("drm/i915: Implement pwrite without struct-mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152240.5793-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-01-06 16:53:44 +00:00
Chris Wilson
984ff29f74 drm/i915: Simplify testing for am-I-the-kernel-context?
The kernel context (dev_priv->kernel_context) is unique in that it is
not associated with any user filp - it is the only one with
ctx->file_priv == NULL. This is a simpler test than comparing it against
dev_priv->kernel_context which involves some pointer dancing.

In checking that this is true, we notice that the gvt context is
allocating itself a i915_hw_ppgtt it doesn't use and not flagging that
its file_priv should be invalid.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152013.24684-5-chris@chris-wilson.co.uk
2017-01-06 16:02:12 +00:00
Chris Wilson
7ec73b7e36 drm/i915: Only skip requests once a context is banned
If we skip before banning, we have an inconsistent interface between
execbuf still queueing valid request but those requests already queued
being cancelled. If we only cancel the pending requests once we stop
accepting new requests, the user interface is more consistent.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: <stable@vger.kernel.org> # v4.9+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170105170059.344-1-chris@chris-wilson.co.uk
2017-01-05 21:06:20 +00:00
Chris Wilson
40b326eefe drm/i915: Move a few utility macros into a separate header
In order to defeat some circular dependencies between headers to allow use
of e.g. range_overflows() in a header, move the simple independent macros
into their own header.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170105153023.30575-4-chris@chris-wilson.co.uk
2017-01-05 15:34:45 +00:00
Chris Wilson
b1ed35d917 drm/i915: Revoke fenced GTT mmapings across GPU reset
The fence registers are clobbered by a GPU reset. If there is concurrent
user access to a fenced region via a GTT mmaping, the access will not be
fenced during the reset (until we restore the fences afterwards). In order
to prevent invalid access during the reset, before we clobber the fences
first we must invalidate the GTT mmapings. Access to the mmap will then
be forced to fault in the page, and in handling the fault, i915_gem_fault()
will take the struct_mutex and wait upon the reset to complete.

v2: Fix up commentary.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99274
Testcase: igt/gem_mmap_gtt/hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170104145110.1486-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-01-04 18:44:30 +00:00
Daniel Vetter
a402eae64d Linux 4.10-rc2
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJYaYNlAAoJEHm+PkMAQRiGtCUH/18PMUJpHqRKjxL3Yscw+QZC
 RmGlD/hwBRLUSgiTCfURNGKP4QZv2kQW7BGsGC72oL01lmxozsU72ixUIO+wXzDY
 K2b0OOKGZZWzFtaVm7Qs+5JhHAEKZcT046mLD8sjJuqkrFAhmNLKdwHjihKBEkm9
 J3s2tpdXdN0x/Uyga/GY9khEYIrvLPeBoKSz+JXcQKdC0iq3/+PMpWnN47QCNScr
 7azojkJkj/rs2cqVdOi7Wbh6PSqIvPsl8E3qJefpaVJF/IQaU1pFdy5g8kYm4V7T
 fr6HgIbuN4EQWdN/5cgKrUdpQyV7D8iYx02klk4R8WgfS0QMYoUcsg+XsTd02TI=
 =OhGe
 -----END PGP SIGNATURE-----

Merge tag 'v4.10-rc2' into drm-intel-next-queued

Backmerge Linux 4.10-rc2 to resync with our -fixes cherry-picks. I've
done the backmerge directly because Dave is on vacation.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2017-01-04 11:35:18 +01:00
Chris Wilson
2471eb5fb6 drm/i915: Prevent timeline updates whilst performing reset
As the fence may be signaled concurrently from an interrupt on another
device, it is possible for the list of requests on the timeline to be
modified as we walk it. Take both (the context's timeline and the global
timeline) locks to prevent such modifications.

Fixes: 80b204bce8 ("drm/i915: Enable multiple timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-10-chris@chris-wilson.co.uk
(cherry picked from commit 00c25e3f40)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-01-03 11:41:57 +02:00
Chris Wilson
64d1461ce0 drm/i915: Silence allocation failure during sg_trim()
As trimming the sg table is merely an optimisation that gracefully fails
if we cannot allocate a new table, we do not need to report the failure
either.

Fixes: 0c40ce130e ("drm/i915: Trim the object sg table")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-4-chris@chris-wilson.co.uk
(cherry picked from commit 8bfc478fa4)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-01-03 11:41:48 +02:00
Chris Wilson
c3f923b554 drm/i915: Don't clflush before release phys object
When we teardown the backing storage for the phys object, we copy from
the coherent contiguous block back to the shmemfs object, clflushing as
we go. Trying to clflush the invalid sg beforehand just oops and would
be redundant (due to it already being coherent, and clflushed
afterwards).

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-3-chris@chris-wilson.co.uk
(cherry picked from commit e5facdf964)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-01-03 11:41:38 +02:00
Chris Wilson
6095868a27 drm/i915: Complete kerneldoc for struct i915_gem_context
The existing kerneldoc was outdated, so time for a refresh.

v2: Use single line kdoc, mention functions for manipulation

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161231112012.29263-3-chris@chris-wilson.co.uk
2016-12-31 11:41:47 +00:00
Chris Wilson
00c25e3f40 drm/i915: Prevent timeline updates whilst performing reset
As the fence may be signaled concurrently from an interrupt on another
device, it is possible for the list of requests on the timeline to be
modified as we walk it. Take both (the context's timeline and the global
timeline) locks to prevent such modifications.

Fixes: 80b204bce8 ("drm/i915: Enable multiple timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-10-chris@chris-wilson.co.uk
2016-12-23 16:08:10 +00:00
Chris Wilson
8bfc478fa4 drm/i915: Silence allocation failure during sg_trim()
As trimming the sg table is merely an optimisation that gracefully fails
if we cannot allocate a new table, we do not need to report the failure
either.

Fixes: 0c40ce130e ("drm/i915: Trim the object sg table")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-4-chris@chris-wilson.co.uk
2016-12-23 16:07:32 +00:00
Chris Wilson
e5facdf964 drm/i915: Don't clflush before release phys object
When we teardown the backing storage for the phys object, we copy from
the coherent contiguous block back to the shmemfs object, clflushing as
we go. Trying to clflush the invalid sg beforehand just oops and would
be redundant (due to it already being coherent, and clflushed
afterwards).

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-3-chris@chris-wilson.co.uk
2016-12-23 16:07:20 +00:00
Chris Wilson
bdeb978506 drm/i915: Repeat flush of idle work during suspend
The idle work handler is self-arming - if it detects that it needs to
run again it will queue itself from its work handler. Take greater care
when trying to drain the idle work, and double check that it is flushed.

The free worker has a similar issue where it is armed by an RCU task
which may be running concurrently with us.

This should hopefully help with the sporadic WARN_ON(dev_priv->gt.awake)
from i915_gem_suspend.

v2: Reuse drain_freed_objects.
v3: Don't try to flush the freed objects from the shrinker, as it may be
underneath the struct_mutex already.
v4: do while and comment upon the excess rcu_barrier in drain_freed_objects

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-2-chris@chris-wilson.co.uk
2016-12-23 16:07:11 +00:00
Chris Wilson
28f412e02b drm/i915: Break after walking all GGTT vma in bump_inactive_ggtt
Since commit db6c2b4151 ("drm/i915: Store the vma in an rbtree under
the object") the vma are once again sorted into GGTT first, then ppGTT
so that the typical case of walking the GGTT vma can stop as soon as we
find a non-ppGTT. Apply that optimisation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-1-chris@chris-wilson.co.uk
2016-12-23 16:06:56 +00:00
Dave Airlie
4a401ceeef Merge tag 'drm-intel-next-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes
First set of i915 fixes for code in next.

* tag 'drm-intel-next-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-intel:
  drm/i915: skip the first 4k of stolen memory on everything >= gen8
  drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping
  drm/i915: Fix use after free in logical_render_ring_init
  drm/i915: disable PSR by default on HSW/BDW
  drm/i915: Fix setting of boost freq tunable
  drm/i915: tune down the fast link training vs boot fail
  drm/i915: Reorder phys backing storage release
  drm/i915/gen9: Fix PCODE polling during SAGV disabling
  drm/i915/gen9: Fix PCODE polling during CDCLK change notification
  drm/i915/dsi: Fix chv_exec_gpio disabling the GPIOs it is setting
  drm/i915/dsi: Fix swapping of MIPI_SEQ_DEASSERT_RESET / MIPI_SEQ_ASSERT_RESET
  drm/i915/dsi: Do not clear DPOUNIT_CLOCK_GATE_DISABLE from vlv_init_display_clock_gating
  drm/i915: drop the struct_mutex when wedged or trying to reset
2016-12-23 05:28:02 +10:00
Chris Wilson
abb0deacb5 drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping
If we at first do not succeed with attempting to remap our physical
pages using a coalesced scattergather list, try again with one
scattergather entry per page. This should help with swiotlb as it uses a
limited buffer size and only searches for contiguous chunks within its
buffer aligned up to the next boundary - i.e. we may prematurely cause a
failure as we are unable to utilize the unused space between large
chunks and trigger an error such as:

	 i915 0000:00:02.0: swiotlb buffer is full (sz: 1630208 bytes)

Reported-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Fixes: 871dfbd67d ("drm/i915: Allow compaction upto SWIOTLB max segment size")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit d766ef5300)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-12-20 16:30:47 +02:00
Chris Wilson
057f803ff1 drm/i915: Reorder phys backing storage release
In commit a4f5ea64f0 ("drm/i915: Refactor object page API"), I
reordered the object->pages teardown to be more friendly wrt to a
separate obj->mm.lock. However, I overlooked the phys object and left it
with a dangling use-after-free of its phys_handle. Move the allocation
of the phys handle to get_pages and it release to put_pages to prevent
the invalid access and to improve symmetry.

v2: Add commentary about always aligning to page size.

Testcase: igt/drv_selftest/objects
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: a4f5ea64f0 ("drm/i915: Refactor object page API")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161207133411.8028-1-chris@chris-wilson.co.uk
(cherry picked from commit dbb4351bab)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-12-20 16:29:26 +02:00
Chris Wilson
c2dc6cc946 drm/i915: Add a test that we terminate the trimmed sgtable as expected
In commit 0c40ce130e ("drm/i915: Trim the object sg table"), we expect
to copy exactly orig_st->nents across and allocate the table thusly.
The copy loop should therefore end with the new_sg being NULL.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-2-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-12-20 12:31:52 +00:00
Chris Wilson
d766ef5300 drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping
If we at first do not succeed with attempting to remap our physical
pages using a coalesced scattergather list, try again with one
scattergather entry per page. This should help with swiotlb as it uses a
limited buffer size and only searches for contiguous chunks within its
buffer aligned up to the next boundary - i.e. we may prematurely cause a
failure as we are unable to utilize the unused space between large
chunks and trigger an error such as:

	 i915 0000:00:02.0: swiotlb buffer is full (sz: 1630208 bytes)

Reported-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Fixes: 871dfbd67d ("drm/i915: Allow compaction upto SWIOTLB max segment size")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-12-20 12:30:56 +00:00
Chris Wilson
e8a9c58fcd drm/i915: Unify active context tracking between legacy/execlists/guc
The requests conversion introduced a nasty bug where we could generate a
new request in the middle of constructing a request if we needed to idle
the system in order to evict space for a context. The request to idle
would be executed (and waited upon) before the current one, creating a
minor havoc in the seqno accounting, as we will consider the current
request to already be completed (prior to deferred seqno assignment) but
ring->last_retired_head would have been updated and still could allow
us to overwrite the current request before execution.

We also employed two different mechanisms to track the active context
until it was switched out. The legacy method allowed for waiting upon an
active context (it could forcibly evict any vma, including context's),
but the execlists method took a step backwards by pinning the vma for
the entire active lifespan of the context (the only way to evict was to
idle the entire GPU, not individual contexts). However, to circumvent
the tricky issue of locking (i.e. we cannot take struct_mutex at the
time of i915_gem_request_submit(), where we would want to move the
previous context onto the active tracker and unpin it), we take the
execlists approach and keep the contexts pinned until retirement.
The benefit of the execlists approach, more important for execlists than
legacy, was the reduction in work in pinning the context for each
request - as the context was kept pinned until idle, it could short
circuit the pinning for all active contexts.

We introduce new engine vfuncs to pin and unpin the context
respectively. The context is pinned at the start of the request, and
only unpinned when the following request is retired (this ensures that
the context is idle and coherent in main memory before we unpin it). We
move the engine->last_context tracking into the retirement itself
(rather than during request submission) in order to allow the submission
to be reordered or unwound without undue difficultly.

And finally an ulterior motive for unifying context handling was to
prepare for mock requests.

v2: Rename to last_retired_context, split out legacy_context tracking
for MI_SET_CONTEXT.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
2016-12-18 16:18:50 +00:00
Matthew Auld
966d5bf5eb drm/i915: convert to using range_overflows
Convert some of the obvious hand-rolled ranged overflow sanity checks to
our shiny new range_overflows macro.

Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161213203222.32564-4-matthew.auld@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-12-16 21:22:12 +00:00
Jan Kara
1a29d85eb0 mm: use vmf->address instead of of vmf->virtual_address
Every single user of vmf->virtual_address typed that entry to unsigned
long before doing anything with it so the type of virtual_address does
not really provide us any additional safety.  Just use masked
vmf->address which already has the appropriate type.

Link: http://lkml.kernel.org/r/1479460644-25076-3-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:09 -08:00
Chris Wilson
dbb4351bab drm/i915: Reorder phys backing storage release
In commit a4f5ea64f0 ("drm/i915: Refactor object page API"), I
reordered the object->pages teardown to be more friendly wrt to a
separate obj->mm.lock. However, I overlooked the phys object and left it
with a dangling use-after-free of its phys_handle. Move the allocation
of the phys handle to get_pages and it release to put_pages to prevent
the invalid access and to improve symmetry.

v2: Add commentary about always aligning to page size.

Testcase: igt/drv_selftest/objects
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: a4f5ea64f0 ("drm/i915: Refactor object page API")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161207133411.8028-1-chris@chris-wilson.co.uk
2016-12-12 12:24:36 +00:00
Jani Nikula
73f67aa8cc drm/i915: distinguish G33 and Pineview from each other
Pineview deserves to use its own platform enum (which was already added,
unused, previously). IS_G33() no longer matches Pineview, and gets
replaced by IS_G33() || IS_PINEVIEW() or equivalent. Pineview is no
longer an outlier among platform definitions.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1481143689-19672-1-git-send-email-jani.nikula@intel.com
2016-12-07 23:28:33 +02:00
Jani Nikula
c0f86832e3 drm/i915: rename BROADWATER and CRESTLINE to I965G and I965GM, respectively
Add more consistency to our naming. Pineview remains the outlier. Keep
using code names for gen5+.

v2: rebased

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1481105584-23033-1-git-send-email-jani.nikula@intel.com
2016-12-07 15:18:33 +02:00
Chris Wilson
85fd4f58d7 drm/i915: Mark all non-vma being inserted into the address spaces
We need to distinguish between full i915_vma structs and simple
drm_mm_nodes when considering eviction (i.e. we must be careful not to
treat a mere drm_mm_node as a much larger i915_vma causing memory
corruption, if we are lucky). To do this, color these not-a-vma with -1
(I915_COLOR_UNEVICTABLE).

v2...v200: New name for -1.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-1-chris@chris-wilson.co.uk
2016-12-05 20:49:17 +00:00
Chris Wilson
ce1135c7de drm/i915: Complete requests in nop_submit_request
Since the submit/execute split in commit d55ac5bf97 ("drm/i915: Defer
transfer onto execution timeline to actual hw submission") the
global seqno advance was deferred until the submit_request callback.
After wedging the GPU, we were installing a nop_submit_request handler
(to avoid waking up the dead hw) but I had missed converting this over
to the new scheme. Under the new scheme, we have to explicitly call
i915_gem_submit_request() from the submit_request handler to mark the
request as on the hardware. If we don't the request is always pending,
and any waiter will continue to wait indefinitely and hangcheck will not
be able to resolve the lockup.

References: https://bugs.freedesktop.org/show_bug.cgi?id=98748
Testcase: igt/gem_eio/in-flight
Fixes: d55ac5bf97 ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-3-chris@chris-wilson.co.uk
(cherry picked from commit 3dcf93f7f2)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-12-05 10:38:01 +02:00
Tvrtko Ursulin
cb15d9f8c3 drm/i915: More GEM init dev_priv cleanup
Simplifies the code to pass the right parameter in.

v2: Commit message. (Joonas Lahtinen)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-12-01 18:01:16 +00:00
Tvrtko Ursulin
bf9e8429ab drm/i915: Make various init functions take dev_priv
Like GEM init, GUC init, MOCS init and context creation.

Enables them to lose dev_priv locals.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-12-01 18:01:15 +00:00
Tvrtko Ursulin
12d79d7828 drm/i915: Make GEM object create and create from data take dev_priv
Makes all GEM object constructors consistent.

v2: Fix compilation in GVT code.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)
2016-12-01 18:01:08 +00:00
Tvrtko Ursulin
187685cb90 drm/i915: Make GEM object alloc/free and stolen created take dev_priv
Where it is more appropriate and also to be consistent with
the direction of the driver.

v2: Leave out object alloc/free inlining. (Joonas Lahtinen)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-12-01 18:00:15 +00:00
Chris Wilson
49d73912cb drm/i915: Convert vm->dev backpointer to vm->i915
99% of the time we access i915_address_space->dev we want the i915
device and not the drm device, so let's store the drm_i915_private
backpointer instead. The only real complication here are the inlines
in i915_vma.h where drm_i915_private is not yet defined and so we have
to choose an alternate path for our asserts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161129095008.32622-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-11-29 11:38:00 +00:00
Chris Wilson
20e4933c47 drm/i915: Stop the machine as we install the wedged submit_request handler
In order to prevent a race between the old callback submitting an
incomplete request and i915_gem_set_wedged() installing its nop handler,
we must ensure that the swap occurs when the machine is idle
(stop_machine).

v2: move context lost from out of BKL.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-4-chris@chris-wilson.co.uk
2016-11-22 17:42:19 +00:00
Chris Wilson
3dcf93f7f2 drm/i915: Complete requests in nop_submit_request
Since the submit/execute split in commit d55ac5bf97 ("drm/i915: Defer
transfer onto execution timeline to actual hw submission") the
global seqno advance was deferred until the submit_request callback.
After wedging the GPU, we were installing a nop_submit_request handler
(to avoid waking up the dead hw) but I had missed converting this over
to the new scheme. Under the new scheme, we have to explicitly call
i915_gem_submit_request() from the submit_request handler to mark the
request as on the hardware. If we don't the request is always pending,
and any waiter will continue to wait indefinitely and hangcheck will not
be able to resolve the lockup.

References: https://bugs.freedesktop.org/show_bug.cgi?id=98748
Testcase: igt/gem_eio/in-flight
Fixes: d55ac5bf97 ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-3-chris@chris-wilson.co.uk
2016-11-22 17:42:18 +00:00
Chris Wilson
d9e9da64c4 drm/i915: Don't deref context->file_priv ERR_PTR upon reset
When a user context is closed, it's file_priv backpointer is replaced by
ERR_PTR(-EBADF); be careful not to chase this invalid pointer after a
hang and a GPU reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: b083a0870c ("drm/i915: Add per client max context ban limit")
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-1-chris@chris-wilson.co.uk
2016-11-22 17:42:17 +00:00
Mika Kuoppala
bc1d53c647 drm/i915: Wipe hang stats as an embedded struct
Bannable property, banned status, guilty and active counts are
properties of i915_gem_context. Make them so.

v2: rebase

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1479309634-28574-1-git-send-email-mika.kuoppala@intel.com
2016-11-21 14:36:56 +02:00
Mika Kuoppala
b083a0870c drm/i915: Add per client max context ban limit
If we have a bad client submitting unfavourably across different
contexts, creating new ones, the per context scoring of badness
doesn't remove the root cause, the offending client.
To counter, keep track of per client context bans. Deny access if
client is responsible for more than 3 context bans in
it's lifetime.

v2: move ban check to context create ioctl (Chris)
v3: add commentary about hangs needed to reach client ban (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
2016-11-21 14:36:40 +02:00
Mika Kuoppala
841021713a drm/i915: Add bannable context parameter
Now when driver has per context scoring of 'hanging badness'
and also subsequent hangs during short windows are allowed,
if there is progress made in between, it does not make sense
to expose a ban timing window as a context parameter anymore.

Let the scoring be the sole indicator for ban policy and substitute
ban period context parameter as a boolean to get/set context
bannable property.

v2: allow non root to opt into being banned (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
2016-11-21 14:36:40 +02:00
Mika Kuoppala
e5e1fc47ea drm/i915: Use request retirement as context progress
As hangcheck score was removed, the active decay of score
was removed also. This removed feature for hangcheck to detect
if the gpu client was accidentally or maliciously causing intermittent
hangs. Reinstate the scoring as a per context property, so that if
one context starts to act unfavourably, ban it.

v2: ban_period_secs as a gate to score check (Chris)
v3: decay in proper spot. scores as tunables (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
2016-11-21 14:36:40 +02:00
Mika Kuoppala
3fe3b030bd drm/i915: Decouple hang detection from hangcheck period
Hangcheck state accumulation has gained more steps
along the years, like head movement and more recently the
subunit inactivity check. As the subunit sampling is only
done if the previous state check showed inactivity, we
have added more stages (and time) to reach a hang verdict.

Asymmetric engine states led to different actual weight of
'one hangcheck unit' and it was demonstrated in some
hangs that due to difference in stages, simpler engines
were accused falsely of a hang as their scoring was much
more quicker to accumulate above the hang treshold.

To completely decouple the hangcheck guilty score
from the hangcheck period, convert hangcheck score to a
rough period of inactivity measurement. As these are
tracked as jiffies, they are meaningful also across
reset boundaries. This makes finding a guilty engine
more accurate across multi engine activity scenarios,
especially across asymmetric engines.

We lose the ability to detect cross batch malicious attempts
to hinder the progress. Plan is to move this functionality
to be part of context banning which is more natural fit,
later in the series.

v2: use time_before macros (Chris)
    reinstate the pardoning of moving engine after hc (Chris)
v3: avoid global state for per engine stall detection (Chris)
v4: take timeline last retirement into account (Chris)
v5: do debug print on pardoning, split out retirement timestamp (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
2016-11-21 14:36:40 +02:00
Chris Wilson
05c348377d drm/i915: Skip final clflush if LLC is coherent
If the LLC is coherent with the object, we do not need to worry about
whether main memory and cache mismatch when we hand the object back to
the system.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161118211747.25197-2-chris@chris-wilson.co.uk
2016-11-18 22:33:49 +00:00
Chris Wilson
a6a7cc4b7d drm/i915: Always flush the dirty CPU cache when pinning the scanout
Currently we only clflush the scanout if it is in the CPU domain. Also
flush if we have a pending CPU clflush. We also want to treat the
dirtyfb path similar, and flush any pending writes there as well.

v2: Only send the fb flush message if flushing the dirt on flip
v3: Make flush-for-flip and dirtyfb look more alike since they serve
similar roles as end-of-frame marker.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> #v2
Link: http://patchwork.freedesktop.org/patch/msgid/20161118211747.25197-1-chris@chris-wilson.co.uk
2016-11-18 22:33:22 +00:00
Chris Wilson
b17993b7b2 drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error
On the DMA mapping error path, sg may be NULL (it has already been
marked as the last scatterlist entry), and we should avoid dereferencing
it again.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e227330223 ("drm/i915: avoid leaking DMA mappings")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20161114112930.2033-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2016-11-18 20:51:53 +00:00
Chris Wilson
5b8c8aec8e drm/i915: Move frontbuffer CS write tracking from ggtt vma to object
I tried to avoid having to track the write for every VMA by only
tracking writes to the ggtt. However, for the purposes of frontbuffer
tracking this is insufficient as we need to invalidate around writes not
just to the the ggtt but all aliased ppgtt views of the framebuffer. By
moving the critical section to the object and only doing so for
framebuffer writes we can reduce the tracking even further by only
watching framebuffers and not vma.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161116190704.5293-1-chris@chris-wilson.co.uk
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-11-18 11:15:59 +00:00
Matthew Auld
ea84aa776f drm/i915: don't leak global_timeline
We need to clean up the global_timeline in i915_gem_load_cleanup.

v2: don't forget about the struct_mutex, and also WARN_ON if we have any
remaining timelines before purging the global_timeline.

v3: it might be a good idea to first remove the global_timeline...duh!

Fixes: 73cb97010d ("drm/i915: Combine seqno + tracking into a global timeline struct")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1479415087-13216-1-git-send-email-matthew.auld@intel.com
Link: http://patchwork.freedesktop.org/patch/msgid/20161117210411.14044-2-chris@chris-wilson.co.uk
2016-11-17 21:05:37 +00:00
Chris Wilson
c4c29d7b59 drm/i915: Demote i915_gem_open() debugging from DRIVER to USER
We use DRM_DEBUG() when reporting on user actions, to try and keep
intentional errors out of the CI dmesg. Demote the debug from
i915_gem_open() similarly so that it is only apparent with drm.debug & 1
like its brethren.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161109104507.21228-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-11-17 14:23:20 +00:00
Tvrtko Ursulin
275a991c03 drm/i915: dev_priv cleanup in i915_gem_gtt.c
Started with removing INTEL_INFO(dev) and cascaded into a quite
big trickle of function prototype changes. Still, I think it is
for the better.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-11-17 13:56:12 +00:00
Tvrtko Ursulin
4362f4f6dd drm/i915: Use dev_priv in INTEL_INFO in i915_gem_fence_reg.c
Plus a small cascade of function prototype changes.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-11-17 13:56:06 +00:00
Tvrtko Ursulin
c6be607abc drm/i915: dev_priv and a small cascade of cleanups in i915_gem.c
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-11-17 13:55:26 +00:00
Chris Wilson
6b5e90f58c drm/i915/scheduler: Boost priorities for flips
Boost the priority of any rendering required to show the next pageflip
as we want to avoid missing the vblank by being delayed by invisible
workload. We prioritise avoiding jank and jitter in the GUI over
starving background tasks.

v2: Descend dma_fence_array when boosting priorities.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-10-chris@chris-wilson.co.uk
2016-11-14 21:01:22 +00:00
Chris Wilson
20311bd350 drm/i915/scheduler: Execute requests in order of priorities
Track the priority of each request and use it to determine the order in
which we submit requests to the hardware via execlists.

The priority of the request is determined by the user (eventually via
the context) but may be overridden at any time by the driver. When we set
the priority of the request, we bump the priority of all of its
dependencies to match - so that a high priority drawing operation is not
stuck behind a background task.

When the request is ready to execute (i.e. we have signaled the submit
fence following completion of all its dependencies, including third
party fences), we put the request into a priority sorted rbtree to be
submitted to the hardware. If the request is higher priority than all
pending requests, it will be submitted on the next context-switch
interrupt as soon as the hardware has completed the current request. We
do not currently preempt any current execution to immediately run a very
high priority request, at least not yet.

One more limitation, is that this is first implementation is for
execlists only so currently limited to gen8/gen9.

v2: Replace recursive priority inheritance bumping with an iterative
depth-first search list.
v3: list_next_entry() for walking lists
v4: Explain how the dfs solves the recursion problem with PI.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-8-chris@chris-wilson.co.uk
2016-11-14 21:01:21 +00:00
Chris Wilson
52e5420907 drm/i915/scheduler: Record all dependencies upon request construction
The scheduler needs to know the dependencies of each request for the
lifetime of the request, as it may choose to reschedule the requests at
any time and must ensure the dependency tree is not broken. This is in
additional to using the fence to only allow execution after all
dependencies have been completed.

One option was to extend the fence to support the bidirectional
dependency tracking required by the scheduler. However the mismatch in
lifetimes between the submit fence and the request essentially meant
that we had to build a completely separate struct (and we could not
simply reuse the existing waitqueue in the fence for one half of the
dependency tracking). The extra dependency tracking simply did not mesh
well with the fence, and keeping it separate both keeps the fence
implementation simpler and allows us to extend the dependency tracking
into a priority tree (whilst maintaining support for reordering the
tree).

To avoid the additional allocations and list manipulations, the use of
the priotree is disabled when there are no schedulers to use it.

v2: Create a dedicated slab for i915_dependency.
    Rename the lists.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-7-chris@chris-wilson.co.uk
2016-11-14 21:00:28 +00:00
Chris Wilson
663f71e73f drm/i915: Remove engine->execlist_lock
The execlist_lock is now completely subsumed by the engine->timeline->lock,
and so we can remove the redundant layer of locking.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-5-chris@chris-wilson.co.uk
2016-11-14 21:00:25 +00:00
Chris Wilson
bb89485e99 drm/i915: Create distinct lockclasses for execution vs user timelines
In order to simplify the lockdep annotation, as they become more complex
in the future with deferred execution and multiple paths through the
same functions, create a separate lockclass for the user timeline and
the hardware execution timeline.

We should only ever be locking the user timeline and the execution
timeline in parallel so we only need to create two lock classes, rather
than a separate class for every timeline.

v2: Rename the lock classes to be more consistent with other lockdep.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-2-chris@chris-wilson.co.uk
2016-11-14 21:00:21 +00:00
Chris Wilson
2b3c83176e drm/i915: Stop skipping the final clflush back to system pages
When we release the shmem backing storage, we make sure that the pages
are coherent with the cpu cache. However, our clflush routine was
skipping the flush as the object had no pages at release time. Fix this by
explicitly flushing the sg_table we are decoupling.

Fixes: 03ac84f183 ("drm/i915: Pass around sg_table to get_pages/put_pages backend")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161111145809.9701-2-chris@chris-wilson.co.uk
2016-11-11 16:21:52 +00:00
Chris Wilson
9caa34aa93 drm/i915: Only wait upon the execution timeline when unlocked
In order to walk the list of all timelines, we currently require the
struct_mutex. We are sometimes called prior to the struct_mutex being
taken by the caller (i.e !I915_WAIT_LOCKED) in which case we can only
trust the global execution timelines (as these are owned by the device).
This means in the unlocked phase we can only wait upon the currently
executing requests and not all queued.

[  175.743243] general protection fault: 0000 [#1] SMP
[  175.743263] Modules linked in: nls_iso8859_1 intel_rapl x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iwlwifi aesni_intel aes_x86_64 lrw snd_soc_rt5640 gf128mul snd_soc_rl6231 snd_soc_core glue_helper snd_compress snd_pcm_dmaengine snd_hda_codec_hdmi ablk_helper snd_hda_codec_realtek cryptd snd_hda_codec_generic serio_raw cfg80211 snd_hda_intel snd_hda_codec ir_lirc_codec snd_hda_core lirc_dev snd_hwdep snd_pcm lpc_ich mei_me mei snd_seq_midi shpchp snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer rc_rc6_mce acpi_als nuvoton_cir kfifo_buf rc_core snd industrialio snd_soc_sst_acpi soundcore snd_soc_sst_match i2c_designware_platform 8250_dw i2c_designware_core dw_dmac spi_pxa2xx_platform mac_hid acpi_pad parport_pc ppdev lp parport
[  175.743509]  autofs4 i915 e1000e psmouse ptp pps_core xhci_pci ehci_pci ahci xhci_hcd ehci_hcd libahci video sdhci_acpi sdhci i2c_hid hid
[  175.743560] CPU: 2 PID: 2386 Comm: wtdg_monitor.sh Tainted: G     U          4.9.0-rc4-nightly+ #2
[  175.743581] Hardware name:                  /NUC5i7RYB, BIOS RYBDWi35.86A.0358.2016.0606.1423 06/06/2016
[  175.743603] task: ffff88024509ba80 task.stack: ffffc9007bd18000
[  175.743618] RIP: 0010:[<ffffffffa01af29b>]  [<ffffffffa01af29b>] i915_gem_wait_for_idle+0x3b/0x140 [i915]
[  175.743660] RSP: 0000:ffffc9007bd1b9b8  EFLAGS: 00010297
[  175.743674] RAX: ffff88024489d248 RBX: 0000000000000000 RCX: 0000000000000000
[  175.743691] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880244898000
[  175.743708] RBP: ffffc9007bd1b9f0 R08: 0000000000000000 R09: 0000000000000001
[  175.743724] R10: 00000028eaf42792 R11: 0000000000000001 R12: dead000000000100
[  175.743741] R13: dead000000000148 R14: ffffc9007bd1ba5f R15: 0000000000000005
[  175.743758] FS:  00007f2638330700(0000) GS:ffff880256d00000(0000) knlGS:0000000000000000
[  175.743777] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  175.743791] CR2: 00007f885c8cea40 CR3: 00000002416b5000 CR4: 00000000003406e0
[  175.743808] Stack:
[  175.743816]  ffff88024489d248 000000004509ba80 ffff880244898000 ffff88024509ba80
[  175.743840]  00000000ffff8b69 ffffc9007bd1ba5f ffffc9007bd1ba5e ffffc9007bd1ba28
[  175.743863]  ffffffffa01b661d 00000000ffffffff 0000000000000000 ffff880244898000
[  175.743886] Call Trace:
[  175.743906]  [<ffffffffa01b661d>] i915_gem_shrinker_lock_uninterruptible.constprop.5+0x5d/0xc0 [i915]
[  175.743937]  [<ffffffffa01b6cd0>] i915_gem_shrinker_oom+0x30/0x1b0 [i915]
[  175.743955]  [<ffffffff8109ca79>] notifier_call_chain+0x49/0x70
[  175.743971]  [<ffffffff8109cd9d>] __blocking_notifier_call_chain+0x4d/0x70
[  175.743988]  [<ffffffff8109cdd6>] blocking_notifier_call_chain+0x16/0x20
[  175.744005]  [<ffffffff811885dc>] out_of_memory+0x22c/0x480
[  175.744020]  [<ffffffff81205542>] __alloc_pages_slowpath+0x851/0x8ec
[  175.744037]  [<ffffffff8118ca51>] __alloc_pages_nodemask+0x2c1/0x310
[  175.744054]  [<ffffffff811d8ea8>] alloc_pages_current+0x88/0x120
[  175.744070]  [<ffffffff811833a4>] __page_cache_alloc+0xb4/0xc0
[  175.744086]  [<ffffffff811865ca>] filemap_fault+0x29a/0x500
[  175.744101]  [<ffffffff81299aa6>] ext4_filemap_fault+0x36/0x50
[  175.744117]  [<ffffffff811b3d4a>] __do_fault+0x6a/0xe0
[  175.744131]  [<ffffffff811b97ee>] handle_mm_fault+0xd0e/0x1330
[  175.744147]  [<ffffffff8106738c>] __do_page_fault+0x23c/0x4d0
[  175.744162]  [<ffffffff81067650>] do_page_fault+0x30/0x80
[  175.744177]  [<ffffffff817ffbe8>] page_fault+0x28/0x30
[  175.744191] Code: 41 57 41 56 41 55 41 54 53 48 83 ec 10 4c 8b a7 48 52 00 00 89 75 d4 48 89 45 c8 49 39 c4 74 78 4d 8d 6c 24 48 41 bf 05 00 00 00 <49> 8b 5d 00 48 85 db 74 50 8b 83 20 01 00 00 85 c0 74 15 48 8b
[  175.744320] RIP  [<ffffffffa01af29b>] i915_gem_wait_for_idle+0x3b/0x140 [i915]
[  175.744351]  RSP <ffffc9007bd1b9b8>

Fixes: 80b204bce8 ("drm/i915: Enable multiple timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161111145809.9701-1-chris@chris-wilson.co.uk
2016-11-11 16:21:52 +00:00
Tvrtko Ursulin
0031fb9685 drm/i915: Assorted dev_priv cleanups
A small selection of macros which can only accept dev_priv from
now on and a resulting trickle of fixups.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
2016-11-11 14:58:26 +00:00
Joonas Lahtinen
b42fe9ca0a drm/i915: Split out i915_vma.c
As a side product, had to split two other files;
- i915_gem_fence_reg.h
- i915_gem_object.h (only parts that needed immediate untanglement)

I tried to move code in as big chunks as possible, to make review
easier. i915_vma_compare was moved to a header temporarily.

v2:
- Use i915_gem_fence_reg.{c,h}

v3:
- Rebased

v4:
- Fix building when DEBUG_GEM is enabled by reordering a bit.

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1478861034-30643-1-git-send-email-joonas.lahtinen@linux.intel.com
2016-11-11 14:34:54 +02:00
Tvrtko Ursulin
0c40ce130e drm/i915: Trim the object sg table
At the moment we allocate enough sg table entries assuming we
will not be able to do any coalescing. But since in practice
we most often can, and more so very effectively, this ends up
wasting a lot of memory.

A simple and effective way of trimming the over-allocated
entries is to copy the table over to a new one allocated to the
exact size.

Experiments on my freshly logged and idle desktop (KDE) showed
that by doing this we can save approximately 1 MiB of RAM, or
when running a typical benchmark like gl_manhattan I have
even seen a 6 MiB saving.

More complicated techniques such as only copying the last used
page and freeing the rest are left to the reader.

v2:
 * Update commit message.
 * Use temporary sg_table on stack. (Chris Wilson)

v3:
 * Commit message update.
 * Comment added.
 * Replace memcpy with copy assignment.
   (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478704423-7447-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-11-10 09:29:25 +00:00
Chris Wilson
d0da48cf92 drm/i915: Remove chipset flush after cache flush
We always flush the chipset prior to executing with the GPU, so we can
skip the flush during ordinary domain management.

This should help mitigate some of the potential performance regressions,
but likely trivial, from doing the flush unconditionally before execbuf
introduced in commit dcd79934b0 ("drm/i915: Unconditionally flush any
chipset buffers before execbuf")

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161106130001.9509-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-11-08 11:04:04 +00:00
Imre Deak
31ab49abde drm/i915: Add assert for no pending GPU requests during suspend/resume in LR mode
During resume we will reset the SW/HW tracking for each ring head/tail
pointers and so are not prepared to replay any pending requests (as
opposed to GPU reset time). Add an assert for this both to the suspend
and the resume code.

v2:
- Check for ELSP port idle already during suspend and check !gt.awake
  during resume. (Chris)
v3:
- Move the !gt.awake check to i915_gem_resume().
v4:
- s/intel_lr_engines_idle/intel_execlists_idle/ (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-4-git-send-email-imre.deak@intel.com
2016-11-07 14:48:05 +02:00
Imre Deak
0cb5670baa drm/i915: Make sure engines are idle during GPU idling in LR mode
We assume that the GPU is idle once receiving the seqno via the last
request's user interrupt. In execlist mode the corresponding context
completed interrupt can be delayed though and until this latter
interrupt arrives we consider the request to be pending on the ELSP
submit port. This can cause a problem during system suspend where this
last request will be seen by the resume code as still pending. Such
pending requests are normally replayed after a GPU reset, but during
resume we reset both SW and HW tracking of the ring head/tail pointers,
so replaying the pending request with its stale tail pointer will leave
the ring in an inconsistent state. A subsequent request submission can
lead then to the GPU executing from uninitialized area in the ring
behind the above stale tail pointer.

Fix this by making sure any pending request on the ELSP port is
completed before suspending. I used a polling wait since the completion
time I measured was <1ms and since normally we only need to wait during
system suspend. GPU idling during runtime suspend is scheduled with a
delay (currently 50-100ms) after the retirement of the last request at
which point the context completed interrupt must have arrived already.

The chance of this bug was increased by

commit 1c777c5d1d
Author: Imre Deak <imre.deak@intel.com>
Date:   Wed Oct 12 17:46:37 2016 +0300

    drm/i915/hsw: Fix GPU hang during resume from S3-devices state

but it could happen even without the explicit GPU reset, since we
disable interrupts afterwards during the suspend sequence.

v2:
- Do an unlocked poll-wait first. (Chris)
v3-4:
- s/intel_lr_engines_idle/intel_execlists_idle/ and move
  i915.enable_execlists check to the new helper. (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98470
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-3-git-send-email-imre.deak@intel.com
2016-11-07 14:48:05 +02:00
Imre Deak
93c97dc17f drm/i915: Avoid early GPU idling due to race with new request
There is a small race where a new request can be submitted and retired
after the idle worker started to run which leads to idling the GPU too
early. Fix this by deferring the idling to the pending instance of the
worker.

This scenario was pointed out by Chris.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-2-git-send-email-imre.deak@intel.com
2016-11-07 14:48:04 +02:00
Chris Wilson
767a222e47 drm/i915: Limit Valleyview and earlier to only using mappable scanout
Valleyview appears to be limited to only scanning out from the first 512MiB
of the Global GTT. Lets presume that this behaviour was inherited from the
display block copied from g4x (not Ironlake) and all earlier generations
are similarly affected, though testing suggests different symptoms. For
simplicity, impose that these platforms must scanout from the mappable
region. (For extra simplicity, use HAS_GMCH_DISPLAY even though this
catches Cherryview which does not appear to be limited to the low
aperture for its scanout.)

v2: Use HAS_GMCH_DISPLAY() to more clearly convey my intent about
limiting this workaround to the old style of display engine.

v3: Update changelog to reflect testing by Ville Syrjälä
v4: Include the changes to the comments as well

Reported-by: Luis Botello <luis.botello.ortega@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98036
Fixes: 2efb813d53 ("drm/i915: Fallback to using unmappable memory for scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.9-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20161107110128.28762-1-chris@chris-wilson.co.uk
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com
2016-11-07 12:25:50 +00:00
Chris Wilson
0ef723cbce drm/i915: Round tile chunks up for constructing partial VMAs
When we split a large object up into chunks for GTT faulting (because we
can't fit the whole object into the aperture) we have to align our cuts
with the fence registers. Each partial VMA must cover a complete set of
tile rows or the offset into each partial VMA is not aligned with the
whole image. Currently we enforce a minimum size on each partial VMA,
but this minimum size itself was not aligned to the tile row causing
distortion.

Reported-by: Andreas Reis <andreas.reis@gmail.com>
Reported-by: Chris Clayton <chris2553@googlemail.com>
Reported-by: Norbert Preining <preining@logic.at>
Tested-by: Norbert Preining <preining@logic.at>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Fixes: 03af84fe7f ("drm/i915: Choose partial chunksize based on tile row size")
Fixes: a61007a83a ("drm/i915: Fix partial GGTT faulting") # enabling patch
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98402
Testcase: igt/gem_mmap_gtt/medium-copy-odd
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.9-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20161107105443.27855-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-11-07 11:39:59 +00:00
Chris Wilson
2c3a3f44dc drm/i915: Fix pages pin counting around swizzle quirk
commit bc0629a767 ("drm/i915: Track pages pinned due to swizzling
quirk") fixed one problem, but revealed a whole lot more. The root cause
of the pin count mismatch for the swizzle quirk (for L-shaped memory on
gen3/4) was that we were incrementing the pages_pin_count upon getting
the backing pages but then overwriting the pages_pin_count to set it to
1 afterwards. With a little bit of adjustment to satisfy the GEM_BUG_ON
sanitychecks, the fix is to replace the explicit atomic_set with an
atomic_inc.

v2: Consistently use atomics (not mix atomics and helpers) within the
lowlevel get_pages routines. This makes the atomic operations much
clearer.

Fixes: 1233e2db19 ("drm/i915: Move object backing storage manipulation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161104103001.27643-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-11-04 11:55:39 +00:00
Tvrtko Ursulin
a933568eb6 drm/i915: Tidy slab cache allocations
We can use the preferred KMEM_CACHE helper for brevity.

Also simplifiy error unwind by only setting the ENOMEM
error code once.

v2: Add forgotten changes. (Joonas Lahtinen)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1478099699-28652-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-11-03 13:52:57 +00:00
Joonas Lahtinen
56cea32382 drm/i915: Unify global_list into global_link
$ sed -i -r 's/\bglobal_list\b/global_link/g' *.c *.h

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478081764-8058-1-git-send-email-joonas.lahtinen@linux.intel.com
2016-11-02 15:17:13 +02:00
Tvrtko Ursulin
3599a91cc8 drm/i915: Allow shrinking of userptr objects once again
Commit 1bec9b0bda ("drm/i915/shrinker: Only shmemfs objects
are backed by swap") stopped considering the userptr objects
in shrinker callbacks.

Restore that so idle userptr objects can be discarded in order
to free up memory.

One change further to what was introduced in 1bec9b0bda is
to start considering userptr objects in oom but that should
also be a correct thing to do.

v2: Introduce I915_GEM_OBJECT_IS_SHRINKABLE. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 1bec9b0bda ("drm/i915/shrinker: Only shmemfs objects are backed by swap")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://patchwork.freedesktop.org/patch/msgid/1478011450-6634-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-11-01 16:35:26 +00:00
Ville Syrjälä
a26e523921 drm/i915: Pass dev_priv to IS_BROADWATER/IS_CRESTLINE
Unify our approach to things by passing around dev_priv instead of dev.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477946245-14134-21-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-11-01 16:40:38 +02:00
Chris Wilson
548625ee8f drm/i915: Improve lockdep tracking for obj->mm.lock
The shrinker may appear to recurse into obj->mm.lock as the shrinker may
be called from a direct reclaim path whilst handling get_pages. We
filter out recursing on the same obj->mm.lock by inspecting
obj->mm.pages, but we do want to take the lock on a second object in
order to reap their pages. lockdep spots the recursion on the same
lockclass and needs annotation to avoid a false positive. To keep the
two paths distinct, create an enum to indicate which subclass of
obj->mm.lock we are using. This removes the false positive and avoids
masking real bugs.

Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101121134.27504-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-11-01 13:01:44 +00:00
Chris Wilson
db6c2b4151 drm/i915: Store the vma in an rbtree under the object
With full-ppgtt one of the main bottlenecks is the lookup of the VMA
underneath the object. For execbuf there is merit in having a very fast
direct lookup of ctx:handle to the vma using a hashtree, but that still
leaves a large number of other lookups. One way to speed up the lookup
would be to use a rhashtable, but that requires extra allocations and
may exhibit poor worse case behaviour. An alternative is to use an
embedded rbtree, i.e. no extra allocations and deterministic behaviour,
but at the slight cost of O(lgN) lookups (instead of O(1) for
rhashtable). The major of such tree will be very shallow and so not much
slower, and still scales much, much better than the current unsorted
list.

v2: Bump vma_compare() to return a long, as we return the result of
comparing two pointers.

References: https://bugs.freedesktop.org/show_bug.cgi?id=87726
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101115400.15647-1-chris@chris-wilson.co.uk
2016-11-01 13:00:40 +00:00
Chris Wilson
bc0629a767 drm/i915: Track pages pinned due to swizzling quirk
If we have a tiled object and an unknown CPU swizzle pattern, we pin the
pages to prevent the object from being swapped out (and us corrupting
the contents as we do not know the access pattern and so cannot convert
it to linear and back to tiled on reuse). This requires us to remember
to drop the extra pinning when freeing the object, or else we trigger
warnings about the pin leak. In commit fbbd37b36f ("drm/i915: Move
object release to a freelist + worker"), the object free path was
deferred to a worker, but the unpinning of the quirk, along with marking
the object as reclaimable, was left on the immediate path (so that if
required we could reclaim the pages under memory pressure as early as
possible). However, this split introduced a bug where the pages were no
longer being unpinned if they were marked as unneeded.

[  231.800401] WARNING: CPU: 1 PID: 90 at drivers/gpu/drm/i915/i915_gem.c:4275 __i915_gem_free_objects+0x326/0x3c0 [i915]
[  231.800403] WARN_ON(i915_gem_object_has_pinned_pages(obj))
[  231.800405] Modules linked in:
[  231.800406]  snd_hda_intel i915 snd_hda_codec_generic mei_me snd_hda_codec coretemp snd_hwdep mei lpc_ich snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: i915]
[  231.800426] CPU: 1 PID: 90 Comm: kworker/1:4 Tainted: G     U          4.9.0-rc2-CI-CI_DRM_1780+ #1
[  231.800428] Hardware name: LENOVO 7465CTO/7465CTO, BIOS 6DET44WW (2.08 ) 04/22/2009
[  231.800456] Workqueue: events __i915_gem_free_work [i915]
[  231.800459]  ffffc9000034fc80 ffffffff8142dd65 ffffc9000034fcd0 0000000000000000
[  231.800465]  ffffc9000034fcc0 ffffffff8107e4e6 000010b300000001 0000000000001000
[  231.800469]  ffff88011d3db740 ffff880130ef0000 0000000000000000 ffff880130ef5ea0
[  231.800474] Call Trace:
[  231.800479]  [<ffffffff8142dd65>] dump_stack+0x67/0x92
[  231.800484]  [<ffffffff8107e4e6>] __warn+0xc6/0xe0
[  231.800487]  [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50
[  231.800491]  [<ffffffff811d12ac>] ? kmem_cache_free+0x2dc/0x340
[  231.800520]  [<ffffffffa009ef36>] __i915_gem_free_objects+0x326/0x3c0 [i915]
[  231.800548]  [<ffffffffa009effe>] __i915_gem_free_work+0x2e/0x50 [i915]
[  231.800552]  [<ffffffff8109c27c>] process_one_work+0x1ec/0x6b0
[  231.800555]  [<ffffffff8109c1f6>] ? process_one_work+0x166/0x6b0
[  231.800558]  [<ffffffff8109c789>] worker_thread+0x49/0x490
[  231.800561]  [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0
[  231.800563]  [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0
[  231.800566]  [<ffffffff810a2aab>] kthread+0xeb/0x110
[  231.800569]  [<ffffffff810a29c0>] ? kthread_park+0x60/0x60
[  231.800573]  [<ffffffff818164a7>] ret_from_fork+0x27/0x40

Moving to a separate flag for tracking the quirked pin is overkill for
the bug (since we only have to interchange the two tests in
i915_gem_free_object) but it does reduce a complicated test on all
objects and provide a sanitycheck for uncommon code paths.

Fixes: fbbd37b36f ("drm/i915: Move object release to a freelist + worker")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101100317.11129-2-chris@chris-wilson.co.uk
2016-11-01 10:48:41 +00:00
Chris Wilson
cb399eabc4 drm/i915: Avoid accessing request->timeline outside of its lifetime
Whilst waiting on a request, we may do so without holding any locks or
any guards beyond a reference to the request. In order to avoid taking
locks within request deallocation, we drop references to its timeline
(via the context and ppgtt) upon retirement. We should avoid chasing
such pointers outside of their control, in particular we inspect the
request->timeline to see if we may restore the RPS waitboost for a
client. If we instead look at the engine->timeline, we will have similar
behaviour on both full-ppgtt and !full-ppgtt systems and reduce the
amount of reward we give towards stalling clients (i.e. only if the
client stalls and the GPU is uncontended does it reclaim its boost).
This restores behaviour back to pre-timelines, whilst fixing:

[  645.078485] BUG: KASAN: use-after-free in i915_gem_object_wait_fence+0x1ee/0x2e0 at addr ffff8802335643a0
[  645.078577] Read of size 4 by task gem_exec_schedu/28408
[  645.078638] CPU: 1 PID: 28408 Comm: gem_exec_schedu Not tainted 4.9.0-rc2+ #64
[  645.078724] Hardware name:                  /        , BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
[  645.078816]  ffff88022daef9a0 ffffffff8143d059 ffff880235402a80 ffff880233564200
[  645.078998]  ffff88022daef9c8 ffffffff81229c5c ffff88022daefa48 ffff880233564200
[  645.079172]  ffff880235402a80 ffff88022daefa38 ffffffff81229ef0 000000008110a796
[  645.079345] Call Trace:
[  645.079404]  [<ffffffff8143d059>] dump_stack+0x68/0x9f
[  645.079467]  [<ffffffff81229c5c>] kasan_object_err+0x1c/0x70
[  645.079534]  [<ffffffff81229ef0>] kasan_report_error+0x1f0/0x4b0
[  645.079601]  [<ffffffff8122a244>] kasan_report+0x34/0x40
[  645.079676]  [<ffffffff81634f5e>] ? i915_gem_object_wait_fence+0x1ee/0x2e0
[  645.079741]  [<ffffffff81229951>] __asan_load4+0x61/0x80
[  645.079807]  [<ffffffff81634f5e>] i915_gem_object_wait_fence+0x1ee/0x2e0
[  645.079876]  [<ffffffff816364bf>] i915_gem_object_wait+0x19f/0x590
[  645.079944]  [<ffffffff81636320>] ? i915_gem_object_wait_priority+0x500/0x500
[  645.080016]  [<ffffffff8110fb30>] ? debug_show_all_locks+0x1e0/0x1e0
[  645.080084]  [<ffffffff8110abdc>] ? check_chain_key+0x14c/0x210
[  645.080157]  [<ffffffff8110a796>] ? __lock_is_held+0x46/0xc0
[  645.080226]  [<ffffffff8163bc61>] ? i915_gem_set_domain_ioctl+0x141/0x690
[  645.080296]  [<ffffffff8163bcc2>] i915_gem_set_domain_ioctl+0x1a2/0x690
[  645.080366]  [<ffffffff811f8f85>] ? __might_fault+0x75/0xe0
[  645.080433]  [<ffffffff815a55f7>] drm_ioctl+0x327/0x640
[  645.080508]  [<ffffffff8163bb20>] ? i915_gem_obj_prepare_shmem_write+0x3a0/0x3a0
[  645.080603]  [<ffffffff815a52d0>] ? drm_ioctl_permit+0x120/0x120
[  645.080670]  [<ffffffff8110abdc>] ? check_chain_key+0x14c/0x210
[  645.080738]  [<ffffffff81275717>] do_vfs_ioctl+0x127/0xa20
[  645.080804]  [<ffffffff8120268c>] ? do_mmap+0x47c/0x580
[  645.080871]  [<ffffffff811da567>] ? vm_mmap_pgoff+0x117/0x140
[  645.080938]  [<ffffffff812755f0>] ? ioctl_preallocate+0x150/0x150
[  645.081011]  [<ffffffff81108c53>] ? up_write+0x23/0x50
[  645.081078]  [<ffffffff811da567>] ? vm_mmap_pgoff+0x117/0x140
[  645.081145]  [<ffffffff811da450>] ? vma_is_stack_for_current+0x90/0x90
[  645.081214]  [<ffffffff8110d853>] ? mark_held_locks+0x23/0xc0
[  645.082030]  [<ffffffff81288408>] ? __fget+0x168/0x250
[  645.082106]  [<ffffffff819ad517>] ? entry_SYSCALL_64_fastpath+0x5/0xb1
[  645.082176]  [<ffffffff81288592>] ? __fget_light+0xa2/0xc0
[  645.082242]  [<ffffffff8127604c>] SyS_ioctl+0x3c/0x70
[  645.082309]  [<ffffffff819ad52e>] entry_SYSCALL_64_fastpath+0x1c/0xb1
[  645.082374] Object at ffff880233564200, in cache kmalloc-8192 size: 8192
[  645.082431] Allocated:
[  645.082480] PID = 28408
[  645.082535]  [  645.082566] [<ffffffff8103ae66>] save_stack_trace+0x16/0x20
[  645.082623]  [  645.082656] [<ffffffff81228b06>] save_stack+0x46/0xd0
[  645.082716]  [  645.082756] [<ffffffff812292fd>] kasan_kmalloc+0xad/0xe0
[  645.082817]  [  645.082848] [<ffffffff81631752>] i915_ppgtt_create+0x52/0x220
[  645.082908]  [  645.082941] [<ffffffff8161db96>] i915_gem_create_context+0x396/0x560
[  645.083027]  [  645.083059] [<ffffffff8161f857>] i915_gem_context_create_ioctl+0x97/0xf0
[  645.083152]  [  645.083183] [<ffffffff815a55f7>] drm_ioctl+0x327/0x640
[  645.083243]  [  645.083274] [<ffffffff81275717>] do_vfs_ioctl+0x127/0xa20
[  645.083334]  [  645.083372] [<ffffffff8127604c>] SyS_ioctl+0x3c/0x70
[  645.083432]  [  645.083464] [<ffffffff819ad52e>] entry_SYSCALL_64_fastpath+0x1c/0xb1
[  645.083551] Freed:
[  645.083599] PID = 27629
[  645.083648]  [  645.083676] [<ffffffff8103ae66>] save_stack_trace+0x16/0x20
[  645.083738]  [  645.083770] [<ffffffff81228b06>] save_stack+0x46/0xd0
[  645.083830]  [  645.083862] [<ffffffff81229203>] kasan_slab_free+0x73/0xc0
[  645.083922]  [  645.083961] [<ffffffff812279c9>] kfree+0xa9/0x170
[  645.084021]  [  645.084053] [<ffffffff81629f60>] i915_ppgtt_release+0x100/0x180
[  645.084139]  [  645.084171] [<ffffffff8161d414>] i915_gem_context_free+0x1b4/0x230
[  645.084257]  [  645.084288] [<ffffffff816537b2>] intel_lr_context_unpin+0x192/0x230
[  645.084380]  [  645.084413] [<ffffffff81645250>] i915_gem_request_retire+0x620/0x630
[  645.084500]  [  645.085226] [<ffffffff816473d1>] i915_gem_retire_requests+0x181/0x280
[  645.085313]  [  645.085352] [<ffffffff816352ba>] i915_gem_retire_work_handler+0xca/0xe0
[  645.085440]  [  645.085471] [<ffffffff810c725b>] process_one_work+0x4fb/0x920
[  645.085532]  [  645.085562] [<ffffffff810c770d>] worker_thread+0x8d/0x840
[  645.085622]  [  645.085653] [<ffffffff810d21e5>] kthread+0x185/0x1b0
[  645.085718]  [  645.085750] [<ffffffff819ad7a7>] ret_from_fork+0x27/0x40
[  645.085811] Memory state around the buggy address:
[  645.085869]  ffff880233564280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  645.085956]  ffff880233564300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  645.086053] >ffff880233564380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  645.086138]                                ^
[  645.086193]  ffff880233564400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  645.086283]  ffff880233564480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

v2: Add a comment to document the hint like nature of
 intel_engine_last_submit()

Fixes: 73cb97010d ("drm/i915: Combine seqno + tracking into a global timeline struct")
Fixes: 80b204bce8 ("drm/i915: Enable multiple timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101100317.11129-1-chris@chris-wilson.co.uk
2016-11-01 10:48:40 +00:00
Chris Wilson
7d5d59e527 drm/i915: Use the full hammer when shutting down the rcu tasks
To flush all call_rcu() tasks (here from i915_gem_free_object()) we need
to call rcu_barrier() (not synchronize_rcu()). If we don't then we may
still have objects being freed as we continue to teardown the driver -
in particular, the recently released rings may race with the memory
manager shutdown resulting in sporadic:

[  142.217186] WARNING: CPU: 7 PID: 6185 at drivers/gpu/drm/drm_mm.c:932 drm_mm_takedown+0x2e/0x40
[  142.217187] Memory manager not clean during takedown.
[  142.217187] Modules linked in: i915(-) x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel lpc_ich snd_hda_codec_realtek snd_hda_codec_generic mei_me mei snd_hda_codec_hdmi snd_hda_codec snd_hwdep snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: snd_hda_intel]
[  142.217199] CPU: 7 PID: 6185 Comm: rmmod Not tainted 4.9.0-rc2-CI-Trybot_242+ #1
[  142.217199] Hardware name: LENOVO 10AGS00601/SHARKBAY, BIOS FBKT34AUS 04/24/2013
[  142.217200]  ffffc90002ecfce0 ffffffff8142dd65 ffffc90002ecfd30 0000000000000000
[  142.217202]  ffffc90002ecfd20 ffffffff8107e4e6 000003a40778c2a8 ffff880401355c48
[  142.217204]  ffff88040778c2a8 ffffffffa040f3c0 ffffffffa040f4a0 00005621fbf8b1f0
[  142.217206] Call Trace:
[  142.217209]  [<ffffffff8142dd65>] dump_stack+0x67/0x92
[  142.217211]  [<ffffffff8107e4e6>] __warn+0xc6/0xe0
[  142.217213]  [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50
[  142.217214]  [<ffffffff81559e3e>] drm_mm_takedown+0x2e/0x40
[  142.217236]  [<ffffffffa035c02a>] i915_gem_cleanup_stolen+0x1a/0x20 [i915]
[  142.217246]  [<ffffffffa034c581>] i915_ggtt_cleanup_hw+0x31/0xb0 [i915]
[  142.217253]  [<ffffffffa0310311>] i915_driver_cleanup_hw+0x31/0x40 [i915]
[  142.217260]  [<ffffffffa0312001>] i915_driver_unload+0x141/0x1a0 [i915]
[  142.217268]  [<ffffffffa031c2c4>] i915_pci_remove+0x14/0x20 [i915]
[  142.217269]  [<ffffffff8147d214>] pci_device_remove+0x34/0xb0
[  142.217271]  [<ffffffff8157b14c>] __device_release_driver+0x9c/0x150
[  142.217272]  [<ffffffff8157bcc6>] driver_detach+0xb6/0xc0
[  142.217273]  [<ffffffff8157abe3>] bus_remove_driver+0x53/0xd0
[  142.217274]  [<ffffffff8157c787>] driver_unregister+0x27/0x50
[  142.217276]  [<ffffffff8147c265>] pci_unregister_driver+0x25/0x70
[  142.217287]  [<ffffffffa03d764c>] i915_exit+0x1a/0x71 [i915]
[  142.217289]  [<ffffffff811136b3>] SyS_delete_module+0x193/0x1e0
[  142.217291]  [<ffffffff818174ae>] entry_SYSCALL_64_fastpath+0x1c/0xb1
[  142.217292] ---[ end trace 6fd164859c154772 ]---
[  142.217505] [drm:show_leaks] *ERROR* node [6b6b6b6b6b6b6b6b + 6b6b6b6b6b6b6b6b]: inserted at
                [<ffffffff81559ff3>] save_stack.isra.1+0x53/0xa0
                [<ffffffff8155a98d>] drm_mm_insert_node_in_range_generic+0x2ad/0x360
                [<ffffffffa035bf23>] i915_gem_stolen_insert_node_in_range+0x93/0xe0 [i915]
                [<ffffffffa035c855>] i915_gem_object_create_stolen+0x75/0xb0 [i915]
                [<ffffffffa036a51a>] intel_engine_create_ring+0x9a/0x140 [i915]
                [<ffffffffa036a921>] intel_init_ring_buffer+0xf1/0x440 [i915]
                [<ffffffffa036be1b>] intel_init_render_ring_buffer+0xab/0x1b0 [i915]
                [<ffffffffa0363d08>] intel_engines_init+0xc8/0x210 [i915]
                [<ffffffffa0355d7c>] i915_gem_init+0xac/0xf0 [i915]
                [<ffffffffa0311454>] i915_driver_load+0x9c4/0x1430 [i915]
                [<ffffffffa031c2f8>] i915_pci_probe+0x28/0x40 [i915]
                [<ffffffff8147d315>] pci_device_probe+0x85/0xf0
                [<ffffffff8157b7ff>] driver_probe_device+0x21f/0x430
                [<ffffffff8157baee>] __driver_attach+0xde/0xe0

In particular note that the node was being poisoned as we inspected the
list, a  clear indication that the object is being freed as we make the
assertion.

v2: Don't loop, just assert that we do all the work required as that
will be better at detecting further errors.

Fixes: fbbd37b36f ("drm/i915: Move object release to a freelist + worker")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101084843.3961-1-chris@chris-wilson.co.uk
2016-11-01 09:30:08 +00:00
Chris Wilson
80b204bce8 drm/i915: Enable multiple timelines
With the infrastructure converted over to tracking multiple timelines in
the GEM API whilst preserving the efficiency of using a single execution
timeline internally, we can now assign a separate timeline to every
context with full-ppgtt.

v2: Add a comment to indicate the xfer between timelines upon submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-35-chris@chris-wilson.co.uk
2016-10-28 20:53:57 +01:00
Chris Wilson
28176ef4cf drm/i915: Reserve space in the global seqno during request allocation
A restriction on our global seqno is that they cannot wrap, and that we
cannot use the value 0. This allows us to detect when a request has not
yet been submitted, its global seqno is still 0, and ensures that
hardware semaphores are monotonic as required by older hardware. To
meet these restrictions when we defer the assignment of the global
seqno, we must check that we have an available slot in the global seqno
space during request construction. If that test fails, we wait for all
requests to be completed and reset the hardware back to 0.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-33-chris@chris-wilson.co.uk
2016-10-28 20:53:56 +01:00
Chris Wilson
65e4760e39 drm/i915: Introduce a global_seqno for each request
Though we will have multiple timelines, we still have a single timeline
of execution. This we can use to provide an execution and retirement order
of requests. This keeps tracking execution of requests simple, and vital
for preserving a single waiter (i.e. so that we can order the waiters so
that only the earliest to wakeup need be woken). To accomplish this we
distinguish the seqno used to order requests per-context (external) and
that used internally for execution.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk
2016-10-28 20:53:53 +01:00
Chris Wilson
3033acab07 drm/i915: Queue the idling context switch after all other timelines
Before suspend, we wait for the switch to the kernel context. In order
for all the other context images to be complete upon suspend, that
switch must be the last operation by the GPU (i.e. this idling request
must not overtake any pending requests). To make this request execute last,
we make it depend on every other inflight request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-24-chris@chris-wilson.co.uk
2016-10-28 20:53:52 +01:00
Chris Wilson
73cb97010d drm/i915: Combine seqno + tracking into a global timeline struct
Our timelines are more than just a seqno. They also provide an ordered
list of requests to be executed. Due to the restriction of handling
individual address spaces, we are limited to a timeline per address
space but we use a fence context per engine within.

Our first step to introducing independent timelines per context (i.e. to
allow each context to have a queue of requests to execute that have a
defined set of dependencies on other requests) is to provide a timeline
abstraction for the global execution queue.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-23-chris@chris-wilson.co.uk
2016-10-28 20:53:51 +01:00
Chris Wilson
d07f0e59b2 drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)

v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.

Caveats:

 * busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.

 * non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"

 * dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).

 * loss of object-level retirement callbacks, emulated by VMA retirement
tracking.

 * minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 20:53:50 +01:00
Chris Wilson
f0cd518206 drm/i915: Use lockless object free
Having moved the locked phase of freeing an object to a separate worker,
we can now declare to the core that we only need the unlocked variant of
driver->gem_free_object, and can use the simple unreference internally.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-20-chris@chris-wilson.co.uk
2016-10-28 20:53:50 +01:00
Chris Wilson
fbbd37b36f drm/i915: Move object release to a freelist + worker
We want to hide the latency of releasing objects and their backing
storage from the submission, so we move the actual free to a worker.
This allows us to switch to struct_mutex freeing of the object in the
next patch.

Furthermore, if we know that the object we are dereferencing remains valid
for the duration of our access, we can forgo the usual synchronisation
barriers and atomic reference counting. To ensure this we defer freeing
an object til after an RCU grace period, such that any lookup of the
object within an RCU read critical section will remain valid until
after we exit that critical section. We also employ this delay for
rate-limiting the serialisation on reallocation - we have to slow down
object creation in order to prevent resource starvation (in particular,
files).

v2: Return early in i915_gem_tiling() ioctl to skip over superfluous
work on error.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-19-chris@chris-wilson.co.uk
2016-10-28 20:53:49 +01:00
Chris Wilson
40e62d5d6b drm/i915: Acquire the backing storage outside of struct_mutex in set-domain
As we can locklessly (well struct_mutex-lessly) acquire the backing
storage, do so in set-domain-ioctl to reduce the contention on the
struct_mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-18-chris@chris-wilson.co.uk
2016-10-28 20:53:49 +01:00
Chris Wilson
fe115628d5 drm/i915: Implement pwrite without struct-mutex
We only need struct_mutex within pwrite for a brief window where we need
to serialise with rendering and control our cache domains. Elsewhere we
can rely on the backing storage being pinned, and forgive userspace any
races against us.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-17-chris@chris-wilson.co.uk
2016-10-28 20:53:48 +01:00
Chris Wilson
bb6dc8d96b drm/i915: Implement pread without struct-mutex
We only need struct_mutex within pread for a brief window where we need
to serialise with rendering and control our cache domains. Elsewhere we
can rely on the backing storage being pinned, and forgive userspace any
races against us.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-16-chris@chris-wilson.co.uk
2016-10-28 20:53:48 +01:00
Chris Wilson
1233e2db19 drm/i915: Move object backing storage manipulation to its own locking
Break the allocation of the backing storage away from struct_mutex into
a per-object lock. This allows parallel page allocation, provided we can
do so outside of struct_mutex (i.e. set-domain-ioctl, pwrite, GTT
fault), i.e. before execbuf! The increased cost of the atomic counters
are hidden behind i915_vma_pin() for the typical case of execbuf, i.e.
as the object is typically bound between execbufs, the page_pin_count is
static. The cost will be felt around set-domain and pwrite, but offset
by the improvement from reduced struct_mutex contention.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-14-chris@chris-wilson.co.uk
2016-10-28 20:53:47 +01:00
Chris Wilson
03ac84f183 drm/i915: Pass around sg_table to get_pages/put_pages backend
The plan is to move obj->pages out from under the struct_mutex into its
own per-object lock. We need to prune any assumption of the struct_mutex
from the get_pages/put_pages backends, and to make it easier we pass
around the sg_table to operate on rather than indirectly via the obj.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-13-chris@chris-wilson.co.uk
2016-10-28 20:53:47 +01:00
Chris Wilson
a4f5ea64f0 drm/i915: Refactor object page API
The plan is to make obtaining the backing storage for the object avoid
struct_mutex (i.e. use its own locking). The first step is to update the
API so that normal users only call pin/unpin whilst working on the
backing storage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-12-chris@chris-wilson.co.uk
2016-10-28 20:53:46 +01:00
Chris Wilson
96d7763452 drm/i915: Use a radixtree for random access to the object's backing storage
A while ago we switched from a contiguous array of pages into an sglist,
for that was both more convenient for mapping to hardware and avoided
the requirement for a vmalloc array of pages on every object. However,
certain GEM API calls (like pwrite, pread as well as performing
relocations) do desire access to individual struct pages. A quick hack
was to introduce a cache of the last access such that finding the
following page was quick - this works so long as the caller desired
sequential access. Walking backwards, or multiple callers, still hits a
slow linear search for each page. One solution is to store each
successful lookup in a radix tree.

v2: Rewrite building the radixtree for clarity, hopefully.

v3: Rearrange execbuf to avoid calling i915_gem_object_get_sg() from
within an atomic section and so relax the allocation context to a simple
GFP_KERNEL and mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-10-chris@chris-wilson.co.uk
2016-10-28 20:53:45 +01:00
Chris Wilson
4c7d62c6b8 drm/i915: Markup GEM API with lockdep asserts
Add lockdep_assert_held(struct_mutex) to the API preamble of the
internal GEM interfaces.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-9-chris@chris-wilson.co.uk
2016-10-28 20:53:45 +01:00
Chris Wilson
f8a7fde456 drm/i915: Defer active reference until required
We only need the active reference to keep the object alive after the
handle has been deleted (so as to prevent a synchronous gem_close). Why
then pay the price of a kref on every execbuf when we can insert that
final active ref just in time for the handle deletion?

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-6-chris@chris-wilson.co.uk
2016-10-28 20:53:43 +01:00
Chris Wilson
e95433c73a drm/i915: Rearrange i915_wait_request() accounting with callers
Our low-level wait routine has evolved from our generic wait interface
that handled unlocked, RPS boosting, waits with time tracking. If we
push our GEM fence tracking to use reservation_objects (required for
handling multiple timelines), we lose the ability to pass the required
information down to i915_wait_request(). However, if we push the extra
functionality from i915_wait_request() to the individual callsites
(i915_gem_object_wait_rendering and i915_gem_wait_ioctl) that make use
of those extras, we can both simplify our low level wait and prepare for
extending the GEM interface for use of reservation_objects.

v2: Rewrite i915_wait_request() kerneldocs

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-4-chris@chris-wilson.co.uk
2016-10-28 20:53:43 +01:00
Chris Wilson
c92ac094a9 drm/i915: Remove superfluous wait_for_error() from throttle-ioctl
The throttle-ioctl never touches the struct_mutex. It does, however, as
part of its ABI report whether the hardware is terminally wedged. For
that purposes, it only has to report the current state and not incur the
cost of checking/waiting every invocation, as we do not have to wait for
a reset before waiting on a request to ensure completion (that is baked
into the wait request implementation).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-3-chris@chris-wilson.co.uk
2016-10-28 20:53:42 +01:00
Chris Wilson
b52992c06c drm/i915: Support asynchronous waits on struct fence from i915_gem_request
We will need to wait on DMA completion (as signaled via struct fence)
before executing our i915_gem_request. Therefore we want to expose a
method for adding the await on the fence itself to the request.

v2: Add a comment detailing a failure to handle a signal-on-any
fence-array.
v3: Pretend that magic numbers don't exist.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-1-chris@chris-wilson.co.uk
2016-10-28 20:53:41 +01:00
Tvrtko Ursulin
3299e7e434 drm/i915: Remove two invalid warns
Objects can have multiple VMAs used for display in which
case assertion that objects must not be pinned for display
more times than the current VMA is incorrect.

v2: Commit message update. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 058d88c433 ("drm/i915: Track pinned VMA")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1477413635-3876-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-10-26 09:04:56 +01:00
Tvrtko Ursulin
07ee2bce6a drm/i915: Rotated view does not need a fence
We do not need to set up a fence for the rotated view.

Display does not need it and no one can access it.

v2: Move code to __i915_vma_set_map_and_fenceable. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 05a20d098d ("drm/i915: Move map-and-fenceable tracking to the VMA")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-10-26 09:04:55 +01:00
Chris Wilson
de867c20b9 drm/i915: Include the kernel uptime in the error state
As well as knowing when the error occurred, it is more interesting to me
to know how long after booting the error occurred, and for good measure
record the time since last hw initialisation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161025121602.1457-1-chris@chris-wilson.co.uk
2016-10-25 13:22:43 +01:00
Daniel Vetter
f9bf1d97e8 Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued
Backmerge because Chris Wilson needs the very latest&greates of
Gustavo Padovan's sync_file work, specifically the refcounting changes
from:

commit 30cd85dd6e
Author: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Date:   Wed Oct 19 15:48:32 2016 -0200

    dma-buf/sync_file: hold reference to fence when creating sync_file

Also good to sync in general since git tends to get confused with the
cherry-picking going on.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-10-25 08:57:53 +02:00
Dave Airlie
5481e27f6f Merge tag 'drm-intel-next-2016-10-24' of git://anongit.freedesktop.org/drm-intel into drm-next
- first slice of the gvt device model (Zhenyu et al)
- compression support for gpu error states (Chris)
- sunset clause on gpu errors resulting in dmesg noise telling users
  how to report them
- .rodata diet from Tvrtko
- switch over lots of macros to only take dev_priv (Tvrtko)
- underrun suppression for dp link training (Ville)
- lspcon (hmdi 2.0 on skl/bxt) support from Shashank Sharma, polish
  from Jani
- gen9 wm fixes from Paulo&Lyude
- updated ddi programming for kbl (Rodrigo)
- respect alternate aux/ddc pins (from vbt) for all ddi ports (Ville)

* tag 'drm-intel-next-2016-10-24' of git://anongit.freedesktop.org/drm-intel: (227 commits)
  drm/i915: Update DRIVER_DATE to 20161024
  drm/i915: Stop setting SNB min-freq-table 0 on powersave setup
  drm/i915/dp: add lane_count check in intel_dp_check_link_status
  drm/i915: Fix whitespace issues
  drm/i915: Clean up DDI DDC/AUX CH sanitation
  drm/i915: Respect alternate_ddc_pin for all DDI ports
  drm/i915: Respect alternate_aux_channel for all DDI ports
  drm/i915/gen9: Remove WaEnableYV12BugFixInHalfSliceChicken7
  drm/i915: KBL - Recommended buffer translation programming for DisplayPort
  drm/i915: Move down skl/kbl ddi iboost and n_edp_entires fixup
  drm/i915: Add a sunset clause to GPU hang logging
  drm/i915: Stop reporting error details in dmesg as well as the error-state
  drm/i915/gvt: do not ignore return value of create_scratch_page
  drm/i915/gvt: fix spare warnings on odd constant _Bool cast
  drm/i915/gvt: mark symbols static where possible
  drm/i915/gvt: fix sparse warnings on different address spaces
  drm/i915/gvt: properly access enabled intel_engine_cs
  drm/i915/gvt: Remove defunct vmap_batch()
  drm/i915/gvt: Use common mapping routines for shadow_bb object
  drm/i915/gvt: Use common mapping routines for indirect_ctx object
  ...
2016-10-25 16:39:43 +10:00
Chris Wilson
7c108fd8fe drm/i915: Move fence cancellation to runtime suspend
At the moment, we have dependency on the RPM as a barrier itself in both
i915_gem_release_all_mmaps() and i915_gem_restore_fences().
i915_gem_restore_fences() is also called along !runtime pm paths, but we
can move the markup of lost fences alongside releasing the mmaps into a
common i915_gem_runtime_suspend(). This has the advantage of locating
all the tricky barrier dependencies into one location.

v2: Just mark the fence as invalid (fence->dirty) so that upon waking we
will be sure to clear the fence after use, or restore it to the correct
value before use. This makes sure that if the fence is left intact
across the sleep, we do not leave it pointing to a region of GTT for the
next unsuspecting user.

Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Imre Deak <imre.deak@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-5-chris@chris-wilson.co.uk
2016-10-24 13:45:38 +01:00
Chris Wilson
3594a3e21f drm/i915: Remove superfluous locking around userfault_list
Now that we have reduced the access to the list to either (a) under the
struct_mutex whilst holding the RPM wakeref (so that concurrent writers to
the list are serialised by struct_mutex) and (b) under the atomic
runtime suspend (which cannot run concurrently with any other accessor due
to the atomic nature of the runtime suspend) we can remove the extra
locking around the list itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-3-chris@chris-wilson.co.uk
2016-10-24 13:45:36 +01:00
Chris Wilson
9c870d0367 drm/i915: Use RPM as the barrier for controlling user mmap access
We can remove the false coupling between RPM and struct mutex by the
observation that we can use the RPM wakeref as the barrier around user
mmap access. That is as we tear down the user's PTE atomically from
within rpm suspend and then to fault in new PTE requires the rpm
wakeref, means that no user access is possible through those PTE without
RPM being awake. Having made that observation, we can then remove the
presumption of having to take rpm outside of struct_mutex and so allow
fine grained acquisition of a wakeref around hw access rather than
having to remember to acquire the wakeref early on.

v2: Rejig placement of the new intel_runtime_pm_get() to be as tight
as possible around the GTT pread/pwrite.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-2-chris@chris-wilson.co.uk
2016-10-24 13:45:35 +01:00
Chris Wilson
275f039db5 drm/i915: Move user fault tracking to a separate list
We want to decouple RPM and struct_mutex, but currently RPM has to walk
the list of bound objects and remove userspace mmapping before we
suspend (otherwise userspace may continue to access the GTT whilst it is
powered down). This currently requires the struct_mutex to walk the
bound_list, but if we move that to a separate list and lock we can take
the first step towards removing the struct_mutex.

v2: Split runtime suspend unmapping vs regular unmapping, to make the
locking (and barriers) clearer. Add the object to the userfault_list
prior to inserting the first PTE, the race between add/revoke depends
upon struct_mutex for regular unmappings and rpm for runtime-suspend.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> #v1
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-1-chris@chris-wilson.co.uk
2016-10-24 13:45:35 +01:00
Chris Wilson
4ff340f061 drm/i915: Limit the scattergather coalescing to 32bits
The scattergather list uses a 32bit size counter, we should avoid
exceeding it.

v2: Also we should use unsigned int to match sg->length.

Fixes: 871dfbd67d ("drm/i915: Allow compaction upto SWIOTLB max segment size")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161018120251.25043-3-chris@chris-wilson.co.uk
2016-10-18 14:22:27 +01:00
Chris Wilson
b4bcbe2a90 drm/i915: Document our internal limit on object size
In many places, we try to count pages using a 32 bit integer. That
implies if we are asked to create an object larger than 43bits, we will
subtly crash much later. Catch this on the boundary, and add a warning
to remind ourselves later on our exabyte systems.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161018120251.25043-2-chris@chris-wilson.co.uk
2016-10-18 14:22:26 +01:00
Chris Wilson
3ef7f22893 drm/i915: Bump object bookkeeping to u64 from size_t
Internally we allow for using more objects than a single process can
allocate, i.e. we allow for a 64bit GPU address space even on a 32bit
system. Using size_t may oveerflow.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161018120251.25043-1-chris@chris-wilson.co.uk
2016-10-18 14:22:26 +01:00
Michał Winiarski
4fb84d991e drm/i915: Remove unused "valid" parameter from pte_encode
We never used any invalid ptes, those were put in place for
a possibility of doing gpu faults. However our batchbuffers are not
restricted in length, so everything needs to be pointing to something
and thus out-of-bounds is pointing to scratch.

Remove the valid flag as it is always true.

v2: Expand commit msg, patch reorder (Mika)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1476360162-24062-1-git-send-email-michal.winiarski@intel.com
2016-10-14 12:40:32 +01:00
Tvrtko Ursulin
5db9401983 drm/i915: Make IS_GEN macros only take dev_priv
Saves 1416 bytes of .rodata strings.

v2: Add parantheses around dev_priv. (Ville Syrjala)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476352990-2504-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-10-14 12:23:22 +01:00
Tvrtko Ursulin
772c2a519c drm/i915: Make IS_HASWELL only take dev_priv
Saves 2432 bytes of .rodata strings.

v2: Add parantheses around dev_priv. (Ville Syrjala)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2016-10-14 12:23:19 +01:00
Tvrtko Ursulin
8652744b64 drm/i915: Make IS_BROADWELL only take dev_priv
Saves 1808 bytes of .rodata strings.

v2: Add parantheses around dev_priv. (Ville Syrjala)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2016-10-14 12:23:19 +01:00
Tvrtko Ursulin
fd6b8f43c9 drm/i915: Make IS_IVYBRIDGE only take dev_priv
Saves 848 bytes of .rodata strings.

v2: Add parantheses around dev_priv. (Ville Syrjala)
v3: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2016-10-14 12:23:19 +01:00
Tvrtko Ursulin
50a0bc9054 drm/i915: Make INTEL_DEVID only take dev_priv
Saves 4472 bytes of .rodata strings.

v2: Add parantheses around dev_priv. (Ville Syrjala)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2016-10-14 12:23:19 +01:00
Tvrtko Ursulin
6e266956a5 drm/i915: Make INTEL_PCH_TYPE & co only take dev_priv
This saves 1872 bytes of .rodata strings.

v2:
 * Rebase.
 * Add parantheses around dev_priv. (Ville Syrjala)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2016-10-14 12:23:19 +01:00
Akash Goel
3b3f1650b1 drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
	struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
	struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.

There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).

v2:
- Remove the engine iterator field added in drm_i915_private structure,
  instead pass a local iterator variable to the for_each_engine**
  macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
  NULL pointer check on engine pointer. (Chris)

v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
  can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
  engine specific init is done later in Driver load sequence.

v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().

v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
  allocation of intel_engine_cs structure. (Chris)

v6:
- Rebase.

v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.

v8: Rebase.

v9: Rebase.

v10:
- For index calculation use engine ID instead of pointer based arithmetic in
  intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
  check for NULL engine pointer in cleanup() routines. (Joonas)

v11: Rebase.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 09:58:43 +01:00
Chris Wilson
ad16d2ed8f drm/i915: Skip unbinding large unmappable global buffers
If the user requests a mappable binding to the global GTT, we will first
unbind an existing mapping if it doesn't match. We will unbind even if
there is no possibility that the object can fit in the mappable
aperture. This may lead to a ping-pong migration of the object, for
example igt/gem_exec_big.

v2: Comment upon the reasoning, or lack thereof!, behind the choice of
magic numbers.

Testcase: igt/gem_exec_big
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161013085504.30705-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com
2016-10-13 15:47:47 +01:00
Imre Deak
1c777c5d1d drm/i915/hsw: Fix GPU hang during resume from S3-devices state
Currently resuming on HSW from S3 pm_test/devices state leads to an
unrecoverable GPU hang. Resetting the GPU during suspend fixes this. For
a full S3 cycle this change only means the reset happens earlier (before
reaching S3). For S4 the reset will happen now both during the freeze
and quiesce phases, which is a benefit since it will guarantee that the
GPU is idle before creating and loading the hibernation image.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1476283597-580-1-git-send-email-imre.deak@intel.com
2016-10-13 12:11:31 +03:00
Linus Torvalds
6b25e21fa6 Merge tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "Core:
   - Fence destaging work
   - DRIVER_LEGACY to split off legacy drm drivers
   - drm_mm refactoring
   - Splitting drm_crtc.c into chunks and documenting better
   - Display info fixes
   - rbtree support for prime buffer lookup
   - Simple VGA DAC driver

  Panel:
   - Add Nexus 7 panel
   - More simple panels

  i915:
   - Refactoring GEM naming
   - Refactored vma/active tracking
   - Lockless request lookups
   - Better stolen memory support
   - FBC fixes
   - SKL watermark fixes
   - VGPU improvements
   - dma-buf fencing support
   - Better DP dongle support

  amdgpu:
   - Powerplay for Iceland asics
   - Improved GPU reset support
   - UVD/VEC powergating support for CZ/ST
   - Preinitialised VRAM buffer support
   - Virtual display support
   - Initial SI support
   - GTT rework
   - PCI shutdown callback support
   - HPD IRQ storm fixes

  amdkfd:
   - bugfixes

  tilcdc:
   - Atomic modesetting support

  mediatek:
   - AAL + GAMMA engine support
   - Hook up gamma LUT
   - Temporal dithering support

  imx:
   - Pixel clock from devicetree
   - drm bridge support for LVDS bridges
   - active plane reconfiguration
   - VDIC deinterlacer support
   - Frame synchronisation unit support
   - Color space conversion support

  analogix:
   - PSR support
   - Better panel on/off support

  rockchip:
   - rk3399 vop/crtc support
   - PSR support

  vc4:
   - Interlaced vblank timing
   - 3D rendering CPU overhead reduction
   - HDMI output fixes

  tda998x:
   - HDMI audio ASoC support

  sunxi:
   - Allwinner A33 support
   - better TCON support

  msm:
   - DT binding cleanups
   - Explicit fence-fd support

  sti:
   - remove sti415/416 support

  etnaviv:
   - MMUv2 refactoring
   - GC3000 support

  exynos:
   - Refactoring HDMI DCC/PHY
   - G2D pm regression fix
   - Page fault issues with wait for vblank

  There is no nouveau work in this tree, as Ben didn't get a pull
  request in, and he was fighting moving to atomic and adding mst
  support, so maybe best it waits for a cycle"

* tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux: (1412 commits)
  drm/crtc: constify drm_crtc_index parameter
  drm/i915: Fix conflict resolution from backmerge of v4.8-rc8 to drm-next
  drm/i915/guc: Unwind GuC workqueue reservation if request construction fails
  drm/i915: Reset the breadcrumbs IRQ more carefully
  drm/i915: Force relocations via cpu if we run out of idle aperture
  drm/i915: Distinguish last emitted request from last submitted request
  drm/i915: Allow DP to work w/o EDID
  drm/i915: Move long hpd handling into the hotplug work
  drm/i915/execlists: Reinitialise context image after GPU hang
  drm/i915: Use correct index for backtracking HUNG semaphores
  drm/i915: Unalias obj->phys_handle and obj->userptr
  drm/i915: Just clear the mmiodebug before a register access
  drm/i915/gen9: only add the planes actually affected by ddb changes
  drm/i915: Allow PCH DPLL sharing regardless of DPLL_SDVO_HIGH_SPEED
  drm/i915/bxt: Fix HDMI DPLL configuration
  drm/i915/gen9: fix the watermark res_blocks value
  drm/i915/gen9: fix plane_blocks_per_line on watermarks calculations
  drm/i915/gen9: minimum scanlines for Y tile is not always 4
  drm/i915/gen9: fix the WaWmMemoryReadLatency implementation
  drm/i915/kbl: KBL also needs to run the SAGV code
  ...
2016-10-11 18:12:22 -07:00
Chris Wilson
908b123225 drm/i915: Convert open-coded use of vma_pages()
If we want to know how many pages a VMA spans, we can use vma_pages() to
find out. We have one such invocation inside our faulthandler, so
convert it. (We have two other that want the size in bytes rather than
pages, food for future thought.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161011090656.29554-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2016-10-11 10:54:20 +01:00
Chris Wilson
871dfbd67d drm/i915: Allow compaction upto SWIOTLB max segment size
commit 1625e7e549 ("drm/i915: make compact dma scatter lists creation
work with SWIOTLB backend") took a heavy handed approach to undo the
scatterlist compaction in the face of SWIOTLB. (The compaction hit a bug
whereby we tried to pass a segment larger than SWIOTLB could handle.) We
can be a little more intelligent and try compacting the scatterlist up
to the maximum SWIOTLB segment size (when using SWIOTLB).

v2: Tidy sg_mark_end() and cpp

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
CC: Imre Deak <imre.deak@intel.com>
CC: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161011082021.14606-2-chris@chris-wilson.co.uk
2016-10-11 10:15:01 +01:00
Chris Wilson
465350d0db drm/i915: Remove self-harming shrink_all on get_pages_gtt fail
When we notice the system under memory pressure, we try to evict some
driver pages before asking the VM to shrink all caches. As a final step
in that process, we tried to evict everything, including active buffers.
This is harming ourselves, and we can mix shrinking all caches as well
as our residual buffers (after the first pass of trying to shrink just
our own buffers).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161011082021.14606-1-chris@chris-wilson.co.uk
2016-10-11 10:15:01 +01:00
Chris Wilson
105f1a65b0 drm/i915: Fix conflict resolution from backmerge of v4.8-rc8 to drm-next
The conflict resolution of v4.8-rc8 backmerge to drm-next pulled back in
a few lines of dead code due to the code movement around
i915_gem_reset(), fix that up.

Fixes: ca09fb9f60 ("Merge tag 'v4.8-rc8' into drm-next")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@gmail.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161010125017.23911-1-chris@chris-wilson.co.uk
2016-10-10 16:12:21 +03:00
Chris Wilson
ec7ce653d9 drm/i915: Only shrink the unbound objects during freeze
At the point of creating the hibernation image, the runtime power manage
core is disabled - and using the rpm functions triggers a warn.
i915_gem_shrink_all() tries to unbind objects, which requires device
access and so tries to how an rpm reference triggering a warning:

[   44.235420] ------------[ cut here ]------------
[   44.235424] WARNING: CPU: 2 PID: 2199 at drivers/gpu/drm/i915/intel_runtime_pm.c:2688 intel_runtime_pm_get_if_in_use+0xe6/0xf0
[   44.235426] WARN_ON_ONCE(ret < 0)
[   44.235445] Modules linked in: ctr ccm arc4 rt2800usb rt2x00usb rt2800lib rt2x00lib crc_ccitt mac80211 cmac cfg80211 btusb rfcomm bnep btrtl btbcm btintel bluetooth dcdbas x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic aesni_intel snd_hda_codec_hdmi aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper cryptd snd_hda_codec hid_multitouch joydev snd_hda_core binfmt_misc i2c_hid serio_raw snd_pcm acpi_pad snd_timer snd i2c_designware_platform 8250_dw nls_iso8859_1 i2c_designware_core lpc_ich mfd_core soundcore usbhid hid psmouse ahci libahci
[   44.235447] CPU: 2 PID: 2199 Comm: kworker/u8:8 Not tainted 4.8.0-rc5+ #130
[   44.235447] Hardware name: Dell Inc. XPS 13 9343/0310JH, BIOS A07 11/11/2015
[   44.235450] Workqueue: events_unbound async_run_entry_fn
[   44.235453]  0000000000000000 ffff8801b2f7fb98 ffffffff81306c2f ffff8801b2f7fbe8
[   44.235454]  0000000000000000 ffff8801b2f7fbd8 ffffffff81056c01 00000a801f50ecc0
[   44.235456]  ffff88020ce50000 ffff88020ce59b60 ffffffff81a60b5c ffffffff81414840
[   44.235456] Call Trace:
[   44.235459]  [<ffffffff81306c2f>] dump_stack+0x4d/0x6e
[   44.235461]  [<ffffffff81056c01>] __warn+0xd1/0xf0
[   44.235464]  [<ffffffff81414840>] ? i915_pm_suspend_late+0x30/0x30
[   44.235465]  [<ffffffff81056c6f>] warn_slowpath_fmt+0x4f/0x60
[   44.235468]  [<ffffffff814e73ce>] ? pm_runtime_get_if_in_use+0x6e/0xa0
[   44.235469]  [<ffffffff81433526>] intel_runtime_pm_get_if_in_use+0xe6/0xf0
[   44.235471]  [<ffffffff81458a26>] i915_gem_shrink+0x306/0x360
[   44.235473]  [<ffffffff81343fd4>] ? pci_platform_power_transition+0x24/0x90
[   44.235475]  [<ffffffff81414840>] ? i915_pm_suspend_late+0x30/0x30
[   44.235476]  [<ffffffff81458dfb>] i915_gem_shrink_all+0x1b/0x30
[   44.235478]  [<ffffffff814560b3>] i915_gem_freeze_late+0x33/0x90
[   44.235479]  [<ffffffff81414877>] i915_pm_freeze_late+0x37/0x40
[   44.235481]  [<ffffffff814e9b8e>] dpm_run_callback+0x4e/0x130
[   44.235483]  [<ffffffff814ea5db>] __device_suspend_late+0xdb/0x1f0
[   44.235484]  [<ffffffff814ea70f>] async_suspend_late+0x1f/0xa0
[   44.235486]  [<ffffffff81077557>] async_run_entry_fn+0x37/0x150
[   44.235488]  [<ffffffff8106f518>] process_one_work+0x148/0x3f0
[   44.235490]  [<ffffffff8106f8eb>] worker_thread+0x12b/0x490
[   44.235491]  [<ffffffff8106f7c0>] ? process_one_work+0x3f0/0x3f0
[   44.235492]  [<ffffffff81074d09>] kthread+0xc9/0xe0
[   44.235495]  [<ffffffff816e257f>] ret_from_fork+0x1f/0x40
[   44.235496]  [<ffffffff81074c40>] ? kthread_park+0x60/0x60
[   44.235497] ---[ end trace e438706b97c7f132 ]---

Alternatively, to actually shrink everything we have to do so slightly
earlier in the hibernation process.

To keep lockdep silent, we need to take struct_mutex for the shrinker
even though we know that we are the only user during the freeze.

Fixes: 7aab2d534e ("drm/i915: Shrink objects prior to hibernation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-2-chris@chris-wilson.co.uk
(cherry picked from commit 6a800eabba)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-10-10 16:06:36 +03:00
Chris Wilson
ac75694125 drm/i915: Restore current RPS state after reset
Following commit 821ed7df6e ("drm/i915: Update reset path to fix
incomplete requests") we no longer mark the context as lost on reset as
we keep the requests (and contexts) alive. However, RPS remains reset
and we need to restore the current state to match the in-flight
requests.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97824
Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-1-chris@chris-wilson.co.uk
(cherry picked from commit f2a91d1a6f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-10-10 16:06:35 +03:00
Chris Wilson
77c607013e drm/i915: Double check hangcheck.seqno after reset
Check that there was not a late recovery between us declaring the GPU
hung and processing the reset. If the GPU did recover by itself, let the
request remain on the active list and see if it hangs again!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161004201132.21801-5-chris@chris-wilson.co.uk
2016-10-05 08:40:06 +01:00
Chris Wilson
9e60ab0387 drm/i915: Disable irqs across GPU reset
Whilst we reset the GPU, we want to prevent execlists from submitting
new work (which it does via an interrupt handler). To achieve this we
disable the irq (and drain the irq tasklet) around the reset. When we
enable it again afters, the interrupt queue should be empty and we can
reinitialise from a known state without fear of the tasklet running
concurrently.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161004201132.21801-4-chris@chris-wilson.co.uk
2016-10-05 08:40:06 +01:00
Dave Airlie
ca09fb9f60 Linux 4.8-rc8
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJX6H4uAAoJEHm+PkMAQRiG5sMH/3yzrMiUCSokdS+cvY+jgKAG
 JS58JmRvBPz2mRaU3MRPBGRDeCz/Nc9LggL2ZcgM+E1ZYirlYyQfIED3lkqk5R07
 kIN1wmb+kQhXyU4IY3fEX7joqyKC6zOy4DUChPkBQU0/0+VUmdVmcJvsuPlnMZtf
 g95m0BdYTui+eDezASRqOEp3Lb5ONL4c3ao4yBP0LHF033ctj3VJQiyi5uERPZJ0
 5e6Mo7Wxn78t9WqJLQAiEH46kTwT2plNlxf3XXqTenfIdbWhqE873HPGeSMa3VQV
 VywXTpCpSPQsA8BYg66qIbebdKOhs9MOviHVfqDtwQlvwhjlBDya0gNHfI5fSy4=
 =Y/L5
 -----END PGP SIGNATURE-----

Merge tag 'v4.8-rc8' into drm-next

Linux 4.8-rc8

There was a lot of fallout in the imx/amdgpu/i915 drivers, so backmerge
it now to avoid troubles.

* tag 'v4.8-rc8': (1442 commits)
  Linux 4.8-rc8
  fault_in_multipages_readable() throws set-but-unused error
  mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  radix tree: fix sibling entry handling in radix_tree_descend()
  radix tree test suite: Test radix_tree_replace_slot() for multiorder entries
  fix memory leaks in tracing_buffers_splice_read()
  tracing: Move mutex to protect against resetting of seq data
  MIPS: Fix delay slot emulation count in debugfs
  MIPS: SMP: Fix possibility of deadlock when bringing CPUs online
  mm: delete unnecessary and unsafe init_tlb_ubc()
  huge tmpfs: fix Committed_AS leak
  shmem: fix tmpfs to handle the huge= option properly
  blk-mq: skip unmapped queues in blk_mq_alloc_request_hctx
  MIPS: Fix pre-r6 emulation FPU initialisation
  arm64: kgdb: handle read-only text / modules
  arm64: Call numa_store_cpu_info() earlier.
  locking/hung_task: Fix typo in CONFIG_DETECT_HUNG_TASK help text
  nvme-rdma: only clear queue flags after successful connect
  i2c: qup: skip qup_i2c_suspend if the device is already runtime suspended
  perf/core: Limit matching exclusive events to one PMU
  ...
2016-09-28 12:08:49 +10:00
Al Viro
4bce9f6ee8 get rid of separate multipage fault-in primitives
* the only remaining callers of "short" fault-ins are just as happy with generic
variants (both in lib/iov_iter.c); switch them to multipage variants, kill the
"short" ones
* rename the multipage variants to now available plain ones.
* get rid of compat macro defining iov_iter_fault_in_multipage_readable by
expanding it in its only user.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-27 18:12:24 -04:00
Chris Wilson
6a800eabba drm/i915: Only shrink the unbound objects during freeze
At the point of creating the hibernation image, the runtime power manage
core is disabled - and using the rpm functions triggers a warn.
i915_gem_shrink_all() tries to unbind objects, which requires device
access and so tries to how an rpm reference triggering a warning:

[   44.235420] ------------[ cut here ]------------
[   44.235424] WARNING: CPU: 2 PID: 2199 at drivers/gpu/drm/i915/intel_runtime_pm.c:2688 intel_runtime_pm_get_if_in_use+0xe6/0xf0
[   44.235426] WARN_ON_ONCE(ret < 0)
[   44.235445] Modules linked in: ctr ccm arc4 rt2800usb rt2x00usb rt2800lib rt2x00lib crc_ccitt mac80211 cmac cfg80211 btusb rfcomm bnep btrtl btbcm btintel bluetooth dcdbas x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic aesni_intel snd_hda_codec_hdmi aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper cryptd snd_hda_codec hid_multitouch joydev snd_hda_core binfmt_misc i2c_hid serio_raw snd_pcm acpi_pad snd_timer snd i2c_designware_platform 8250_dw nls_iso8859_1 i2c_designware_core lpc_ich mfd_core soundcore usbhid hid psmouse ahci libahci
[   44.235447] CPU: 2 PID: 2199 Comm: kworker/u8:8 Not tainted 4.8.0-rc5+ #130
[   44.235447] Hardware name: Dell Inc. XPS 13 9343/0310JH, BIOS A07 11/11/2015
[   44.235450] Workqueue: events_unbound async_run_entry_fn
[   44.235453]  0000000000000000 ffff8801b2f7fb98 ffffffff81306c2f ffff8801b2f7fbe8
[   44.235454]  0000000000000000 ffff8801b2f7fbd8 ffffffff81056c01 00000a801f50ecc0
[   44.235456]  ffff88020ce50000 ffff88020ce59b60 ffffffff81a60b5c ffffffff81414840
[   44.235456] Call Trace:
[   44.235459]  [<ffffffff81306c2f>] dump_stack+0x4d/0x6e
[   44.235461]  [<ffffffff81056c01>] __warn+0xd1/0xf0
[   44.235464]  [<ffffffff81414840>] ? i915_pm_suspend_late+0x30/0x30
[   44.235465]  [<ffffffff81056c6f>] warn_slowpath_fmt+0x4f/0x60
[   44.235468]  [<ffffffff814e73ce>] ? pm_runtime_get_if_in_use+0x6e/0xa0
[   44.235469]  [<ffffffff81433526>] intel_runtime_pm_get_if_in_use+0xe6/0xf0
[   44.235471]  [<ffffffff81458a26>] i915_gem_shrink+0x306/0x360
[   44.235473]  [<ffffffff81343fd4>] ? pci_platform_power_transition+0x24/0x90
[   44.235475]  [<ffffffff81414840>] ? i915_pm_suspend_late+0x30/0x30
[   44.235476]  [<ffffffff81458dfb>] i915_gem_shrink_all+0x1b/0x30
[   44.235478]  [<ffffffff814560b3>] i915_gem_freeze_late+0x33/0x90
[   44.235479]  [<ffffffff81414877>] i915_pm_freeze_late+0x37/0x40
[   44.235481]  [<ffffffff814e9b8e>] dpm_run_callback+0x4e/0x130
[   44.235483]  [<ffffffff814ea5db>] __device_suspend_late+0xdb/0x1f0
[   44.235484]  [<ffffffff814ea70f>] async_suspend_late+0x1f/0xa0
[   44.235486]  [<ffffffff81077557>] async_run_entry_fn+0x37/0x150
[   44.235488]  [<ffffffff8106f518>] process_one_work+0x148/0x3f0
[   44.235490]  [<ffffffff8106f8eb>] worker_thread+0x12b/0x490
[   44.235491]  [<ffffffff8106f7c0>] ? process_one_work+0x3f0/0x3f0
[   44.235492]  [<ffffffff81074d09>] kthread+0xc9/0xe0
[   44.235495]  [<ffffffff816e257f>] ret_from_fork+0x1f/0x40
[   44.235496]  [<ffffffff81074c40>] ? kthread_park+0x60/0x60
[   44.235497] ---[ end trace e438706b97c7f132 ]---

Alternatively, to actually shrink everything we have to do so slightly
earlier in the hibernation process.

To keep lockdep silent, we need to take struct_mutex for the shrinker
even though we know that we are the only user during the freeze.

Fixes: 7aab2d534e ("drm/i915: Shrink objects prior to hibernation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-2-chris@chris-wilson.co.uk
2016-09-21 16:57:06 +01:00
Chris Wilson
f2a91d1a6f drm/i915: Restore current RPS state after reset
Following commit 821ed7df6e ("drm/i915: Update reset path to fix
incomplete requests") we no longer mark the context as lost on reset as
we keep the requests (and contexts) alive. However, RPS remains reset
and we need to restore the current state to match the in-flight
requests.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97824
Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-1-chris@chris-wilson.co.uk
2016-09-21 16:52:39 +01:00
Chris Wilson
7aab2d534e drm/i915: Shrink objects prior to hibernation
In an attempt to keep the hibernation image as same as possible, let's
try and discard any unwanted pages and our own page arrays.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909190218.16831-1-chris@chris-wilson.co.uk
2016-09-09 20:07:46 +01:00
Chris Wilson
a2bc4695bb drm/i915: Prepare object synchronisation for asynchronicity
We are about to specialize object synchronisation to enable nonblocking
execbuf submission. First we make a copy of the current object
synchronisation for execbuffer. The general i915_gem_object_sync() will
be removed following the removal of CS flips in the near future.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: John Harrison <john.c.harrison@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-16-chris@chris-wilson.co.uk
2016-09-09 14:23:06 +01:00
Chris Wilson
5590af3e11 drm/i915: Drive request submission through fence callbacks
Drive final request submission from a callback from the fence. This way
the request is queued until all dependencies are resolved, at which
point it is handed to the backend for queueing to hardware. At this
point, no dependencies are set on the request, so the callback is
immediate.

A side-effect of imposing a heavier-irqsafe spinlock for execlist
submission is that we lose the softirq enabling after scheduling the
execlists tasklet. To compensate, we manually kickstart the softirq by
disabling and enabling the bh around the fence signaling.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: John Harrison <john.c.harrison@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-14-chris@chris-wilson.co.uk
2016-09-09 14:23:05 +01:00
Chris Wilson
821ed7df6e drm/i915: Update reset path to fix incomplete requests
Update reset path in preparation for engine reset which requires
identification of incomplete requests and associated context and fixing
their state so that engine can resume correctly after reset.

The request that caused the hang will be skipped and head is reset to the
start of breadcrumb. This allows us to resume from where we left-off.
Since this request didn't complete normally we also need to cleanup elsp
queue manually. This is vital if we employ nonblocking request
submission where we may have a web of dependencies upon the hung request
and so advancing the seqno manually is no longer trivial.

ABI: gem_reset_stats / DRM_IOCTL_I915_GET_RESET_STATS

We change the way we count pending batches. Only the active context
involved in the reset is marked as either innocent or guilty, and not
mark the entire world as pending. By inspection this only affects
igt/gem_reset_stats (which assumes implementation details) and not
piglit.

ARB_robustness gives this guide on how we expect the user of this
interface to behave:

 * Provide a mechanism for an OpenGL application to learn about
   graphics resets that affect the context.  When a graphics reset
   occurs, the OpenGL context becomes unusable and the application
   must create a new context to continue operation. Detecting a
   graphics reset happens through an inexpensive query.

And with regards to the actual meaning of the reset values:

   Certain events can result in a reset of the GL context. Such a reset
   causes all context state to be lost. Recovery from such events
   requires recreation of all objects in the affected context. The
   current status of the graphics reset state is returned by

	enum GetGraphicsResetStatusARB();

   The symbolic constant returned indicates if the GL context has been
   in a reset state at any point since the last call to
   GetGraphicsResetStatusARB. NO_ERROR indicates that the GL context
   has not been in a reset state since the last call.
   GUILTY_CONTEXT_RESET_ARB indicates that a reset has been detected
   that is attributable to the current GL context.
   INNOCENT_CONTEXT_RESET_ARB indicates a reset has been detected that
   is not attributable to the current GL context.
   UNKNOWN_CONTEXT_RESET_ARB indicates a detected graphics reset whose
   cause is unknown.

The language here is explicit in that we must mark up the guilty batch,
but is loose enough for us to relax the innocent (i.e. pending)
accounting as only the active batches are involved with the reset.

In the future, we are looking towards single engine resetting (with
minimal locking), where it seems inappropriate to mark the entire world
as innocent since the reset occurred on a different engine. Reducing the
information available means we only have to encounter the pain once, and
also reduces the information leaking from one context to another.

v2: Legacy ringbuffer submission required a reset following hibernation,
or else we restore stale values to the RING_HEAD and walked over
stolen garbage.

v3: GuC requires replaying the requests after a reset.

v4: Restore engine IRQ after reset (so waiters will be woken!)
    Rearm hangcheck if resetting with a waiter.

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-13-chris@chris-wilson.co.uk
2016-09-09 14:23:05 +01:00
Chris Wilson
22dd3bb919 drm/i915: Mark up all locked waiters
In the next patch we want to handle reset directly by a locked waiter in
order to avoid issues with returning before the reset is handled. To
handle the reset, we must first know whether we hold the struct_mutex.
If we do not hold the struct_mtuex we can not perform the reset, but we do
not block the reset worker either (and so we can just continue to wait for
request completion) - otherwise we must relinquish the mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-10-chris@chris-wilson.co.uk
2016-09-09 14:23:03 +01:00
Chris Wilson
ea746f3659 drm/i915: Expand bool interruptible to pass flags to i915_wait_request()
We need finer control over wakeup behaviour during i915_wait_request(),
so expand the current bool interruptible to a bitmask.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-9-chris@chris-wilson.co.uk
2016-09-09 14:23:03 +01:00
Chris Wilson
8af29b0c78 drm/i915: Separate out reset flags from the reset counter
In preparation for introducing a per-engine reset, we can first separate
the mixing of the reset state from the global reset counter.

The loss of atomicity in updating the reset state poses a small problem
for handling the waiters. For requests, this is solved by advancing the
seqno so that a waiter waking up after the reset knows the request is
complete. For pending flips, we still rely on the increment of the
global reset epoch (as well as the reset-in-progress flag) to signify
when the hardware was reset.

The advantage, now that we do not inspect the reset state during reset
itself i.e. we no longer emit requests during reset, is that we can use
the atomic updates of the state flags to ensure that only one reset
worker is active.

v2: Mika spotted that I transformed the i915_gem_wait_for_error() wakeup
into a waiter wakeup.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470414607-32453-6-git-send-email-arun.siluvery@linux.intel.com
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-7-chris@chris-wilson.co.uk
2016-09-09 14:23:02 +01:00
Chris Wilson
70c2a24dbf drm/i915: Simplify ELSP queue request tracking
Emulate HW to track and manage ELSP queue. A set of SW ports are defined
and requests are assigned to these ports before submitting them to HW. This
helps in cleaning up incomplete requests during reset recovery easier
especially after engine reset by decoupling elsp queue management. This
will become more clear in the next patch.

In the engine reset case we want to resume where we left-off after skipping
the incomplete batch which requires checking the elsp queue, removing
element and fixing elsp_submitted counts in some cases. Instead of directly
manipulating the elsp queue from reset path we can examine these ports, fix
up ringbuffer pointers using the incomplete request and restart submissions
again after reset.

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1470414607-32453-3-git-send-email-arun.siluvery@linux.intel.com
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-6-chris@chris-wilson.co.uk
2016-09-09 14:23:01 +01:00
Joonas Lahtinen
6f63340284 drm/i915: Use atomic for dev_priv->mm.bsd_engine_dispatch_index
Use atomic type and operands for dev_priv->mm.bsd_engine_dispatch_index
to avoid one struct_mutex locking scenario.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1472731101-21982-1-git-send-email-joonas.lahtinen@linux.intel.com
2016-09-01 15:39:25 +03:00
Chris Wilson
4cc6907501 drm/i915: Add I915_PARAM_MMAP_GTT_VERSION to advertise unlimited mmaps
Now that we have working partial VMA and faulting support for all
objects, including fence support, advertise to userspace that it can
take advantage of unlimited GGTT mmaps.

v2: Make room in the kerneldoc for a more detailed explanation of the
limitations of the GTT mmap interface.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20160825180519.11341-1-chris@chris-wilson.co.uk
2016-08-26 08:42:26 +01:00
Chris Wilson
c58305af18 drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass
Very old numbers indicate this is a 66% improvement when remapping the
entire object for fence contention - due to the elimination of
track_pfn_insert and its strcmp.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Testcase: igt/gem_fence_upload/performance
Testcase: igt/gem_mmap_gtt
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-6-chris@chris-wilson.co.uk
2016-08-19 17:13:36 +01:00
Chris Wilson
f7bbe7883c drm/i915: Embed the io-mapping struct inside drm_i915_private
As io_mapping.h now always allocates the struct, we can avoid that
allocation and extra pointer dance by embedding the struct inside
drm_i915_private

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-5-chris@chris-wilson.co.uk
2016-08-19 17:13:35 +01:00
Chris Wilson
cd3127d684 drm/i915: Stop discarding GTT cache-domain on unbind vma
Since commit 43566dedde ("drm/i915: Broaden application of
set-domain(GTT)") we allowed objects to be in the GTT domain, but unbound.
Therefore removing the GTT cache domain when removing the GGTT vma is no
longer semantically correct.

An unfortunate side-effect is we lose the wondrously named
i915_gem_object_finish_gtt(), not to be confused with
i915_gem_gtt_finish_object()!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-30-chris@chris-wilson.co.uk
2016-08-18 22:36:57 +01:00
Chris Wilson
383d5823e9 drm/i915: Bump the inactive tracking for all VMA accessed
We track the LRU access for eviction and bump the last access for the
user GGTT on set-to-gtt. When we do so we need to not only bump the
primary GGTT VMA but all partials as well. Similarly we want to
bump the last access tracking for when unpinning an object from the
scanout so that they do not get promptly evicted and hopefully remain
available for reuse on the next frame.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-29-chris@chris-wilson.co.uk
2016-08-18 22:36:57 +01:00
Chris Wilson
d8923dcfa5 drm/i915: Track display alignment on VMA
When using the aliasing ppgtt and pageflipping with the shrinker/eviction
active, we note that we often have to rebind the backbuffer before
flipping onto the scanout because it has an invalid alignment. If we
store the worst-case alignment required for a VMA, we can avoid having
to rebind at critical junctures.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-28-chris@chris-wilson.co.uk
2016-08-18 22:36:56 +01:00
Chris Wilson
2efb813d53 drm/i915: Fallback to using unmappable memory for scanout
The existing ABI says that scanouts are pinned into the mappable region
so that legacy clients (e.g. old Xorg or plymouthd) can write directly
into the scanout through a GTT mapping. However if the surface does not
fit into the mappable region, we are better off just trying to fit it
anywhere and hoping for the best. (Any userspace that is capable of
using ginormous scanouts is also likely not to rely on pure GTT
updates.) With the partial vma fault support, we are no longer
restricted to only using scanouts that we can pin (though it is still
preferred for performance reasons and for powersaving features like
FBC).

v2: Skip fence pinning when not mappable.
v3: Add a comment to explain the possible ramifications of not being
    able to use fences for unmappable scanouts.
v4: Rebase to skip over some local patches
v5: Rebase to defer until after we have unmappable GTT fault support

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-27-chris@chris-wilson.co.uk
2016-08-18 22:36:56 +01:00
Chris Wilson
821188778b drm/i915: Choose not to evict faultable objects from the GGTT
Often times we do not want to evict mapped objects from the GGTT as
these are quite expensive to teardown and frequently reused (causing an
equally, if not more so, expensive setup). In particular, when faulting
in a new object we want to avoid evicting an active object, or else we
may trigger a page-fault-of-doom as we ping-pong between evicting two
objects.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-26-chris@chris-wilson.co.uk
2016-08-18 22:36:55 +01:00
Chris Wilson
50349247ea drm/i915: Drop ORIGIN_GTT for untracked GTT writes
If FBC is set on a framebuffer that is unmapped, all GTT faults will be
from a partial mapping. Writes by the user through the partial VMA are
then untracked by the FBC and so we must use the ORIGIN_CPU when flushing
the I915_GEM_DOMAIN_GTT.

v2: Keep ORIGIN_CPU for set-to-domain(.write=CPU)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: "Zanoni, Paulo R" <paulo.r.zanoni@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-25-chris@chris-wilson.co.uk
2016-08-18 22:36:55 +01:00
Chris Wilson
aa136d9d72 drm/i915: Convert partial ggtt vma to full ggtt if it spans the entire object
If we want to create a partial vma from a chunk that is the same size as
the object, create a normal ggtt vma instead. The benefit is that it
will match future requests for the normal ggtt.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-24-chris@chris-wilson.co.uk
2016-08-18 22:36:51 +01:00
Chris Wilson
a61007a83a drm/i915: Fix partial GGTT faulting
We want to always use the partial VMA as a fallback for a failure to
bind the object into the GGTT. This extends the support partial objects
in the GGTT to cover everything, not just objects too large.

v2: Call the partial view, view not partial.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-23-chris@chris-wilson.co.uk
2016-08-18 22:36:51 +01:00
Chris Wilson
03af84fe7f drm/i915: Choose partial chunksize based on tile row size
In order to support setting up fences for partial mappings of an object,
we have to align those mappings with the fence. The minimum chunksize we
choose is at least the size of a single tile row.

v2: Make minimum chunk size a define for later use

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-22-chris@chris-wilson.co.uk
2016-08-18 22:36:50 +01:00
Chris Wilson
49ef5294cd drm/i915: Move fence tracking from object to vma
In order to handle tiled partial GTT mmappings, we need to associate the
fence with an individual vma.

v2: A couple of silly drops replaced spotted by Joonas

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-21-chris@chris-wilson.co.uk
2016-08-18 22:36:50 +01:00
Chris Wilson
a1e5afbe4d drm/i915: Rename fence.lru_list to link
Our current practice is to only name the actual list (here
dev_priv->fence_list) using "list", and elements upon that list are
referred to as "link". Further, the lru nature is of the list and not of
the node and including in the name does not disambiguate the link from
anything else.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-20-chris@chris-wilson.co.uk
2016-08-18 22:36:49 +01:00
Chris Wilson
05a20d098d drm/i915: Move map-and-fenceable tracking to the VMA
By moving map-and-fenceable tracking from the object to the VMA, we gain
fine-grained tracking and the ability to track individual fences on the VMA
(subsequent patch).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-16-chris@chris-wilson.co.uk
2016-08-18 22:36:48 +01:00
Chris Wilson
b0dc465f95 drm/i915: Tidy up flush cpu/gtt write domains
Since we know the write domain, we can drop the local variable and make
the code look a tiny bit simpler.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-12-chris@chris-wilson.co.uk
2016-08-18 22:36:46 +01:00
Chris Wilson
9764951e7f drm/i915: Pin the pages first in shmem prepare read/write
There is an improbable, but not impossible, case that if we leave the
pages unpin as we operate on the object, then somebody via the shrinker
may steal the lock (which lock? right now, it is struct_mutex, THE lock)
and change the cache domains after we have already inspected them.

(Whilst here, avail ourselves of the opportunity to take a couple of
steps to make the two functions look more similar.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-11-chris@chris-wilson.co.uk
2016-08-18 22:36:45 +01:00
Chris Wilson
3b5724d702 drm/i915: Wait for writes through the GTT to land before reading back
If we quickly switch from writing through the GTT to a read of the
physical page directly with the CPU (e.g. performing relocations through
the GTT and then running the command parser), we can observe that the
writes are not visible to the CPU. It is not a coherency problem, as
extensive investigations with clflush have demonstrated, but a mere
timing issue - we have to wait for the GTT to complete it's write before
we start our read from the CPU.

The issue can be illustrated in userspace with:

	gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
	cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
	gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);

	for (i = 0; i < OBJECT_SIZE / 64; i++) {
		int x = 16*i + (i%16);
		gtt[x] = i;
		clflush(&cpu[x], sizeof(cpu[x]));
		assert(cpu[x] == i);
	}

Experimenting with that shows that this behaviour is indeed limited to
recent Atom-class hardware.

Testcase: igt/gem_exec_flush/basic-batch-default-cmd #byt
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-10-chris@chris-wilson.co.uk
2016-08-18 22:36:45 +01:00
Chris Wilson
a314d5cb4a drm/i915: Before accessing an object via the cpu, flush GTT writes
If we want to read the pages directly via the CPU, we have to be sure
that we have to flush the writes via the GTT (as the CPU can not see
the address aliasing).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-9-chris@chris-wilson.co.uk
2016-08-18 22:36:44 +01:00
Chris Wilson
43394c7d0d drm/i915: Extract i915_gem_obj_prepare_shmem_write()
This is a companion to i915_gem_obj_prepare_shmem_read() that prepares
the backing storage for direct writes. It first serialises with the GPU,
pins the backing storage and then indicates what clfushes are required in
order for the writes to be coherent.

Whilst here, fix support for ancient CPUs without clflush for which we
cannot do the GTT+clflush tricks.

v2: Add i915_gem_obj_finish_shmem_access() for symmetry

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-8-chris@chris-wilson.co.uk
2016-08-18 22:36:44 +01:00
Chris Wilson
1803458478 drm/i915: Fallback to single page pwrite/pread if unable to release fence
If we cannot release the fence (for example if someone is inexplicably
trying to write into a tiled framebuffer that is currently pinned to the
display! *cough* kms_frontbuffer_tracking *cough*) fallback to using the
page-by-page pwrite/pread interface, rather than fail the syscall
entirely.

Since this is triggerable by the user (along pwrite) we have to remove
the WARN_ON(fence->pin_count).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-6-chris@chris-wilson.co.uk
2016-08-18 22:36:43 +01:00
Chris Wilson
d243ad8202 drm/i915: Mark up the GTT flush following WC writes as ORIGIN_CPU
Similarly to invalidating beforehand, if the object is mmapped via
I915_MMAP_WC we cannot track writes through the I915_GEM_DOMAIN_GTT. At
the conclusion of the write, i915_gem_object_flush_gtt_writes() we also
need to treat the origin carefully in case it may have been untracked.

See also commit aeecc9696a ("drm/i915: use ORIGIN_CPU for frontbuffer
invalidation on WC mmaps").

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-5-chris@chris-wilson.co.uk
2016-08-18 22:36:43 +01:00
Chris Wilson
b19482d7ce drm/i915: Use ORIGIN_CPU for fb invalidation from pwrite
As pwrite does not use the fence for its GTT access, and may even go
through a secondary interface avoiding the main VMA, we cannot treat the
write as automatically invalidated by the hardware and so we require
ORIGIN_CPU frontbufer invalidate/flushes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-4-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-08-18 22:36:24 +01:00
Chris Wilson
4b30cb2334 drm/i915: vfree() no longer ignores the low bits of the address
Since vfree() now likes to WARN when passed a non-page-aligned pointer,
we need to discard the low bits to comply with it.

Fixes: d31d7cb146 ("drm/i915: Support for creating write combined type vmaps")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-3-chris@chris-wilson.co.uk
2016-08-18 22:36:23 +01:00
Chris Wilson
1255501d86 drm/i915: Embrace the race in busy-ioctl
Daniel Vetter proposed a new challenge to the serialisation inside the
busy-ioctl that exposed a flaw that could result in us reporting the
wrong engine as being busy. If the request is reallocated as we test
its busyness and then reassigned to this object by another thread, we
would not notice that the test itself was incorrect.

We are faced with a choice of using __i915_gem_active_get_request_rcu()
to first acquire a reference to the request preventing the race, or to
acknowledge the race and accept the limitations upon the accuracy of the
busy flags. Note that we guarantee that we never falsely report the
object as idle (providing userspace itself doesn't race), and so the
most important use of the busy-ioctl and its guarantees are fulfilled.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471337440-16777-1-git-send-email-chris@chris-wilson.co.uk
2016-08-16 10:35:02 +01:00
Chris Wilson
bde13ebdab drm/i915: Introduce i915_ggtt_offset()
This little helper only exists to safely discard the upper unused 32bits
of the general 64-bit VMA address - as we know that all Global GTT
currently are less than 4GiB in size and so that the upper bits must be
zero. In many places, we use a u32 for the global GTT offset and we want
to document where we are discarding the full VMA offset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-28-git-send-email-chris@chris-wilson.co.uk
2016-08-15 11:01:14 +01:00
Chris Wilson
058d88c433 drm/i915: Track pinned VMA
Treat the VMA as the primary struct responsible for tracking bindings
into the GPU's VM. That is we want to treat the VMA returned after we
pin an object into the VM as the cookie we hold and eventually release
when unpinning. Doing so eliminates the ambiguity in pinning the object
and then searching for the relevant pin later.

v2: Joonas' stylistic nitpicks, a fun rebase.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-27-git-send-email-chris@chris-wilson.co.uk
2016-08-15 11:01:13 +01:00
Chris Wilson
247177ddd5 drm/i915: Always set the vma->pages
Previously, we would only set the vma->pages pointer for GGTT entries.
However, if we always set it, we can use it to prettify some code that
may want to access the backing store associated with the VMA (as
assigned to the VMA).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-8-git-send-email-chris@chris-wilson.co.uk
2016-08-15 11:00:54 +01:00
Daniel Vetter
cc9263874b Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued
Backmerge because too many conflicts, and also we need to get at the
latest struct fence patches from Gustavo. Requested by Chris Wilson.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-08-15 10:41:47 +02:00
Dave Airlie
fc93ff608b Merge tag 'drm-intel-next-2016-08-08' of git://anongit.freedesktop.org/drm-intel into drm-next
- refactor ddi buffer programming a bit (Ville)
- large-scale renaming to untangle naming in the gem code (Chris)
- rework vma/active tracking for accurately reaping idle mappings of shared
  objects (Chris)
- misc dp sst/mst probing corner case fixes (Ville)
- tons of cleanup&tunings all around in gem
- lockless (rcu-protected) request lookup, plus use it everywhere for
  non(b)locking waits (Chris)
- pipe crc debugfs fixes (Rodrigo)
- random fixes all over

* tag 'drm-intel-next-2016-08-08' of git://anongit.freedesktop.org/drm-intel: (222 commits)
  drm/i915: Update DRIVER_DATE to 20160808
  drm/i915: fix aliasing_ppgtt leak
  drm/i915: Update comment before i915_spin_request
  drm/i915: Use drm official vblank_no_hw_counter callback.
  drm/i915: Fix copy_to_user usage for pipe_crc
  Revert "drm/i915: Track active streams also for DP SST"
  drm/i915: fix WaInsertDummyPushConstPs
  drm/i915: Assert that the request hasn't been retired
  drm/i915: Repack fence tiling mode and stride into a single integer
  drm/i915: Document and reject invalid tiling modes
  drm/i915: Remove locking for get_tiling
  drm/i915: Remove pinned check from madvise ioctl
  drm/i915: Reduce locking inside swfinish ioctl
  drm/i915: Remove (struct_mutex) locking for busy-ioctl
  drm/i915: Remove (struct_mutex) locking for wait-ioctl
  drm/i915: Do a nonblocking wait first in pread/pwrite
  drm/i915: Remove unused no-shrinker-steal
  drm/i915: Tidy generation of the GTT mmap offset
  drm/i915/shrinker: Wait before acquiring struct_mutex under oom
  drm/i915: Simplify do_idling() (Ironlake vt-d w/a)
  ...
2016-08-15 16:53:57 +10:00
Chris Wilson
02bef8f98d drm/i915: Unbind closed vma for i915_gem_object_unbind()
Closed vma are removed from the obj->vma_list so that they cannot be
found by userspace. However, this means that when forcibly unbinding an
object, we have to wait upon all rendering to that object first in order
for the closed, but active, vma to be reaped and their bindings removed.

Reported-by: Matthew Auld <matthew.auld@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97343
Fixes: aa653a685d ("drm/i915: Be more careful when unbinding vma")
Fixes: 8a3b3d576c (" drm/i915: Convert non-blocking userptr waits...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471196681-30043-2-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Tested-by: Matthew Auld <matthew.auld@intel.com>
2016-08-14 19:40:09 +01:00
Chris Wilson
35a9611ca0 drm/i915: Initialize return value for empty i915_gem_object_unbind()
If the obj->vma_list is empty, we immediately return ret. However, we
are doing so having never set it to any value, it should be zero!

Reported-by: Matthew Auld <matthew.auld@intel.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=97343
Fixes: aa653a685d ("drm/i915: Be more careful when unbinding vma")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471196681-30043-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2016-08-14 19:38:26 +01:00
Chris Wilson
d31d7cb146 drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.

We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT

v2: Remove the redundant braces around pin count check and fix the marker
     in documentation (Chris)

v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
   i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
   superfluous BUG_ON.(Tvrtko)

v4:
- Rename the enums and clean up the pin_map function. (Chris)

v5: Drop the VM_NO_GUARD, minor cosmetics.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 13:06:36 +01:00
Chris Wilson
2ca17b87e8 drm/i915: Add missing rpm wakelock to GGTT pread
Joonas spotted a discrepancy between the pwrite and pread ioctls, in
that pwrite takes the rpm wakelock around its GGTT access, The wakelock
is required in order for the GTT to function. In disregard for the
current convention, we take the rpm wakelock around the access itself
rather than around the struct_mutex as the nesting is not strictly
required and such ordering will one day be fixed by explicitly noting
the barrier dependencies between the GGTT and rpm.

Fixes: b50a53715f ("drm/i915: Support for pread/pwrite ...")
Reported-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: drm-intel-fixes@lists.freedesktop.org
Link: http://patchwork.freedesktop.org/patch/msgid/1470298193-21765-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit 1dd5b6f202)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-08-11 00:59:48 +03:00
Chris Wilson
fae82e59d2 drm/i915: Handle ENOSPC after failing to insert a mappable node
Even after adding individual page support for GTT mmaping, we can still
fail to find any space within the mappable region, and
drm_mm_insert_node() will then report ENOSPC. We have to then handle
this error by using the shmem access to the pages.

Fixes: b50a53715f ("drm/i915: Support for pread/pwrite ... objects")
Testcase: igt/gem_concurrent_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com
Link: http://patchwork.freedesktop.org/patch/msgid/1468690956-23480-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit d1054ee492)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-08-11 00:49:02 +03:00
Chris Wilson
b06bc7ec4d drm/i915: Flush GT idle status upon reset
Upon resetting the GPU, we force the engines to be idle by clearing
their request lists. However, I neglected to clear the GT active status
and so the next request following the reset was not marking the device
as busy again. (We had to wait until any outstanding retire worker
finally ran and cleared the active status.)

Fixes: 67d97da349 ("drm/i915: Only start retire worker when idle")
Testcase: igt/pm_rps/reset
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468397438-21226-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit b913b33c43)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-08-10 17:43:49 +03:00
Chris Wilson
83348ba84e drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs
In commit 2529d57050 ("drm/i915: Drop racy markup of missed-irqs from
idle-worker") the racy detection of missed interrupts was removed when
we went idle. This however opened up the issue that the stuck waiters
were not being reported, causing a test case failure. If we move the
stuck waiter detection out of hangcheck and into the breadcrumb
mechanims (i.e. the waiter) itself, we can avoid this issue entirely.
This leaves hangcheck looking for a stuck GPU (inspecting for request
advancement and HEAD motion), and breadcrumbs looking for a stuck
waiter - hopefully make both easier to understand by their segregation.

v2: Reduce the error message as we now run independently of hangcheck,
and the hanging batch used by igt also counts as a stuck waiter causing
extra warnings in dmesg.
v3: Move the breadcrumb's hangcheck kickstart to the first missed wait.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97104
Fixes: 2529d57050 (waiter"drm/i915: Drop racy markup of missed-irqs...")
Testcase: igt/drv_missed_irq
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470761272-1245-2-git-send-email-chris@chris-wilson.co.uk
2016-08-10 10:37:35 +01:00
Chris Wilson
70cb472c6d drm/i915: Always mark the writer as also a read for busy ioctl
One of the few guarantees we want the busy ioctl to provide is that the
reported busy writer is included in the set of busy read engines. This
should be provided by the ordering of setting and retiring the active
trackers, but we can do better by explicitly setting the busy read
engine flag for the last writer.

v2: More comments inside __busy_write_id() to explain why both fields
are set.

Fixes: 3fdc13c7a3 ("drm/i915: Remove (struct_mutex) locking for busy-ioctl")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470762505-12799-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-08-10 10:37:21 +01:00
Chris Wilson
edf6b76f64 drm/i915: Add smp_rmb() to busy ioctl's RCU dance
In the debate as to whether the second read of active->request is
ordered after the dependent reads of the first read of active->request,
just give in and throw a smp_rmb() in there so that ordering of loads is
assured.

v2: Explain the manual smp_rmb()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470731014-6894-1-git-send-email-chris@chris-wilson.co.uk
2016-08-09 10:17:26 +01:00
Chris Wilson
87b723a16d drm/i915: Don't check for idleness before retiring after a GPU hang
When we force the cleanup after a GPU hang, we want to retire all
requests, or else we may leak them if truly wedged (and the GPU never
advances again). Converting to the active request helpers had the issue
of doing the check against busyness before reporting the request, so if
we claim the GPU had hung but this engine hadn't we could potential skip
the request cleanup - triggering the self-check BUG.

Fixes: dcff85c844 ("drm/i915: Enable i915_gem_wait_for_idle() ...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470728222-10243-3-git-send-email-chris@chris-wilson.co.uk
2016-08-09 09:11:59 +01:00
Chris Wilson
3e510a8e65 drm/i915: Repack fence tiling mode and stride into a single integer
In the previous commit, we moved the obj->tiling_mode out of a bitfield
and into its own integer so that we could safely use READ_ONCE(). Let us
now repair some of that damage by sharing the tiling_mode with its
companion, the fence stride.

v2: New magic

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-18-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:43 +01:00
Chris Wilson
e883d73503 drm/i915: Remove pinned check from madvise ioctl
We don't need to incur the overhead of checking whether the object is
pinned prior to changing its madvise. If the object is pinned, the
madvise will not take effect until it is unpinned and so we cannot free
the pages being pointed at by hardware. Marking a pinned object with
allocated pages as DONTNEED will not trigger any undue warnings. The check
is therefore superfluous, and by removing it we can remove a linear walk
over all the vma the object has.

Still despite it being an overzealous check, that error code is part of
the current ABI and so we must proceed with caution.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-15-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:41 +01:00
Chris Wilson
c21724cc4d drm/i915: Reduce locking inside swfinish ioctl
We only need to take the struct_mutex if the object is pinned to the
display engine and so requires checking for clflush. (The race with
userspace pinning the object to a framebuffer is irrelevant.)

v2: Use access once for compiler hints (or not as it is a bitfield)
v3: READ_ONCE, obj->pin_display is not a bitfield anymore
v4: Don't be creative with goto.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-14-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:41 +01:00
Chris Wilson
3fdc13c7a3 drm/i915: Remove (struct_mutex) locking for busy-ioctl
By applying the same logic as for wait-ioctl, we can query whether a
request has completed without holding struct_mutex. The biggest impact
system-wide is removing the flush_active and the contention that causes.

Testcase: igt/gem_busy
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-13-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:40 +01:00
Chris Wilson
033d549b81 drm/i915: Remove (struct_mutex) locking for wait-ioctl
With a bit of care (and leniency) we can iterate over the object and
wait for previous rendering to complete with judicial use of atomic
reference counting. The ABI requires us to ensure that an active object
is eventually flushed (like the busy-ioctl) which is guaranteed by our
management of requests (i.e. everything that is submitted to hardware is
flushed in the same request). All we have to do is ensure that we can
detect when the requests are complete for reporting when the object is
idle (without triggering ETIME), locklessly - this is handled by
i915_gem_active_wait_unlocked().

The impact of this is actually quite small - the return to userspace
following the wait was already lockless and so we don't see much gain in
latency improvement upon completing the wait. What we do achieve here is
completing an already finished wait without hitting the struct_mutex,
our hold is quite short and so we are typically just a victim of
contention rather than a cause - but it is still one less contention
point!

v2: Break up a long line.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-12-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:40 +01:00
Chris Wilson
258a5edee0 drm/i915: Do a nonblocking wait first in pread/pwrite
If we try and read or write to an active request, we first must wait
upon the GPU completing that request. Let's do that without holding the
mutex (and so allow someone else to access the GPU whilst we wait). Upon
completion, we will acquire the mutex and only then start the operation
(i.e. we do not rely on state from before the initial wait).

v2: Repaint the goto labels
v3: Move the tracepoints back to the start of the ioctls

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-11-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:39 +01:00
Chris Wilson
f3f6184c5f drm/i915: Tidy generation of the GTT mmap offset
If we make the observation that mmap-offsets are only released when we
free an object, we can then deduce that the shrinker only creates free
space in the mmap arena indirectly by flushing the request list and
freeing expired objects. If we combine this with the lockless
vma-manager and lockless idling, we can avoid taking our big struct_mutex
until we need to actually free the requests.

One side-effect is that we defer the madvise checking until we need the
pages (i.e. the fault handler). This brings us into line with the other
delayed checks (and madvise in general).

v2: s/ret/err/ and use if (!err) rather than if (ret == 0)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-9-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:38 +01:00
Chris Wilson
dcff85c844 drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex
The principal motivation for this was to try and eliminate the
struct_mutex from i915_gem_suspend - but we still need to hold the mutex
current for the i915_gem_context_lost(). (The issue there is that there
may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and
struct_mutex via the stop_machine().) For the moment, enabling last
request tracking for the engine, allows us to do busyness checking and
waiting without requiring the struct_mutex - which is useful in its own
right.

As a side-effect of having a robust means for tracking engine busyness,
we can replace our other busyness heuristic, that of comparing against
the last submitted seqno. For paranoid reasons, we have a semi-ordered
check of that seqno inside the hangchecker, which we can now improve to
an ordered check of the engine's busyness (removing a locked xchg in the
process).

v2: Pass along "bool interruptible" as being unlocked we cannot rely on
i915->mm.interruptible being stable or even under our control.
v3: Replace check Ironlake i915_gpu_busy() with the common precalculated value

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-6-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:37 +01:00
Chris Wilson
90f4fcd56b drm/i915: Remove forced stop ring on suspend/unload
Before suspending (or unloading), we would first wait upon all rendering
to be completed and then disable the rings. This later step is a remanent
from DRI1 days when we did not use request tracking for all operations
upon the ring. Now that we are sure we are waiting upon the very last
operation by the engine, we can forgo clobbering the ring registers,
though we do keep the assert that the engine is indeed idle before
sleeping.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-5-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:36 +01:00
Chris Wilson
b8f9096d6a drm/i915: Convert non-blocking waits for requests over to using RCU
We can completely avoid taking the struct_mutex around the non-blocking
waits by switching over to the RCU request management (trading the mutex
for a RCU read lock and some complex atomic operations). The improvement
is that we gain further contention reduction, and overall the code
become simpler due to the reduced mutex dancing.

v2: Move i915_gem_fault tracepoint back to the start of the function,
before the unlocked wait.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-2-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:34 +01:00
Chris Wilson
0eafec6d32 drm/i915: Enable lockless lookup of request tracking via RCU
If we enable RCU for the requests (providing a grace period where we can
inspect a "dead" request before it is freed), we can allow callers to
carefully perform lockless lookup of an active request.

However, by enabling deferred freeing of requests, we can potentially
hog a lot of memory when dealing with tens of thousands of requests per
second - with a quick insertion of a synchronize_rcu() inside our
shrinker callback, that issue disappears.

v2: Currently, it is our responsibility to handle reclaim i.e. to avoid
hogging memory with the delayed slab frees. At the moment, we wait for a
grace period in the shrinker, and block for all RCU callbacks on oom.
Suggested alternatives focus on flushing our RCU callback when we have a
certain number of outstanding request frees, and blocking on that flush
after a second high watermark. (So rather than wait for the system to
run out of memory, we stop issuing requests - both are nondeterministic.)

Paul E. McKenney wrote:

Another approach is synchronize_rcu() after some largish number of
requests.  The advantage of this approach is that it throttles the
production of callbacks at the source.  The corresponding disadvantage
is that it slows things up.

Another approach is to use call_rcu(), but if the previous call_rcu()
is still in flight, block waiting for it.  Yet another approach is
the get_state_synchronize_rcu() / cond_synchronize_rcu() pair.  The
idea is to do something like this:

        cond_synchronize_rcu(cookie);
        cookie = get_state_synchronize_rcu();

You would of course do an initial get_state_synchronize_rcu() to
get things going.  This would not block unless there was less than
one grace period's worth of time between invocations.  But this
assumes a busy system, where there is almost always a grace period
in flight.  But you can make that happen as follows:

        cond_synchronize_rcu(cookie);
        cookie = get_state_synchronize_rcu();
        call_rcu(&my_rcu_head, noop_function);

Note that you need additional code to make sure that the old callback
has completed before doing a new one.  Setting and clearing a flag
with appropriate memory ordering control suffices (e.g,. smp_load_acquire()
and smp_store_release()).

v3: More comments on compiler and processor order of operations within
the RCU lookup and discover we can use rcu_access_pointer() here instead.

v4: Wrap i915_gem_active_get_rcu() to take the rcu_read_lock itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-25-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:20:05 +01:00
Chris Wilson
00e60f2659 drm/i915: Move i915_gem_object_wait_rendering()
Just move it earlier so that we can use the companion nonblocking
version in a couple of more callsites without having to add a forward
declaration.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-24-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:20:05 +01:00
Chris Wilson
573adb3962 drm/i915: Move obj->active:5 to obj->flags
We are motivated to avoid using a bitfield for obj->active for a couple
of reasons. Firstly, we wish to document our lockless read of obj->active
using READ_ONCE inside i915_gem_busy_ioctl() and that requires an
integral type (i.e. not a bitfield). Secondly, gcc produces abysmal code
when presented with a bitfield and that shows up high on the profiles of
request tracking (mainly due to excess memory traffic as it converts
the bitfield to a register and back and generates frequent AGI in the
process).

v2: BIT, break up a long line in compute the other engines, new paint
for i915_gem_object_is_active (now i915_gem_object_get_active).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-23-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:20:04 +01:00
Chris Wilson
faf5bf0ad6 drm/i915: Use atomics to manipulate obj->frontbuffer_bits
The individual bits inside obj->frontbuffer_bits are protected by each
plane->mutex, but the whole bitfield may be accessed by multiple KMS
operations simultaneously and so the RMW need to be under atomics.
However, for updating the single field we do not need to mandate that it
be under the struct_mutex, one more step towards its removal as the de
facto BKL.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-21-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:20:03 +01:00
Chris Wilson
b5add9591c drm/i915: Make fb_tracking.lock a spinlock
We only need a very lightweight mechanism here as the locking is only
used for co-ordinating a bitfield.

v2: Move the cheap unlikely tests into the caller
v3: Move the kerneldoc into the header (now separated out into
intel_fronbuffer.h for better kerneldoc and readability)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtien <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-20-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:20:02 +01:00
Chris Wilson
5d723d7afd drm/i915: Separate intel_frontbuffer into its own header
In view of adding inline functions into the intel_frontbuffer section,
we first split the header into its own file so that we can integrate it
more easily with kerneldoc.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-19-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:20:01 +01:00
Chris Wilson
de895082f7 drm/i915: Remove highly confusing i915_gem_obj_ggtt_pin()
Since i915_gem_obj_ggtt_pin() is an idiom breaking curry function for
i915_gem_object_ggtt_pin(), spare us the confusion and remove it.
Removing it now simplifies later patches to change the i915_vma_pin()
(and friends) interface.

v2: Add a redundant GEM_BUG_ON(!view) to
i915_gem_obj_lookup_or_create_ggtt_vma()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-18-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:20:00 +01:00
Chris Wilson
305bc234a8 drm/i915: Make i915_vma_pin() small and inline
Not only is i915_vma_pin() called for every single object on every single
execbuf, it is usually a simple increment as the VMA is already bound for
execution by the GPU. Rearrange the tests for unbound and pin_count
overflow so that we can do the increment and test very cheaply and
compact enough to inline the operation into execbuf. The trick used is
to note that we can check for an overflow bit (keeping space available
for it inside the flags) at the same time as checking the binding bits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-17-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:20:00 +01:00
Chris Wilson
3272db5313 drm/i915: Combine all i915_vma bitfields into a single set of flags
In preparation to perform some magic to speed up i915_vma_pin(), which
is among the hottest of hot paths in execbuf, refactor all the bitfields
accessed by i915_vma_pin() into a single unified set of flags.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-16-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:59 +01:00
Chris Wilson
59bfa1248e drm/i915: Start passing around i915_vma from execbuffer
During execbuffer we look up the i915_vma in order to reserve them in
the VM. However, we then do a double lookup of the vma in order to then
pin them, all because we lack the necessary interfaces to operate on
i915_vma - so introduce i915_vma_pin()!

v2: Tidy parameter lists to remove one level of redirection in the hot
path.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-15-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:58 +01:00
Chris Wilson
20dfbde463 drm/i915: Wrap vma->pin_count accessors with small inline helpers
In the next few patches, the VMA pinning API is overhauled and to reduce
the churn we pull out the update to the accessors into a prep patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-14-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:58 +01:00
Chris Wilson
de18003328 drm/i915: Record allocated vma size
Tracking the size of the VMA as allocated allows us to dramatically
reduce the complexity of later functions (like inserting the VMA in to
the drm_mm range manager).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-13-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:57 +01:00
Chris Wilson
a9f1481f41 drm/i915: Update i915_gem_get_ggtt_size/_alignment to use drm_i915_private
For consistency, internal functions should take drm_i915_private rather
than drm_device. Now that we are subclassing drm_device, there are no
more size wins, but being consistent is its own blessing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-12-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:56 +01:00
Chris Wilson
ad1a7d20a1 drm/i915: Update the GGTT size/alignment query functions
In order to be consistent with other address space functions, we want to
pass around 64-bit sizes, even though all known global GTT are limited
to 4GiB. Similarly, we are trying to be consistent in using the _ggtt_
nomenclature when referring to the special global GTT.

v2: Update docs to consistently state "global GTT".

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-11-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:55 +01:00
Chris Wilson
954c469121 drm/i915: Convert 4096 alignment request to 0 for drm_mm allocations
As we always allocate in chunks of 4096 (that being both the PAGE_SIZE
and our own GTT_PAGE_SIZE), we know that all results from the drm_mm are
aligned to at least 4096. The drm_mm allocator itself is optimised for
alignment == 0, and so by converting alignments of 4096 to 0 we can
satisfy our own requirements and still hit the faster path.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-10-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:55 +01:00
Chris Wilson
3b16525cc4 drm/i915: Split insertion/binding of an object into the VM
Split the insertion into the address space's range manager and binding
of that object into the GTT to simplify the code flow when pinning a
VMA.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-9-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:54 +01:00
Chris Wilson
3750858990 drm/i915: Reduce WARN(i915_gem_valid_gtt_space) to a debug-only check
i915_gem_valid_gtt_space() is used after inserting the VMA to double
check the list - the location should have been chosen to pass all the
restrictions.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-8-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:53 +01:00
Chris Wilson
91b2db6f65 drm/i915: Pad GTT views of exec objects up to user specified size
Our GPUs impose certain requirements upon buffers that depend upon how
exactly they are used. Typically this is expressed as that they require
a larger surface than would be naively computed by pitch * height.
Normally such requirements are hidden away in the userspace driver, but
when we accept pointers from strangers and later impose extra conditions
on them, the original client allocator has no idea about the
monstrosities in the GPU and we require the userspace driver to inform
the kernel how many padding pages are required beyond the client
allocation.

v2: Long time, no see
v3: Try an anonymous union for uapi struct compatibility

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-7-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:53 +01:00
Chris Wilson
2ffffd0f85 drm/i915: Fix up vma alignment to be u64
This is not the full fix, as we are required to percolate the u64 nature
down through the drm_mm stack, but this is required now to prevent
explosions due to mismatch between execbuf (eb_vma_misplaced) and vma
binding (i915_vma_misplaced) - and reduces the risk of spurious changes
as we adjust the vma interface in the next patches.

v2: long long casts not required for u64 printk (%llx)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-6-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:52 +01:00
Chris Wilson
e522ac2324 drm/i915: Remove surplus drm_device parameter to i915_gem_evict_something()
Eviction is VM local, so we can ignore the significance of the
drm_device in the caller, and leave it to i915_gem_evict_something() to
manage itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-2-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:50 +01:00
Chris Wilson
1dd5b6f202 drm/i915: Add missing rpm wakelock to GGTT pread
Joonas spotted a discrepancy between the pwrite and pread ioctls, in
that pwrite takes the rpm wakelock around its GGTT access, The wakelock
is required in order for the GTT to function. In disregard for the
current convention, we take the rpm wakelock around the access itself
rather than around the struct_mutex as the nesting is not strictly
required and such ordering will one day be fixed by explicitly noting
the barrier dependencies between the GGTT and rpm.

Fixes: b50a53715f ("drm/i915: Support for pread/pwrite ...")
Reported-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: drm-intel-fixes@lists.freedesktop.org
Link: http://patchwork.freedesktop.org/patch/msgid/1470298193-21765-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-08-04 11:18:22 +01:00
Chris Wilson
df0e9a287d Revert "drm/i915: Clean up associated VMAs on context destruction"
This reverts commit e9f24d5fb7.

The patch was only a stop-gap measure that fixed half the problem - the
leak of the fbcon when restarting X. A complete solution required
releasing the VMA when the object itself was closed rather than rely on
file/process exit. The previous patches add the VMA tracking necessary
to do close them along with the object, context or file, and so the time
has come to remove the partial fix.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-28-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:34 +01:00
Chris Wilson
50e046b6a0 drm/i915: Mark the context and address space as closed
When the user closes the context mark it and the dependent address space
as closed. As we use an asynchronous destruct method, this has two
purposes.  First it allows us to flag the closed context and detect
internal errors if we to create any new objects for it (as it is removed
from the user's namespace, these should be internal bugs only). And
secondly, it allows us to immediately reap stale vma.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-27-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:33 +01:00
Chris Wilson
b1f788c6ac drm/i915: Release vma when the handle is closed
In order to prevent a leak of the vma on shared objects, we need to
hook into the object_close callback to destroy the vma on the object for
this file. However, if we destroyed that vma immediately we may cause
unexpected application stalls as we try to unbind a busy vma - hence we
defer the unbind to when we retire the vma.

v2: Keep vma allocated until closed. This is useful for a later
optimisation, but it is required now in order to handle potential
recursion of i915_vma_unbind() by retiring itself.
v3: Comments are important.

Testcase: igt/gem_ppggtt/flink-and-close-vma-leak
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-26-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:33 +01:00
Chris Wilson
b0decaf75b drm/i915: Track active vma requests
Hook the vma itself into the i915_gem_request_retire() so that we can
accurately track when a solitary vma is inactive (as opposed to having
to wait for the entire object to be idle). This improves the interaction
when using multiple contexts (with full-ppgtt) and eliminates some
frequent list walking when retiring objects after a completed request.

A side-effect is that we get an active vma reference for free. The
consequence of this is shown in the next patch...

v2: Update inline names to be consistent with
i915_gem_object_get_active()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-25-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:32 +01:00
Chris Wilson
5cf3d28098 drm/i915: i915_vma_move_to_active prep patch
This patch is broken out of the next just to remove the code motion from
that patch and make it more readable. What we do here is move the
i915_vma_move_to_active() to i915_gem_execbuffer.c and put the three
stages (read, write, fenced) together so that future modifications to
active handling are all located in the same spot. The importance of this
is so that we can more simply control the order in which the requests
are place in the retirement list (i.e. control the order at which we
retire and so control the lifetimes to avoid having to hold onto
references).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-24-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:31 +01:00
Chris Wilson
4b8de8e68a drm/i915: Move request list retirement to i915_gem_request.c
As the list retirement is now clean of implementation details, we can
move it closer to the request management.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-23-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:30 +01:00
Chris Wilson
776f32364d drm/i915: s/__i915_wait_request/i915_wait_request/
There is only one wait on request function now, so drop the "expert"
indication of leading __.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-21-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:29 +01:00
Chris Wilson
fa545cbf97 drm/i915: Refactor activity tracking for requests
With the introduction of requests, we amplified the number of atomic
refcounted objects we use and update every execbuffer; from none to
several references, and a set of references that need to be changed. We
also introduced interesting side-effects in the order of retiring
requests and objects.

Instead of independently tracking the last request for an object, track
the active objects for each request. The object will reside in the
buffer list of its most recent active request and so we reduce the kref
interchange to a list_move. Now retirements are entirely driven by the
request, dramatically simplifying activity tracking on the object
themselves, and removing the ambiguity between retiring objects and
retiring requests.

Furthermore with the consolidation of managing the activity tracking
centrally, we can look forward to using RCU to enable lockless lookup of
the current active requests for an object. In the future, we will be
able to query the status or wait upon rendering to an object without
even touching the struct_mutex BKL.

All told, less code, simpler and faster, and more extensible.

v2: Add a typedef for the function pointer for convenience later.
v3: Make the noop retirement callback explicit. Allow passing NULL to
the init_request_active() which is expanded to a common noop function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-16-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:25 +01:00
Chris Wilson
21c310f2f9 drm/i915: Remove obsolete i915_gem_object_flush_active()
Since we track requests, and requests are always added to the GPU fully
formed, we never have to flush the incomplete request and know that the
given request will eventually complete without any further action on our
part.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-15-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:24 +01:00
Chris Wilson
efdf7c0605 drm/i915: Rename request->list to link for consistency
We use "list" to denote the list and "link" to denote an element on that
list. Rename request->list to match this idiom.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-14-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:23 +01:00
Chris Wilson
8cac6f6c41 drm/i915: Refactor blocking waits
Tidy up the for loops that handle waiting for read/write vs read-only
access.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-13-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:23 +01:00
Chris Wilson
d72d908b56 drm/i915: Mark up i915_gem_active for locking annotation
The future annotations will track the locking used for access to ensure
that it is always sufficient. We make the preparations now to present
the API ahead and to make sure that GCC can eliminate the unused
parameter.

Before:	6298417 3619610  696320 10614347         a1f64b vmlinux
After:	6298417 3619610  696320 10614347         a1f64b vmlinux
(with i915 builtin)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-12-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:22 +01:00
Chris Wilson
27c01aaef0 drm/i915: Prepare i915_gem_active for annotations
In the future, we will want to add annotations to the i915_gem_active
struct. The API is thus expanded to hide direct access to the contents
of i915_gem_active and mediated instead through a number of helpers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-11-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:21 +01:00
Chris Wilson
381f371b25 drm/i915: Introduce i915_gem_active for request tracking
In the next patch, request tracking is made more generic and for that we
need a new expanded struct and to separate out the logic changes from
the mechanical churn, we split out the structure renaming into this
patch.

v2: Writer's block. Add some spiel about why we track requests.
v3: Now i915_gem_active.
v4: Now with i915_gem_active_set() for attaching to the active request.
v5: Use i915_gem_active_set() from inside the retirement handlers

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-10-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:20 +01:00
Chris Wilson
4717ca9eec drm/i915: Kill drop_pages()
The drop_pages() function is a dangerous trap in that it can release the
passed in object pointer and so unless the caller is aware, it can
easily trick us into using the stale object afterwards. Move it into its
solitary callsite where we know it is safe.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-9-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:19 +01:00
Chris Wilson
aa653a685d drm/i915: Be more careful when unbinding vma
When we call i915_vma_unbind(), we will wait upon outstanding rendering.
This will also trigger a retirement phase, which may update the object
lists. If, we extend request tracking to the VMA itself (rather than
keep it at the encompassing object), then there is a potential that the
obj->vma_list be modified for other elements upon i915_vma_unbind(). As
a result, if we walk over the object list and call i915_vma_unbind(), we
need to be prepared for that list to change.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-8-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:18 +01:00
Chris Wilson
15717de219 drm/i915: Count how many VMA are bound for an object
Since we may have VMA allocated for an object, but we interrupted their
binding, there is a disparity between have elements on the obj->vma_list
and being bound. i915_gem_obj_bound_any() does this check, but this is
not rigorously observed - add an explicit count to make it easier.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-7-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:17 +01:00
Chris Wilson
f6b9d5cabd drm/i915: Split early global GTT initialisation
Initialising the global GTT is tricky as we wish to use the drm_mm range
manager during the modesetting initialisation (to capture stolen
allocations from the BIOS) before we actually enable GEM. To overcome
this, we currently setup the drm_mm first and then carefully rebind
them.

v2: Fixup after rebasing
v3: GGTT initialisation needs to be split around kicking out conflicts
v4: Restore an old UMS BUG_ON(mappable > total) as a DRM_ERROR plus
fixup of probe results.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-4-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:15 +01:00
Chris Wilson
97d6d7ab68 drm/i915: Update GGTT initialisation functions to take drm_i915_private
Since these are internal functions they operate on drm_i915_private and
not the drm_device being passed in. So pass in the drm_i915_private
instead, and remove one layer of dancing. No space wins here, just
conforming to the norm in function parameters.

v2: Include all the probe functions

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-3-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:14 +01:00
Chris Wilson
ddf07be7a2 drm/i915: Simplify calling engine->sync_to
Since requests can no longer be generated as a side-effect of
intel_ring_begin(), we know that the seqno will be unchanged during
ring-emission. This predicatablity then means we do not have to check
for the seqno wrapping around whilst emitting the semaphore for
engine->sync_to().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-31-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-22-git-send-email-chris@chris-wilson.co.uk
2016-08-02 22:58:32 +01:00
Chris Wilson
5b043f4e60 drm/i915: Unify legacy/execlists submit_execbuf callbacks
Now that emitting requests is identical between legacy and execlists, we
can use the same function to build up the ring for submitting to either
engine. (With the exception of i915_switch_contexts(), but in time that
will also be handled gracefully.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-30-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-21-git-send-email-chris@chris-wilson.co.uk
2016-08-02 22:58:31 +01:00
Chris Wilson
8e63717837 drm/i915: Simplify request_alloc by returning the allocated request
If is simpler and leads to more readable code through the callstack if
the allocation returns the allocated struct through the return value.

The importance of this is that it no longer looks like we accidentally
allocate requests as side-effect of calling certain functions.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-19-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-9-git-send-email-chris@chris-wilson.co.uk
2016-08-02 22:58:20 +01:00
Chris Wilson
7e37f889b5 drm/i915: Rename struct intel_ringbuffer to struct intel_ring
The state stored in this struct is not only the information about the
buffer object, but the ring used to communicate with the hardware. Using
buffer here is overly specific and, for me at least, conflates with the
notion of buffer objects themselves.

s/struct intel_ringbuffer/struct intel_ring/
s/enum intel_ring_hangcheck/enum intel_engine_hangcheck/
s/describe_ctx_ringbuf()/describe_ctx_ring()/
s/intel_ring_get_active_head()/intel_engine_get_active_head()/
s/intel_ring_sync_index()/intel_engine_sync_index()/
s/intel_ring_init_seqno()/intel_engine_init_seqno()/
s/ring_stuck()/engine_stuck()/
s/intel_cleanup_engine()/intel_engine_cleanup()/
s/intel_stop_engine()/intel_engine_stop()/
s/intel_pin_and_map_ringbuffer_obj()/intel_pin_and_map_ring()/
s/intel_unpin_ringbuffer()/intel_unpin_ring()/
s/intel_engine_create_ringbuffer()/intel_engine_create_ring()/
s/intel_ring_flush_all_caches()/intel_engine_flush_all_caches()/
s/intel_ring_invalidate_all_caches()/intel_engine_invalidate_all_caches()/
s/intel_ringbuffer_free()/intel_ring_free()/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-15-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-4-git-send-email-chris@chris-wilson.co.uk
2016-08-02 22:58:16 +01:00
Linus Torvalds
731c7d3a20 Merge tag 'drm-for-v4.8' of git://people.freedesktop.org/~airlied/linux
Merge drm updates from Dave Airlie:
 "This is the main drm pull request for 4.8.

  I'm down with a cold at the moment so hopefully this isn't in too bad
  a state, I finished pulling stuff last week mostly (nouveau fixes just
  went in today), so only this message should be influenced by illness.
  Apologies to anyone who's major feature I missed :-)

  Core:
        Lockless GEM BO freeing
        Non-blocking atomic work
        Documentation changes (rst/sphinx)
        Prep for new fencing changes
        Simple display helpers
        Master/auth changes
        Register/unregister rework
        Loads of trivial patches/fixes.

  New stuff:
        ARM Mali display driver (not the 3D chip)
        sii902x RGB->HDMI bridge

  Panel:
        Support for new panels
        Improved backlight support

  Bridge:
        Convert ADV7511 to bridge driver
        ADV7533 support
        TC358767 (DSI/DPI to eDP) encoder chip support

  i915:
        BXT support enabled by default
        GVT-g infrastructure
        GuC command submission and fixes
        BXT workarounds
        SKL/BKL workarounds
        Demidlayering device registration
        Thundering herd fixes
        Missing pci ids
        Atomic updates

  amdgpu/radeon:
        ATPX improvements for better dGPU power control on PX systems
        New power features for CZ/BR/ST
        Pipelined BO moves and evictions in TTM
        GPU scheduler improvements
        GPU reset improvements
        Overclocking on dGPUs with amdgpu
        Polaris powermanagement enabled

  nouveau:
        GK20A/GM20B volt and clock improvements.
        Initial support for GP100/GP104 GPUs, GP104 will not yet support
        acceleration due to NVIDIA having not released firmware for them as of yet.

  exynos:
        Exynos5433 SoC with IOMMU support.

  vc4:
        Shader validation for branching

  imx-drm:
        Atomic mode setting conversion
        Reworked DMFC FIFO allocation
        External bridge support

  analogix-dp:
        RK3399 eDP support
        Lots of fixes.

  rockchip:
        Lots of small fixes.

  msm:
        DT bindings cleanups
        Shrinker and madvise support
        ASoC HDMI codec support

  tegra:
        Host1x driver cleanups
        SOR reworking for DP support
        Runtime PM support

  omapdrm:
        PLL enhancements
        Header refactoring
        Gamma table support

  arcgpu:
        Simulator support

  virtio-gpu:
        Atomic modesetting fixes.

  rcar-du:
        Misc fixes.

  mediatek:
        MT8173 HDMI support

  sti:
        ASOC HDMI codec support
        Minor fixes

  fsl-dcu:
        Suspend/resume support
        Bridge support

  amdkfd:
        Minor fixes.

  etnaviv:
        Enable GPU clock gating

  hisilicon:
        Vblank and other fixes"

* tag 'drm-for-v4.8' of git://people.freedesktop.org/~airlied/linux: (1575 commits)
  drm/nouveau/gr/nv3x: fix instobj write offsets in gr setup
  drm/nouveau/acpi: fix lockup with PCIe runtime PM
  drm/nouveau/acpi: check for function 0x1B before using it
  drm/nouveau/acpi: return supported DSM functions
  drm/nouveau/acpi: ensure matching ACPI handle and supported functions
  drm/nouveau/fbcon: fix font width not divisible by 8
  drm/amd/powerplay: remove enable_clock_power_gatings_tasks from initialize and resume events
  drm/amd/powerplay: move clockgating to after ungating power in pp for uvd/vce
  drm/amdgpu: add query device id and revision id into system info entry at CGS
  drm/amdgpu: add new definition in bif header
  drm/amd/powerplay: rename smum header guards
  drm/amdgpu: enable UVD context buffer for older HW
  drm/amdgpu: fix default UVD context size
  drm/amdgpu: fix incorrect type of info_id
  drm/amdgpu: make amdgpu_cgs_call_acpi_method as static
  drm/amdgpu: comment out unused defaults_staturn_pro static const structure to fix the build
  drm/amdgpu: enable UVD VM only on polaris
  drm/amdgpu: increase timeout of IB test
  drm/amdgpu: add destroy session when generate VCE destroy msg.
  drm/amd: fix deadlock of job_list_lock V2
  ...
2016-08-01 21:44:08 -04:00
Chris Wilson
7e21d6484d drm/i915: Remove stray intel_engine_cs ring identifiers from i915_gem.c
A few places we use ring when referring to the struct intel_engine_cs. An
anachronism we are pruning out.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-9-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469606850-28659-4-git-send-email-chris@chris-wilson.co.uk
2016-07-27 16:23:05 +01:00
Chris Wilson
c80ff16e11 drm/i915: Use engine to refer to the user's BSD intel_engine_cs
This patch transitions the execbuf engine selection away from using the
ring nomenclature - though we still refer to the user's incoming
selector as their user_ring_id.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-7-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469606850-28659-2-git-send-email-chris@chris-wilson.co.uk
2016-07-27 16:23:05 +01:00
Chris Wilson
15f7bbc735 drm/i915: Only clear the client pointer when tearing down the file
Upon release of the file (i.e. the user calls close(fd)), we decouple
all objects from the client list so that we don't chase the dangling
file_priv. As we always inspect file_priv first, we only need to nullify
that pointer and can safely ignore the list_head.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-4-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469530913-17180-3-git-send-email-chris@chris-wilson.co.uk
2016-07-26 13:00:59 +01:00
Chris Wilson
2529d57050 drm/i915: Drop racy markup of missed-irqs from idle-worker
During the idle-worker we disable the hangcheck and so kick any waiters
that should have been completed (since the GPU is now idle). Unlike the
hangcheck, we do not take any care to avoid the race between the irq
handler and ourselves, and so it is possible for us to declare a missed
interrupt even as the bottom-half is being scheduled to run. Let's
ignore this race to stop a potential false-positive error.

References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469351421-13493-1-git-send-email-chris@chris-wilson.co.uk
2016-07-24 10:57:19 +01:00
Chris Wilson
54b4f68f18 Revert "drm/i915: Enable RC6 immediately"
This reverts commit b12e0ee208 ("drm/i915: Enable RC6 immediately"),
as it was never meant to be sent anywhere other than the bug report for
experimentation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1469132179-4052-1-git-send-email-chris@chris-wilson.co.uk
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-07-21 21:43:19 +01:00
Chris Wilson
b12e0ee208 drm/i915: Enable RC6 immediately
Now that PCU communication is reasonably fast, we do not need to defer
RC6 initialisation to a workqueue.

References: https://bugs.freedesktop.org/show_bug.cgi?id=97017
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-07-21 18:30:22 +01:00
Chris Wilson
39df91905d drm/i915: Convert i915_semaphores_is_enabled over to early sanitize
Rather than recomputing whether semaphores are enabled, we can do that
computation once during early initialisation as the i915.semaphores
module parameter is now read-only.

s/i915_semaphores_is_enabled/i915.semaphores/

v2: Add the state to the debug dmesg as well

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-10-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-9-git-send-email-chris@chris-wilson.co.uk
2016-07-20 13:40:43 +01:00
Chris Wilson
34911fd30c drm/i915: Rename drm_gem_object_unreference_unlocked in preparation for lockless free
Whilst this ultimately wraps kref_put_mutex(), our goal here is the
lockless variant, so keep the _unlocked() suffix until we need it no
more.

s/drm_gem_object_unreference_unlocked/i915_gem_object_put_unlocked/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-7-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-6-git-send-email-chris@chris-wilson.co.uk
2016-07-20 13:40:13 +01:00
Chris Wilson
f8c417cdb1 drm/i915: Rename drm_gem_object_unreference in preparation for lockless free
Ultimately wraps kref_put(), so adopt its nomenclature for consistency
with other subsystems.

s/drm_gem_object_unreference/i915_gem_object_put/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-6-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-5-git-send-email-chris@chris-wilson.co.uk
2016-07-20 13:40:12 +01:00
Chris Wilson
25dc556a2a drm/i915: Wrap drm_gem_object_reference in i915_gem_object_get
Ultimately wraps kref_get(), so adopt its nomenclature for consistency
with other subsystems.

s/drm_gem_object_reference/i915_gem_object_get/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-5-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-4-git-send-email-chris@chris-wilson.co.uk
2016-07-20 13:40:11 +01:00
Chris Wilson
03ac0642f6 drm/i915: Wrap drm_gem_object_lookup in i915_gem_object_lookup
For symmetry with a forthcoming i915_gem_object_get() and
i915_gem_object_put().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-4-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-3-git-send-email-chris@chris-wilson.co.uk
2016-07-20 13:40:10 +01:00
Chris Wilson
e8a261ea63 drm/i915: Rename request reference/unreference to get/put
Now that we derive requests from struct fence, swap over to its
nomenclature for references. It's shorter and more idiomatic across the
kernel.

s/i915_gem_request_reference/i915_gem_request_get/
s/i915_gem_request_unreference/i915_gem_request_put/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-2-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-1-git-send-email-chris@chris-wilson.co.uk
2016-07-20 13:40:09 +01:00
Chris Wilson
c13d87ea53 drm/i915: Wait on external rendering for GEM objects
When transitioning to the GTT or CPU domain we wait on all rendering
from i915 to complete (with the optimisation of allowing concurrent read
access by both the GPU and client). We don't yet ensure all rendering
from third parties (tracked by implicit fences on the dma-buf) is
complete. Since implicitly tracked rendering by third parties will
ignore our cache-domain tracking, we have to always wait upon rendering
from third-parties when transitioning to direct access to the backing
store. We still rely on clients notifying us of cache domain changes
(i.e. they need to move to the GTT read or write domain after doing a CPU
access before letting the third party render again).

v2:
This introduces a potential WARN_ON into i915_gem_object_free() as the
current i915_vma_unbind() calls i915_gem_object_wait_rendering(). To
hit this path we first need to render with the GPU, have a dma-buf
attached with an unsignaled fence and then interrupt the wait. It does
get fixed later in the series (when i915_vma_unbind() only waits on the
active VMA and not all, including third-party, rendering.

To offset that risk, use the __i915_vma_unbind_no_wait hack.

Testcase: igt/prime_vgem/basic-fence-read
Testcase: igt/prime_vgem/basic-fence-mmap
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-8-git-send-email-chris@chris-wilson.co.uk
2016-07-20 09:29:53 +01:00
Chris Wilson
197be2ae8b drm/i915: Disable waitboosting for mmioflips/semaphores
Since commit a6f766f397 ("drm/i915: Limit ring synchronisation (sw
sempahores) RPS boosts") and commit bcafc4e38b ("drm/i915: Limit mmio
flip RPS boosts") we have limited the waitboosting for semaphores and
flips. Ideally we do not want to boost in either of these instances as no
userspace consumer is waiting upon the results (though a userspace producer
may be stalled trying to submit an execbuf - but in this case the
producer is being throttled due to the engine being saturated with
work). With the introduction of NO_WAITBOOST in the previous patch, we
can finally disable these needless boosts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-6-git-send-email-chris@chris-wilson.co.uk
2016-07-20 09:29:53 +01:00
Chris Wilson
c4b0930bf4 drm/i915: Mark all current requests as complete before resetting them
Following a GPU reset upon hang, we retire all the requests and then
mark them all as complete. If we mark them as complete first, we both
keep the normal retirement order (completed first then retired) and
provide a small optimisation for concurrent lookups.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-3-git-send-email-chris@chris-wilson.co.uk
2016-07-20 09:29:53 +01:00
Chris Wilson
05235c5354 drm/i915: Move GEM request routines to i915_gem_request.c
Migrate the request operations out of the main body of i915_gem.c and
into their own C file for easier expansion.

v2: Move __i915_add_request() across as well

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-1-git-send-email-chris@chris-wilson.co.uk
2016-07-20 09:29:53 +01:00
Daniel Vetter
62f90b38f3 drm/i915: Update missing kerneldoc
Not sure why so much slips through when 0day is catching these. Hopefully
the much faster sphinx toolchain helps in unlazying people.

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1468612088-9721-10-git-send-email-daniel.vetter@ffwll.ch
2016-07-19 10:34:24 +02:00
Chris Wilson
d1054ee492 drm/i915: Handle ENOSPC after failing to insert a mappable node
Even after adding individual page support for GTT mmaping, we can still
fail to find any space within the mappable region, and
drm_mm_insert_node() will then report ENOSPC. We have to then handle
this error by using the shmem access to the pages.

Fixes: b50a53715f ("drm/i915: Support for pread/pwrite ... objects")
Testcase: igt/gem_concurrent_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com
Link: http://patchwork.freedesktop.org/patch/msgid/1468690956-23480-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-07-19 08:59:40 +01:00
Chris Wilson
5ab57c7020 drm/i915: Flush logical context image out to memory upon suspend
Before suspend, and especially before building the hibernation image, we
need to context image to be coherent in memory. To do this we require
that we perform a context switch to a disposable context (i.e. the
dev_priv->kernel_context) - when that switch is complete, all other
context images will be complete. This leaves the kernel_context image as
incomplete, but fortunately that is disposable and we can do a quick
fixup of the logical state after resuming.

v2: Share the nearly identical code to switch to the kernel context with
eviction.
v3: Explain why we need the switch and reset.

Testcase: igt/gem_exec_suspend # bsw
References: https://bugs.freedesktop.org/show_bug.cgi?id=96526
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468590980-6186-2-git-send-email-chris@chris-wilson.co.uk
2016-07-15 20:41:05 +01:00
Chris Wilson
b7137e0cf1 drm/i915: Defer enabling rc6 til after we submit the first batch/context
Some hardware requires a valid render context before it can initiate
rc6 power gating of the GPU; the default state of the GPU is not
sufficient and may lead to undefined behaviour. The first execution of
any batch will load the "golden render state", at which point it is safe
to enable rc6. As we do not forcibly load the kernel context at resume,
we have to hook into the batch submission to be sure that the render
state is setup before enabling rc6.

However, since we don't enable powersaving until that first batch, we
queued a delayed task in order to guarantee that the batch is indeed
submitted.

v2: Rearrange intel_disable_gt_powersave() to match.
v3: Apply user specified cur_freq (or idle_freq if not set).
v4: Give in, and supply a delayed work to autoenable rc6
v5: Mika suggested a couple of better names for delayed_resume_work
v6: Rebalance rpm_put around the autoenable task

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468397438-21226-7-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2016-07-14 15:24:21 +01:00
Chris Wilson
b913b33c43 drm/i915: Flush GT idle status upon reset
Upon resetting the GPU, we force the engines to be idle by clearing
their request lists. However, I neglected to clear the GT active status
and so the next request following the reset was not marking the device
as busy again. (We had to wait until any outstanding retire worker
finally ran and cleared the active status.)

Fixes: 67d97da349 ("drm/i915: Only start retire worker when idle")
Testcase: igt/pm_rps/reset
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468397438-21226-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-07-14 15:21:17 +01:00
Chris Wilson
5b58592530 drm/i915/breadcrumbs: Queue hangcheck before sleeping
Never go to sleep waiting on the GPU without first ensuring that we will
get woken up.

We have a choice of queuing the hangcheck before every schedule() or the
first time we wakeup. In order to simply accommodate both the signaler
and the ordinary waiter, move the queuing to the common point of
enabling the irq. We lose the paranoid safety of ensuring that the
hangcheck is active before the sleep, but avoid code duplication (and
redundant hangcheck queuing).

Testcase: igt/prime_busy
Fixes: c81d46138d ("drm/i915: Convert trace-irq to the breadcrumb waiter")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468055535-19740-2-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
(cherry picked from commit 232af392fd)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-07-14 15:48:32 +02:00
Matthew Auld
3fef3a5be3 drm/i915: remove superfluous i915_gem_object_free_mmap_offset call
This should already be handled by drm_gem_object_release, which is
called later on.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467720019-31876-1-git-send-email-matthew.auld@intel.com
2016-07-14 16:11:55 +03:00
Tvrtko Ursulin
8b3e2d3639 drm/i915: Unify engine init loop
With the unified common engine setup done, and the execlist engine
initialization loop clearly split into two phases, we can eliminate
the separate legacy engine initialization code.

v2: Fix cleanup path for legacy.
v3: Rename constructors. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris-wilson.co.uk>
2016-07-14 11:17:06 +01:00
Chris Wilson
c961561382 drm/i915: Kick hangcheck from retire worker
Let's ensure that we cannot run indefinitely without the hangcheck
worker being queued. We removed it from being kicked on every request
because we were kicking it a few millions times in every hangcheck
interval and only once is necessary! However, that leaves us with the
issue of what if userspace never waits for a request, or runs out of
resources, what if userspace just issues a request then spins on
BUSY_IOCTL?

Testcase: igt/gem_busy
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468055535-19740-3-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2016-07-11 13:48:42 +01:00
Chris Wilson
232af392fd drm/i915/breadcrumbs: Queue hangcheck before sleeping
Never go to sleep waiting on the GPU without first ensuring that we will
get woken up.

We have a choice of queuing the hangcheck before every schedule() or the
first time we wakeup. In order to simply accommodate both the signaler
and the ordinary waiter, move the queuing to the common point of
enabling the irq. We lose the paranoid safety of ensuring that the
hangcheck is active before the sleep, but avoid code duplication (and
redundant hangcheck queuing).

Testcase: igt/prime_busy
Fixes: c81d46138d ("drm/i915: Convert trace-irq to the breadcrumb waiter")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468055535-19740-2-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2016-07-11 13:48:22 +01:00
Chris Wilson
91c8a326a1 drm/i915: Convert dev_priv->dev backpointers to dev_priv->drm
Since drm_i915_private is now a subclass of drm_device we do not need to
chase the drm_i915_private->dev backpointer and can instead simply
access drm_i915_private->drm directly.

   text	   data	    bss	    dec	    hex	filename
1068757	   4565	    416	1073738	 10624a	drivers/gpu/drm/i915/i915.ko
1066949	   4565	    416	1071930	 105b3a	drivers/gpu/drm/i915/i915.ko

Created by the coccinelle script:
@@
struct drm_i915_private *d;
identifier i;
@@
(
- d->dev->i
+ d->drm.i
|
- d->dev
+ &d->drm
)

and for good measure the dev_priv->dev backpointer was removed entirely.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467711623-2905-4-git-send-email-chris@chris-wilson.co.uk
2016-07-05 11:58:45 +01:00
Chris Wilson
b1379d4964 drm/i915: Replace lockless_dereference(bool) with READ_ONCE()
After Joonas complained about using READ_ONCE() on the only use of the
variable in the function, where the intent was to simply document that
the read was intentionally racy and unlocked, I switched the READ_ONCE()
over to lockless_dereference(). However, in linux-next that has a
stronger type-check to only allow pointers and is no longer
interchangeable with READ_ONCE(), see commit 331b6d8c7a
("locking/barriers: Validate lockless_dereference() is used on a pointer
type")

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 67d97da349 ("drm/i915: Only start retire worker when idle")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1467705276-707-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2016-07-05 10:58:07 +01:00
Dave Gordon
f19ec8cb5a drm/i915: convert a few more E->dev_private to to_i915(E)
Also remove some redundant dev and dev_priv locals

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1467626365-29871-1-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1467628477-25379-2-git-send-email-chris@chris-wilson.co.uk
2016-07-04 12:54:44 +01:00
Chris Wilson
fac5e23e3c drm/i915: Mass convert dev->dev_private to to_i915(dev)
Since we now subclass struct drm_device, we can save pointer dances by
noting the equivalence of struct drm_device and struct drm_i915_private,
i.e. by using to_i915().

   text    data     bss     dec     hex filename
1073824    4562     416 1078802  107612 drivers/gpu/drm/i915/i915.ko
1068976    4562     416 1073954  106322 drivers/gpu/drm/i915/i915.ko

Created by the coccinelle script:

@@
expression E;
identifier p;
@@
- struct drm_i915_private *p = E->dev_private;
+ struct drm_i915_private *p = to_i915(E);

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467628477-25379-1-git-send-email-chris@chris-wilson.co.uk
2016-07-04 12:54:07 +01:00
Chris Wilson
7b4d3a16dd drm/i915: Remove stop-rings debugfs interface
Now that we have (near) universal GPU recovery code, we can inject a
real hang from userspace and not need any fakery. Not only does this
mean that the testing is far more realistic, but we can simplify the
kernel in the process.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-7-git-send-email-chris@chris-wilson.co.uk
2016-07-04 08:18:23 +01:00
Chris Wilson
df4ba5099f drm/i915: Add background commentary to "waitboosting"
Describe the intent of boosting the GPU frequency to maximum before
waiting on the GPU.

RPS waitboosting was introduced with commit b29c19b645 ("drm/i915:
Boost RPS frequency for CPU stalls") but lacked a concise comment in the
code to explain itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-5-git-send-email-chris@chris-wilson.co.uk
2016-07-04 08:18:22 +01:00
Chris Wilson
0e6883b043 drm/i915: Restore waitboost credit to the synchronous waiter
Ideally, we want to automagically have the GPU respond to the
instantaneous load by reclocking itself. However, reclocking occurs
relatively slowly, and to the client waiting for a result from the GPU,
too late. To compensate and reduce the client latency, we allow the
first wait from a client to boost the GPU clocks to maximum. This
overcomes the lag in autoreclocking, at the expense of forcing the GPU
clocks too high. So to offset the excessive power usage, we currently
allow a client to only boost the clocks once before we detect the GPU
is idle again. This works reasonably for say the first frame in a
benchmark, but for many more synchronous workloads (like OpenCL) we find
the GPU clocks remain too low. By noting a wait which would idle the GPU
(i.e. we just waited upon the last known request), we can give that
client the idle boost credit (for their next wait) without the 100ms
delay required for us to detect the GPU idle state. The intention is to
boost clients that are stalling in the process of feeding the GPU more
work (and who in doing so let the GPU idle), without granting boost
credits to clients that are throttling themselves (such as compositors).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Zou, Nanhai" <nanhai.zou@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-4-git-send-email-chris@chris-wilson.co.uk
2016-07-04 08:18:21 +01:00
Chris Wilson
e307d62d5f drm/i915: Remove redundant queue_delayed_work() from throttle ioctl
We know, by design, that whilst the GPU is active (and thus we are
throttling) the retire_worker is queued. Therefore attempting to requeue
it with queue_delayed_work() is a no-op and we can safely remove it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-3-git-send-email-chris@chris-wilson.co.uk
2016-07-04 08:18:21 +01:00
Chris Wilson
1b51bce27b drm/i915: Do not keep postponing the idle-work
Rather than persistently postponing the idle-work everytime somebody
calls i915_gem_retire_requests() (potentially ensuring that we never
reach the idle state), queue the work the first time we detect all
requests are complete. Then if in 100ms, more requests have been queued,
we will abort the idle-worker and wait again until all the new requests
have been completed.

Of course, this does depend upon the idle worker cancelling itself
gracefully from the previous patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-2-git-send-email-chris@chris-wilson.co.uk
2016-07-04 08:18:20 +01:00
Chris Wilson
67d97da349 drm/i915: Only start retire worker when idle
The retire worker is a low frequency task that makes sure we retire
outstanding requests if userspace is being lax. We only need to start it
once as it remains active until the GPU is idle, so do a cheap test
before the more expensive queue_work(). A consequence of this is that we
need correct locking in the worker to make the hot path of request
submission cheap. To keep the symmetry and keep hangcheck strictly bound
by the GPU's wakelock, we move the cancel_sync(hangcheck) to the idle
worker before dropping the wakelock.

v2: Guard against RCU fouling the breadcrumbs bottom-half whilst we kick
the waiter.
v3: Remove the wakeref assertion squelching (now we hold a wakeref for
the hangcheck, any rpm error there is genuine).
v4: To prevent excess work when retiring requests, we split the busy
flag into two, a boolean to denote whether we hold the wakeref and a
bitmask of active engines.
v5: Reorder cancelling hangcheck upon idling to avoid a race where we
might cancel a hangcheck after being preempted by a new task

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
References: https://bugs.freedesktop.org/show_bug.cgi?id=88437
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-1-git-send-email-chris@chris-wilson.co.uk
2016-07-04 08:18:19 +01:00