2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

16 Commits

Author SHA1 Message Date
Matthew Brost
8332109430 drm/i915/guc: Delay disabling guc_id scheduling for better hysteresis
Add a delay, configurable via debugfs (default 34ms), to disable
scheduling of a context after the pin count goes to zero. Disable
scheduling is a costly operation as it requires synchronizing with
the GuC. So the idea is that a delay allows the user to resubmit
something before doing this operation. This delay is only done if
the context isn't closed and less than a given threshold
(default is 3/4) of the guc_ids are in use.

Alan Previn: Matt Brost first introduced this patch back in Oct 2021.
However no real world workload with measured performance impact was
available to prove the intended results. Today, this series is being
republished in response to a real world workload that benefited greatly
from it along with measured performance improvement.

Workload description: 36 containers were created on a DG2 device where
each container was performing a combination of 720p 3d game rendering
and 30fps video encoding. The workload density was configured in a way
that guaranteed each container to ALWAYS be able to render and
encode no less than 30fps with a predefined maximum render + encode
latency time. That means the totality of all 36 containers and their
workloads were not saturating the engines to their max (in order to
maintain just enough headrooom to meet the min fps and max latencies
of incoming container submissions).

Problem statement: It was observed that the CPU core processing the i915
soft IRQ work was experiencing severe load. Using tracelogs and an
instrumentation patch to count specific i915 IRQ events, it was confirmed
that the majority of the CPU cycles were caused by the
gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
majority of the cycles was determined to be processing a specific G2H
IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent
by GuC in response to i915 KMD sending H2G requests:
INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
whenever a context goes idle so that we can unpin the context from GuC.
The high CPU utilization % symptom was limiting density scaling.

Root Cause Analysis: Because the incoming execution buffers were spread
across 36 different containers (each with multiple contexts) but the
system in totality was NOT saturated to the max, it was assumed that each
context was constantly idling between submissions. This was causing
a thrashing of unpinning contexts from GuC at one moment, followed quickly
by repinning them due to incoming workload the very next moment. These
event-pairs were being triggered across multiple contexts per container,
across all containers at the rate of > 30 times per sec per context.

Metrics: When running this workload without this patch, we measured an
average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
seconds or ~10 million times over ~25+ mins. With this patch, the count
reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The
improvement observed is ~99% for the average counts per 10 seconds.

Design awareness: Selftest impact.
As temporary WA disable this feature for the selftests. Selftests are
very timing sensitive and any change in timing can cause failure. A
follow up patch will fixup the selftests to understand this delay.

Design awareness: Race between guc_request_alloc and guc_context_close.
If a context close is issued while there is a request submission in
flight and a delayed schedule disable is pending, guc_context_close
and guc_request_alloc will race to cancel the delayed disable.
To close the race, make sure that guc_request_alloc waits for
guc_context_close to finish running before checking any state.

Design awareness: GT Reset event.
If a gt reset is triggered, as preparation steps, add an additional step
to ensure all contexts that have a pending delay-disable-schedule task
be flushed of it. Move them directly into the closed state after cancelling
the worker. This is okay because the existing flow flushes all
yet-to-arrive G2H's dropping them anyway.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221006225121.826257-2-alan.previn.teres.alexis@intel.com
2022-10-26 17:29:43 -07:00
Matthew Auld
54c204c522 Revert "drm/i915/guc: Add delay to disable scheduling after pin count goes to zero"
This reverts commit 6a07990384.

Everything in CI using GuC is now timing out[1], and killing the machine
with this change (perhaps a deadlock?). CI was recently on fire due to
some changes coming in from -rc1, so likely the pre-merge CI results for
this series were invalid? For now just revert, unless GuC experts
already have a fix in mind.

[1] https://intel-gfx-ci.01.org/tree/drm-tip/index.html?

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220819123904.913750-1-matthew.auld@intel.com
2022-08-20 09:41:56 -07:00
Matthew Brost
6a07990384 drm/i915/guc: Add delay to disable scheduling after pin count goes to zero
Add a delay, configurable via debugfs (default 34ms), to disable
scheduling of a context after the pin count goes to zero. Disable
scheduling is a costly operation as it requires synchronizing with
the GuC. So the idea is that a delay allows the user to resubmit
something before doing this operation. This delay is only done if
the context isn't closed and less than a given threshold
(default is 3/4) of the guc_ids are in use.

As temporary WA disable this feature for the selftests. Selftests are
very timing sensitive and any change in timing can cause failure. A
follow up patch will fixup the selftests to understand this delay.

Alan Previn: Matt Brost first introduced this series back in Oct 2021.
However no real world workload with measured performance impact was
available to prove the intended results. Today, this series is being
republished in response to a real world workload that benefited greatly
from it along with measured performance improvement.

Workload description: 36 containers were created on a DG2 device where
each container was performing a combination of 720p 3d game rendering
and 30fps video encoding. The workload density was configured in a way
that guaranteed each container to ALWAYS be able to render and
encode no less than 30fps with a predefined maximum render + encode
latency time. That means the totality of all 36 containers and their
workloads were not saturating the engines to their max (in order to
maintain just enough headrooom to meet the min fps and max latencies
of incoming container submissions).

Problem statement: It was observed that the CPU core processing the i915
soft IRQ work was experiencing severe load. Using tracelogs and an
instrumentation patch to count specific i915 IRQ events, it was confirmed
that the majority of the CPU cycles were caused by the
gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
majority of the cycles was determined to be processing a specific G2H
IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent
by GuC in response to i915 KMD sending H2G requests:
INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
whenever a context goes idle so that we can unpin the context from GuC.
The high CPU utilization % symptom was limiting density scaling.

Root Cause Analysis: Because the incoming execution buffers were spread
across 36 different containers (each with multiple contexts) but the
system in totality was NOT saturated to the max, it was assumed that each
context was constantly idling between submissions. This was causing
a thrashing of unpinning contexts from GuC at one moment, followed quickly
by repinning them due to incoming workload the very next moment. These
event-pairs were being triggered across multiple contexts per container,
across all containers at the rate of > 30 times per sec per context.

Metrics: When running this workload without this patch, we measured an
average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
seconds or ~10 million times over ~25+ mins. With this patch, the count
reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The
improvement observed is ~99% for the average counts per 10 seconds.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220817020511.2180747-3-alan.previn.teres.alexis@intel.com
2022-08-18 15:23:47 -07:00
Maarten Lankhorst
2c8ab3339e drm/i915: Pin timeline map after first timeline pin, v4.
We're starting to require the reservation lock for pinning,
so wait until we have that.

Update the selftests to handle this correctly, and ensure pin is
called in live_hwsp_rollover_user() and mock_hwsp_freelist().

Changes since v1:
- Fix NULL + XX arithmatic, use casts. (kbuild)
Changes since v2:
- Clear entire cacheline when pinning.
Changes since v3:
- CACHELINE_BYTES -> TIMELINE_SEQNO_BYTES. (jekstrand)

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-3-maarten.lankhorst@linux.intel.com
2021-03-24 11:39:46 +01:00
Chris Wilson
e3d291301f drm/i915/gem: Implement legacy MI_STORE_DATA_IMM
The older arches did not convert MI_STORE_DATA_IMM to using the GTT, but
left them writing to a physical address. The notes suggest that the
primary reason would be so that the writes were cache coherent, as the
CPU cache uses physical tagging. As such we did not implement the
legacy variant of MI_STORE_DATA_IMM and so left all the relocations
synchronous -- but with a small function to convert from the vma address
into the physical address, we can implement asynchronous relocs on these
older arches, fixing up a few tests that require them.

In order to be able to test the legacy paths, refactor the gpu
relocations so that we can hook them up to a selftest.

v2: Use an array of offsets not enum labels for the selftest
v3: Refactor the common igt_hexdump()

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/757
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200504140629.28240-1-chris@chris-wilson.co.uk
2020-05-04 15:15:04 +01:00
Chris Wilson
3c7a44bbbf drm/i915/selftests: Perform some basic cycle counting of MI ops
Some basic information that is useful to know, such as how many cycles
is a MI_NOOP.

v2: Keep volatile pages pinned at all times! (Matthew)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Anna Karas <anna.karas@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191111172716.23733-1-chris@chris-wilson.co.uk
2019-11-11 18:30:13 +00:00
Lucas De Marchi
2e8de0879c drm/i915: make i915_selftest.h self-contained
Fix build breakage:

In file included from <command-line>:
./drivers/gpu/drm/i915/i915_selftest.h:125:1: error: unknown type name ‘bool’
  125 | bool __igt_timeout(unsigned long timeout, const char *fmt, ...);
      | ^~~~

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190730181759.26162-1-lucas.demarchi@intel.com
2019-07-30 13:41:35 -07:00
Chris Wilson
cb823ed991 drm/i915/gt: Use intel_gt as the primary object for handling resets
Having taken the first step in encapsulating the functionality by moving
the related files under gt/, the next step is to start encapsulating by
passing around the relevant structs rather than the global
drm_i915_private. In this step, we pass intel_gt to intel_reset.c

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190712192953.9187-1-chris@chris-wilson.co.uk
2019-07-12 21:06:56 +01:00
Chris Wilson
63251685c1 drm/i915/selftests: Common live setup/teardown
We frequently, but not frequently enough!, remember to flush residual
operations and objects at the end of a live subtest. The purpose is to
cleanup after every subtest, leaving a clean slate for the next subtest,
and perform early detection of leaky state. As this should ideally be
common for all live subtests, pull the task into a common teardown
routine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190703091726.11690-1-chris@chris-wilson.co.uk
2019-07-03 11:07:57 +01:00
Chris Wilson
06039d9820 drm/i915/selftests: Apply a subtest filter
In bringup on simulated HW even rudimentary tests are slow, and so many
may fail that we want to be able to filter out the noise to focus on the
specific problem. Even just the tests groups provided for igt is not
specific enough, and we would like to isolate one particular subtest
(and probably subsubtests!). For simplicity, allow the user to provide a
command line parameter such as

	i915.st_filter=i915_timeline_mock_selftests/igt_sync

to restrict ourselves to only running on subtest. The exact name to use
is given during a normal run, highlighted as an error if it failed,
debug otherwise. The test group is optional, and then all subtests are
compared for an exact match with the filter (most subtests have unique
names). The filter can be negated, e.g. i915.st_filter=!igt_sync and
then all tests but those that match will be run. More than one match can
be supplied separated by a comma, e.g.

	i915.st_filter=igt_vma_create,igt_vma_pin1

to only run those specified, or

	i915.st_filter=!igt_vma_create,!igt_vma_pin1

to run all but those named. Mixing a blacklist and whitelist will only
execute those subtests matching the whitelist so long as they are
previously excluded in the blacklist.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-1-chris@chris-wilson.co.uk
2019-01-29 19:59:57 +00:00
Chris Wilson
55e4b859a2 drm/i915/selftests: Downgrade igt_timeout message
Give in, since CI continues to incorrectly insist that KERN_NOTICE is a
warning and flags the timeout message as unwanted spam. At first, the
intention was to use the message to indicate which tests might warrant
an extended run, but virtually all tests require a timeout so it is
simply not as interesting as first thought.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103667
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180716080332.32283-6-chris@chris-wilson.co.uk
2018-07-16 11:23:45 +01:00
Chris Wilson
b1a1e5d2ce drm/i915/selftests: Reduce the volume of the timeout message
Originally it was anticipated that timeouts would be a rare event, and
so merit a warning that the test was incomplete. However, for igt we
keep the timeout low, and hitting the timeout is intentional. It no
longer necessitates a warning, but to be expected.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110101110.12042-1-chris@chris-wilson.co.uk
Reviwed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-11-10 11:31:58 +00:00
Chris Wilson
f2f5c0610f drm/i915: Don't use MI_STORE_DWORD_IMM on Sandybridge/vcs
MI_STORE_DWORD_IMM just doesn't work on the video decode engine under
Sandybridge, so refrain from using it. Then switch the selftests over to
using the now common test prior to using MI_STORE_DWORD_IMM.

Fixes: 7dd4f6729f ("drm/i915: Async GPU relocation processing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.13-rc1+
Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2017-08-18 11:55:02 +01:00
Chris Wilson
aae4a3d811 drm/i915: Use fault-injection to force the shrinker to run in live GTT tests
It is possible whilst allocating the page-directory tree for a ppgtt
bind that the shrinker may run and reap unused parts of the tree. If the
shrinker happens to remove a chunk of the tree that the
allocate_va_range has already processed, we may then try to insert into
the dangling tree. This test uses the fault-injection framework to force
the shrinker to be invoked before we allocate new pages, i.e. new chunks
of the PD tree.

References: https://bugs.freedesktop.org/show_bug.cgi?id=99295
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2017-02-13 20:46:32 +00:00
Chris Wilson
170594502c drm/i915: Test coherency of and barriers between cache domains
Write into an object using WB, WC, GTT, and GPU paths and make sure that
our internal API is sufficient to ensure coherent reads and writes.

v2: Avoid invalid free upon allocation error

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-21-chris@chris-wilson.co.uk
2017-02-13 20:45:45 +00:00
Chris Wilson
953c7f82eb drm/i915: Provide a hook for selftests
Some pieces of code are independent of hardware but are very tricky to
exercise through the normal userspace ABI or via debugfs hooks. Being
able to create mock unit tests and execute them through CI is vital.
Start by adding a central point where we can execute unit tests and
a parameter to enable them. This is disabled by default as the
expectation is that these tests will occasionally explode.

To facilitate integration with igt, any parameter beginning with
i915.igt__ is interpreted as a subtest executable independently via
igt/drv_selftest.

Two classes of selftests are recognised: mock unit tests and integration
tests. Mock unit tests are run as soon as the module is loaded, before
the device is probed. At that point there is no driver instantiated and
all hw interactions must be "mocked". This is very useful for writing
universal tests to exercise code not typically run on a broad range of
architectures. Alternatively, you can hook into the live selftests and
run when the device has been instantiated - hw interactions are real.

v2: Add a macro for compiling conditional code for mock objects inside
real objects.
v3: Differentiate between mock unit tests and late integration test.
v4: List the tests in natural order, use igt to sort after modparam.
v5: s/late/live/
v6: s/unsigned long/unsigned int/
v7: Use igt_ prefixes for long helpers.
v8: Deobfuscate macros overriding functions, stop using -I$(src)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-1-chris@chris-wilson.co.uk
2017-02-13 20:45:21 +00:00