On 2020-12-22 at 20:19:49 +0530, Jani Nikula wrote:
> Follow the usual naming pattern for functions.
>
> Signed-off-by: Jani Nikula
> ---
> drivers/gpu/drm/i915/display/intel_dp.c | 2 +-
> drivers/gpu/drm/i915/display/intel_pps.c | 2 +-
> drivers/gpu/drm/i915/display/intel_pps.h | 2 +-
> 3 f
On 2020-12-22 at 20:19:48 +0530, Jani Nikula wrote:
> Add a "reinit" call to hide some more pps functions, and clean up the
> callers. A minor functional change is not holding the pps lock across
> the whole operation in intel_dp_encoder_reset, but instead doing it in
> two steps.
>
> Signed-off-b
On 2020-12-22 at 20:19:47 +0530, Jani Nikula wrote:
> Add a new init call to be called only once, unlike some of the other
> various init calls. This lets us hide more functions within intel_pps.c.
>
> Signed-off-by: Jani Nikula
Looks good to me.
Reviewed-by: Anshuman Gupta
> ---
> drivers/gpu/
On 2020-12-22 at 20:19:46 +0530, Jani Nikula wrote:
> Add a locked version of intel_pps_vdd_off_sync_unlocked() that does
> everything the callers expect it to.
>
> Signed-off-by: Jani Nikula
> ---
> drivers/gpu/drm/i915/display/intel_dp.c | 31 +++-
> drivers/gpu/drm/i915/d
On 2020-12-22 at 20:19:45 +0530, Jani Nikula wrote:
> Follow the usual naming pattern for functions, both for the prefix and
> the _unlocked suffix for functions that expect the lock to be held when
IMHO referring * pps lock * would be good in commit log.
Thanks,
Anshuman.
> calling.
>
> Signed-o
On 2020-12-22 at 20:19:44 +0530, Jani Nikula wrote:
> Follow the usual naming pattern for functions. We don't need to repeat
> "panel" here.
>
> Follow the usual naming pattern for functions.
>
> Signed-off-by: Jani Nikula
> ---
> drivers/gpu/drm/i915/display/intel_ddi.c | 8
> driver
On 2020-12-22 at 20:19:43 +0530, Jani Nikula wrote:
> Follow the usual naming pattern for functions.
>
> Signed-off-by: Jani Nikula
Looks good to me.
Reviewed-by: Anshuman Gupta
> ---
> drivers/gpu/drm/i915/display/intel_dp.c | 6 +++---
> drivers/gpu/drm/i915/display/intel_pps.c | 10 +--
Since we use a flag within i915_request.flags to indicate when we have
boosted the request (so that we only apply the boost) once, this can be
used as the serialisation with i915_request_retire() to avoid having to
explicitly take the i915_request.lock which is more heavily contended.
Signed-off-b
If any engine asks for the tasklet to be kicked from the CS interrupt,
do so. Currently, this is used by the execlists scheduler backends to
feed in the next request to the HW, and similarly could be used by a
ring scheduler, as will be seen in the next patch.
Signed-off-by: Chris Wilson
Reviewed
Replace the priolist rbtree with a skiplist. The crucial difference is
that walking and removing the first element of a skiplist is O(1), but
O(lgN) for an rbtree, as we need to rebalance on remove. This is a
hindrance for submission latency as it occurs between picking a request
for the priolist a
Since we process schedule-in of a context after submitting the request,
if we decide to reset the context at that time, we also have to cancel
the requets we have marked for submission.
Signed-off-by: Chris Wilson
---
.../drm/i915/gt/intel_execlists_submission.c | 22 ++-
driver
In preparation for removing the has_initial_breadcrumb field, add a
helper function for the existing callers.
Signed-off-by: Chris Wilson
Reviewed-by: Mika Kuoppala
---
drivers/gpu/drm/i915/gt/gen8_engine_cs.c| 2 +-
drivers/gpu/drm/i915/gt/intel_ring_submission.c | 4 ++--
drivers/gpu/
Since schedule-in/out is now entirely serialised by the tasklet bitlock,
we do not need to worry about concurrent in/out operations and so reduce
the atomic operations to plain instructions.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gt/intel_engine_cs.c| 2 +-
drivers/gpu/
In the process of preparing to reuse the request submission logic for
other backends, lift it out of the execlists backend.
While this operates on the common structs, we do have a bit of backend
knowledge, which is harmless for !lrc but still unsightly.
Signed-off-by: Chris Wilson
---
drivers/g
As a topological sort, we expect it to run in linear graph time,
O(V+E). In removing the recursion, it is no longer a DFS but rather a
BFS, and performs as O(VE). Let's demonstrate how bad this is with a few
examples, and build a few test cases to verify a potential fix.
Signed-off-by: Chris Wilso
Treat the dependency between bonded requests as weak and leave the
remainder of the pair on the GPU if one hangs.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 6 ++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/Kconfig.profile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/Kconfig.profile
b/drivers/gpu/drm/i915/Kconfig.profile
index 35bbe2b80596..3eacea42b19f 100644
--- a/drivers/gpu/drm/i915/Kconfig.profile
As context-in/out is now always serialised, we do not have to worry
about concurrent enabling/disable of the busy-stats and can reduce the
atomic_t active to a plain unsigned int, and the seqlock to a seqcount.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gt/intel_engine_cs.c| 8 ++-
We only need to evaluate the current status of the context when it is
scheduled in, we will force a reschedule when the context is closed
propagating the change to inflight contexts.
Signed-off-by: Chris Wilson
Cc: Matthew Brost
---
drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4
As we do not have any internal priority levels, the priority can be set
directed from the user values.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/display/intel_display.c | 4 +-
drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +--
.../i915/gem/selftests/i915_gem_object_blt.c | 4
As we know when we expect the heartbeat to be checked for completion,
pass this information along as its deadline. We still do not complain if
the deadline is missed, at least until we have tried a few times, but it
will allow for quicker hang detection on systems where deadlines are
adhered to.
S
Extract the scheduling queue from "execlists" into the per-engine
scheduling structs, for reuse by other backends.
Signed-off-by: Chris Wilson
---
.../gpu/drm/i915/gem/i915_gem_context_types.h | 2 +-
drivers/gpu/drm/i915/gem/i915_gem_wait.c | 1 +
drivers/gpu/drm/i915/gt/intel_engine_cs.
When we are not using semaphores with a context/engine, we can simply
reuse the same seqno location across wraps, but we still require each
timeline to have its own address. For LRC submission, each context is
prefixed by a per-process HWSP, which provides us with a unique location
for each context
Move the scheduling tasklists out of the execlists backend into the
per-engine scheduling bookkeeping.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gt/intel_engine.h| 14
drivers/gpu/drm/i915/gt/intel_engine_cs.c | 11 ++--
drivers/gpu/drm/i915/gt/intel_engine_types.h |
Switch over from FIFO global submission to the priority-sorted
topographical scheduler. At the cost of more busy work on the CPU to
keep the GPU supplied with the next packet of requests, this allows us
to reorder requests around submission stalls.
This also enables the timer based RPS, with the e
Currently, we construct and teardown the i915_dependency chains using a
global spinlock. As the lists are entirely local, it should be possible
to use an double-lock with an explicit nesting [signaler -> waiter,
always] and so avoid the costly convenience of a global spinlock.
Signed-off-by: Chris
Looking to the future, we want to set the scheduling attributes
explicitly and so replace the generic engine->schedule() with the more
direct i915_request_set_priority()
What it loses in removing the 'schedule' name from the function, it
gains in having an explicit entry point with a stated goal.
When we know that we are inside the timeline mutex, or inside the
submission flow (under active.lock or the holder's rcu lock), we know
that the rq->hwsp is stable and we can use the simpler direct version.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gem/i915_gem_context.c | 2 +-
Relative timelines are relative to either the global or per-process
HWSP, and so we can replace the absolute addressing with store-index
variants for position invariance.
Signed-off-by: Chris Wilson
Reviewed-by: Matthew Brost
---
drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 98 +--
Take a snapshot of the ctx->engines, so we can avoid taking the
ctx->engines_mutex for a mere read in get_engines().
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gem/i915_gem_context.c | 39 +
1 file changed, 8 insertions(+), 31 deletions(-)
diff --git a/drivers/gpu/
Since we are not using any internal priority levels, and in the next few
patches will introduce a new index for which the optimisation is not so
lear cut, discard the small table within the priolist.
Signed-off-by: Chris Wilson
---
.../gpu/drm/i915/gt/intel_engine_heartbeat.c | 2 +-
.../drm/i
In anticipation of wanting to be able to call pi from underneath an
engine's active.lock, rework the priority inheritance to primarily work
along an engine's priority queue, delegating any other engine that the
chain may traverse to a worker. This reduces the global spinlock from
governing the mult
In the process of preparing to reuse the request submission logic for
other backends, lift it out of the execlists backend. It already
operates on the common structs, so just a matter of moving and renaming.
Signed-off-by: Chris Wilson
---
.../drm/i915/gt/intel_execlists_submission.c | 55 +
If we allow for per-client timelines, even with legacy ring submission,
we open the door to a world full of possiblities [scheduling and
semaphores].
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gt/gen6_engine_cs.c | 89 +---
drivers/gpu/drm/i915/gt/gen8_engine_cs.c |
Build a bare bones scheduler to sit on top the global legacy ringbuffer
submission. This virtual execlists scheme should be applicable to all
older platforms.
A key problem we have with the legacy ring buffer submission is that it
only allows for FIFO queuing. All clients share the global request
For a modeset/pageflip, there is a very precise deadline by which the
frame must be completed in order to hit the vblank and be shown. While
we don't pass along that exact information, we can at least inform the
scheduler that this request-chain needs to be completed asap.
Signed-off-by: Chris Wil
This was removed in commit 478ffad6d690 ("drm/i915: drop
engine_pin/unpin_breadcrumbs_irq") as the last user had been removed,
but now there is a promise of a new user in the next patch.
Signed-off-by: Chris Wilson
Reviewed-by: Mika Kuoppala
---
drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 24
Originally, we used the signal->lock as a means of following the
previous link in its timeline and peeking at the previous fence.
However, we have replaced the explicit serialisation with a series of
very careful probes that anticipate the links being deleted and the
fences recycled before we are a
The current implementation of walking the children of a deferred
requests lacks the backtracking required to reduce the dfs to linear.
Having pulled it from execlists into the common layer, we can reuse the
dfs code for priority inheritance.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i
If supported by the backend, we can quickly look at the context's
inflight engine rather than search along the active list to confirm.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gem/i915_gem_context.c | 3 +++
drivers/gpu/drm/i915/gt/intel_context.h | 10
When cloning the engines from the source context, we need to ensure that
the engines are not freed as we copy them, and that the flags we clone
from the source correspond with the engines we copy across. To do this
we need only take a reference to the src->engines, rather than hold the
src->engine_
Extract the scheduler lists into a related structure, stop sprawling
over struct intel_engine_cs
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gt/intel_engine_cs.c | 26 +-
drivers/gpu/drm/i915/gt/intel_engine_types.h | 8 +
.../drm/i915/gt/intel_execlists_submission
Explicitly differentiate between the absolute and relative timelines,
and the global HWSP and ppHWSP relative offsets. When using a timeline
that is relative to a known status page, we can replace the absolute
addressing in the commands with indexed variants.
Signed-off-by: Chris Wilson
Reviewed-
In the next patch, we remove the strict priority system and continuously
re-evaluate the relative priority of tasks. As such we need to enable
the timeslice whenever there is more than one context in the pipeline.
This simplifies the decision and removes some of the tweaks to suppress
timeslicing,
A quick test to verify that the backend accepts each type of timeline
and can use them to track and control request emission.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gt/selftest_timeline.c | 105
1 file changed, 105 insertions(+)
diff --git a/drivers/gpu/drm/i9
Now that we are careful to always force-restore contexts upon rewinding
(where necessary), we can restore our optimisation to skip over
completed active execlists when dequeuing.
Referenecs: 35f3fd8182ba ("drm/i915/execlists: Workaround switching back to a
completed context")
References: 8ab3a381
The first "scheduler" was a topographical sorting of requests into
priority order. The execution order was deterministic, the earliest
submitted, highest priority request would be executed first. Priority
inheritance ensured that inversions were kept at bay, and allowed us to
dynamically boost prio
Make the ability to suspend and resume a request and its dependents
generic.
Signed-off-by: Chris Wilson
---
.../drm/i915/gt/intel_execlists_submission.c | 148 +-
drivers/gpu/drm/i915/i915_scheduler.c | 120 ++
drivers/gpu/drm/i915/i915_scheduler.h |
Wrap cmpxchg64 with a try_cmpxchg()-esque helper. Hiding the old-value
dance in the helper allows for cleaner code.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_utils.h | 32 +++
1 file changed, 32 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_uti
Exercise rescheduling priority inheritance around a sequence of requests
that wrap around all the engines.
Signed-off-by: Chris Wilson
---
.../gpu/drm/i915/selftests/i915_scheduler.c | 219 ++
1 file changed, 219 insertions(+)
diff --git a/drivers/gpu/drm/i915/selftests/i915_s
Couple up the context in/out accounting to record how long each engine
is busy handling requests. This is exposed to userspace for more accurate
measurements, and also enables our soft-rps timer.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gt/intel_ring_scheduler.c | 6 ++
1 file ch
When we introduced the saturated workload detection to tell us to back
off from semaphore usage [semaphores have a noticeable impact on
contended bus cycles with the CPU for some heavy workloads], we first
introduced it as a per-context tracker. This allows individual contexts
to try and optimise t
Allow multiple requests to be queued unto a virtual engine, whereas
before we only allowed a single request to be queued at a time. The
advantage of keeping just one request in the queue was to ensure that we
always decided late which engine to use. However, with the introduction
of the virtual dea
Lift the busy-stats context-in/out implementation out of intel_lrc, so
that we can reuse it for other scheduler implementations.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/gt/intel_engine_stats.h | 49 +++
.../drm/i915/gt/intel_execlists_submission.c | 34 +---
To support legacy ring buffer scheduling, we want a virtual ringbuffer
for each client. These rings are purely for holding the requests as they
are being constructed on the CPU and never accessed by the GPU, so they
should not be bound into the GGTT, and we can use plain old WB mapped
pages.
As th
Lift the ability to defer a request until later from execlists into the
common layer.
Signed-off-by: Chris Wilson
---
.../drm/i915/gt/intel_execlists_submission.c | 55 ++--
drivers/gpu/drm/i915/i915_scheduler.c | 66 ---
drivers/gpu/drm/i915/i915_scheduler.h
Avoid the full blown memory barrier of test_and_set_bit() by noting the
completed request and removing it from the lists.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_request.c | 16 +---
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/
The core of the scheduling algorithm is that we compute the topological
order of the fence DAG. Knowing that we have a DAG, we should be able to
use a DFS to compute the topological sort in linear time. However,
during the conversion of the recursive algorithm into an iterative one,
the memorizatio
A key prolem with legacy ring buffer submission is that it is an inheret
FIFO queue across all clients; if one blocks, they all block. A
scheduler allows us to avoid that limitation, and ensures that all
clients can submit in parallel, removing the resource contention of the
global ringbuffer.
Hav
Allow the sysadmin to specify whether we should prevent the CPU from
entering higher C-states while waiting for the CPU, in order to reduce
the latency of request completions and so speed up client continuations.
The target dma latency can be adjusted per-engine using,
/sys/class/drm/card
Currently we know that the timeline status page is at most a page in
size, and so we can preserve the lower 12bits of the offset when
relocating the status page in the GGTT. If we want to use a larger
object, such as the context state, we may not necessarily use a position
within the first page and
Hi Sir,
I am using the new Intel Elkhart Lake SKU1 platform in framebuffer console mode.
Is there any chance to using frame buffer console in two display with extension
mode? and in different resolution?
I have tested in Ubuntu 20.04 kernel 5.10.2 with i915.force_probe=4555 boot
config, the grap
> -Original Message-
> From: Intel-gfx On Behalf Of Jani
> Nikula
> Sent: Tuesday, December 22, 2020 8:20 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Nikula, Jani
> Subject: [Intel-gfx] [PATCH 01/13] drm/i915/pps: abstract panel power
> sequencer from intel_dp.c
>
> In a long overdu
> -Original Message-
> From: Intel-gfx On Behalf Of Jani
> Nikula
> Sent: Tuesday, December 22, 2020 8:20 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Nikula, Jani
> Subject: [Intel-gfx] [PATCH 02/13] drm/i915/pps: rename pps_{, un}lock ->
> intel_pps_{, un}lock
>
> Start following
== Series Details ==
Series: drm/i915/cml : Add TGP PCH support (rev2)
URL : https://patchwork.freedesktop.org/series/85013/
State : failure
== Summary ==
CI Bug Log - changes from CI_DRM_9528_full -> Patchwork_19218_full
Summary
---
65 matches
Mail list logo