Tracking the size of the VMA as allocated allows us to dramatically
reduce the complexity of later functions (like inserting the VMA in to
the drm_mm range manager).
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem.c | 103 ++--
drivers/gpu/drm/i9
There is only one wait on request function now, so drop the "expert"
indication of leading __.
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
---
drivers/gpu/drm/i915/i915_gem.c | 18 +-
drivers/gpu/drm/i915/i915_gem_request.c | 16
drivers/gpu
We only need a very lightweight mechanism here as the locking is only
used for co-ordinating a bitfield.
v2: Move the cheap unlikely tests into the caller
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtien
Cc: Daniel Vetter
[After some kerneldoc fixes, which I hope Daniel explains better
As the list retirement is now clean of implementation details, we can
move it closer to the request management.
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
---
drivers/gpu/drm/i915/i915_gem.c | 44 -
drivers/gpu/drm/i915/i915_gem_request.c |
This is not the full fix, as we are required to percolate the u64 nature
down through the drm_mm stack, but this is required now to prevent
explosions due to mismatch between execbuf (eb_vma_misplaced) and vma
binding (i915_vma_misplaced) - and reduces the risk of spurious changes
as we adjust the
Rather than a mismash of struct drm_device *dev and struct
drm_i915_private *dev_priv being used freely within a function, be
consistent and only pass along dev_priv.
Signed-off-by: Chris Wilson
Reviewed-by: Daniel Vetter
Reviewed-by: Joonas Lahtinen
---
drivers/gpu/drm/i915/intel_display.c
intel_overlay already tracks its last flip request, along with action to
take after its completion. Refactor intel_overlay to reuse the common
i915_gem_active tracker.
v2: Now using i915_gem_retire_fn typedef
References: https://bugs.freedesktop.org/show_bug.cgi?id=93730
References: https://bugs.
The individual bits inside obj->frontbuffer_bits are protected by each
plane->mutex, but the whole bitfield may be accessed by multiple KMS
operations simultaneously and so the RMW need to be under atomics.
However, for updating the single field we do not need to mandate that it
be under the struct
If we enable RCU for the requests (providing a grace period where we can
inspect a "dead" request before it is freed), we can allow callers to
carefully perform lockless lookup of an active request.
However, by enabling deferred freeing of requests, we can potentially
hog a lot of memory when deal
Move the single line to the callsite as the name is now misleading, and
the purpose is solely to add the request to the execution queue. Here,
we can see that if we failed to dispatch the batch from the request, we
can forgo flushing the GPU when closing the request.
Signed-off-by: Chris Wilson
R
In order to be consistent with other address space functions, we want to
pass around 64-bit sizes, even though all known global GTT are limited
to 4GiB. Similar, we are trying to be consistent in using the _ggtt_
nomenclature when referring to the special global GTT.
Signed-off-by: Chris Wilson
-
In the next few patches, the VMA pinning API is overhauled and to reduce
the churn we pull out the update to the accessors into a prep patch.
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
---
drivers/gpu/drm/i915/i915_debugfs.c| 2 +-
drivers/gpu/drm/i915/i915_gem.c
For consistency, internal functions should take drm_i915_private rather
than drm_device. Now that we are subclassing drm_device, there are no
more size wins, but being consistent is its own blessing.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_drv.h| 5 +++--
drivers/gpu/d
As we always allocate in chunks of 4096 (that being both the PAGE_SIZE
and our own GTT_PAGE_SIZE), we know that all results from the drm_mm are
aligned to at least 4096. The drm_mm allocator itself is optimised for
alignment == 0, and so by converting alignments of 4096 to 0 we can
satisfy our own
Eviction is VM local, so we can ignore the significance of the
drm_device in the caller, and leave it to i915_gem_evict_something() to
manage itself.
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
---
drivers/gpu/drm/i915/i915_drv.h | 3 +--
drivers/gpu/drm/i915/i915_gem.c
Not only is i915_vma_pin() called for every single object on every single
execbuf, it is usually a simple increment as the VMA is already bound for
execution by the GPU. Rearrange the tests for unbound and pin_count
overflow so that we can do the increment and test very cheaply and
compact enough t
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
---
drivers/gpu/drm/i915/i915_gem_request.c | 25 -
drivers/gpu/drm/i915/i915_gem_request.h | 4
drivers/gpu/drm/i915/intel_ringbuffer.c | 18 +-
3 files changed, 13 insertions(+), 34 deletions
This reverts commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae.
The patch was only a stop-gap measure that fixed half the problem - the
leak of the fbcon when restarting X. A complete solution required
releasing the VMA when the object itself was closed rather than rely on
file/process exit. The pre
During execbuffer we look up the i915_vma in order to reserve them in
the VM. However, we then do a double lookup of the vma in order to then
pin them, all because we lack the necessary interfaces to operate on
i915_vma - so introduce i915_vma_pin()!
v2: Tidy parameter lists to remove one level of
Since i915_gem_obj_ggtt_pin() is an idiom breaking curry function for
i915_gem_object_ggtt_pin(), spare us the confusion and remove it.
Removing it now simplifies later patches to change the i915_vma_pin()
(and friends) interface.
v2: Add a redundant GEM_BUG_ON(!view) to
i915_gem_obj_lookup_or_cre
Just move it earlier so that we can use the companion nonblocking
version in a couple of more callsites without having to add a forward
declaration.
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
---
drivers/gpu/drm/i915/i915_gem.c | 202
1 fil
In order to prevent a leak of the vma on shared objects, we need to
hook into the object_close callback to destroy the vma on the object for
this file. However, if we destroyed that vma immediately we may cause
unexpected application stalls as we try to unbind a busy vma - hence we
defer the unbind
Our GPUs impose certain requirements upon buffers that depend upon how
exactly they are used. Typically this is expressed as that they require
a larger surface than would be naively computed by pitch * height.
Normally such requirements are hidden away in the userspace driver, but
when we accept po
This reimplements the denial-of-service protection against igt from
commit 227f782e4667 ("drm/i915: Retire requests before creating a new
one") and transfers the stall from before each batch into get_pages().
The issue is that the stall is increasing latency between batches which
is detrimental in
The future annotations will track the locking used for access to ensure
that it is always sufficient. We make the preparations now to present
the API ahead and to make sure that GCC can eliminate the unused
parameter.
Before: 6298417 3619610 696320 10614347 a1f64b vmlinux
After: 6298417
i915_gem_valid_gtt_space() is used after inserting the VMA to double
check the list - the location should have been chosen to pass all the
restrictions.
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
---
drivers/gpu/drm/i915/i915_gem.c | 5 +
1 file changed, 1 insertion(+), 4 dele
With the introduction of requests, we amplified the number of atomic
refcounted objects we use and update every execbuffer; from none to
several references, and a set of references that need to be changed. We
also introduced interesting side-effects in the order of retiring
requests and objects.
I
The drop_pages() function is a dangerous trap in that it can release the
passed in object pointer and so unless the caller is aware, it can
easily trick us into using the stale object afterwards. Move it into its
solitary callsite where we know it is safe.
Signed-off-by: Chris Wilson
Reviewed-by:
In preparation to perform some magic to speed up i915_vma_pin(), which
is among the hottest of hot paths in execbuf, refactor all the bitfields
accessed by i915_vma_pin() into a single unified set of flags.
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
---
drivers/gpu/drm/i915/i915_d
This patch is broken out of the next just to remove the code motion from
that patch and make it more readable. What we do here is move the
i915_vma_move_to_active() to i915_gem_execbuffer.c and put the three
stages (read, write, fenced) together so that future modifications to
active handling are a
Tidy up the for loops that handle waiting for read/write vs read-only
access.
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
---
drivers/gpu/drm/i915/i915_gem.c | 158 +++-
1 file changed, 75 insertions(+), 83 deletions(-)
diff --git a/drivers/gpu/
In the future, we will want to add annotations to the i915_gem_active
struct. The API is thus expanded to hide direct access to the contents
of i915_gem_active and mediated instead through a number of helpers.
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
---
drivers/gpu/drm/i915/i91
By tracking each request occupying space inside an individual
intel_ring, we can greatly simplify the logic of tracking available
space and not worry about other timelines. (Each ring is an ordered
timeline of committed requests.)
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
---
dri
For the global GTT (and aliasing GTT), the address space is owned by the
device (it is a global resource) and so the per-file owner field is
NULL. For per-process GTT (where we create an address space per
context), each is owned by the opening file. We can use this ownership
information to both dis
If the user floods the GPU with so many requests that the engine stalls
waiting for free space, don't automatically promote the GPU to maximum
frequencies. If the GPU really is saturated with work, it will migrate
to high clocks by itself, otherwise it is merely a user flooding us with
busy-work.
Now that we use the same vfuncs for emitting the batch buffer in both
execlists and legacy, the golden render state initialisation is
identical between both.
v2: gcc wants so.ggtt_offset initialised (even though it is not used)
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
Link:
htt
Space reservation is already safe with respect to the ring->size
modulus, but hardware only expects to see values in the range
0...ring->size-1 (inclusive) and so requires the modulus to prevent us
writing the value ring->size instead of 0. As this is only required for
the register itself, we can d
In the next patch, request tracking is made more generic and for that we
need a new expanded struct and to separate out the logic changes from
the mechanical churn, we split out the structure renaming into this
patch.
v2: Writer's block. Add some spiel about why we track requests.
v3: Now i915_gem
In order to be more consistent with the rest of the request construction
and ring emission, use the common names for the ring and request.
Rather than using signaler_req, waiter_req, and intel_ring *wait, we use
plain req and ring.
Signed-off-by: Chris Wilson
Cc: Joonas Lahtinen
Reviewed-by: Jo
As we can now have multiple VMA inside the global GTT (with partial
mappings, rotations, etc), it is no longer true that there may just be a
single GGTT entry and so we should walk the full vma_list to count up
the actual usage. In addition to unifying the two walkers, switch from
multiplying the o
If is simpler and leads to more readable code through the callstack if
the allocation returns the allocated struct through the return value.
The importance of this is that it no longer looks like we accidentally
allocate requests as side-effect of calling certain functions.
Signed-off-by: Chris W
When the user closes the context mark it and the dependent address space
as closed. As we use an asynchronous destruct method, this has two
purposes. First it allows us to flag the closed context and detect
internal errors if we to create any new objects for it (as it is removed
from the user's na
With adding engine->submit_request, we now have a bunch of functions
with similar names used at different stages of the execlist submission.
Try a different coat of paint, to hopefully reduce confusion between the
requests, intel_engine_cs and the actual execlists submision process.
Signed-off-by:
Initialising the global GTT is tricky as we wish to use the drm_mm range
manager during the modesetting initialisation (to capture stolen
allocations from the BIOS) before we actually enable GEM. To overcome
this, we currently setup the drm_mm first and then carefully rebind
them.
v2: Fixup after
We use "list" to denote the list and "link" to denote an element on that
list. Rename request->list to match this idiom.
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
---
drivers/gpu/drm/i915/i915_debugfs.c | 4 ++--
drivers/gpu/drm/i915/i915_gem.c | 10 +-
drive
Since we track requests, and requests are always added to the GPU fully
formed, we never have to flush the incomplete request and know that the
given request will eventually complete without any further action on our
part.
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
---
drivers/gpu
GCC was inlining the init and setup functions, but was getting itself
confused into thinking that variables could be used uninitialised. If we
do the inline for gcc, it is happy! As a bonus we shrink the code.
v2: A couple of minor tweaks from Joonas
Signed-off-by: Chris Wilson
Cc: Joonas Lahtin
If we rewrite the I915_WRITE_TAIL specialisation for the legacy
ringbuffer as submitting the request onto the ringbuffer, we can unify
the vfunc with both execlists and GuC in the next patch.
v2: Drop the modulus from the I915_WRITE_TAIL as it is currently being
applied in intel_ring_advance() aft
Rather than pass in the num_dwords that the caller wishes to use after
the signal command packet, split the breadcrumb emission into two phases
and have both the signal and breadcrumb individiually acquire space on
the ring. This makes the interface simpler for the reader, and will
simplify for pat
Move request submission from emit_request into its own common vfunc
from i915_add_request().
v2: Convert I915_DISPATCH_flags to BIT(x) whilst passing
v3: Rename a few functions to match.
v4: Reenable execlists submission after disabling guc.
Signed-off-by: Chris Wilson
Link:
http://patchwork.fr
When we call i915_vma_unbind(), we will wait upon outstanding rendering.
This will also trigger a retirement phase, which may update the object
lists. If, we extend request tracking to the VMA itself (rather than
keep it at the encompassing object), then there is a potential that the
obj->vma_list
Now that emitting requests is identical between legacy and execlists, we
can use the same function to build up the ring for submitting to either
engine. (With the exception of i915_switch_contexts(), but in time that
will also be handled gracefully.)
Signed-off-by: Chris Wilson
Reviewed-by: Joona
If the object is active and we need to perform a relocation upon it, we
need to take the slow relocation path. Before we do, double check the
active requests to see if they have completed.
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
---
drivers/gpu/drm/i915/i915_gem_execbuffer.c |
Both the ->dispatch_execbuffer and ->emit_bb_start callbacks do exactly
the same thing, add MI_BATCHBUFFER_START to the request's ringbuffer -
we need only one vfunc.
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
Link:
http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-20-
Joonas doesn't like the tiny function, especially if I go around making
it more complicated and using it elsewhere. To remove that temptation,
remove the function!
Signed-off-by: Chris Wilson
Link:
http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-21-git-send-email-ch...@chris-wilson
Since we may have VMA allocated for an object, but we interrupted their
binding, there is a disparity between have elements on the obj->vma_list
and being bound. i915_gem_obj_bound_any() does this check, but this is
not rigorously observed - add an explicit count to make it easier.
Signed-off-by:
As GEN6+ is now a simple variant on the basic breadcrumbs + tail write,
reuse the common code.
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
Link:
http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-26-git-send-email-ch...@chris-wilson.co.uk
---
drivers/gpu/drm/i915/intel_
As gen6_emit_request() only differs from i9xx_emit_request() when
semaphores are enabled, only use the specialised vfunc in that scenario.
v2: Reorder semaphore init so as to keep engine->emit_request default
vfunc selection compact.
Signed-off-by: Chris Wilson
Link:
http://patchwork.freedeskto
Since requests can no longer be generated as a side-effect of
intel_ring_begin(), we know that the seqno will be unchanged during
ring-emission. This predicatablity then means we do not have to check
for the seqno wrapping around whilst emitting the semaphore for
engine->sync_to().
Signed-off-by:
The state stored in this struct is not only the information about the
buffer object, but the ring used to communicate with the hardware. Using
buffer here is overly specific and, for me at least, conflates with the
notion of buffer objects themselves.
s/struct intel_ringbuffer/struct intel_ring/
s
Space for flushing the GPU cache prior to completing the request is
preallocated and so cannot fail - the GPU caches will always be flushed
along with the completed request. This means we no longer have to track
whether the GPU cache is dirty between batches like we had to with the
outstanding_lazy
For more consistent oop-naming, we would use intel_ring_verb, so pick
intel_ring_pin() and intel_ring_unpin().
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
Link:
http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-17-git-send-email-ch...@chris-wilson.co.uk
---
drivers/gpu
Rather than passing a complete set of GPU cache domains for either
invalidation or for flushing, or even both, just pass a single parameter
to the engine->emit_flush to determine the required operations.
engine->emit_flush(GPU, 0) -> engine->emit_flush(EMIT_INVALIDATE)
engine->emit_flush(0, GPU) -
Now that we have a clear ring/engine split and a struct intel_ring, we
no longer need the stopgap ringbuf names.
Signed-off-by: Chris Wilson
Link:
http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-16-git-send-email-ch...@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen
---
drivers/g
The current beginning, almost all have been reviewed with a small amount
of review fallout. Perhaps the biggest unresolved review comment is from
Daniel concerning intel_frontbuffer kerneldoc which is still a mystery to
me.
-Chris
___
Intel-gfx mailing l
Perform s/ringbuf/ring/ on the context struct for consistency with the
ring/engine split.
v2: Kill an outdated error_ringbuf label
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
Link:
http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-14-git-send-email-ch...@chris-wilson.c
Now that we have disambuigated ring and engine, we can use the clearer
and more consistent name for the intel_ringbuffer pointer in the
request.
@@
struct drm_i915_gem_request *r;
@@
- r->ringbuf
+ r->ring
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
Link:
http://patchwork.freedesk
Both perform the same actions with more or less indirection, so just
unify the code.
v2: Add back a few intel_engine_cs locals
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
Link:
http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-11-git-send-email-ch...@chris-wilson.co.uk
On Fri, Jul 29, 2016 at 10:34:35AM +0100, Peter Antoine wrote:
> This change adds a RC6 test for the MOCS. The MOCS registers are loaded
> and saved as part of the RC6 cycle but not all the registers are
> saved/restored. This tests that those registers are correctly restored.
>
> Signed-off-by: P
On Fri, Jul 29, 2016 at 10:34:36AM +0100, Peter Antoine wrote:
> If one of the previous tests fails then the following tests fail.
> This patch means that the following tests do not fail when the previous
> test fails (for some cases).
The problem is just gem_mocs_settings hasn't split its tests u
Op 29-07-16 om 22:33 schreef Matt Roper:
> On Fri, Jul 29, 2016 at 12:39:05PM +0300, Ville Syrjälä wrote:
>> On Thu, Jul 28, 2016 at 05:03:52PM -0700, Matt Roper wrote:
>>> This is completely untested (and probably horribly broken/buggy), but
>>> here's a quick mockup of the general approach I was
On Fri, Jul 29, 2016 at 11:23:43AM +0300, Joonas Lahtinen wrote:
> On ke, 2016-07-27 at 12:14 +0100, Chris Wilson wrote:
>
> > + if (i915_vma_misplaced(vma, size, alignment, flags)) {
> > + if (flags & PIN_NONBLOCK &&
> > + (i915_vma_is_pinned(vma) || i915_vma_is_active(v
101 - 172 of 172 matches
Mail list logo