[Intel-gfx] [PATCH 102/190] drm/i915: Move the "per-ring" default_context to the device

2016-01-11 Thread Chris Wilson
We have a false notion of a default_context allocated per engine, whereas actually it is a singular context reserved for kernel use. Remove it from the engines, and rename it thus. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c| 19 ++- drivers/gpu/dr

[Intel-gfx] [PATCH 135/190] drm/i915: Move map-and-fenceable tracking to the VMA

2016-01-11 Thread Chris Wilson
By moving map-and-fenceable tracking from the object to the VMA, we gain fine-grained tracking and the ability to track individual fences on the VMA (subsequent patch). Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c| 46 +- drivers/gpu/drm

[Intel-gfx] [PATCH 131/190] drm/i915: Pin the pages first in shmem prepare read/write

2016-01-11 Thread Chris Wilson
There is an improbable, but not impossible, case that if we leave the pages unpinned as we operate on the object, then somebody may steal the lock and change the cache domains after we have already inspected them. (Whilst here, avail ourselves of the opportunity to take a couple of steps to make the

[Intel-gfx] [PATCH 110/190] drm/i915: Move vma->pin_count:4 to vma->flags

2016-01-11 Thread Chris Wilson
Let's aid gcc in our pin_count tracking as i915_vma_pin()/i915_vma_unpin() are some of the hottest of the hot functions and gcc doesn't like bitfields that much! Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_drv.h| 20 +++ drivers/gpu/drm/i915/i915_gem.c
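A minimal userspace sketch of the idea in the patch above: keeping a small pin count in the low bits of a plain flags word instead of in a 4-bit bitfield, so that pin and unpin reduce to an increment and a decrement. The struct, macro and function names here (vma_sketch, VMA_PIN_MASK, VMA_GLOBAL_BIND, vma_pin and friends) are hypothetical and chosen for illustration; they are not taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout: the low 4 bits of flags hold the pin count. */
#define VMA_PIN_MASK    0xfu
#define VMA_GLOBAL_BIND (1u << 4) /* illustrative neighbouring flag bit */

struct vma_sketch {
	unsigned int flags;
};

static inline unsigned int vma_pin_count(const struct vma_sketch *vma)
{
	return vma->flags & VMA_PIN_MASK;
}

static inline bool vma_is_pinned(const struct vma_sketch *vma)
{
	return vma_pin_count(vma) != 0;
}

static inline void vma_pin(struct vma_sketch *vma)
{
	/* The count must not overflow into the neighbouring flag bits. */
	assert(vma_pin_count(vma) < VMA_PIN_MASK);
	vma->flags++;
}

static inline void vma_unpin(struct vma_sketch *vma)
{
	/* Unpinning an unpinned vma would borrow from the flag bits. */
	assert(vma_pin_count(vma) > 0);
	vma->flags--;
}
```

Because the count occupies the lowest bits, a plain ++ or -- on the whole word leaves the other flag bits untouched as long as the two asserted bounds hold.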

[Intel-gfx] [PATCH 118/190] drm/i915: Remove locking for get_tiling

2016-01-11 Thread Chris Wilson
Since we are not concerned with userspace racing itself with set-tiling (the order is indeterminate even if we take a lock), then we can safely read back the single obj->tiling_mode and do the static lookup of swizzle mode without having to take a lock. get-tiling is reasonably frequent due to the

[Intel-gfx] [PATCH 134/190] drm/i915: Refactor execbuffer relocation writing

2016-01-11 Thread Chris Wilson
With the introduction of the reloc page cache, we are just one step away from refactoring the relocation write functions into one. Not only does it tidy the code (slightly), but it greatly simplifies the control logic much to gcc's satisfaction. Signed-off-by: Chris Wilson --- drivers/gpu/drm

[Intel-gfx] [PATCH 091/190] drm/i915: Move context initialisation to first-use

2016-01-11 Thread Chris Wilson
Instead of allocating a new request when allocating a context, use the request that initiated the allocation to emit the context initialisation. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/intel_lrc.c | 42

[Intel-gfx] [PATCH 103/190] drm/i915: Move pinning of dev_priv->kernel_context into its creator

2016-01-11 Thread Chris Wilson
Rather than have every context ask "am I owned by the kernel? pin!", move that logic into the creator of the kernel context, in order to improve code comprehension. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_context.c | 53 +++-- 1 file changed, 24

[Intel-gfx] [PATCH 090/190] drm/i915: Refactor execlists default context pinning

2016-01-11 Thread Chris Wilson
Refactor pinning and unpinning of contexts, such that the default context for an engine is pinned during initialisation and unpinned during teardown (pinning of the context handles the reference counting). Thus we can eliminate the special case handling of the default context that was required to m

[Intel-gfx] [PATCH 112/190] drm/i915: Move obj->active:5 to obj->flags

2016-01-11 Thread Chris Wilson
We are motivated to avoid using a bitfield for obj->active for a couple of reasons. Firstly, we wish to document our lockless read of obj->active using READ_ONCE inside i915_gem_busy_ioctl() and that requires an integral type (i.e. not a bitfield). Secondly, gcc produces abysmal code when presented

[Intel-gfx] [PATCH 122/190] drm/i915: Move setting of request->batch into its single callsite

2016-01-11 Thread Chris Wilson
request->batch_obj is only set by execbuffer for the convenience of debugging hangs. By moving that operation to the callsite, we can simplify all other callers and future patches. We also move the complications of reference handling of the request->batch_obj next to where the active tracking is se

[Intel-gfx] [PATCH 115/190] drm/i915: Remove (struct_mutex) locking for busy-ioctl

2016-01-11 Thread Chris Wilson
By applying the same logic as for wait-ioctl, we can query whether a request has completed without holding struct_mutex. The biggest impact system-wide is removing the flush_active and the contention that causes. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem.c | 51 ++

[Intel-gfx] [PATCH 104/190] drm/i915: Remove i915_gem_execbuffer_retire_commands()

2016-01-11 Thread Chris Wilson
Move the single line to the callsite as the name is now misleading, and the purpose is solely to add the request to the execution queue. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 9 + 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/drivers/

[Intel-gfx] [PATCH 097/190] drm/i915/shrinker: Flush active on objects before counting

2016-01-11 Thread Chris Wilson
As we inspect obj->active to decide how many objects we can shrink (we only shrink idle objects), it helps to flush the active lists first in order to have a more accurate count of available objects. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_shrinker.c | 2 ++ 1 file changed,

[Intel-gfx] [PATCH 124/190] drm/i915: Track pinned vma inside guc

2016-01-11 Thread Chris Wilson
Since the guc allocates and pins an object into the GGTT for its usage, it is more natural to use that pinned VMA as our resource cookie. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c| 10 +- drivers/gpu/drm/i915/i915_guc_submission.c | 142 ++-

[Intel-gfx] [PATCH 119/190] drm/i915: Reduce amount of duplicate buffer information captured on error

2016-01-11 Thread Chris Wilson
When capturing the error state, we do not need to know about every address space - just those that are related to the error. We know which context is active at the time, therefore we know which VM are implicated in the error. We can then restrict the VM which we report to the relevant subset. Sign

[Intel-gfx] [PATCH 125/190] drm/i915: Track pinned VMA

2016-01-11 Thread Chris Wilson
Treat the VMA as the primary struct responsible for tracking bindings into the GPU's VM. That is we want to treat the VMA returned after we pin an object into the VM as the cookie we hold and eventually release when unpinning. Doing so eliminates the ambiguity in pinning the object and then searchi

[Intel-gfx] [PATCH 137/190] drm/i915: Shrink pages around failure to dma map

2016-01-11 Thread Chris Wilson
Similar to how we handle resource allocation failure of both physical memory and GGTT mmap space, if we fail to allocate our DMAR remapping, shrink some of our other objects and try again. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_gtt.c | 35 ++

[Intel-gfx] [PATCH 136/190] drm/i915: Move ioremap_wc tracking onto VMA

2016-01-11 Thread Chris Wilson
By tracking the iomapping on the VMA itself, we can share that area between multiple users. Also by only revoking the iomapping upon unbinding from the mappable portion of the GGTT, we can keep that iomap across multiple invocations (e.g. execlists context pinning). Signed-off-by: Chris Wilson --

[Intel-gfx] [PATCH 109/190] drm/i915: Remove highly confusing i915_gem_obj_ggtt_pin()

2016-01-11 Thread Chris Wilson
Since i915_gem_obj_ggtt_pin() is an idiom breaking curry function for i915_gem_object_ggtt_pin(), spare us the confusion and remove it. Removing it now simplifies later patches to change the i915_vma_pin() (and friends) interface. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_drv.h

[Intel-gfx] [PATCH 132/190] drm/i915: Tidy up flush cpu/gtt write domains

2016-01-11 Thread Chris Wilson
Since we know the write domain, we can drop the local variable and make the code look a tiny bit simpler. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem.c | 15 --- 1 file changed, 4 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers

[Intel-gfx] [PATCH 101/190] drm/i915: Only retire if necessary when creating a userptr

2016-01-11 Thread Chris Wilson
We only want to retire requests if we have an existing object that conflicts with the fresh userptr range in order to avoid unnecessary work during creation of every userptr. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_userptr.c | 20 +--- 1 file changed, 13 ins

[Intel-gfx] [PATCH 114/190] drm/i915: Remove (struct_mutex) locking for wait-ioctl

2016-01-11 Thread Chris Wilson
With a bit of care (and leniency) we can iterate over the object and wait for previous rendering to complete with judicious use of atomic reference counting. The ABI requires us to ensure that an active object is eventually flushed (like the busy-ioctl) which is guaranteed by our management of reque

[Intel-gfx] [PATCH 093/190] drm/i915: Move the forced switch back to the kernel context into eviction

2016-01-11 Thread Chris Wilson
Currently, we always switch back to the kernel context (if available, i.e. legacy HW contexts not execlists) whenever we try and idle the GPU. We actually only require the switch when trying to evict everything (in order to prevent fragmentation from placement of the currently active context) from

[Intel-gfx] [PATCH 099/190] drm/i915: Check for request completion before choosing CS flips

2016-01-11 Thread Chris Wilson
Only queue a CS flip if the outstanding request is not complete, and in particular do not rely on the request tracking being fresh (since it is only updated when requests are retired). Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/intel_display.c | 5 - 1 file changed, 4 insertions(+)

[Intel-gfx] [PATCH 096/190] drm/i915: Eliminate early submission of context enabling request

2016-01-11 Thread Chris Wilson
Now that the first request is simplified to a pure context enabling request (i.e. any request will do the required initialisation as appropriate), we can forgo explicitly sending that required during early hw initialisation. The only reason we might want to do such is in enabling power contexts, i.

[Intel-gfx] [PATCH 088/190] drm/i915: Move execlists interrupt based submission to a bottom-half

2016-01-11 Thread Chris Wilson
[ 196.988204] clocksource: timekeeping watchdog: Marking clocksource 'tsc' as unstable because the skew is too large: [ 196.988512] clocksource: 'refined-jiffies' wd_now: 9b48 wd_last: 9acb mask: [ 196.988559] clocksource: 'tsc' cs_n

[Intel-gfx] [PATCH 095/190] drm/i915: Rearrange switch_context to load the aliasing ppgtt on first use

2016-01-11 Thread Chris Wilson
The code to switch_mm() is already handled by i915_switch_context(), the only difference required to set up the aliasing ppgtt is that we need to emit the switch_mm() on the first context, i.e. when transitioning from engine->last_context == NULL. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i9

[Intel-gfx] [PATCH 126/190] drm/i915: Print the batchbuffer offset next to BBADDR in error state

2016-01-11 Thread Chris Wilson
It is useful when looking at captured error states to check the recorded BBADDR register (the address of the last batchbuffer instruction loaded) against the expected offset of the batch buffer, and so do a quick check that (a) the capture is true or (b) HEAD hasn't wandered off into the badlands.

[Intel-gfx] [PATCH 108/190] drm/i915: Start passing around i915_vma from execbuffer

2016-01-11 Thread Chris Wilson
During execbuffer we look up the i915_vma in order to reserve them in the VM. However, we then do a double lookup of the vma in order to then pin them, all because we lack the necessary interfaces to operate on i915_vma. v2: Tidy parameter lists to remove one level of redirection in the hot path.

[Intel-gfx] [PATCH 087/190] Revert "drm/i915: Clean up associated VMAs on context destruction"

2016-01-11 Thread Chris Wilson
This reverts commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae. The patch was only a stop-gap measure that fixed half the problem - the leak of the fbcon when restarting X. A complete solution required releasing the VMA when the object itself was closed rather than rely on file/process exit. The pre

[Intel-gfx] [PATCH 106/190] drm/i915: Split insertion/binding of an object into the VM

2016-01-11 Thread Chris Wilson
Split the insertion into the address space's range manager and binding of that object into the GTT to simplify the code flow when pinning a VMA. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem.c | 33 +++-- 1 file changed, 15 insertions(+), 18 deletions(

[Intel-gfx] [PATCH 089/190] drm/i915: Tidy execlists submission and tracking

2016-01-11 Thread Chris Wilson
Other than dramatically simplifying the submission code (requests ftw), we can reduce the execlist spinlock duration and importantly avoid having to hold it across the context switch register reads. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c| 20 +- drivers/gpu/

[Intel-gfx] [PATCH 105/190] drm/i915: Pad GTT views of exec objects up to user specified size

2016-01-11 Thread Chris Wilson
Our GPUs impose certain requirements upon buffers that depend upon how exactly they are used. Typically this is expressed as that they require a larger surface than would be naively computed by pitch * height. Normally such requirements are hidden away in the userspace driver, but when we accept po

[Intel-gfx] ✗ warning: Fi.CI.BAT

2016-01-11 Thread Patchwork
== Summary == Built on ff88655b3a5467bbc3be8c67d3e05ebf182557d3 drm-intel-nightly: 2016y-01m-11d-07h-30m-16s UTC integration manifest Test gem_storedw_loop: Subgroup basic-render: dmesg-warn -> PASS (bdw-ultra) Test kms_flip: Subgroup basic-flip-vs-dpms:

[Intel-gfx] ✗ failure: Fi.CI.BAT

2016-01-11 Thread Patchwork
== Summary == Built on ff88655b3a5467bbc3be8c67d3e05ebf182557d3 drm-intel-nightly: 2016y-01m-11d-07h-30m-16s UTC integration manifest Test gem_storedw_loop: Subgroup basic-render: dmesg-warn -> PASS (bdw-ultra) Test kms_flip: Subgroup basic-flip-vs-dpms:

Re: [Intel-gfx] [PATCH 5/5] drm/vmwgfx: Nuke preclose hook

2016-01-11 Thread Thomas Hellstrom
LGTM. Reviewed-by: Thomas Hellstrom On 01/10/2016 11:26 PM, Daniel Vetter wrote: > Again since the drm core takes care of event unlinking/disarming this > is now just needless code. > > v2: I've completely missed eaction->fpriv_head and all the related > code. We need to nuke that too to avoid

[Intel-gfx] ✗ warning: Fi.CI.BAT

2016-01-11 Thread Patchwork
== Summary == Built on ff88655b3a5467bbc3be8c67d3e05ebf182557d3 drm-intel-nightly: 2016y-01m-11d-07h-30m-16s UTC integration manifest Test gem_storedw_loop: Subgroup basic-render: pass -> DMESG-WARN (skl-i5k-2) UNSTABLE dmesg-warn -> PASS (bdw-

[Intel-gfx] ✗ failure: Fi.CI.BAT

2016-01-11 Thread Patchwork
== Summary == Built on ff88655b3a5467bbc3be8c67d3e05ebf182557d3 drm-intel-nightly: 2016y-01m-11d-07h-30m-16s UTC integration manifest Test gem_storedw_loop: Subgroup basic-render: pass -> DMESG-WARN (skl-i5k-2) UNSTABLE dmesg-warn -> PASS (bdw-

Re: [Intel-gfx] [PATCH 07/13] drm/i915: Introduce dedicated object VMA iterator

2016-01-11 Thread Tvrtko Ursulin
On 11/01/16 08:43, Daniel Vetter wrote: > On Fri, Jan 08, 2016 at 01:29:14PM +, Tvrtko Ursulin wrote: >> >> On 08/01/16 11:29, Tvrtko Ursulin wrote: >>> From: Tvrtko Ursulin >>> >>> Purpose is to catch places which iterate the object VMA list >>> without holding the big lock. >>> >>> Implemen

Re: [Intel-gfx] [PATCH 06/13] drm/i915: Only grab timestamps when needed

2016-01-11 Thread Tvrtko Ursulin
On 11/01/16 08:42, Daniel Vetter wrote: On Fri, Jan 08, 2016 at 11:29:45AM +, Tvrtko Ursulin wrote: From: Tvrtko Ursulin No need to call ktime_get_raw_ns twice per unlimited wait and can also eliminate a local variable. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_gem.c |

[Intel-gfx] ✗ failure: Fi.CI.BAT

2016-01-11 Thread Patchwork
== Summary == Built on ff88655b3a5467bbc3be8c67d3e05ebf182557d3 drm-intel-nightly: 2016y-01m-11d-07h-30m-16s UTC integration manifest Test gem_basic: Subgroup create-close: pass -> DMESG-WARN (skl-i7k-2) Test gem_cpu_reloc: Subgroup basic: pa

Re: [Intel-gfx] [PATCH 03/13] drm/i915: Avoid invariant conditionals in lrc interrupt handler

2016-01-11 Thread Tvrtko Ursulin
On 11/01/16 08:29, Daniel Vetter wrote: On Fri, Jan 08, 2016 at 11:29:42AM +, Tvrtko Ursulin wrote: From: Tvrtko Ursulin There is no need to check on what Gen we are running on every interrupt and every command submission. We can instead set up some of that when engines are initialized, s

[Intel-gfx] ✓ success: Fi.CI.BAT

2016-01-11 Thread Patchwork
== Summary == Built on ff88655b3a5467bbc3be8c67d3e05ebf182557d3 drm-intel-nightly: 2016y-01m-11d-07h-30m-16s UTC integration manifest Test kms_pipe_crc_basic: Subgroup read-crc-pipe-b: dmesg-warn -> PASS (byt-nuc) bdw-ultra total:138 pass:130 dwarn:1 dfa

[Intel-gfx] [PATCH 064/190] drm/i915: Rename intel_pin_and_map_ring()

2016-01-11 Thread Chris Wilson
For more consistent oop-naming, we would use intel_ring_verb, so pick intel_ring_map(). Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/intel_lrc.c| 6 ++--- drivers/gpu/drm/i915/intel_ringbuffer.c | 44 - drivers/gpu/drm/i915/intel_ringbuffer.h | 4

[Intel-gfx] [PATCH 082/190] drm/i915: Count how many VMA are bound for an object

2016-01-11 Thread Chris Wilson
Since we may have VMA allocated for an object, but we interrupted their binding, there is a disparity between having elements on the obj->vma_list and being bound. i915_gem_obj_bound_any() does this check, but this is not rigorously observed - add an explicit count to make it easier. Signed-off-by:

[Intel-gfx] [PATCH 028/190] drm/i915: On GPU reset, set the HWS breadcrumb to the last seqno

2016-01-11 Thread Chris Wilson
After the GPU reset and we discard all of the incomplete requests, mark the GPU as having advanced to the last_submitted_seqno (as having completed the requests and ready for fresh work). The impact of this is negligible, as all the requests will be considered completed by this point, it just brings

[Intel-gfx] [PATCH 084/190] drm/i915: Track active vma requests

2016-01-11 Thread Chris Wilson
Hook the vma itself into the i915_gem_request_retire() so that we can accurately track when a solitary vma is inactive (as opposed to having to wait for the entire object to be idle). This improves the interaction when using multiple contexts (with full-ppgtt) and eliminates some frequent list walk

[Intel-gfx] [PATCH 076/190] drm/i915: Rename vma->*_list to *_link for consistency

2016-01-11 Thread Chris Wilson
Elsewhere we have adopted the convention of using '_link' to denote elements in the list (and '_list' for the actual list_head itself), and that the name should indicate which list the link belongs to (and preferably not just where the link is being stored). s/vma_link/obj_link/ (we iterate over

[Intel-gfx] [PATCH 063/190] drm/i915: Rename struct intel_ringbuffer to intel_ring

2016-01-11 Thread Chris Wilson
Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c| 21 +++--- drivers/gpu/drm/i915/i915_drv.h| 2 +- drivers/gpu/drm/i915/i915_gem.c| 43 ++-- drivers/gpu/drm/i915/i915_gem_context.c| 2 +- drivers/gpu/drm/i915/i915_gem_execbuffe

[Intel-gfx] [PATCH 052/190] drm/i915: Treat ringbuffer writes as write to normal memory

2016-01-11 Thread Chris Wilson
Ringbuffers are now being written to either through LLC or WC paths, so treating them as simply iomem is no longer adequate. However, for the older !llc hardware, the hardware is documented as treating the TAIL register update as serialising, so we can relax the barriers when filling the rings (b

[Intel-gfx] [PATCH 085/190] drm/i915: Release vma when the handle is closed

2016-01-11 Thread Chris Wilson
In order to prevent a leak of the vma on shared objects, we need to hook into the object_close callback to destroy the vma on the object for this file. However, if we destroyed that vma immediately we may cause unexpected application stalls as we try to unbind a busy vma - hence we defer the unbind

[Intel-gfx] [PATCH 086/190] drm/i915: Mark the context and address space as closed

2016-01-11 Thread Chris Wilson
When the user closes the context mark it and the dependent address space as closed. As we use an asynchronous destruct method, this has two purposes. First it allows us to flag the closed context and detect internal errors if we to create any new objects for it (as it is removed from the user's nam

[Intel-gfx] [PATCH 003/190] drm/i915: Add an optional selection from i915 of CONFIG_MMU_NOTIFIER

2016-01-11 Thread Chris Wilson
userptr requires mmu-notifier for full unprivileged support. Most systems have mmu-notifier support already enabled as a requirement for virtualisation support, but we should make the option for i915 to take advantage of mmu-notifiers explicit (and enable by default so that regular userspace can ta

[Intel-gfx] [PATCH 049/190] drm/i915: Disable waitboosting for mmioflips/semaphores

2016-01-11 Thread Chris Wilson
Since commit a6f766f3975185af66a31a2cea2cd38721645999 Author: Chris Wilson Date: Mon Apr 27 13:41:20 2015 +0100 drm/i915: Limit ring synchronisation (sw sempahores) RPS boosts and commit bcafc4e38b6ad03f48989b7ecaff03845b5b7acf Author: Chris Wilson Date: Mon Apr 27 13:41:21 2015 +0100

[Intel-gfx] [PATCH 069/190] drm/i915: Remove duplicate golden render state init from execlists

2016-01-11 Thread Chris Wilson
Now that we use the same vfuncs for emitting the batch buffer in both execlists and legacy, the golden render state initialisation is identical between both. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_render_state.c | 22 -- drivers/gpu/drm/i915/i915_gem_render

[Intel-gfx] [PATCH 072/190] drm/i915: Execlists cannot pin a context without the object

2016-01-11 Thread Chris Wilson
Given that the intel_lr_context_pin cannot succeed without the object, we cannot reach intel_lr_context_unpin() without first allocating that object - so we can remove the redundant test. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/intel_lrc.c | 19 --- 1 file changed, 8

[Intel-gfx] [PATCH 081/190] drm/i915: i915_vma_move_to_active prep patch

2016-01-11 Thread Chris Wilson
This patch is broken out of the next just to remove the code motion from that patch and make it more readable. What we do here is move the i915_vma_move_to_active() to i915_gem_execbuffer.c and put the three stages (read, write, fenced) together so that future modifications to active handling are a

[Intel-gfx] [PATCH 079/190] drm/i915: Reduce the pointer dance of i915_is_ggtt()

2016-01-11 Thread Chris Wilson
The multiple levels of indirection do nothing but hinder the compiler and the pointer chasing turns out to be quite painful but painless to fix. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c| 13 ++--- drivers/gpu/drm/i915/i915_drv.h| 7 --- driver

[Intel-gfx] [PATCH 074/190] drm/i915: Rename request->list to link for consistency

2016-01-11 Thread Chris Wilson
We use "list" to denote the list and "link" to denote an element on that list. Rename request->list to match this idiom. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c | 4 ++-- drivers/gpu/drm/i915/i915_gem.c | 12 ++-- drivers/gpu/drm/i915/i915_gem_req
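The naming idiom described above can be sketched in plain C, assuming a minimal circular doubly-linked list in the style of the kernel's list_head. The list helpers are reimplemented here so the sketch stands alone, and engine_sketch, request_sketch, count_requests() and first_request() are hypothetical names used only for illustration: the head that owns the list is called request_list, while the node embedded in each element is called link.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly-linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *link, struct list_head *head)
{
	assert(head->next && head->prev); /* head must be initialised */
	link->prev = head->prev;
	link->next = head;
	head->prev->next = link;
	head->prev = link;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct request_sketch {
	unsigned int seqno;
	struct list_head link;         /* an element on the list */
};

struct engine_sketch {
	struct list_head request_list; /* the list itself */
};

/* Walk the list from the head, counting the elements linked onto it. */
static unsigned int count_requests(const struct engine_sketch *engine)
{
	const struct list_head *pos;
	unsigned int n = 0;

	for (pos = engine->request_list.next;
	     pos != &engine->request_list;
	     pos = pos->next)
		n++;
	return n;
}

/* Return the oldest request, or NULL if the list is empty. */
static struct request_sketch *first_request(struct engine_sketch *engine)
{
	if (engine->request_list.next == &engine->request_list)
		return NULL;
	return container_of(engine->request_list.next,
			    struct request_sketch, link);
}
```

With this split, a call such as list_add_tail(&rq->link, &engine->request_list) reads unambiguously: the first argument is the element, the second is the list it joins.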

[Intel-gfx] [PATCH 062/190] drm/i915: Rename extern functions operating on intel_engine_cs

2016-01-11 Thread Chris Wilson
Using intel_ring_* to refer to the intel_engine_cs functions is most confusing! Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c| 10 +++ drivers/gpu/drm/i915/i915_dma.c| 8 +++--- drivers/gpu/drm/i915/i915_drv.h| 4 +-- drivers/gpu/drm/i9

[Intel-gfx] [PATCH 025/190] drm/i915: Broadwell execlists needs exactly the same seqno w/a as legacy

2016-01-11 Thread Chris Wilson
In legacy mode, we use the gen6 seqno barrier to insert a delay after the interrupt before reading the seqno (as the seqno write is not flushed before the interrupt is sent, the interrupt arrives before the seqno is visible). Execlists ignored the evidence of igt. Note that is harder, but not impo

[Intel-gfx] [PATCH 042/190] drm/i915: Clean up GPU hang message

2016-01-11 Thread Chris Wilson
Remove some redundant kernel messages as we deduce a hung GPU and capture the error state. v2: Fix "hang" vs "no progress" message whilst I was there Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_irq.c | 21 +++-- 1 file changed, 7 insertions(+), 14 deletions(-) dif

[Intel-gfx] [PATCH 053/190] drm/i915: Convert i915_semaphores_is_enabled over to early sanitize

2016-01-11 Thread Chris Wilson
Rather than recomputing whether semaphores are enabled, we can do that computation once during early initialisation as the i915.semaphores module parameter is now read-only. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c | 2 +- drivers/gpu/drm/i915/i915_dma.c |

[Intel-gfx] [PATCH 066/190] drm/i915: Simplify request_alloc by returning the allocated request

2016-01-11 Thread Chris Wilson
It is simpler and leads to more readable code through the callstack if the allocation returns the allocated struct through the return value. The importance of this is that it no longer looks like we accidentally allocate requests as side-effect of calling certain functions. Signed-off-by: Chris W

[Intel-gfx] [PATCH 035/190] drm/i915: Remove redundant queue_delayed_work() from throttle ioctl

2016-01-11 Thread Chris Wilson
We know, by design, that whilst the GPU is active (and thus we are throttling) the retire_worker is queued. Therefore attempting to requeue it with queue_delayed_work() is a no-op and we can safely remove it. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem.c | 3 --- 1 file changed

[Intel-gfx] [PATCH 044/190] drm/i915: Move GEM request routines to i915_gem_request.c

2016-01-11 Thread Chris Wilson
Migrate the request operations out of the main body of i915_gem.c and into their own C file for easier expansion. v2: Move __i915_add_request() across as well Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/i915_drv.h | 205 +

[Intel-gfx] [PATCH 080/190] drm/i915: Store owning file on the i915_address_space

2016-01-11 Thread Chris Wilson
For the global GTT (and aliasing GTT), the address space is owned by the device (it is a global resource) and so the per-file owner field is NULL. For per-process GTT (where we create an address space per context), each is owned by the opening file. We can use this ownership information to both dis

[Intel-gfx] [PATCH 077/190] drm/i915: Amalgamate GGTT/ppGTT vma debug list walkers

2016-01-11 Thread Chris Wilson
As we can now have multiple VMA inside the global GTT (with partial mappings, rotations, etc), it is no longer true that there may just be a single GGTT entry and so we should walk the full vma_list to count up the actual usage. In addition to unifying the two walkers, switch from multiplying the o

[Intel-gfx] [PATCH 057/190] drm/i915: Remove the identical implementations of request space reservation

2016-01-11 Thread Chris Wilson
Now that we share intel_ring_begin(), reserving space for the tail of the request is identical between legacy/execlists and so the tautology can be removed. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_request.c | 7 +++ drivers/gpu/drm/i915/intel_lrc.c| 15

[Intel-gfx] [PATCH 071/190] drm/i915: Simplify calling engine->sync_to

2016-01-11 Thread Chris Wilson
Since requests can no longer be generated as a side-effect of intel_ring_begin(), we know that the seqno will be unchanged during ring-emission. This predictability then means we do not have to check for the seqno wrapping around whilst emitting the semaphore for engine->sync_to(). Signed-off-by:

[Intel-gfx] [PATCH 018/190] drm/i915: Slaughter the thundering i915_wait_request herd

2016-01-11 Thread Chris Wilson
One particularly stressful scenario consists of many independent tasks all competing for GPU time and waiting upon the results (e.g. realtime transcoding of many, many streams). One bottleneck in particular is that each client waits on its own results, but every client is woken up after every batch

[Intel-gfx] [PATCH 083/190] drm/i915: Be more careful when unbinding vma

2016-01-11 Thread Chris Wilson
When we call i915_vma_unbind(), we will wait upon outstanding rendering. This will also trigger a retirement phase, which may update the object lists. If we extend request tracking to the VMA itself (rather than keep it at the encompassing object), then there is a potential that the obj->vma_list

[Intel-gfx] [PATCH 070/190] drm/i915: Unify legacy/execlists submit_execbuf callbacks

2016-01-11 Thread Chris Wilson
Now that emitting requests is identical between legacy and execlists, we can use the same function to build up the ring for submitting to either engine. (With the exception of i915_switch_contexts(), but in time that will also be handled gracefully.) Signed-off-by: Chris Wilson --- drivers/gpu/d

[Intel-gfx] [PATCH 014/190] drm/i915: Delay queuing hangcheck to wait-request

2016-01-11 Thread Chris Wilson
We can forgo queuing the hangcheck from the start of every request until we wait upon a request. This reduces the overhead of every request, but may increase the latency of detecting a hang. However, if nothing ever waits upon a hang, did it ever hang? It also improves the robustness of the wa

[Intel-gfx] [PATCH 061/190] drm/i915: Rename intel_context[engine].ringbuf

2016-01-11 Thread Chris Wilson
Perform s/ringbuf/ring/ on the context struct for consistency with the ring/engine split. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c| 2 +- drivers/gpu/drm/i915/i915_drv.h| 2 +- drivers/gpu/drm/i915/i915_guc_submission.c | 6 +-- drivers/gpu/drm/i

[Intel-gfx] [PATCH 034/190] drm/i915: Do not keep postponing the idle-work

2016-01-11 Thread Chris Wilson
Rather than persistently postponing the idle-work every time somebody calls i915_gem_retire_requests() (potentially ensuring that we never reach the idle state), queue the work the first time we detect all requests are complete. Then if in 100ms, more requests have been queued, we will abort the idl

[Intel-gfx] [PATCH 033/190] drm/i915: Only start retire worker when idle

2016-01-11 Thread Chris Wilson
The retire worker is a low frequency task that makes sure we retire outstanding requests if userspace is being lax. We only need to start it once as it remains active until the GPU is idle, so do a cheap test before the more expensive queue_work(). A consequence of this is that we need correct lock

[Intel-gfx] [PATCH 016/190] drm/i915: Make queueing the hangcheck work inline

2016-01-11 Thread Chris Wilson
Since the function is a small wrapper around schedule_delayed_work(), move it inline to remove the function call overhead for the principal caller. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_drv.h | 17 - drivers/gpu/drm/i915/i915_irq.c | 16 2 fil

[Intel-gfx] [PATCH 009/190] drm/i915: Tighten reset_counter for reset status

2016-01-11 Thread Chris Wilson
In the reset_counter, we use two bits to track a GPU hang and reset. The low bit is a "reset-in-progress" flag that we set to signal when we need to break waiters in order for the recovery task to grab the mutex. As soon as the recovery task has the mutex, we can clear that flag (which we do by inc

[Intel-gfx] [PATCH 039/190] drm/i915: Remove stop-rings debugfs interface

2016-01-11 Thread Chris Wilson
Now that we have (near) universal GPU recovery code, we can inject a real hang from userspace and not need any fakery. Not only does this mean that the testing is far more realistic, but we can simplify the kernel in the process. v2: Replace the i915_stop_rings with a dummy implementation as igt e

[Intel-gfx] [PATCH 024/190] drm/i915: Replace manual barrier() with READ_ONCE() in HWS accessor

2016-01-11 Thread Chris Wilson
When reading from the HWS page, we use barrier() to prevent the compiler optimising away the read from the volatile (may be updated by the GPU) memory address. This is more suited to READ_ONCE(); make it so. Signed-off-by: Chris Wilson Cc: Daniel Vetter --- drivers/gpu/drm/i915/intel_ringbuffer
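
To illustrate the idea, here is a userspace approximation of a READ_ONCE()-style accessor. The macro READ_ONCE_SKETCH and the function read_status_page are hypothetical names, not the driver's; the sketch assumes GCC or Clang for __typeof__ and only covers scalar types.

```c
#include <stdint.h>

/*
 * Userspace approximation of the kernel's READ_ONCE() for scalars:
 * the volatile-qualified access forces the compiler to perform the
 * load each time instead of reusing a previously read value.
 */
#define READ_ONCE_SKETCH(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical accessor: hws models a status page of 32-bit slots. */
static inline uint32_t read_status_page(const uint32_t *hws, unsigned int index)
{
	return READ_ONCE_SKETCH(hws[index]);
}
```

Compared with a plain load followed by barrier(), the volatile access constrains only this one read rather than acting as a compiler barrier for all memory.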

[Intel-gfx] [PATCH 075/190] drm/i915: Refactor activity tracking for requests

2016-01-11 Thread Chris Wilson
With the introduction of requests, we amplified the number of atomic refcounted objects we use and update every execbuffer; from none to several references, and a set of references that need to be changed. We also introduced interesting side-effects in the order of retiring requests and objects. I

[Intel-gfx] [PATCH 078/190] drm/i915: Split early global GTT initialisation

2016-01-11 Thread Chris Wilson
Initialising the global GTT is tricky as we wish to use the drm_mm range manager during the modesetting initialisation (to capture stolen allocations from the BIOS) before we actually enable GEM. To overcome this, we currently set up the drm_mm first and then carefully rebind them. Signed-off-by: C

[Intel-gfx] [PATCH 046/190] drm/i915: Derive GEM requests from dma-fence

2016-01-11 Thread Chris Wilson
dma-buf provides a generic fence class for interoperation between drivers. Internally we use the request structure as a fence, and so with only a little bit of interfacing we can rebase those requests on top of dma-buf fences. This will allow us, in the future, to pass those fences back to userspac

[Intel-gfx] [PATCH 065/190] drm/i915: Remove obsolete engine->gpu_caches_dirty

2016-01-11 Thread Chris Wilson
Space for flushing the GPU cache prior to completing the request is preallocated and so cannot fail. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_context.c| 2 +- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 9 +--- drivers/gpu/drm/i915/i915_gem_gtt.c| 18

[Intel-gfx] [PATCH 060/190] drm/i915: Rename backpointer from intel_ringbuffer to intel_engine_cs

2016-01-11 Thread Chris Wilson
Having ringbuf->ring point to an engine is confusing, so rename it once again to ring->engine. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_guc_submission.c | 10 +++--- drivers/gpu/drm/i915/intel_lrc.c | 35 +-- drivers/gpu/drm/i915/intel_ringbuffer.c|

[Intel-gfx] [PATCH 022/190] drm/i915: Check the CPU cached value of seqno after waking the waiter

2016-01-11 Thread Chris Wilson
If we have multiple waiters, we may find that many complete on the same wake up. If we first inspect the seqno from the CPU cache, we may reduce the number of heavyweight coherent seqno reads we require. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_drv.h | 14 ++ 1 file
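
A rough userspace model of the idea: test the cheap cached value first and fall back to the expensive coherent read only when that test fails. The struct engine_sketch and its fields are hypothetical stand-ins, and hw_seqno merely simulates the value the GPU would write; the wraparound-safe comparison is the usual signed-difference idiom.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe test: has seq1 reached or passed seq2? */
static inline bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

/* Hypothetical engine model, not the driver's structure. */
struct engine_sketch {
	uint32_t hw_seqno;           /* stands in for the GPU-written value */
	uint32_t cached_seqno;       /* last value seen by the CPU */
	unsigned int coherent_reads; /* counts the expensive reads */
};

/* Models the heavyweight coherent read and refreshes the cache. */
static uint32_t coherent_seqno_read(struct engine_sketch *engine)
{
	engine->coherent_reads++;
	engine->cached_seqno = engine->hw_seqno;
	return engine->cached_seqno;
}

/* Check the cached value first; only go coherent if it is not enough. */
static bool request_completed(struct engine_sketch *engine, uint32_t target)
{
	if (seqno_passed(engine->cached_seqno, target))
		return true;
	return seqno_passed(coherent_seqno_read(engine), target);
}
```

In this model, when several waiters are woken together, the first one that performs the coherent read refreshes the cache and the rest can complete without another such read.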

[Intel-gfx] [PATCH 029/190] drm/i915: Convert trace-irq to the breadcrumb waiter

2016-01-11 Thread Chris Wilson
If we convert the tracing from direct use of ring->irq_get() over to the breadcrumb infrastructure, we only have a single user of the ring->irq_get and so we will be able to simplify the driver routines (eliminating the redundant validation and irq refcounting). v2: Move to a signaling fr

[Intel-gfx] [PATCH 068/190] drm/i915: Unify adding requests between ringbuffer and execlists

2016-01-11 Thread Chris Wilson
Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_request.c | 8 +- drivers/gpu/drm/i915/intel_lrc.c| 14 ++-- drivers/gpu/drm/i915/intel_ringbuffer.c | 129 +--- drivers/gpu/drm/i915/intel_ringbuffer.h | 21 +++--- 4 files changed, 87 insertion

[Intel-gfx] [PATCH 056/190] drm/i915: Unify intel_ring_begin()

2016-01-11 Thread Chris Wilson
Combine the near identical implementations of intel_logical_ring_begin() and intel_ring_begin() - the only difference is that the logical wait has to check for a matching ring (which is assumed by legacy). Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/intel_lrc.c| 141 ++--

[Intel-gfx] [PATCH 067/190] drm/i915: Unify legacy/execlists emission of MI_BATCHBUFFER_START

2016-01-11 Thread Chris Wilson
Both the ->dispatch_execbuffer and ->emit_bb_start callbacks do exactly the same thing, add MI_BATCHBUFFER_START to the request's ringbuffer - we need only one vfunc. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 6 +-- drivers/gpu/drm/i915/i915_gem_render_state

[Intel-gfx] [PATCH 038/190] drm/i915: Flush the RPS bottom-half when the GPU idles

2016-01-11 Thread Chris Wilson
Make sure that the RPS bottom-half is flushed before we set the idle frequency when we decide the GPU is idle. This should prevent any races with the bottom-half and setting the idle frequency, and ensures that the bottom-half is bounded by the GPU's rpm reference taken for when it is active (i.e.

[Intel-gfx] [PATCH 027/190] drm/i915: Only query timestamp when measuring elapsed time

2016-01-11 Thread Chris Wilson
Avoid the two calls to ktime_get_raw_ns() (at best it reads the TSC) as we only need to compute the elapsed time for a timed wait. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem.c | 13 + 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i91

[Intel-gfx] [PATCH 073/190] drm/i915: Introduce i915_gem_active for request tracking

2016-01-11 Thread Chris Wilson
In the next patch, request tracking is made more generic and for that we need a new expanded struct and to separate out the logic changes from the mechanical churn, we split out the structure renaming into this patch. v2: Writer's block. Add some spiel about why we track requests. v3: Now i915_gem

[Intel-gfx] [PATCH 047/190] drm/i915: Rename request reference/unreference to get/put

2016-01-11 Thread Chris Wilson
Now that we derive requests from struct fence, swap over to its nomenclature for references. It's shorter and more idiomatic across the kernel. s/i915_gem_request_reference/i915_gem_request_get/ s/i915_gem_request_unreference/i915_gem_request_put/ Signed-off-by: Chris Wilson --- drivers/gpu/drm

[Intel-gfx] [PATCH 008/190] drm/i915: Simplify checking of GPU reset_counter in display pageflips

2016-01-11 Thread Chris Wilson
If, when we store the reset_counter for the operation, we ensure that it is not wedged or in the middle of a reset, we can then assert that if any reset occurs the reset_counter must change. Later we can just compare the operation's reset epoch against the current counter to see if we need

[Intel-gfx] [PATCH 051/190] drm,i915: Introduce drm_malloc_gfp()

2016-01-11 Thread Chris Wilson
I have instances where I want to use drm_malloc_ab() but with a custom gfp mask. And with those, where I want a temporary allocation, I want to try a high-order kmalloc() before using a vmalloc(). So refactor my usage into drm_malloc_gfp(). Signed-off-by: Chris Wilson Cc: dri-de...@lists.freedes

[Intel-gfx] [PATCH 043/190] drm/i915: Skip capturing an error state if we already have one

2016-01-11 Thread Chris Wilson
As we only ever keep the first error state around, we can avoid some work that can be quite intrusive if we don't record the error the second time around. This does move the race whereby the user could discard one error state as the second is being captured, but that race exists in the current code

[Intel-gfx] [PATCH 058/190] drm/i915: Rename request->ring to request->engine

2016-01-11 Thread Chris Wilson
In order to disambiguate between the pointer to the intel_engine_cs (called ring) and the intel_ringbuffer (called ringbuf), rename s/ring/engine/. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c | 11 +-- drivers/gpu/drm/i915/i915_drv.h | 2 +- drive
