On Oct 25, 2010, at 6:59 PM, Jesse Barnes wrote:
On Mon, 25 Oct 2010 17:13:55 +0300
Pauli Nieminen <ext-pauli.niemi...@nokia.com> wrote:

This allows ddx to set swap_limit if there is more than one back
buffer for drawable. Setting swap_limit has to also check if change
affects a client that is blocked.

This can be used to implement N-buffering in driver with minimal
logic in allocation and selecting next back.

Signed-off-by: Pauli Nieminen <ext-pauli.niemi...@nokia.com>
Tested-by: Francisco Jerez <curroje...@riseup.net>
CC: Kristian Høgsberg <k...@bitplanet.net>
CC: Jesse Barnes <jbar...@virtuousgeek.org>
---
 hw/xfree86/dri2/dri2.c |   23 +++++++++++++++++++++++
 hw/xfree86/dri2/dri2.h |    1 +
 2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/hw/xfree86/dri2/dri2.c b/hw/xfree86/dri2/dri2.c
index 34f735f..c96eb35 100644
--- a/hw/xfree86/dri2/dri2.c
+++ b/hw/xfree86/dri2/dri2.c
@@ -190,6 +190,29 @@ DRI2AllocateDrawable(DrawablePtr pDraw)
     return pPriv;
 }

+Bool
+DRI2SwapLimit(DrawablePtr pDraw, int swap_limit)
+{
+    DRI2DrawablePtr pPriv = DRI2GetDrawable(pDraw);
+    if (!pPriv)
+       return FALSE;
+
+    pPriv->swap_limit = swap_limit;
+
+    /* Check throttling */
+    if (pPriv->swapsPending >= pPriv->swap_limit)
+       return TRUE;
+
+    if (pPriv->target_sbc == -1 && !pPriv->blockedOnMsc) {
+       if (pPriv->blockedClient) {
+           AttendClient(pPriv->blockedClient);
+           pPriv->blockedClient = NULL;
+       }
+    }
+
+    return TRUE;
+}
+
 typedef struct DRI2DrawableRefRec {
     XID                  id;
     XID                  dri2_id;
diff --git a/hw/xfree86/dri2/dri2.h b/hw/xfree86/dri2/dri2.h
index fe0bf6c..0e0bea4 100644
--- a/hw/xfree86/dri2/dri2.h
+++ b/hw/xfree86/dri2/dri2.h
@@ -251,6 +251,7 @@ extern _X_EXPORT DRI2BufferPtr *DRI2GetBuffersWithFormat(DrawablePtr pDraw,
        int *out_count);

extern _X_EXPORT void DRI2SwapInterval(DrawablePtr pDrawable, int interval); +extern _X_EXPORT Bool DRI2SwapLimit(DrawablePtr pDraw, int swap_limit); extern _X_EXPORT int DRI2SwapBuffers(ClientPtr client, DrawablePtr pDrawable,
                                     CARD64 target_msc, CARD64 divisor,
                                     CARD64 remainder, CARD64 *swap_target,

I think this is ok, but I'd like Mario to ack it as well; I think we
can still get into trouble if multiple clients are sharing a drawable
and doing waits on it.
--
Jesse Barnes, Intel Open Source Technology Center


Ja, i think the patch is ok for doing what it does. But i think it isn't sufficient to implement n-buffering reliably. We'd need more clever scheduling of swaps and vblank events if multiple swaps are pending.

The problem is wrong presentation timing for copyswaps if the gpu is loaded and really corrupt/wrong presentation in the pageflip case.

Assume a client A submits 10 render + swapbuffers requests in a row, for presentation at ten consecutive refresh intervals (and the swaplimit is >=10). The driver would happily queue 10 vblank events in the kernel for consecutive vblank counts.

Meanwhile client B submits a huge rendering request into the command stream, worth multiple video refresh durations of rendering time.

1. BBBBBBBBB -> gpu

The 1st vblank event fires, gets processed by the server and a copyswap commandbuffer A gets submitted into the cs and a intel_swap_event or OML_sync_control timestamp delivered to the client about the supposed time of swap completion.

2. ABBBBBBBB -> gpu

After 1st swap request is stuck behind client B's command buffers for at least 1 video refresh cycle, the next vblank event fires as scheduled and the next copyswap buffer gets queued, more wrong timestamps get delivered:

3. AABBBBBBB -> gpu

In the end they all get executed at some point, but their true swap timing has nothing to do with what was specified in the glXSwapBuffersMscOML() call and the (divisor, remainder) mechanism for dealing with such delayed swaps fails to anything useful completely. Also all delivered timestamps and msc counts are wrong - they indicate perfect timing which has nothing to do with reality.

Pageflipped swaps are worse. Step 1 and 2 would be the same for pageflip, but in step 2 a pageflip would be queued inside the kernel's pageflip ioctl(). One refresh cycle later, when the next vblank fires, the ioctl() gets called again to schedule the next pageflip while the 1st one hasn't completed yet. Result is that the ioctl() returns EBUSY and the swap is discarded. Same for some of the following swaps if the gpu is a bit busy. So with pageflipping, not only the timting is wrong, but many of the rendered frames aren't presented at all and stuff gets really out of sequence.

These funny things can't happen as long as we only allow 1 pending swaps (swaplimit == 1).

My idea was to solve it as follows for n-buffering. If somebody feels like implementing that or something better now, instead of me in a couple of weeks or months, go for it:

1. We maintain a little fifo queue / list of pending swap requests for each drawable. 2. The DRI2ScheduleSwap() function just converts the incoming swapbuffers request into a little struct with all the parameters (target_msc, divisor, remainder, ...) and inserts it at the tail of that queue, after checking that swaps_pending < current swaplimit. If (++swaps_pending) == 1, it kicks off a little DRI2ReallyScheduleOneSwap() function which dequeues this first request from the queue and calls into the ddx->ScheduleSwap() routine. At the end of the DRI2SwapComplete() function, when a swap really got completed, that DRI2ReallyScheduleOneSwap() function gets called and checks if more swap requests are pending in the queue and dequeues and processes the oldest request. If the queue is empty, it is a no op and the whole thing goes idle again. Maybe DRI2ReallyScheduleOneSwap() would also take care of msc jumps due to drawables being moved between crtc's etc.

This can't perfectly solve delays in one swapbuffer request due to traffic jams in the command stream, but it could prevent that one misscheduled swap will trigger the followup ones to go ugly in random ways.

The problems with multiple clients blocking on glXWaitForMscOML or glXWaitForSbcOML() is a separate issue which would need implementation of additional wait queues.

And then, as we discussed on xds, there's the need for proper timestamping of copyswaps in the kernel, otherwise this proposal would only fix the pageflipped case, not the windowed case.

thanks,
-mario

*********************************************************************
Mario Kleiner
Max Planck Institute for Biological Cybernetics
Spemannstr. 38
72076 Tuebingen
Germany

e-mail: mario.klei...@tuebingen.mpg.de
office: +49 (0)7071/601-1623
fax:    +49 (0)7071/601-616
www:    http://www.kyb.tuebingen.mpg.de/~kleinerm
*********************************************************************
"For a successful technology, reality must take precedence
over public relations, for Nature cannot be fooled."
(Richard Feynman)

_______________________________________________
xorg-devel@lists.x.org: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: http://lists.x.org/mailman/listinfo/xorg-devel

Reply via email to