Re: Remote display with 3D acceleration using Wayland/Weston

2017-02-28 Thread DRC
On 2/28/17 8:12 AM, Hardening wrote:
> This was done quite a long time ago, here =>
> https://gitorious.org/weston/jonseverinsson-weston/?p=weston:jonseverinsson-weston.git;a=commit;h=9e26d9356255f4af1723700272805f6d356c7d7a
> 
> It's clearly outdated, and IIRC people here didn't like the way it was
> implemented, but you get the idea. It uses DRI render nodes to do
> the rendering.

Thanks for the hint.  I was able to integrate your patch with the latest
Weston code by borrowing some of the code from the DRM backend, but it
doesn't seem to display anything (my patch is attached).  Maybe someone
can point out what I'm doing wrong so that I can at least get this working
for experimental purposes.

The following GitHub comment summarizes my general feelings at the moment:

https://github.com/TurboVNC/turbovnc/issues/18#issuecomment-282827192

To the best of my knowledge (someone please correct me if I got any part
of this wrong), it would be much easier and more flexible to pursue the
interposer approach for now.  The main reason is the nVidia proprietary
driver issue, which is a show-stopper for VirtualGL and TurboVNC, but
implementing a headless hardware-accelerated remote display backend "the
right way" would also likely require changes to gl_renderer and the
compositor code, including implementing PBO readback so that readback in
the compositor does not block other OpenGL applications.  I anticipate
that further tuning would be required as well, and it may still not
perform as well as an interposer, because VirtualGL reads back the pixels
in the application's rendering thread but uses a separate thread for
displaying them.  Thus, if the compositor were blocking on encoding a
pixel region for remote display, the Wayland client could still render
the next frame in the background.
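
To make the PBO idea concrete, the kind of non-blocking readback I have in
mind looks roughly like the sketch below.  This is illustrative only (the
function names and the encode() callback are made up, not Weston or
VirtualGL API), and it assumes a GLES 3.0 context with two pixel-pack
buffers ping-ponged so that glReadPixels() never stalls the compositor:

/* Sketch: double-buffered PBO readback; glReadPixels() targets one PBO
 * asynchronously while the previous frame is mapped and encoded. */
#include <GLES3/gl3.h>

static GLuint pbo[2];
static int cur;

void remote_readback_init(int width, int height)
{
	glGenBuffers(2, pbo);
	for (int i = 0; i < 2; i++) {
		glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[i]);
		glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4,
			     NULL, GL_STREAM_READ);
	}
	glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

/* Call once per repainted frame; the very first mapped frame is undefined. */
void remote_readback_frame(int width, int height,
			   void (*encode)(const void *rgba, int w, int h))
{
	int next = 1 - cur;

	/* Start an asynchronous read of the current frame into pbo[cur]. */
	glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[cur]);
	glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, NULL);

	/* Map the buffer filled on the previous call and hand it off. */
	glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[next]);
	const void *pixels = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0,
					      width * height * 4,
					      GL_MAP_READ_BIT);
	if (pixels)
		encode(pixels, width, height);
	glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
	glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

	cur = next;
}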

I am not in a good position to develop or maintain changes to Weston,
and since interest has been expressed in "officially" supporting a
headless hardware-accelerated remote display backend in the long term,
it makes sense for me to develop an interposer as a stopgap measure.
The interposer could also provide a springboard for other developers who
are interested in making their own Weston remote display backends, since
they would not have to deal with the problem of OpenGL hardware
acceleration.  In the long term, I anticipate that this interposer will
be rendered obsolete, but that's fine, because a lot of the effort
necessary to build an EGL interposer for Wayland would benefit the
existing GLX interposer in VirtualGL as well, so the only thing that
might be "thrown away" in the long term would be the Wayland-specific
parts.  Barring any further information, that seems to be the best path
forward at the moment, but if there is any movement on the nVidia front
or on the headless hardware-accelerated remote display backend front,
please keep me in the loop.
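
For illustration, the core of such an EGL interposer is just an
LD_PRELOAD-style wrapper around eglSwapBuffers() that reads the frame back
in the application's process before passing the call through.  The sketch
below is not VirtualGL code; readback_and_deliver() is a hypothetical hook,
and a real interposer would also have to wrap display/surface/context
creation so the application renders off-screen via a render node:

/* Sketch of an eglSwapBuffers() interposer; build as a shared library,
 * link with -ldl, and load it with LD_PRELOAD. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>
#include <EGL/egl.h>
#include <GLES2/gl2.h>

static void readback_and_deliver(EGLDisplay dpy, EGLSurface surf)
{
	EGLint w = 0, h = 0;

	eglQuerySurface(dpy, surf, EGL_WIDTH, &w);
	eglQuerySurface(dpy, surf, EGL_HEIGHT, &h);
	if (w <= 0 || h <= 0)
		return;

	void *rgba = malloc((size_t)w * h * 4);
	if (!rgba)
		return;
	glReadPixels(0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, rgba);
	/* Hand the frame to a separate transport/display thread here. */
	free(rgba);
}

EGLBoolean eglSwapBuffers(EGLDisplay dpy, EGLSurface surface)
{
	static EGLBoolean (*real_swap)(EGLDisplay, EGLSurface);

	if (!real_swap)
		real_swap = (EGLBoolean (*)(EGLDisplay, EGLSurface))
			dlsym(RTLD_NEXT, "eglSwapBuffers");

	readback_and_deliver(dpy, surface);
	return real_swap(dpy, surface);
}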

DRC

From f331db279a138295701806de0c8bd71f385d2796 Mon Sep 17 00:00:00 2001
From: DRC 
Date: Tue, 28 Feb 2017 22:49:22 -0600
Subject: [PATCH] rdp-backend.so: OpenGL hardware acceleration

---
 compositor/main.c  |  23 +++-
 configure.ac   |   4 +-
 libweston/compositor-rdp.c | 314 +++--
 libweston/compositor-rdp.h |  24 
 4 files changed, 352 insertions(+), 13 deletions(-)

diff --git a/compositor/main.c b/compositor/main.c
index 72c3cd1..7f4b8db 100644
--- a/compositor/main.c
+++ b/compositor/main.c
@@ -601,6 +601,7 @@ usage(int error_code)
 		"  --rdp4-key=FILE\tThe file containing the key for RDP4 encryption\n"
 		"  --rdp-tls-cert=FILE\tThe file containing the certificate for TLS encryption\n"
 		"  --rdp-tls-key=FILE\tThe file containing the private key for TLS encryption\n"
+		"  --use-pixman\t\tUse the pixman (CPU) renderer\n"
 		"\n");
 #endif

@@ -1329,11 +1330,14 @@ static void
 rdp_backend_output_configure(struct wl_listener *listener, void *data)
 {
 	struct weston_output *output = data;
+	struct weston_config *wc = wet_get_config(output->compositor);
 	struct wet_compositor *compositor = to_wet_compositor(output->compositor);
 	struct wet_output_config *parsed_options = compositor->parsed_options;
+	struct weston_config_section *section;
 	const struct weston_rdp_output_api *api = weston_rdp_output_get_api(output->compositor);
 	int width = 640;
 	int height = 480;
+	char *gbm_format = NULL;

 	assert(parsed_options);

@@ -1342,6 +1346,8 @@ rdp_backend_output_configure(struct wl_listener *listener, void *data)
 		return;
 	}

+	section = weston_config_get_section(wc, "output", "name", output->name);
+
 	if (parsed_options->width)
 		width = parsed_options->width;

@@ -1351,6 +1357,12 @@ rdp_backend_output_configure(struct wl_listener *listener,

Re: Remote display with 3D acceleration using Wayland/Weston

2017-02-28 Thread Hardening
On 24/02/2017 at 00:51, DRC wrote:
> On 12/15/16 3:01 AM, Pekka Paalanen wrote:
>> The current RDP-backend is written to set up and use only the Pixman
>> renderer. Pixman renderer is a software renderer, and will not
>> initialize EGL in the compositor. Therefore no support for hardware
>> accelerated OpenGL gets advertised to clients, and clients fall back to
>> software GL.
>>
>> You can fix this purely by modifying libweston/compositor-rdp.c file,
>> writing the support for initializing the GL-renderer. Then you get
>> hardware accelerated GL support for all Wayland clients without any
>> other modifications anywhere.
>>
>> Why that has not been done already is because it was thought that
>> having clients using hardware OpenGL while the compositor is not cannot
>> be performant enough to justify the effort. Also, it pulls in the
>> dependency to EGL and GL libs, which are huge. Obviously your use case
>> is different and this rationale does not apply.
>>
>> The hardest part in adding the support to the RDP-backend is
>> implementing the buffer content access efficiently. RDP requires pixel
>> data in system memory so the CPU can read it, but GL-renderer has all
>> pixel data in graphics memory which often cannot be directly read by
>> the CPU. Accessing that pixel data requires a copy (glReadPixels), and
>> there is nowadays a helper: weston_surface_copy_content(), however the
>> function is not efficient and is so far meant only for debugging and
>> testing.
> 
> I am attempting to modify the RDP backend to prove the concept that
> hardware-accelerated OpenGL is possible with a remote display backend,
> but my lack of familiarity with the code is making this very
> challenging.  It seems that the RDP backend uses Pixman both for GL
> rendering and also to maintain its framebuffer in main memory
> (shadow_surface.)  Is that correct?  If so, then it seems that I would
> need to continue using the shadow surface but use gl_renderer instead of
> the Pixman renderer, then implement my own method of transferring pixels
> from the GL renderer to the shadow surface at the end of every frame (?)
>  I've been trying to work from compositor-wayland.c as a template, but
> it's unclear how everything connects, which parts of that code I need in
> order to implement hardware acceleration, and which parts are
> unnecessary.  I would appreciate it if someone who has familiarity with
> the RDP backend could give me some targeted advice.

This was done quite a long time ago, here =>
https://gitorious.org/weston/jonseverinsson-weston/?p=weston:jonseverinsson-weston.git;a=commit;h=9e26d9356255f4af1723700272805f6d356c7d7a

It's clearly outdated, and IIRC people here didn't like the way it was
implemented, but you get the idea. It uses DRI render nodes to do
the rendering.

Best regards.
-- 
David FORT
website: http://www.hardening-consulting.com/



Re: Remote display with 3D acceleration using Wayland/Weston

2017-02-24 Thread Emil Velikov
On 24 February 2017 at 09:36, Pekka Paalanen  wrote:
> On Thu, 23 Feb 2017 17:51:24 -0600
> DRC  wrote:
>
>> On 12/15/16 3:01 AM, Pekka Paalanen wrote:
>> > The current RDP-backend is written to set up and use only the Pixman
>> > renderer. Pixman renderer is a software renderer, and will not
>> > initialize EGL in the compositor. Therefore no support for hardware
>> > accelerated OpenGL gets advertised to clients, and clients fall back to
>> > software GL.
>> >
>> > You can fix this purely by modifying libweston/compositor-rdp.c file,
>> > writing the support for initializing the GL-renderer. Then you get
>> > hardware accelerated GL support for all Wayland clients without any
>> > other modifications anywhere.
>> >
>> > Why that has not been done already is because it was thought that
>> > having clients using hardware OpenGL while the compositor is not cannot
>> > be performant enough to justify the effort. Also, it pulls in the
>> > dependency to EGL and GL libs, which are huge. Obviously your use case
>> > is different and this rationale does not apply.
>> >
>> > The hardest part in adding the support to the RDP-backend is
>> > implementing the buffer content access efficiently. RDP requires pixel
>> > data in system memory so the CPU can read it, but GL-renderer has all
>> > pixel data in graphics memory which often cannot be directly read by
>> > the CPU. Accessing that pixel data requires a copy (glReadPixels), and
>> > there is nowadays a helper: weston_surface_copy_content(), however the
>> > function is not efficient and is so far meant only for debugging and
>> > testing.
>>
>> I am attempting to modify the RDP backend to prove the concept that
>> hardware-accelerated OpenGL is possible with a remote display backend,
>> but my lack of familiarity with the code is making this very
>> challenging.  It seems that the RDP backend uses Pixman both for GL
>> rendering and also to maintain its framebuffer in main memory
>> (shadow_surface.)  Is that correct?  If so, then it seems that I would
>> need to continue using the shadow surface but use gl_renderer instead of
>> the Pixman renderer, then implement my own method of transferring pixels
>> from the GL renderer to the shadow surface at the end of every frame (?)
>
> That is pretty much the case, yes. I suppose you could also just let
> GL-renderer maintain the framebuffer to only read it out for
> transmission rather than maintaining a shadow copy, but the difference
> is mostly just conceptual.
>
>>  I've been trying to work from compositor-wayland.c as a template, but
>> it's unclear how everything connects, which parts of that code I need in
>> order to implement hardware acceleration, and which parts are
>> unnecessary.  I would appreciate it if someone who has familiarity with
>> the RDP backend could give me some targeted advice.
>
> I cannot help with the RDP-specifics.
>
> Since this compositor is essentially headless in the local machine, you
> would want to use DRM render nodes instead of KMS nodes for accessing
> the GPU. The KMS node would be reserved by any display server running
> for the local monitors.
>
> You would initialize EGL somehow to use a render node. I can't really
> provide a good suggestion for an architecture off-hand, but maybe these
> could help:
> https://www.khronos.org/registry/EGL/extensions/EXT/EGL_EXT_platform_device.txt
> https://www.khronos.org/registry/EGL/extensions/KHR/EGL_KHR_platform_gbm.txt
>
FYI:

One can use EGL_EXT_device_drm to get the master fd, but we need
another extension for the render node.
I've got some work on the topic - both EGL Device in Mesa and the new
extension - and I need to see if I can finish it in the coming days.

-Emil


Re: Remote display with 3D acceleration using Wayland/Weston

2017-02-24 Thread Pekka Paalanen
On Thu, 23 Feb 2017 17:51:24 -0600
DRC  wrote:

> On 12/15/16 3:01 AM, Pekka Paalanen wrote:
> > The current RDP-backend is written to set up and use only the Pixman
> > renderer. Pixman renderer is a software renderer, and will not
> > initialize EGL in the compositor. Therefore no support for hardware
> > accelerated OpenGL gets advertised to clients, and clients fall back to
> > software GL.
> > 
> > You can fix this purely by modifying libweston/compositor-rdp.c file,
> > writing the support for initializing the GL-renderer. Then you get
> > hardware accelerated GL support for all Wayland clients without any
> > other modifications anywhere.
> > 
> > Why that has not been done already is because it was thought that
> > having clients using hardware OpenGL while the compositor is not cannot
> > be performant enough to justify the effort. Also, it pulls in the
> > dependency to EGL and GL libs, which are huge. Obviously your use case
> > is different and this rationale does not apply.
> > 
> > The hardest part in adding the support to the RDP-backend is
> > implementing the buffer content access efficiently. RDP requires pixel
> > data in system memory so the CPU can read it, but GL-renderer has all
> > pixel data in graphics memory which often cannot be directly read by
> > the CPU. Accessing that pixel data requires a copy (glReadPixels), and
> > there is nowadays a helper: weston_surface_copy_content(), however the
> > function is not efficient and is so far meant only for debugging and
> > testing.  
> 
> I am attempting to modify the RDP backend to prove the concept that
> hardware-accelerated OpenGL is possible with a remote display backend,
> but my lack of familiarity with the code is making this very
> challenging.  It seems that the RDP backend uses Pixman both for GL
> rendering and also to maintain its framebuffer in main memory
> (shadow_surface.)  Is that correct?  If so, then it seems that I would
> need to continue using the shadow surface but use gl_renderer instead of
> the Pixman renderer, then implement my own method of transferring pixels
> from the GL renderer to the shadow surface at the end of every frame (?)

That is pretty much the case, yes. I suppose you could also just let
GL-renderer maintain the framebuffer to only read it out for
transmission rather than maintaining a shadow copy, but the difference
is mostly just conceptual.

>  I've been trying to work from compositor-wayland.c as a template, but
> it's unclear how everything connects, which parts of that code I need in
> order to implement hardware acceleration, and which parts are
> unnecessary.  I would appreciate it if someone who has familiarity with
> the RDP backend could give me some targeted advice.

I cannot help with the RDP-specifics.

Since this compositor is essentially headless in the local machine, you
would want to use DRM render nodes instead of KMS nodes for accessing
the GPU. The KMS node would be reserved by any display server running
for the local monitors.

You would initialize EGL somehow to use a render node. I can't really
provide a good suggestion for an architecture off-hand, but maybe these
could help:
https://www.khronos.org/registry/EGL/extensions/EXT/EGL_EXT_platform_device.txt
https://www.khronos.org/registry/EGL/extensions/KHR/EGL_KHR_platform_gbm.txt

Or any other way to get EGL initialized with a real GPU, but without an actual
display or a window system. Depending on how that works, you might be
rendering into an FBO and use glReadPixels(), or create an EGLSurface
and use glReadPixels() or something else perhaps.

Weston will need to be able to run EGL with a render node for testing
the GL-renderer on real hardware, but so far we don't have that code,
so I don't have an example.
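
For the sake of illustration only, bringing EGL up on a render node via GBM
could look roughly like the sketch below.  This is just a sketch of the
EGL_KHR_platform_gbm path (the render node path is assumed, error handling
is minimal, and open_headless_egl() is a made-up name), not the missing
Weston code referred to above:

/* Sketch: headless EGL on a DRM render node through GBM. */
#include <fcntl.h>
#include <unistd.h>
#include <gbm.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>

EGLDisplay open_headless_egl(const char *render_node /* e.g. "/dev/dri/renderD128" */)
{
	int fd = open(render_node, O_RDWR | O_CLOEXEC);
	if (fd < 0)
		return EGL_NO_DISPLAY;

	struct gbm_device *gbm = gbm_create_device(fd);
	if (!gbm) {
		close(fd);
		return EGL_NO_DISPLAY;
	}

	PFNEGLGETPLATFORMDISPLAYEXTPROC get_display =
		(PFNEGLGETPLATFORMDISPLAYEXTPROC)
			eglGetProcAddress("eglGetPlatformDisplayEXT");
	EGLDisplay dpy = get_display ?
		get_display(EGL_PLATFORM_GBM_KHR, gbm, NULL) : EGL_NO_DISPLAY;

	if (dpy == EGL_NO_DISPLAY || !eglInitialize(dpy, NULL, NULL))
		return EGL_NO_DISPLAY;

	/* Next steps (omitted): bind the GLES API, create a context, and
	 * render into an FBO (EGL_KHR_surfaceless_context) or a pbuffer,
	 * then glReadPixels() the result for transmission. */
	return dpy;
}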

A completely different path would be to duplicate the parts of
GL-renderer you need (access to client provided buffers) and throw away
the rest (all the rendering, damage tracking, and whatnot), initialize
EGL with a render node, and just scrape the client buffer contents
directly without composition. In this case you would be transmitting
client window contents as is, not the final composition. That might
have (bandwidth) drawbacks of its own.
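
To sketch what "access to client provided buffers" means in practice, an
EGL-based client buffer can be turned into a GL texture and then read back
roughly as below.  This assumes EGL_WL_bind_wayland_display and
EGL_KHR_image_base are available and that eglBindWaylandDisplayWL() has
already been called; it is an illustration, not code lifted from
gl-renderer, and error handling is minimal:

/* Sketch: import a client's EGL-based wl_buffer as a GL texture. */
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>
#include <wayland-server.h>

GLuint texture_from_client_buffer(EGLDisplay dpy, struct wl_resource *buffer)
{
	PFNEGLCREATEIMAGEKHRPROC create_image =
		(PFNEGLCREATEIMAGEKHRPROC)eglGetProcAddress("eglCreateImageKHR");
	PFNEGLDESTROYIMAGEKHRPROC destroy_image =
		(PFNEGLDESTROYIMAGEKHRPROC)eglGetProcAddress("eglDestroyImageKHR");
	PFNGLEGLIMAGETARGETTEXTURE2DOESPROC target_tex =
		(PFNGLEGLIMAGETARGETTEXTURE2DOESPROC)
			eglGetProcAddress("glEGLImageTargetTexture2DOES");
	GLuint tex = 0;

	EGLImageKHR image = create_image(dpy, EGL_NO_CONTEXT,
					 EGL_WAYLAND_BUFFER_WL,
					 (EGLClientBuffer)buffer, NULL);
	if (image == EGL_NO_IMAGE_KHR)
		return 0;

	glGenTextures(1, &tex);
	glBindTexture(GL_TEXTURE_2D, tex);
	target_tex(GL_TEXTURE_2D, image);
	destroy_image(dpy, image);

	/* Attach the texture to an FBO and glReadPixels() it, or feed it
	 * straight into a GPU encoder. */
	return tex;
}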


Thanks,
pq




Re: Remote display with 3D acceleration using Wayland/Weston

2017-02-24 Thread Pekka Paalanen
On Fri, 24 Feb 2017 02:20:08 +0100
Christian Stroetmann  wrote:

> Maybe a look on another compositor or the IVI shell might help you.

Please do not look at the IVI shell. It has nothing to do with this
while being an example of an... "inferior" window management
architecture dictated by GENIVI.


Thanks,
pq




Fwd: Remote display with 3D acceleration using Wayland/Weston

2017-02-23 Thread Erik De Rijcke
-- Forwarded message --
From: Erik De Rijcke <derijcke.e...@gmail.com>
Date: 2017-02-24 8:46 GMT+01:00
Subject: Re: Remote display with 3D acceleration using Wayland/Weston
To: Christian Stroetmann <stroetm...@ontolab.com>


I made a PoC for a remote (GL) display some time ago for my own compositor,
but instead of using RDP, I used an HTML5 websocket & canvas.

On an 800x600 display I would get a steady 30 fps. There were basically two
performance challenges to be solved: the glReadPixels readback and the
compression of raw pixels to a compressed image format (PNG in my case). In
the end I had to resort to multi-threading to get acceptable performance.
This was without other optimizations like only reading out damaged regions,
or resorting to something more efficient than glReadPixels.

Nowadays I'm working on an HTML5 solution that is more (very) closely
related to what Waltham aims to be. There, only the content of individual
surfaces is sent over RTP (OpenWebRTC in my case), while communication with
the back-end happens over a Wayland-like protocol [
https://github.com/udevbe/westfield ]. Each client then basically runs its
own compositor.

One major issue that is still unresolved is how to efficiently encode a GL
texture to a video frame (VP8/9, H.264) without leaving the GPU early.
Ideally the encoding to a video frame (or a PNG or JPEG) happens on the GPU,
so that the amount of pixels to be read out is minimized.

What you should ask yourself is whether you really want to send over the
entire screen or just the individual surfaces. Using individual surfaces
should be less work to get decent performance; another side effect is
that moving a surface does not require re-sending the entire screen.
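
For what it's worth, the "only reading out damaged regions" optimization is
mostly a matter of restricting glReadPixels() to the damage rectangle. The
snippet below is illustrative only (the damage_box struct is made up); the
main gotcha is that GL's origin is bottom-left, so the rectangle and the
resulting rows have to be flipped:

/* Sketch: read back only a damaged rectangle given in top-left coords. */
#include <GLES2/gl2.h>

struct damage_box { int x, y, width, height; };

static void read_damage(const struct damage_box *d, int fb_height,
			unsigned char *out /* d->width * d->height * 4 bytes */)
{
	int gl_y = fb_height - (d->y + d->height);

	glPixelStorei(GL_PACK_ALIGNMENT, 1);
	glReadPixels(d->x, gl_y, d->width, d->height,
		     GL_RGBA, GL_UNSIGNED_BYTE, out);
	/* Rows in 'out' are bottom-up; flip them (or encode with a negative
	 * stride) before handing the region to the image encoder. */
}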


Re: Remote display with 3D acceleration using Wayland/Weston

2017-02-23 Thread Christian Stroetmann

On the 24th of February 2017 00:51, DRC wrote:

On 12/15/16 3:01 AM, Pekka Paalanen wrote:

The current RDP-backend is written to set up and use only the Pixman
renderer. Pixman renderer is a software renderer, and will not
initialize EGL in the compositor. Therefore no support for hardware
accelerated OpenGL gets advertised to clients, and clients fall back to
software GL.

You can fix this purely by modifying libweston/compositor-rdp.c file,
writing the support for initializing the GL-renderer. Then you get
hardware accelerated GL support for all Wayland clients without any
other modifications anywhere.

Why that has not been done already is because it was thought that
having clients using hardware OpenGL while the compositor is not cannot
be performant enough to justify the effort. Also, it pulls in the
dependency to EGL and GL libs, which are huge. Obviously your use case
is different and this rationale does not apply.

The hardest part in adding the support to the RDP-backend is
implementing the buffer content access efficiently. RDP requires pixel
data in system memory so the CPU can read it, but GL-renderer has all
pixel data in graphics memory which often cannot be directly read by
the CPU. Accessing that pixel data requires a copy (glReadPixels), and
there is nowadays a helper: weston_surface_copy_content(), however the
function is not efficient and is so far meant only for debugging and
testing.

I am attempting to modify the RDP backend to prove the concept that
hardware-accelerated OpenGL is possible with a remote display backend,
but my lack of familiarity with the code is making this very
challenging.  It seems that the RDP backend uses Pixman both for GL
rendering and also to maintain its framebuffer in main memory
(shadow_surface.)  Is that correct?  If so, then it seems that I would
need to continue using the shadow surface but use gl_renderer instead of
the Pixman renderer, then implement my own method of transferring pixels
from the GL renderer to the shadow surface at the end of every frame (?)
  I've been trying to work from compositor-wayland.c as a template, but
it's unclear how everything connects, which parts of that code I need in
order to implement hardware acceleration, and which parts are
unnecessary.  I would appreciate it if someone who has familiarity with
the RDP backend could give me some targeted advice.



Aloha

I did not get that far when I tried some years ago, but your approach
sounds good. Basically, it is a translation of what you do with your GLX
interposer (VirtualGL) and high-speed X proxy (TurboVNC) to Weston, which
comes down to the handling of the buffers.

That said, I would try to get it running and then compare the speed with
the old version running under X Window, and also upload the code for
review by the gurus.

Maybe a look at another compositor or the IVI shell might help you.

Regards
Christian Stroetmann




Re: Remote display with 3D acceleration using Wayland/Weston

2017-02-23 Thread DRC
On 12/15/16 3:01 AM, Pekka Paalanen wrote:
> The current RDP-backend is written to set up and use only the Pixman
> renderer. Pixman renderer is a software renderer, and will not
> initialize EGL in the compositor. Therefore no support for hardware
> accelerated OpenGL gets advertised to clients, and clients fall back to
> software GL.
> 
> You can fix this purely by modifying libweston/compositor-rdp.c file,
> writing the support for initializing the GL-renderer. Then you get
> hardware accelerated GL support for all Wayland clients without any
> other modifications anywhere.
> 
> Why that has not been done already is because it was thought that
> having clients using hardware OpenGL while the compositor is not cannot
> be performant enough to justify the effort. Also, it pulls in the
> dependency to EGL and GL libs, which are huge. Obviously your use case
> is different and this rationale does not apply.
> 
> The hardest part in adding the support to the RDP-backend is
> implementing the buffer content access efficiently. RDP requires pixel
> data in system memory so the CPU can read it, but GL-renderer has all
> pixel data in graphics memory which often cannot be directly read by
> the CPU. Accessing that pixel data requires a copy (glReadPixels), and
> there is nowadays a helper: weston_surface_copy_content(), however the
> function is not efficient and is so far meant only for debugging and
> testing.

I am attempting to modify the RDP backend to prove the concept that
hardware-accelerated OpenGL is possible with a remote display backend,
but my lack of familiarity with the code is making this very
challenging.  It seems that the RDP backend uses Pixman both for GL
rendering and also to maintain its framebuffer in main memory
(shadow_surface.)  Is that correct?  If so, then it seems that I would
need to continue using the shadow surface but use gl_renderer instead of
the Pixman renderer, then implement my own method of transferring pixels
from the GL renderer to the shadow surface at the end of every frame (?)
 I've been trying to work from compositor-wayland.c as a template, but
it's unclear how everything connects, which parts of that code I need in
order to implement hardware acceleration, and which parts are
unnecessary.  I would appreciate it if someone who has familiarity with
the RDP backend could give me some targeted advice.
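
For anyone following along, the end-of-frame transfer described above could
look something like the sketch below.  It is not working Weston code: the
shadow surface is assumed to be a 32-bit pixman image the size of the
output, and damage handling and channel-order conversion (GL returns RGBA,
pixman typically wants x8r8g8b8) are ignored:

/* Sketch: copy the GL renderer's framebuffer into a pixman shadow image
 * at the end of a repaint, flipping the bottom-up rows from GL. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <pixman.h>
#include <GLES2/gl2.h>

static void copy_gl_frame_to_shadow(pixman_image_t *shadow,
				    int width, int height)
{
	uint8_t *dst = (uint8_t *)pixman_image_get_data(shadow);
	int stride = pixman_image_get_stride(shadow);	/* bytes per row */
	uint32_t *tmp = malloc((size_t)width * height * 4);

	if (!tmp)
		return;
	glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, tmp);

	for (int y = 0; y < height; y++)
		memcpy(dst + (size_t)y * stride,
		       tmp + (size_t)(height - 1 - y) * width,
		       (size_t)width * 4);
	free(tmp);
}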


Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-20 Thread Pekka Paalanen
On Mon, 19 Dec 2016 13:23:26 -0600
DRC  wrote:

> On 12/19/16 2:48 AM, Pekka Paalanen wrote:
> > Hmm, indeed, maybe it would be possible if you are imposing your own
> > EGL middle-man library between the application and the real EGL library.
> > 
> > That's definitely an idea to look into. I cannot say off-hand why it
> > would not work, so maybe it can work. :-)
> > 
> > To summarize, with that approach, you would have the client send only
> > wl_shm buffers to the compositor, and the compositor never needs to
> > touch EGL at all. It also has the benefit that the read-back cost
> > (glReadPixels) is completely in the client process, so the compositor
> > will not stall on it, and you don't need the stuff I explained about in
> > the compositor. And you get support for the proprietary drivers!
> > 
> > Sorry for not realizing the "wrap libEGL.so" approach earlier.  
> 
> Yes, exactly.  That is essentially how VirtualGL already works with
> GLX/OpenGL, so it is a solution space I know well.  As I see it, the
> advantages of implementing this at the compositor level are:
> 
> -- Automatic hardware acceleration for window managers that might need
> to use OpenGL (which includes most of them these days)
> -- No need to launch OpenGL applications using a wrapper script
> -- Potentially the compositor could tap into GPU-based encoding methods
> (NVENC, for instance) quite easily to compress the pixel updates sent to
> the client.  This becomes more difficult when the pixel readback is
> occurring in the OpenGL application process but the compression is
> occurring in another process.
> 
> The potential advantages of an interposer are:
> 
> -- Much easier for me to develop, since this would represent basically a
> subset of VirtualGL's existing functionality (the GLX interposer could
> also benefit from a back end that accesses the GPU directly through EGL
> rather than forwarding the GLX requests through a local X server.)
> -- The readback occurs in-process, so only applications that actually
> need it (OpenGL applications) are subject to that overhead, and the
> design of VirtualGL makes it such that the readback of the current frame
> occurs in parallel with the display of the last frame.
> -- Theoretically should work with any Wayland implementation or back
> end.  It goes without saying that I'm not the only one in this game.  In
> the current market, there are lots of different vendors producing their
> X11 proxy of choice, but all of them can use VirtualGL to add GPU
> acceleration.  I don't know how the market will look with Wayland, but I
> would anticipate that those same vendors will produce their own Wayland
> proxies of choice as well, so there might be an advantage to retaining
> VirtualGL as an independent bolt-on product.
> -- That is a good point about the compositor not stalling on
> glReadPixels()-- although I think I could probably mitigate that by
> using PBOs rather than synchronous glReadPixels().
> 
> I know for sure that I can make the interposer approach work, and
> perhaps that would be a good short-term approach to get something up and
> running while the other approach is explored in more depth.

Hi,

I fully agree on everything you said here. :-)


Thanks,
pq




Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-19 Thread DRC
On 12/19/16 2:48 AM, Pekka Paalanen wrote:
> Hmm, indeed, maybe it would be possible if you are imposing your own
> EGL middle-man library between the application and the real EGL library.
> 
> That's definitely an idea to look into. I cannot say off-hand why it
> would not work, so maybe it can work. :-)
> 
> To summarize, with that approach, you would have the client send only
> wl_shm buffers to the compositor, and the compositor never needs to
> touch EGL at all. It also has the benefit that the read-back cost
> (glReadPixels) is completely in the client process, so the compositor
> will not stall on it, and you don't need the stuff I explained about in
> the compositor. And you get support for the proprietary drivers!
> 
> Sorry for not realizing the "wrap libEGL.so" approach earlier.

Yes, exactly.  That is essentially how VirtualGL already works with
GLX/OpenGL, so it is a solution space I know well.  As I see it, the
advantages of implementing this at the compositor level are:

-- Automatic hardware acceleration for window managers that might need
to use OpenGL (which includes most of them these days)
-- No need to launch OpenGL applications using a wrapper script
-- Potentially the compositor could tap into GPU-based encoding methods
(NVENC, for instance) quite easily to compress the pixel updates sent to
the client.  This becomes more difficult when the pixel readback is
occurring in the OpenGL application process but the compression is
occurring in another process.

The potential advantages of an interposer are:

-- Much easier for me to develop, since this would represent basically a
subset of VirtualGL's existing functionality (the GLX interposer could
also benefit from a back end that accesses the GPU directly through EGL
rather than forwarding the GLX requests through a local X server.)
-- The readback occurs in-process, so only applications that actually
need it (OpenGL applications) are subject to that overhead, and the
design of VirtualGL makes it such that the readback of the current frame
occurs in parallel with the display of the last frame.
-- Theoretically should work with any Wayland implementation or back
end.  It goes without saying that I'm not the only one in this game.  In
the current market, there are lots of different vendors producing their
X11 proxy of choice, but all of them can use VirtualGL to add GPU
acceleration.  I don't know how the market will look with Wayland, but I
would anticipate that those same vendors will produce their own Wayland
proxies of choice as well, so there might be an advantage to retaining
VirtualGL as an independent bolt-on product.
-- That is a good point about the compositor not stalling on
glReadPixels()-- although I think I could probably mitigate that by
using PBOs rather than synchronous glReadPixels().

I know for sure that I can make the interposer approach work, and
perhaps that would be a good short-term approach to get something up and
running while the other approach is explored in more depth.



Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-19 Thread Pekka Paalanen
On Mon, 19 Dec 2016 10:50:22 +0100
Christian Stroetmann  wrote:

> On 19.Dec.2016 09:48, Pekka Paalanen wrote:
> > Sorry for not realizing the "wrap libEGL.so" approach earlier.
> 
> Yeah, and how does this look when put in context with Waltham?

It has nothing to do with Waltham at all. It is completely orthogonal.

You still have the Wayland compositor beside the application, and the
compositor is still serving Wayland clients and using something else on
the network. It could be Waltham or RDP or VNC or HTTP... neither the
application nor the wrapper-libEGL would care.

The libEGL.so wrapper could help by converting the application output
to wl_shm in the application process, which is the alternative to
implementing support in the compositor for handling all the different
hardware-related buffer types. Both ways presumably work, and with
slightly different characteristics.


Thanks,
pq




Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-19 Thread Christian Stroetmann

On 19.Dec.2016 09:48, Pekka Paalanen wrote:

On Fri, 16 Dec 2016 11:35:48 -0600
DRC  wrote:


On 12/16/16 3:06 AM, Pekka Paalanen wrote:

I should probably tell a little more, because what I explained above is
a simplification due to using a single path for all buffer types.
...

Thanks again.  This is all very new to me, and I guess I don't fully
understand where these buffer types would come into play.  Bearing in
mind that I really don't care whether non-OpenGL applications are
hardware-accelerated, does that simplify anything?  It would be
sufficient if only OpenGL applications could render using the GPU.  The
compositor itself doesn't necessarily need to.

I do not know of any OpenGL (EGL, actually) implementation that would
allow using the GPU while the compositor is not accepting hardware
(EGL-based) buffers. The reason is that it simply cannot be performant
in fully local usage scenario. When I say this, I mean in a way that
would be transparent to the application. Applications themselves could
initialize GPU support on an EGL platform other than Wayland, render
with the GPU, call glReadPixels, and then use wl_shm buffers for
sending the content to the compositor. However, this obviously needs
explicit coding in the application, and I would not expect anyone to do
it, because in a usual case of a fully local graphics stack without any
remoting, it would make a huge performance hit.

Well, I suppose some proprietary implementations have used wl_shm for
hardware-rendered content, but we have always considered that a bug,
especially when no other transport has been implemented.

A bit of background:
http://ppaalanen.blogspot.fi/2012/11/on-supporting-wayland-gl-clients-and.html

Oh, this might be a nice reading before that one:
http://ppaalanen.blogspot.fi/2012/03/what-does-egl-do-in-wayland-stack.html


Lastly, and I believe this is the most sad part for you, is that NVIDIA
proprietary drivers do not work (the way we would like).

NVIDIA has been proposing for years a solution that is completely
different to anything explained above: EGLStreams, and for the same
amount of years, the community has been unimpressed with the design.
Anyway, NVIDIA did implement their design and even wrote patches for
Weston which we have not merged. Other compositors (e.g. Mutter) may
choose to support EGLStreams as a temporary solution.

I guess I was hoping to take advantage of the EGL_PLATFORM_DEVICE_EXT
extension that allows for off-screen OpenGL rendering.  It currently
works with nVidia's drivers:
https://gist.github.com/dcommander/ee1247362201552b2532

Right. You can do that, but then you need to write the application to
use it...

Hmm, indeed, maybe it would be possible if you are imposing your own
EGL middle-man library between the application and the real EGL library.

That's definitely an idea to look into. I cannot say off-hand why it
would not work, so maybe it can work. :-)

To summarize, with that approach, you would have the client send only
wl_shm buffers to the compositor, and the compositor never needs to
touch EGL at all. It also has the benefit that the read-back cost
(glReadPixels) is completely in the client process, so the compositor
will not stall on it, and you don't need the stuff I explained about in
the compositor. And you get support for the proprietary drivers!

Sorry for not realizing the "wrap libEGL.so" approach earlier.


Thanks,
pq


Yeah, and how does this look when put in context with Waltham?



Regards
Christian Stroetmann



Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-19 Thread Pekka Paalanen
On Fri, 16 Dec 2016 11:35:48 -0600
DRC  wrote:

> On 12/16/16 3:06 AM, Pekka Paalanen wrote:
> > I should probably tell a little more, because what I explained above is
> > a simplification due to using a single path for all buffer types.
> > ...  
> 
> Thanks again.  This is all very new to me, and I guess I don't fully
> understand where these buffer types would come into play.  Bearing in
> mind that I really don't care whether non-OpenGL applications are
> hardware-accelerated, does that simplify anything?  It would be
> sufficient if only OpenGL applications could render using the GPU.  The
> compositor itself doesn't necessarily need to.

I do not know of any OpenGL (EGL, actually) implementation that would
allow using the GPU while the compositor is not accepting hardware
(EGL-based) buffers. The reason is that it simply cannot be performant
in a fully local usage scenario. When I say this, I mean in a way that
would be transparent to the application. Applications themselves could
initialize GPU support on an EGL platform other than Wayland, render
with the GPU, call glReadPixels, and then use wl_shm buffers for
sending the content to the compositor. However, this obviously needs
explicit coding in the application, and I would not expect anyone to do
it, because in a usual case of a fully local graphics stack without any
remoting, it would make a huge performance hit.

Well, I suppose some proprietary implementations have used wl_shm for
hardware-rendered content, but we have always considered that a bug,
especially when no other transport has been implemented.

A bit of background:
http://ppaalanen.blogspot.fi/2012/11/on-supporting-wayland-gl-clients-and.html

Oh, this might be a nice reading before that one:
http://ppaalanen.blogspot.fi/2012/03/what-does-egl-do-in-wayland-stack.html

> > Lastly, and I believe this is the most sad part for you, is that NVIDIA
> > proprietary drivers do not work (the way we would like).
> > 
> > NVIDIA has been proposing for years a solution that is completely
> > different to anything explained above: EGLStreams, and for the same
> > amount of years, the community has been unimpressed with the design.
> > Anyway, NVIDIA did implement their design and even wrote patches for
> > Weston which we have not merged. Other compositors (e.g. Mutter) may
> > choose to support EGLStreams as a temporary solution.  
> 
> I guess I was hoping to take advantage of the EGL_PLATFORM_DEVICE_EXT
> extension that allows for off-screen OpenGL rendering.  It currently
> works with nVidia's drivers:
> https://gist.github.com/dcommander/ee1247362201552b2532

Right. You can do that, but then you need to write the application to
use it...

Hmm, indeed, maybe it would be possible if you are imposing your own
EGL middle-man library between the application and the real EGL library.

That's definitely an idea to look into. I cannot say off-hand why it
would not work, so maybe it can work. :-)

To summarize, with that approach, you would have the client send only
wl_shm buffers to the compositor, and the compositor never needs to
touch EGL at all. It also has the benefit that the read-back cost
(glReadPixels) is completely in the client process, so the compositor
will not stall on it, and you don't need the stuff I explained about in
the compositor. And you get support for the proprietary drivers!

Sorry for not realizing the "wrap libEGL.so" approach earlier.
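
To spell out that client-side path a bit: the wrapped application renders
with the GPU off-screen, reads the pixels back in its own process, and
submits a plain wl_shm buffer, roughly as below.  This is only a sketch;
the creation of the wl_shm pool and buffer, the y-flip and any channel
swizzling are all glossed over, and submit_frame() is a made-up helper:

/* Sketch: per-frame hand-off from off-screen GPU rendering to wl_shm. */
#include <wayland-client.h>
#include <GLES2/gl2.h>

static void submit_frame(struct wl_surface *surface, struct wl_buffer *buffer,
			 void *shm_data, int width, int height)
{
	/* Read back the frame rendered via the render node / FBO. */
	glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, shm_data);

	/* The compositor only ever sees a software (wl_shm) buffer. */
	wl_surface_attach(surface, buffer, 0, 0);
	wl_surface_damage(surface, 0, 0, width, height);
	wl_surface_commit(surface);
}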


Thanks,
pq




Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-16 Thread DRC
On 12/16/16 3:06 AM, Pekka Paalanen wrote:
> I should probably tell a little more, because what I explained above is
> a simplification due to using a single path for all buffer types.
> ...

Thanks again.  This is all very new to me, and I guess I don't fully
understand where these buffer types would come into play.  Bearing in
mind that I really don't care whether non-OpenGL applications are
hardware-accelerated, does that simplify anything?  It would be
sufficient if only OpenGL applications could render using the GPU.  The
compositor itself doesn't necessarily need to.


> Lastly, and I believe this is the most sad part for you, is that NVIDIA
> proprietary drivers do not work (the way we would like).
> 
> NVIDIA has been proposing for years a solution that is completely
> different to anything explained above: EGLStreams, and for the same
> amount of years, the community has been unimpressed with the design.
> Anyway, NVIDIA did implement their design and even wrote patches for
> Weston which we have not merged. Other compositors (e.g. Mutter) may
> choose to support EGLStreams as a temporary solution.

I guess I was hoping to take advantage of the EGL_PLATFORM_DEVICE_EXT
extension that allows for off-screen OpenGL rendering.  It currently
works with nVidia's drivers:
https://gist.github.com/dcommander/ee1247362201552b2532
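
For context, EGL_PLATFORM_DEVICE_EXT bring-up generally looks something
like the sketch below (this is not the code from that gist; extension
availability and errors are barely checked, and the first enumerated
device is picked arbitrarily):

/* Sketch: off-screen EGL display via EGL_EXT_platform_device. */
#include <stdio.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>

int main(void)
{
	PFNEGLQUERYDEVICESEXTPROC query_devices =
		(PFNEGLQUERYDEVICESEXTPROC)
			eglGetProcAddress("eglQueryDevicesEXT");
	PFNEGLGETPLATFORMDISPLAYEXTPROC get_platform_display =
		(PFNEGLGETPLATFORMDISPLAYEXTPROC)
			eglGetProcAddress("eglGetPlatformDisplayEXT");
	EGLDeviceEXT devices[8];
	EGLint num_devices = 0, major, minor;

	if (!query_devices || !get_platform_display ||
	    !query_devices(8, devices, &num_devices) || num_devices < 1)
		return 1;

	/* A real implementation would match the device against a requested
	 * GPU instead of taking the first one. */
	EGLDisplay dpy = get_platform_display(EGL_PLATFORM_DEVICE_EXT,
					      devices[0], NULL);
	if (dpy == EGL_NO_DISPLAY || !eglInitialize(dpy, &major, &minor))
		return 1;

	printf("Headless EGL %d.%d initialized\n", major, minor);
	/* From here: eglChooseConfig(), eglCreateContext(), and either a
	 * pbuffer surface or a surfaceless context plus an FBO. */
	eglTerminate(dpy);
	return 0;
}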


Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-16 Thread Pekka Paalanen
On Thu, 15 Dec 2016 09:55:44 -0600
DRC  wrote:

> On 12/15/16 3:01 AM, Pekka Paalanen wrote:
> > I assure you, this is a limitation of the RDP-backend itself. Nothing
> > outside of Weston creates this restriction.
> > 
> > The current RDP-backend is written to set up and use only the Pixman
> > renderer. Pixman renderer is a software renderer, and will not
> > initialize EGL in the compositor. Therefore no support for hardware
> > accelerated OpenGL gets advertised to clients, and clients fall back to
> > software GL.
> > 
> > You can fix this purely by modifying libweston/compositor-rdp.c file,
> > writing the support for initializing the GL-renderer. Then you get
> > hardware accelerated GL support for all Wayland clients without any
> > other modifications anywhere.
> > 
> > Why that has not been done already is because it was thought that
> > having clients using hardware OpenGL while the compositor is not cannot
> > be performant enough to justify the effort. Also, it pulls in the
> > dependency to EGL and GL libs, which are huge. Obviously your use case
> > is different and this rationale does not apply.  
> 
> Like many things, it depends on the application.  GLXgears may not
> perform better in a hardware-accelerated remote 3D environment vs. using
> software OpenGL, but real-world applications with larger geometries
> certainly will.  In a VirtualGL environment, the overhead is per-frame
> rather than per-primitive, so geometric throughput is essentially as
> fast as it would be in the local case (the OpenGL applications are still
> using direct rendering.)  The main performance limiters are pixel
> readback and transmission.  Modern GPUs have pretty fast readback--
> 800-1000 Mpixels/sec in the case of a mid-range Quadro, for instance, if
> you use synchronous readback.  VirtualGL uses PBO readback, which is a
> bit slower than synchronous readback but which uses practically zero CPU
> cycles and does not block at the driver level (this is what enables many
> users to share the same GPU without conflict.)  VGL also uses a frame
> queueing/spoiling system to send the 3D frames from the rendering thread
> into another thread for transmission and/or display, so it can be
> displaying or transmitting the last frame while the application renders
> the next frame.  TurboVNC (and most other X proxies that people use with
> VGL) is based on libjpeg-turbo, which can compress JPEG images at
> hundreds of Mpixels/sec on modern CPUs.  In total, you can pretty easily
> push 60+ Megapixels/sec with perceptually lossless image quality to
> clients on even a 100 Megabit network, and 20 Megapixels/sec across a 10
> Megabit network (with reduced quality.)  Our biggest success stories are
> large companies who have replaced their 3D workstation infrastructure
> with 8 or 10 beefy servers running VirtualGL+TurboVNC with laptop
> clients running the TurboVNC Viewer.  In most cases, they claim that the
> perceived performance is as good as or better than their old workstations.
> 
> To put some numbers on this, our GLXspheres benchmark uses a geometry
> size that is relatively small (~60,000 polygons) but still a lot more
> realistic than GLXgears (which has a polygon count only in the hundreds,
> if I recall correctly.)  When running on a 1920x1200 remote display
> session (TurboVNC), this benchmark will perform at about 14 Hz with
> llvmpipe but 43 Hz with VirtualGL.  So software OpenGL definitely does
> slow things down, even with a relatively modest geometry size and in an
> environment where there is a lot of per-frame overhead.

Hi,

indeed, those are use cases I (we?) have not thought about. Our
thinking has largely revolved around the idea that reading back a
buffer from gfx memory into system memory is prohibitively slow. And in
many cases it is, but not if you want to remote.

Another thought was that if the clients can use hardware GL, then why
would the compositor not use hardware paths all the way to the scanout?
So the case has been largely ignored.

It is very interesting to hear about the numbers!

> > The hardest part in adding the support to the RDP-backend is
> > implementing the buffer content access efficiently. RDP requires pixel
> > data in system memory so the CPU can read it, but GL-renderer has all
> > pixel data in graphics memory which often cannot be directly read by
> > the CPU. Accessing that pixel data requires a copy (glReadPixels), and
> > there is nowadays a helper: weston_surface_copy_content(), however the
> > function is not efficient and is so far meant only for debugging and
> > testing.  
> 
> I could probably reuse some of the VirtualGL code for this, since it
> already does a good job of buffer management.
> 
> Thanks so much for all of the helpful info.  I guess I have my work cut
> out for me.  :|

I should probably tell a little more, because what I explained above is
a simplification due to using a single path for all buffer types.


Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-15 Thread DRC
On 12/15/16 3:01 AM, Pekka Paalanen wrote:
> I assure you, this is a limitation of the RDP-backend itself. Nothing
> outside of Weston creates this restriction.
> 
> The current RDP-backend is written to set up and use only the Pixman
> renderer. Pixman renderer is a software renderer, and will not
> initialize EGL in the compositor. Therefore no support for hardware
> accelerated OpenGL gets advertised to clients, and clients fall back to
> software GL.
> 
> You can fix this purely by modifying libweston/compositor-rdp.c file,
> writing the support for initializing the GL-renderer. Then you get
> hardware accelerated GL support for all Wayland clients without any
> other modifications anywhere.
> 
> Why that has not been done already is because it was thought that
> having clients using hardware OpenGL while the compositor is not cannot
> be performant enough to justify the effort. Also, it pulls in the
> dependency to EGL and GL libs, which are huge. Obviously your use case
> is different and this rationale does not apply.

Like many things, it depends on the application.  GLXgears may not
perform better in a hardware-accelerated remote 3D environment vs. using
software OpenGL, but real-world applications with larger geometries
certainly will.  In a VirtualGL environment, the overhead is per-frame
rather than per-primitive, so geometric throughput is essentially as
fast as it would be in the local case (the OpenGL applications are still
using direct rendering.)  The main performance limiters are pixel
readback and transmission.  Modern GPUs have pretty fast readback--
800-1000 Mpixels/sec in the case of a mid-range Quadro, for instance, if
you use synchronous readback.  VirtualGL uses PBO readback, which is a
bit slower than synchronous readback but which uses practically zero CPU
cycles and does not block at the driver level (this is what enables many
users to share the same GPU without conflict.)  VGL also uses a frame
queueing/spoiling system to send the 3D frames from the rendering thread
into another thread for transmission and/or display, so it can be
displaying or transmitting the last frame while the application renders
the next frame.  TurboVNC (and most other X proxies that people use with
VGL) is based on libjpeg-turbo, which can compress JPEG images at
hundreds of Mpixels/sec on modern CPUs.  In total, you can pretty easily
push 60+ Megapixels/sec with perceptually lossless image quality to
clients on even a 100 Megabit network, and 20 Megapixels/sec across a 10
Megabit network (with reduced quality.)  Our biggest success stories are
large companies who have replaced their 3D workstation infrastructure
with 8 or 10 beefy servers running VirtualGL+TurboVNC with laptop
clients running the TurboVNC Viewer.  In most cases, they claim that the
perceived performance is as good as or better than their old workstations.
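
The frame queueing/spoiling mechanism is conceptually very simple; a rough
illustration (not VirtualGL's actual code, and 'struct frame' is just a
stand-in for a readback buffer) is a one-slot queue in which the render
thread never blocks and a stale frame is simply replaced:

/* Sketch: one-slot frame queue with spoiling between a render thread and
 * a transmission/display thread. */
#include <pthread.h>
#include <stddef.h>

struct frame;			/* opaque pixel buffer */
void frame_destroy(struct frame *f);

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static struct frame *pending;

/* Render thread: called right after a new frame has been read back. */
void queue_frame(struct frame *f)
{
	pthread_mutex_lock(&lock);
	if (pending)
		frame_destroy(pending);	/* spoil the frame nobody consumed */
	pending = f;
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);
}

/* Transmission/display thread: blocks until a frame is available. */
struct frame *wait_for_frame(void)
{
	pthread_mutex_lock(&lock);
	while (!pending)
		pthread_cond_wait(&cond, &lock);
	struct frame *f = pending;
	pending = NULL;
	pthread_mutex_unlock(&lock);
	return f;
}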

To put some numbers on this, our GLXspheres benchmark uses a geometry
size that is relatively small (~60,000 polygons) but still a lot more
realistic than GLXgears (which has a polygon count only in the hundreds,
if I recall correctly.)  When running on a 1920x1200 remote display
session (TurboVNC), this benchmark will perform at about 14 Hz with
llvmpipe but 43 Hz with VirtualGL.  So software OpenGL definitely does
slow things down, even with a relatively modest geometry size and in an
environment where there is a lot of per-frame overhead.


> The hardest part in adding the support to the RDP-backend is
> implementing the buffer content access efficiently. RDP requires pixel
> data in system memory so the CPU can read it, but GL-renderer has all
> pixel data in graphics memory which often cannot be directly read by
> the CPU. Accessing that pixel data requires a copy (glReadPixels), and
> there is nowadays a helper: weston_surface_copy_content(), however the
> function is not efficient and is so far meant only for debugging and
> testing.

I could probably reuse some of the VirtualGL code for this, since it
already does a good job of buffer management.

Thanks so much for all of the helpful info.  I guess I have my work cut
out for me.  :|


Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-15 Thread Pekka Paalanen
On Wed, 14 Dec 2016 11:42:54 -0600
DRC  wrote:

> But if you run OpenGL applications in Weston, as it is currently
> implemented, then the OpenGL applications are either GPU-accelerated or
> not, depending on the back end used.  If you run Weston nested in a
> Wayland compositor that is already GPU-accelerated, then OpenGL
> applications run in the Weston session will be GPU-accelerated as well.
> If you run Weston with the RDP back end, then OpenGL applications run in
> the Weston session will use Mesa llvmpipe instead.  I'm trying to
> understand, quite simply, whether it's possible for unmodified Wayland
> OpenGL applications-- such as the example OpenGL applications in the
> Weston source-- to take advantage of OpenGL GPU acceleration when they
> are running with the RDP back end.  (I'm assuming that whatever
> restrictions there are on the RDP back end would exist for the TurboVNC
> back end I intend to develop.)  My testing thus far indicates that this
> is not currently possible, but I need to understand the source of the
> limitation so I can understand how to work around it.  Instead, you seem
> to be telling me that the limitation doesn't exist, but I can assure you
> that it does.  Please test Weston with the RDP back end and confirm that
> OpenGL applications run in that environment are not GPU-accelerated.

Hi,

I assure you, this is a limitation of the RDP-backend itself. Nothing
outside of Weston creates this restriction.

The current RDP-backend is written to set up and use only the Pixman
renderer. The Pixman renderer is a software renderer, and will not
initialize EGL in the compositor. Therefore no support for hardware
accelerated OpenGL gets advertised to clients, and clients fall back to
software GL.

You can fix this purely by modifying the libweston/compositor-rdp.c
file, writing the support for initializing the GL-renderer. Then you get
hardware accelerated GL support for all Wayland clients without any
other modifications anywhere.

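To give a rough idea of what that initialization involves at the EGL
level, here is a minimal sketch of bringing up EGL and a GL context
with no window system attached, using only core EGL calls.  This is
not Weston's gl_renderer API (the real work is hooking
compositor-rdp.c up to gl_renderer the way the DRM and X11 backends
do), and whether eglGetDisplay(EGL_DEFAULT_DISPLAY) yields a usable
display without a native window system is driver-dependent:

#include <EGL/egl.h>
#include <stdio.h>

/* Illustrative only: bring up EGL plus a GLES2 context with no window
 * system, roughly what a GL-capable RDP backend needs before it can
 * advertise hardware GL to clients. */
int headless_egl_init(void)
{
    static const EGLint cfg_attribs[] = {
        EGL_SURFACE_TYPE, EGL_PBUFFER_BIT,
        EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
        EGL_NONE
    };
    static const EGLint ctx_attribs[] = {
        EGL_CONTEXT_CLIENT_VERSION, 2,
        EGL_NONE
    };
    static const EGLint pb_attribs[] = {
        EGL_WIDTH, 1, EGL_HEIGHT, 1, EGL_NONE
    };
    EGLDisplay dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    EGLint major, minor, n;
    EGLConfig cfg;

    if (dpy == EGL_NO_DISPLAY || !eglInitialize(dpy, &major, &minor))
        return -1;
    if (!eglChooseConfig(dpy, cfg_attribs, &cfg, 1, &n) || n < 1)
        return -1;
    eglBindAPI(EGL_OPENGL_ES_API);

    EGLContext ctx = eglCreateContext(dpy, cfg, EGL_NO_CONTEXT,
                                      ctx_attribs);
    if (ctx == EGL_NO_CONTEXT)
        return -1;

    /* A tiny pbuffer is enough to make the context current for
     * off-screen rendering. */
    EGLSurface pb = eglCreatePbufferSurface(dpy, cfg, pb_attribs);
    if (!eglMakeCurrent(dpy, pb, pb, ctx))
        return -1;

    printf("EGL %d.%d initialized without a native display\n",
           major, minor);
    return 0;
}
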
Why that has not been done already is that it was thought that having
clients use hardware OpenGL while the compositor does not could not be
performant enough to justify the effort. Also, it pulls in a dependency
on the EGL and GL libs, which are huge. Obviously your use case is
different and this rationale does not apply.

The hardest part in adding the support to the RDP-backend is
implementing the buffer content access efficiently. RDP requires pixel
data in system memory so the CPU can read it, but GL-renderer has all
pixel data in graphics memory which often cannot be directly read by
the CPU. Accessing that pixel data requires a copy (glReadPixels), and
there is nowadays a helper: weston_surface_copy_content(), however the
function is not efficient and is so far meant only for debugging and
testing.
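
The copy being described boils down to something like the sketch below
(illustrative only; this is a plain glReadPixels() from whatever
framebuffer is current, not the weston_surface_copy_content() helper
itself, whose exact signature should be checked in libweston).  The
synchronous call stalls until the GPU has finished the frame, which is
exactly the inefficiency mentioned above; binding a pixel-pack buffer
first would make it asynchronous at the cost of one frame of latency:

#include <GL/gl.h>
#include <stdlib.h>

/* Illustrative synchronous readback into system memory, which is the
 * form RDP ultimately needs.  glReadPixels() blocks until the GPU has
 * finished rendering the frame. */
void *read_frame_to_cpu(int width, int height)
{
    void *pixels = malloc((size_t)width * height * 4);

    if (!pixels)
        return NULL;
    glPixelStorei(GL_PACK_ALIGNMENT, 1);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    return pixels;  /* caller hands the data to the encoder, then frees it */
}
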

In fact, we have been thinking about adding (hardware or software)
OpenGL support to the headless backend so that we could actually run
tests on those code paths in the Weston test suite:
https://bugs.freedesktop.org/show_bug.cgi?id=83984
https://bugs.freedesktop.org/show_bug.cgi?id=83985

Since filing those bugs, I have been thinking that testing Weston's
GL-renderer should happen with the Wayland-backend. The host compositor
would be Weston with the headless backend modified to initialize
compositor-side EGL ad hoc. That way we might be able to limit all
test-only code in the headless backend.


Thanks,
pq




Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-14 Thread DRC
On 12/14/16 8:52 PM, Carsten Haitzler (The Rasterman) wrote:
> weston is not the only wayland compositor. it is the sample/test compositor.
> wayland does not mean sticking to just what weston does.
> 
> i suspect weston's rdp back-end forces a sw gl stack because it's easier to be
> driver agnostic and run everywhere, and as you have to read back pixel data for
> transmitting over rdp... why bother with the complexity of actual driver setup
> and hw device permissions etc...
> 
> what pekka is saying is that it's kind of YOUR job then to make a headless
> compositor (base it on weston code or write your own entirely from scratch
> etc.), and this headless compositor does return a hw egl context to clients. it
> can transport data to the other server via vnc, rdp or any other method
> you like. your headless compositor will get new drm buffers from clients when
> they display (having rendered using the local gpu) and then transfer to the
> other end. the other end can be a vnc or rdp viewer or a custom app you wrote
> for your protocol etc. ... but what you want is perfectly doable with
> wayland... but it's kind of your job to do it. that is what virtual-gl would
> be: a local headless wayland compositor (for wayland mode) with some kind of
> display front end on the other end.

Exactly what I needed to know.  Thanks.


Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-14 Thread The Rasterman
On Wed, 14 Dec 2016 11:42:54 -0600 DRC  said:

...snip...

> Again, not how it currently works when using Weston with the RDP back end.

weston is not the only wayland compositor. it is the sample/test compositor.
wayland does not mean sticking to just what weston does.

i suspect weston's rdp back-end forces a sw gl stack because it's easier to be
driver agnostic and run everywhere, and as you have to read back pixel data for
transmitting over rdp... why bother with the complexity of actual driver setup
and hw device permissions etc...

what pekka is saying is that it's kind of YOUR job then to make a headless
compositor (base it on weston code or write your own entirely from scratch
etc.), and this headless compositor does return a hw egl context to clients. it
can transport data to the other server via vnc, rdp or any other method
you like. your headless compositor will get new drm buffers from clients when
they display (having rendered using the local gpu) and then transfer to the
other end. the other end can be a vnc or rdp viewer or a custom app you wrote
for your protocol etc. ... but what you want is perfectly doable with
wayland... but it's kind of your job to do it. that is what virtual-gl would
be: a local headless wayland compositor (for wayland mode) with some kind of
display front end on the other end.

-- 
- Codito, ergo sum - "I code, therefore I am" --
The Rasterman (Carsten Haitzler)    ras...@rasterman.com



Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-14 Thread DRC
On 12/14/16 3:27 AM, Pekka Paalanen wrote:
> could you be more specific on what you mean by "server-side", please?
> Are you referring to the machine where the X server runs, or the
> machine that is remote from a user perspective where the app runs?

Few people use remote X anymore in my industry, so the reality of most
VirtualGL deployments (and all of the commercial VGL deployments of
which I'm aware) is that the X servers and the GPU are all on the
application host, the machine where the applications are actually
executed.  Typically people allocate beefy server hardware with multiple
GPUs, hundreds of gigabytes of memory, and as many as 32-64 CPU cores to
act as VirtualGL servers for 50 or 100 users.  We use the terms "3D X
server" and "2D X server" to indicate where the 3D and 2D rendering is
actually occurring.  The 3D X server is located on the application host
and is usually headless, since it only needs to be used by VirtualGL for
obtaining Pbuffer contexts from the GPU-accelerated OpenGL
implementation (usually nVidia or AMD/ATI.)  There is typically one 3D X
server shared by all users of the machine (VirtualGL allows this
sharing, since it rewrites all of the GLX calls from applications and
automatically converts all of them for off-screen rendering), and the 3D
X server has a separate screen for each GPU.  The 2D X server is usually
an X proxy such as TurboVNC, and there are multiple instances of it (one
or more per user.)  These 2D X server instances are usually located on
the application host but don't necessarily have to be.  The client
machine simply runs a VNC viewer.
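
A minimal sketch of the Pbuffer step described above (illustrative, not
VirtualGL code; the ":0.0" display name and the 1920x1200 size are
assumptions): open the 3D X server's display directly and create an
off-screen drawable plus a direct-rendering context on it.

#include <GL/glx.h>
#include <stdio.h>

/* Illustrative: obtain an off-screen GLX drawable and a direct-rendering
 * context on the headless "3D X server". */
int open_pbuffer_context(void)
{
    static const int fb_attribs[] = {
        GLX_DRAWABLE_TYPE, GLX_PBUFFER_BIT,
        GLX_RENDER_TYPE, GLX_RGBA_BIT,
        None
    };
    static const int pb_attribs[] = {
        GLX_PBUFFER_WIDTH, 1920, GLX_PBUFFER_HEIGHT, 1200, None
    };
    Display *dpy = XOpenDisplay(":0.0");
    int n = 0;

    if (!dpy)
        return -1;
    GLXFBConfig *cfgs = glXChooseFBConfig(dpy, DefaultScreen(dpy),
                                          fb_attribs, &n);
    if (!cfgs || n < 1)
        return -1;

    GLXPbuffer pb = glXCreatePbuffer(dpy, cfgs[0], pb_attribs);
    GLXContext ctx = glXCreateNewContext(dpy, cfgs[0], GLX_RGBA_TYPE,
                                         NULL, True /* direct */);
    XFree(cfgs);
    if (!ctx || !glXMakeContextCurrent(dpy, pb, pb, ctx))
        return -1;

    printf("direct rendering: %s\n", glXIsDirect(dpy, ctx) ? "yes" : "no");
    return 0;
}
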

X proxies such as Xvnc do not support hardware-accelerated OpenGL,
because they are implemented on top of a virtual framebuffer stored in
main memory.  The only way to implement hardware-accelerated OpenGL in
that environment is to use "split rendering", which is what VirtualGL
does.  It splits off the 3D rendering to another X server that has a GPU
attached.


> Wayland apps handle all rendering themselves, there is nothing for
> sending rendering commands to another process like the Wayland
> compositor.
> 
> What a Wayland compositor needs to do is to advertise support for EGL
> Wayland platform for clients. That it does by using the
> EGL_WL_bind_wayland_display extension.
> 
> If you want all GL rendering to happen in the machine where the app
> runs, then you don't have to do much anything, it already works like
> that. You only need to make sure the compositor initializes EGL, which
> in Weston's case means using the gl-renderer. The renderer does not
> have to actually composite anything if you want to remote windows
> separately, but it is needed to gain access to the window contents. In
> Weston, only the renderer knows how to access the contents of all
> windows (wl_surfaces).
> 
> If OTOH you want to send GL rendering commands to the other machine
> than where the app is running, that will require a great deal of work,
> since you have to implement serialization and de-serialization of
> OpenGL (and EGL) yourself. (It has been done before, do ask me if you
> want details.)

But if you run OpenGL applications in Weston, as it is currently
implemented, then the OpenGL applications are either GPU-accelerated or
not, depending on the back end used.  If you run Weston nested in a
Wayland compositor that is already GPU-accelerated, then OpenGL
applications run in the Weston session will be GPU-accelerated as well.
If you run Weston with the RDP back end, then OpenGL applications run in
the Weston session will use Mesa llvmpipe instead.  I'm trying to
understand, quite simply, whether it's possible for unmodified Wayland
OpenGL applications-- such as the example OpenGL applications in the
Weston source-- to take advantage of OpenGL GPU acceleration when they
are running with the RDP back end.  (I'm assuming that whatever
restrictions there are on the RDP back end would exist for the TurboVNC
back end I intend to develop.)  My testing thus far indicates that this
is not currently possible, but I need to understand the source of the
limitation so I can understand how to work around it.  Instead, you seem
to be telling me that the limitation doesn't exist, but I can assure you
that it does.  Please test Weston with the RDP back end and confirm that
OpenGL applications run in that environment are not GPU-accelerated.


> I think you have an underlying assumption that EGL and GL would somehow
> automatically be carried over the network, and you need to undo it.
> That does not happen, as the display server always runs in the same
> machine as the application. The Wayland display is always local, it can
> never be remote simply because Wayland can never go over a network.

No, I don't have that assumption at all, because that does not currently
occur with VirtualGL.  VirtualGL is designed precisely to avoid that
situation.  The problem is quite simply:  In Weston, as it is currently
implemented, OpenGL applications are not GPU-accelerated when using the
RDP back end.

Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-14 Thread Pekka Paalanen
On Tue, 13 Dec 2016 14:39:31 -0600
DRC  wrote:

> Greetings.  I am the founder and principal developer for The VirtualGL
> Project, which has (since 2004) produced a GLX interposer (VirtualGL)
> and a high-speed X proxy (TurboVNC) that are widely used for running
> Linux/Unix OpenGL applications remotely with hardware-accelerated
> server-side 3D rendering.  For those who aren't familiar with VirtualGL,
> it basically works by:

Hi,

could you be more specific on what you mean by "server-side", please?
Are you referring to the machine where the X server runs, or the
machine that is remote from a user perspective where the app runs?

My confusion is caused by the difference in the X11 vs. Wayland models.
The display server the app connects to is not on the same side in one
model as in the other model.


With X11 (traditional indirect rendering with X11 over network):

Machine A                      |  Machine B
                               |
App -> libs (X11, GLX) --------|--> X server -> display
                               |             -> GPU B


With Wayland apps remoted:

Machine A                      |  Machine B
                               |
App                            |
  -> EGL and GL libs -> GPU A  |
  --(wayland)--> Weston -------|--(VNC/RDP)--> VNC/RDP viewer
                               |                 -> window system -> display


Wayland apps handle all rendering themselves, there is nothing for
sending rendering commands to another process like the Wayland
compositor.

What a Wayland compositor needs to do is to advertise support for EGL
Wayland platform for clients. That it does by using the
EGL_WL_bind_wayland_display extension.
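
Concretely, the compositor-side hook is roughly the following sketch
(the prototype matches the one documented for the
EGL_WL_bind_wayland_display extension; the function pointer has to be
fetched at run time, and real code should first check the extension
string):

#include <EGL/egl.h>
#include <wayland-server.h>

/* Illustrative: advertise EGL Wayland-platform support to clients by
 * binding the compositor's wl_display to its EGLDisplay. */
typedef EGLBoolean (*bind_wl_display_fn)(EGLDisplay dpy,
                                         struct wl_display *display);

int bind_wayland_display(EGLDisplay egl_dpy, struct wl_display *wl_dpy)
{
    bind_wl_display_fn bind = (bind_wl_display_fn)
        eglGetProcAddress("eglBindWaylandDisplayWL");

    /* Only present if the driver exposes EGL_WL_bind_wayland_display;
     * check eglQueryString(egl_dpy, EGL_EXTENSIONS) in real code. */
    if (!bind || !bind(egl_dpy, wl_dpy))
        return -1;
    return 0;
}
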

If you want all GL rendering to happen in the machine where the app
runs, then you don't have to do much anything, it already works like
that. You only need to make sure the compositor initializes EGL, which
in Weston's case means using the gl-renderer. The renderer does not
have to actually composite anything if you want to remote windows
separately, but it is needed to gain access to the window contents. In
Weston, only the renderer knows how to access the contents of all
windows (wl_surfaces).

If OTOH you want to send GL rendering commands to the other machine
than where the app is running, that will require a great deal of work,
since you have to implement serialization and de-serialization of
OpenGL (and EGL) yourself. (It has been done before, do ask me if you
want details.)

> -- Interposing (via LD_PRELOAD) GLX calls from the OpenGL application
> -- Rewriting the GLX calls such that OpenGL contexts are created in
> Pbuffers instead of windows
> -- Redirecting the GLX calls to the server's local display (usually :0,
> which presumably has a GPU attached) rather than the remote display or
> the X proxy
> -- Reading back the rendered 3D images from the server's local display
> and transferring them to the remote display or X proxy when the
> application swaps buffers or performs other "triggers" (such as calling
> glFinish() when rendering to the front buffer)
> 
> There is more complexity to it than that, but that's at least the
> general idea.

Ok, so that sounds like you want the GL execution to happen in the
app-side machine. That's the easy case. :-)

> At the moment, I'm investigating how best to accomplish a similar feat
> in a Wayland/Weston environment.  I'm given to understand that building
> a VNC server on top of Weston is straightforward and has already been
> done as a proof of concept, so really my main question is how to do the
> OpenGL stuff.  At the moment, my (very limited) understanding of the
> architecture seems to suggest that I have two options:

Weston has the RDP backend already, indeed.

> (1) Implement an interposer similar in concept to VirtualGL, except that
> this interposer would rewrite EGL calls to redirect them from the
> Wayland display to a low-level EGL device that supports off-screen
> rendering (such as the devices provided through the
> EGL_PLATFORM_DEVICE_EXT extension, which is currently supported by
> nVidia's drivers.)  How to get the images from that low-level device
> into the Weston compositor when it is using a remote display back-end is
> an open question, but I assume I'd have to ask the compositor for a
> surface (which presumably would be allocated from main memory) and
> handle the transfer of the pixels from the GPU to that surface.  That is
> similar in concept to how VirtualGL currently works, vis-a-vis using
> glReadPixels to transfer the rendered OpenGL pixels into an MIT-SHM image.

I think you have an underlying assumption that EGL and GL would somehow
automatically be carried over the network, and you need to undo it.
That does not happen, as the display server always runs in the same
machine as the application. The Wayland display is always local, it can
never be remote simply because Wayland can never go over a network.

Furthermore, all GL rendering is always done locally, on the machine
where the application itself is running.

Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-13 Thread Christian Stroetmann

On 13.Dec.2016 21:39, DRC wrote:

I thought about this on the 14th of March 2014 (see also [1]).
Have you looked at https://github.com/waltham/waltham ?



Regards
Christian Stroetmann

[1] OntoGraphics 
(www.ontolinux.com/technology/ontographics/ontographics.htm)



Greetings.  I am the founder and principal developer for The VirtualGL
Project, which has (since 2004) produced a GLX interposer (VirtualGL)
and a high-speed X proxy (TurboVNC) that are widely used for running
Linux/Unix OpenGL applications remotely with hardware-accelerated
server-side 3D rendering.  For those who aren't familiar with VirtualGL,
it basically works by:

-- Interposing (via LD_PRELOAD) GLX calls from the OpenGL application
-- Rewriting the GLX calls such that OpenGL contexts are created in
Pbuffers instead of windows
-- Redirecting the GLX calls to the server's local display (usually :0,
which presumably has a GPU attached) rather than the remote display or
the X proxy
-- Reading back the rendered 3D images from the server's local display
and transferring them to the remote display or X proxy when the
application swaps buffers or performs other "triggers" (such as calling
glFinish() when rendering to the front buffer)

There is more complexity to it than that, but that's at least the
general idea.
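
To make the interposition idea above concrete, here is a heavily
simplified, hypothetical LD_PRELOAD shim in C.  It only wraps
glXSwapBuffers(), which is where a VirtualGL-style interposer would
read back and queue the finished frame; the real thing also has to
wrap the context, drawable, and visual handling listed above.

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <GL/glx.h>

/* Hypothetical, heavily simplified interposer sketch.  Build as a
 * shared object and run the target application with
 * LD_PRELOAD=./interposer.so. */
void glXSwapBuffers(Display *dpy, GLXDrawable drawable)
{
    static void (*real_swap)(Display *, GLXDrawable);

    if (!real_swap)
        real_swap = (void (*)(Display *, GLXDrawable))
            dlsym(RTLD_NEXT, "glXSwapBuffers");

    /* A real interposer would read back the rendered frame here
     * (glReadPixels/PBO) and queue it for transport. */
    fprintf(stderr, "frame ready on drawable 0x%lx\n",
            (unsigned long)drawable);

    real_swap(dpy, drawable);
}

Built with something like "cc -shared -fPIC -o interposer.so
interposer.c -ldl" and activated through LD_PRELOAD, the wrapper runs
inside the application's process and sees every buffer swap.
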

At the moment, I'm investigating how best to accomplish a similar feat
in a Wayland/Weston environment.  I'm given to understand that building
a VNC server on top of Weston is straightforward and has already been
done as a proof of concept, so really my main question is how to do the
OpenGL stuff.  At the moment, my (very limited) understanding of the
architecture seems to suggest that I have two options:

(1) Implement an interposer similar in concept to VirtualGL, except that
this interposer would rewrite EGL calls to redirect them from the
Wayland display to a low-level EGL device that supports off-screen
rendering (such as the devices provided through the
EGL_PLATFORM_DEVICE_EXT extension, which is currently supported by
nVidia's drivers.)  How to get the images from that low-level device
into the Weston compositor when it is using a remote display back-end is
an open question, but I assume I'd have to ask the compositor for a
surface (which presumably would be allocated from main memory) and
handle the transfer of the pixels from the GPU to that surface.  That is
similar in concept to how VirtualGL currently works, vis-a-vis using
glReadPixels to transfer the rendered OpenGL pixels into an MIT-SHM image.

(2) Figure out some way of redirecting the OpenGL rendering within
Weston itself, rather than using an interposer.  This is where I'm fuzzy
on the details.  Is this even possible with a remote display back-end?
Maybe it's as straightforward as writing a back-end that allows Weston
to use the aforementioned low-level EGL device to obtain all of the
rendering surfaces that it passes to applications, but I don't have a
good enough understanding of the architecture to know whether or not
that idea is nonsense.  I know that X proxies, such as Xvnc, allocate a
"virtual framebuffer" that is used by the X.org code for performing X11
rendering.  Because this virtual framebuffer is located in main memory,
you can't do hardware-accelerated OpenGL with it unless you use a
solution like VirtualGL.  It would be impractical to allocate the X
proxy's virtual framebuffer in GPU memory because of the fine-grained
nature of X11, but since Wayland is all image-based, perhaps that is no
longer a limitation.

Any advice is greatly appreciated.  Thanks for your time.

DRC
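
A note on option (1) above: with the EGL device extensions
(EGL_EXT_platform_device and friends, which nVidia's drivers expose),
obtaining a hardware EGLDisplay with no window system involved looks
roughly like the sketch below (illustrative; device selection and most
error handling omitted).

#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Illustrative: open a hardware EGLDisplay through the EGL device
 * extensions.  The extension entry points must be fetched at run time. */
EGLDisplay open_device_display(void)
{
    PFNEGLQUERYDEVICESEXTPROC query_devices =
        (PFNEGLQUERYDEVICESEXTPROC)
        eglGetProcAddress("eglQueryDevicesEXT");
    PFNEGLGETPLATFORMDISPLAYEXTPROC get_platform_display =
        (PFNEGLGETPLATFORMDISPLAYEXTPROC)
        eglGetProcAddress("eglGetPlatformDisplayEXT");
    EGLDeviceEXT devices[8];
    EGLint num = 0, major, minor;

    if (!query_devices || !get_platform_display)
        return EGL_NO_DISPLAY;
    if (!query_devices(8, devices, &num) || num < 1)
        return EGL_NO_DISPLAY;

    /* Take the first device; a real implementation would let the
     * administrator choose a GPU. */
    EGLDisplay dpy = get_platform_display(EGL_PLATFORM_DEVICE_EXT,
                                          devices[0], NULL);
    if (dpy == EGL_NO_DISPLAY || !eglInitialize(dpy, &major, &minor))
        return EGL_NO_DISPLAY;
    return dpy;
}
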

