Re: Remote display with 3D acceleration using Wayland/Weston

2017-02-28 Thread DRC
On 2/28/17 8:12 AM, Hardening wrote:
> This has been done quite a long time ago, here =>
> https://gitorious.org/weston/jonseverinsson-weston/?p=weston:jonseverinsson-weston.git;a=commit;h=9e26d9356255f4af1723700272805f6d356c7d7a
> 
> It's clearly outdated, and IIRC people here didn't like the way it was
> implemented, but you have the idea. It's using DRI render nodes to do
> the rendering.

Thanks for the hint.  I was able to integrate your patch with the latest
Weston code by borrowing some of the code from the DRM backend, but it
doesn't seem to display anything (my patch is attached).  Maybe someone
can point out what I'm doing wrong so I can at least get this working
for experimental purposes.

The following GitHub comment summarizes my general feelings at the moment:

https://github.com/TurboVNC/turbovnc/issues/18#issuecomment-282827192

To the best of my knowledge (someone please correct me if I got any part
of this wrong), it would be much easier and more flexible to pursue the
interposer approach for now.  The main reason is the nVidia proprietary
driver issue (a show-stopper for VirtualGL and TurboVNC), but also that
implementing a headless hardware-accelerated remote display backend "the
right way" would likely require changes to gl_renderer and the
compositor code, including implementing PBO readback to prevent the
readback in the compositor from blocking other OpenGL applications.  I
anticipate that further tuning would be required as well, and the result
might still not perform as well as an interposer, because VirtualGL
reads back the pixels in the application's rendering thread but uses a
separate thread for displaying them.  Thus, if the compositor were
blocking on encoding a pixel region for remote display, the Wayland
client could still render the next frame in the background.
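
To be concrete about the PBO readback I mean, it would be something like
the following (a minimal sketch with my own function and variable names,
not Weston code; it assumes a GL context with the buffer-object entry
points exposed):

#define GL_GLEXT_PROTOTYPES 1
#include <GL/gl.h>
#include <string.h>

/* Double-buffered PBO readback: frame N is read into one PBO while the
 * PBO filled on the previous frame is mapped and copied out, so the
 * glReadPixels() call returns without waiting for the GPU.  All names
 * here are mine, not Weston's. */
static GLuint pbo[2];
static int cur;

void pbo_init(int width, int height)
{
	glGenBuffers(2, pbo);
	for (int i = 0; i < 2; i++) {
		glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[i]);
		glBufferData(GL_PIXEL_PACK_BUFFER,
			     (GLsizeiptr)width * height * 4, NULL,
			     GL_STREAM_READ);
	}
	glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

/* Call after rendering each frame; 'dest' receives the *previous*
 * frame's pixels (one frame of latency, which is fine for remote
 * display; the very first call copies a not-yet-filled buffer). */
void pbo_readback(int width, int height, void *dest)
{
	int next = 1 - cur;
	void *src;

	/* Start an asynchronous read of the current frame. */
	glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[cur]);
	glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, NULL);

	/* Map the PBO filled last frame; the DMA has usually finished by
	 * now, so this does not stall. */
	glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[next]);
	src = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
	if (src) {
		memcpy(dest, src, (size_t)width * height * 4);
		glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
	}
	glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
	cur = next;
}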

I am not in a good position to develop or maintain changes to Weston,
and since interest has been expressed in "officially" supporting a
headless hardware-accelerated remote display backend in the long term,
it makes sense for me to develop an interposer as a stopgap measure.
The interposer could also provide a springboard for other developers who
are interested in making their own Weston remote display backends, since
they would not have to deal with the problem of OpenGL hardware
acceleration.  In the long term, I anticipate that this interposer will
be rendered obsolete, but that's fine, because a lot of the effort
necessary to build an EGL interposer for Wayland would benefit the
existing GLX interposer in VirtualGL as well, so the only thing that
might be "thrown away" in the long term would be the Wayland-specific
parts.  Barring any further information, that seems to be the best path
forward at the moment, but if there is any movement on the nVidia front
or on the headless hardware-accelerated remote display backend front,
please keep me in the loop.

DRC

From f331db279a138295701806de0c8bd71f385d2796 Mon Sep 17 00:00:00 2001
From: DRC 
Date: Tue, 28 Feb 2017 22:49:22 -0600
Subject: [PATCH] rdp-backend.so: OpenGL hardware acceleration

---
 compositor/main.c  |  23 +++-
 configure.ac   |   4 +-
 libweston/compositor-rdp.c | 314 +++--
 libweston/compositor-rdp.h |  24 
 4 files changed, 352 insertions(+), 13 deletions(-)

diff --git a/compositor/main.c b/compositor/main.c
index 72c3cd1..7f4b8db 100644
--- a/compositor/main.c
+++ b/compositor/main.c
@@ -601,6 +601,7 @@ usage(int error_code)
 	"  --rdp4-key=FILE\tThe file containing the key for RDP4 encryption\n"
 	"  --rdp-tls-cert=FILE\tThe file containing the certificate for TLS encryption\n"
 	"  --rdp-tls-key=FILE\tThe file containing the private key for TLS encryption\n"
+	"  --use-pixman\t\tUse the pixman (CPU) renderer\n"
 	"\n");
 #endif
 
@@ -1329,11 +1330,14 @@ static void
 rdp_backend_output_configure(struct wl_listener *listener, void *data)
 {
 	struct weston_output *output = data;
+	struct weston_config *wc = wet_get_config(output->compositor);
 	struct wet_compositor *compositor = to_wet_compositor(output->compositor);
 	struct wet_output_config *parsed_options = compositor->parsed_options;
+	struct weston_config_section *section;
 	const struct weston_rdp_output_api *api = weston_rdp_output_get_api(output->compositor);
 	int width = 640;
 	int height = 480;
+	char *gbm_format = NULL;
 
 	assert(parsed_options);
 
@@ -1342,6 +1346,8 @@ rdp_backend_output_configure(struct wl_listener *listener, void *data)
 		return;
 	}
 
+	section = weston_config_get_section(wc, "output", "name", output->name);
+
 

Re: Remote display with 3D acceleration using Wayland/Weston

2017-02-23 Thread DRC
On 12/15/16 3:01 AM, Pekka Paalanen wrote:
> The current RDP-backend is written to set up and use only the Pixman
> renderer. Pixman renderer is a software renderer, and will not
> initialize EGL in the compositor. Therefore no support for hardware
> accelerated OpenGL gets advertised to clients, and clients fall back to
> software GL.
> 
> You can fix this purely by modifying libweston/compositor-rdp.c file,
> writing the support for initializing the GL-renderer. Then you get
> hardware accelerated GL support for all Wayland clients without any
> other modifications anywhere.
> 
> Why that has not been done already is because it was thought that
> having clients using hardware OpenGL while the compositor is not cannot
> be performant enough to justify the effort. Also, it pulls in the
> dependency to EGL and GL libs, which are huge. Obviously your use case
> is different and this rationale does not apply.
> 
> The hardest part in adding the support to the RDP-backend is
> implementing the buffer content access efficiently. RDP requires pixel
> data in system memory so the CPU can read it, but GL-renderer has all
> pixel data in graphics memory which often cannot be directly read by
> the CPU. Accessing that pixel data requires a copy (glReadPixels), and
> there is nowadays a helper: weston_surface_copy_content(), however the
> function is not efficient and is so far meant only for debugging and
> testing.

I am attempting to modify the RDP backend to prove the concept that
hardware-accelerated OpenGL is possible with a remote display backend,
but my lack of familiarity with the code is making this very
challenging.  It seems that the RDP backend uses Pixman both for
rendering and for maintaining its framebuffer in main memory
(shadow_surface).  Is that correct?  If so, then it seems that I would
need to continue using the shadow surface but use gl_renderer instead of
the Pixman renderer, and then implement my own method of transferring
pixels from the GL renderer to the shadow surface at the end of every
frame(?).  I've been trying to work from compositor-wayland.c as a
template, but it's unclear how everything connects, which parts of that
code I need in order to implement hardware acceleration, and which parts
are unnecessary.  I would appreciate it if someone who has familiarity
with the RDP backend could give me some targeted advice.
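
For clarity, the kind of end-of-frame copy I have in mind is roughly the
following (the names are hypothetical, not actual Weston symbols; this
is just the data flow I'm describing):

#include <stdint.h>
#include <GLES2/gl2.h>

/* Hypothetical end-of-repaint step for a gl_renderer-based RDP backend:
 * copy the rendered frame from GPU memory into the shadow buffer in
 * system memory so the RDP code can encode and transmit it.  None of
 * these names are real Weston symbols; a real implementation would also
 * have to deal with the pixel format (pixman is typically x8r8g8b8,
 * while GL ES only guarantees GL_RGBA/GL_UNSIGNED_BYTE readback) and
 * would want PBOs instead of a synchronous read. */
static void
copy_gl_frame_to_shadow(uint32_t *shadow_data, int width, int height)
{
	glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE,
		     shadow_data);
}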


Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-19 Thread DRC
On 12/19/16 2:48 AM, Pekka Paalanen wrote:
> Hmm, indeed, maybe it would be possible if you are imposing your own
> EGL middle-man library between the application and the real EGL library.
> 
> That's definitely an idea to look into. I cannot say off-hand why it
> would not work, so maybe it can work. :-)
> 
> To summarize, with that approach, you would have the client send only
> wl_shm buffers to the compositor, and the compositor never needs to
> touch EGL at all. It also has the benefit that the read-back cost
> (glReadPixels) is completely in the client process, so the compositor
> will not stall on it, and you don't need the stuff I explained about in
> the compositor. And you get support for the proprietary drivers!
> 
> Sorry for not realizing the "wrap libEGL.so" approach earlier.

Yes, exactly.  That is essentially how VirtualGL already works with
GLX/OpenGL, so it is a solution space I know well.  As I see it, the
advantages of implementing this at the compositor level are:

-- Automatic hardware acceleration for window managers that might need
to use OpenGL (which includes most of them these days)
-- No need to launch OpenGL applications using a wrapper script
-- Potentially the compositor could tap into GPU-based encoding methods
(NVENC, for instance) quite easily to compress the pixel updates sent to
the client.  This becomes more difficult when the pixel readback is
occurring in the OpenGL application process but the compression is
occurring in another process.

The potential advantages of an interposer are:

-- Much easier for me to develop, since this would represent basically a
subset of VirtualGL's existing functionality (the GLX interposer could
also benefit from a back end that accesses the GPU directly through EGL
rather than forwarding the GLX requests through a local X server.)
-- The readback occurs in-process, so only applications that actually
need it (OpenGL applications) are subject to that overhead, and the
design of VirtualGL makes it such that the readback of the current frame
occurs in parallel with the display of the last frame.
-- Theoretically should work with any Wayland implementation or back
end.  It goes without saying that I'm not the only one in this game.  In
the current market, there are lots of different vendors producing their
X11 proxy of choice, but all of them can use VirtualGL to add GPU
acceleration.  I don't know how the market will look with Wayland, but I
would anticipate that those same vendors will produce their own Wayland
proxies of choice as well, so there might be an advantage to retaining
VirtualGL as an independent bolt-on product.
-- That is a good point about the compositor not stalling on
glReadPixels()-- although I think I could probably mitigate that by
using PBOs rather than synchronous glReadPixels().

I know for sure that I can make the interposer approach work, and
perhaps that would be a good short-term approach to get something up and
running while the other approach is explored in more depth.
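
To make the interposer idea concrete, the core of it would look
something like the sketch below (heavily simplified; the wl_shm buffer
creation, resizing, and format handling are omitted, and everything
other than the EGL/GLES/Wayland entry points is a hypothetical
placeholder):

#define _GNU_SOURCE
#include <dlfcn.h>
#include <EGL/egl.h>
#include <GLES2/gl2.h>
#include <wayland-client.h>

/* Per-window state the interposer would maintain elsewhere (placeholder
 * declaration only). */
struct vgl_window {
	struct wl_surface *wl_surface;  /* the app's real Wayland surface */
	struct wl_buffer *shm_buffer;   /* pre-created wl_shm buffer */
	void *shm_data;                 /* mmap'ed pixels of shm_buffer */
	int width, height;
};

extern struct vgl_window *vgl_lookup_window(EGLSurface surface);

/* Interposed via LD_PRELOAD: the app renders into an off-screen GPU
 * surface, and at swap time the frame is read back and handed to the
 * compositor as a plain wl_shm buffer, so the compositor never needs to
 * touch EGL at all. */
EGLBoolean eglSwapBuffers(EGLDisplay dpy, EGLSurface surface)
{
	static EGLBoolean (*real_swap)(EGLDisplay, EGLSurface);
	struct vgl_window *win = vgl_lookup_window(surface);

	if (!real_swap)
		real_swap = (EGLBoolean (*)(EGLDisplay, EGLSurface))
			dlsym(RTLD_NEXT, "eglSwapBuffers");

	if (win) {
		/* Read back the frame that was just rendered. */
		glReadPixels(0, 0, win->width, win->height,
			     GL_RGBA, GL_UNSIGNED_BYTE, win->shm_data);

		/* Attach the pixels to the real Wayland surface. */
		wl_surface_attach(win->wl_surface, win->shm_buffer, 0, 0);
		wl_surface_damage(win->wl_surface, 0, 0,
				  win->width, win->height);
		wl_surface_commit(win->wl_surface);
	}

	/* Swap the off-screen (Pbuffer/device-platform) surface. */
	return real_swap(dpy, surface);
}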



Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-16 Thread DRC
On 12/16/16 3:06 AM, Pekka Paalanen wrote:
> I should probably tell a little more, because what I explained above is
> a simplification due to using a single path for all buffer types.
> ...

Thanks again.  This is all very new to me, and I guess I don't fully
understand where these buffer types would come into play.  Bearing in
mind that I really don't care whether non-OpenGL applications are
hardware-accelerated, does that simplify anything?  It would be
sufficient if only OpenGL applications could render using the GPU.  The
compositor itself doesn't necessarily need to.


> Lastly, and I believe this is the most sad part for you, is that NVIDIA
> proprietary drivers do not work (the way we would like).
> 
> NVIDIA has been proposing for years a solution that is completely
> different to anything explained above: EGLStreams, and for the same
> amount of years, the community has been unimpressed with the design.
> Anyway, NVIDIA did implement their design and even wrote patches for
> Weston which we have not merged. Other compositors (e.g. Mutter) may
> choose to support EGLStreams as a temporary solution.

I guess I was hoping to take advantage of the EGL_PLATFORM_DEVICE_EXT
extension that allows for off-screen OpenGL rendering.  It currently
works with nVidia's drivers:
https://gist.github.com/dcommander/ee1247362201552b2532
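
Condensed down, the device-platform initialization looks roughly like
this (a sketch only, with error handling and config selection omitted):

#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Off-screen EGL init via EGL_EXT_device_enumeration +
 * EGL_EXT_platform_device; no X server or Wayland display required. */
EGLDisplay open_device_display(void)
{
	PFNEGLQUERYDEVICESEXTPROC queryDevices =
		(PFNEGLQUERYDEVICESEXTPROC)
		eglGetProcAddress("eglQueryDevicesEXT");
	PFNEGLGETPLATFORMDISPLAYEXTPROC getPlatformDisplay =
		(PFNEGLGETPLATFORMDISPLAYEXTPROC)
		eglGetProcAddress("eglGetPlatformDisplayEXT");
	EGLDeviceEXT devices[8];
	EGLint num_devices = 0;
	EGLint major, minor;
	EGLDisplay dpy;

	queryDevices(8, devices, &num_devices);

	/* Pick the first device; a real implementation would let the
	 * user choose (e.g. one GPU per "screen"). */
	dpy = getPlatformDisplay(EGL_PLATFORM_DEVICE_EXT, devices[0], NULL);
	eglInitialize(dpy, &major, &minor);

	/* From here: eglChooseConfig(), eglCreatePbufferSurface(),
	 * eglCreateContext(), eglMakeCurrent() as usual. */
	return dpy;
}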


Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-15 Thread DRC
On 12/15/16 3:01 AM, Pekka Paalanen wrote:
> I assure you, this is a limitation of the RDP-backend itself. Nothing
> outside of Weston creates this restriction.
> 
> The current RDP-backend is written to set up and use only the Pixman
> renderer. Pixman renderer is a software renderer, and will not
> initialize EGL in the compositor. Therefore no support for hardware
> accelerated OpenGL gets advertised to clients, and clients fall back to
> software GL.
> 
> You can fix this purely by modifying libweston/compositor-rdp.c file,
> writing the support for initializing the GL-renderer. Then you get
> hardware accelerated GL support for all Wayland clients without any
> other modifications anywhere.
> 
> Why that has not been done already is because it was thought that
> having clients using hardware OpenGL while the compositor is not cannot
> be performant enough to justify the effort. Also, it pulls in the
> dependency to EGL and GL libs, which are huge. Obviously your use case
> is different and this rationale does not apply.

Like many things, it depends on the application.  GLXgears may not
perform better in a hardware-accelerated remote 3D environment vs. using
software OpenGL, but real-world applications with larger geometries
certainly will.  In a VirtualGL environment, the overhead is per-frame
rather than per-primitive, so geometric throughput is essentially as
fast as it would be in the local case (the OpenGL applications are still
using direct rendering.)  The main performance limiters are pixel
readback and transmission.

Modern GPUs have pretty fast readback-- 800-1000 Mpixels/sec in the case
of a mid-range Quadro, for instance, if you use synchronous readback.
VirtualGL uses PBO readback, which is a bit slower than synchronous
readback but which uses practically zero CPU cycles and does not block
at the driver level (this is what enables many users to share the same
GPU without conflict.)  VGL also uses a frame queueing/spoiling system
to send the 3D frames from the rendering thread into another thread for
transmission and/or display, so it can be displaying or transmitting the
last frame while the application renders the next frame.  TurboVNC (and
most other X proxies that people use with VGL) is based on
libjpeg-turbo, which can compress JPEG images at hundreds of Mpixels/sec
on modern CPUs.

In total, you can pretty easily push 60+ Megapixels/sec with
perceptually lossless image quality to clients on even a 100 Megabit
network, and 20 Megapixels/sec across a 10 Megabit network (with reduced
quality.)  Our biggest success stories are large companies who have
replaced their 3D workstation infrastructure with 8 or 10 beefy servers
running VirtualGL+TurboVNC with laptop clients running the TurboVNC
Viewer.  In most cases, they claim that the perceived performance is as
good as or better than their old workstations.
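
The frame queueing/spoiling mechanism is conceptually just a single-slot
mailbox between the rendering thread and the transmission/display
thread, something like this simplified sketch (not the actual VirtualGL
code):

#include <pthread.h>
#include <stddef.h>

/* Single-slot frame "mailbox": the render thread never waits; if the
 * transport is busy, the pending frame is simply replaced ("spoiled").
 * lock/cond must be initialized with the usual PTHREAD_*_INITIALIZER
 * macros or *_init() calls. */
struct frame_queue {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	void *pending;          /* newest frame not yet transmitted */
};

/* Render thread: post a frame after readback.  Returns the spoiled
 * frame (if any) so its buffer can be recycled. */
void *fq_post(struct frame_queue *q, void *frame)
{
	void *spoiled;

	pthread_mutex_lock(&q->lock);
	spoiled = q->pending;   /* NULL if the transport kept up */
	q->pending = frame;
	pthread_cond_signal(&q->cond);
	pthread_mutex_unlock(&q->lock);
	return spoiled;
}

/* Transmission/display thread: block until a frame is available, then
 * take ownership of it for compression and transmission. */
void *fq_take(struct frame_queue *q)
{
	void *frame;

	pthread_mutex_lock(&q->lock);
	while (q->pending == NULL)
		pthread_cond_wait(&q->cond, &q->lock);
	frame = q->pending;
	q->pending = NULL;
	pthread_mutex_unlock(&q->lock);
	return frame;
}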

To put some numbers on this, our GLXspheres benchmark uses a geometry
size that is relatively small (~60,000 polygons) but still a lot more
realistic than GLXgears (which has a polygon count only in the hundreds,
if I recall correctly.)  When running on a 1920x1200 remote display
session (TurboVNC), this benchmark will perform at about 14 Hz with
llvmpipe but 43 Hz with VirtualGL.  So software OpenGL definitely does
slow things down, even with a relatively modest geometry size and in an
environment where there is a lot of per-frame overhead.


> The hardest part in adding the support to the RDP-backend is
> implementing the buffer content access efficiently. RDP requires pixel
> data in system memory so the CPU can read it, but GL-renderer has all
> pixel data in graphics memory which often cannot be directly read by
> the CPU. Accessing that pixel data requires a copy (glReadPixels), and
> there is nowadays a helper: weston_surface_copy_content(), however the
> function is not efficient and is so far meant only for debugging and
> testing.

I could probably reuse some of the VirtualGL code for this, since it
already does a good job of buffer management.

Thanks so much for all of the helpful info.  I guess I have my work cut
out for me.  :|


Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-14 Thread DRC
On 12/14/16 8:52 PM, Carsten Haitzler (The Rasterman) wrote:
> weston is not the only wayland compositor. it is the sample/test compositor.
> wayland does not mean sticking to just what weston does.
> 
> i suspect weston's rdp back-end forces a sw gl stack because it's easier to be
> driver agnostic and run everywhere, and as you have to read-back pixel data for
> transmitting over rdp... why bother with the complexity of actual driver setup
> and hw device permissions etc...
> 
> what pekka is saying is that it's kind of YOUR job then to make a headless
> compositor (base it on weston code or write your own entirely from scratch
> etc.), and this headless compositor does return a hw egl context to clients. it
> can transport data to the other server via vnc, rdp or any other method
> you like. your headless compositor will get new drm buffers from clients when
> they display (having rendered using the local gpu) and then transfer to the
> other end. the other end can be a vnc or rdp viewer or a custom app you wrote
> for your protocol etc. ... but what you want is perfectly doable with
> wayland... but it's kind of your job to do it. that is what virtual-gl would
> be. a local headless wayland compositor (for wayland mode) with some kind of
> display front end on the other end.

Exactly what I needed to know.  Thanks.


Re: Remote display with 3D acceleration using Wayland/Weston

2016-12-14 Thread DRC
On 12/14/16 3:27 AM, Pekka Paalanen wrote:
> could you be more specific on what you mean by "server-side", please?
> Are you referring to the machine where the X server runs, or the
> machine that is remote from a user perspective where the app runs?

Few people use remote X anymore in my industry, so the reality of most
VirtualGL deployments (and all of the commercial VGL deployments of
which I'm aware) is that the X servers and the GPU are all on the
application host, the machine where the applications are actually
executed.  Typically people allocate beefy server hardware with multiple
GPUs, hundreds of gigabytes of memory, and as many as 32-64 CPU cores to
act as VirtualGL servers for 50 or 100 users.

We use the terms "3D X server" and "2D X server" to indicate where the
3D and 2D rendering is actually occurring.  The 3D X server is located
on the application host and is usually headless, since it only needs to
be used by VirtualGL for obtaining Pbuffer contexts from the
GPU-accelerated OpenGL implementation (usually nVidia or AMD/ATI.)
There is typically one 3D X server shared by all users of the machine
(VirtualGL allows this sharing, since it rewrites all of the GLX calls
from applications and automatically converts all of them for off-screen
rendering), and the 3D X server has a separate screen for each GPU.

The 2D X server is usually an X proxy such as TurboVNC, and there are
multiple instances of it (one or more per user.)  These 2D X server
instances are usually located on the application host but don't
necessarily have to be.  The client machine simply runs a VNC viewer.

X proxies such as Xvnc do not support hardware-accelerated OpenGL,
because they are implemented on top of a virtual framebuffer stored in
main memory.  The only way to implement hardware-accelerated OpenGL in
that environment is to use "split rendering", which is what VirtualGL
does.  It splits off the 3D rendering to another X server that has a GPU
attached.


> Wayland apps handle all rendering themselves, there is nothing for
> sending rendering commands to another process like the Wayland
> compositor.
> 
> What a Wayland compositor needs to do is to advertise support for EGL
> Wayland platform for clients. That it does by using the
> EGL_WL_bind_wayland_display extension.
> 
> If you want all GL rendering to happen in the machine where the app
> runs, then you don't have to do much anything, it already works like
> that. You only need to make sure the compositor initializes EGL, which
> in Weston's case means using the gl-renderer. The renderer does not
> have to actually composite anything if you want to remote windows
> separately, but it is needed to gain access to the window contents. In
> Weston, only the renderer knows how to access the contents of all
> windows (wl_surfaces).
> 
> If OTOH you want to send GL rendering commands to the other machine
> than where the app is running, that will require a great deal of work,
> since you have to implement serialization and de-serialization of
> OpenGL (and EGL) yourself. (It has been done before, do ask me if you
> want details.)

But if you run OpenGL applications in Weston, as it is currently
implemented, then the OpenGL applications are either GPU-accelerated or
not, depending on the back end used.  If you run Weston nested in a
Wayland compositor that is already GPU-accelerated, then OpenGL
applications run in the Weston session will be GPU-accelerated as well.
If you run Weston with the RDP back end, then OpenGL applications run in
the Weston session will use Mesa llvmpipe instead.  I'm trying to
understand, quite simply, whether it's possible for unmodified Wayland
OpenGL applications-- such as the example OpenGL applications in the
Weston source-- to take advantage of OpenGL GPU acceleration when they
are running with the RDP back end.  (I'm assuming that whatever
restrictions there are on the RDP back end would exist for the TurboVNC
back end I intend to develop.)  My testing thus far indicates that this
is not currently possible, but I need to understand the source of the
limitation so I can figure out how to work around it.  Instead, you seem
to be telling me that the limitation doesn't exist, but I can assure you
that it does.  Please test Weston with the RDP back end and confirm that
OpenGL applications run in that environment are not GPU-accelerated.


> I think you have an underlying assumption that EGL and GL would somehow
> automatically be carried over the network, and you need to undo it.
> That does not happen, as the display server always runs in the same
> machine as the application. The Wayland display is always local, it can
> never be remote simply because Wayland can never go over a network.

No, I don't have that assumption at all, because that does not currently
occur with VirtualGL.  VirtualGL is designed precisely to avoid that
situation.  The problem is quite simply:  In Weston, as it is currently
implemented, OpenGL applications are not GPU-accelerated when using the
RDP back end.

Remote display with 3D acceleration using Wayland/Weston

2016-12-13 Thread DRC
Greetings.  I am the founder and principal developer for The VirtualGL
Project, which has (since 2004) produced a GLX interposer (VirtualGL)
and a high-speed X proxy (TurboVNC) that are widely used for running
Linux/Unix OpenGL applications remotely with hardware-accelerated
server-side 3D rendering.  For those who aren't familiar with VirtualGL,
it basically works by:

-- Interposing (via LD_PRELOAD) GLX calls from the OpenGL application
-- Rewriting the GLX calls such that OpenGL contexts are created in
Pbuffers instead of windows
-- Redirecting the GLX calls to the server's local display (usually :0,
which presumably has a GPU attached) rather than the remote display or
the X proxy
-- Reading back the rendered 3D images from the server's local display
and transferring them to the remote display or X proxy when the
application swaps buffers or performs other "triggers" (such as calling
glFinish() when rendering to the front buffer)

There is more complexity to it than that, but that's at least the
general idea.
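
In code terms, the heart of it is an interposed glXSwapBuffers() along
these lines (a greatly simplified sketch; the vgl_* helpers are
placeholders, not actual VirtualGL internals):

#define _GNU_SOURCE
#include <dlfcn.h>
#include <GL/glx.h>

/* Placeholder helpers the interposer would maintain elsewhere. */
extern void vgl_get_size(GLXDrawable drawable, int *w, int *h);
extern void *vgl_frame_buffer(GLXDrawable drawable, int w, int h);
extern void vgl_queue_frame(GLXDrawable drawable, void *rgb, int w, int h);

/* Interposed via LD_PRELOAD.  Rendering has already been redirected
 * into a Pbuffer on the 3D X server, so the application's buffer swap
 * becomes the "trigger" for readback and delivery to the 2D X server
 * or X proxy. */
void glXSwapBuffers(Display *dpy, GLXDrawable drawable)
{
	static void (*real_swap)(Display *, GLXDrawable);
	void *frame;
	int w, h;

	if (!real_swap)
		real_swap = (void (*)(Display *, GLXDrawable))
			dlsym(RTLD_NEXT, "glXSwapBuffers");

	vgl_get_size(drawable, &w, &h);
	frame = vgl_frame_buffer(drawable, w, h);

	/* Read the just-rendered frame out of the off-screen drawable... */
	glPixelStorei(GL_PACK_ALIGNMENT, 1);
	glReadBuffer(GL_BACK);
	glReadPixels(0, 0, w, h, GL_RGB, GL_UNSIGNED_BYTE, frame);

	/* ...and queue it for transmission/display on another thread, so
	 * the application can go on rendering the next frame. */
	vgl_queue_frame(drawable, frame, w, h);

	/* Forward the swap (the real VirtualGL does considerably more
	 * bookkeeping here, mapping windows to Pbuffers and juggling the
	 * 3D vs. 2D X displays). */
	real_swap(dpy, drawable);
}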

At the moment, I'm investigating how best to accomplish a similar feat
in a Wayland/Weston environment.  I'm given to understand that building
a VNC server on top of Weston is straightforward and has already been
done as a proof of concept, so really my main question is how to do the
OpenGL stuff.  At the moment, my (very limited) understanding of the
architecture seems to suggest that I have two options:

(1) Implement an interposer similar in concept to VirtualGL, except that
this interposer would rewrite EGL calls to redirect them from the
Wayland display to a low-level EGL device that supports off-screen
rendering (such as the devices provided through the
EGL_PLATFORM_DEVICE_EXT extension, which is currently supported by
nVidia's drivers.)  How to get the images from that low-level device
into the Weston compositor when it is using a remote display back-end is
an open question, but I assume I'd have to ask the compositor for a
surface (which presumably would be allocated from main memory) and
handle the transfer of the pixels from the GPU to that surface.  That is
similar in concept to how VirtualGL currently works, vis-a-vis using
glReadPixels to transfer the rendered OpenGL pixels into an MIT-SHM image.

(2) Figure out some way of redirecting the OpenGL rendering within
Weston itself, rather than using an interposer.  This is where I'm fuzzy
on the details.  Is this even possible with a remote display back-end?
Maybe it's as straightforward as writing a back-end that allows Weston
to use the aforementioned low-level EGL device to obtain all of the
rendering surfaces that it passes to applications, but I don't have a
good enough understanding of the architecture to know whether or not
that idea is nonsense.  I know that X proxies, such as Xvnc, allocate a
"virtual framebuffer" that is used by the X.org code for performing X11
rendering.  Because this virtual framebuffer is located in main memory,
you can't do hardware-accelerated OpenGL with it unless you use a
solution like VirtualGL.  It would be impractical to allocate the X
proxy's virtual framebuffer in GPU memory because of the fine-grained
nature of X11, but since Wayland is all image-based, perhaps that is no
longer a limitation.

Any advice is greatly appreciated.  Thanks for your time.

DRC