Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-18 Thread Simon Ser
On Thursday, November 15, 2018 6:11 PM, Philipp Zabel  
wrote:
> Hi Pekka,
>
> thank you for the explanation.

Hi,

Thanks Pekka for clarifying.

> On Wed, 2018-11-14 at 11:03 +0200, Pekka Paalanen wrote:
> [...]
> > The hints protocol we are discussing here is a subset of what
> > https://github.com/cubanismo/allocator aims to achieve. Originally we
> > only concentrated on getting the format and modifier more optimal, but
> > the question of where and how to allocate the buffers is valid too.
> > Whether it is in scope for this extension is the big question below.
>
> My guess is: probably not. Either way, I'd prefer the protocol docs to
> be explicit about this.
>
> > Ideally, the protocol would do something like this:
> >
> > - Tell the client which device and for which use case the device must
> >   be able to access the buffer at minimum and always.
> >
> > - Tell the client that if it could make the buffer suitable also for a
> >   secondary device and a secondary use case, the compositor could do a
> >   more optimal job (e.g. putting the buffer in direct scanout,
> >   bypassing composition, or a hardware video encoder in case the output
> >   is going to be streamed).
> >
> > We don't have the vocabulary for use cases and there are tons of
> > different details to be taken into account, which is the whole point of
> > the allocator project. So we cannot do the complete solution here and
> > now, but we can do an approximate solution by negotiating pixel
> > formats and modifiers.
> >
> > The primary device is what the compositor uses for the fallback path,
> > which is compositing with a GPU.
> >
> > Therefore, at the very minimum, clients
> > need to allocate buffers that can be used with the primary device. We
> > guarantee this in the zwp_linux_dmabuf protocol by having the
> > compositor test the buffer import into EGL (or equivalent) before it
> > accepts that the buffer even exists. The client does not absolutely
> > necessarily need the primary device for this, but it will have much
> > better chances of making usable buffers if it uses it for allocation at
> > least.
>
> So the client must provide buffers that the primary device can import
> and sample a texture from, ideally directly.
> Can something like this be added to the interface description, to make
> it clear what the primary device actually is supposed to be in this
> context?

This seems sensible, I'll do that.
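To make the intent concrete, the amended description could read something like the following sketch (the wording and element names are assumptions based on this discussion, not the final patch):

```xml
<event name="primary_device">
  <description summary="preferred primary device">
    This event advertises the primary device that the compositor prefers.
    There is exactly one primary device.

    Clients must allocate buffers that the primary device can import and
    sample a texture from, because the compositor's fallback path is GPU
    composition on this device. Clients can additionally use the primary
    device for rendering to save power, but are not required to.
  </description>
  <arg name="fd" type="fd" summary="DRM device file descriptor"/>
</event>
```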

> > The primary device also has another very different meaning: the
> > compositor will likely be using the primary device anyway so it is kept
> > active and if clients use the same device instead of some other device,
> > it probably results in considerable power savings. IOW, the primary
> > device is the preferred rendering device as well. Or so I assume, these
> > two concepts could be decoupled as well.
>
> And the client should default to using the same primary device for
> rendering for power savings.

This will be in the next version, but with "can" instead of "should", because some
clients (games with DRI_PRIME) might want to use another device to get better
performance.

> > A secondary device is optional. In systems where the GPU and display
> > devices are separate DRM devices, the GPU will be the primary device,
> > and the display device would be the secondary device. So there seems to
> > be a use case for sending the secondary device (or devices?) in
> > addition to the primary device.
> >
> > AFAIK, the unix device memory allocator project does not yet have
> > anything we should be encoding as a Wayland extension, so all we seem
> > to be able to do is to deliver the device file descriptors and the
> > format+modifier sets.
>
> Ok.
>
> > Now the design question: do we want to communicate the secondary
> > devices in this extension? Quite likely we need a different extension
> > to be used with the allocator project.
>
> As long as the use case is not clear, I'd say leave it out.
> A "secondary_device" event may be added later with a version update if
> needed.

Yes, I agree, I'd prefer not having this in the protocol for now.

> > My current opinion is that if there is no generic way for an
> > application to benefit from the secondary device fd, then we should not
> > add secondary devices in this extension yet.
>
> I agree.

+1
___
wayland-devel mailing list
wayland-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/wayland-devel


Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-15 Thread Philipp Zabel
Hi Pekka,

thank you for the explanation.

On Wed, 2018-11-14 at 11:03 +0200, Pekka Paalanen wrote:
[...]
> The hints protocol we are discussing here is a subset of what
> https://github.com/cubanismo/allocator aims to achieve. Originally we
> only concentrated on getting the format and modifier more optimal, but
> the question of where and how to allocate the buffers is valid too.
> Whether it is in scope for this extension is the big question below.

My guess is: probably not. Either way, I'd prefer the protocol docs to
be explicit about this.

> Ideally, the protocol would do something like this:
> 
> - Tell the client which device and for which use case the device must
>   be able to access the buffer at minimum and always.
> 
> - Tell the client that if it could make the buffer suitable also for a
>   secondary device and a secondary use case, the compositor could do a
>   more optimal job (e.g. putting the buffer in direct scanout,
>   bypassing composition, or a hardware video encoder in case the output
>   is going to be streamed).
> 
> We don't have the vocabulary for use cases and there are tons of
> different details to be taken into account, which is the whole point of
> the allocator project. So we cannot do the complete solution here and
> now, but we can do an approximate solution by negotiating pixel
> formats and modifiers.
> 
> The primary device is what the compositor uses for the fallback path,
> which is compositing with a GPU.
>
> Therefore, at the very minimum, clients
> need to allocate buffers that can be used with the primary device. We
> guarantee this in the zwp_linux_dmabuf protocol by having the
> compositor test the buffer import into EGL (or equivalent) before it
> accepts that the buffer even exists. The client does not absolutely
> necessarily need the primary device for this, but it will have much
> better chances of making usable buffers if it uses it for allocation at
> least.

So the client must provide buffers that the primary device can import
and sample a texture from, ideally directly.
Can something like this be added to the interface description, to make
it clear what the primary device actually is supposed to be in this
context?

> The primary device also has another very different meaning: the
> compositor will likely be using the primary device anyway so it is kept
> active and if clients use the same device instead of some other device,
> it probably results in considerable power savings. IOW, the primary
> device is the preferred rendering device as well. Or so I assume, these
> two concepts could be decoupled as well.

And the client should default to using the same primary device for
rendering for power savings.

> A secondary device is optional. In systems where the GPU and display
> devices are separate DRM devices, the GPU will be the primary device,
> and the display device would be the secondary device. So there seems to
> be a use case for sending the secondary device (or devices?) in
> addition to the primary device.
> 
> AFAIK, the unix device memory allocator project does not yet have
> anything we should be encoding as a Wayland extension, so all we seem
> to be able to do is to deliver the device file descriptors and the
> format+modifier sets.

Ok.

> Now the design question: do we want to communicate the secondary
> devices in this extension? Quite likely we need a different extension
> to be used with the allocator project.

As long as the use case is not clear, I'd say leave it out.
A "secondary_device" event may be added later with a version update if
needed.

> Is communicating the display device fd useful already when it differs
> from the rendering device? Is there a way for generic client userspace
> to use it effectively, or would it rely on hardware-specific code in
> clients rather than in e.g. Mesa drivers? Are there EGL or Vulkan APIs
> to tell the driver it should make the buffer work on one device while
> rendering on another?

I have not found anything specific about this in the Vulkan spec.

The VK_KHR_external_memory extension even states:

"However, only the same concrete physical device can be used when
 sharing memory, [...]"

and:

"Note this does not attempt to address cross-device transitions, nor
 transitions to engines on the same device which are not visible
 within the Vulkan API.
 Both of these are beyond the scope of this extension."

in the issues. So even though with VK_EXT_external_memory_dma_buf
and VK_EXT_image_drm_format_modifier bolted on top, sharing between
different devices should be possible, it is not the main focus of these
extensions.

> My current opinion is that if there is no generic way for an
> application to benefit from the secondary device fd, then we should not
> add secondary devices in this extension yet.

I agree.

regards
Philipp

Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-14 Thread Pekka Paalanen
On Tue, 13 Nov 2018 18:19:29 +
Simon Ser  wrote:

> > Hi Simon,
> >
> > On Fri, 2018-11-02 at 18:49 +, Simon Ser wrote:  
> > > On Friday, November 2, 2018 12:30 PM, Philipp Zabel 
> > >  wrote:  
> > > > > > +
> > > > > > +  
> > > > > > +This event advertises the primary device that the server
> > > > > > +prefers. There is exactly one primary device.
> > > >
> > > > Which device should this be if the scanout engine is separate from the
> > > > render engine (e.g. IPU/imx-drm and GPU/etnaviv on i.MX6)  
> > >
> > > When the surface hints are created, I expect the compositor to send the 
> > > device
> > > it uses for compositing as the primary device (assuming it's using only 
> > > one
> > > device).  
> >
> > i.MX6 has a separate scanout device without any acceleration capabilities
> > except some hardware overlay planes, and a pure GPU render device without
> > any connection to the outside world. The compositor uses both devices for
> > compositing and output.  
> 
> But most of the time, client buffers will go through compositing. So the
> primary device is still the render device.
> 
> The situation doesn't change a lot compared to wl_drm to be honest. The device
> that is advertised via wl_drm will be the primary device advertised by this
> protocol.
> 
> Maybe when the compositor decides to scan-out a client, it can switch the
> primary device to the scan-out device. Sorry, I don't know enough about these
> particular devices to say for sure.

Hi,

I do see Philipp's point after thinking for a while. I'll explain below.

> > > When the surface becomes fullscreen on a different GPU (meaning it becomes
> > > fullscreen on an output which is managed by another GPU), I'd expect the
> > > compositor to change the primary device for this surface to this other 
> > > GPU.
> > >
> > > If the compositor uses multiple devices for compositing, it'll probably 
> > > switch
> > > the primary device when the surface is moved from one GPU to the other.
> > >
> > > I'm not sure how i.MX6 works, but: even if the same GPU is used for 
> > > compositing
> > > and scanout, but the compositing preferred formats are different from the
> > > scanout preferred formats, the compositor can update the preferred format
> > > without changing the preferred device.
> > >
> > > Is there an issue with this? Maybe something should be added to the 
> > > protocol to
> > > explain it better?  
> >
> > It is not clear to me from the protocol description whether the primary
> > device means the scanout engine or the GPU, in case they are different.
> >
> > What is the client process supposed to do with this fd? Is it expected
> > to be able to render on this device? Or use it to allocate the optimal
> > buffers?  
> 
> The client is expected to allocate its buffers there. I'm not sure about
> rendering.

Well, actually...

> > > > What about contiguous vs non-contiguous memory?
> > > >
> > > > On i.MX6QP (Vivante GC3000) we would probably want the client to always
> > > > render DRM_FORMAT_MOD_VIVANTE_SUPER_TILED, because this can be directly
> > > > read by both texture samplers (non-contiguous) and scanout (must be
> > > > contiguous).
> > > >
> > > > On i.MX6Q (Vivante GC2000) we always want to use the most efficient
> > > > DRM_FORMAT_MOD_VIVANTE_SPLIT_SUPER_TILED, because neither of the
> > > > supported render formats can be sampled or scanned out directly.
> > > > Since the compositor has to resolve into DRM_FORMAT_MOD_VIVANTE_TILED
> > > > (non-contiguous) for texture sampling or DRM_FORMAT_MOD_LINEAR
> > > > (contiguous) for scanout, the client buffers can always be non-
> > > > contiguous.
> > > >
> > > > On i.MX6S (Vivante GC880) the optimal render format for texture sampling
> > > > would be DRM_FORMAT_MOD_VIVANTE_TILED (non-contiguous) and for scanout
> > > > DRM_FORMAT_MOD_VIVANTE_SUPER_TILED (non-contiguous) which would be
> > > > resolved into DRM_FORMAT_MOD_LINEAR (contiguous) by the compositor.  
> > >
> > > I think all of this works with Daniel's design.
> > >  
> > > > All three could always handle DRM_FORMAT_MOD_LINEAR (contiguous) client
> > > > buffers for scanout directly, but those would be suboptimal if the
> > > > compositor decides to render on short notice, because the client would
> > > > have already resolved into linear and then the compositor would have to
> > > > resolve back into a texture sampler tiling format.  
> > >
> > > Is the concern here that switching between scanout and compositing is
> > > non-optimal until the client chooses the preferred format?  
> >
> > My point is just that whether or not the buffer must be contiguous in
> > physical memory is the essential piece of information on i.MX6QP,
> > whereas the optimal tiling modifier is the same for both GPU composition
> > and direct scanout cases.
> >
> > If the client provides non-contiguous buffers, the "optimal" tiling
> > doesn't help one bit in the scanout case, as the scanout hardware can't
> > read from those.

Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-13 Thread Simon Ser
> Hi Simon,
>
> On Fri, 2018-11-02 at 18:49 +, Simon Ser wrote:
> > On Friday, November 2, 2018 12:30 PM, Philipp Zabel 
> >  wrote:
> > > > > +
> > > > > +  
> > > > > +This event advertises the primary device that the server
> > > > > +prefers. There is exactly one primary device.
> > >
> > > Which device should this be if the scanout engine is separate from the
> > > render engine (e.g. IPU/imx-drm and GPU/etnaviv on i.MX6)
> >
> > When the surface hints are created, I expect the compositor to send the 
> > device
> > it uses for compositing as the primary device (assuming it's using only one
> > device).
>
> i.MX6 has a separate scanout device without any acceleration capabilities
> except some hardware overlay planes, and a pure GPU render device without
> any connection to the outside world. The compositor uses both devices for
> compositing and output.

But most of the time, client buffers will go through compositing. So the
primary device is still the render device.

The situation doesn't change a lot compared to wl_drm to be honest. The device
that is advertised via wl_drm will be the primary device advertised by this
protocol.

Maybe when the compositor decides to scan-out a client, it can switch the
primary device to the scan-out device. Sorry, I don't know enough about these
particular devices to say for sure.

> > When the surface becomes fullscreen on a different GPU (meaning it becomes
> > fullscreen on an output which is managed by another GPU), I'd expect the
> > compositor to change the primary device for this surface to this other GPU.
> >
> > If the compositor uses multiple devices for compositing, it'll probably 
> > switch
> > the primary device when the surface is moved from one GPU to the other.
> >
> > I'm not sure how i.MX6 works, but: even if the same GPU is used for 
> > compositing
> > and scanout, but the compositing preferred formats are different from the
> > scanout preferred formats, the compositor can update the preferred format
> > without changing the preferred device.
> >
> > Is there an issue with this? Maybe something should be added to the 
> > protocol to
> > explain it better?
>
> It is not clear to me from the protocol description whether the primary
> device means the scanout engine or the GPU, in case they are different.
>
> What is the client process supposed to do with this fd? Is it expected
> to be able to render on this device? Or use it to allocate the optimal
> buffers?

The client is expected to allocate its buffers there. I'm not sure about
rendering.

> > > What about contiguous vs non-contiguous memory?
> > >
> > > On i.MX6QP (Vivante GC3000) we would probably want the client to always
> > > render DRM_FORMAT_MOD_VIVANTE_SUPER_TILED, because this can be directly
> > > read by both texture samplers (non-contiguous) and scanout (must be
> > > contiguous).
> > >
> > > On i.MX6Q (Vivante GC2000) we always want to use the most efficient
> > > DRM_FORMAT_MOD_VIVANTE_SPLIT_SUPER_TILED, because neither of the
> > > supported render formats can be sampled or scanned out directly.
> > > Since the compositor has to resolve into DRM_FORMAT_MOD_VIVANTE_TILED
> > > (non-contiguous) for texture sampling or DRM_FORMAT_MOD_LINEAR
> > > (contiguous) for scanout, the client buffers can always be non-
> > > contiguous.
> > >
> > > On i.MX6S (Vivante GC880) the optimal render format for texture sampling
> > > would be DRM_FORMAT_MOD_VIVANTE_TILED (non-contiguous) and for scanout
> > > DRM_FORMAT_MOD_VIVANTE_SUPER_TILED (non-contiguous) which would be
> > > resolved into DRM_FORMAT_MOD_LINEAR (contiguous) by the compositor.
> >
> > I think all of this works with Daniel's design.
> >
> > > All three could always handle DRM_FORMAT_MOD_LINEAR (contiguous) client
> > > buffers for scanout directly, but those would be suboptimal if the
> > > compositor decides to render on short notice, because the client would
> > > have already resolved into linear and then the compositor would have to
> > > resolve back into a texture sampler tiling format.
> >
> > Is the concern here that switching between scanout and compositing is
> > non-optimal until the client chooses the preferred format?
>
> My point is just that whether or not the buffer must be contiguous in
> physical memory is the essential piece of information on i.MX6QP,
> whereas the optimal tiling modifier is the same for both GPU composition
> and direct scanout cases.
>
> If the client provides non-contiguous buffers, the "optimal" tiling
> doesn't help one bit in the scanout case, as the scanout hardware can't
> read from those.

Sorry, I don't get what you mean. Can you please try to explain again?


Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-12 Thread Philipp Zabel
Hi Simon,

On Fri, 2018-11-02 at 18:49 +, Simon Ser wrote:
> On Friday, November 2, 2018 12:30 PM, Philipp Zabel  
> wrote:
> > > > +
> > > > +  
> > > > +This event advertises the primary device that the server
> > > > +prefers. There is exactly one primary device.
> > 
> > Which device should this be if the scanout engine is separate from the
> > render engine (e.g. IPU/imx-drm and GPU/etnaviv on i.MX6)
> 
> When the surface hints are created, I expect the compositor to send the device
> it uses for compositing as the primary device (assuming it's using only one
> device).

i.MX6 has a separate scanout device without any acceleration capabilities
except some hardware overlay planes, and a pure GPU render device without
any connection to the outside world. The compositor uses both devices for
compositing and output.

> When the surface becomes fullscreen on a different GPU (meaning it becomes
> fullscreen on an output which is managed by another GPU), I'd expect the
> compositor to change the primary device for this surface to this other GPU.
>
> If the compositor uses multiple devices for compositing, it'll probably switch
> the primary device when the surface is moved from one GPU to the other.
> 
> I'm not sure how i.MX6 works, but: even if the same GPU is used for 
> compositing
> and scanout, but the compositing preferred formats are different from the
> scanout preferred formats, the compositor can update the preferred format
> without changing the preferred device.
> 
> Is there an issue with this? Maybe something should be added to the protocol 
> to
> explain it better?

It is not clear to me from the protocol description whether the primary
device means the scanout engine or the GPU, in case they are different.

What is the client process supposed to do with this fd? Is it expected
to be able to render on this device? Or use it to allocate the optimal
buffers?

> > What about contiguous vs non-contiguous memory?
> > 
> > On i.MX6QP (Vivante GC3000) we would probably want the client to always
> > render DRM_FORMAT_MOD_VIVANTE_SUPER_TILED, because this can be directly
> > read by both texture samplers (non-contiguous) and scanout (must be
> > contiguous).
> > 
> > On i.MX6Q (Vivante GC2000) we always want to use the most efficient
> > DRM_FORMAT_MOD_VIVANTE_SPLIT_SUPER_TILED, because neither of the
> > supported render formats can be sampled or scanned out directly.
> > Since the compositor has to resolve into DRM_FORMAT_MOD_VIVANTE_TILED
> > (non-contiguous) for texture sampling or DRM_FORMAT_MOD_LINEAR
> > (contiguous) for scanout, the client buffers can always be non-
> > contiguous.
> > 
> > On i.MX6S (Vivante GC880) the optimal render format for texture sampling
> > would be DRM_FORMAT_MOD_VIVANTE_TILED (non-contiguous) and for scanout
> > DRM_FORMAT_MOD_VIVANTE_SUPER_TILED (non-contiguous) which would be
> > resolved into DRM_FORMAT_MOD_LINEAR (contiguous) by the compositor.
> 
> I think all of this works with Daniel's design.
>
> > All three could always handle DRM_FORMAT_MOD_LINEAR (contiguous) client
> > buffers for scanout directly, but those would be suboptimal if the
> > compositor decides to render on short notice, because the client would
> > have already resolved into linear and then the compositor would have to
> > resolve back into a texture sampler tiling format.
> 
> Is the concern here that switching between scanout and compositing is
> non-optimal until the client chooses the preferred format?

My point is just that whether or not the buffer must be contiguous in
physical memory is the essential piece of information on i.MX6QP,
whereas the optimal tiling modifier is the same for both GPU composition
and direct scanout cases.

If the client provides non-contiguous buffers, the "optimal" tiling
doesn't help one bit in the scanout case, as the scanout hardware can't
read from those.

regards
Philipp


Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-12 Thread Pekka Paalanen
On Mon, 12 Nov 2018 12:16:04 +
Simon Ser  wrote:

> On Monday, November 12, 2018 10:18 AM, Pekka Paalanen  
> wrote:
> > On Sat, 10 Nov 2018 13:34:31 +
> > Simon Ser  wrote:
> >  
> > > On Monday, November 5, 2018 9:57 AM, Pekka Paalanen  
> > > wrote:  
> > > > > > Yeah. Another option is to send a wl_array of modifiers per format 
> > > > > > and
> > > > > > tranche.
> > > > >
> > > > > True. Any reason why this hasn't been done in the global?  
> > > >
> > > > For formats? Well, it is simpler without a wl_array, and there might be
> > > > a lot of formats.
> > > >
> > > > Could there be a lot of modifiers per format? Would a wl_array make
> > > > anything easier? Just a thought.  
> > >
> > > It's true that for this list of formats sorted by preference, we'll 
> > > probably
> > > need to split modifiers anyway so I don't think we'd benefit a lot from 
> > > this
> > > approach.  
> >
> > Hi Simon,
> >
> > just to be clear, I was thinking of something like:
> >
> > event(uint format, wl_array(modifiers))
> >
> > But I definitely do not insist on it if you don't see any obvious
> > benefits with it.  
> 
> Yeah. I think the benefits would not be substantial as we need to "split" 
> these
> to order them by preference. So it would look like so:
> 
>   event(format1, wl_array(modifiers))
>   barrier()
>   event(format1, wl_array(modifiers))
>   event(format2, wl_array(modifiers))
>   barrier()
>   event(format1, wl_array(modifiers))
>   barrier()
> 
> Also this is not consistent with the rest of the protocol. Maybe we can 
> discuss
> this again for linux-dmabuf-unstable-v2.
> 
> > It seems you and I made very different assumptions on how the hints
> > would be sent, I only realized it just now. More about that below.
> >  
> > > > > > I suppose it will be enough to send tranches for just the currently
> > > > > > used format? Otherwise it could be "a lot" of data.  
> > > > >
> > > > > What do you mean by "the currently used format"?  
> > > >
> > > > This interface is used to send clients hints after they are already
> > > > presenting, which means they already have a format chosen and probably
> > > > want to stick with it, just changing the modifiers to be more optimal.  
> > >
> > > If we only send the modifiers for the current format, how do clients tell 
> > > the
> > > difference between the initial hints (which don't have a "currently used
> > > format") and the subsequent hints?  
> >
> > I'm not sure I understand why they would need to see the difference.
> > But yes, I was short-sighted here and didn't consider the
> > initialization when a surface is not mapped yet. I didn't expect that
> > hints can be calculated if the surface is not mapped, but of course a
> > compositor can provide some defaults. I suppose the initial default
> > hints would boil down to what is most efficient to composite.
> >  
> > > > > I expect clients to bind to this interface and create a surface hints 
> > > > > object
> > > > > before the surface is mapped. In this case there's no "currently used 
> > > > > format".  
> > > >
> > > > Right, that's another use case.
> > > >  
> > > > > It will be a fair amount of data, yes. However it's just a list of 
> > > > > integers.
> > > > > When we send strings over the protocol (e.g. toplevel title in 
> > > > > xdg-shell) it's
> > > > > about the same amount of data I guess.  
> > > >
> > > > If the EGLConfig or GLXFBConfig or GLX visual lists are of any
> > > > indication... yes, they account for depth, stencil, aux, etc. but then
> > > > we will have modifiers.
> > > >
> > > > We already advertise the list of every supported format+modifier
> > > > in the linux_dmabuf extension. Could we somehow minimize the number of
> > > > recommended format+modifiers in hints? Or maybe that's not a concern
> > > > for the protocol spec?  
> > >
> > > I'm not sure.
> > >
> > > After this patch, I'm not even sure how the formats+modifiers advertised 
> > > by the
> > > global work. Are these formats+modifiers supported on the GPU the 
> > > compositor
> > > uses for rendering? Intersection or union of formats+modifiers supported 
> > > on all
> > > GPUs?  
> >
> > The format+modifier advertised by the global before this patch are the
> > ones that can work at all, or the compositor is willing to make them
> > work at least in the worst fallback case. This patch must not change
> > that meaning. These formats also must always work regardless of which
> > GPU a client decides to use, but that is already implied by the
> > compositor being able to import a dmabuf. The compositor does not need
> > to try to factor in what other GPUs on the system might be able to
> > render or not, that is for the client to figure out when it knows the
> > formats the compositor can accept and is choosing a GPU to render with.
> > It is theoretically possible that a client tries to use a GPU that
> > cannot render any formats the compositor can use, but that is the
> > client's responsibility to figure out.

Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-12 Thread Simon Ser
On Monday, November 12, 2018 10:18 AM, Pekka Paalanen  
wrote:
> On Sat, 10 Nov 2018 13:34:31 +
> Simon Ser  wrote:
>
> > On Monday, November 5, 2018 9:57 AM, Pekka Paalanen  
> > wrote:
> > > > > Yeah. Another option is to send a wl_array of modifiers per format and
> > > > > tranche.
> > > >
> > > > True. Any reason why this hasn't been done in the global?
> > >
> > > For formats? Well, it is simpler without a wl_array, and there might be
> > > a lot of formats.
> > >
> > > Could there be a lot of modifiers per format? Would a wl_array make
> > > anything easier? Just a thought.
> >
> > It's true that for this list of formats sorted by preference, we'll probably
> > need to split modifiers anyway so I don't think we'd benefit a lot from this
> > approach.
>
> Hi Simon,
>
> just to be clear, I was thinking of something like:
>
> event(uint format, wl_array(modifiers))
>
> But I definitely do not insist on it if you don't see any obvious
> benefits with it.

Yeah. I think the benefits would not be substantial as we need to "split" these
to order them by preference. So it would look like so:

  event(format1, wl_array(modifiers))
  barrier()
  event(format1, wl_array(modifiers))
  event(format2, wl_array(modifiers))
  barrier()
  event(format1, wl_array(modifiers))
  barrier()

Also this is not consistent with the rest of the protocol. Maybe we can discuss
this again for linux-dmabuf-unstable-v2.

> It seems you and I made very different assumptions on how the hints
> would be sent, I only realized it just now. More about that below.
>
> > > > > I suppose it will be enough to send tranches for just the currently
> > > > > used format? Otherwise it could be "a lot" of data.
> > > >
> > > > What do you mean by "the currently used format"?
> > >
> > > This interface is used to send clients hints after they are already
> > > presenting, which means they already have a format chosen and probably
> > > want to stick with it, just changing the modifiers to be more optimal.
> >
> > If we only send the modifiers for the current format, how do clients tell 
> > the
> > difference between the initial hints (which don't have a "currently used
> > format") and the subsequent hints?
>
> I'm not sure I understand why they would need to see the difference.
> But yes, I was short-sighted here and didn't consider the
> initialization when a surface is not mapped yet. I didn't expect that
> hints can be calculated if the surface is not mapped, but of course a
> compositor can provide some defaults. I suppose the initial default
> hints would boil down to what is most efficient to composite.
>
> > > > I expect clients to bind to this interface and create a surface hints 
> > > > object
> > > > before the surface is mapped. In this case there's no "currently used 
> > > > format".
> > >
> > > Right, that's another use case.
> > >
> > > > It will be a fair amount of data, yes. However it's just a list of integers.
> > > > When we send strings over the protocol (e.g. toplevel title in xdg-shell) it's
> > > > about the same amount of data I guess.
> > >
> > > If the EGLConfig or GLXFBConfig or GLX visual lists are of any
> > > indication... yes, they account for depth, stencil, aux, etc. but then
> > > we will have modifiers.
> > >
> > > We already advertise the list of everything supported of format+modifier
> > > in the linux_dmabuf extension. Could we somehow minimize the number of
> > > recommended format+modifiers in hints? Or maybe that's not a concern
> > > for the protocol spec?
> >
> > I'm not sure.
> >
> > After this patch, I'm not even sure how the formats+modifiers advertised by the
> > global work. Are these formats+modifiers supported on the GPU the compositor
> > uses for rendering? Intersection or union of formats+modifiers supported on all
> > GPUs?
>
> The format+modifier advertised by the global before this patch are the
> ones that can work at all, or the compositor is willing to make them
> work at least in the worst fallback case. This patch must not change
> that meaning. These formats also must always work regardless of which
> GPU a client decides to use, but that is already implied by the
> compositor being able to import a dmabuf. The compositor does not need
> to try to factor in what other GPUs on the system might be able to
> render or not, that is for the client to figure out when it knows the
> formats the compositor can accept and is choosing a GPU to render with.
> It is theoretically possible that a client tries to use a GPU that
> cannot render any formats the compositor can use, but that is the
> client's responsibility to figure out.

Okay, that makes sense. And if a GPU doesn't support direct scan-out for some
format+modifier, then it can always fall back to good ol' compositing.

So clearly the formats from the global can be used by a client at any
time. The hint formats OTOH have no reason to list absolutely
everything the compositor supports, but a compositor can choose on its
own judgement to send only a sub-set it would prefer.

Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-12 Thread Pekka Paalanen
On Mon, 12 Nov 2018 10:13:39 +
Simon Ser  wrote:

> On Monday, November 12, 2018 10:14 AM, Pekka Paalanen wrote:
> > > * Create a wl_surface, get the hints, and destroy everything (without mapping
> > >   the surface)
> > > * Allow the get_surface_hints to take a NULL surface
> > > * Add a get_hints request without a wl_surface argument
> > > * Forget about per-surface hints, make hints global
> > > * (Someone else volunteers to patch Mesa to use per-surface FDs)
> > >
> > > What do you think?  
> >
> > I think maybe it would be best to make the device hint "global" in a
> > way, not tied to any surface, while leaving the format+modifier hints
> > per-surface. IOW, just move the primary_device event from
> > zwp_linux_dmabuf_device_hints_v1 into zwp_linux_dmabuf_v1 (or
> > equivalent).
> >
> > Can anyone think of practical uses where the default device would need
> > to depend on the surface somehow?
> >
> > I seem to recall we agreed that the primary device is the one the
> > compositor is compositing with. Using the compositing device as the
> > recommended default device makes sense from power consumption point of
> > view: the compositor will be keeping that GPU awake anyway, so apps
> > that don't care much about performance but do want to use a GPU should
> > use it.  
> 
> In the case of compositing the surface, yes the primary device will be the one
> used for compositing. However there are two cases in which a per-surface device
> hint would be useful.
> 
> First, what happens if the surface isn't composited and is directly scanned out?
> Let's say I have two GPUs, with one output each. The compositor is using one GPU
> for compositing, and the surface is fullscreened on the other's output. If we
> only have a global device hint, then the primary device will be the one used for
> compositing. However this causes an unnecessary copy between the two GPUs: the
> client will render on one, and then the compositor will copy the DMA-BUF to the
> other one for scan-out. It would be better if the client can render directly on
> the GPU it will be scanned out with.

Theoretically yes. However, apps are not usually prepared to switch the
GPU they render with.

Rendering with and being scanned out on are somewhat orthogonal. In the
above case, the compositor could keep the default device as the
compositing GPU, but change the modifiers so that it would be possible
to import the dmabuf to the scanout GPU either for direct scanout or
having the scanout GPU make the copy. It's not always possible for
other reasons like an incompatible memory domain, I give you that.

If you envision that apps (toolkits) might be willing to implement GPU
switching sometimes, then I have no objections. It is again the
difference between initial default hints vs. optimization hints after
the surface is mapped.


> Second, some compositors could support rendering with multiple GPUs. For
> instance, if I have two GPUs with one output each, the compositor could use GPU
> 1 for compositing output 1 and GPU 2 for compositing output 2. In this case, it
> would be better if the client could render using the GPU it will be composited
> with, and this depends on the output the surface is displayed on.

From a protocol point of view this does not differ from the first case.


> > Your possible solutions are a valid list for another problem as well:
> > the initial/default format+modifier hints before a surface is mapped. I
> > think it should be either allowing get_surface_hints with NULL surface
> > or adding get_default_hints request that doesn't take a surface.
> > Technically the two are equivalent.
> 
> I think the cleanest solution would be to add get_default_hints, which would
> create a wp_linux_dmabuf_hints object.

Right. And if we want the preferred device to also follow the split
between initial hints and optimized hints after mapping, you'd keep
the device event in zwp_linux_dmabuf_device_hints_v1.

Sounds fine to me.


Thanks,
pq


___
wayland-devel mailing list
wayland-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/wayland-devel


Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-12 Thread Simon Ser
On Monday, November 12, 2018 10:14 AM, Pekka Paalanen wrote:
> > * Create a wl_surface, get the hints, and destroy everything (without 
> > mapping
> >   the surface)
> > * Allow the get_surface_hints to take a NULL surface
> > * Add a get_hints request without a wl_surface argument
> > * Forget about per-surface hints, make hints global
> > * (Someone else volunteers to patch Mesa to use per-surface FDs)
> >
> > What do you think?
>
> I think maybe it would be best to make the device hint "global" in a
> way, not tied to any surface, while leaving the format+modifier hints
> per-surface. IOW, just move the primary_device event from
> zwp_linux_dmabuf_device_hints_v1 into zwp_linux_dmabuf_v1 (or
> equivalent).
>
> Can anyone think of practical uses where the default device would need
> to depend on the surface somehow?
>
> I seem to recall we agreed that the primary device is the one the
> compositor is compositing with. Using the compositing device as the
> recommended default device makes sense from power consumption point of
> view: the compositor will be keeping that GPU awake anyway, so apps
> that don't care much about performance but do want to use a GPU should
> use it.

In the case of compositing the surface, yes the primary device will be the one
used for compositing. However there are two cases in which a per-surface device
hint would be useful.

First, what happens if the surface isn't composited and is directly scanned out?
Let's say I have two GPUs, with one output each. The compositor is using one GPU
for compositing, and the surface is fullscreened on the other's output. If we
only have a global device hint, then the primary device will be the one used for
compositing. However this causes an unnecessary copy between the two GPUs: the
client will render on one, and then the compositor will copy the DMA-BUF to the
other one for scan-out. It would be better if the client can render directly on
the GPU it will be scanned out with.

Second, some compositors could support rendering with multiple GPUs. For
instance, if I have two GPUs with one output each, the compositor could use GPU
1 for compositing output 1 and GPU 2 for compositing output 2. In this case, it
would be better if the client could render using the GPU it will be composited
with, and this depends on the output the surface is displayed on.

> Your possible solutions are a valid list for another problem as well:
> the initial/default format+modifier hints before a surface is mapped. I
> think it should be either allowing get_surface_hints with NULL surface
> or adding get_default_hints request that doesn't take a surface.
> Technically the two are equivalent.

I think the cleanest solution would be to add get_default_hints, which would
create a wp_linux_dmabuf_hints object.

> I do not like the temp wl_surface approach, and we really do want hints
> to be per-surface because that's the whole point with the
> format+modifier hints.

Aye.


Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-12 Thread Pekka Paalanen
On Sat, 10 Nov 2018 13:34:31 +
Simon Ser  wrote:

> On Monday, November 5, 2018 9:57 AM, Pekka Paalanen wrote:
> > > > Yeah. Another option is to send a wl_array of modifiers per format and
> > > > tranche.
> > >
> > > True. Any reason why this hasn't been done in the global?  
> >
> > For formats? Well, it is simpler without a wl_array, and there might be
> > a lot of formats.
> >
> > Could there be a lot of modifiers per format? Would a wl_array make
> > anything easier? Just a thought.  
> 
> It's true that for this list of formats sorted by preference, we'll probably
> need to split modifiers anyway so I don't think we'd benefit a lot from this
> approach.

Hi Simon,

just to be clear, I was thinking of something like:

event(uint format, wl_array(modifiers))

But I definitely do not insist on it if you don't see any obvious
benefits with it.

It seems you and I made very different assumptions on how the hints
would be sent, I only realized it just now. More about that below.

> > > > I suppose it will be enough to send tranches for just the currently
> > > > used format? Otherwise it could be "a lot" of data.  
> > >
> > > What do you mean by "the currently used format"?  
> >
> > This interface is used to send clients hints after they are already
> > presenting, which means they already have a format chosen and probably
> > want to stick with it, just changing the modifiers to be more optimal.  
> 
> If we only send the modifiers for the current format, how do clients tell the
> difference between the initial hints (which don't have a "currently used
> format") and the subsequent hints?

I'm not sure I understand why they would need to see the difference.
But yes, I was short-sighted here and didn't consider the
initialization when a surface is not mapped yet. I didn't expect that
hints can be calculated if the surface is not mapped, but of course a
compositor can provide some defaults. I suppose the initial default
hints would boil down to what is most efficient to composite.

> > > I expect clients to bind to this interface and create a surface hints object
> > > before the surface is mapped. In this case there's no "currently used format".
> >
> > Right, that's another use case.
> >  
> > > It will be a fair amount of data, yes. However it's just a list of integers.
> > > When we send strings over the protocol (e.g. toplevel title in xdg-shell) it's
> > > about the same amount of data I guess.
> >
> > If the EGLConfig or GLXFBConfig or GLX visual lists are of any
> > indication... yes, they account for depth, stencil, aux, etc. but then
> > we will have modifiers.
> >
> > We already advertise the list of everything supported of format+modifier
> > in the linux_dmabuf extension. Could we somehow minimize the number of
> > recommended format+modifiers in hints? Or maybe that's not a concern
> > for the protocol spec?  
> 
> I'm not sure.
> 
> After this patch, I'm not even sure how the formats+modifiers advertised by the
> global work. Are these formats+modifiers supported on the GPU the compositor
> uses for rendering? Intersection or union of formats+modifiers supported on all
> GPUs?

The format+modifier advertised by the global before this patch are the
ones that can work at all, or the compositor is willing to make them
work at least in the worst fallback case. This patch must not change
that meaning. These formats also must always work regardless of which
GPU a client decides to use, but that is already implied by the
compositor being able to import a dmabuf. The compositor does not need
to try to factor in what other GPUs on the system might be able to
render or not, that is for the client to figure out when it knows the
formats the compositor can accept and is choosing a GPU to render with.
It is theoretically possible that a client tries to use a GPU that
cannot render any formats the compositor can use, but that is the
client's responsibility to figure out.

So clearly the formats from the global can be used by a client at any
time. The hint formats OTOH has no reason to list absolutely
everything the compositor supports, but a compositor can choose on its
own judgement to send only a sub-set it would prefer.

However, after a client has picked a format and used it, then there
should be hints with that format, at least if they can make any
difference.

I'm not sure. Not listing everything always was my intuitive
assumption, and I believe you perhaps assumed the opposite so that a
client has absolutely all the information to e.g. optimize the modifier
of a format that the compositor would not prefer at all even though it
does work.

It would be simpler to always send everything, but that will be much
more protocol traffic. Would it be too much? I don't know, could you
calculate some examples of how many bytes a typical hints update would
be if sending everything always?


Thanks,
pq



Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-12 Thread Pekka Paalanen
On Sat, 10 Nov 2018 13:54:19 +
Simon Ser  wrote:

> Just a general update about this: I tried to see how we could make Mesa use this
> new protocol.
> 
> The bad news is that the DRM FD is per-EGLDisplay and I think it would require
> quite some changes to make it per-EGLSurface. I'm still new to the Mesa
> codebase, so it'd probably make sense to only use the new protocol to get the
> device FD, without relying on wl_drm anymore. We could talk about using the
> protocol more efficiently in the future. I also think a lot of clients weren't
> designed to support multiple device FDs, so it would be nice to have a smoother
> upgrade path.

Hi,

yeah, that sounds fine to me: use the new protocol, if available, to
only find the default device at EGLDisplay creation.

What can be done per surface later is only the changing of
format+modifier, within the limits of what EGLConfig the app is using,
so maybe it's the modifier alone. If EGL should do that automatically
and internally to begin with... it could change the modifier at least.

> That leaves an issue: the whole protocol provides hints for a surface. When the
> EGLDisplay is created we don't have a surface yet. I can think of a few possible
> solutions:

Indeed.

> 
> * Create a wl_surface, get the hints, and destroy everything (without mapping
>   the surface)
> * Allow the get_surface_hints to take a NULL surface
> * Add a get_hints request without a wl_surface argument
> * Forget about per-surface hints, make hints global
> * (Someone else volunteers to patch Mesa to use per-surface FDs)
> 
> What do you think?

I think maybe it would be best to make the device hint "global" in a
way, not tied to any surface, while leaving the format+modifier hints
per-surface. IOW, just move the primary_device event from
zwp_linux_dmabuf_device_hints_v1 into zwp_linux_dmabuf_v1 (or
equivalent).

Can anyone think of practical uses where the default device would need
to depend on the surface somehow?

I seem to recall we agreed that the primary device is the one the
compositor is compositing with. Using the compositing device as the
recommended default device makes sense from power consumption point of
view: the compositor will be keeping that GPU awake anyway, so apps
that don't care much about performance but do want to use a GPU should
use it.

Your possible solutions are a valid list for another problem as well:
the initial/default format+modifier hints before a surface is mapped. I
think it should be either allowing get_surface_hints with NULL surface
or adding get_default_hints request that doesn't take a surface.
Technically the two are equivalent.

I do not like the temp wl_surface approach, and we really do want hints
to be per-surface because that's the whole point with the
format+modifier hints.


Thanks,
pq




Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-10 Thread Simon Ser
Just a general update about this: I tried to see how we could make Mesa use this
new protocol.

The bad news is that the DRM FD is per-EGLDisplay and I think it would require
quite some changes to make it per-EGLSurface. I'm still new to the Mesa
codebase, so it'd probably make sense to only use the new protocol to get the
device FD, without relying on wl_drm anymore. We could talk about using the
protocol more efficiently in the future. I also think a lot of clients weren't
designed to support multiple device FDs, so it would be nice to have a smoother
upgrade path.

That leaves an issue: the whole protocol provides hints for a surface. When the
EGLDisplay is created we don't have a surface yet. I can think of a few possible
solutions:

* Create a wl_surface, get the hints, and destroy everything (without mapping
  the surface)
* Allow the get_surface_hints to take a NULL surface
* Add a get_hints request without a wl_surface argument
* Forget about per-surface hints, make hints global
* (Someone else volunteers to patch Mesa to use per-surface FDs)

What do you think?


Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-10 Thread Simon Ser
On Monday, November 5, 2018 9:57 AM, Pekka Paalanen  wrote:
> > > Yeah. Another option is to send a wl_array of modifiers per format and
> > > tranche.
> >
> > True. Any reason why this hasn't been done in the global?
>
> For formats? Well, it is simpler without a wl_array, and there might be
> a lot of formats.
>
> Could there be a lot of modifiers per format? Would a wl_array make
> anything easier? Just a thought.

It's true that for this list of formats sorted by preference, we'll probably
need to split modifiers anyway so I don't think we'd benefit a lot from this
approach.

> > > I suppose it will be enough to send tranches for just the currently
> > > used format? Otherwise it could be "a lot" of data.
> >
> > What do you mean by "the currently used format"?
>
> This interface is used to send clients hints after they are already
> presenting, which means they already have a format chosen and probably
> want to stick with it, just changing the modifiers to be more optimal.

If we only send the modifiers for the current format, how do clients tell the
difference between the initial hints (which don't have a "currently used
format") and the subsequent hints?

> > I expect clients to bind to this interface and create a surface hints object
> > before the surface is mapped. In this case there's no "currently used format".
>
> Right, that's another use case.
>
> > It will be a fair amount of data, yes. However it's just a list of integers.
> > When we send strings over the protocol (e.g. toplevel title in xdg-shell) it's
> > about the same amount of data I guess.
>
> If the EGLConfig or GLXFBConfig or GLX visual lists are of any
> indication... yes, they account for depth, stencil, aux, etc. but then
> we will have modifiers.
>
> > We already advertise the list of everything supported of format+modifier
> in the linux_dmabuf extension. Could we somehow minimize the number of
> recommended format+modifiers in hints? Or maybe that's not a concern
> for the protocol spec?

I'm not sure.

After this patch, I'm not even sure how the formats+modifiers advertised by the
global work. Are these formats+modifiers supported on the GPU the compositor
uses for rendering? Intersection or union of formats+modifiers supported on all
GPUs?

> > > > For a simple 'GPU composition or scanout' case, this would only be two
> > > > tranches, which are 'most optimal' and 'fallback'. For multiple GPUs
> > > > though, we could end up with three tranches: scanout-capable,
> > > > same-GPU-composition, or cross-GPU-composition. Similarly, if we take
> > > > media recording into account, we could end up with more than two
> > > > tranches.
> > > >
> > > > What do you think?
> > >
> > > At first I didn't understand this at all. I wonder if Simon is as
> > > puzzled as I was. :-)
> > >
> > > Is the idea of tranches such that within a tranche, a client will be able
> > > to pick a modifier that is optimal for its rendering? This would convey
> > > the knowledge that all modifiers within a tranche are equally good
> > > for the compositor, so the client can pick what it can use the best.
> > >
> > > This is contrary to a flat preference list, where a client would pick
> > > the first modifier it can use, even if it is less optimal than a later
> > > modifer for its rendering while for compositor it would not make a
> > > difference.
> >
> > Yeah, that's what I've understood too.
> >
> > > I'm also not sure I understand your tranch categories. Are you thinking
> > > that, for instance, if a client uses same-GPU-composition modifiers
> > > which exclude cross-GPU-composition that a compositor would start
> > > copy-converting buffers if the composition no longer happens on the
> > > same GPU, until the client adjusts to the new preference? That makes
> > > sense, if I guessed right what you meant.
> >
> > Right. I don't think we can do any better.
> >
> > > I'm wondering how the requirement "a compositor must always be able to
> > > consume the buffer regardless of where it will be shown" is accounted
> > > for here. Do we need a reminder about that in the spec?
> >
> > A reminder might be a good idea. The whole surface hints are just hints. The
> > client can choose to use another device or another format, and in the worst case
> > it'll just be more work and more copies on the compositor side.
>
> Yeah. What I precisely mean is that even if a client chooses a
> recommended format+modifier, the compositor will not be exempt from the
> requirement that it must work always. I.e. a compositor cannot
> advertise a format+modifier that would work only for scanout but not
> for fallback composition, even if the surface is on scanout right now.

Yeah, this makes sense.


Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-05 Thread Pekka Paalanen
On Fri, 02 Nov 2018 18:38:10 +
Simon Ser  wrote:

> On Friday, November 2, 2018 9:53 AM, Pekka Paalanen wrote:
> > > I think we want another event here, to group sets of modifiers
> > > together by preference.
> > >
> > > For example, say the surface could be directly scanned out, but only
> > > if it uses the linear or X-tiled modifiers. Our surface-preferred
> > > modifiers would be LINEAR + X_TILED. However, the client may not be
> > > able to produce that combination. If the GPU still supports Y_TILED,  
> >
> > Combination? I thought modifiers are never combined with other
> > modifiers?  
> 
> I think Daniel refers to the format + modifier combination. Yes, modifiers
> cannot be mixed with each other.
> 
> > > then we want to indicate that the client _can_ use Y_TILED if it needs
> > > to, but _should_ use LINEAR or X_TILED.
> > >
> > > DRI3 implements this by sending sets of modifiers in 'tranches', which
> > > are arrays of arrays, which in this case would be:
> > > tranches = {
> > >   [0 /* optimal */] = {
> > > { .format = XRGB, .modifier = LINEAR }
> > > { .format = XRGB, .modifier = X_TILED }
> > >   },
> > >   [1 /* less optimal */] = {
> > > { .format = XRGB, .modifier = Y_TILED }
> > >   }
> > > }
> > >
> > > I imagine the best way to do it with Wayland events would be to add a
> > > 'marker' event to indicate the border between these tranches. So we
> > > would send:
> > >   modifier(XRGB, LINEAR)
> > >   modifier(XRGB, X_TILED)
> > >   barrier()
> > >   modifier(XRGB, Y_TILED)
> > >   barrier()
> > >   done()  
> >
> > Yeah. Another option is to send a wl_array of modifiers per format and
> > tranche.
> 
> True. Any reason why this hasn't been done in the global?

For formats? Well, it is simpler without a wl_array, and there might be
a lot of formats.

Could there be a lot of modifiers per format? Would a wl_array make
anything easier? Just a thought.

> 
> > I suppose it will be enough to send tranches for just the currently
> > used format? Otherwise it could be "a lot" of data.  
> 
> What do you mean by "the currently used format"?

This interface is used to send clients hints after they are already
presenting, which means they already have a format chosen and probably
want to stick with it, just changing the modifiers to be more optimal.

> I expect clients to bind to this interface and create a surface hints object
> before the surface is mapped. In this case there's no "currently used format".

Right, that's another use case.

> It will be a fair amount of data, yes. However it's just a list of integers.
> When we send strings over the protocol (e.g. toplevel title in xdg-shell) it's
> about the same amount of data I guess.

If the EGLConfig or GLXFBConfig or GLX visual lists are of any
indication... yes, they account for depth, stencil, aux, etc. but then
we will have modifiers.

We already advertise the list of everything supported of format+modifier
in the linux_dmabuf extension. Could we somehow minimize the number of
recommended format+modifiers in hints? Or maybe that's not a concern
for the protocol spec?

> > >
> > > For a simple 'GPU composition or scanout' case, this would only be two
> > > tranches, which are 'most optimal' and 'fallback'. For multiple GPUs
> > > though, we could end up with three tranches: scanout-capable,
> > > same-GPU-composition, or cross-GPU-composition. Similarly, if we take
> > > media recording into account, we could end up with more than two
> > > tranches.
> > >
> > > What do you think?  
> >
> > At first I didn't understand this at all. I wonder if Simon is as
> > puzzled as I was. :-)
> >
> > Is the idea of tranches such that within a tranche, a client will be able
> > to pick a modifier that is optimal for its rendering? This would convey
> > the knowledge that all modifiers within a tranche are equally good
> > for the compositor, so the client can pick what it can use the best.
> >
> > This is contrary to a flat preference list, where a client would pick
> > the first modifier it can use, even if it is less optimal than a later
> > modifier for its rendering while for the compositor it would not make a
> > difference.  
> 
> Yeah, that's what I've understood too.
> 
> > I'm also not sure I understand your tranch categories. Are you thinking
> > that, for instance, if a client uses same-GPU-composition modifiers
> > which exclude cross-GPU-composition that a compositor would start
> > copy-converting buffers if the composition no longer happens on the
> > same GPU, until the client adjusts to the new preference? That makes
> > sense, if I guessed right what you meant.  
> 
> Right. I don't think we can do any better.
> 
> > I'm wondering how the requirement "a compositor must always be able to
> > consume the buffer regardless of where it will be shown" is accounted
> > for here. Do we need a reminder about that in the spec?  
> 
> A reminder might be a good idea. The whole surface hints are just hints. The
> client can choose to use another device or another format, and in the worst case
> it'll just be more work and more copies on the compositor side.

Yeah. What I precisely mean is that even if a client chooses a
recommended format+modifier, the compositor will not be exempt from the
requirement that it must work always. I.e. a compositor cannot
advertise a format+modifier that would work only for scanout but not
for fallback composition, even if the surface is on scanout right now.

Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-02 Thread Simon Ser
On Friday, November 2, 2018 12:30 PM, Philipp Zabel wrote:
> > > +
> > > +  
> > > +This event advertises the primary device that the server prefers. There
> > > +is exactly one primary device.
>
> Which device should this be if the scanout engine is separate from the
> render engine (e.g. IPU/imx-drm and GPU/etnaviv on i.MX6)?

When the surface hints are created, I expect the compositor to send the device
it uses for compositing as the primary device (assuming it's using only one
device).

When the surface becomes fullscreen on a different GPU (meaning it becomes
fullscreen on an output which is managed by another GPU), I'd expect the
compositor to change the primary device for this surface to this other GPU.

If the compositor uses multiple devices for compositing, it'll probably switch
the primary device when the surface is moved from one GPU to the other.

I'm not sure how i.MX6 works, but: even if the same GPU is used for compositing
and scanout, but the compositing preferred formats are different from the
scanout preferred formats, the compositor can update the preferred format
without changing the preferred device.

Is there an issue with this? Maybe something should be added to the protocol to
explain it better?

> What about contiguous vs non-contiguous memory?
>
> On i.MX6QP (Vivante GC3000) we would probably want the client to always
> render DRM_FORMAT_MOD_VIVANTE_SUPER_TILED, because this can be directly
> read by both texture samplers (non-contiguous) and scanout (must be
> contiguous).
>
> On i.MX6Q (Vivante GC2000) we always want to use the most efficient
> DRM_FORMAT_MOD_VIVANTE_SPLIT_SUPER_TILED, because neither of the
> supported render formats can be sampled or scanned out directly.
> Since the compositor has to resolve into DRM_FORMAT_MOD_VIVANTE_TILED
> (non-contiguous) for texture sampling or DRM_FORMAT_MOD_LINEAR
> (contiguous) for scanout, the client buffers can always be non-
> contiguous.
>
> On i.MX6S (Vivante GC880) the optimal render format for texture sampling
> would be DRM_FORMAT_MOD_VIVANTE_TILED (non-contiguous) and for scanout
> DRM_FORMAT_MOD_VIVANTE_SUPER_TILED (non-contiguous) which would be
> resolved into DRM_FORMAT_MOD_LINEAR (contiguous) by the compositor.

I think all of this works with Daniel's design.

> All three could always handle DRM_FORMAT_MOD_LINEAR (contiguous) client
> buffers for scanout directly, but those would be suboptimal if the
> compositor decides to render on short notice, because the client would
> have already resolved into linear and then the compositor would have to
> resolve back into a texture sampler tiling format.

Is the concern here that switching between scanout and compositing is
non-optimal until the client chooses the preferred format?



Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-02 Thread Simon Ser
On Friday, November 2, 2018 9:53 AM, Pekka Paalanen  wrote:
> > I think we want another event here, to group sets of modifiers
> > together by preference.
> >
> > For example, say the surface could be directly scanned out, but only
> > if it uses the linear or X-tiled modifiers. Our surface-preferred
> > modifiers would be LINEAR + X_TILED. However, the client may not be
> > able to produce that combination. If the GPU still supports Y_TILED,
>
> Combination? I thought modifiers are never combined with other
> modifiers?

I think Daniel refers to the format + modifier combination. Yes, modifiers
cannot be mixed with each other.
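As an aside, since the wire protocol splits the 64-bit modifier into two
32-bit arguments (modifier_hi/modifier_lo), a client has to reassemble the
value before comparing it against DRM_FORMAT_MOD_* constants. A minimal
sketch (function names are mine, not protocol):

```python
# Reassemble a 64-bit DRM format modifier from the two 32-bit halves
# sent in the protocol event, and split one back for sending.

def modifier_from_args(hi: int, lo: int) -> int:
    """Combine the modifier_hi/modifier_lo event arguments."""
    return ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)

def modifier_to_args(modifier: int) -> tuple[int, int]:
    """Split a 64-bit modifier into (hi, lo) for the wire."""
    return (modifier >> 32) & 0xFFFFFFFF, modifier & 0xFFFFFFFF

# Round-trip check: the vendor code lives in the top byte, so it ends
# up in modifier_hi.
hi, lo = modifier_to_args(0x0600000000000004)
assert modifier_from_args(hi, lo) == 0x0600000000000004
```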

> > then we want to indicate that the client _can_ use Y_TILED if it needs
> > to, but _should_ use LINEAR or X_TILED.
> >
> > DRI3 implements this by sending sets of modifiers in 'tranches', which
> > are arrays of arrays, which in this case would be:
> > tranches = {
> >   [0 /* optimal */] = {
> > { .format = XRGB, .modifier = LINEAR }
> > { .format = XRGB, .modifier = X_TILED }
> >   },
> >   [1 /* less optimal */] = {
> > { .format = XRGB, .modifier = Y_TILED }
> >   }
> > }
> >
> > I imagine the best way to do it with Wayland events would be to add a
> > 'marker' event to indicate the border between these tranches. So we
> > would send:
> >   modifier(XRGB, LINEAR)
> >   modifier(XRGB, X_TILED)
> >   barrier()
> >   modifier(XRGB, Y_TILED)
> >   barrier()
> >   done()
>
> Yeah. Another option is to send a wl_array of modifiers per format and
> tranche.

True. Any reason why this hasn't been done in the global?

> I suppose it will be enough to send tranches for just the currently
> used format? Otherwise it could be "a lot" of data.

What do you mean by "the currently used format"?

I expect clients to bind to this interface and create a surface hints object
before the surface is mapped. In this case there's no "currently used format".

It will be a fair amount of data, yes. However it's just a list of integers.
When we send strings over the protocol (e.g. toplevel title in xdg-shell) it's
about the same amount of data I guess.

> >
> > For a simple 'GPU composition or scanout' case, this would only be two
> > tranches, which are 'most optimal' and 'fallback'. For multiple GPUs
> > though, we could end up with three tranches: scanout-capable,
> > same-GPU-composition, or cross-GPU-composition. Similarly, if we take
> > media recording into account, we could end up with more than two
> > tranches.
> >
> > What do you think?
>
> At first I didn't understand this at all. I wonder if Simon is as
> puzzled as I was. :-)
>
> Is the idea of tranches such that within a tranche, a client will be able
> to pick a modifier that is optimal for its rendering? This would convey
> the knowledge that all modifiers within a tranche are equally good
> for the compositor, so the client can pick what it can use the best.
>
> This is contrary to a flat preference list, where a client would pick
> the first modifier it can use, even if it is less optimal than a later
> modifier for its rendering while for the compositor it would not make a
> difference.

Yeah, that's what I've understood too.

> I'm also not sure I understand your tranche categories. Are you thinking
> that, for instance, if a client uses same-GPU-composition modifiers
> which exclude cross-GPU-composition that a compositor would start
> copy-converting buffers if the composition no longer happens on the
> same GPU, until the client adjusts to the new preference? That makes
> sense, if I guessed right what you meant.

Right. I don't think we can do any better.

> I'm wondering how the requirement "a compositor must always be able to
> consume the buffer regardless of where it will be shown" is accounted
> for here. Do we need a reminder about that in the spec?

A reminder might be a good idea. The surface hints are just that: hints. The
client can choose to use another device or another format, and in the worst
case it'll just mean more work and more copies on the compositor side.
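To illustrate how a client might consume such hints: walk the tranches from
most to least preferred and pick the first format+modifier pair it can
actually render with, falling back to its own capabilities when no hint
matches. The function and data layout here are assumptions for illustration,
not final protocol:

```python
# Pick a (format, modifier) pair from tranches ordered most- to
# least-preferred. The hints being only hints, a client that can't
# satisfy any tranche may still use whatever it supports; the
# compositor must cope, possibly with an extra copy/conversion.

def choose_format(tranches, client_supported):
    """tranches: list of lists of (format, modifier) tuples.
    client_supported: set of (format, modifier) the client can render."""
    for tranche in tranches:
        usable = [fm for fm in tranche if fm in client_supported]
        if usable:
            # All entries within a tranche are equally good for the
            # compositor, so the client picks whichever suits it best.
            return usable[0]
    # No hinted combination is renderable: fall back to anything the
    # client supports, at the cost of compositor-side work.
    return next(iter(client_supported))

XRGB = 0x34325258  # fourcc 'XR24'
LINEAR, X_TILED, Y_TILED = 0, 1, 2  # placeholder modifier values
tranches = [
    [(XRGB, LINEAR), (XRGB, X_TILED)],  # e.g. scanout-capable
    [(XRGB, Y_TILED)],                  # e.g. composition fallback
]
assert choose_format(tranches, {(XRGB, Y_TILED)}) == (XRGB, Y_TILED)
```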


Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-02 Thread Philipp Zabel
On Thu, 2018-11-01 at 17:04 +, Daniel Stone wrote:
> Hi Simon,
> Thanks a lot for taking this on! :)
> 
> On Thu, 1 Nov 2018 at 16:45, Simon Ser  wrote:
> > This commit introduces a new wp_linux_dmabuf_device_hints object. This 
> > object
> > advertizes a preferred device via a file descriptor and a set of preferred
> > formats/modifiers.
> 
> s/advertizes/advertises/g (including in the XML doc)
> 
> I also think this would be better called
> wp_linux_dmabuf_surface_hints, since the change over the dmabuf
> protocol is that it's surface-specific.
> 
> > +  
> > +
> > +  This object advertizes dmabuf hints for a surface. Such hints 
> > include the
> 
> *advertises
> 
> > +
> > +  
> > +This event advertizes the primary device that the server prefers. 
> > There
> > +is exactly one primary device.

Which device should this be if the scanout engine is separate from the
render engine (e.g. IPU/imx-drm and GPU/etnaviv on i.MX6)?

[...]
> > +
> > +  
> > +This event advertises the formats that the server prefers, along 
> > with
> > +the modifiers preferred for each format.
> > +
> > +For the definition of the format and modifier codes, see the
> > +wp_linux_buffer_params::create request.
> > +  
> > +  
> > +    <arg name="modifier_hi" type="uint"
> > +         summary="high 32 bits of layout modifier"/>
> > +    <arg name="modifier_lo" type="uint"
> > +         summary="low 32 bits of layout modifier"/>
> > +
> 
> I think we want another event here, to group sets of modifiers
> together by preference.
> 
> For example, say the surface could be directly scanned out, but only
> if it uses the linear or X-tiled modifiers. Our surface-preferred
> modifiers would be LINEAR + X_TILED. However, the client may not be
> able to produce that combination. If the GPU still supports Y_TILED,
> then we want to indicate that the client _can_ use Y_TILED if it needs
> to, but _should_ use LINEAR or X_TILED.
> 
> DRI3 implements this by sending sets of modifiers in 'tranches', which
> are arrays of arrays, which in this case would be:
> tranches = {
>   [0 /* optimal */] = {
> { .format = XRGB, .modifier = LINEAR }
> { .format = XRGB, .modifier = X_TILED }
>   },
>   [1 /* less optimal */] = {
> { .format = XRGB, .modifier = Y_TILED }
>   }
> }
> 
> I imagine the best way to do it with Wayland events would be to add a
> 'marker' event to indicate the border between these tranches. So we
> would send:
>   modifier(XRGB, LINEAR)
>   modifier(XRGB, X_TILED)
>   barrier()
>   modifier(XRGB, Y_TILED)
>   barrier()
>   done()
> 
> For a simple 'GPU composition or scanout' case, this would only be two
> tranches, which are 'most optimal' and 'fallback'. For multiple GPUs
> though, we could end up with three tranches: scanout-capable,
> same-GPU-composition, or cross-GPU-composition. Similarly, if we take
> media recording into account, we could end up with more than two
> tranches.
> 
> What do you think?

What about contiguous vs non-contiguous memory?

On i.MX6QP (Vivante GC3000) we would probably want the client to always
render DRM_FORMAT_MOD_VIVANTE_SUPER_TILED, because this can be directly
read by both texture samplers (non-contiguous) and scanout (must be
contiguous).

On i.MX6Q (Vivante GC2000) we always want to use the most efficient 
DRM_FORMAT_MOD_VIVANTE_SPLIT_SUPER_TILED, because neither of the
supported render formats can be sampled or scanned out directly.
Since the compositor has to resolve into DRM_FORMAT_MOD_VIVANTE_TILED
(non-contiguous) for texture sampling or DRM_FORMAT_MOD_LINEAR
(contiguous) for scanout, the client buffers can always be non-
contiguous.

On i.MX6S (Vivante GC880) the optimal render format for texture sampling
would be DRM_FORMAT_MOD_VIVANTE_TILED (non-contiguous) and for scanout
DRM_FORMAT_MOD_VIVANTE_SUPER_TILED (non-contiguous) which would be
resolved into DRM_FORMAT_MOD_LINEAR (contiguous) by the compositor.

All three could always handle DRM_FORMAT_MOD_LINEAR (contiguous) client
buffers for scanout directly, but those would be suboptimal if the
compositor decides to render on short notice, because the client would
have already resolved into linear and then the compositor would have to
resolve back into a texture sampler tiling format.

regards
Philipp


Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-02 Thread Simon Ser
Hi,

Thanks for your review!

> On Thu, 1 Nov 2018 at 16:45, Simon Ser  wrote:
> > This commit introduces a new wp_linux_dmabuf_device_hints object. This 
> > object
> > advertizes a preferred device via a file descriptor and a set of preferred
> > formats/modifiers.
>
> s/advertizes/advertises/g (including in the XML doc)

Ah, it seems that for once British English and American English agree on the
spelling. Noted!

> I also think this would be better called
> wp_linux_dmabuf_surface_hints, since the change over the dmabuf
> protocol is that it's surface-specific.

Right. The intent was to be able to re-use this object for hints not bound to
surfaces in the future. But better not to try to think of all possible
extensions (which will probably have different requirements).

Updated to use wp_linux_dmabuf_surface_hints.

> > +
> > +  
> > +This event advertizes the primary device that the server prefers. 
> > There
> > +is exactly one primary device.
> > +  
> > +  
> > +
>
> I _think_ this might want to refer to separate objects.
>
> When we receive an FD from the server, we don't know what device it
> refers to, so we have to open the device to probe it. Opening the
> device can be slow: if a device is in a low PCI power state, it can be
> a couple of seconds to physically power up the device and then wait
> for it to initialise before we can interrogate it.
>
> One way around this would be to have a separate wp_linux_dmabuf_device
> object, lazily sent as a new object in an event by the root
> wp_linux_dmabuf object, with the per-surface hints then referring to a
> previously-sent device. This would allow clients to only probe each
> device once per EGLDisplay, rather than once per EGLSurface.

I see. One other way to fix this issue would be to keep the protocol as-is but
to make the client use stat(3p) to check if it doesn't already know about the
device. Per the POSIX spec [1]:

> The st_ino and st_dev fields taken together uniquely identify the file within
> the system.

This would remove the overhead and complexity of server-allocated objects,
which are hard to tear down.

But I'm maybe missing some use-cases here?

[1]: http://pubs.opengroup.org/onlinepubs/009696699/basedefs/sys/stat.h.html
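The stat-based deduplication could look like this on the client side; a
sketch under the assumption that the compositor hands the client a file
descriptor for the DRM device node, with the probe callback standing in for
the expensive device initialisation:

```python
# Deduplicate device file descriptors by (st_dev, st_ino), which POSIX
# guarantees to uniquely identify a file within the system. This lets
# a client probe each DRM device node only once, even if several
# surface-hints objects send descriptors referring to the same device.
import os

class DeviceCache:
    def __init__(self):
        self._devices = {}  # (st_dev, st_ino) -> probed device handle

    def get(self, fd, probe):
        st = os.fstat(fd)
        key = (st.st_dev, st.st_ino)
        if key not in self._devices:
            # Only take the (possibly slow) probe path for new devices.
            self._devices[key] = probe(fd)
        return self._devices[key]
```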

> > +
> > +  
> > +This event advertises the formats that the server prefers, along 
> > with
> > +the modifiers preferred for each format.
> > +
> > +For the definition of the format and modifier codes, see the
> > +wp_linux_buffer_params::create request.
> > +  
> > +  
> > +    <arg name="modifier_hi" type="uint"
> > +         summary="high 32 bits of layout modifier"/>
> > +    <arg name="modifier_lo" type="uint"
> > +         summary="low 32 bits of layout modifier"/>
> > +
>
> I think we want another event here, to group sets of modifiers
> together by preference.
>
> For example, say the surface could be directly scanned out, but only
> if it uses the linear or X-tiled modifiers. Our surface-preferred
> modifiers would be LINEAR + X_TILED. However, the client may not be
> able to produce that combination. If the GPU still supports Y_TILED,
> then we want to indicate that the client _can_ use Y_TILED if it needs
> to, but _should_ use LINEAR or X_TILED.
>
> DRI3 implements this by sending sets of modifiers in 'tranches', which
> are arrays of arrays, which in this case would be:
> tranches = {
>   [0 /* optimal */] = {
> { .format = XRGB, .modifier = LINEAR }
> { .format = XRGB, .modifier = X_TILED }
>   },
>   [1 /* less optimal */] = {
> { .format = XRGB, .modifier = Y_TILED }
>   }
> }
>
> I imagine the best way to do it with Wayland events would be to add a
> 'marker' event to indicate the border between these tranches. So we
> would send:
>   modifier(XRGB, LINEAR)
>   modifier(XRGB, X_TILED)
>   barrier()
>   modifier(XRGB, Y_TILED)
>   barrier()
>   done()
>
> For a simple 'GPU composition or scanout' case, this would only be two
> tranches, which are 'most optimal' and 'fallback'. For multiple GPUs
> though, we could end up with three tranches: scanout-capable,
> same-GPU-composition, or cross-GPU-composition. Similarly, if we take
> media recording into account, we could end up with more than two
> tranches.
>
> What do you think?

This seems like a good idea. Other solutions include having an enum for
tranches (preferred, fallback, etc.), but that restricts the number of
tranches, and using explicit tranche indices makes the protocol more
complicated. So your idea LGTM.

I'll also change the wording from "preferred" to "supported in order of
preference".

I have another question: what if the compositor doesn't know about the preferred
device? For instance if it's running nested in another Wayland compositor that
doesn't support this new protocol version. Maybe we should make all events
optional to let the compositor say "I have no idea"?

Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-02 Thread Pekka Paalanen
On Thu, 1 Nov 2018 17:04:51 +
Daniel Stone  wrote:

> Hi Simon,
> Thanks a lot for taking this on! :)
> 
> On Thu, 1 Nov 2018 at 16:45, Simon Ser  wrote:
> > This commit introduces a new wp_linux_dmabuf_device_hints object. This 
> > object
> > advertizes a preferred device via a file descriptor and a set of preferred
> > formats/modifiers.  
> 
> s/advertizes/advertises/g (including in the XML doc)
> 
> I also think this would be better called
> wp_linux_dmabuf_surface_hints, since the change over the dmabuf
> protocol is that it's surface-specific.
> 
> > +  
> > +
> > +  This object advertizes dmabuf hints for a surface. Such hints 
> > include the  
> 
> *advertises
> 
> > +
> > +  
> > +This event advertizes the primary device that the server prefers. 
> > There
> > +is exactly one primary device.
> > +  
> > +  
> > +  
> 
> I _think_ this might want to refer to separate objects.
> 
> When we receive an FD from the server, we don't know what device it
> refers to, so we have to open the device to probe it. Opening the
> device can be slow: if a device is in a low PCI power state, it can be
> a couple of seconds to physically power up the device and then wait
> for it to initialise before we can interrogate it.

Hi,

wouldn't drmGetDevice2() with flags=0 get us everything needed without
waking up a sleeping PCI device?

I just read it from Emil:
https://lists.freedesktop.org/archives/mesa-dev/2018-October/207447.html

> 
> One way around this would be to have a separate wp_linux_dmabuf_device
> object, lazily sent as a new object in an event by the root
> wp_linux_dmabuf object, with the per-surface hints then referring to a
> previously-sent device. This would allow clients to only probe each
> device once per EGLDisplay, rather than once per EGLSurface.

This optimization does sound attractive to me in any case.

> 
> > +
> > +  
> > +This event advertises the formats that the server prefers, along 
> > with
> > +the modifiers preferred for each format.
> > +
> > +For the definition of the format and modifier codes, see the
> > +wp_linux_buffer_params::create request.
> > +  
> > +  
> > +    <arg name="modifier_hi" type="uint"
> > +         summary="high 32 bits of layout modifier"/>
> > +    <arg name="modifier_lo" type="uint"
> > +         summary="low 32 bits of layout modifier"/>
> > +  
> 
> I think we want another event here, to group sets of modifiers
> together by preference.
> 
> For example, say the surface could be directly scanned out, but only
> if it uses the linear or X-tiled modifiers. Our surface-preferred
> modifiers would be LINEAR + X_TILED. However, the client may not be
> able to produce that combination. If the GPU still supports Y_TILED,

Combination? I thought modifiers are never combined with other
modifiers?

> then we want to indicate that the client _can_ use Y_TILED if it needs
> to, but _should_ use LINEAR or X_TILED.
> 
> DRI3 implements this by sending sets of modifiers in 'tranches', which
> are arrays of arrays, which in this case would be:
> tranches = {
>   [0 /* optimal */] = {
> { .format = XRGB, .modifier = LINEAR }
> { .format = XRGB, .modifier = X_TILED }
>   },
>   [1 /* less optimal */] = {
> { .format = XRGB, .modifier = Y_TILED }
>   }
> }
> 
> I imagine the best way to do it with Wayland events would be to add a
> 'marker' event to indicate the border between these tranches. So we
> would send:
>   modifier(XRGB, LINEAR)
>   modifier(XRGB, X_TILED)
>   barrier()
>   modifier(XRGB, Y_TILED)
>   barrier()
>   done()

Yeah. Another option is to send a wl_array of modifiers per format and
tranche.

I suppose it will be enough to send tranches for just the currently
used format? Otherwise it could be "a lot" of data.

> 
> For a simple 'GPU composition or scanout' case, this would only be two
> tranches, which are 'most optimal' and 'fallback'. For multiple GPUs
> though, we could end up with three tranches: scanout-capable,
> same-GPU-composition, or cross-GPU-composition. Similarly, if we take
> media recording into account, we could end up with more than two
> tranches.
> 
> What do you think?

At first I didn't understand this at all. I wonder if Simon is as
puzzled as I was. :-)

Is the idea of tranches such that within a tranche, a client will be able
to pick a modifier that is optimal for its rendering? This would convey
the knowledge that all modifiers within a tranche are equally good
for the compositor, so the client can pick what it can use the best.

This is contrary to a flat preference list, where a client would pick
the first modifier it can use, even if it is less optimal than a later
modifier for its rendering while for the compositor it would not make a
difference.

I'm also not sure I understand your tranche categories. Are you thinking
that, for instance, if a client uses same-GPU-composition modifiers
which exclude cross-GPU-composition that a compositor would start
copy-converting buffers if the composition no longer happens on the
same GPU, until the client adjusts to the new preference? That makes
sense, if I guessed right what you meant.

Re: [PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint

2018-11-01 Thread Daniel Stone
Hi Simon,
Thanks a lot for taking this on! :)

On Thu, 1 Nov 2018 at 16:45, Simon Ser  wrote:
> This commit introduces a new wp_linux_dmabuf_device_hints object. This object
> advertizes a preferred device via a file descriptor and a set of preferred
> formats/modifiers.

s/advertizes/advertises/g (including in the XML doc)

I also think this would be better called
wp_linux_dmabuf_surface_hints, since the change over the dmabuf
protocol is that it's surface-specific.

> +  
> +
> +  This object advertizes dmabuf hints for a surface. Such hints include 
> the

*advertises

> +
> +  
> +This event advertizes the primary device that the server prefers. 
> There
> +is exactly one primary device.
> +  
> +  
> +

I _think_ this might want to refer to separate objects.

When we receive an FD from the server, we don't know what device it
refers to, so we have to open the device to probe it. Opening the
device can be slow: if a device is in a low PCI power state, it can be
a couple of seconds to physically power up the device and then wait
for it to initialise before we can interrogate it.

One way around this would be to have a separate wp_linux_dmabuf_device
object, lazily sent as a new object in an event by the root
wp_linux_dmabuf object, with the per-surface hints then referring to a
previously-sent device. This would allow clients to only probe each
device once per EGLDisplay, rather than once per EGLSurface.

> +
> +  
> +This event advertises the formats that the server prefers, along with
> +the modifiers preferred for each format.
> +
> +For the definition of the format and modifier codes, see the
> +wp_linux_buffer_params::create request.
> +  
> +  
> +    <arg name="modifier_hi" type="uint"
> +         summary="high 32 bits of layout modifier"/>
> +    <arg name="modifier_lo" type="uint"
> +         summary="low 32 bits of layout modifier"/>
> +

I think we want another event here, to group sets of modifiers
together by preference.

For example, say the surface could be directly scanned out, but only
if it uses the linear or X-tiled modifiers. Our surface-preferred
modifiers would be LINEAR + X_TILED. However, the client may not be
able to produce that combination. If the GPU still supports Y_TILED,
then we want to indicate that the client _can_ use Y_TILED if it needs
to, but _should_ use LINEAR or X_TILED.

DRI3 implements this by sending sets of modifiers in 'tranches', which
are arrays of arrays, which in this case would be:
tranches = {
  [0 /* optimal */] = {
{ .format = XRGB, .modifier = LINEAR }
{ .format = XRGB, .modifier = X_TILED }
  },
  [1 /* less optimal */] = {
{ .format = XRGB, .modifier = Y_TILED }
  }
}

I imagine the best way to do it with Wayland events would be to add a
'marker' event to indicate the border between these tranches. So we
would send:
  modifier(XRGB, LINEAR)
  modifier(XRGB, X_TILED)
  barrier()
  modifier(XRGB, Y_TILED)
  barrier()
  done()

For a simple 'GPU composition or scanout' case, this would only be two
tranches, which are 'most optimal' and 'fallback'. For multiple GPUs
though, we could end up with three tranches: scanout-capable,
same-GPU-composition, or cross-GPU-composition. Similarly, if we take
media recording into account, we could end up with more than two
tranches.

What do you think?

Cheers,
Daniel