Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
On Tue, 2020-03-17 at 10:12 -0700, Jacob Lifshay wrote:
> One related issue with explicit sync using sync_file is that combined
> CPUs/GPUs (the CPU cores *are* the GPU cores) that do all the
> rendering in userspace (like llvmpipe but for Vulkan and with extra
> instructions for GPU tasks) but need to synchronize with other
> drivers/processes is that there should be some way to create an
> explicit fence/semaphore from userspace and later signal it. This
> seems to conflict with the requirement for a sync_file to complete in
> finite time, since the user process could be stopped or killed.

DRI3 (okay, libxshmfence specifically) uses futexes for this. Would that work for you? IIRC the semantics there are that if the process dies the futex is treated as triggered, which seems like the only sensible thing to do.

- ajax

___
wayland-devel mailing list
wayland-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/wayland-devel
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
On Wed, Mar 18, 2020 at 11:05:48AM +0100, Michel Dänzer wrote:
> On 2020-03-17 6:21 p.m., Lucas Stach wrote:
> > That's one of the issues with implicit sync that explicit may solve:
> > a single client taking way too much time to render something can
> > block the whole pipeline up until the display flip. With explicit
> > sync the compositor can just decide to use the last client buffer if
> > the latest buffer isn't ready by some deadline.
>
> FWIW, the compositor can do this with implicit sync as well, by polling
> a dma-buf fd for the buffer. (Currently, it has to poll for writable,
> because waiting for the exclusive fence only isn't enough with amdgpu)

Would be great if we don't have to make this recommended uapi, just because amdgpu leaks its trickery into the wider world. Polling for read really should be enough (and I guess Christian gets to fix up amdgpu more, at least for anything that has a dma-buf attached, even if it's not shared with anything !amdgpu.ko).

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
On Tue, Mar 17, 2020 at 12:18:47PM -0500, Jason Ekstrand wrote:
> On Tue, Mar 17, 2020 at 12:13 PM Jacob Lifshay wrote:
> >
> > One related issue with explicit sync using sync_file is that combined
> > CPUs/GPUs (the CPU cores *are* the GPU cores) that do all the
> > rendering in userspace (like llvmpipe but for Vulkan and with extra
> > instructions for GPU tasks) but need to synchronize with other
> > drivers/processes is that there should be some way to create an
> > explicit fence/semaphore from userspace and later signal it. This
> > seems to conflict with the requirement for a sync_file to complete in
> > finite time, since the user process could be stopped or killed.
>
> Yeah... That's going to be a problem. The only way I could see that
> working is if you created a sync_file that had a timeout associated
> with it. However, then you run into the issue where you may have
> corruption if stuff doesn't complete on time. Then again, you're not
> really dealing with an external unit and so the latency cost of going
> across the window system protocol probably isn't massively different
> from the latency cost of triggering the sync_file. Maybe the answer
> there is to just do everything in-order and not worry about
> synchronization?

vgem does that already (fences with timeout). The corruption issue is also not new: if your shaders take forever, real GPUs will nick your rendering with a quick reset. IIRC someone (from the CrOS Google team, maybe) was even looking into making llvmpipe run on top of vgem as a real dri/drm Mesa driver.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
Am Dienstag, den 17.03.2020, 10:59 -0700 schrieb Jacob Lifshay:
> On Tue, Mar 17, 2020 at 10:21 AM Lucas Stach wrote:
> > Am Dienstag, den 17.03.2020, 10:12 -0700 schrieb Jacob Lifshay:
> > > One related issue with explicit sync using sync_file is that combined
> > > CPUs/GPUs (the CPU cores *are* the GPU cores) that do all the
> > > rendering in userspace (like llvmpipe but for Vulkan and with extra
> > > instructions for GPU tasks) but need to synchronize with other
> > > drivers/processes is that there should be some way to create an
> > > explicit fence/semaphore from userspace and later signal it. This
> > > seems to conflict with the requirement for a sync_file to complete in
> > > finite time, since the user process could be stopped or killed.
> > >
> > > Any ideas?
> >
> > Finite just means "not infinite". If you stop the process that's doing
> > part of the pipeline processing you block the pipeline, you get to keep
> > the pieces in that case.
>
> Seems reasonable.
>
> > That's one of the issues with implicit sync
> > that explicit may solve: a single client taking way too much time to
> > render something can block the whole pipeline up until the display
> > flip. With explicit sync the compositor can just decide to use the last
> > client buffer if the latest buffer isn't ready by some deadline.
> >
> > With regard to the process getting killed: whatever your sync primitive
> > is, you need to make sure to signal the fence (possibly with an error
> > condition set) when you are not going to make progress anymore. So
> > whatever your means of creating the sync_fd from your software renderer
> > is, it needs to signal any outstanding fences on the sync_fd when the
> > fd is closed.
>
> I think I found a userspace-accessible way to create sync_files and
> dma_fences that would fulfill the requirements:
> https://github.com/torvalds/linux/blob/master/drivers/dma-buf/sw_sync.c
>
> I'm just not sure if that's a good interface to use, since it appears
> to be designed only for debugging. Will have to check for additional
> requirements of signalling an error when the process that created the
> fence is killed.

Something like that can certainly be lifted for general use if it makes sense. But then with a software renderer I don't really see how fences help you at all. With a software renderer you know exactly when the frame is finished and you can just defer pushing it over to the next pipeline element until that time. You won't gain any parallelism by using fences as the CPU is busy doing the rendering and will not run other stuff concurrently, right?

Regards,
Lucas
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
Am Dienstag, den 17.03.2020, 10:12 -0700 schrieb Jacob Lifshay:
> One related issue with explicit sync using sync_file is that combined
> CPUs/GPUs (the CPU cores *are* the GPU cores) that do all the
> rendering in userspace (like llvmpipe but for Vulkan and with extra
> instructions for GPU tasks) but need to synchronize with other
> drivers/processes is that there should be some way to create an
> explicit fence/semaphore from userspace and later signal it. This
> seems to conflict with the requirement for a sync_file to complete in
> finite time, since the user process could be stopped or killed.
>
> Any ideas?

Finite just means "not infinite". If you stop the process that's doing part of the pipeline processing you block the pipeline, you get to keep the pieces in that case.

That's one of the issues with implicit sync that explicit may solve: a single client taking way too much time to render something can block the whole pipeline up until the display flip. With explicit sync the compositor can just decide to use the last client buffer if the latest buffer isn't ready by some deadline.

With regard to the process getting killed: whatever your sync primitive is, you need to make sure to signal the fence (possibly with an error condition set) when you are not going to make progress anymore. So whatever your means of creating the sync_fd from your software renderer is, it needs to signal any outstanding fences on the sync_fd when the fd is closed.

Regards,
Lucas
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
Am Dienstag, den 17.03.2020, 11:33 -0400 schrieb Nicolas Dufresne:
> Le lundi 16 mars 2020 à 23:15 +0200, Laurent Pinchart a écrit :
> > Hi Jason,
> >
> > On Mon, Mar 16, 2020 at 10:06:07AM -0500, Jason Ekstrand wrote:
> > > On Mon, Mar 16, 2020 at 5:20 AM Laurent Pinchart wrote:
> > > > On Wed, Mar 11, 2020 at 04:18:55PM -0400, Nicolas Dufresne wrote:
> > > > > (I know I'm going to be spammed by so many mailing list ...)
> > > > >
> > > > > Le mercredi 11 mars 2020 à 14:21 -0500, Jason Ekstrand a écrit :
> > > > > > On Wed, Mar 11, 2020 at 12:31 PM Jason Ekstrand wrote:
> > > > > > > All,
> > > > > > >
> > > > > > > Sorry for casting such a broad net with this one. I'm sure most people who reply will get at least one mailing list rejection. However, this is an issue that affects a LOT of components and that's why it's thorny to begin with. Please pardon the length of this e-mail as well; I promise there's a concrete point/proposal at the end.
> > > > > > >
> > > > > > > Explicit synchronization is the future of graphics and media. At least, that seems to be the consensus among all the graphics people I've talked to. I had a chat with one of the lead Android graphics engineers recently who told me that doing explicit sync from the start was one of the best engineering decisions Android ever made. It's also the direction being taken by more modern APIs such as Vulkan.
> > > > > > >
> > > > > > > ## What are implicit and explicit synchronization?
> > > > > > >
> > > > > > > For those that aren't familiar with this space, GPUs, media encoders, etc. are massively parallel and synchronization of some form is required to ensure that everything happens in the right order and avoid data races. Implicit synchronization is when bits of work (3D, compute, video encode, etc.) are implicitly based on the absolute CPU-time order in which API calls occur. Explicit synchronization is when the client (whatever that means in any given context) provides the dependency graph explicitly via some sort of synchronization primitives. If you're still confused, consider the following examples:
> > > > > > >
> > > > > > > With OpenGL and EGL, almost everything is implicit sync. Say you have two OpenGL contexts sharing an image where one writes to it and the other textures from it. The way the OpenGL spec works, the client has to make the API calls to render to the image before (in CPU time) it makes the API calls which texture from the image. As long as it does this (and maybe inserts a glFlush?), the driver will ensure that the rendering completes before the texturing happens and you get correct contents.
> > > > > > >
> > > > > > > Implicit synchronization can also happen across processes. Wayland, for instance, is currently built on implicit sync where the client does their rendering and then does a hand-off (via wl_surface::commit) to tell the compositor it's done at which point the compositor can now texture from the surface. The hand-off ensures that the client's OpenGL API calls happen before the server's OpenGL API calls.
> > > > > > >
> > > > > > > A good example of explicit synchronization is the Vulkan API. There, a client (or multiple clients) can simultaneously build command buffers in different threads where one of those command buffers renders to an image and the other textures from it and then submit both of them at the same time with instructions to the driver for which order to execute them in. The execution order is described via the VkSemaphore primitive. With the new VK_KHR_timeline_semaphore extension, you can even submit the work which does the texturing BEFORE the work which does the rendering and the driver will sort it out.
> > > > > > >
> > > > > > > The #1 problem with implicit synchronization (which explicit solves) is that it leads to a lot of over-synchronization both in client space and in driver/device space. The client has to synchronize a lot more because it has to ensure that the API
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
Le mercredi 18 mars 2020 à 11:05 +0100, Michel Dänzer a écrit :
> On 2020-03-17 6:21 p.m., Lucas Stach wrote:
> > That's one of the issues with implicit sync that explicit may solve:
> > a single client taking way too much time to render something can
> > block the whole pipeline up until the display flip. With explicit
> > sync the compositor can just decide to use the last client buffer if
> > the latest buffer isn't ready by some deadline.
>
> FWIW, the compositor can do this with implicit sync as well, by polling
> a dma-buf fd for the buffer. (Currently, it has to poll for writable,
> because waiting for the exclusive fence only isn't enough with amdgpu)

That is very interesting, thanks for sharing; it could allow fixing some issues in userspace for backward compatibility.

thanks,
Nicolas
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
On 2020-03-17 6:21 p.m., Lucas Stach wrote:
> That's one of the issues with implicit sync that explicit may solve:
> a single client taking way too much time to render something can
> block the whole pipeline up until the display flip. With explicit
> sync the compositor can just decide to use the last client buffer if
> the latest buffer isn't ready by some deadline.

FWIW, the compositor can do this with implicit sync as well, by polling a dma-buf fd for the buffer. (Currently, it has to poll for writable, because waiting for the exclusive fence only isn't enough with amdgpu)

--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and X developer
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
On Tue, Mar 17, 2020 at 11:35 PM Jason Ekstrand wrote:
> On Wed, Mar 18, 2020 at 12:20 AM Jacob Lifshay wrote:
> >
> > The main issue with doing everything immediately is that a lot of the
> > function calls that games expect to take a very short time (e.g.
> > vkQueueSubmit) would instead take a much longer time, potentially
> > causing problems.
>
> Do you have any evidence that it will cause problems? What I said above is what SwiftShader is doing and they're running real apps and I've not heard of it causing any problems. It's also worth noting that you would only really have to stall at sync_file export. You can async as much as you want internally.

Ok, seems worth trying out.

> > One idea for a safe userspace-backed sync_file is to have a step
> > counter that counts down until the sync_file is ready, where if
> > userspace doesn't tell it to count any steps in a certain amount of
> > time, then the sync_file switches to the error state. This way, it
> > will error shortly after a process deadlocks for some reason, while
> > still having the finite-time guarantee.
> >
> > When the sync_file is created, the step counter would be set to the
> > number of jobs that the fence is waiting on.
> >
> > It can also be set to pause the timeout to wait until another
> > sync_file signals, to handle cases where a sync_file is waiting on a
> > userspace process that is waiting on another sync_file.
> >
> > The main issue is that the kernel would have to make sure that the
> > sync_file graph doesn't have loops, maybe by erroring all sync_files
> > that it finds in the loop.
> >
> > Does that sound like a good idea?
>
> Honestly, I don't think you'll ever be able to sell that to the kernel community. All of the deadlock detection would add massive complexity to the already non-trivial dma_fence infrastructure and for what benefit? So that a software rasterizer can try to pretend to be more like a GPU? You're going to have some very serious perf numbers and/or other proof of necessity if you want to convince the kernel people to accept that level of complexity/risk. "I designed my software to work this way" isn't going to convince anyone of anything, especially when literally every other software rasterizer I'm aware of is immediate and they work just fine.

After some further research, it turns out that it will work to have all the sync_files that a sync_file needs to depend on specified at creation, which forces the dependence graph to be a DAG, since you can't depend on a sync_file that isn't yet created, so loops are impossible by design.

Since kernel deadlock detection isn't actually required, just timeouts for the case of halted userspace, does this seem feasible? I'd guess that it'd require maybe 200-300 lines of code in a self-contained driver similar to the sync_file debugging driver mentioned previously, but with the additional timeout code for safety.

Jacob
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
On Wed, Mar 18, 2020 at 12:20 AM Jacob Lifshay wrote:
> On Tue, Mar 17, 2020 at 7:08 PM Jason Ekstrand wrote:
> > On Tue, Mar 17, 2020 at 7:16 PM Jacob Lifshay wrote:
> > > On Tue, Mar 17, 2020 at 11:14 AM Lucas Stach wrote:
> > > > Am Dienstag, den 17.03.2020, 10:59 -0700 schrieb Jacob Lifshay:
> > > > > I think I found a userspace-accessible way to create sync_files and
> > > > > dma_fences that would fulfill the requirements:
> > > > > https://github.com/torvalds/linux/blob/master/drivers/dma-buf/sw_sync.c
> > > > >
> > > > > I'm just not sure if that's a good interface to use, since it appears
> > > > > to be designed only for debugging. Will have to check for additional
> > > > > requirements of signalling an error when the process that created the
> > > > > fence is killed.
> >
> > It is expressly only for debugging and testing. Exposing such an API to userspace would break the finite time guarantees that are relied upon to keep sync_file a secure API.
>
> Ok, I was figuring that was probably the case.
>
> > > > Something like that can certainly be lifted for general use if it makes sense. But then with a software renderer I don't really see how fences help you at all. With a software renderer you know exactly when the frame is finished and you can just defer pushing it over to the next pipeline element until that time. You won't gain any parallelism by using fences as the CPU is busy doing the rendering and will not run other stuff concurrently, right?
> > >
> > > There definitely may be other hardware and/or processes that can process some stuff concurrently with the main application, such as the compositor and/or video encoding processes (for video capture). Additionally, from what I understand, sync_file is the standard way to export and import explicit synchronization between processes and between drivers on Linux, so it seems like a good idea to support it from an interoperability standpoint even if it turns out that there aren't any scheduling/timing benefits.
> >
> > There are different ways that one can handle interoperability, however. One way is to try and make the software rasterizer look as much like a GPU as possible: lots of threads to make things as asynchronous as possible, "real" implementations of semaphores and fences, etc.
>
> This is basically the route I've picked, though rather than making lots of native threads, I'm planning on having just one thread per core and have a work-stealing scheduler (inspired by Rust's rayon crate) schedule all the individual render/compute jobs, because that allows making a lot more jobs to allow finer load balancing.
>
> > Another is to let a SW rasterizer be a SW rasterizer: do everything immediately, thread only so you can exercise all the CPU cores, and minimally implement semaphores and fences well enough to maintain compatibility. If you take the first approach, then we have to solve all these problems with letting userspace create unsignaled sync_files which it will signal later and figure out how to make it safe. If you take the second approach, you'll only ever have to return already signaled sync_files and there's no problem with the sync_file finite time guarantees.
>
> The main issue with doing everything immediately is that a lot of the function calls that games expect to take a very short time (e.g. vkQueueSubmit) would instead take a much longer time, potentially causing problems.

Do you have any evidence that it will cause problems? What I said above is what SwiftShader is doing and they're running real apps and I've not heard of it causing any problems. It's also worth noting that you would only really have to stall at sync_file export. You can async as much as you want internally.

> One idea for a safe userspace-backed sync_file is to have a step
> counter that counts down until the sync_file is ready, where if
> userspace doesn't tell it to count any steps in a certain amount of
> time, then the sync_file switches to the error state. This way, it
> will error shortly after a process deadlocks for some reason, while
> still having the finite-time guarantee.
>
> When the sync_file is created, the step counter would be set to the
> number of jobs that the fence is waiting on.
>
> It can also be set to pause the timeout to wait until another
> sync_file signals, to handle cases where a sync_file is waiting on a
> userspace process that is waiting on another sync_file.
>
> The main issue is that the kernel would have to make sure that the
> sync_file graph doesn't have loops, maybe by erroring all sync_files
> that it finds in the loop.
>
> Does that sound like a good idea?

Honestly, I don't think you'll ever be able to sell that to the kernel community. All of the deadlock detection would add massive complexity to the already non-trivial dma_fence infrastructure and for what benefit? So that a software rasterizer can try to pretend to be more like a GPU? You're going to have some very serious perf numbers and/or other proof of necessity if you want to convince the kernel people to accept that level of complexity/risk. "I designed my software to work this way" isn't going to convince anyone of anything, especially when literally every other software rasterizer I'm aware of is immediate and they work just fine.
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
On Tue, Mar 17, 2020 at 7:08 PM Jason Ekstrand wrote:
> On Tue, Mar 17, 2020 at 7:16 PM Jacob Lifshay wrote:
> > On Tue, Mar 17, 2020 at 11:14 AM Lucas Stach wrote:
> > > Am Dienstag, den 17.03.2020, 10:59 -0700 schrieb Jacob Lifshay:
> > > > I think I found a userspace-accessible way to create sync_files and
> > > > dma_fences that would fulfill the requirements:
> > > > https://github.com/torvalds/linux/blob/master/drivers/dma-buf/sw_sync.c
> > > >
> > > > I'm just not sure if that's a good interface to use, since it appears
> > > > to be designed only for debugging. Will have to check for additional
> > > > requirements of signalling an error when the process that created the
> > > > fence is killed.
>
> It is expressly only for debugging and testing. Exposing such an API to userspace would break the finite time guarantees that are relied upon to keep sync_file a secure API.

Ok, I was figuring that was probably the case.

> > > Something like that can certainly be lifted for general use if it makes sense. But then with a software renderer I don't really see how fences help you at all. With a software renderer you know exactly when the frame is finished and you can just defer pushing it over to the next pipeline element until that time. You won't gain any parallelism by using fences as the CPU is busy doing the rendering and will not run other stuff concurrently, right?
> >
> > There definitely may be other hardware and/or processes that can process some stuff concurrently with the main application, such as the compositor and/or video encoding processes (for video capture). Additionally, from what I understand, sync_file is the standard way to export and import explicit synchronization between processes and between drivers on Linux, so it seems like a good idea to support it from an interoperability standpoint even if it turns out that there aren't any scheduling/timing benefits.
>
> There are different ways that one can handle interoperability, however. One way is to try and make the software rasterizer look as much like a GPU as possible: lots of threads to make things as asynchronous as possible, "real" implementations of semaphores and fences, etc.

This is basically the route I've picked, though rather than making lots of native threads, I'm planning on having just one thread per core and have a work-stealing scheduler (inspired by Rust's rayon crate) schedule all the individual render/compute jobs, because that allows making a lot more jobs to allow finer load balancing.

> Another is to let a SW rasterizer be a SW rasterizer: do everything immediately, thread only so you can exercise all the CPU cores, and minimally implement semaphores and fences well enough to maintain compatibility. If you take the first approach, then we have to solve all these problems with letting userspace create unsignaled sync_files which it will signal later and figure out how to make it safe. If you take the second approach, you'll only ever have to return already signaled sync_files and there's no problem with the sync_file finite time guarantees.

The main issue with doing everything immediately is that a lot of the function calls that games expect to take a very short time (e.g. vkQueueSubmit) would instead take a much longer time, potentially causing problems.

One idea for a safe userspace-backed sync_file is to have a step counter that counts down until the sync_file is ready, where if userspace doesn't tell it to count any steps in a certain amount of time, then the sync_file switches to the error state. This way, it will error shortly after a process deadlocks for some reason, while still having the finite-time guarantee.

When the sync_file is created, the step counter would be set to the number of jobs that the fence is waiting on.

It can also be set to pause the timeout to wait until another sync_file signals, to handle cases where a sync_file is waiting on a userspace process that is waiting on another sync_file.

The main issue is that the kernel would have to make sure that the sync_file graph doesn't have loops, maybe by erroring all sync_files that it finds in the loop.

Does that sound like a good idea?

Jacob
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
On Tue, Mar 17, 2020 at 7:16 PM Jacob Lifshay wrote:
> On Tue, Mar 17, 2020 at 11:14 AM Lucas Stach wrote:
> > Am Dienstag, den 17.03.2020, 10:59 -0700 schrieb Jacob Lifshay:
> > > I think I found a userspace-accessible way to create sync_files and
> > > dma_fences that would fulfill the requirements:
> > > https://github.com/torvalds/linux/blob/master/drivers/dma-buf/sw_sync.c
> > >
> > > I'm just not sure if that's a good interface to use, since it appears
> > > to be designed only for debugging. Will have to check for additional
> > > requirements of signalling an error when the process that created the
> > > fence is killed.

It is expressly only for debugging and testing. Exposing such an API to userspace would break the finite time guarantees that are relied upon to keep sync_file a secure API.

> > Something like that can certainly be lifted for general use if it makes sense. But then with a software renderer I don't really see how fences help you at all. With a software renderer you know exactly when the frame is finished and you can just defer pushing it over to the next pipeline element until that time. You won't gain any parallelism by using fences as the CPU is busy doing the rendering and will not run other stuff concurrently, right?
>
> There definitely may be other hardware and/or processes that can process some stuff concurrently with the main application, such as the compositor and/or video encoding processes (for video capture). Additionally, from what I understand, sync_file is the standard way to export and import explicit synchronization between processes and between drivers on Linux, so it seems like a good idea to support it from an interoperability standpoint even if it turns out that there aren't any scheduling/timing benefits.

There are different ways that one can handle interoperability, however. One way is to try and make the software rasterizer look as much like a GPU as possible: lots of threads to make things as asynchronous as possible, "real" implementations of semaphores and fences, etc. Another is to let a SW rasterizer be a SW rasterizer: do everything immediately, thread only so you can exercise all the CPU cores, and minimally implement semaphores and fences well enough to maintain compatibility. If you take the first approach, then we have to solve all these problems with letting userspace create unsignaled sync_files which it will signal later and figure out how to make it safe. If you take the second approach, you'll only ever have to return already signaled sync_files and there's no problem with the sync_file finite time guarantees.

--Jason
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
On Tue, Mar 17, 2020 at 11:14 AM Lucas Stach wrote:
> Am Dienstag, den 17.03.2020, 10:59 -0700 schrieb Jacob Lifshay:
> > I think I found a userspace-accessible way to create sync_files and
> > dma_fences that would fulfill the requirements:
> > https://github.com/torvalds/linux/blob/master/drivers/dma-buf/sw_sync.c
> >
> > I'm just not sure if that's a good interface to use, since it appears
> > to be designed only for debugging. Will have to check for additional
> > requirements of signalling an error when the process that created the
> > fence is killed.
>
> Something like that can certainly be lifted for general use if it makes sense. But then with a software renderer I don't really see how fences help you at all. With a software renderer you know exactly when the frame is finished and you can just defer pushing it over to the next pipeline element until that time. You won't gain any parallelism by using fences as the CPU is busy doing the rendering and will not run other stuff concurrently, right?

There definitely may be other hardware and/or processes that can process some stuff concurrently with the main application, such as the compositor and/or video encoding processes (for video capture). Additionally, from what I understand, sync_file is the standard way to export and import explicit synchronization between processes and between drivers on Linux, so it seems like a good idea to support it from an interoperability standpoint even if it turns out that there aren't any scheduling/timing benefits.

Jacob
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
On Tue, Mar 17, 2020 at 10:21 AM Lucas Stach wrote:
> On Tuesday, 17.03.2020, 10:12 -0700, Jacob Lifshay wrote:
> > One related issue with explicit sync using sync_file concerns combined CPUs/GPUs (the CPU cores *are* the GPU cores) that do all the rendering in userspace (like llvmpipe, but for Vulkan and with extra instructions for GPU tasks) yet need to synchronize with other drivers/processes: there should be some way to create an explicit fence/semaphore from userspace and later signal it. This seems to conflict with the requirement for a sync_file to complete in finite time, since the user process could be stopped or killed.
> >
> > Any ideas?
>
> Finite just means "not infinite". If you stop the process that's doing part of the pipeline processing you block the pipeline; you get to keep the pieces in that case.

Seems reasonable.

> That's one of the issues with implicit sync that explicit may solve: a single client taking way too much time to render something can block the whole pipeline up until the display flip. With explicit sync the compositor can just decide to use the last client buffer if the latest buffer isn't ready by some deadline.
>
> With regard to the process getting killed: whatever your sync primitive is, you need to make sure to signal the fence (possibly with an error condition set) when you are not going to make progress anymore. So whatever your means of creating the sync_fd from your software renderer is, it needs to signal any outstanding fences on the sync_fd when the fd is closed.

I think I found a userspace-accessible way to create sync_files and dma_fences that would fulfill the requirements:
https://github.com/torvalds/linux/blob/master/drivers/dma-buf/sw_sync.c

I'm just not sure if that's a good interface to use, since it appears to be designed only for debugging. Will have to check for additional requirements of signalling an error when the process that created the fence is killed.

Jacob

> Regards,
> Lucas
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
On Tue, Mar 17, 2020 at 12:13 PM Jacob Lifshay wrote:
> One related issue with explicit sync using sync_file concerns combined CPUs/GPUs (the CPU cores *are* the GPU cores) that do all the rendering in userspace (like llvmpipe, but for Vulkan and with extra instructions for GPU tasks) yet need to synchronize with other drivers/processes: there should be some way to create an explicit fence/semaphore from userspace and later signal it. This seems to conflict with the requirement for a sync_file to complete in finite time, since the user process could be stopped or killed.

Yeah... That's going to be a problem. The only way I could see that working is if you created a sync_file that had a timeout associated with it. However, then you run into the issue where you may have corruption if stuff doesn't complete on time.

Then again, you're not really dealing with an external unit and so the latency cost of going across the window system protocol probably isn't massively different from the latency cost of triggering the sync_file. Maybe the answer there is to just do everything in-order and not worry about synchronization?
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
One related issue with explicit sync using sync_file concerns combined CPUs/GPUs (the CPU cores *are* the GPU cores) that do all the rendering in userspace (like llvmpipe, but for Vulkan and with extra instructions for GPU tasks) yet need to synchronize with other drivers/processes: there should be some way to create an explicit fence/semaphore from userspace and later signal it. This seems to conflict with the requirement for a sync_file to complete in finite time, since the user process could be stopped or killed.

Any ideas?

Jacob Lifshay
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
On Mon, Mar 16, 2020 at 5:57 AM Michel Dänzer wrote:
> On 2020-03-16 4:50 a.m., Marek Olšák wrote:
> > The synchronization works because the Mesa driver waits for idle (drains the GFX pipeline) at the end of command buffers and there is only 1 graphics queue, so everything is ordered.
> >
> > The GFX pipeline runs asynchronously to the command buffer, meaning the command buffer only starts draws and doesn't wait for completion. If the Mesa driver didn't wait at the end of the command buffer, the command buffer would finish and a different process could start execution of its own command buffer while shaders of the previous process are still running.
> >
> > If the Mesa driver submits a command buffer internally (because it's full), it doesn't wait, so the GFX pipeline doesn't notice that a command buffer ended and a new one started.
> >
> > The waiting at the end of command buffers happens only when the flush is external (SwapBuffers, glFlush).
> >
> > It's a performance problem, because the GFX queue is blocked until the GFX pipeline is drained at the end of every frame at least.
> >
> > So explicit fences for SwapBuffers would help.
>
> Not sure what difference it would make, since the same thing needs to be done for explicit fences as well, doesn't it?

No. Explicit fences don't require userspace to wait for idle in the command buffer. Fences are signalled when the last draw is complete and caches are flushed. Before that happens, any command buffer that is not dependent on the fence can start execution. There is never a need for the GPU to be idle if there is enough independent work to do.

Marek
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
On 2020-03-16 4:50 a.m., Marek Olšák wrote:
> The synchronization works because the Mesa driver waits for idle (drains the GFX pipeline) at the end of command buffers and there is only 1 graphics queue, so everything is ordered.
>
> The GFX pipeline runs asynchronously to the command buffer, meaning the command buffer only starts draws and doesn't wait for completion. If the Mesa driver didn't wait at the end of the command buffer, the command buffer would finish and a different process could start execution of its own command buffer while shaders of the previous process are still running.
>
> If the Mesa driver submits a command buffer internally (because it's full), it doesn't wait, so the GFX pipeline doesn't notice that a command buffer ended and a new one started.
>
> The waiting at the end of command buffers happens only when the flush is external (SwapBuffers, glFlush).
>
> It's a performance problem, because the GFX queue is blocked until the GFX pipeline is drained at the end of every frame at least.
>
> So explicit fences for SwapBuffers would help.

Not sure what difference it would make, since the same thing needs to be done for explicit fences as well, doesn't it?


-- 
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and X developer
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
The synchronization works because the Mesa driver waits for idle (drains the GFX pipeline) at the end of command buffers and there is only 1 graphics queue, so everything is ordered.

The GFX pipeline runs asynchronously to the command buffer, meaning the command buffer only starts draws and doesn't wait for completion. If the Mesa driver didn't wait at the end of the command buffer, the command buffer would finish and a different process could start execution of its own command buffer while shaders of the previous process are still running.

If the Mesa driver submits a command buffer internally (because it's full), it doesn't wait, so the GFX pipeline doesn't notice that a command buffer ended and a new one started.

The waiting at the end of command buffers happens only when the flush is external (SwapBuffers, glFlush).

It's a performance problem, because the GFX queue is blocked until the GFX pipeline is drained at the end of every frame at least.

So explicit fences for SwapBuffers would help.

Marek

On Sun., Mar. 15, 2020, 22:49 Jason Ekstrand wrote:
> Could you elaborate? If there's something missing from my mental model of how implicit sync works, I'd like to have it corrected. People continue claiming that AMD is somehow special but I have yet to grasp what makes it so. (Not that anyone has bothered to try all that hard to explain it.)
>
> --Jason
>
> On March 13, 2020 21:03:21 Marek Olšák wrote:
>> There is no synchronization between processes (e.g. 3D app and compositor) within X on AMD hw. It works because of some hacks in Mesa.
>>
>> Marek
>>
>> On Wed, Mar 11, 2020 at 1:31 PM Jason Ekstrand wrote:
>>> All,
>>>
>>> Sorry for casting such a broad net with this one. I'm sure most people who reply will get at least one mailing list rejection. However, this is an issue that affects a LOT of components and that's why it's thorny to begin with. Please pardon the length of this e-mail as well; I promise there's a concrete point/proposal at the end.
>>>
>>> Explicit synchronization is the future of graphics and media. At least, that seems to be the consensus among all the graphics people I've talked to. I had a chat with one of the lead Android graphics engineers recently who told me that doing explicit sync from the start was one of the best engineering decisions Android ever made. It's also the direction being taken by more modern APIs such as Vulkan.
>>>
>>> ## What are implicit and explicit synchronization?
>>>
>>> For those that aren't familiar with this space, GPUs, media encoders, etc. are massively parallel and synchronization of some form is required to ensure that everything happens in the right order and avoid data races. Implicit synchronization is when bits of work (3D, compute, video encode, etc.) are implicitly based on the absolute CPU-time order in which API calls occur. Explicit synchronization is when the client (whatever that means in any given context) provides the dependency graph explicitly via some sort of synchronization primitives. If you're still confused, consider the following examples:
>>>
>>> With OpenGL and EGL, almost everything is implicit sync. Say you have two OpenGL contexts sharing an image where one writes to it and the other textures from it. The way the OpenGL spec works, the client has to make the API calls to render to the image before (in CPU time) it makes the API calls which texture from the image. As long as it does this (and maybe inserts a glFlush?), the driver will ensure that the rendering completes before the texturing happens and you get correct contents.
>>>
>>> Implicit synchronization can also happen across processes. Wayland, for instance, is currently built on implicit sync where the client does their rendering and then does a hand-off (via wl_surface::commit) to tell the compositor it's done at which point the compositor can now texture from the surface. The hand-off ensures that the client's OpenGL API calls happen before the server's OpenGL API calls.
>>>
>>> A good example of explicit synchronization is the Vulkan API. There, a client (or multiple clients) can simultaneously build command buffers in different threads where one of those command buffers renders to an image and the other textures from it and then submit both of them at the same time with instructions to the driver for which order to execute them in. The execution order is described via the VkSemaphore primitive. With the new VK_KHR_timeline_semaphore extension, you can even submit the work which does the texturing BEFORE the work which does the rendering and the driver will sort it out.
>>>
>>> The #1 problem with implicit synchronization (which explicit solves) is that it leads to a lot of over-synchronization both in client space and in driver/device space. The
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
Could you elaborate? If there's something missing from my mental model of how implicit sync works, I'd like to have it corrected. People continue claiming that AMD is somehow special but I have yet to grasp what makes it so. (Not that anyone has bothered to try all that hard to explain it.)

--Jason

On March 13, 2020 21:03:21 Marek Olšák wrote:
> There is no synchronization between processes (e.g. 3D app and compositor) within X on AMD hw. It works because of some hacks in Mesa.
>
> Marek
>
> On Wed, Mar 11, 2020 at 1:31 PM Jason Ekstrand wrote:
>> All,
>>
>> Sorry for casting such a broad net with this one. I'm sure most people who reply will get at least one mailing list rejection. However, this is an issue that affects a LOT of components and that's why it's thorny to begin with. Please pardon the length of this e-mail as well; I promise there's a concrete point/proposal at the end.
>>
>> Explicit synchronization is the future of graphics and media. At least, that seems to be the consensus among all the graphics people I've talked to. I had a chat with one of the lead Android graphics engineers recently who told me that doing explicit sync from the start was one of the best engineering decisions Android ever made. It's also the direction being taken by more modern APIs such as Vulkan.
>>
>> ## What are implicit and explicit synchronization?
>>
>> For those that aren't familiar with this space, GPUs, media encoders, etc. are massively parallel and synchronization of some form is required to ensure that everything happens in the right order and avoid data races. Implicit synchronization is when bits of work (3D, compute, video encode, etc.) are implicitly based on the absolute CPU-time order in which API calls occur. Explicit synchronization is when the client (whatever that means in any given context) provides the dependency graph explicitly via some sort of synchronization primitives. If you're still confused, consider the following examples:
>>
>> With OpenGL and EGL, almost everything is implicit sync. Say you have two OpenGL contexts sharing an image where one writes to it and the other textures from it. The way the OpenGL spec works, the client has to make the API calls to render to the image before (in CPU time) it makes the API calls which texture from the image. As long as it does this (and maybe inserts a glFlush?), the driver will ensure that the rendering completes before the texturing happens and you get correct contents.
>>
>> Implicit synchronization can also happen across processes. Wayland, for instance, is currently built on implicit sync where the client does their rendering and then does a hand-off (via wl_surface::commit) to tell the compositor it's done at which point the compositor can now texture from the surface. The hand-off ensures that the client's OpenGL API calls happen before the server's OpenGL API calls.
>>
>> A good example of explicit synchronization is the Vulkan API. There, a client (or multiple clients) can simultaneously build command buffers in different threads where one of those command buffers renders to an image and the other textures from it and then submit both of them at the same time with instructions to the driver for which order to execute them in. The execution order is described via the VkSemaphore primitive. With the new VK_KHR_timeline_semaphore extension, you can even submit the work which does the texturing BEFORE the work which does the rendering and the driver will sort it out.
>>
>> The #1 problem with implicit synchronization (which explicit solves) is that it leads to a lot of over-synchronization both in client space and in driver/device space. The client has to synchronize a lot more because it has to ensure that the API calls happen in a particular order. The driver/device have to synchronize a lot more because they never know what is going to end up being a synchronization point as an API call on another thread/process may occur at any time. As we move to more and more multi-threaded programming this synchronization (on the client-side especially) becomes more and more painful.
>>
>> ## Current status in Linux
>>
>> Implicit synchronization in Linux works via the kernel's internal dma_buf and dma_fence data structures. A dma_fence is a tiny object which represents the "done" status for some bit of work. Typically, dma_fences are created as a by-product of someone submitting some bit of work (say, 3D rendering) to the kernel. The dma_buf object has a set of dma_fences on it representing shared (read) and exclusive (write) access to the object. When work is submitted which, for instance, renders to the dma_buf, it's queued waiting on all the fences on the dma_buf, and a dma_fence is created representing the end of said rendering work and it's installed as the dma_buf's exclusive fence. This way, the kernel can manage all its internal queues (3D rendering, display, video encode, etc.) and know which things to submit in what order.
>>
>> For the last few years, we've had sync_file in the kernel and it's plumbed into some drivers.
Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
There is no synchronization between processes (e.g. 3D app and compositor) within X on AMD hw. It works because of some hacks in Mesa.

Marek

On Wed, Mar 11, 2020 at 1:31 PM Jason Ekstrand wrote:
> All,
>
> Sorry for casting such a broad net with this one. I'm sure most people who reply will get at least one mailing list rejection. However, this is an issue that affects a LOT of components and that's why it's thorny to begin with. Please pardon the length of this e-mail as well; I promise there's a concrete point/proposal at the end.
>
> Explicit synchronization is the future of graphics and media. At least, that seems to be the consensus among all the graphics people I've talked to. I had a chat with one of the lead Android graphics engineers recently who told me that doing explicit sync from the start was one of the best engineering decisions Android ever made. It's also the direction being taken by more modern APIs such as Vulkan.
>
> ## What are implicit and explicit synchronization?
>
> For those that aren't familiar with this space, GPUs, media encoders, etc. are massively parallel and synchronization of some form is required to ensure that everything happens in the right order and avoid data races. Implicit synchronization is when bits of work (3D, compute, video encode, etc.) are implicitly based on the absolute CPU-time order in which API calls occur. Explicit synchronization is when the client (whatever that means in any given context) provides the dependency graph explicitly via some sort of synchronization primitives. If you're still confused, consider the following examples:
>
> With OpenGL and EGL, almost everything is implicit sync. Say you have two OpenGL contexts sharing an image where one writes to it and the other textures from it. The way the OpenGL spec works, the client has to make the API calls to render to the image before (in CPU time) it makes the API calls which texture from the image. As long as it does this (and maybe inserts a glFlush?), the driver will ensure that the rendering completes before the texturing happens and you get correct contents.
>
> Implicit synchronization can also happen across processes. Wayland, for instance, is currently built on implicit sync where the client does their rendering and then does a hand-off (via wl_surface::commit) to tell the compositor it's done at which point the compositor can now texture from the surface. The hand-off ensures that the client's OpenGL API calls happen before the server's OpenGL API calls.
>
> A good example of explicit synchronization is the Vulkan API. There, a client (or multiple clients) can simultaneously build command buffers in different threads where one of those command buffers renders to an image and the other textures from it and then submit both of them at the same time with instructions to the driver for which order to execute them in. The execution order is described via the VkSemaphore primitive. With the new VK_KHR_timeline_semaphore extension, you can even submit the work which does the texturing BEFORE the work which does the rendering and the driver will sort it out.
>
> The #1 problem with implicit synchronization (which explicit solves) is that it leads to a lot of over-synchronization both in client space and in driver/device space. The client has to synchronize a lot more because it has to ensure that the API calls happen in a particular order. The driver/device have to synchronize a lot more because they never know what is going to end up being a synchronization point as an API call on another thread/process may occur at any time. As we move to more and more multi-threaded programming this synchronization (on the client-side especially) becomes more and more painful.
>
> ## Current status in Linux
>
> Implicit synchronization in Linux works via the kernel's internal dma_buf and dma_fence data structures. A dma_fence is a tiny object which represents the "done" status for some bit of work. Typically, dma_fences are created as a by-product of someone submitting some bit of work (say, 3D rendering) to the kernel. The dma_buf object has a set of dma_fences on it representing shared (read) and exclusive (write) access to the object. When work is submitted which, for instance, renders to the dma_buf, it's queued waiting on all the fences on the dma_buf, and a dma_fence is created representing the end of said rendering work and it's installed as the dma_buf's exclusive fence. This way, the kernel can manage all its internal queues (3D rendering, display, video encode, etc.) and know which things to submit in what order.
>
> For the last few years, we've had sync_file in the kernel and it's plumbed into some drivers. A sync_file is just a wrapper around a single dma_fence. A sync_file is typically created as a by-product of submitting work (3D,