Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-25 Thread Jerome Glisse
On Thu, Jan 21, 2010 at 01:59:26PM +0100, Thomas Hellstrom wrote: Jerome Glisse wrote: On Thu, Jan 21, 2010 at 04:49:39AM +0100, Luca Barbieri wrote: We had to do a similar thing in the Poulsbo driver and it turned out that we could save a significant amount of CPU by using a delayed

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-25 Thread Jerome Glisse
On Thu, Jan 21, 2010 at 04:14:39PM +0100, Luca Barbieri wrote: I'm not sure I understand your proposal correctly. It seems your proposoal is similar to mine, replacing the term fence nodes with ttm transactions, but I'm not sure if I understand it correctly Here is some pseudocode for a

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-25 Thread Thomas Hellstrom
Jerome Glisse wrote: On Thu, Jan 21, 2010 at 01:59:26PM +0100, Thomas Hellstrom wrote: Jerome Glisse wrote: On Thu, Jan 21, 2010 at 04:49:39AM +0100, Luca Barbieri wrote: We had to do a similar thing in the Poulsbo driver and it turned out that we could save a significant

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-21 Thread Jerome Glisse
On Thu, Jan 21, 2010 at 04:49:39AM +0100, Luca Barbieri wrote: We had to do a similar thing in the Poulsbo driver and it turned out that we could save a significant amount of CPU by using a delayed workqueue, collecting objects and destroying them periodically. Yes, indeed, we don't

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-21 Thread Thomas Hellstrom
Luca Barbieri wrote: We had to do a similar thing in the Poulsbo driver and it turned out that we could save a significant amount of CPU by using a delayed workqueue, collecting objects and destroying them periodically. Yes, indeed, we don't really care about a fence expiring unless we

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-21 Thread Thomas Hellstrom
Jerome Glisse wrote: On Thu, Jan 21, 2010 at 04:49:39AM +0100, Luca Barbieri wrote: We had to do a similar thing in the Poulsbo driver and it turned out that we could save a significant amount of CPU by using a delayed workqueue, collecting objects and destroying them periodically.

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-21 Thread Luca Barbieri
At a first glance: 1) We probably *will* need a delayed destroyed workqueue to avoid wasting memory that otherwise should be freed to the system. At the very least, the delayed delete process should optionally be run by a system shrinker. You are right. For VRAM we don't care since we are the

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-21 Thread Francisco Jerez
Luca Barbieri l...@luca-barbieri.com writes: At a first glance: 1) We probably *will* need a delayed destroyed workqueue to avoid wasting memory that otherwise should be freed to the system. At the very least, the delayed delete process should optionally be run by a system shrinker. You are

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-21 Thread Luca Barbieri
Nvidia cards have a synchronization primitive that could be used to synchronize several FIFOs in hardware (AKA semaphores, see [1] for an example). Does this operate wholly on the GPU on all nVidia cards? It seems that at least on some GPUs this will trigger software methods that are

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-21 Thread Thomas Hellstrom
Luca Barbieri wrote: At a first glance: 1) We probably *will* need a delayed destroyed workqueue to avoid wasting memory that otherwise should be freed to the system. At the very least, the delayed delete process should optionally be run by a system shrinker. You are right. For VRAM we

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-21 Thread Francisco Jerez
Luca Barbieri l...@luca-barbieri.com writes: Nvidia cards have a synchronization primitive that could be used to synchronize several FIFOs in hardware (AKA semaphores, see [1] for an example). Does this operate wholly on the GPU on all nVidia cards? It seems that at least on some GPUs this

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-21 Thread Luca Barbieri
I'm not sure I understand your proposal correctly. It seems your proposoal is similar to mine, replacing the term fence nodes with ttm transactions, but I'm not sure if I understand it correctly Here is some pseudocode for a improved, simplified version of my proposal. It is modified so that

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-21 Thread Luca Barbieri
If not, it could possibly be hacked around by reading from a DMA object at the address of the fence sequence number and then resizing the DMA object so that addresses from a certain point on would trigger a protection fault interrupt. I don't think you can safely modify a DMA object without

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-21 Thread Maarten Maathuis
On Thu, Jan 21, 2010 at 3:44 PM, Francisco Jerez curroje...@riseup.net wrote: Luca Barbieri l...@luca-barbieri.com writes: Nvidia cards have a synchronization primitive that could be used to synchronize several FIFOs in hardware (AKA semaphores, see [1] for an example). Does this operate

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-21 Thread Francisco Jerez
Luca Barbieri l...@luca-barbieri.com writes: If not, it could possibly be hacked around by reading from a DMA object at the address of the fence sequence number and then resizing the DMA object so that addresses from a certain point on would trigger a protection fault interrupt. I don't

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-20 Thread Thomas Hellstrom
Thomas Hellstrom wrote: Yes, it looks correct. Although it seems a little unintuitive to enter the loop with the spinlock held, and exit it with the spinlock not held. I've attached yet another patch to have that fixed. Could you take a look at whether it seems OK with you and, in that case,

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-20 Thread Thomas Hellstrom
Thomas Hellstrom wrote: Thomas Hellstrom wrote: Yes, it looks correct. Although it seems a little unintuitive to enter the loop with the spinlock held, and exit it with the spinlock not held. I've attached yet another patch to have that fixed. Could you take a look at whether it seems OK

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-20 Thread Luca Barbieri
Yes it's fine. I sent your patch to Dave with an expanded commit comment for merging. Here is a possible redesign of the mechanism inspired by this issue. It seems that what we are racing against is buffer eviction, due to delayed deletion buffers being still kept on the LRU list. I'm wondering

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-20 Thread Luca Barbieri
Also note that the delayed delete list is not in fence order but in deletion-time order, which perhaps gives room for more optimizations. You are right. I think then that ttm_bo_delayed_delete may still need to be changed, because it stops when ttm_bo_cleanup_refs returns -EBUSY, which happens

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-20 Thread Luca Barbieri
When designing this, we should also keep in mind that some drivers (e.g. nouveau) have multiple FIFO channels, and thus we would like a buffer to be referenced for reading by multiple channels at once (and be destroyed only when all fences are expired, obviously). Also, hardware may support on-GPU

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-20 Thread Thomas Hellstrom
Luca Barbieri wrote: Also note that the delayed delete list is not in fence order but in deletion-time order, which perhaps gives room for more optimizations. You are right. I think then that ttm_bo_delayed_delete may still need to be changed, because it stops when ttm_bo_cleanup_refs

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-20 Thread Thomas Hellstrom
Luca Barbieri wrote: When designing this, we should also keep in mind that some drivers (e.g. nouveau) have multiple FIFO channels, and thus we would like a buffer to be referenced for reading by multiple channels at once (and be destroyed only when all fences are expired, obviously). Also,

Re: [Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

2010-01-19 Thread Thomas Hellstrom
Luca, Good catch. Some comments inline: Luca Barbieri wrote: + entry = list_first_entry(bdev-ddestroy, + struct ttm_buffer_object, ddestroy); + kref_get(entry-list_kref); - if (next != bdev-ddestroy) { - nentry = list_entry(next,