On Thu, Jan 21, 2010 at 01:59:26PM +0100, Thomas Hellstrom wrote:
Jerome Glisse wrote:
On Thu, Jan 21, 2010 at 04:49:39AM +0100, Luca Barbieri wrote:
We had to do a similar thing in the
Poulsbo driver and it turned out that we could save a significant amount of
CPU by using a delayed workqueue, collecting objects and destroying them
periodically.
On Thu, Jan 21, 2010 at 04:14:39PM +0100, Luca Barbieri wrote:
I'm not sure I understand your proposal correctly.
It seems your proposal is similar to mine, replacing the term fence
nodes with ttm transactions, but I'm not sure if I understand it
correctly.
Here is some pseudocode for an improved, simplified version of my proposal.
Jerome Glisse wrote:
On Thu, Jan 21, 2010 at 01:59:26PM +0100, Thomas Hellstrom wrote:
Jerome Glisse wrote:
On Thu, Jan 21, 2010 at 04:49:39AM +0100, Luca Barbieri wrote:
We had to do a similar thing in the
Poulsbo driver and it turned out that we could save a significant amount of
CPU by using a delayed workqueue, collecting objects and destroying them
periodically.
On Thu, Jan 21, 2010 at 04:49:39AM +0100, Luca Barbieri wrote:
We had to do a similar thing in the
Poulsbo driver and it turned out that we could save a significant amount of
CPU by using a delayed workqueue, collecting objects and destroying them
periodically.
Yes, indeed, we don't really care about a fence expiring unless we
Luca Barbieri wrote:
We had to do a similar thing in the
Poulsbo driver and it turned out that we could save a significant amount of
CPU by using a delayed workqueue, collecting objects and destroying them
periodically.
Yes, indeed, we don't really care about a fence expiring unless we
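As a rough illustration of the delayed-workqueue approach described above (a sketch only, with made-up names such as drv_device, drv_buffer, drv_fence_signaled() and drv_buffer_free(); this is not the actual Poulsbo or TTM code):

/* Sketch: free dead buffers in batches from a delayed workqueue
 * instead of doing the work in the fence interrupt path. */
static void drv_delayed_destroy(struct work_struct *work)
{
    struct drv_device *dev =
        container_of(work, struct drv_device, ddestroy_work.work);
    struct drv_buffer *buf, *tmp;
    LIST_HEAD(done);

    spin_lock(&dev->ddestroy_lock);
    list_for_each_entry_safe(buf, tmp, &dev->ddestroy_list, ddestroy) {
        if (drv_fence_signaled(buf->fence))     /* hypothetical helper */
            list_move_tail(&buf->ddestroy, &done);
    }
    /* Re-run later if anything is still waiting on its fence. */
    if (!list_empty(&dev->ddestroy_list))
        schedule_delayed_work(&dev->ddestroy_work, HZ / 10);
    spin_unlock(&dev->ddestroy_lock);

    list_for_each_entry_safe(buf, tmp, &done, ddestroy) {
        list_del(&buf->ddestroy);
        drv_buffer_free(buf);                   /* hypothetical helper */
    }
}

The work item would be initialized once with INIT_DELAYED_WORK() and kicked with schedule_delayed_work() whenever a buffer with an unsignaled fence is unreferenced.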
Jerome Glisse wrote:
On Thu, Jan 21, 2010 at 04:49:39AM +0100, Luca Barbieri wrote:
We had to do a similar thing in the
Poulsbo driver and it turned out that we could save a significant amount of
CPU by using a delayed workqueue, collecting objects and destroying them
periodically.
At a first glance:
1) We probably *will* need a delayed-destroy workqueue to avoid wasting
memory that otherwise should be freed to the system. At the very least, the
delayed delete process should optionally be run by a system shrinker.
You are right. For VRAM we don't care since we are the
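For reference, hooking such a pass into the kernel's memory shrinker could look roughly like the sketch below, using the shrinker callback signature of the 2.6.3x kernels current at the time of this thread; ttm_run_delayed_destroy() and ttm_ddestroy_pending_count() are stand-ins, not existing TTM symbols:

/* Sketch: let memory pressure force the delayed-destroy pass to run. */
static int ttm_ddestroy_shrink(int nr_to_scan, gfp_t gfp_mask)
{
    if (nr_to_scan)
        ttm_run_delayed_destroy();      /* drain the delayed-delete list */

    /* Tell the VM how many dead objects are still waiting on fences. */
    return ttm_ddestroy_pending_count();
}

static struct shrinker ttm_ddestroy_shrinker = {
    .shrink = ttm_ddestroy_shrink,
    .seeks  = DEFAULT_SEEKS,
};

/* register_shrinker(&ttm_ddestroy_shrinker) at init time,
 * unregister_shrinker(&ttm_ddestroy_shrinker) on teardown. */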
Luca Barbieri l...@luca-barbieri.com writes:
At a first glance:
1) We probably *will* need a delayed-destroy workqueue to avoid wasting
memory that otherwise should be freed to the system. At the very least, the
delayed delete process should optionally be run by a system shrinker.
You are right. For VRAM we don't care since we are the
Nvidia cards have a synchronization primitive that could be used to
synchronize several FIFOs in hardware (AKA semaphores, see [1] for an
example).
Does this operate wholly on the GPU on all nVidia cards?
It seems that at least on some GPUs this will trigger software
methods that are
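Conceptually, the primitive under discussion behaves roughly as sketched below: one channel releases (writes) a value to a shared location and another channel acquires (stalls until) that value, all on the GPU in the ideal case. The channel type and the emit_semaphore_release()/emit_semaphore_acquire() helpers are purely illustrative and do not correspond to real nouveau method names:

struct channel;         /* stand-in for the driver's FIFO/channel object */

/* Illustrative only: cross-channel ordering with a GPU semaphore. */
struct gpu_semaphore {
    u64 gpu_addr;       /* memory location visible to both channels */
    u32 value;          /* sequence value to hand off */
};

static void producer_signal(struct channel *prod, struct gpu_semaphore *sem)
{
    /* After the preceding commands complete, write sem->value to memory. */
    emit_semaphore_release(prod, sem->gpu_addr, sem->value);    /* hypothetical */
}

static void consumer_wait(struct channel *cons, struct gpu_semaphore *sem)
{
    /* Stall this FIFO, ideally without CPU involvement, until the value appears. */
    emit_semaphore_acquire(cons, sem->gpu_addr, sem->value);    /* hypothetical */
}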
Luca Barbieri wrote:
At a first glance:
1) We probably *will* need a delayed-destroy workqueue to avoid wasting
memory that otherwise should be freed to the system. At the very least, the
delayed delete process should optionally be run by a system shrinker.
You are right. For VRAM we don't care since we are the
Luca Barbieri l...@luca-barbieri.com writes:
Nvidia cards have a synchronization primitive that could be used to
synchronize several FIFOs in hardware (AKA semaphores, see [1] for an
example).
Does this operate wholly on the GPU on all nVidia cards?
It seems that at least on some GPUs this will trigger software
methods that are
I'm not sure I understand your proposal correctly.
It seems your proposoal is similar to mine, replacing the term fence
nodes with ttm transactions, but I'm not sure if I understand it
correctly.
Here is some pseudocode for an improved, simplified version of my proposal.
It is modified so that
If not, it could possibly be hacked around by reading from a DMA
object at the address of the fence sequence number and then resizing
the DMA object so that addresses from a certain point on would trigger
a protection fault interrupt.
I don't think you can safely modify a DMA object without
On Thu, Jan 21, 2010 at 3:44 PM, Francisco Jerez curroje...@riseup.net wrote:
Luca Barbieri l...@luca-barbieri.com writes:
Nvidia cards have a synchronization primitive that could be used to
synchronize several FIFOs in hardware (AKA semaphores, see [1] for an
example).
Does this operate wholly on the GPU on all nVidia cards?
Luca Barbieri l...@luca-barbieri.com writes:
If not, it could possibly be hacked around by reading from a DMA
object at the address of the fence sequence number and then resizing
the DMA object so that addresses from a certain point on would trigger
a protection fault interrupt.
I don't think you can safely modify a DMA object without
Thomas Hellstrom wrote:
Yes, it looks correct. Although it seems a little unintuitive to enter
the loop with the spinlock held, and exit it with the spinlock not held.
I've attached yet another patch to have that fixed. Could you take a
look at whether it seems OK with you and, in that case,
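The restructuring being discussed amounts to making the lock state symmetric around the loop, roughly as in the sketch below (lru_lock stands for the relevant TTM spinlock; the loop body is elided):

/* Sketch: take and drop the spinlock symmetrically around the loop
 * instead of entering with it held and leaving with it dropped. */
spin_lock(&glob->lru_lock);
while (!list_empty(&bdev->ddestroy)) {
    /* ... pick an entry and take a reference on it ... */

    spin_unlock(&glob->lru_lock);
    /* ... wait for the fence / destroy the buffer, possibly sleeping ... */
    spin_lock(&glob->lru_lock);
}
spin_unlock(&glob->lru_lock);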
Thomas Hellstrom wrote:
Thomas Hellstrom wrote:
Yes, it looks correct. Although it seems a little unintuitive to enter
the loop with the spinlock held, and exit it with the spinlock not held.
I've attached yet another patch to have that fixed. Could you take a
look at whether it seems OK with you and, in that case,
Yes it's fine. I sent your patch to Dave with an expanded commit
comment for merging.
Here is a possible redesign of the mechanism inspired by this issue.
It seems that what we are racing against is buffer eviction, due to
delayed deletion buffers being still kept on the LRU list.
I'm wondering
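One way to picture the race: the eviction path walks the LRU and can pick a buffer that is simultaneously queued on the delayed-destroy list. A possible eviction-side check, sketched assuming the per-manager LRU list and the per-buffer ddestroy list head TTM used at the time:

/* Sketch: when scanning the LRU for something to evict, skip buffers
 * that are already queued for delayed destruction. */
list_for_each_entry(bo, &man->lru, lru) {
    if (!list_empty(&bo->ddestroy))
        continue;   /* let the delayed-delete path reap it instead */
    /* ... otherwise try to reserve and evict this buffer ... */
}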
Also note that the delayed delete list is not in fence order but in
deletion-time order, which perhaps gives room for more optimizations.
You are right.
I think then that ttm_bo_delayed_delete may still need to be changed,
because it stops when ttm_bo_cleanup_refs returns -EBUSY, which
happens
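In code form, the suggested change is roughly the sketch below: keep walking the delayed-destroy list instead of bailing out at the first busy entry. Locking and the kref handling from the actual patch are omitted, and the ttm_bo_cleanup_refs(bo, remove_all) signature is assumed from the TTM code of that era:

/* Sketch: don't stop the whole pass because one entry is still busy. */
static void ttm_bo_delayed_delete_sketch(struct ttm_bo_device *bdev,
                                         bool remove_all)
{
    struct ttm_buffer_object *bo, *tmp;

    list_for_each_entry_safe(bo, tmp, &bdev->ddestroy, ddestroy) {
        if (ttm_bo_cleanup_refs(bo, remove_all) == -EBUSY)
            continue;   /* fence not signaled yet, revisit on the next pass */
    }
}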
When designing this, we should also keep in mind that some drivers
(e.g. nouveau) have multiple FIFO channels, and thus we would like a
buffer to be referenced for reading by multiple channels at once (and
be destroyed only when all fences are expired, obviously).
Also, hardware may support on-GPU
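A minimal sketch of what per-buffer tracking of multiple read fences could look like; the structure and helper names (drv_bo_fences, drv_fence_ref, drv_fence_signaled()) are made up for illustration and are not the nouveau or TTM data structures:

/* Sketch: a buffer may be read by several FIFO channels at once, so it
 * carries one fence per referencing channel and is idle only when all
 * of them have signaled. */
struct drv_fence_ref {
    struct list_head head;
    struct drv_fence *fence;    /* fence emitted on one channel */
};

struct drv_bo_fences {
    struct list_head fences;    /* list of struct drv_fence_ref */
    spinlock_t lock;
};

static bool drv_bo_idle(struct drv_bo_fences *f)
{
    struct drv_fence_ref *ref;
    bool idle = true;

    spin_lock(&f->lock);
    list_for_each_entry(ref, &f->fences, head) {
        if (!drv_fence_signaled(ref->fence)) {  /* hypothetical helper */
            idle = false;
            break;
        }
    }
    spin_unlock(&f->lock);
    return idle;
}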
Luca Barbieri wrote:
Also note that the delayed delete list is not in fence order but in
deletion-time order, which perhaps gives room for more optimizations.
You are right.
I think then that ttm_bo_delayed_delete may still need to be changed,
because it stops when ttm_bo_cleanup_refs returns -EBUSY, which happens
Luca Barbieri wrote:
When designing this, we should also keep in mind that some drivers
(e.g. nouveau) have multiple FIFO channels, and thus we would like a
buffer to be referenced for reading by multiple channels at once (and
be destroyed only when all fences are expired, obviously).
Also, hardware may support on-GPU
Luca,
Good catch.
Some comments inline:
Luca Barbieri wrote:
+ entry = list_first_entry(&bdev->ddestroy,
+     struct ttm_buffer_object, ddestroy);
+ kref_get(&entry->list_kref);
- if (next != &bdev->ddestroy) {
-     nentry = list_entry(next,