On Wed, 13 Feb 2008, Jason Gunthorpe wrote:
> Christoph: It seemed to me you were first talking about
> freeing/swapping/faulting RDMA'able pages - but would pure migration
> as a special hardware supported case be useful like Caitlin suggested?
That is a special case of the proposed solution. Yo
On Fri, 15 Feb 2008, Caitlin Bestler wrote:
> There isn't much point in the RDMA layer subscribing to mmu
> notifications
> if the specific RDMA device will not be able to react appropriately when
> the notification occurs. I don't see how you get around needing to know
> which devices are capable
Christoph Lameter wrote:
>
> > Merely mlocking pages deals with the end-to-end RDMA semantics.
> > What still needs to be addressed is how a fastpath interface
> > would dynamically pin and unpin. Yielding pins for short-term
> > suspensions (and flushing cached translations) deals with the
> > res
On Fri, 15 Feb 2008, Caitlin Bestler wrote:
> So that would mean that mlock is used by the application before it
> registers memory for direct access, and then it is up to the RDMA
> layer and the OS to negotiate actual pinning of the addresses for
> whatever duration is required.
Right.
> The
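In libibverbs terms, that "mlock first, then register" flow might look roughly like the sketch below. This is illustrative only (function and variable names such as register_wired_buffer, pd and buf are made up, error handling is trimmed); note that ibv_reg_mr() today also pins the pages for DMA, which is exactly the pin the thread proposes letting the OS and RDMA layer renegotiate.

/* Sketch: application wires the buffer, then hands it to the verbs layer. */
#include <infiniband/verbs.h>
#include <sys/mman.h>
#include <stdlib.h>

struct ibv_mr *register_wired_buffer(struct ibv_pd *pd, size_t len)
{
    void *buf;

    if (posix_memalign(&buf, 4096, len))
        return NULL;

    /* application-level promise that the pages stay resident */
    if (mlock(buf, len)) {
        free(buf);
        return NULL;
    }

    /* RDMA-level registration; today this also pins the pages for DMA */
    return ibv_reg_mr(pd, buf, len,
                      IBV_ACCESS_LOCAL_WRITE |
                      IBV_ACCESS_REMOTE_READ |
                      IBV_ACCESS_REMOTE_WRITE);
}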
On Fri, 15 Feb 2008, Caitlin Bestler wrote:
> > What does it mean that the "application layer has to determine what
> > pages are registered"? The application does not know which of its
> > pages are currently in memory. It can only force these pages to stay
> > in memory if they are mlocked
Christoph Lameter asked:
>
> What does it mean that the "application layer has to determine what
> pages are registered"? The application does not know which of its
> pages are currently in memory. It can only force these pages to stay
> in memory if they are mlocked.
>
An application that a
On Thu, 14 Feb 2008, Caitlin Bestler wrote:
> So any solution that requires the upper layers to suspend operations
> for a brief bit will require explicit interaction with those layers.
> No RDMA layer can perform the sleight of hand tricks that you seem
> to want it to perform.
Looks like it has
On Thu, 14 Feb 2008, Caitlin Bestler wrote:
> I have no problem with that, as long as the application layer is responsible
> for
> tearing down and re-establishing the connections. The RDMA/transport layers
> are incapable of tearing down and re-establishing a connection transparently
> because c
On Thu, Feb 14, 2008 at 12:20 PM, Christoph Lameter <[EMAIL PROTECTED]> wrote:
> On Thu, 14 Feb 2008, Caitlin Bestler wrote:
>
> > So suspend/resume to re-arrange pages is one thing. Suspend/resume to cover
> > swapping out pages so they can be reallocated is an exercise in futility.
> By the
>
On Thu, 14 Feb 2008, Caitlin Bestler wrote:
> So suspend/resume to re-arrange pages is one thing. Suspend/resume to cover
> swapping out pages so they can be reallocated is an exercise in futility. By
> the
> time you resume the connections will be broken or at the minimum damaged.
The connectio
On Thu, Feb 14, 2008 at 11:39 AM, Christoph Lameter <[EMAIL PROTECTED]> wrote:
> On Thu, 14 Feb 2008, Steve Wise wrote:
>
> > Note that for T3, this involves suspending _all_ rdma connections that
> > are in the same PD as the MR being remapped. This is because the driver
> > doesn't know
On Thu, 14 Feb 2008, Steve Wise wrote:
> Note that for T3, this involves suspending _all_ rdma connections that are in
> the same PD as the MR being remapped. This is because the driver doesn't know
> who the application advertised the rkey/stag to. So without that knowledge,
> all connections t
On Wed, 13 Feb 2008, Kanoj Sarcar wrote:
> Oh ok, yes, I did see the discussion on this; sorry I
> missed it. I do see what notifiers bring to the table
> now (without endorsing it :-)).
>
> An orthogonal question is this: is IB/rdma the only
> "culprit" that elevates page refcounts? Are there no
On Thu, Feb 14, 2008 at 8:23 AM, Steve Wise <[EMAIL PROTECTED]> wrote:
> Robin Holt wrote:
> > On Thu, Feb 14, 2008 at 09:09:08AM -0600, Steve Wise wrote:
> >> Note that for T3, this involves suspending _all_ rdma connections that are
> >> in the same PD as the MR being remapped. This is becaus
Robin Holt wrote:
> On Thu, Feb 14, 2008 at 09:09:08AM -0600, Steve Wise wrote:
>> Note that for T3, this involves suspending _all_ rdma connections that are
>> in the same PD as the MR being remapped. This is because the driver
>> doesn't know who the application advertised the rkey/stag to. S
On Thu, Feb 14, 2008 at 09:09:08AM -0600, Steve Wise wrote:
> Note that for T3, this involves suspending _all_ rdma connections that are
> in the same PD as the MR being remapped. This is because the driver
> doesn't know who the application advertised the rkey/stag to. So without
Is there a
Felix Marti wrote:
>
> That is correct, not a change we can make for T3. We could, in theory,
> deal with changing mappings though. The change would need to be
> synchronized though: the VM would need to tell us which mappings were
> about to change and the driver would then need to disable DMA to
Hi Kanoj,
On Wed, Feb 13, 2008 at 03:43:17PM -0800, Kanoj Sarcar wrote:
> Oh ok, yes, I did see the discussion on this; sorry I
> missed it. I do see what notifiers bring to the table
> now (without endorsing it :-)).
I'm not really sure livelocks are the big issue here.
I'm running N 1G VMs on
On Wed, Feb 13, 2008 at 06:23:08PM -0500, Pete Wyckoff wrote:
> [EMAIL PROTECTED] wrote on Tue, 12 Feb 2008 20:09 -0800:
> > One other area that has not been brought up yet (I think) is the
> > applicability of notifiers in letting users know when pinned memory
> > is reclaimed by the kernel. This
--- Christoph Lameter <[EMAIL PROTECTED]> wrote:
> On Wed, 13 Feb 2008, Kanoj Sarcar wrote:
>
> > It seems that the need is to solve potential memory
> > shortage and overcommit issues by being able to
> > reclaim pages pinned by rdma driver/hardware. Is my
> > understanding correct?
>
> Correct.
[EMAIL PROTECTED] wrote on Tue, 12 Feb 2008 20:09 -0800:
> One other area that has not been brought up yet (I think) is the
> applicability of notifiers in letting users know when pinned memory
> is reclaimed by the kernel. This is useful when a lower-level
> library employs lazy deregistration st
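The "lazy deregistration" being referred to is, in outline, a userspace registration cache: the library keeps ibv_reg_mr() results around after the application is done with a buffer and only deregisters them later. The sketch below is illustrative (not taken from any particular MPI library; cached_reg_mr and cache_invalidate are made-up names) and shows why such a library needs to hear when the kernel reclaims or remaps the pinned pages behind a cached MR.

#include <infiniband/verbs.h>
#include <stdlib.h>

struct reg_cache_entry {
    void                   *addr;
    size_t                  len;
    struct ibv_mr          *mr;
    struct reg_cache_entry *next;
};

static struct reg_cache_entry *cache_head;

/* Look up a cached registration covering [addr, addr+len); register on miss. */
struct ibv_mr *cached_reg_mr(struct ibv_pd *pd, void *addr, size_t len)
{
    struct reg_cache_entry *e;

    for (e = cache_head; e; e = e->next)
        if ((char *)addr >= (char *)e->addr &&
            (char *)addr + len <= (char *)e->addr + e->len)
            return e->mr;               /* hit: reuse the pinned registration */

    e = calloc(1, sizeof(*e));
    if (!e)
        return NULL;
    e->mr = ibv_reg_mr(pd, addr, len,
                       IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ);
    if (!e->mr) {
        free(e);
        return NULL;
    }
    e->addr = addr;
    e->len  = len;
    e->next = cache_head;
    cache_head = e;
    return e->mr;
}

/* The danger discussed in the thread: if the application frees or remaps
 * this memory, a cached MR still points at the old pages.  A notifier-driven
 * callback from the kernel would be the hook for dropping entries like this. */
void cache_invalidate(void *addr, size_t len)
{
    struct reg_cache_entry **pp = &cache_head, *e;

    while ((e = *pp)) {
        if ((char *)addr < (char *)e->addr + e->len &&
            (char *)addr + len > (char *)e->addr) {
            *pp = e->next;
            ibv_dereg_mr(e->mr);
            free(e);
        } else {
            pp = &e->next;
        }
    }
}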
On Wed, 13 Feb 2008, Kanoj Sarcar wrote:
> It seems that the need is to solve potential memory
> shortage and overcommit issues by being able to
> reclaim pages pinned by rdma driver/hardware. Is my
> understanding correct?
Correct.
> If I do understand correctly, then why is rdma page
> pinning
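For concreteness, the pinning in question looks roughly like this on the driver side. This is a sketch, not code from any driver in the thread; pin_mr_pages is a made-up name, and the helpers follow current kernels (pin_user_pages_fast/unpin_user_pages), whereas drivers of this era used get_user_pages()/put_page() for the same effect.

#include <linux/mm.h>
#include <linux/slab.h>

/* Take long-term references on the user pages backing an MR.  These
 * references are what keep reclaim and migration away from the pages. */
static struct page **pin_mr_pages(unsigned long uaddr, int npages)
{
    struct page **pages;
    int got;

    pages = kcalloc(npages, sizeof(*pages), GFP_KERNEL);
    if (!pages)
        return NULL;

    got = pin_user_pages_fast(uaddr, npages,
                              FOLL_WRITE | FOLL_LONGTERM, pages);
    if (got != npages) {
        if (got > 0)
            unpin_user_pages(pages, got);
        kfree(pages);
        return NULL;
    }
    return pages;   /* refcounts stay elevated until unpin_user_pages() */
}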
--- Christoph Lameter <[EMAIL PROTECTED]> wrote:
> On Wed, 13 Feb 2008, Christian Bell wrote:
>
> > not always be in the thousands but you're still claiming scalability
> > for a mechanism that essentially logs who accesses the regions. Then
> > there's the fact that reclaim becomes a colle
On Wed, 13 Feb 2008, Jason Gunthorpe wrote:
> Unfortunately it really has little to do with the drivers - changes,
> for instance, need to be made to support this in the user space MPI
> libraries. The RDMA ops do not pass through the kernel, userspace
> talks directly to the hardware which compli
On Wed, 13 Feb 2008, Christian Bell wrote:
> not always be in the thousands but you're still claiming scalability
> for a mechanism that essentially logs who accesses the regions. Then
> there's the fact that reclaim becomes a collective communication
> operation over all region accessors. Makes
On Wed, Feb 13, 2008 at 10:51:58AM -0800, Christoph Lameter wrote:
> On Tue, 12 Feb 2008, Jason Gunthorpe wrote:
>
> > But this isn't how IB or iwarp work at all. What you describe is a
> > significant change to the general RDMA operation and requires changes to
> > both sides of the connection an
On Wed, 13 Feb 2008, Christoph Lameter wrote:
> Right. We (SGI) have done something like this for a long time with XPmem
> and it scales ok.
I'd dispute this based on experience developing PGAS language support
on the Altix but more importantly (and less subjectively), I think
that "scales ok" r
On Wed, 13 Feb 2008, Christoph Raisch wrote:
> For ehca we currently can't modify a large MR when it has been allocated.
> EHCA Hardware expects the pages to be there (MRs must not have "holes").
> This is also true for the global MR covering all kernel space.
> Therefore we still need the memory
On Tue, 12 Feb 2008, Christian Bell wrote:
> You're arguing that a HW page table is not needed by describing a use
> case that is essentially what all RDMA solutions already do above the
> wire protocols (all solutions except Quadrics, of course).
The HW page table is not essential to the notific
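The notification scheme itself boils down to a driver registering callbacks with the mm it maps. The bare-bones sketch below uses the mmu_notifier interface roughly as it was later merged (around 2.6.27), so signatures may differ from the exact patch under discussion here; the my_* names are placeholders and the quiesce step is only a stub.

#include <linux/mmu_notifier.h>
#include <linux/mm.h>

struct my_rdma_mm {
    struct mmu_notifier mn;
    /* per-process device state: cached translations, MR tracking, ... */
};

/* Called before the VM unmaps or moves pages in [start, end): stop DMA
 * into the range and drop any cached translations covering it. */
static void my_invalidate_range_start(struct mmu_notifier *mn,
                                      struct mm_struct *mm,
                                      unsigned long start, unsigned long end)
{
    struct my_rdma_mm *ctx = container_of(mn, struct my_rdma_mm, mn);

    /* placeholder: quiesce the HW and flush translations for the range */
    (void)ctx;
}

/* Called once the VM operation is complete: re-fault or re-pin lazily. */
static void my_invalidate_range_end(struct mmu_notifier *mn,
                                    struct mm_struct *mm,
                                    unsigned long start, unsigned long end)
{
}

static const struct mmu_notifier_ops my_mn_ops = {
    .invalidate_range_start = my_invalidate_range_start,
    .invalidate_range_end   = my_invalidate_range_end,
};

int my_rdma_mm_attach(struct my_rdma_mm *ctx, struct mm_struct *mm)
{
    ctx->mn.ops = &my_mn_ops;
    return mmu_notifier_register(&ctx->mn, mm);
}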
On Tue, 12 Feb 2008, Jason Gunthorpe wrote:
> But this isn't how IB or iwarp work at all. What you describe is a
> significant change to the general RDMA operation and requires changes to
> both sides of the connection and the wire protocol.
Yes it may require a separate connection between both s
> > > Chelsio's T3 HW doesn't support this.
For ehca we currently can't modify a large MR when it has been allocated.
EHCA Hardware expects the pages to be there (MRs must not have "holes").
This is also true for the global MR covering all kernel space.
Therefore we still need the memory to be
On Tue, 12 Feb 2008, Christoph Lameter wrote:
> On Tue, 12 Feb 2008, Jason Gunthorpe wrote:
>
> > The problem is that the existing wire protocols do not have a
> > provision for doing an 'are you ready' or 'I am not ready' exchange
> > and they are not designed to store page tables on both sides
On Tue, Feb 12, 2008 at 06:35:09PM -0800, Christoph Lameter wrote:
> On Tue, 12 Feb 2008, Jason Gunthorpe wrote:
>
> > The problem is that the existing wire protocols do not have a
> > provision for doing an 'are you ready' or 'I am not ready' exchange
> > and they are not designed to store page t
On Tue, 12 Feb 2008, Jason Gunthorpe wrote:
> The problem is that the existing wire protocols do not have a
> provision for doing an 'are you ready' or 'I am not ready' exchange
> and they are not designed to store page tables on both sides as you
> propose. The remote side can send RDMA WRITE tra
On Tue, 12 Feb 2008, Christian Bell wrote:
> I think there are potential clients of the interface when an
> optimistic approach is used. Part of the trick, however, has to do
> with being able to re-start transfers instead of buffering the data
> or making guarantees about delivery that coul
On Tue, 12 Feb 2008, Christoph Lameter wrote:
> On Tue, 12 Feb 2008, Jason Gunthorpe wrote:
>
> > Well, certainly today the memfree IB devices store the page tables in
> > host memory so they are already designed to hang onto packets during
> > the page lookup over PCIE, adding in faulting makes
Jason Gunthorpe wrote:
> On Tue, Feb 12, 2008 at 05:01:17PM -0800, Christoph Lameter wrote:
>> On Tue, 12 Feb 2008, Jason Gunthorpe wrote:
>>
>>> Well, certainly today the memfree IB devices store the page tables in
>>> host memory so they are already designed to hang onto packets during
>>> the pa
On Tue, Feb 12, 2008 at 05:01:17PM -0800, Christoph Lameter wrote:
> On Tue, 12 Feb 2008, Jason Gunthorpe wrote:
>
> > Well, certainly today the memfree IB devices store the page tables in
> > host memory so they are already designed to hang onto packets during
> > the page lookup over PCIE, addin
On Tue, 12 Feb 2008, Jason Gunthorpe wrote:
> Well, certainly today the memfree IB devices store the page tables in
> host memory so they are already designed to hang onto packets during
> the page lookup over PCIE, adding in faulting makes this time
> larger.
You really do not need a page table
On Tue, 12 Feb 2008, Felix Marti wrote:
> > I don't know anything about the T3 internals, but it's not clear that
> > you could do this without a new chip design in general. Lots of RDMA
> > devices were designed expecting that when a packet arrives, the HW can
> > look up the bus address for a
On Tue, 12 Feb 2008, Roland Dreier wrote:
> I don't know anything about the T3 internals, but it's not clear that
> you could do this without a new chip design in general. Lots of RDMA
> devices were designed expecting that when a packet arrives, the HW can
> look up the bus address for a given
On Tue, Feb 12, 2008 at 02:41:48PM -0800, Roland Dreier wrote:
> > > Chelsio's T3 HW doesn't support this.
>
> > Not so far I guess but it could be equipped with these features right?
>
> I don't know anything about the T3 internals, but it's not clear that
> you could do this without a new ch
> > Chelsio's T3 HW doesn't support this.
> Not so far I guess but it could be equipped with these features right?
I don't know anything about the T3 internals, but it's not clear that
you could do this without a new chip design in general. Lots of RDMA
devices were designed expecting that w