Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-04-05 Thread Michael R. Hines
FYI, I used the following Red Hat cgroups instructions to test whether overcommit + RDMA was working: https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-memory.html - Michael On 03/21/2013 02:11 AM, Michael S. Tsirkin wrote: On Tue, Mar

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-04-05 Thread Michael R. Hines
On 03/21/2013 02:11 AM, Michael S. Tsirkin wrote: On Tue, Mar 19, 2013 at 01:49:34PM -0400, Michael R. Hines wrote: I also did a test using RDMA + cgroup, and the kernel killed my QEMU :) So, infiniband is not smart enough to know how to avoid pinning a zero page, I guess. - Michael On 03/19/

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-21 Thread Michael R. Hines
Very nice catch. Yes, I didn't think about that. Thanks. On 03/21/2013 02:11 AM, Michael S. Tsirkin wrote: It really shouldn't break COW if you don't request LOCAL_WRITE. I think it's a kernel bug, and apparently has been there in the code since the first version: get_user_pages parameters swa

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-20 Thread Michael S. Tsirkin
On Tue, Mar 19, 2013 at 01:49:34PM -0400, Michael R. Hines wrote: > I also did a test using RDMA + cgroup, and the kernel killed my QEMU :) > > So, infiniband is not smart enough to know how to avoid pinning a > zero page, I guess. > > - Michael > > On 03/19/2013 01:14 PM, Paolo Bonzini wrote: >

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-20 Thread Michael S. Tsirkin
On Wed, Mar 20, 2013 at 04:56:01PM -0400, Michael R. Hines wrote: > > Forgive me, vmsplice system call? Or some other interface? > > I'm not following.. > > On 03/20/2013 04:46 PM, Michael S. Tsirkin wrote: > >On Wed, Mar 20, 2013 at 04:39:00PM -0400, Michael R. Hines wrote: > >>Unmapped vir

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-20 Thread Michael R. Hines
Forgive me, vmsplice system call? Or some other interface? I'm not following.. On 03/20/2013 04:46 PM, Michael S. Tsirkin wrote: On Wed, Mar 20, 2013 at 04:39:00PM -0400, Michael R. Hines wrote: Unmapped virtual addresses cannot be pinned for RDMA (the hardware will break), but there's no

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-20 Thread Michael S. Tsirkin
On Wed, Mar 20, 2013 at 04:45:05PM -0400, Michael R. Hines wrote: > On 03/20/2013 04:37 PM, Michael S. Tsirkin wrote: > >On Wed, Mar 20, 2013 at 04:24:14PM -0400, Michael R. Hines wrote: > >>On 03/20/2013 11:55 AM, Michael S. Tsirkin wrote: > >>>Then, later, in a separate patch, I can implement /de

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-20 Thread Michael R. Hines
On 03/20/2013 04:37 PM, Michael S. Tsirkin wrote: On Wed, Mar 20, 2013 at 04:24:14PM -0400, Michael R. Hines wrote: On 03/20/2013 11:55 AM, Michael S. Tsirkin wrote: Then, later, in a separate patch, I can implement /dev/pagemap support. When that's done, RDMA dynamic registration will actuall

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-20 Thread Michael S. Tsirkin
On Wed, Mar 20, 2013 at 04:39:00PM -0400, Michael R. Hines wrote: > Unmapped virtual addresses cannot be pinned for RDMA (the hardware > will break), > but there's no way to know they are unmapped without checking > another data structure. So for RDMA, when you try to register them, this will faul

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-20 Thread Michael R. Hines
Agreed. Very useful for KSM. Unmapped virtual addresses cannot be pinned for RDMA (the hardware will break), but there's no way to know they are unmapped without checking another data structure. - Michael On 03/20/2013 04:31 PM, Michael S. Tsirkin wrote: OK sure, this could be useful to de

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-20 Thread Michael S. Tsirkin
On Wed, Mar 20, 2013 at 04:24:14PM -0400, Michael R. Hines wrote: > On 03/20/2013 11:55 AM, Michael S. Tsirkin wrote: > >Then, later, in a separate patch, I can implement /dev/pagemap support. > > > >When that's done, RDMA dynamic registration will actually take effect and > >benefit from actually

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-20 Thread Michael S. Tsirkin
On Wed, Mar 20, 2013 at 04:20:06PM -0400, Michael R. Hines wrote: > On 03/20/2013 03:06 PM, Michael S. Tsirkin wrote: > > No, not just ballooning. Overcommit (i.e. cgroups). > > Anytime cgroups kicks out a page (or anytime the balloon kicks in), > the page would become unmapped. > >

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-20 Thread Michael R. Hines
On 03/20/2013 11:55 AM, Michael S. Tsirkin wrote: Then, later, in a separate patch, I can implement /dev/pagemap support. When that's done, RDMA dynamic registration will actually take effect and benefit from actually verifying that the page is mapped or not. - Michael Mapped into guest? You me

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-20 Thread Michael R. Hines
On 03/20/2013 03:06 PM, Michael S. Tsirkin wrote: No, not just ballooning. Overcommit (i.e. cgroups). Anytime cgroups kicks out a page (or anytime the balloon kicks in), the page would become unmapped. OK but we still need to send that page to remote. It's in swap but has guest data in there, yo

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-20 Thread Michael S. Tsirkin
On Wed, Mar 20, 2013 at 12:08:40PM -0400, Michael R. Hines wrote: > > On 03/20/2013 11:55 AM, Michael S. Tsirkin wrote: > >On Wed, Mar 20, 2013 at 11:15:48AM -0400, Michael R. Hines wrote: > >>OK, can we make a deal? =) > >> > >>I'm willing to put in the work to perform the dynamic registration >

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-20 Thread Michael R. Hines
On 03/20/2013 11:55 AM, Michael S. Tsirkin wrote: On Wed, Mar 20, 2013 at 11:15:48AM -0400, Michael R. Hines wrote: OK, can we make a deal? =) I'm willing to put in the work to perform the dynamic registration on the destination side, but let's go a step further and piggy-back on the effort:

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-20 Thread Michael S. Tsirkin
On Wed, Mar 20, 2013 at 11:15:48AM -0400, Michael R. Hines wrote: > OK, can we make a deal? =) > > I'm willing to put in the work to perform the dynamic registration > on the destination side, > but let's go a step further and piggy-back on the effort: > > We need to couple this registration with

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-20 Thread Michael R. Hines
s / is page mapped?/ is page unmapped?/ g On 03/20/2013 11:15 AM, Michael R. Hines wrote: OK, can we make a deal? =) I'm willing to put in the work to perform the dynamic registration on the destination side, but let's go a step further and piggy-back on the effort: We need to couple this r

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-20 Thread Michael R. Hines
OK, can we make a deal? =) I'm willing to put in the work to perform the dynamic registration on the destination side, but let's go a step further and piggy-back on the effort: We need to couple this registration with a very small modification to save_ram_block(): Currently, save_ram_block

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-20 Thread Michael S. Tsirkin
On Tue, Mar 19, 2013 at 06:52:59PM +0100, Paolo Bonzini wrote: > Il 19/03/2013 18:40, Michael R. Hines ha scritto: > > registration scheme would not work with cgroups because we would be > > attempting to pin zero pages (for no reason) that cgroups has already > > kicked out, which would defeat th

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-19 Thread Michael R. Hines
On 03/19/2013 01:52 PM, Paolo Bonzini wrote: So, if I submit a separate patch to fix this, would you guys review it? (Using /dev/pagemap). Sorry about the ignorance, but what is /dev/pagemap? :) /dev/pagemap is a recent interface for userland accesses to the pagetables. https://www.kernel.org/d
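
For readers following the pagemap idea, here is a minimal sketch of how a userland process could ask whether one of its own pages is mapped before trying to register it. It assumes the interface meant above is the Linux per-process pagemap file, which is exposed as /proc/self/pagemap (not under /dev), with one 64-bit entry per virtual page where bit 63 means "present in RAM" and bit 62 means "swapped". The names page_state, decode_pagemap_entry() and query_page_state() are hypothetical and are not taken from the patch series.

```c
#define _XOPEN_SOURCE 700   /* for pread() */
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit layout assumed from the kernel's pagemap documentation. */
#define PM_PRESENT (UINT64_C(1) << 63)   /* page is resident in RAM */
#define PM_SWAPPED (UINT64_C(1) << 62)   /* page is in swap */

/* Hypothetical result type for this sketch. */
enum page_state {
    PAGE_UNKNOWN  = -1,  /* pagemap could not be read */
    PAGE_UNMAPPED = 0,   /* neither present nor swapped */
    PAGE_PRESENT  = 1,
    PAGE_SWAPPED  = 2
};

/* Classify a single raw 64-bit pagemap entry. */
static enum page_state decode_pagemap_entry(uint64_t entry)
{
    if (entry & PM_PRESENT) {
        return PAGE_PRESENT;
    }
    if (entry & PM_SWAPPED) {
        return PAGE_SWAPPED;
    }
    return PAGE_UNMAPPED;
}

/* Look up the pagemap entry covering addr in the calling process. */
static enum page_state query_page_state(const void *addr)
{
    long page_size = sysconf(_SC_PAGESIZE);
    uint64_t entry;
    off_t offset;
    ssize_t n;
    int fd;

    if (page_size <= 0) {
        return PAGE_UNKNOWN;
    }

    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) {
        return PAGE_UNKNOWN;
    }

    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    offset = (off_t)((uintptr_t)addr / (uintptr_t)page_size)
             * (off_t)sizeof(entry);
    n = pread(fd, &entry, sizeof(entry), offset);
    close(fd);

    if (n != (ssize_t)sizeof(entry)) {
        return PAGE_UNKNOWN;
    }
    return decode_pagemap_entry(entry);
}
```

A caller could skip pinning any page for which query_page_state() returns PAGE_UNMAPPED. A swapped page would still have to be handled separately, because it holds guest data that must reach the destination.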

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-19 Thread Paolo Bonzini
Il 19/03/2013 18:40, Michael R. Hines ha scritto: > registration scheme would not work with cgroups because we would be > attempting to pin zero pages (for no reason) that cgroups has already > kicked out, which would defeat the purpose of using cgroups. Yeah, pinning would be a problem. > So, i

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-19 Thread Michael R. Hines
I also did a test using RDMA + cgroup, and the kernel killed my QEMU :) So, infiniband is not smart enough to know how to avoid pinning a zero page, I guess. - Michael On 03/19/2013 01:14 PM, Paolo Bonzini wrote: Il 19/03/2013 18:09, Michael R. Hines ha scritto: Allowing QEMU to swap due to

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-19 Thread Michael R. Hines
OK, so I did a quick test and the cgroup does appear to be working correctly for zero pages. Nevertheless, this still doesn't solve the chunk registration problem for RDMA. Even with a cgroup on the sender *or* receiver side, there is no API that I know that would correctly indicate to the m

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-19 Thread Michael S. Tsirkin
On Tue, Mar 19, 2013 at 06:14:45PM +0100, Paolo Bonzini wrote: > Il 19/03/2013 18:09, Michael R. Hines ha scritto: > > Allowing QEMU to swap due to a cgroup limit during migration is a viable > > overcommit option? > > > > I'm trying to keep an open mind, but that would kill the migration > > time

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-19 Thread Paolo Bonzini
Il 19/03/2013 18:09, Michael R. Hines ha scritto: > Allowing QEMU to swap due to a cgroup limit during migration is a viable > overcommit option? > > I'm trying to keep an open mind, but that would kill the migration > time. Would it swap? Doesn't the kernel back all zero pages with a single

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-19 Thread Michael R. Hines
Allowing QEMU to swap due to a cgroup limit during migration is a viable overcommit option? I'm trying to keep an open mind, but that would kill the migration time. - Michael On 03/19/2013 11:36 AM, Michael S. Tsirkin wrote: On Tue, Mar 19, 2013 at 11:32:49AM -0400, Michael R. Hines wrote

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-19 Thread Michael S. Tsirkin
On Tue, Mar 19, 2013 at 11:32:49AM -0400, Michael R. Hines wrote: > On 03/19/2013 11:16 AM, Michael S. Tsirkin wrote: > >On Tue, Mar 19, 2013 at 11:08:24AM -0400, Michael R. Hines wrote: > >>This is actually a much bigger problem than I thought, not just for RDMA: > >> > >>Currently the *sender* side

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-19 Thread Michael R. Hines
On 03/19/2013 11:16 AM, Michael S. Tsirkin wrote: On Tue, Mar 19, 2013 at 11:08:24AM -0400, Michael R. Hines wrote: This is actually a much bigger problem than I thought, not just for RDMA: Currently the *sender* side does not support overcommit during a regular TCP migration...I assume be

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-19 Thread Michael S. Tsirkin
On Tue, Mar 19, 2013 at 11:08:24AM -0400, Michael R. Hines wrote: > This is actually a much bigger problem than I thought, not just for RDMA: > > Currently the *sender* side does not support overcommit > during a regular TCP migration...I assume because the > migration_bitmap does not know wh

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-19 Thread Michael R. Hines
This is actually a much bigger problem than I thought, not just for RDMA: Currently the *sender* side does not support overcommit during a regular TCP migration...I assume because the migration_bitmap does not know which memory is mapped or unmapped by the host kernel. Is this a known issue

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-19 Thread Michael R. Hines
On 03/19/2013 04:19 AM, Michael S. Tsirkin wrote: We have ways (e.g. cgroups) to limit what a VM can do. If it tries to use more RAM than we let it, it will swap, still making progress, just slower. OTOH it looks like pinning more memory than allowed by the cgroups limit will just get stuck for

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-19 Thread Michael S. Tsirkin
On Mon, Mar 18, 2013 at 07:23:53PM -0400, Michael R. Hines wrote: > On 03/18/2013 05:26 PM, Michael S. Tsirkin wrote: > > > >Probably but I haven't mentioned ballooning at all. > > > >memory overcommit != ballooning > > Sure, then setting ballooning aside for the moment, > then let's just consider

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-18 Thread Michael R. Hines
On 03/18/2013 05:26 PM, Michael S. Tsirkin wrote: Probably but I haven't mentioned ballooning at all. memory overcommit != ballooning Sure, then setting ballooning aside for the moment, then let's just consider regular (unused) virtual memory. In this case, what's wrong with the destination

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-18 Thread Michael S. Tsirkin
On Mon, Mar 18, 2013 at 04:24:44PM -0400, Michael R. Hines wrote: > On 03/18/2013 06:40 AM, Michael S. Tsirkin wrote: > >I think there are two things here, API documentation and protocol > >documentation, protocol documentation still needs some more work. > >Also if what I understand from this docu

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-18 Thread Michael R. Hines
On 03/18/2013 06:40 AM, Michael S. Tsirkin wrote: I think there are two things here, API documentation and protocol documentation, protocol documentation still needs some more work. Also if what I understand from this document is correct this breaks memory overcommit on destination which needs

Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-18 Thread Michael S. Tsirkin
On Sun, Mar 17, 2013 at 11:18:56PM -0400, mrhi...@linux.vnet.ibm.com wrote: > From: "Michael R. Hines" > > This tries to cover all the questions I got the last time. > > Please do tell me what is not clear, and I'll revise again. > > Signed-off-by: Michael R. Hines > --- > docs/rdma.txt | 20

[Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport

2013-03-17 Thread mrhines
From: "Michael R. Hines" This tries to cover all the questions I got the last time. Please do tell me what is not clear, and I'll revise again. Signed-off-by: Michael R. Hines --- docs/rdma.txt | 208 + 1 file changed, 208 insertions(+)