Next steps with pv_ops for Xen

2007-11-21 Thread Stephen C. Tweedie
Hi all, I've been looking at the next steps to try to get Xen running fully on top of pv_ops. To that end, I've (just) started looking at one of the next major jobs --- i686 dom0 on pv_ops. There are still a number of things needing done to reach parity with xen-unstable: x86_64 xen on pv_ops

Re: Next steps with pv_ops for Xen

2007-12-03 Thread Gerd Hoffmann
Stephen C. Tweedie wrote: > Hi all, > > driver domains Looked at the gntdev (grant table mappings for user space) driver, noticed that one is not self-contained. It needs a hook for page unmapping: http://xenbits.xensource.com/xen-3.1-testing.hg?rev/7180d2e61f92 plus an s/ptep_get_and_cle

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-11-26 Thread Juan Quintela
Hi, your console works great, but the rest of the patches assume at least: arch/x86/boot/compressed/notes-xen.c arch/x86/xen/early.c. It looks as if another patch is missing; could you take a look, please? Otherwise, I will take a look at what is missing. It breaks with: Intel machine ch

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-11-26 Thread Jeremy Fitzhardinge
Juan Quintela wrote: > Hi, > > your console works great, but rest of patches are assuming: > > arch/x86/boot/compressed/notes-xen.c > arch/x86/xen/early.c > Yes, those are leftovers from a somewhat unsuccessful attempt at getting ELF-in-bzImage booting working. I need to go back and make bzIma

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-11-27 Thread Jan Beulich
>> It breaks with: >> >> Intel machine check architecture supported. >> (XEN) traps.c:1734:d0 Domain attempted WRMSR 0404 from :0001 >> to >> :. >> Intel machine check reporting enabled on CPU#0. >> general protection fault: [#1] SMP >> Modules linked in: >>

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-11-27 Thread Jeremy Fitzhardinge
Jan Beulich wrote: >>> It breaks with: >>> >>> Intel machine check architecture supported. >>> (XEN) traps.c:1734:d0 Domain attempted WRMSR 0404 from >>> :0001 to >>> :. >>> Intel machine check reporting enabled on CPU#0. >>> general protection fault: [#1] SMP

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-11-27 Thread Jan Beulich
>The oops and backtrace doesn't suggest it's an MSR write. Does a crX Oh, right, the MSR write is being ignored, not failed. >write take the same path through the emulator as an MSR write? No, the two operations take different paths. Jan

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-11-27 Thread Stephen C. Tweedie
Hi, On Tue, 2007-11-27 at 09:00 -0800, Jeremy Fitzhardinge wrote: > > Why do you think that's a CR0 write? > > Well, the oops says "EIP is at native_write_cr0+0x0/0x4", and the caller > is prepare_set(), which does: > > /* Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */ >

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread Gerd Hoffmann
Derek Murray wrote: > I take the blame for that one. I added the hook because, if a process > were to die whilst holding one or more grants, there were no hooks that > would make it possible to carry out the grant-unmap. All existing hooks > on either the device or the VMA were called *after* the P

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread Derek Murray
Gerd Hoffmann wrote: Derek Murray wrote: I take the blame for that one. I added the hook because, if a process were to die whilst holding one or more grants, there were no hooks that would make it possible to carry out the grant-unmap. All existing hooks on either the device or the VMA were call

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread Derek Murray
I take the blame for that one. I added the hook because, if a process were to die whilst holding one or more grants, there were no hooks that would make it possible to carry out the grant-unmap. All existing hooks on either the device or the VMA were called *after* the PTEs were cleared. It ge

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread Mark Williamson
> >> It gets better, though. The same hook is used in the version of blktap > >> in linux-2.6.18-xen (not, as far as I can see, in the sparse tree for > >> xen-3.1-testing): > > > > Oh, I'm thinking more in the direction of killing blktap altogether in > > favor of a pure userspace implementation o

RE: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread D.G. Murray
Hi Mark, > Maybe a change to the gntdev userspace API to allow batching > of mapping requests? Something along the lines of the following? /** * Memory maps one or more grant references from one or more domains to a * contiguous local address range. Mappings should be unmapped with * xc_gnt

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread Mark Williamson
> Hi Mark, > > > Maybe a change to the gntdev userspace API to allow batching > > of mapping requests? > > Something along the lines of the following? Just like that :-D When you said "multiple syscalls per mapping" I assumed you meant that we'd lose the batching you get by doing a multicall. If

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-03 Thread Gerd Hoffmann
Derek Murray wrote: > If we let Linux zap the page tables before we unmap the grant reference, > then it is not possible to unmap the grant reference. The > unmap_grant_ref hypercall ultimately calls destroy_grant_pte_mapping in > xen/arch/x86/mm.c, which ensures that the PTE does in fact point to

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-04 Thread Derek Murray
Gerd Hoffmann wrote: On this point I completely agree with you! If anyone has any less radical suggestions, then I'd be delighted to refactor the gntdev code to use them. However, I'm not currently aware of any alternative that maintains robustness to process crashes. Oh, for me it isn't robust

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-04 Thread Stephen C. Tweedie
Hi, On Tue, 2007-12-04 at 13:01 +0100, Gerd Hoffmann wrote: > >> Who uses the gntdev device right now? > > > > Good question! I'm aware of it being used in a few research projects, > > and it seems to work for them (though I think it is mostly used with the > > linux-2.6.18-xen kernel). Anyone e

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-04 Thread Gerd Hoffmann
Derek Murray wrote: > Gerd Hoffmann wrote: >> Oh, for me it isn't robust at all, it crashes on the first munmap >> syscall. It is the Fedora 8 kernel. See attachment. Didn't try >> xensource 2.6.18 yet. > > My gut feeling is that something changed in mm between 2.6.18 and > 2.6.21, but that see

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-04 Thread Gerd Hoffmann
Stephen C. Tweedie wrote: > Hi, > > On Tue, 2007-12-04 at 13:01 +0100, Gerd Hoffmann wrote: > Who uses the gntdev device right now? >>> Good question! I'm aware of it being used in a few research projects, >>> and it seems to work for them (though I think it is mostly used with the >>> linux

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-04 Thread Mark Williamson
> I am not quite clear about the purpose of pv-ops; what do we want to > deal with by developing "pv-ops"? Is it used for HVM or for PV or KVM > or something? I have seen it for a few months on the list, and > "pv-ops" is an active project, but I am not clear about what is the aim > of "pv-ops",

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray
Hi Gerd, Gerd Hoffmann wrote: Want to reproduce? Here we go: * grab xenner 0.8 from http://dl.bytesex.org/releases/xenner/ * grab a xenified dom0 kernel without blktap driver (either not compiled or module not loaded). * start xend * start blkbackd from xenner package (you probably wa

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Gerd Hoffmann
Stephen C. Tweedie wrote: > I can't help wondering if this is a hint that now is the time to find a > better API, which doesn't have the requirement (a) that seems to be > causing such trouble? Are other PV guests --- *BSD, Solaris --- going > to have the same problems with their VM layers if they

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray
Gerd, Can you try the attached patch against linux-2.6.18-xen.hg? I think the problem was that the gntdev VMA is not marked as being VM_PFNMAP, therefore it tries to get a struct page for each granted page when it is unmapped (and maybe sometimes succeeds (incorrectly), which could be

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray
Keir Fraser wrote: Is this patch to go into linux-2.6.18-xen.hg then? Yes, even if it doesn't fix the exact bug we're seeing here, I think it should go in. I've attached a version with my signed-off-by and a better commit comment. Cheers, Derek. # HG changeset patch # User [EMAIL PROTECTED

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray
Keir Fraser wrote: Yes, this would work okay I suspect. Good enough as a stop-gap measure? Are there any other responsibilities that you acquire if you make use of VM_FOREIGN (in particular, how would this affect get_user_pages)? VM_FOREIGN is already set for the gntdev VMA (mostly because it's

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray
Jeremy Fitzhardinge wrote: Could we use one of the software-defined bits in the PTE to indicate that this is a foreign/granted PTE, and have set_pte_at behave differently if you pass it a pte with this bit set? Actually, as Gerd pointed out in his answer to his own question, the use of VM_DONT

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray
Keir Fraser wrote: Actually I'm not so sure now. Presumably you add VM_PFNMAP to make vm_normal_page() return NULL? But actually I would expect pte_pfn() to return max_mapnr because the mapped page is not a local page. And that should cause vm_normal_page() to return NULL always, regardless of w

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray
Keir Fraser wrote: Need to bite the bullet and fix this properly by setting a software flag in ptes that are not subject to reference counting. Could we get away with testing the VM_FOREIGN flag in vm_normal_page()? Although I get the impression that this wouldn't be easily justified if tryin

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Jeremy Fitzhardinge
Derek Murray wrote: > Ultimately, fork calls dup_mm, which calls dup_mmap, which calls > copy_{page,pud,pmd,pte}_range, which calls copy_one_pte, which calls > set_pte_at, which hypercalls HYPERVISOR_update_va_mapping. > > The hypercall will not succeed and will return an error code > indicating t

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Jeremy Fitzhardinge
Derek Murray wrote: > Jeremy Fitzhardinge wrote: >> Could we use one of the software-defined bits in the PTE to indicate >> that this is a foreign/granted PTE, and have set_pte_at behave >> differently if you pass it a pte with this bit set? > > Actually, as Gerd pointed out in his answer to his ow

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Keir Fraser
On 5/12/07 17:17, "Derek Murray" <[EMAIL PROTECTED]> wrote: >> Actually I'm not so sure now. Presumably you add VM_PFNMAP to make >> vm_normal_page() return NULL? But actually I would expect pte_pfn() to >> return max_mapnr because the mapped page is not a local page. And that >> should cause vm_n

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Keir Fraser
On 5/12/07 20:15, "Jeremy Fitzhardinge" <[EMAIL PROTECTED]> wrote: > In 2.6.18-xen the only two implementations of zap_pte are > blktap_clear_pte and gntdev_clear_pte. Given a ptep with the > grant-mapping bit set, could we determine which of these need calling > and do the appropriate thing? Do

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Keir Fraser
On 5/12/07 17:48, "Derek Murray" <[EMAIL PROTECTED]> wrote: > Keir Fraser wrote: >> Need to bite the bullet and fix this properly by setting a software flag in >> ptes that are not subject to reference counting. > > Could we get away with testing the VM_FOREIGN flag in vm_normal_page()? > Althoug

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Gerd Hoffmann
>> Alternatively, could we use the _PAGE_GNTTAB PTE flag that is used for >> debugging? Indeed, if we did this, could we obviate the need for the >> PTE-zapping hook, by instead catching the case where this flag is set, >> and unmapping the grant implicitly? > > Well, in the general case you don't

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Keir Fraser
On 5/12/07 14:30, "Derek Murray" <[EMAIL PROTECTED]> wrote: > Keir Fraser wrote: >> Is this patch to go into linux-2.6.18-xen.hg then? > > Yes, even if it doesn't fix the exact bug we're seeing here, I think it > should go in. I've attached a version with my signed-off-by and a better > commit co

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Geoffrey Lefebvre
> Can we take a different approach from the zap_pte hook? Given that > we're 1) planning on claiming a pte bit for grant mappings, and 2) need > to hook ptep_get_and_clear anyway to solve the mprotect performance > problems, couldn't we just special-case grant mapping pte_clears? > > In 2.6.18-xen

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Derek Murray
Stephen C. Tweedie wrote: So... the interface (a) cannot be used on the Linux VM without at least one invasive VM modification, due to the requirement of ptes being explicitly unmapped via hypercall; Also there is the use of VM_FOREIGN (http://xenbits.xensource.com/linux-2.6.18-xen.hg?file/b27

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Keir Fraser
On 5/12/07 14:12, "Gerd Hoffmann" <[EMAIL PROTECTED]> wrote: >> Thanks for the repro details. I'll have a go at this later. One thing we >> haven't tested AFAIK is mapping grants in the same domain: could you >> check to see if the bug is the same if you attach a block device to a >> domain other

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Gerd Hoffmann
Hi, > Thanks for the repro details. I'll have a go at this later. One thing we > haven't tested AFAIK is mapping grants in the same domain: could you > check to see if the bug is the same if you attach a block device to a > domain other than Dom0? Also, could you send any Xen console output, if

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-05 Thread Gerd Hoffmann
Hi, > gntdev doesn't even try to handle forking. I wouldn't be surprised if > that is a great way to kill Domain-0. The xen hypervisor will most > likely not be amused to find a pte referring to a granted (but foreign) > page which wasn't established using the grant table interface. Pinning >

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-06 Thread Derek Murray
Keir Fraser wrote: You'd need to track pte->grant_handle mappings somewhere, but it could certainly be done this way, yes. At the moment, blktap and gntdev provide struct pages to get_user_pages by smuggling them in the vm_private_data field of the relevant vm_area_struct. Could we use this f

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-06 Thread Gerd Hoffmann
Geoffrey Lefebvre wrote: > In order to unmap a grant, you need the grant handle obtained when the > grant is mapped. That handle needs to be stored somewhere for the > lifetime of the mapping. Where would the handle be stored (as Gerd > proposed) in order to be able to unmap from ptep_get_and_clear
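
[Editor's sketch, not from the thread] The requirement quoted above, that the grant handle returned at map time must be kept for the lifetime of the mapping so the unmap path can find it, can be illustrated with a small per-mapping table indexed by page offset. This is a userspace model only: the `sketch_*` names, the fixed table size, and the "no handle" sentinel are all assumptions made for illustration, and nothing here is the storage scheme gntdev or blktap actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits and sentinel, chosen only for this sketch. */
#define SKETCH_MAX_PAGES 8u
#define SKETCH_NO_HANDLE UINT32_MAX

/* One slot per page of a mapped region; each slot holds the grant
 * handle obtained when that page was mapped, or the sentinel. */
struct sketch_grant_table {
    uint32_t handle[SKETCH_MAX_PAGES];
};

/* Mark every slot as "no grant mapped here". */
static void sketch_table_init(struct sketch_grant_table *t)
{
    for (size_t i = 0; i < SKETCH_MAX_PAGES; i++)
        t->handle[i] = SKETCH_NO_HANDLE;
}

/* Called at map time: remember the handle for this page. */
static void sketch_record_handle(struct sketch_grant_table *t,
                                 size_t page_index, uint32_t handle)
{
    assert(page_index < SKETCH_MAX_PAGES);
    assert(handle != SKETCH_NO_HANDLE);
    t->handle[page_index] = handle;
}

/* Called from the PTE-clearing path: return the stored handle (so the
 * caller could issue the unmap) and clear the slot, so that a second
 * clear of the same page finds nothing left to unmap. */
static uint32_t sketch_take_handle(struct sketch_grant_table *t,
                                   size_t page_index)
{
    assert(page_index < SKETCH_MAX_PAGES);
    uint32_t h = t->handle[page_index];
    t->handle[page_index] = SKETCH_NO_HANDLE;
    return h;
}
```

The take-and-clear shape is a design choice of the sketch: it makes a repeated clear of the same page harmless, which matters if the clearing hook can run both on munmap and on process exit.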

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-06 Thread Gerd Hoffmann
D.G. Murray wrote: > Hi Mark, > >> Maybe a change to the gntdev userspace API to allow batching >> of mapping requests? > > Something along the lines of the following? > > void *xc_gnttab_map_grant_refs(int xcg_handle, >uint32_t count, >

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-06 Thread Derek Murray
Gerd Hoffmann wrote: Yes, except that it should actually work ;) It doesn't for me (Fedora 8 again). Grab xenner 0.9 (just uploaded), edit blkbackd.c and flip the BATCH_MAPS from 0 to 1, compile, run, see it not work. Which version of the Xen tools are you using? There was a bug in the versi

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-06 Thread Gerd Hoffmann
> Which version of the Xen tools are you using? There was a bug in the > version released with Xen 3.1, which should have been cleaned up in the > subsequent minor versions. Try grabbing the patch to libxc at: > > http://xenbits.xensource.com/xen-3.1-testing.hg?raw-rev/135d5088909f Probably it i

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-06 Thread Jeremy Fitzhardinge
Derek Murray wrote: > Keir Fraser wrote: >> You'd need to track pte->grant_handle mappings somewhere, but it could >> certainly be done this way, yes. > > At the moment, blktap and gntdev provide struct pages to > get_user_pages by smuggling them in the vm_private_data field of the > relevant vm_ar

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-12 Thread Keir Fraser
We already make the VM_FOREIGN check conditional on defined(CONFIG_XEN). We could add defined(CONFIG_X86) as well? This would seem reasonable as a temporary measure for the old 2.6.18 tree. -- Keir On 12/12/07 08:27, "Isaku Yamahata" <[EMAIL PROTECTED]> wrote: > This patch breaks blktap and gnt

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-12 Thread Isaku Yamahata
On Wed, Dec 05, 2007 at 06:15:49PM +, Derek Murray wrote: > Keir Fraser wrote: > >Yes, this would work okay I suspect. Good enough as a stop-gap measure? Are > >there any other responsibilities that you acquire if you make use of > >VM_FOREIGN (in particular, how would this affect get_user_page

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-12 Thread Isaku Yamahata
On Wed, Dec 12, 2007 at 08:39:41AM +, Keir Fraser wrote: > We already make the VM_FOREIGN check conditional on defined(CONFIG_XEN). We > could add defined(CONFIG_X86) as well? This would seem reasonable as a > temporary measure for the old 2.6.18 tree. Yes, ok for IA64. -- yamahata

Re: [Xen-devel] Re: Next steps with pv_ops for Xen

2007-12-21 Thread Gerd Hoffmann
D.G. Murray wrote: > Hi Mark, > void *xc_gnttab_map_grant_refs(int xcg_handle, >uint32_t count, >uint32_t *domids, >uint32_t *refs, >int prot); Fedora 8 has 3.1.2 packa
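
[Editor's sketch, not from the thread] The point of the batched signature quoted above is that N grant references can be mapped with one call instead of N. The model below counts calls to a stand-in mapper to make that difference concrete. `sketch_map_grant_refs` is a hypothetical mock that only allocates zeroed memory; it is not libxc's `xc_gnttab_map_grant_refs`, it takes no handle or protection argument, and it performs no hypercall.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u

/* Number of times the stand-in mapper has been invoked; each call
 * models one trip into the kernel. */
static unsigned sketch_map_calls;

/* Hypothetical stand-in for a batched grant-mapping call: returns one
 * contiguous zeroed region of `count` pages, or NULL on failure. */
static void *sketch_map_grant_refs(uint32_t count,
                                   const uint32_t *domids,
                                   const uint32_t *refs)
{
    assert(count > 0 && domids != NULL && refs != NULL);
    sketch_map_calls++;
    return calloc(count, SKETCH_PAGE_SIZE);
}

/* Unbatched: one call per grant reference. Returns the calls made. */
static unsigned sketch_map_one_by_one(uint32_t n,
                                      const uint32_t *domids,
                                      const uint32_t *refs)
{
    unsigned before = sketch_map_calls;
    for (uint32_t i = 0; i < n; i++) {
        void *p = sketch_map_grant_refs(1, &domids[i], &refs[i]);
        free(p);
    }
    return sketch_map_calls - before;
}

/* Batched: all references in a single call. Returns the calls made. */
static unsigned sketch_map_batched(uint32_t n,
                                   const uint32_t *domids,
                                   const uint32_t *refs)
{
    unsigned before = sketch_map_calls;
    void *p = sketch_map_grant_refs(n, domids, refs);
    free(p);
    return sketch_map_calls - before;
}
```

With four references, the unbatched helper makes four calls and the batched helper makes one; in the real interface each of those calls would also carry syscall and hypercall overhead, which is what the batching discussion is about.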