[Xen-devel] Question to Xen log level in the case of PT

2015-09-07 Thread Chen, Tiejun

All guys,

Sorry to raise a question to you since I'm not very sure how to deal 
with this.


When I passthrough a device like IGD, I can see so many messages:

"memory_map:add:" and "memory_map:remove:"

since we have to add/remove all pages map residing PCI bar. Especially 
as a graphic device, oftentimes this range would occupy dozens of MB, 
even hundreds of MB. These print messages consume a lot of time to boot 
a VM. For instance, it takes about 5 minutes to boot a Windows guest on 
my BDW. But if I remove these output simply like this,


diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 7f959f3..82da9d1 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -1049,10 +1049,6 @@ long 
do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)


 if ( add )
 {
-printk(XENLOG_G_INFO
-   "memory_map:add: dom%d gfn=%lx mfn=%lx nr=%lx\n",
-   d->domain_id, gfn, mfn, nr_mfns);
-
 ret = map_mmio_regions(d, gfn, nr_mfns, mfn);
 if ( ret )
 printk(XENLOG_G_WARNING
@@ -1061,10 +1057,6 @@ long 
do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)

 }
 else
 {
-printk(XENLOG_G_INFO
-   "memory_map:remove: dom%d gfn=%lx mfn=%lx nr=%lx\n",
-   d->domain_id, gfn, mfn, nr_mfns);
-
 ret = unmap_mmio_regions(d, gfn, nr_mfns, mfn);
 if ( ret && is_hardware_domain(current->domain) )
 printk(XENLOG_ERR

its down to a half, about 2.5 minutes.

I know I can't delete this directly. But currently there are four log 
level on Xen side,


 *   XENLOG_ERR: Fatal errors, either Xen, Guest or Dom0
 *   is about to crash.
 *
 *   XENLOG_WARNING: Something bad happened, but we can recover.
 *
 *   XENLOG_INFO: Interesting stuff, but not too noisy.
 *
 *   XENLOG_DEBUG: Use where ever you like. Lots of noise.

looks I have to change XENLOG_G_INFO to XENLOG_G_WARNING but its not 
appropriate here.


So can Xen change log level dynamically like Linux? If yes, we might 
change this level temporarily while passing through IGD. If not, any 
suggestion?


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 17/18] VT-d: Dump the posted format IRTE

2015-09-07 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Monday, September 07, 2015 9:05 PM
> To: Wu, Feng
> Cc: Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v6 17/18] VT-d: Dump the posted format IRTE
> 
> >>> On 25.08.15 at 03:57,  wrote:
> > @@ -220,7 +224,7 @@ static void dump_iommu_info(unsigned char key)
> >  struct iremap_entry *p;
> >  if ( i % (1 << IREMAP_ENTRY_ORDER) == 0 )
> >  {
> > -/* This entry across page boundry */
> > +/* This entry across page boundary. */
> 
> Either leave the comment alone, or fix its content along with its format.
> 
> > @@ -230,7 +234,7 @@ static void dump_iommu_info(unsigned char key)
> >  else
> >  p = &iremap_entries[i % (1 <<
> IREMAP_ENTRY_ORDER)];
> >
> > -if ( !p->remap.p )
> > +if ( !p->remap.p || p->remap.im )
> >  continue;
> >
> printk("  %04x:  %x   %x  %04x %08x %02x%x   %x  %x  %x  %x"
> >  "   %x %x\n", i,
> > @@ -239,6 +243,41 @@ static void dump_iommu_info(unsigned char key)
> >  p->remap.rh, p->remap.dm, p->remap.fpd,
> p->remap.p);
> >  print_cnt++;
> >  }
> > +
> > +if ( iremap_entries )
> > +unmap_vtd_domain_page(iremap_entries);
> > +
> > +iremap_entries = NULL;
> > +printk ("\nEntries for posted format:\n");
> > +printk("   SVT  SQ   SID  PDA  V
> URG AVL FPD P\n");
> > +for ( i = 0; i < nr_entry; i++ )
> > +{
> > +struct iremap_entry *p;
> > +if ( i % (1 << IREMAP_ENTRY_ORDER) == 0 )
> > +{
> > +/* This entry across page boundry */
> > +if ( iremap_entries )
> > +unmap_vtd_domain_page(iremap_entries);
> > +
> > +GET_IREMAP_ENTRY(iremap_maddr, i,
> > + iremap_entries, p);
> > +}
> > +else
> > +p = &iremap_entries[i % (1 <<
> IREMAP_ENTRY_ORDER)];
> > +
> > +if ( !p->post.p || !p->post.im )
> > +continue;
> > +
> > +
> printk("  %04x:  %x   %x  %04x %16lx %02x%x   %x  %x  %x\n",
> > +i,
> > +p->post.svt, p->post.sq, p->post.sid,
> > +((u64)p->post.pda_h << 32) | (p->post.pda_l << 6),
> > +p->post.vector, p->post.urg, p->post.avail,
> p->post.fpd,
> > +p->post.p);
> > +
> > +print_cnt++;
> > +}
> 
> Hmm, this two stage approach seems to imply more overhead in
> terms of execution time and in terms of analysis (of the output)
> time. Can you see to make this a single loop with the output
> formatted in a way such that
> - one can immediately recognize the kind that each line represents,
This is easy, I can add a prefix in each line, such as "posted", "remapped".

> - mixed output is still sufficiently aligned to retain overall readability?
I think this is a little hard, there are many different fields between posted
format and remapped format, and the number of filed for each format
is different, mixing them total will hurt the readability very big.

And consider this is just a debug function, is it that bad to add a
two stage search here? The output is clear and nice in this way.

Thanks,
Feng

> 
> Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [linux-3.10 bisection] complete test-amd64-amd64-xl-vhd

2015-09-07 Thread osstest service owner
branch xen-unstable
xen branch xen-unstable
job test-amd64-amd64-xl-vhd
test debian-di-install

Tree: linux 
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/staging/qemu-xen-unstable.git
Tree: qemuu git://xenbits.xen.org/staging/qemu-upstream-unstable.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  linux 
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  Bug introduced:  292f53675e097ca807df744fb6fd39211a134bf7
  Bug not present: 3f2c206ae6f9e1005ac7f092e8d65c17307a0d59


  commit 292f53675e097ca807df744fb6fd39211a134bf7
  Author: Marek Marczykowski-Górecki 
  Date:   Fri Jun 26 03:28:24 2015 +0200

  xen/gntdevt: Fix race condition in gntdev_release()

  commit 30b03d05e07467b8c6ec683ea96b5bffcbcd3931 upstream.

  While gntdev_release() is called the MMU notifier is still registered
  and can traverse priv->maps list even if no pages are mapped (which is
  the case -- gntdev_release() is called after all). But
  gntdev_release() will clear that list, so make sure that only one of
  those things happens at the same time.

  Signed-off-by: Marek Marczykowski-Górecki 

  Signed-off-by: David Vrabel 
  Signed-off-by: Greg Kroah-Hartman 


For bisection revision-tuple graph see:
   
http://logs.test-lab.xenproject.org/osstest/results/bisect/linux-3.10/test-amd64-amd64-xl-vhd.debian-di-install.html
Revision IDs in each graph node refer, respectively, to the Trees above.


Running cs-bisection-step 
--graph-out=/home/logs/results/bisect/linux-3.10/test-amd64-amd64-xl-vhd.debian-di-install
 --summary-out=tmp/61590.bisection-summary --basis-template=60670 
--blessings=real,real-bisect linux-3.10 test-amd64-amd64-xl-vhd 
debian-di-install
Searching for failure / basis pass:
 61268 fail [host=chardonnay1] / 60670 [host=godello1] 60656 [host=chardonnay0] 
60548 [host=fiano1] template as basis? using template as basis.
Failure / basis pass flights: 61268 / 60670
(tree with no url: ovmf)
(tree with no url: seabios)
Tree: linux 
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/staging/qemu-xen-unstable.git
Tree: qemuu git://xenbits.xen.org/staging/qemu-upstream-unstable.git
Tree: xen git://xenbits.xen.org/xen.git
Latest 5a427ce18a14d6b85972c62196a8f10c3624d74a 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
5cdde31eacdd288359746019ad05cac8ed5d9f70 
b05befcbea71a979509ce04f02929969a790c923 
801ab48e5556cb54f67e3cb57f077f47e8663ced
Basis pass 78fb9f4236d9077fb343fb5a4e55fe27075e9a1d 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
7f057440b31da38196e3398fd1b618fc36ad97d6 
bcf35eec0b621c46dbf0aeb40c6bc06b5d3981aa 
201eac83831d94ba2e9a63a7eed4c128633fafb1
Generating revisions with ./adhoc-revtuple-generator  
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git#78fb9f4236d9077fb343fb5a4e55fe27075e9a1d-5a427ce18a14d6b85972c62196a8f10c3624d74a
 
git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
 
git://xenbits.xen.org/staging/qemu-xen-unstable.git#7f057440b31da38196e3398fd1b618fc36ad97d6-5cdde31eacdd288359746019ad05cac8ed5d9f70
 
git://xenbits.xen.org/staging/qemu-upstream-unstable.git#bcf35eec0b621c46dbf0aeb40c6bc06b5d3981aa-b05befcbea71a979509ce04f02929969a790c923
 
git://xenbits.xen.org/xen.git#201eac83831d94ba2e9a63a7eed4c128633fafb1-801ab48e5556cb54f67e3cb57f077f47e8663ced
Loaded 18027 nodes in revision graph
Searching for test results:
 60670 [host=godello1]
 60656 [host=chardonnay0]
 60827 fail irrelevant
 60790 fail irrelevant
 60748 fail irrelevant
 60950 fail 5a427ce18a14d6b85972c62196a8f10c3624d74a 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
7f057440b31da38196e3398fd1b618fc36ad97d6 
b05befcbea71a979509ce04f02929969a790c923 
7b99717f62caeac08eea224a177cd28f047ac4b5
 60865 fail irrelevant
 60992 fail 5a427ce18a14d6b85972c62196a8f10c3624d74a 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
7f057440b31da38196e3398fd1b618fc36ad97d6 
b05befcbea71a979509ce04f02929969a790c923 
7b99717f62caeac08eea224a177cd28f047ac4b5
 61103 fail 5a427ce18a14d6b85972c62196a8f10c3624d74a 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
7f057440b31da38196e3398fd1b618fc36ad97d6 
b05befcbea71a979509ce04f02929969a790c923 
7b99717f62caeac08eea224a177cd28f047ac4b5
 61327 fail 5a427ce18a14d6b85972c62196a8f10c3624d74a 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
7f057440b31da38196e3398fd1b618fc36ad97d6 
b05befcbea71a979509ce04f02929969a790c923 
7b99717f62caeac08eea224a177cd28f047ac4b5
 61325 fail 5a427ce18a14d6b85972c62196a8f10c3624d74a 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
7f057440b31da38196e3398fd1b618fc36ad97d6 
b05befcbea71a979509ce04f02929969a790c923 
7b99717f62

Re: [Xen-devel] [PATCH v6 06/18] vmx: Add some helper functions for Posted-Interrupts

2015-09-07 Thread Jan Beulich
>>> "Wu, Feng"  09/08/15 4:39 AM >>>
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Monday, September 07, 2015 6:46 PM
>> >>> On 06.09.15 at 04:05,  wrote:
>> >> From: Jan Beulich [mailto:jbeul...@suse.com]
>> >> Sent: Friday, September 04, 2015 10:40 PM
>> >> >>> On 25.08.15 at 03:57,  wrote:
>> >> > @@ -121,11 +122,31 @@ static inline int pi_test_and_clear_on(struct
>> >> pi_desc *pi_desc)
>> >> >  return test_and_clear_bit(POSTED_INTR_ON, &pi_desc->control);
>> >> >  }
>> >> >
>> >> > +static inline int pi_test_on(struct pi_desc *pi_desc)
>> >> > +{
>> >> > +return test_bit(POSTED_INTR_ON, &pi_desc->control);
>> >> > +}
>> >>
>> >> For this and ...
>> >>
>> >> > +static inline int pi_test_sn(struct pi_desc *pi_desc)
>> >> > +{
>> >> > +return test_bit(POSTED_INTR_SN, &pi_desc->control);
>> >> > +}
>> >>
>> >> ... this I wonder whether using the bitfield you defined in the
>> >> previous patch wouldn't allow the compiler more freedom in
>> >> how to carry this out.
>> >
>> > I am sorry, I don't quite understand it. Do you mean: the bitfield
>> > defined in previous patch is pointless, or using the bitfield here?
>> 
>> Use it here would seem preferable. (But please recall that I
>> questioned this two fold access model - partly using bitfields,
>> partly using bitops - earlier on, and ideally _all_ accesses to
>> a certain kind of data structure would follow a single, uniform
>> model.)
>
>Yes, I remember your suggestion about this. So now I only use
>bitops everywhere, in that case, do you think I still need to use
>bitfields here?

Then you understood only half of it: You now use the bitfield for
accessing multi-bit fields, and bitops for single-bit ones. While
certainly you can't use bitops for multi-bit fields, the corresponding
method would be masks and shifts.

Bottom line - if using a mixture, then use what allows the compiler to
generate most efficient code. And clarify for yourself whether the
modifying bitops really need to be locked ones.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [xen-4.4-testing test] 61512: regressions - FAIL

2015-09-07 Thread osstest service owner
flight 61512 xen-4.4-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/61512/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qcow2  9 debian-di-install fail REGR. vs. 60727
 test-amd64-i386-xl-raw9 debian-di-install fail REGR. vs. 60727

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-libvirt-vhd   9 debian-di-install fail REGR. vs. 60727
 test-amd64-amd64-libvirt-raw  9 debian-di-install fail REGR. vs. 60727
 test-amd64-amd64-libvirt-vhd  9 debian-di-install fail REGR. vs. 60727
 test-armhf-armhf-xl-multivcpu 16 guest-start/debian.repeatfail  like 60696
 test-amd64-i386-libvirt-qcow2  9 debian-di-installfail  like 60727
 test-amd64-i386-xl-vhd9 debian-di-installfail   like 60727
 test-amd64-amd64-xl-vhd   9 debian-di-installfail   like 60727
 test-amd64-i386-libvirt  11 guest-start  fail   like 60727
 test-amd64-amd64-libvirt 11 guest-start  fail   like 60727
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 60727

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-migrupgrade  1 build-check(1)   blocked  n/a
 test-amd64-i386-migrupgrade   1 build-check(1)   blocked  n/a
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 build-amd64-prev  5 xen-buildfail   never pass
 build-amd64-rumpuserxen   6 xen-buildfail   never pass
 test-armhf-armhf-xl-raw   9 debian-di-installfail   never pass
 build-i386-rumpuserxen6 xen-buildfail   never pass
 test-armhf-armhf-libvirt-qcow2  9 debian-di-installfail never pass
 test-armhf-armhf-xl-qcow2 9 debian-di-installfail   never pass
 test-armhf-armhf-xl-vhd   9 debian-di-installfail   never pass
 build-i386-prev   5 xen-buildfail   never pass
 test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
 test-amd64-amd64-xl-qcow2 9 debian-di-installfail   never pass
 test-armhf-armhf-libvirt 11 guest-start  fail   never pass
 test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never 
pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-amd64-i386-libvirt-raw  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-libvirt-raw  9 debian-di-installfail   never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xend-qemut-winxpsp3 21 leak-check/checkfail never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-libvirt-vhd  9 debian-di-installfail   never pass

version targeted for testing:
 xen  339f5743e84a28dd01ffa7498372e410301cd0b4
baseline version:
 xen  3646b134c1673f09c0a239de10b0da4c9265c8e8

Last test of basis60727  2015-08-16 16:15:09 Z   22 days
Failing since 60802  2015-08-20 14:41:37 Z   18 days8 attempts
Testing same since61512  2015-09-07 11:42:03 Z0 days1 attempts


People who touched revisions under test:
  Ian Campbell 
  Jan Beulich 
  Julien Grall 

jobs:
 build-amd64-xend pass
 build-i386-xend  pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  

Re: [Xen-devel] [PATCH v6 15/18] vmx: Properly handle notification event when vCPU is running

2015-09-07 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Monday, September 07, 2015 8:10 PM
> To: Wu, Feng; Zhang, Yang Z
> Cc: Andrew Cooper; Tian, Kevin; xen-devel@lists.xen.org; Keir Fraser
> Subject: Re: [PATCH v6 15/18] vmx: Properly handle notification event when
> vCPU is running
> 
> >>> On 25.08.15 at 03:57,  wrote:
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >  const struct hvm_function_table * __init start_vmx(void)
> >  {
> >  set_in_cr4(X86_CR4_VMXE);
> > @@ -2073,7 +2119,7 @@ const struct hvm_function_table * __init
> start_vmx(void)
> >
> >  if ( cpu_has_vmx_posted_intr_processing )
> >  {
> > -alloc_direct_apic_vector(&posted_intr_vector,
> event_check_interrupt);
> > +alloc_direct_apic_vector(&posted_intr_vector,
> pi_notification_interrupt);
> >
> >  if ( iommu_intpost )
> >  alloc_direct_apic_vector(&pi_wakeup_vector,
> pi_wakeup_interrupt);
> 
> Considering that you do this setup independent of iommu_intpost,
> won't this (namely, but not only) for the !iommu_inpost case result
> in a whole lot of useless softirqs to be raised? 

I don't think lots of useless softirqs will be raised in !iommu_intpost
case, since we can get pi_notification_interrupt() only when someone
called __vmx_deliver_posted_interrupt() in which, the VCPU_KICK_SOFTIRQ
bit was set.

> IOW - shouldn't this setup be conditional, 

We cannot setup pi_notification_interrupt() conditional, since it is
for all cases, non-vtd-pi and vt-d pi.

> and shouldn't the handler also only conditionally raise the softirq
> (to as much as possible limit their amount)?

As mentioned above, there are not extra softirq raised. If you think
It is necessary, I can add a condition here as below, which I really think
is useless, but I am fine to add it if you really think it is a must.

if ( iommu_intpost )
raise_softirq(VCPU_KICK_SOFTIRQ);

Thanks,
Feng


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 04/18] vt-d: VT-d Posted-Interrupts feature detection

2015-09-07 Thread Jan Beulich
>>> "Wu, Feng"  09/08/15 4:39 AM >>>
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Monday, September 07, 2015 6:43 PM
>> >>> On 06.09.15 at 03:49,  wrote:
>> >> From: Jan Beulich [mailto:jbeul...@suse.com]
>> >> Sent: Friday, September 04, 2015 10:31 PM
>> >> >>> On 25.08.15 at 03:57,  wrote:
>> >> > --- a/xen/drivers/passthrough/vtd/iommu.c
>> >> > +++ b/xen/drivers/passthrough/vtd/iommu.c
>> >> > @@ -2079,6 +2079,9 @@ static int init_vtd_hw(void)
>> >> >  disable_intremap(drhd->iommu);
>> >> >  }
>> >> > @@ -2176,6 +2179,14 @@ int __init intel_vtd_setup(void)
>> >> >  if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
>> >> >  iommu_intremap = 0;
>> >> >
>> >> > +/*
>> >> > + * We cannot use posted interrupt if X86_FEATURE_CX16 is
>> >> > + * not supported, since we count on this feature to
>> >> > + * atomically update 16-byte IRTE in posted format.
>> >> > + */
>> >> > +if ( !iommu_intremap || !cap_intr_post(iommu->cap)
>> >> || !cpu_has_cx16 )
>> >> > +iommu_intpost = 0;
>> >>
>> >> You should check iommu_intpost instead of iommu_intremap to
>> >> match the other checks above here.
>> >
>> > Here I followed your advice:
>> > http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg01261.html
>> >
>> > I think the following code should be ok here though, if you like it, I can
>> > change back.
>> >
>> > +if ( !iommu_intpost || !cap_intr_post(iommu->cap)
>> > +|| !cpu_has_cx16 )
>> > +iommu_intpost = 0;
>> 
>> The first part is neither in line with the earlier adjustments (as _still_
>> see in the patch context above) not does it make much sense to set
>> a variable to zero when it already is zero.
>
>So, what about this?
>
>+if ( !cap_intr_post(iommu->cap) || !cpu_has_cx16 )
>+iommu_intpost = 0;

Yes. While still not in sync with the other checks, this one is actually better 
than
those (which kind of pointlessly check the feature they want to disable).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 13/18] Update IRTE according to guest interrupt config changes

2015-09-07 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Monday, September 07, 2015 7:03 PM
> To: Wu, Feng
> Cc: xen-devel@lists.xen.org
> Subject: RE: [PATCH v6 13/18] Update IRTE according to guest interrupt config
> changes
> 
> >>> On 06.09.15 at 06:54,  wrote:
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Friday, September 04, 2015 11:59 PM
> >> >>> On 25.08.15 at 03:57,  wrote:
> >>> 
> >> > +if ( vcpu )
> >> > +{
> >> > +rc = pi_update_irte( vcpu, info,
> pirq_dpci->gmsi.gvec );
> >> > +if ( unlikely(rc) )
> >> > +dprintk(XENLOG_G_INFO,
> >> > +"%pv: failed to update PI IRTE,
> >> gvec:%02x\n",
> >> > +vcpu, pirq_dpci->gmsi.gvec);
> >>
> >> Even if only a debug build printk() - aren't you afraid that if this
> >> ever triggers, it will trigger a lot? And hence be pretty useless?
> >
> > I think it reaches this debug printk rarely, but a least, when it is really
> > failed, it
> > can give people some hints about why we are using interrupt remapping
> > instead
> > of interrupt posing for the external interrupts.
> 
> I understand your motivation, but you don't really answer my
> question. (And btw., if you really mean "rarely", then there's a bug
> somewhere that you need to fix. It should _never_ trigger if
> everything is working correctly.)

I mean pi_update_irte() can return failure theoretically, because there
are some pointer check in it. You said, it would trigger a lot if this ever
triggers. Do you mean for a specific pirq, it will always fail after the first
failure? So what is your suggestion here? Adding an ASSERT()?

Thanks,
Feng

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-07 Thread Liuqiming (John)

Ok, I will try to explain, correct me if I got anything wrong:

The problem here is not interrupts lost but interrupts not delivered in 
time.


there are basically two path to inject an interrupt into VM  (or vCPU to 
another vCPU):

Path 1, the traditional way:
  1) set bit  in vlapic IRR field which represent an interrupt, 
then kick vcpu

  2) a VCPU_KICK_SOFTIRQ softirq raised
  3) if VCPU_KICK_SOFTIRQ bit not set, then set it, otherwise 
return and do nothing

  4) send an EVENT_CHECK_VECTOR IPI  to target vcpu
  5) target vcpu will VMEXIT due to EXIT_REASON_EXTERNAL_INTERRUPT
  6) the interrupt handler basically do nothing
  7) interrupt in IRR will be evaluated
  8) VCPU_KICK_SOFTIRQ will be cleared when do_softirq
  9) there will be an interrupt inject into vcpu when VMENTRY

Path 2, the Posted-interrupt way (current logic):
  1) set bit in posted-interrupt descriptor which represent an 
interrupt
  2) if VCPU_KICK_SOFTIRQ bit not set, then set it, otherwise 
return and do nothing

  3) send an POSTED_INTR_NOTIFICATION_VECTOR IPI to target vcpu
  4) if target vcpu in non-ROOT mode it will receive the interrupt 
immediately otherwise interrupt will be injected when VMENTRY


As the first operation in both path is setting a interrupt represent 
bit, so no interrupts will lost.


The issue is:
in path 2, the first interrupt will cause VCPU_KICK_SOFTIRQ set to 1,
and unless a VMEXIT occured or somewhere called do_softirq directly,
VCPU_KICK_SOFTIRQ will not cleared, that will make the later interrupts 
injection  ignored at step 2),

which will delay irq handler process in VM.

And because path 2 set VCPU_KICK_SOFTIRQ to 1, the kick vcpu logic in 
path 1 will also return in step 3),
which make this vcpu only can handle interrupt when some other reason 
cause VMEXIT.


On 2015/9/8 10:46, Zhang, Yang Z wrote:

Hanweidong (Randy) wrote on 2015-09-08:

Jan Beulich wrote on ent: 2015年9月7日 22:46:

Subject: Re: [Xen-devel] [PATCH] Remove a set operation for
VCPU_KICK_SOFTIRQ when post interrupt to vm.


On 07.09.15 at 16:24,  wrote:

I believe this also has something to do with a windows guest boot hang
issue.

It randomly occured, when boot a guest has windows 2008 os and pv-
driver installed. The boot process hangs when wait xenstored replay
event signal.

It can be reproduced after hundreds reboot using the xen staging
branch. But after I changed this code the hang issue can not reproduce.

The change below (which I don't think was ever posted to xen-devel)
does not make any sense, as it prohibits timely delivery of guest
interrupts. If there is an issue, I think you'd need to start with
clearly

This change won't prohibit timely delivery of guest interrupts,
intead, it helps to deliver guest interrupt timely. Posted interrupt
delivery doesn't kick cpu, so it should not set VCPU_KICK_SOFTIRQ bit,
and doesn't care about if VCPU_KICK_SOFTIRQ is set or not. if
VCPU_KICK_SOFTIRQ is set, next interrupt will not be delivered due to
test_and_set_bit check. What's more, it also impacts vcpu_kick() to
kick cpu (smp_send_event_check_cpu) when VCPU_KICK_SOFTIRQ is set.

The patch seems wrong to me since the interrupt will lost in some corner cases 
with those changes. Can you explain more detail like why next interrupt will 
get lost if set the softirq here?

Best regards,
Yang






___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 0/4] arm64: Add Xen boot support (via fdt)

2015-09-07 Thread Fu Wei
Hi Vladimir,

Do you have any suggestion on this patchset?
Do I need to improve anything?

Great thanks for your help.

On 4 August 2015 at 16:34, Fu Wei  wrote:
> Hi Vladimir,
>
> this patchset follows all your comment of v2 patchset.
> Do you have any suggestion on this patchset?
>
> Great thanks for your help.
>
>
> On 23 July 2015 at 13:16,   wrote:
>> From: Fu Wei 
>>
>>   - This adds support for the Xen boot on ARM specification for arm64.
>>
>>   - Add and export some accessor functions of "loaded" flag and
>> grub_linux_get_fdt function in include/grub/arm64/linux.h for xen boot.
>>
>>   - Introduce xen_hypervisor, xen_linux, xen_initrd and xen_xsm
>> to load different binaries for xen boot.
>> Introduce xen_module to load common or custom module for xen boot.
>>
>>   - This Xen boot support is a separated  module for aarch64,
>> but reuse the existing code of devicetree in linux module.
>>
>>   - Add the support of xen_hypervisor, xen_linux and xen_initrd
>> in util/grub.d/20_linux_xen.in
>>
>>   - Add the introduction of all xen boot commands in docs/grub.texi
>>
>>   - The example of this support is > Foundation FVP model>
>> 
>> https://wiki.linaro.org/LEG/Engineering/Grub2/Xen_booting_on_Foundation_FVP_model_by_GRUB
>>
>> Changelog:
>> v3: create separate module for xen boot: xen_boot
>> create separate commands for different types of module
>> delete order-dependent for commands of xen module
>> simplify the code
>>
>> v2: remove the patches which have been accepted.
>> according to Vladimir's suggestion, change the command manes
>> and relevant code:
>> multiboot-->xen_hypervisor
>> module-->xen_module
>> improve the option parsing support for xen_hypervisor/xen_module 
>> commands.
>> add a patch for adding xen_hypervisor/xen_module support
>> in util/grub.d/20_linux_xen.in.
>> update docs/grub.texi patch for the new command names.
>>
>> v1: The first version upstream patchset to grub-devel mailing list
>>
>>
>> Fu Wei (4):
>>   arm64: Add and export some accessor functions for xen boot
>>   arm64: Add xen_boot module file
>>   * util/grub.d/20_linux_xen.in: Add support of the XEN boot on aarch64
>>   arm64: Add the introduction of xen boot commands in docs/grub.texi
>>
>>  docs/grub.texi|  56 
>>  grub-core/Makefile.core.def   |   7 +
>>  grub-core/loader/arm64/linux.c|  13 +
>>  grub-core/loader/arm64/xen_boot.c | 685 
>> ++
>>  include/grub/arm64/linux.h|   6 +-
>>  util/grub.d/20_linux_xen.in   |  16 +-
>>  6 files changed, 779 insertions(+), 4 deletions(-)
>>  create mode 100644 grub-core/loader/arm64/xen_boot.c
>>
>> --
>> 1.8.3.1
>>
>
>
>
> --
> Best regards,
>
> Fu Wei
> Software Engineer
> Red Hat Software (Beijing) Co.,Ltd.Shanghai Branch
> Ph: +86 21 61221326(direct)
> Ph: +86 186 2020 4684 (mobile)
> Room 1512, Regus One Corporate Avenue,Level 15,
> One Corporate Avenue,222 Hubin Road,Huangpu District,
> Shanghai,China 200021



-- 
Best regards,

Fu Wei
Software Engineer
Red Hat Software (Beijing) Co.,Ltd.Shanghai Branch
Ph: +86 21 61221326(direct)
Ph: +86 186 2020 4684 (mobile)
Room 1512, Regus One Corporate Avenue,Level 15,
One Corporate Avenue,222 Hubin Road,Huangpu District,
Shanghai,China 200021

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [ovmf bisection] complete test-amd64-amd64-xl-qemuu-ovmf-amd64

2015-09-07 Thread osstest service owner
branch xen-unstable
xen branch xen-unstable
job test-amd64-amd64-xl-qemuu-ovmf-amd64
test debian-hvm-install

Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: ovmf https://github.com/tianocore/edk2.git
Tree: qemu git://xenbits.xen.org/staging/qemu-xen-unstable.git
Tree: qemuu git://xenbits.xen.org/staging/qemu-upstream-unstable.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  ovmf https://github.com/tianocore/edk2.git
  Bug introduced:  ead7cb12d5b0e23b55e47f38a8a0675958783668
  Bug not present: 1a85139d9eac7f260a21a6ecb06a578a5255edf7


  commit ead7cb12d5b0e23b55e47f38a8a0675958783668
  Author: Laszlo Ersek 
  Date:   Fri Aug 28 08:12:51 2015 +
  
  OvmfPkg: prevent code execution from DXE stack
  
  SVN rev 18166 ("MdeModulePkg DxeIpl: Add stack NX support") enables
  platforms to request non-executable stack for the DXE phase, by setting
  PcdSetNxForStack to TRUE.
  
  The PCD defaults to FALSE, because:
  
  (a) A non-executable DXE stack is a new feature and causes changes in
  behavior. Some platform could rely on executing code from the stack.
  
  (b) The code enabling NX in the DXE IPL PEIM enforces the
  
PcdSetNxForStack ==> PcdDxeIplBuildPageTables
  
  implication for "64-bit PEI + 64-bit DXE" platforms, with a new
  ASSERT(). Some platform might not comply with this requirement
  immediately.
  
  Regarding (a), in none of the OVMF builds do we try to execute code from
  the stack.
  
  Regarding (b):
  
  - In the OvmfPkgX64.dsc build (which is where (b) applies) we simply
inherit the PcdDxeIplBuildPageTables|TRUE default from
"MdeModulePkg/MdeModulePkg.dec". Therefore we can set PcdSetNxForStack
to TRUE.
  
  - In OvmfPkgIa32X64.dsc, page tables are built by default for DXE. Hence
we can set PcdSetNxForStack to TRUE.
  
  - In OvmfPkgIa32.dsc, page tables used not to be necessary until now.
After we set PcdSetNxForStack to TRUE in this patch, the DXE IPL will
construct page tables even when it is built as part of OvmfPkgIa32.dsc,
provided the (virtual) hardware supports both PAE mode and the XD bit.
  
  Should this setting cause problems in a GPU (or other device) passthru
  scenario, with a UEFI_DRIVER in the PCI option rom attempting to execute
  code from the stack, the feature can be dynamically disabled on the QEMU
  command line, with "-cpu ,-nx".
  
  Cc: Paolo Bonzini 
  Cc: Jordan Justen 
  Cc: "Zeng, Star" 
  Suggested-by: Paolo Bonzini 
  Contributed-under: TianoCore Contribution Agreement 1.0
  Signed-off-by: Laszlo Ersek 
  Reviewed-by: Star Zeng 
  
  git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@18360 
6f19259b-4bc3-4df7-8a09-765794883524


For bisection revision-tuple graph see:
   
http://logs.test-lab.xenproject.org/osstest/results/bisect/ovmf/test-amd64-amd64-xl-qemuu-ovmf-amd64.debian-hvm-install.html
Revision IDs in each graph node refer, respectively, to the Trees above.


Running cs-bisection-step 
--graph-out=/home/logs/results/bisect/ovmf/test-amd64-amd64-xl-qemuu-ovmf-amd64.debian-hvm-install
 --summary-out=tmp/61576.bisection-summary --basis-template=60869 
--blessings=real,real-bisect ovmf test-amd64-amd64-xl-qemuu-ovmf-amd64 
debian-hvm-install
Searching for failure / basis pass:
 61299 fail [host=italia1] / 60904 ok.
Failure / basis pass flights: 61299 / 60904
(tree with no url: seabios)
Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: ovmf https://github.com/tianocore/edk2.git
Tree: qemu git://xenbits.xen.org/staging/qemu-xen-unstable.git
Tree: qemuu git://xenbits.xen.org/staging/qemu-upstream-unstable.git
Tree: xen git://xenbits.xen.org/xen.git
Latest 9b8b905951bde404f20a7bd4b37a5134f3484569 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
776566530b15a4241db00ac317659e5904ad453c 
5cdde31eacdd288359746019ad05cac8ed5d9f70 
b05befcbea71a979509ce04f02929969a790c923 
801ab48e5556cb54f67e3cb57f077f47e8663ced
Basis pass 9b8b905951bde404f20a7bd4b37a5134f3484569 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
218ea6c164525292239d5af891365676c6a18f2a 
7f057440b31da38196e3398fd1b618fc36ad97d6 
b05befcbea71a979509ce04f02929969a790c923 
7b99717f62caeac08eea224a177cd28f047ac4b5
Generating revisions with ./adhoc-revtuple-generator  
git://xenbits.xen.org/linux-pvops.git#9b8b905951bde404f20a7bd4b37a5134f3484569-9b8b905951bde404f20a7bd4b37a5134f3484569
 
git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
 
https://github.com/tianocore/edk2.git#218ea6c164525292239d5af891365676c6a18f2a-77

Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-07 Thread Zhang, Yang Z
Hanweidong (Randy) wrote on 2015-09-08:
> 
> Jan Beulich wrote on ent: 2015年9月7日 22:46:
>> Subject: Re: [Xen-devel] [PATCH] Remove a set operation for
>> VCPU_KICK_SOFTIRQ when post interrupt to vm.
>> 
> On 07.09.15 at 16:24,  wrote:
>>> I believe this also has something to do with a windows guest boot hang
>>> issue.
>>> 
>>> It randomly occured, when boot a guest has windows 2008 os and pv-
>>> driver installed. The boot process hangs when wait xenstored replay
>>> event signal.
>>> 
>>> It can be reproduced after hundreds reboot using the xen staging
>>> branch. But after I changed this code the hang issue can not reproduce.
>> 
>> The change below (which I don't think was ever posted to xen-devel)
>> does not make any sense, as it prohibits timely delivery of guest
>> interrupts. If there is an issue, I think you'd need to start with
>> clearly
> 
> This change won't prohibit timely delivery of guest interrupts,
> intead, it helps to deliver guest interrupt timely. Posted interrupt
> delivery doesn't kick cpu, so it should not set VCPU_KICK_SOFTIRQ bit,
> and doesn't care about if VCPU_KICK_SOFTIRQ is set or not. if
> VCPU_KICK_SOFTIRQ is set, next interrupt will not be delivered due to
> test_and_set_bit check. What's more, it also impacts vcpu_kick() to
> kick cpu (smp_send_event_check_cpu) when VCPU_KICK_SOFTIRQ is set.

The patch seems wrong to me since the interrupt will lost in some corner cases 
with those changes. Can you explain more detail like why next interrupt will 
get lost if set the softirq here?

Best regards,
Yang


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 06/18] vmx: Add some helper functions for Posted-Interrupts

2015-09-07 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Monday, September 07, 2015 6:46 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Tian, Kevin; xen-devel@lists.xen.org; Keir Fraser
> Subject: RE: [PATCH v6 06/18] vmx: Add some helper functions for
> Posted-Interrupts
> 
> >>> On 06.09.15 at 04:05,  wrote:
> 
> >
> >> -Original Message-
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Friday, September 04, 2015 10:40 PM
> >> To: Wu, Feng
> >> Cc: Andrew Cooper; Tian, Kevin; xen-devel@lists.xen.org; Keir Fraser
> >> Subject: Re: [PATCH v6 06/18] vmx: Add some helper functions for
> >> Posted-Interrupts
> >>
> >> >>> On 25.08.15 at 03:57,  wrote:
> >> > @@ -121,11 +122,31 @@ static inline int pi_test_and_clear_on(struct
> >> pi_desc *pi_desc)
> >> >  return test_and_clear_bit(POSTED_INTR_ON, &pi_desc->control);
> >> >  }
> >> >
> >> > +static inline int pi_test_on(struct pi_desc *pi_desc)
> >> > +{
> >> > +return test_bit(POSTED_INTR_ON, &pi_desc->control);
> >> > +}
> >>
> >> For this and ...
> >>
> >> > +static inline int pi_test_sn(struct pi_desc *pi_desc)
> >> > +{
> >> > +return test_bit(POSTED_INTR_SN, &pi_desc->control);
> >> > +}
> >>
> >> ... this I wonder whether using the bitfield you defined in the
> >> previous patch wouldn't allow the compiler more freedom in
> >> how to carry this out.
> >
> > I am sorry, I don't quite understand it. Do you mean: the bitfield
> > defined in previous patch is pointless, or using the bitfield here?
> 
> Use it here would seem preferable. (But please recall that I
> questioned this two fold access model - partly using bitfields,
> partly using bitops - earlier on, and ideally _all_ accesses to
> a certain kind of data structure would follow a single, uniform
> model.)

Yes, I remember your suggestion about this. So now I only use
bitops everywhere, in that case, do you think I still need to use
bitfields here?

Thanks,
Feng

> 
> Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 04/18] vt-d: VT-d Posted-Interrupts feature detection

2015-09-07 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Monday, September 07, 2015 6:43 PM
> To: Wu, Feng
> Cc: Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org
> Subject: RE: [Xen-devel] [PATCH v6 04/18] vt-d: VT-d Posted-Interrupts feature
> detection
> 
> >>> On 06.09.15 at 03:49,  wrote:
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Friday, September 04, 2015 10:31 PM
> >> >>> On 25.08.15 at 03:57,  wrote:
> >> > --- a/xen/drivers/passthrough/vtd/iommu.c
> >> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> >> > @@ -2079,6 +2079,9 @@ static int init_vtd_hw(void)
> >> >  disable_intremap(drhd->iommu);
> >> >  }
> >> > @@ -2176,6 +2179,14 @@ int __init intel_vtd_setup(void)
> >> >  if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
> >> >  iommu_intremap = 0;
> >> >
> >> > +/*
> >> > + * We cannot use posted interrupt if X86_FEATURE_CX16 is
> >> > + * not supported, since we count on this feature to
> >> > + * atomically update 16-byte IRTE in posted format.
> >> > + */
> >> > +if ( !iommu_intremap || !cap_intr_post(iommu->cap)
> >> || !cpu_has_cx16 )
> >> > +iommu_intpost = 0;
> >>
> >> You should check iommu_intpost instead of iommu_intremap to
> >> match the other checks above here.
> >
> > Here I followed your advice:
> > http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg01261.html
> >
> > I think the following code should be ok here though, if you like it, I can
> > change back.
> >
> > +if ( !iommu_intpost || !cap_intr_post(iommu->cap)
> > +|| !cpu_has_cx16 )
> > +iommu_intpost = 0;
> 
> The first part is neither in line with the earlier adjustments (as _still_
> see in the patch context above) not does it make much sense to set
> a variable to zero when it already is zero.

So, what about this?

 +if ( !cap_intr_post(iommu->cap) || !cpu_has_cx16 )
 +iommu_intpost = 0;

Thanks,
Feng

> 
> Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] BUG: failed to save x86 HVM guest with 1TB ram

2015-09-07 Thread wangxin (U)


> -Original Message-
> From: Andrew Cooper [mailto:am...@hermes.cam.ac.uk] On Behalf Of Andrew Cooper
> Sent: Monday, September 07, 2015 5:49 PM
> To: wangxin (U); xen-de...@lists.xenproject.org
> Cc: Fanhenglong; wei.l...@citrix.com; Hanweidong (Randy)
> Subject: Re: [Xen-devel] BUG: failed to save x86 HVM guest with 1TB ram
> 
> On 07/09/15 09:09, wangxin (U) wrote:
> > Hi,
> >
> > I'm tring to hibernate an x86 HVM guest with 1TB ram,
> >[1.VM config]
> >builder = "hvm"
> >name = "suse12_sp3"
> >memory = 1048576
> >vcpus = 16
> >boot = "c"
> >disk = [ '/mnt/sda10/vm/SELS_ide_disk.img,raw,xvda,rw' ]
> >device_model_version = "qemu-xen"
> >vnc = 1
> >vnclisten = '9.51.3.174'
> >vncdisplay = 0
> >
> > but I get the error messages(see below) from XC:
> >[2.VM saving] xl save -p suse12_sp3 suse12_sp3.save
> >Saving to suse12_sp3.save new xl format (info 0x1/0x0/1309)
> >xc: error: Cannot save this big a guest: Internal error
> >libxl: error: libxl_dom.c:1875:libxl__xc_domain_save_done: saving 
> > domain: \
> >domain did not respond to suspend request: Argument list too long
> >libxl: error: libxl_dom.c:2032:remus_teardown_done: Remus: failed to \
> >teardown device for guest with domid 3, rc -8
> >Failed to save domain, resuming domain
> >xc: error: Dom 3 not suspended: (shutdown 0, reason 255): Internal error
> >libxl: error: libxl.c:508:libxl__domain_resume: xc_domain_resume failed \
> >for domain 3: Invalid argument
> >
> > The error in function xc_domain_save in xc_domain_save.c,
> >  /* Get the size of the P2M table */
> >  dinfo->p2m_size = xc_domain_maximum_gpfn(xch, dom) + 1;
> >
> >  if ( dinfo->p2m_size > ~XEN_DOMCTL_PFINFO_LTAB_MASK )
> >  {
> >  errno = E2BIG;
> >  ERROR("Cannot save this big a guest");
> >  goto out;
> >  }
> >
> > it may be 1TB ram plus pci-hole space make the MFN wider than limit size.
> >
> > If I want to save a VM with 1TB ram or larger, what shoud I do? Did anyone
> > have tried this before and have some configuration I can refer to?
> 
> This is clearly not from Xen 4.6, but the same issue will be present.

Yeah,  It's Xen 4.5.0, and I haven't try it in Xen 4.6 upstream yet.

> 
> The check serves a dual purpose.  In the legacy case, it is to avoid
> clobbering the upper bits of pfn information with pfn type information
> for 32bit toolstacks;  any PFN above 2^28 would have type information
> clobbering the upper bits.  This has been mitigated somewhat in
> migration v2, as pfns are strictly 64bit values, still using the upper 4
> bits for type information, allowing 60 bits for the PFN itself.
> 
> The second purpose is just as a limit on toolstack resources. Migration
> requires allocating structures which scale linearly with the size of the
> VM; the biggest of which would be ~1GB for the p2m. Added to this is
>  >1GB for the m2p, and suddenly a 32bit toolstack process is looking
> scarce on RAM.
> 
> During the development of migration v2, I didn't spend any time
> considering if or how much it was sensible to lift the restriction by,
> so the check was imported wholesale from the legacy code.
> 
> For now, I am going to say that it simply doesn't work.  Simply upping
> the limit is only a stopgap measure; an HVM guest can still mess this up

Will the stopgap measure work in xen 4.5? 

> by playing physmap games and mapping a page of ram at a really high
> (guest) physical address.  Longterm, we need hypervisor support for
> getting a compressed view of guest physical address space, so toolstack
> side resources are proportional to the amount of RAM given to the guest,
> not to how big a guest decides to make its physmap.

Is that in your further work?  

Best regards,
Xin

> 
> ~Andrew
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 15/18] vmx: Properly handle notification event when vCPU is running

2015-09-07 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-09-07:
 On 07.09.15 at 15:00,  wrote:
>> Jan Beulich wrote on 2015-09-07:
>>> Yang, in this context: Why does __vmx_deliver_posted_interrupt()
>>> not use cpu_raise_softirq(), instead kind of open coding it (see your
>>> d7dafa375b ["VMX: Add posted interrupt supporting"])?
>> 
>> Sorry, I am not in the context. What do you mean of using
>> cpu_raise_softirq() in __vmx_deliver_posted_interrupt()?
> 
> Why is the function not using that ready to use helper? Looking at
> it ...
> 
>> static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>> {
>>bool_t running = v->is_running;
>>
>>vcpu_unblock(v);
>>if ( running && (in_irq() || (v != current)) )
>>{
>>unsigned int cpu = v->processor;
>>
>>if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
> 
> ... this line as well as ...
> 
>> && (cpu != smp_processor_id()) )
>>send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
> 
> ... this one ...
> 
>>}
>> }
> 
> ... pretty certainly don't belong into vmx.c, or the apparent open
> coding of cpu_raise_softirq() would require a justifying comment.

I still don't see how to use cpu_raise_softirq() since the posted_intr_vector 
doesn't belong to softirq.

Best regards,
Yang



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [qemu-mainline test] 61516: regressions - FAIL

2015-09-07 Thread osstest service owner
flight 61516 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/61516/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-ovmf-amd64 19 guest-start/debianhvm.repeat fail 
REGR. vs. 60958
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 9 debian-hvm-install fail REGR. 
vs. 60958
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 19 guest-start/debianhvm.repeat fail 
REGR. vs. 60958
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 19 guest-start/debianhvm.repeat 
fail REGR. vs. 60958

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-libvirt-vhd  6 xen-boot  fail REGR. vs. 60958
 test-armhf-armhf-xl-rtds 11 guest-start   fail REGR. vs. 60958

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-raw  9 debian-di-installfail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-armhf-armhf-libvirt-qcow2  9 debian-di-installfail never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-armhf-armhf-xl-vhd   9 debian-di-installfail   never pass
 test-armhf-armhf-libvirt 14 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-raw   9 debian-di-installfail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never 
pass
 test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-armhf-armhf-xl-qcow2 9 debian-di-installfail   never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qcow2 11 migrate-support-checkfail  never pass
 test-amd64-i386-libvirt-vhd   9 debian-di-installfail   never pass
 test-amd64-i386-libvirt-raw  11 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-vhd   9 debian-di-installfail   never pass
 test-amd64-amd64-libvirt-vhd  9 debian-di-installfail   never pass
 test-amd64-amd64-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-amd64-i386-xl-vhd9 debian-di-installfail   never pass

version targeted for testing:
 qemuu298fae38972cc0165415ead04b64bfcae55640d9
baseline version:
 qemuub352153f5f7eb12b56602fc03ae4361e01201f90

Last test of basis60958  2015-08-27 22:03:27 Z   11 days
Failing since 61006  2015-08-29 21:27:40 Z9 days4 attempts
Testing same since61516  2015-09-07 11:41:12 Z0 days1 attempts


People who touched revisions under test:
  Aurelien Jarno 
  Christian Borntraeger 
  Christopher Covington 
  Christopher Covington 
  Cornelia Huck 
  Denis V. Lunev 
  Dr. David Alan Gilbert 
  Eric Blake 
  Fam Zheng 
  Gabriel Somlo 
  Jason J. Herne 
  Jean-Christophe Dubois 
  John Snow 
  Laurent Vivier 
  Leif Lindholm 
  Leonid Bloch 
  Marc-André Lureau 
  Marc-André Lurea

[Xen-devel] [PATCH] xen/vtd/iommu: permit group devices to passthrough in relaxed mode

2015-09-07 Thread Tiejun Chen
Currently we don't allow passing through any group devices which are
sharing same RMRR entry since it would break security among VMs. And
indeed, we expect we can figure out a better way to handle this kind
of case completely.

But before the group assignment gets implemented, we might make this
permission dependent on our RMRR policy. So, now it would be allowed
in the relaxed mode.

CC: Yang Zhang 
CC: Kevin Tian 
CC: Jan Beulich 
Signed-off-by: Tiejun Chen 
---
 xen/drivers/passthrough/vtd/iommu.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 836aed5..4249cfa 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2310,12 +2310,16 @@ static int intel_iommu_assign_device(
  PCI_DEVFN2(bdf) == devfn &&
  rmrr->scope.devices_cnt > 1 )
 {
+u32 relaxed = flag & XEN_DOMCTL_DEV_RDM_RELAXED;
+
 printk(XENLOG_G_ERR VTDPREFIX
-   " cannot assign %04x:%02x:%02x.%u"
+   " Currently its %s to assign %04x:%02x:%02x.%u"
" with shared RMRR at %"PRIx64" for Dom%d.\n",
+   relaxed ? "risky" : "disallowed",
seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
rmrr->base_address, d->domain_id);
-return -EPERM;
+if ( !relaxed )
+return -EPERM;
 }
 }
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-07 Thread Hanweidong (Randy)

> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: 2015年9月7日 22:46
> To: Liuqiming (John)
> Cc: Hanweidong (Randy); Zhangwei (FF); yang.z.zh...@intel.com; xen-
> de...@lists.xenproject.org
> Subject: Re: [Xen-devel] [PATCH] Remove a set operation for
> VCPU_KICK_SOFTIRQ when post interrupt to vm.
> 
> >>> On 07.09.15 at 16:24,  wrote:
> > I believe this also has something to do with a windows guest boot
> hang
> > issue.
> >
> > It randomly occured, when boot a guest has windows 2008 os and pv-
> driver
> > installed.
> > The boot process hangs when wait xenstored replay event signal.
> >
> > It can be reproduced after hundreds reboot using the xen staging
> branch.
> > But after I changed this code the hang issue can not reproduce.
> 
> The change below (which I don't think was ever posted to xen-devel)
> does not make any sense, as it prohibits timely delivery of guest
> interrupts. If there is an issue, I think you'd need to start with
> clearly

This change won't prohibit timely delivery of guest interrupts, intead, it 
helps to deliver guest interrupt timely. Posted interrupt delivery doesn't kick 
cpu, so it should not set VCPU_KICK_SOFTIRQ bit, and doesn't care about if 
VCPU_KICK_SOFTIRQ is set or not. if VCPU_KICK_SOFTIRQ is set, next interrupt 
will not be delivered due to test_and_set_bit check. What's more, it also 
impacts vcpu_kick() to kick cpu (smp_send_event_check_cpu) when 
VCPU_KICK_SOFTIRQ is set.

Weidong


> describing the issue you see and what you think it is being caused by.
> 
> Jan
> 
> > On 2015/9/7 22:46, - wrote:
> >> From: liuqiming 00178499 
> >>
> >> This set operation doesn't make any sense, and it will block later
> >> interrupt injected into vm (by vcpu_kick or deliver_posted_intr),
> >> which will cause a performance issue on CPU supporting posted-
> interrupt.
> >>
> >> Signed-off-by: Qiming Liu 
> >> Cc: Yang Zhang 
> >> ---
> >>   xen/arch/x86/hvm/vmx/vmx.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> >> index 2582cdd..9480b44 100644
> >> --- a/xen/arch/x86/hvm/vmx/vmx.c
> >> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >> @@ -1681,7 +1681,7 @@ static void
> __vmx_deliver_posted_interrupt(struct vcpu
> > *v)
> >>   {
> >>   unsigned int cpu = v->processor;
> >>
> >> -if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ,
> &softirq_pending(cpu))
> >> +if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
> >>&& (cpu != smp_processor_id()) )
> >>   send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
> >>   }
> >
> >
> >
> > ___
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> 
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v11 2/3] autoconf: xen: enable explicit preference option for xenstored preference

2015-09-07 Thread Luis R. Rodriguez
On Mon, Sep 07, 2015 at 05:28:26PM +0100, George Dunlap wrote:
> On Wed, Jul 30, 2014 at 5:40 PM, Luis R. Rodriguez
>  wrote:
> > From: "Luis R. Rodriguez" 
> >
> > As it stands oxenstored will be used by default if ocaml tools are
> > found, the init system will also try to use oxenstored first if its
> > found otherwise the cxenstored will be used. Lets simplify the init
> > script and let users be explicit about the preference through configure.
> >
> > This adds support to let you be explicit about the xenstored preference,
> > you can only use one of these two options:
> >
> > ./configure --with-xenstored=xenstored
> > ./configure --with-xenstored=oxenstored
> >
> > We continue with the old behaviour and default oxenstored will be used
> > but only if you have ocaml dependencies. Since the xenstored preference
> > is explicit now and since we require configure substitutions for it we
> > make use of the AX_XEN_EXPAND_CONFIG() helpers as otherwise substitution
> > for SBINDIR is not propagated from the top level configuration.
> >
> > All this allows us to simplify the init script to use the configured
> > xenstore from the start. We update the sysconfig/default xencommons file
> > with the paths for the different options though, this can be used by
> > users to override the default xenstored, this follows the old behaviour
> > but we now just explicitly provide the full configured paths for users.
> 
> You keep saying "this follows the old behaviour", but it doesn't.
> Before, if I build oxenstored but put it in a separate package (say,
> xen-ocaml), it would run oxenstored if available, and if not run
> xenstored.  Now it will only try to run one, and if that's not
> available it will fail.  That's less of a feature IMHO.

George,

Wow, you're review is over 1 year late! You should have raised your concerns
when the patches were posted. I will note that I gave the community a shot at a
slew of options, one was a "laucher" solution [0] which would try one binary
first and then another, the solution we ended up using was what the community
decided, not me.

So -- if you'd like an alternative you can surely send a patch, but please
review early as well.

[0] http://lists.freedesktop.org/archives/systemd-devel/2014-June/019740.html

 Luis

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Asus X99-A VT-d problems

2015-09-07 Thread Pasi Kärkkäinen
On Mon, Sep 07, 2015 at 08:45:20PM +0300, Valtteri Kiviniemi wrote:
>Hi,
> 

Hi,

>I'm using a Asus X99-A motherboard with latest bios which in paper (and in
>BIOS) supports VT-d. However, with Xen 4.5.1 and Xen unstable the VT-d
>fails completely on SATA AHCI with "failed to IDENTIFY".
> 
>I also tried to disable the SATA controller completely from BIOS and try
>to boot from USB stick (to debug further). However, also the USB-ports
>fail when using VT-d with error message "Fatal error, HC died" and then
>the kernel panics.
> 
>I googled around and tried the pci-phantom boot parameter for both the
>SATA and USB controllers and also the iommu=workaround_bios_bug but with
>no success.
> 
>The SATA controller that my system is using is:
> 
>00:1f.2 SATA controller: Intel Corporation Wellsburg 6-Port SATA
>Controller [AHCI mode] (rev 05)
> 
>And the USB controllers:
> 
>00:14.0 USB controller: Intel Corporation Wellsburg USB xHCI Host
>Controller (rev 05)
>00:1d.0 USB controller: Intel Corporation Wellsburg USB Enhanced Host
>Controller #1 (rev 05)
>07:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host
>Controller
> 
>And lastly: I installed ESXI 6.0 and it installed successfully with VT-d
>enabled. So I suspect that this might be a combination of BIOS/Xen bug
>that VMware has been able to solve (or then they do something
>differently).
> 
>I'm happy to help to debug this further if needed, but so far I'm not able
>to boot the system at all without iommu=0 boot parameter.
> 

It'd be helpful if you could use a serial console (PCI serial card perhaps? Or 
if your motherboard has a built-in IPMI/BMC/AMT management processor with 
Serial-Over-LAN port that works aswell..) and capture a logfile from the full 
Xen + Linux boot.. with all the verbose/debugging options enabled.

>Best regards,
>Valtteri Kiviniemi


-- Pasi


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] Asus X99-A VT-d problems

2015-09-07 Thread Valtteri Kiviniemi
Hi,

I'm using a Asus X99-A motherboard with latest bios which in paper (and in
BIOS) supports VT-d. However, with Xen 4.5.1 and Xen unstable the VT-d
fails completely on SATA AHCI with "failed to IDENTIFY".


I also tried to disable the SATA controller completely from BIOS and try to
boot from USB stick (to debug further). However, also the USB-ports fail
when using VT-d with error message "Fatal error, HC died" and then the
kernel panics.

I googled around and tried the pci-phantom boot parameter for both the SATA
and USB controllers and also the iommu=workaround_bios_bug but with no
success.

The SATA controller that my system is using is:

00:1f.2 SATA controller: Intel Corporation Wellsburg 6-Port SATA Controller
[AHCI mode] (rev 05)

And the USB controllers:

00:14.0 USB controller: Intel Corporation Wellsburg USB xHCI Host
Controller (rev 05)
00:1d.0 USB controller: Intel Corporation Wellsburg USB Enhanced Host
Controller #1 (rev 05)
07:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host
Controller

And lastly: I installed ESXI 6.0 and it installed successfully with VT-d
enabled. So I suspect that this might be a combination of BIOS/Xen bug that
VMware has been able to solve (or then they do something differently).

I'm happy to help to debug this further if needed, but so far I'm not able
to boot the system at all without iommu=0 boot parameter.

Best regards,
Valtteri Kiviniemi
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Patch V1 3/3] xen: add pvUSB backend

2015-09-07 Thread Stefano Stabellini
On Thu, 3 Sep 2015, Juergen Gross wrote:
> Add a backend for para-virtualized USB devices for xen domains.
> 
> The backend is using host-libusb to forward USB requests from a
> domain via libusb to the real device(s) passed through.
> 
> Signed-off-by: Juergen Gross 

Aside from a few minor comments below, this looks pretty good from a Xen
perspective, however I am not too qualified to review the details of the
pvusb protocol or the way the requests are handled internally in QEMU.

I would be glad if Gerd could take a look at this too.

Juergen, if we commit this, would you be happy to maintain it going
forward?


> diff --git a/hw/usb/Makefile.objs b/hw/usb/Makefile.objs
> index 3fe4dff..0253184 100644
> --- a/hw/usb/Makefile.objs
> +++ b/hw/usb/Makefile.objs
> @@ -36,3 +36,7 @@ common-obj-$(CONFIG_USB_REDIR) += redirect.o quirks.o
>  
>  # usb pass-through
>  common-obj-y += $(patsubst %,host-%.o,$(HOST_USB))
> +
> +ifeq ($(CONFIG_USB_LIBUSB),y)
> +common-obj-$(CONFIG_XEN_BACKEND) += xen-usb.o
> +endif
> diff --git a/hw/usb/xen-usb.c b/hw/usb/xen-usb.c
> new file mode 100644
> index 000..2570bd7
> --- /dev/null
> +++ b/hw/usb/xen-usb.c
> @@ -0,0 +1,1120 @@
> +/*
> + *  xen paravirt usb device backend
> + *
> + *  (c) Juergen Gross 
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; under version 2 of the License.
> + *
> + *  This program is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *  GNU General Public License for more details.
> + *
> + *  You should have received a copy of the GNU General Public License along
> + *  with this program; if not, see .
> + *
> + *  Contributions after 2012-01-13 are licensed under the terms of the
> + *  GNU GPL, version 2 or (at your option) any later version.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "qemu-common.h"
> +#include "qemu/config-file.h"
> +#include "hw/sysbus.h"
> +#include "hw/usb.h"
> +#include "hw/xen/xen_backend.h"
> +#include "monitor/qdev.h"
> +#include "qapi/qmp/qbool.h"
> +#include "qapi/qmp/qint.h"
> +#include "qapi/qmp/qstring.h"
> +#include "sys/user.h"
> +
> +#include 
> +#include 
> +
> +#define TR(fmt, args...)\
> +{   \
> +struct timeval tv;  \
> +\
> +gettimeofday(&tv, NULL);\
> +fprintf(stderr, "%8ld.%06ld xen-usb(%s):" fmt, tv.tv_sec,   \
> +tv.tv_usec, __func__, ##args);  \

Please use xen_be_printf. I am not sure I would go as far as using
gettimeofday, but I can see it could be useful here.


> +}
> +#define TR_REQ(fmt, args...) { if (tr_debug & 1) TR(fmt, ##args) }
> +#define TR_BUS(fmt, args...) { if (tr_debug & 2) TR(fmt, ##args) }

I would drop these and just use the loglevels of xen_be_printf.


> +#define USBBACK_MAXPORTSUSBIF_PIPE_PORT_MASK
> +#define USBBACK_DEVNAME_SIZE32
> +#define USB_DEV_ADDR_SIZE   128
> +
> +struct usbif_ctrlrequest {
> +uint8_tbRequestType;
> +uint8_tbRequest;
> +uint16_t   wValue;
> +uint16_t   wIndex;
> +uint16_t   wLength;
> +};
> +
> +struct usbif_isoc_descriptor {
> +uint32_t   offset;
> +uint32_t   length;
> +uint32_t   actual_length;
> +int32_tstatus;
> +};
> +
> +struct usbback_info;
> +struct usbback_req;
> +
> +struct usbback_stub {
> +USBDevice  *dev;
> +USBPortport;
> +unsigned   speed;
> +bool   attached;
> +QTAILQ_HEAD(submit_q_head, usbback_req) submit_q;
> +};
> +
> +struct usbback_req {
> +struct usbback_info  *usbif;
> +struct usbback_stub  *stub;
> +struct usbif_urb_request req;
> +USBPacketpacket;
> +
> +unsigned int nr_buffer_segs; /* # of transfer_buffer 
> segments */
> +unsigned int nr_extra_segs;  /* # of iso_frame_desc segments 
>  */
> +
> +QTAILQ_ENTRY(usbback_req) q;
> +
> +void *buffer;
> +void *isoc_buffer;
> +struct libusb_transfer   *xfer;
> +};
> +
> +struct usbback_info {
> +struct XenDevice xendev;  /* must be first */
> +USBBus   bus;
> +void *urb_sring;
> +void *conn_sring;
> +struct usbif_urb_back_ring urb_ring;
> +struct usbif_conn_back_ring conn_ring;
> +int  num_ports;
> +int  usb_ver;
> +bool ring_error;
> +QTAILQ_HEAD

Re: [Xen-devel] [PATCH for-4.6] efi: introduce efi_arch_flush_dcache_area

2015-09-07 Thread Wei Liu
On Mon, Sep 07, 2015 at 03:13:50PM +0100, Stefano Stabellini wrote:
> Objects loaded by FileHandle->Read need to be flushed to dcache,
> otherwise copy_from_paddr will read stale data when copying the kernel,
> causing a failure to boot.
> 
> Introduce efi_arch_flush_dcache_area and call it from read_file.
> 
> This commit introduces no functional changes on x86.
> 
> Reported-by: Mark Rutland 
> Signed-off-by: Stefano Stabellini 
> CC: ian.campb...@citrix.com
> CC: jbeul...@suse.com
> CC: wei.l...@citrix.com

I'm expecting a v2 with minor adjustments.

In any case this is a valid and straightforward bug fix:

  Release-acked-by: Wei Liu 

I will wait for v2 and keep an eye on our testing progress. This will
probably go in after RC3 (i.e. after branching) unless OSSTest is wedged
again and we somehow need to retest staging.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 18/20] net/xen-netback: Make it running on 64KB page granularity

2015-09-07 Thread Wei Liu
You might need to rebase you patch. A patch to netback went it recently.

On Mon, Sep 07, 2015 at 04:33:56PM +0100, Julien Grall wrote:
> The PV network protocol is using 4KB page granularity. The goal of this
> patch is to allow a Linux using 64KB page granularity working as a
> network backend on a non-modified Xen.
> 
> It's only necessary to adapt the ring size and break skb data in small
> chunk of 4KB. The rest of the code is relying on the grant table code.
> 
> Signed-off-by: Julien Grall 
> 

Reviewed-by: Wei Liu 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 12/20] xen/balloon: Don't rely on the page granularity is the same for Xen and Linux

2015-09-07 Thread Stefano Stabellini
On Mon, 7 Sep 2015, Julien Grall wrote:
> For ARM64 guests, Linux is able to support either 64K or 4K page
> granularity. Although, the hypercall interface is always based on 4K
> page granularity.
> 
> With 64K page granularity, a single page will be spread over multiple
> Xen frame.
> 
> To avoid splitting the page into 4K frame, take advantage of the
> extent_order field to directly allocate/free chunk of the Linux page
> size.
> 
> Note that PVMMU is only used for PV guest (which is x86) and the page
> granularity is always 4KB. Some BUILD_BUG_ON has been added to ensure
> that because the code has not been modified.
> 
> Signed-off-by: Julien Grall 

Reviewed-by: Stefano Stabellini 


> ---
> Cc: Konrad Rzeszutek Wilk 
> Cc: Boris Ostrovsky 
> Cc: David Vrabel 
> Cc: Wei Liu 
> 
> Note that two BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE) in code built
> for the PV MMU code is kept in order to have at least one even if we
> ever decide to drop of code section.
> 
> Changes in v4:
> - s/xen_page_to_pfn/page_to_xen_pfn/ based on the new naming
> - Use the field lru in the page to get a list of pages when
> decreasing the memory reservation. It avoids to use a static
> array to store the pages (see v3).
> - Update comment for EXTENT_ORDER.
> 
> Changes in v3:
> - Fix errors reported by checkpatch.pl
> - s/mfn/gfn/ based on the new naming
> - Rather than splitting the page into 4KB chunk, use the
> extent_order field to allocate directly a Linux page size. This
> is avoid lots of code for no benefits.
> 
> Changes in v2:
> - Use xen_apply_to_page to split a page in 4K chunk
> - It's not necessary to have a smaller frame list. Re-use
> PAGE_SIZE
> - Convert reserve_additional_memory to use XEN_... macro
> ---
>  drivers/xen/balloon.c | 59 
> ++-
>  1 file changed, 44 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index c79329f..3babf13 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -70,6 +70,11 @@
>  #include 
>  #include 
>  
> +/* Use one extent per PAGE_SIZE to avoid to break down the page into
> + * multiple frame.
> + */
> +#define EXTENT_ORDER (fls(XEN_PFN_PER_PAGE) - 1)
> +
>  /*
>   * balloon_process() state:
>   *
> @@ -230,6 +235,11 @@ static enum bp_state reserve_additional_memory(long 
> credit)
>   nid = memory_add_physaddr_to_nid(hotplug_start_paddr);
>  
>  #ifdef CONFIG_XEN_HAVE_PVMMU
> + /* We don't support PV MMU when Linux and Xen is using
> +  * different page granularity.
> +  */
> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> +
>  /*
>   * add_memory() will build page tables for the new memory so
>   * the p2m must contain invalid entries so the correct
> @@ -326,11 +336,11 @@ static enum bp_state reserve_additional_memory(long 
> credit)
>  static enum bp_state increase_reservation(unsigned long nr_pages)
>  {
>   int rc;
> - unsigned long  pfn, i;
> + unsigned long i;
>   struct page   *page;
>   struct xen_memory_reservation reservation = {
>   .address_bits = 0,
> - .extent_order = 0,
> + .extent_order = EXTENT_ORDER,
>   .domid= DOMID_SELF
>   };
>  
> @@ -352,7 +362,11 @@ static enum bp_state increase_reservation(unsigned long 
> nr_pages)
>   nr_pages = i;
>   break;
>   }
> - frame_list[i] = page_to_pfn(page);
> +
> + /* XENMEM_populate_physmap requires a PFN based on Xen
> +  * granularity.
> +  */
> + frame_list[i] = page_to_xen_pfn(page);
>   page = balloon_next_page(page);
>   }
>  
> @@ -366,10 +380,15 @@ static enum bp_state increase_reservation(unsigned long 
> nr_pages)
>   page = balloon_retrieve(false);
>   BUG_ON(page == NULL);
>  
> - pfn = page_to_pfn(page);
> -
>  #ifdef CONFIG_XEN_HAVE_PVMMU
> + /* We don't support PV MMU when Linux and Xen is using
> +  * different page granularity.
> +  */
> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> +
>   if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> + unsigned long pfn = page_to_pfn(page);
> +
>   set_phys_to_machine(pfn, frame_list[i]);
>  
>   /* Link back into the page tables if not highmem. */
> @@ -396,14 +415,15 @@ static enum bp_state increase_reservation(unsigned long 
> nr_pages)
>  static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>  {
>   enum bp_state state = BP_DONE;
> - unsigned long  pfn, i;
> - struct page   *page;
> + unsigned long i;
> + struct page *page, *tmp;
>   int ret;
>   struct xen_memory_reserv

Re: [Xen-devel] [PATCH for-4.6] efi: introduce efi_arch_flush_dcache_area

2015-09-07 Thread Mark Rutland
On Mon, Sep 07, 2015 at 05:06:36PM +0100, Ian Campbell wrote:
> On Mon, 2015-09-07 at 16:08 +0100, Stefano Stabellini wrote:
> > On Mon, 7 Sep 2015, Jan Beulich wrote:
> > > > > > On 07.09.15 at 16:13,  wrote:
> > > > Objects loaded by FileHandle->Read need to be flushed to dcache,
> > > > otherwise copy_from_paddr will read stale data when copying the 
> > > > kernel,
> > > > causing a failure to boot.
> > > 
> > > I have to admit that I'm a little puzzled by this description: If this
> > > was a general requirement (which is how it reads to me)
> > 
> > Yes, it is
> > 
> > 
> > > why does > ->Read() not do the necessary flushing? Or if it's not a
> > > general requirement, perhaps the description could be changed to make
> > > clear what exact dependency exists that does not exist for all users
> > > of ->Read()?
> > 
> > It is a general requirement: anything that could be accessed without a
> > cacheable mapping needs to be flushed. Unfortunately the UEFI spec is
> > not clear enough on who should do the flushing on what, and in fact the
> > Forum is working on improving it
> 
> I might be wrong and/or misremembering but I think this stems partly from
> the fact that the ARM UEFI stub in Xen needs to turn off caches (and
> paging?) before jumping to the usual Xen entry point.
> 
> Then when caches come back on you get inconsistencies because of stale
> stuff in the cache which suddenly reappears etc.

It's more to do with using inconsistent attributes than about whether
the caches are enabled -- at both points the kernel image is
manipulated, SCTLR_EL2.{C,M} == {1,1}.

EFI_FILE_PROTOCOL.Read() may place data into the caches without updating
memory because of coherent IO or simply because it copies to a cacheable
alias. If Xen were to modify the image in any way (e.g. decompressing
it), it would only update the cached copy.

Later kernel_zimage_load calls copy_from_paddr, which uses a
non-cacheable alias, bypassing the caches and going straight to memory.

Even if the two aliases were in use simultaneously, they wouldn't be
coherent with each other.

For more background, Marc Zyngier did a good talk at KVM Forum regarding
the usual behaviour of the caches [1,2].

Mark.

[1] http://events.linuxfoundation.org/sites/events/files/slides/slides_10.pdf
[2] https://www.youtube.com/watch?v=A_zCxzpxzmE

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v11 2/3] autoconf: xen: enable explicit preference option for xenstored preference

2015-09-07 Thread George Dunlap
On Wed, Jul 30, 2014 at 5:40 PM, Luis R. Rodriguez
 wrote:
> From: "Luis R. Rodriguez" 
>
> As it stands oxenstored will be used by default if ocaml tools are
> found, the init system will also try to use oxenstored first if its
> found otherwise the cxenstored will be used. Lets simplify the init
> script and let users be explicit about the preference through configure.
>
> This adds support to let you be explicit about the xenstored preference,
> you can only use one of these two options:
>
> ./configure --with-xenstored=xenstored
> ./configure --with-xenstored=oxenstored
>
> We continue with the old behaviour and default oxenstored will be used
> but only if you have ocaml dependencies. Since the xenstored preference
> is explicit now and since we require configure substitutions for it we
> make use of the AX_XEN_EXPAND_CONFIG() helpers as otherwise substitution
> for SBINDIR is not propagated from the top level configuration.
>
> All this allows us to simplify the init script to use the configured
> xenstore from the start. We update the sysconfig/default xencommons file
> with the paths for the different options though, this can be used by
> users to override the default xenstored, this follows the old behaviour
> but we now just explicitly provide the full configured paths for users.

You keep saying "this follows the old behaviour", but it doesn't.
Before, if I build oxenstored but put it in a separate package (say,
xen-ocaml), it would run oxenstored if available, and if not run
xenstored.  Now it will only try to run one, and if that's not
available it will fail.  That's less of a feature IMHO.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 27/29] Manual allocation: Provide JobInfo in mg-allocate

2015-09-07 Thread Ian Jackson
Use manual_allocation_base_jobinfo to obtain some information which
will go into `job' in the planner, and show up everywhere.

Put just the request (from the command line) in the Xinfo.

Signed-off-by: Ian Jackson 
---
v2: New patch
---
 mg-allocate |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mg-allocate b/mg-allocate
index 14c9088..672d9ec 100755
--- a/mg-allocate
+++ b/mg-allocate
@@ -269,7 +269,10 @@ sub showposs ($) {
 
 sub plan () {
 my @got;
-alloc_resources(sub {
+
+my $info = "manual ".manual_allocation_base_jobinfo();
+
+alloc_resources(JobInfo => $info, sub {
 my ($plan, $mayalloc) = @_;
 
@got = ();
@@ -337,7 +340,7 @@ sub plan () {
 foreach my $req (@reqlist) {
 my $book= {
 Reso => $req->{Reso},
-Xinfo => "manual",
+Xinfo => $req->{Ident},
 Start => $planned->{Start},
 End => $planned->{Start} + $duration,
 };
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 29/29] Manual allocation: Set $| in mg-blockage

2015-09-07 Thread Ian Jackson
This makes it slightly easier to see what it's happening if
mg-allocate or mg-blockage has no tty.

Signed-off-by: Ian Jackson 
---
v2: New patch
---
 mg-allocate |2 ++
 mg-blockage |2 ++
 2 files changed, 4 insertions(+)

diff --git a/mg-allocate b/mg-allocate
index 672d9ec..1e517a2 100755
--- a/mg-allocate
+++ b/mg-allocate
@@ -36,6 +36,8 @@ use Osstest::Executive;
 
 csreadconfig();
 
+$|=1;
+
 our $tid;
 our %magictask;
 
diff --git a/mg-blockage b/mg-blockage
index 90547a5..d3e6169 100755
--- a/mg-blockage
+++ b/mg-blockage
@@ -24,6 +24,8 @@ sub parsedate ($) {
 return $r;
 }
 
+$|=1;
+
 my $baseinfo = manual_allocation_base_jobinfo();
 my $info = "blockage $baseinfo [$start .. $end]";
 $info .= " $minfo" if defined $minfo;
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 16/29] Plan reporting: Provide get-last-plan queuedaemon command

2015-09-07 Thread Ian Jackson
This allows retrieval, by monitoring clients which are not
participating in the planning queue, of the finished projection, or
the unfinished plan as it was at the time of last restart.

Signed-off-by: Ian Jackson 
---
v2: Fix invocation of return-plan-to-client.
Use data-W.final.pl, not data-W-final.pl, to fit
with existing .gitignore, and be slightly neater.
---
 README.planner |   11 +++
 ms-queuedaemon |8 
 2 files changed, 19 insertions(+)

diff --git a/README.planner b/README.planner
index b1bacd4..21b3415 100644
--- a/README.planner
+++ b/README.planner
@@ -224,6 +224,17 @@ ms-queuedaemon commands
 
> thought-done | thought-wait
 
+> get-last-plan projection|plan
+Retrieve the last fully completed plan (`projection'),
+   or the most recent short-term information used for
+   actual resource allocation (`plan').
+
+   For monitoring tools, not participants in the resource
+   allocation process.  Response is as for get-plan:
+
+   < OK get-plan BYTES
+   PLAN-DATA
+
 
 Plan is:
Start   time_t used for "now" *
diff --git a/ms-queuedaemon b/ms-queuedaemon
index 8c9af81..18d703e 100755
--- a/ms-queuedaemon
+++ b/ms-queuedaemon
@@ -293,6 +293,7 @@ proc report-plan {w wo} {
 } emsg]} {
 log "INTERNAL ERROR showing $w html: $emsg"
 } else {
+   file copy -force data-$w.pl data-$wo.final.pl
 log "$w report-plan OK"
 }
 }
@@ -367,6 +368,13 @@ proc cmd/get-plan {chan desc} {
 set plan [return-plan-to-client $chan $w]
 }
 
+proc cmd/get-last-plan {chan desc w} {
+global walkers
+if {[lsearch -exact $walkers $w] < 0} { error "unknown last-plan ($w)" }
+if {![file exists data-$w.final.pl]} { error "no last-plan $w" }
+return-plan-to-client $chan $w.final
+}
+
 proc cmd/book-resources {chan desc bytes} {
 read-chan-data $chan $bytes do-book-resources
 }
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 05/29] Planner: client side: $mayalloc parameter to $resourcecall->()

2015-09-07 Thread Ian Jackson
Add a new parameter to $resourcecall which allows the alloc_resources
loop in Osstest::Executive to specify to its clients that on this
occasion they should not make any actual allocations.

The callers of alloc_resources are all adjusted to honour this new
parameter:
 * ts-hosts-allocate-Executive avoids allocating unless $mayalloc
 * mg-allocate avoids allocating unless $mayalloc
 * mg-blockage never allocates anyway.

Currently we always pass 1, so no functional change.

Signed-off-by: Ian Jackson 
Acked-by: Ian Campbell 
---
 Osstest/Executive.pm|4 ++--
 mg-allocate |4 ++--
 mg-blockage |2 +-
 ts-hosts-allocate-Executive |4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm
index ab015d2..f9be0a0 100644
--- a/Osstest/Executive.pm
+++ b/Osstest/Executive.pm
@@ -604,7 +604,7 @@ sub plan_search () {
 }
 
 sub alloc_resources {
-my ($resourcecall) = pop @_; # $resourcecall->($plan);
+my ($resourcecall) = pop @_; # $resourcecall->($plan, $mayalloc);
 my (%xparams) = @_;
 # $resourcecall should die (abort) or return ($ok, $bookinglist)
 #
@@ -720,7 +720,7 @@ sub alloc_resources {
$plan= from_json($jplan);
}, sub {
if (!eval {
-   ($ok, $bookinglist) = $resourcecall->($plan);
+   ($ok, $bookinglist) = $resourcecall->($plan, 1);
1;
}) {
warn "resourcecall $@";
diff --git a/mg-allocate b/mg-allocate
index fc1b394..6dc7ddd 100755
--- a/mg-allocate
+++ b/mg-allocate
@@ -270,7 +270,7 @@ sub showposs ($) {
 sub plan () {
 my @got;
 alloc_resources(sub {
-my ($plan) = @_;
+my ($plan, $mayalloc) = @_;
 
@got = ();
 my @possmatrix = ([]);
@@ -310,7 +310,7 @@ sub plan () {
die unless $planned;
 
 my $allok=0;
-if (!$planned->{Start}) {
+if ($mayalloc && !$planned->{Start}) {
 $allok=1;
 
 alloc_prep();
diff --git a/mg-blockage b/mg-blockage
index a21f15b..1f66d8e 100755
--- a/mg-blockage
+++ b/mg-blockage
@@ -40,7 +40,7 @@ sub max { (reverse sort @_)[0]; }
 
 sub plan () {
 alloc_resources(sub {
-   my ($plan) = @_;
+   my ($plan, $mayalloc) = @_;
 
my $now = time;
if ($now > $end) { return (1, { Bookings => [ ] }); }
diff --git a/ts-hosts-allocate-Executive b/ts-hosts-allocate-Executive
index 6583191..4c7c614 100755
--- a/ts-hosts-allocate-Executive
+++ b/ts-hosts-allocate-Executive
@@ -606,7 +606,7 @@ sub alloc_hosts () {
 }
 
 sub attempt_allocation {
-($plan) = @_;
+($plan, $mayalloc) = @_;
 undef $best;
 
 logm("allocating hosts: ".join(' ', map { $_->{Ident} } @hids));
@@ -632,7 +632,7 @@ sub attempt_allocation {
 
 my $retval=0;
 
-if (!$best->{Start}) {
+if ($mayalloc && !$best->{Start}) {
$retval= 1;
foreach my $hid (@hids) {
my $got= actual_allocation($hid);
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 26/29] Manual allocation: Provide JobInfo in mg-blockage

2015-09-07 Thread Ian Jackson
Supply most of the information about our allocations in JobInfo,
rather than Xinfo.  And, correspondingly, rename all the variables
`xinfo' to `info'.

Supply only the supplied hostname as the Xinfo.  This is slightly
redundant but it does at least tell us what came out of the db.

Signed-off-by: Ian Jackson 
---
v2: New patch
---
 mg-blockage |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mg-blockage b/mg-blockage
index 0286c27..90547a5 100755
--- a/mg-blockage
+++ b/mg-blockage
@@ -14,7 +14,7 @@ csreadconfig();
 die unless @ARGV==3 || @ARGV==4;
 die if $ARGV[0] =~ m/^=/;
 
-our ($start,$end,$hostflag,$mxinfo) = @ARGV;
+our ($start,$end,$hostflag,$minfo) = @ARGV;
 
 sub parsedate ($) {
 open D, "-|", qw(date +%s -d), @_ or die $!;
@@ -24,13 +24,13 @@ sub parsedate ($) {
 return $r;
 }
 
-my $basexinfo = manual_allocation_base_joinfo();
-my $xinfo = "blockage $basexinfo [$start .. $end]";
-$xinfo .= " $mxinfo" if defined $mxinfo;
+my $baseinfo = manual_allocation_base_jobinfo();
+my $info = "blockage $baseinfo [$start .. $end]";
+$info .= " $minfo" if defined $minfo;
 
 $start = parsedate $start;
 $end   = parsedate $end;
-print $xinfo, "\n" or die $!;
+print $info, "\n" or die $!;
 
 die unless $end > $start;
 
@@ -38,7 +38,7 @@ sub min { (sort @_)[0]; }
 sub max { (reverse sort @_)[0]; }
 
 sub plan () {
-alloc_resources(sub {
+alloc_resources(JobInfo => $info, sub {
my ($plan, $mayalloc) = @_;
 
my $now = time;
@@ -72,7 +72,7 @@ END
foreach (my $ix=0; $ix<$avail; $ix++) {
my $booking =  {
Reso => $res->{restype}.' '.$res->{resname},
-   Xinfo => $xinfo,
+   Xinfo => $host,
Start => $bookstart - $now,
End => $bookend - $now,
};
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 09/29] Planner: ms-queuedaemon: Prep for multiple walkers

2015-09-07 Thread Ian Jackson
We are going to introduce multiple concurrent streams of planning
processing, called `walkers'.

Prepare the ground for this with some formulaic changes which will
otherwise greatly clutter substantive patches.

(A client will still only think for one walker at once, because that's
what the client protocol expects - and anything else would be far too
confusing.)

General:
 * Introduce the concept of a `walker' to ms-queuedaemon.
 * Provide a list of the walkers which might exist, `walkers'
 * Provide some helper procedures for iterating over these,
   and easily accessing their state.

Queue handling:
 * Add a new `w' argument to many procs: specifically, most of the
   procs in the section `machinery for running the queue'.
 * Log the walker ($w) at the start of all relevant log messages.
 * Pass the -w option to ms-planner and ms-planner-debug.
 * Add safety catches which will crash the ms-queuedaemon if it finds
   it is asking the same client to think for more than one walker.
 * we-are-thinking and check-we-are-thinking tell the caller what
   walker the client is thinking for.
 * In the resource-plan.html filename, replace `plan' with the walker
   filename.

Elsewhere:
 * Teach dequeue-chan to deal with all the walkers, including
   maybe the (one) walker for which the client is thinking.
 * Teach log-state to report on all the walkers.
 * In the runneeded logic, hardcode `plan' as the walker to use.

There is still actually only one walker.

No overall functional change, except to some log messages.

Signed-off-by: Ian Jackson 

---
v2: Fix walker-globals to import the $w/$v from #0, ie the global scope
Correct invocation of upvar in walker-globals
Use walker-globals everywhere, not obsolete name walker-vars
Do not pass w to do-book-resources (which does not want it
because it uses uses chan-we-are-thinking)
---
 ms-queuedaemon |  189 +++-
 1 file changed, 120 insertions(+), 69 deletions(-)

diff --git a/ms-queuedaemon b/ms-queuedaemon
index 1aa526c..6e1ba0e 100755
--- a/ms-queuedaemon
+++ b/ms-queuedaemon
@@ -21,6 +21,23 @@
 
 source ./tcl/daemonlib.tcl
 
+set walkers {plan}
+
+proc walker-globals {w} {
+# introduces queue_running, thinking[_after] for the specific walker
+foreach v {queue_running thinking thinking_after} {
+   uplevel 1 [list upvar #0 $w/$v $v]
+}
+}
+
+proc foreach-walker {walkervar body} {
+global walkers
+upvar 1 $walkervar w
+foreach w $walkers {
+   uplevel 1 walker-globals $w
+   uplevel 1 $body
+}
+}
 
 proc chan-destroy-stuff {chan} {
 dequeue-chan $chan destroy
@@ -31,13 +48,20 @@ proc chan-destroy-stuff {chan} {
 proc dequeue-chan {chan why} {
 log-event "dequeue-chan $chan $why"
 
-global queue queue_running thinking
+global queue
 lremove queue $chan
 
-if {[info exists queue_running]} { lremove queue_running $chan }
-if {[info exists thinking] &&
-![string compare $thinking $chan]} {
-queuerun-step-done $why
+foreach-walker w {
+   if {[info exists queue_running]} { lremove queue_running $chan }
+}
+
+# Reentrancy: this next loop can only trigger once, because
+# the queuerun-step-done won't ever start chan thinking again
+foreach-walker w {
+   if {[info exists thinking] &&
+   ![string compare $thinking $chan]} {
+   queuerun-step-done $w $why
+   }
 }
 }
 
@@ -55,31 +79,35 @@ proc log-event {m} {
 }
 
 proc log-state {m} {
-global need_queue_run queue queue_running thinking
+global need_queue_run queue
 
 set lhs [format "N=%d Q=%d (%-11.11s) " \
  $need_queue_run [llength $queue] $queue]
 
-if {[info exists queue_running]} {
-append lhs [format "R=%d " [llength $queue_running]]
-if {[info exists thinking]} {
-append lhs [format "T=%s " $thinking]
-} else {
-append lhs [format ""]
-}
-append lhs [format "(%-11.11s) " $queue_running]
-} else {
-append lhs "  "
+foreach-walker w {
+   if {[info exists queue_running]} {
+   append lhs [format "R=%d " [llength $queue_running]]
+   if {[info exists thinking]} {
+   append lhs [format "T=%s " $thinking]
+   } else {
+   append lhs [format ""]
+   }
+   append lhs [format "(%-11.11s) " $queue_running]
+   } else {
+   append lhs "  "
+   }
 }
+
 log "$lhs | $m"
 }
 
 #-- machinery for making sure we run the queue --
 #
 # variables:
-#   queuechans that are waiting for resources
-#   queue_runningunset if not running, list of chans if running
-#   need_queue_run   0: not needed; 1: needed if more resources; 2: force
+#  $w/queuechans that are waiting for resources
+#  $w/queue_runningunset if not running, l

[Xen-devel] [OSSTEST PATCH 03/29] Planner: docs: Document set-info command

2015-09-07 Thread Ian Jackson
Signed-off-by: Ian Jackson 
Acked-by: Ian Campbell 
---
 README.planner |   10 ++
 1 file changed, 10 insertions(+)

diff --git a/README.planner b/README.planner
index 4cf682d..ef2acba 100644
--- a/README.planner
+++ b/README.planner
@@ -184,6 +184,16 @@ ms-queuedaemon commands
 < OK ms-queuedaemon [INFO...]
 Banner on connection.  INFO should be ignored.
 
+> set-info KEY VALUE
+There are various informational keys which are used by
+the planner, provided by clients.  Keys include:
+
+> set-info preinfo ...   } used for reporting
+> set-info job ...   }
+> set-info priority ...  } used for
+> set-info sub-priority ...  } queue adjustment
+> set-info wait-start-adjust ... }
+
> wait
I want to join the plan
 
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 13/29] Planner: ms-queuedaemon: Make report-plan take output walker name

2015-09-07 Thread Ian Jackson
We are going to want to process each walker's data into reports which
are not necessarily named after the same walker.

No functional change as yet.

Signed-off-by: Ian Jackson 
Acked-by: Ian Campbell 
---
 ms-queuedaemon |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/ms-queuedaemon b/ms-queuedaemon
index 1fe555d..811f0ee 100755
--- a/ms-queuedaemon
+++ b/ms-queuedaemon
@@ -138,7 +138,7 @@ proc runneeded-perhaps-start {} {
 
 if {![llength $queue]} {
 plan-reset plan
-report-plan plan
+report-plan plan plan
 return
 }
 
@@ -174,7 +174,7 @@ proc runneeded-perhaps-start {} {
 
 proc queuerun-finished/plan {} {
 runneeded-ensure-will 0
-report-plan plan
+report-plan plan plan
 }
 
 proc runneeded-ensure-polling {} {
@@ -272,10 +272,10 @@ proc queuerun-perhaps-step {w} {
 notify-to-think $w $thinking
 }
 
-proc report-plan {w} {
+proc report-plan {w wo} {
 global c
 if {[catch {
-   set outputfile "$c(WebspaceFile)/resource-$w.html"
+   set outputfile "$c(WebspaceFile)/resource-$wo.html"
exec ./ms-planner -w$w show-html > $outputfile
 } emsg]} {
 log "INTERNAL ERROR showing $w html: $emsg"
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH v2 00/29] Planner: Performance improvement

2015-09-07 Thread Ian Jackson
This series addresses a performance problem which we seem to be
experiencing with the existing osstest resource planner in Xen Project
test colo.  The current setup can leave hosts idle for too long before
getting around to allocating them to test jobs.

There are also some reporting/debugging improvements.

The full explanation of the new algorithm is in the commit message for
  14/29 Planner: ms-queuedaemon: Restart planning when resources become free

Most of this series touches code in the central queue daemon, or in
command line tools, and is not subject to the osstest self-push-gate.

I have extensively tested this series in the Cambridge instance, where
this version of ms-queuedaemon is currently running and seems to be
happy.  The queuedaemon changes and the client (per-flight) changes
are fully compatible in both directions (except for Ian C's
out-of-mainline reporting tool which calls `get-plan' and is already
rather broken).


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 02/29] Planner: docs: Minor fixes

2015-09-07 Thread Ian Jackson
 * Document the ms-queuedaemon banner
 * Document the argument to the allocation $resourcecall callback fn.

Signed-off-by: Ian Jackson 
Acked-by: Ian Campbell 
---
 Osstest/Executive.pm |2 +-
 README.planner   |3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm
index bf968c8..ab015d2 100644
--- a/Osstest/Executive.pm
+++ b/Osstest/Executive.pm
@@ -604,7 +604,7 @@ sub plan_search () {
 }
 
 sub alloc_resources {
-my ($resourcecall) = pop @_;
+my ($resourcecall) = pop @_; # $resourcecall->($plan);
 my (%xparams) = @_;
 # $resourcecall should die (abort) or return ($ok, $bookinglist)
 #
diff --git a/README.planner b/README.planner
index 73e97ea..4cf682d 100644
--- a/README.planner
+++ b/README.planner
@@ -181,6 +181,9 @@ DETAILED PROTOCOL NOTES
 
 ms-queuedaemon commands
 
+< OK ms-queuedaemon [INFO...]
+Banner on connection.  INFO should be ignored.
+
> wait
I want to join the plan
 
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 11/29] Planner: ms-queuedaemon: Break out queuerun-finished/

2015-09-07 Thread Ian Jackson
This formalises the queue-completed interface, allowing parts outside
the queuerun machinery to cleanly be notified when a queue is
completed, and relieving the queuerun-perhaps-step of the need to know
what to do for the end of any particular walker's queue.

Currently there is still only one walker, `plan'.

Signed-off-by: Ian Jackson 
Acked-by: Ian Campbell 
---
 ms-queuedaemon |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/ms-queuedaemon b/ms-queuedaemon
index d51fd52..3026a64 100755
--- a/ms-queuedaemon
+++ b/ms-queuedaemon
@@ -172,6 +172,11 @@ proc runneeded-perhaps-start {} {
 queuerun-start plan
 }
 
+proc queuerun-finished/plan {} {
+runneeded-ensure-will 0
+report-plan plan
+}
+
 proc runneeded-ensure-polling {} {
 log-event runneeded-ensure-polling
 global polling_after queue c
@@ -245,8 +250,7 @@ proc queuerun-perhaps-step {w} {
 
 if {![llength $queue_running]} {
 unset queue_running
-runneeded-ensure-will 0
-report-plan $w
+   queuerun-finished/$w
 return
 }
 
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 07/29] Planner: client side: Do not force OSSTEST_RESOURCE_PRIORITY

2015-09-07 Thread Ian Jackson
This makes it possible to run mg-allocate with a different priority
simply by setting OSSTEST_RESOURCE_PRIORITY in its environment.

Signed-off-by: Ian Jackson 
---
v2: New patch
---
 mg-allocate |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mg-allocate b/mg-allocate
index 6dc7ddd..14c9088 100755
--- a/mg-allocate
+++ b/mg-allocate
@@ -365,7 +365,7 @@ while (@ARGV && $ARGV[0] =~ m/^[-0-9]/) {
  $2 eq 'm' ?60 :
  1);
 } elsif (s/^\-U/-/) {
-$ENV{OSSTEST_RESOURCE_PRIORITY}= -100;
+$ENV{OSSTEST_RESOURCE_PRIORITY} //= -100;
 } else {
 die "bad option \`$_'";
 }
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 10/29] Planner: ms-queuedaemon: Synchronise thinking multiple walkers

2015-09-07 Thread Ian Jackson
If multiple walkers want to ask the same chan, we want to serialise
them.  This is actually straightforward:  Firstly, we arrrange that
each walker finishing a thought will prompt _all_ walkers to
reconsider whether they need to continue.  Then we can simply do
nothing if we want to a chan to think that another walker is already
waiting for; since that other walker will prompt us later.

Still no actual functional change because there is still only one
walker.

Signed-off-by: Ian Jackson 
Acked-by: Ian Campbell 
---
 ms-queuedaemon |   14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/ms-queuedaemon b/ms-queuedaemon
index 6e1ba0e..d51fd52 100755
--- a/ms-queuedaemon
+++ b/ms-queuedaemon
@@ -216,6 +216,12 @@ proc runneeded-2-requeue {} {
 #  $w/thinking_aftertimeout
 # all the $w/ are generally upvar'd by walker-globals
 
+proc walkers-perhaps-queue-steps {} {
+foreach-walker w {
+   after idle queuerun-perhaps-step $w
+}
+}
+
 proc plan-reset {w} {
 exec ./ms-planner -w$w reset < /dev/null
 }
@@ -226,7 +232,7 @@ proc queuerun-start {w} {
 log-event "$w queuerun-start"
 plan-reset $w
 set queue_running $queue
-after idle queuerun-perhaps-step plan
+walkers-perhaps-queue-steps
 }
 
 proc queuerun-perhaps-step {w} {
@@ -247,7 +253,9 @@ proc queuerun-perhaps-step {w} {
 set next [lindex $queue_running 0]
 set already [we-are-thinking $next]
 if {[llength $already]} {
-   error "next $next thinking $already but also want $w"
+   # $already will wake us via walkers-perhaps-queue-steps
+   log-event "$w queuerun-perhaps-step already $already"
+   return
 }
 
 set thinking $next
@@ -304,7 +312,7 @@ proc queuerun-step-done {w why} {
 unset thinking_after
 }
 unset thinking
-after idle queuerun-perhaps-step $w
+walkers-perhaps-queue-steps
 }
 
 proc queue-thoughts-timedout {w} {
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 08/29] Planner: ms-planner support -w option

2015-09-07 Thread Ian Jackson
We are going to introduce multiple concurrent streams of planning
processing, called `walkers' in ms-queuedaemon.  The work-in-progress
plan is stored, server-side, during planning, in data-plan.pl.  But we
need to have more than one of these.

Update ms-planner and ms-planner-debug to honour a -w option, to
specify a replacement for the word `plan' in `data-plan.pl'.

No overall functional change, since nothing uses these options yet.

Signed-off-by: Ian Jackson 
Acked-by: Ian Campbell 
---
 ms-planner   |6 +-
 ms-planner-debug |   13 ++---
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/ms-planner b/ms-planner
index f38f05b..053f330 100755
--- a/ms-planner
+++ b/ms-planner
@@ -33,12 +33,16 @@ use Osstest::Executive;
 
 open DEBUG, ">/dev/null" or die $!;
 
+our $walker = 'plan';
+
 while (@ARGV and $ARGV[0] =~ m/^-/) {
 $_= shift @ARGV;
 last if m/^--$/;
 while (m/^-./) {
 if (s/^-D/-/) {
 open DEBUG, ">&STDERR" or die $!;
+} elsif (s/^-w(.+)/-/) {
+   $walker = $1;
 } else {
 die "$_ ?";
 }
@@ -49,7 +53,7 @@ csreadconfig();
 
 our ($plan);
 
-my $fn= "data-plan.pl";
+my $fn= "data-$walker.pl";
 
 sub allocations ($$) {
 my ($init, $each) = @_;
diff --git a/ms-planner-debug b/ms-planner-debug
index e277ca6..eac5675 100755
--- a/ms-planner-debug
+++ b/ms-planner-debug
@@ -25,7 +25,14 @@ use Osstest;
 
 csreadconfig();
 
-my $f= sprintf "data-plan-debug-%s.txt", time;
+my $walker = 'plan';
+
+if (@ARGV && $ARGV[0] =~ m/^-w(.+)$/) {
+$walker = $1;
+shift @ARGV;
+}
+
+my $f= sprintf "data-$walker-debug-%s.txt", time;
 
 printf "%s\n", $f;
 
@@ -42,9 +49,9 @@ foreach my $arg (@ARGV) {
 }
 }
 
-print "==data-plan.pl==\n";
+print "==data-$walker.pl==\n";
 
-system 'cat data-plan.pl 2>&1';
+system 'cat data-$walker.pl 2>&1';
 
 print "==resources==\n";
 
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 14/29] Planner: ms-queuedaemon: Restart planning when resources become free

2015-09-07 Thread Ian Jackson
This solves a performance problem with the existing planner.

The problem is that with a large installation, and a big queue, a full
plan can take a long time to prepare.  (In our current installation,
perhaps as long as half an hour.)  Any resource which becomes free
during one plan run cannot be allocated to a new job until the next
plan run starts.  This means resources (test machines) are often
sitting around idle.

Fix this by restarting the planning process as soon as any new
resource becomes free.  This means that jobs at the front of the queue
get a chance to allocate it right away, so it will probably be
allocated soon.

If it is only interesting to jobs later in the queue, then there may
be a delay in reallocating it, but presumably the resource is not much
in demand and those later jobs will allocate it when they get a bit
closer to the head.

But, there is a problem with this: it means that the plan is generally
never completed.  So we have no overview any more of when which
flights will finish and what the overall queue is like.  We solve this
problem by running a second instance of the planner algorithm, all the
way to completion, in a `dummy' mode where no actual resource
allocation takes place.  This second `projection' instance comes into
being whenever the main `plan' instance is restarted, and it inherits
the planning state from the main `plan' instance.

Global livelock (where we keep restarting the plan but never manage to
allocate anything) is not possible because each restart involves a new
resource becoming free.  If nothing gets allocated because we can't
get that far before being restarted, then eventually there will be
nothing left allocated to become newly free.

Starvation, of a form, is possible: a late-in-queue job which wants a
resource available right now might have difficulty allocating it
because the planner is spending its effort rescheduling early-in-queue
jobs which want resources which are in greater demand - so that the
late-in-queue job never gets called.  Arguably this is an appropriate
allocation of planning time.

With this arrangement we can generate two reports: a `plan' report
containing the short term plan which was used for actual resource
allocation, and which is frequently restarted and therefore not
necessarily complete; and a `projection' report which contains a
complete plan for all work the system is currently aware of, but which
is less-frequently updated.

Because planner clients do not contain the planning algorithm state,
the only client change needed is the ability to run in a `dummy' mode
without actual allocation; this is the `noalloc' feature earlier in
this series.

The main work is in ms-queuedaemon.  We have prepared the ground for
multiple instances of the planning algorithm; from the point of view
of ms-queuedaemon, an instance of the planning algorithm is mainly a
walk over the job queue.  So we call them `walkers'.

Therefore, what we do here is introduce a new `projection' walker,
as follows:

Add `projection' to the global list of possible walkers.

Invent a new section of code, the `restarter', which is responsible
for managing the relationship between the two walkers.  (It uses
direct knowledge of the queue state data structures, etc., to avoid
having to invent a complete formal interface to a walker.)

If we ever finish the plan walker's queue, we update both the
projection report output and the plan report output, from the same
plan.  Finishing the projection walker's queue means we have a
complete projection, but we don't touch the plan.

In principle it might happen that the plan walker might overtake the
projection walker, and then complete, write out a complete and up to
date plan as the projection, and that the projection walker would then
complete and overwrite the projection with less-up-to-date
information.  We don't explicitly exclude this.  Of course such a
result will be rectified soon enough by another planning run.

The restarter can ask the database for the list of currently-available
resources, and can therefore detect when new become newly-free.

The rest of the code remains largely ignorant of the operation of the
restarter.  There are a few hooks:

runneeded-perhaps-start notifies the restarter when we start the
plan; this is used by the restarter to record the set of free
resources at the start of a planning run, so that it can see later
whether any /new/ resources have become free.

restarter-maybe-provoke-restart is called when we get notification
from the the owner daemon that resources may have become idle.  We
look for newly-idle resources, and if there are any, and we are
running the plan walker, we directly edit the plan walker's queue to
put RESTART at the front.

queuerun-perhaps-step spots the special entry RESTART in its queue and
calls into back the restarter when it finds it.  This deferred
approach is necessary because we can't do the restart operation while
a client is thinking (because we would ha

[Xen-devel] [OSSTEST PATCH 06/29] Planner: client side: New `!OK think noalloc' protocol

2015-09-07 Thread Ian Jackson
Introduce a way for the queue daemon to tell its client that it must
not allocate anything in this planning iteration.

In the client:
 * Advertise the new feature via set-info.
 * Accept the `noalloc' part of `!OK think noalloc';
 * Print that in our log message;
 * Honour it by passing it to $resourcecall.

And document the new protocol.  However, there is no server-side yet,
so this does not yet introduce any overall change to the system.

Signed-off-by: Ian Jackson 
Acked-by: Ian Campbell 
---
 Osstest/Executive.pm |9 ++---
 README.planner   |6 +-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm
index f9be0a0..4f51d70 100644
--- a/Osstest/Executive.pm
+++ b/Osstest/Executive.pm
@@ -674,6 +674,7 @@ sub alloc_resources {
 $set_info->('priority', $priority);
 $set_info->('sub-priority',$ENV{OSSTEST_RESOURCE_SUBPRIORITY});
 $set_info->('preinfo', $ENV{OSSTEST_RESOURCE_PREINFO});
+   $set_info->('feature-noalloc', 1);
 
 if (defined $waitstart) {
 $set_info->('wait-start',$waitstart);
@@ -699,7 +700,9 @@ sub alloc_resources {
 
 logm("resource allocation: awaiting our slot...");
 
-$_= <$qserv>;  defined && m/^\!OK think\s$/ or die "$_ ?";
+$_= <$qserv>;
+   defined && m/^\!OK think( noalloc)?\s$/ or die "$_ ?";
+   my $noalloc = $1 // '';
 
 opendb_tests();
 
@@ -715,12 +718,12 @@ sub alloc_resources {
read($qserv, $jplan, $jplanlen) == $jplanlen or die $!;
my $jplanprint= $jplan;
chomp $jplanprint;
-   logm("resource allocation: obtained base plan.");
+   logm("resource allocation: obtained base plan$noalloc.");
$debugm->("base plan = ", $jplanprint);
$plan= from_json($jplan);
}, sub {
if (!eval {
-   ($ok, $bookinglist) = $resourcecall->($plan, 1);
+   ($ok, $bookinglist) = $resourcecall->($plan, !$noalloc);
1;
}) {
warn "resourcecall $@";
diff --git a/README.planner b/README.planner
index ef2acba..52f757b 100644
--- a/README.planner
+++ b/README.planner
@@ -194,11 +194,15 @@ ms-queuedaemon commands
 > set-info sub-priority ...  } queue adjustment
 > set-info wait-start-adjust ... }
 
+> set-info feature-noalloc 1
+   The client understands `!OK think noalloc'.
+
> wait
I want to join the plan
 
-   < !OK think
+   < !OK think [noalloc]
Now is the time to add yourself to the plan
+`noalloc' means client must not not actually allocate.
 
> get-plan
< OK get-plan BYTES
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 22/29] Planner: ms-queuedaemon: Better log message for Tcl `after idle'

2015-09-07 Thread Ian Jackson
This does not mean the planner is `idle' in any general sense of the
word.  It just means that the Tcl event loop has finished processing
outstanding events.  Change the debug message to be less confusing.

Signed-off-by: Ian Jackson 
---
v2: New patch
---
 ms-queuedaemon |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ms-queuedaemon b/ms-queuedaemon
index a249b50..9280f51 100755
--- a/ms-queuedaemon
+++ b/ms-queuedaemon
@@ -73,7 +73,7 @@ proc log-event {m} {
 if {![info exists log_state_after]} {
 set log_state_after [after idle {
 unset log_state_after
-log-state idle
+log-state tcl-idle
 }]
 }
 }
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 12/29] Planner: ms-queuedaemon: Break out notify-to-think

2015-09-07 Thread Ian Jackson
This is going to want to do something more complicated shortly.

Signed-off-by: Ian Jackson 
Acked-by: Ian Campbell 
---
v2: Fix a whitespace error
---
 ms-queuedaemon |   11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/ms-queuedaemon b/ms-queuedaemon
index 3026a64..1fe555d 100755
--- a/ms-queuedaemon
+++ b/ms-queuedaemon
@@ -268,9 +268,8 @@ proc queuerun-perhaps-step {w} {
 
 set thinking_after [after [expr {$c(QueueThoughtsTimeout) * 1000}] \
 queue-thoughts-timedout $w]
-for-chan $thinking {
-puts-chan $thinking "!OK think"
-}
+
+notify-to-think $w $thinking
 }
 
 proc report-plan {w} {
@@ -379,6 +378,12 @@ proc cmd/unwait {chan desc} {
 puts-chan $chan "OK unwait $res"
 }
 
+proc notify-to-think {w thinking} {
+for-chan $thinking {
+   puts-chan $thinking "!OK think"
+}
+}
+
 #-- general stuff --
 
 proc banner {chan} {
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 20/29] Planner: ms-queuedaemon: Use chan-get-desc in chan-plan-info

2015-09-07 Thread Ian Jackson
This relieves the callers of the need to provide a desc.  This is
helpful because chan-note-unprocessed doesn't have one.

Overall functional change is to show the chan desc (ie, the client TCP
connection info) in the list of unprocessed clients.

Signed-off-by: Ian Jackson 
---
v2: New patch
---
 ms-queuedaemon |7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/ms-queuedaemon b/ms-queuedaemon
index fba28a1..29b906f 100755
--- a/ms-queuedaemon
+++ b/ms-queuedaemon
@@ -375,7 +375,8 @@ proc cmd/get-last-plan {chan desc w} {
 return-plan-to-client $chan $w.final
 }
 
-proc chan-plan-info {chan desc} {
+proc chan-plan-info {chan} {
+set desc [get-chan-desc $chan]
 set l {}
 lappend l [chan-get-info $chan {"$info(preinfo)"} ""]
 lappend l [chan-get-info $chan {"job $info(job)"} $desc]
@@ -388,7 +389,7 @@ proc cmd/book-resources {chan desc bytes} {
 proc do-book-resources {chan desc data} {
 global plan errorInfo
 set w [check-we-are-thinking $chan]
-set info [chan-plan-info $plan $desc]
+set info [chan-plan-info $chan]
 if {[catch {
exec ./ms-planner -w$w book-resources $info << $data
 } emsg]} {
@@ -494,7 +495,7 @@ proc restarter-restart-now {} {
 }
 
 proc chan-note-unprocessed {w chan} {
-exec ./ms-planner -w$w unprocessed [chan-plan-info $chan {}]
+exec ./ms-planner -w$w unprocessed [chan-plan-info $chan]
 }
 
 proc notify-to-think {w thinking} {
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 28/29] Manual allocation: Show tty in jobinfo

2015-09-07 Thread Ian Jackson
Signed-off-by: Ian Jackson 
---
v2: New patch
---
 Osstest/Executive.pm |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm
index 1efcfd4..84600b4 100644
--- a/Osstest/Executive.pm
+++ b/Osstest/Executive.pm
@@ -498,7 +498,10 @@ END
 sub manual_allocation_base_jobinfo () {
 my $whoami = `whoami`; chomp $whoami;
 my $hostname = `uname -n`; chomp $hostname;
-return "$whoami\@$hostname";
+my $info = "$whoami\@$hostname";
+my $tty = `tty 2>/dev/null`; chomp $tty;
+$info .= " ($tty)" unless $?;
+return $info;
 }
 
 sub alloc_resources_rollback_begin_work () {
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 23/29] Planner: ms-queuedaemon: Reformat debugging output for more tasks

2015-09-07 Thread Ian Jackson
These wider fields will work well for up to 999 tasks.

Cosmetic change only.

Signed-off-by: Ian Jackson 
---
v2: New patch
---
 ms-queuedaemon |   13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/ms-queuedaemon b/ms-queuedaemon
index 9280f51..98cf5c1 100755
--- a/ms-queuedaemon
+++ b/ms-queuedaemon
@@ -81,20 +81,21 @@ proc log-event {m} {
 proc log-state {m} {
 global need_queue_run queue
 
-set lhs [format "N=%d Q=%d (%-11.11s) " \
+set lhs [format "N=%d Q=%d (%-15.15s) " \
  $need_queue_run [llength $queue] $queue]
 
 foreach-walker w {
+   append lhs "| "
if {[info exists queue_running]} {
-   append lhs [format "R=%d " [llength $queue_running]]
+   append lhs [format "R=%-3d " [llength $queue_running]]
if {[info exists thinking]} {
-   append lhs [format "T=%s " $thinking]
+   append lhs [format "T=%-7s " $thinking]
} else {
-   append lhs [format ""]
+   append lhs [format "  "]
}
-   append lhs [format "(%-11.11s) " $queue_running]
+   append lhs [format "(%-15.15s) " $queue_running]
} else {
-   append lhs "  "
+   append lhs "  "
}
 }
 
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 19/29] Tcl: Provide get-chan-desc in daemonlib

2015-09-07 Thread Ian Jackson
Provide a facility for the daemon main program to get the channel
connection information.

No caller yet.

Signed-off-by: Ian Jackson 
---
v2: New patch
---
 tcl/daemonlib.tcl |5 +
 1 file changed, 5 insertions(+)

diff --git a/tcl/daemonlib.tcl b/tcl/daemonlib.tcl
index d097624..55bc385 100644
--- a/tcl/daemonlib.tcl
+++ b/tcl/daemonlib.tcl
@@ -106,6 +106,11 @@ proc puts-chan-desc {chan m} {
 log "$desc $m"
 }
 
+proc get-chan-desc {chan} {
+upvar \#0 chandesc($chan) desc
+return $desc
+}
+
 proc must-gets-chan {chan re} {
 if {[gets $chan l] <= 0} { error "NOT $chan $re ?" }
 puts-chan-desc $chan "<< $l"
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 15/29] Plan reporting: Break out return-plan-to-client

2015-09-07 Thread Ian Jackson
We are going to want to reuse this.  No functional change.

Signed-off-by: Ian Jackson 
Acked-by: Ian Campbell 
---
 ms-queuedaemon |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/ms-queuedaemon b/ms-queuedaemon
index d48715e..8c9af81 100755
--- a/ms-queuedaemon
+++ b/ms-queuedaemon
@@ -355,11 +355,16 @@ proc cmd/thought-done {chan desc} {
 puts-chan $chan "OK thought"
 }
 
+proc return-plan-to-client {chan wf} {
+set tplan [exec -keepnewline ./ms-planner -w$wf get-plan < /dev/null]
+puts-chan-data $chan "OK get-plan" $tplan
+return $tplan
+}
+
 proc cmd/get-plan {chan desc} {
 global plan
 set w [check-we-are-thinking $chan]
-set plan [exec -keepnewline ./ms-planner -w$w get-plan < /dev/null]
-puts-chan-data $chan "OK get-plan" $plan
+set plan [return-plan-to-client $chan $w]
 }
 
 proc cmd/book-resources {chan desc bytes} {
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 25/29] Manual allocation: Break out manual_allocation_base_jobinfo from mg-blockage

2015-09-07 Thread Ian Jackson
This is called `jobinfo' because it ought to be used in
alloc_resources's JobInfo xparam, rather than an Xinfo in the booking:
JobInfo is per planning client; Xinfo is per individual resource.

mg-blockage currently gets this wrong; we will fix that shortly.

Signed-off-by: Ian Jackson 
---
v2: New patch
---
 Osstest/Executive.pm |7 +++
 mg-blockage  |5 ++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm
index 4f51d70..1efcfd4 100644
--- a/Osstest/Executive.pm
+++ b/Osstest/Executive.pm
@@ -51,6 +51,7 @@ BEGIN {
   report_altchangecolour
   report_blessingscond report_find_push_age_info
   tcpconnect_queuedaemon plan_search
+  manual_allocation_base_jobinfo
   alloc_resources alloc_resources_rollback_begin_work
   resource_check_allocated resource_shared_mark_ready
   duration_estimator
@@ -494,6 +495,12 @@ END
 return $taskid;
 }
 
+sub manual_allocation_base_jobinfo () {
+my $whoami = `whoami`; chomp $whoami;
+my $hostname = `uname -n`; chomp $hostname;
+return "$whoami\@$hostname";
+}
+
 sub alloc_resources_rollback_begin_work () {
 $dbh_tests->rollback();
 db_begin_work($dbh_tests, \@all_lock_tables);
diff --git a/mg-blockage b/mg-blockage
index 1f66d8e..0286c27 100755
--- a/mg-blockage
+++ b/mg-blockage
@@ -24,9 +24,8 @@ sub parsedate ($) {
 return $r;
 }
 
-my $whoami = `whoami`; chomp $whoami;
-my $hostname = `uname -n`; chomp $hostname;
-my $xinfo = "blockage $whoami\@$hostname [$start .. $end]";
+my $basexinfo = manual_allocation_base_joinfo();
+my $xinfo = "blockage $basexinfo [$start .. $end]";
 $xinfo .= " $mxinfo" if defined $mxinfo;
 
 $start = parsedate $start;
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 17/29] Planner: ms-queuedaemon: Break out chan-plan-info

2015-09-07 Thread Ian Jackson
Also, refactor the space separator handling to use a list and `join'
(since we are going to maybe have desc be "").

Signed-off-by: Ian Jackson 
---
v2: New patch
---
 ms-queuedaemon |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/ms-queuedaemon b/ms-queuedaemon
index 18d703e..72e22d0 100755
--- a/ms-queuedaemon
+++ b/ms-queuedaemon
@@ -375,14 +375,20 @@ proc cmd/get-last-plan {chan desc w} {
 return-plan-to-client $chan $w.final
 }
 
+proc chan-plan-info {chan desc} {
+set l {}
+lappend l [chan-get-info $chan {"$info(preinfo)"} ""]
+lappend l [chan-get-info $chan {"job $info(job)"} $desc]
+return [join $l " "]
+}
+
 proc cmd/book-resources {chan desc bytes} {
 read-chan-data $chan $bytes do-book-resources
 }
 proc do-book-resources {chan desc data} {
 global plan errorInfo
 set w [check-we-are-thinking $chan]
-set info [chan-get-info $chan {"$info(preinfo) "} ""]
-append info [chan-get-info $chan {"job $info(job)"} $desc]
+set info [chan-plan-info $plan $desc]
 if {[catch {
exec ./ms-planner -w$w book-resources $info << $data
 } emsg]} {
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 24/29] Manual allocation: Report better info in plan for rogue tasks

2015-09-07 Thread Ian Jackson
(This will only take effect as such tasks appear in the plan for the
first time.  Ie, once a rogue task is found, the plan is populated by
whatever version of the planner is running at that time.  So the
effect will not be immediately visible.)

Signed-off-by: Ian Jackson 
---
v2: New patch
---
 ms-planner |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/ms-planner b/ms-planner
index 47e0ea7..1f996a6 100755
--- a/ms-planner
+++ b/ms-planner
@@ -291,7 +291,10 @@ END
$info= "(preparing)";
} else {
print DEBUG "rogue $reso $shareix: $arow->{owntaskid}\n";
-   $info= "rogue task $arow->{subtask}";
+   $info= "rogue task";
+   $info .= " $arow->{type} $arow->{refkey}";
+   $info .= " ($arow->{comment})" if defined $arow->{comment};
+   $info .= " $arow->{subtask}";
}
$plan->{Allocations}{$reskey}= {
 Task => $arow->{owntaskid},
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 18/29] Planner: Report unprocessed planning clients

2015-09-07 Thread Ian Jackson
With recent changes, it can happen that a queue daemon client is not
given an opportunity to report itself in the plan.  This makes the
plan incomplete.

(For resource-plan.html, because the planning run was restarted to try
to quickly allocate new resources; for resource-projection.html,
because it's an old client that doesn't support feature-noalloc.)

When this happens, provide an explicit indication of this in the plan:

* Invent a new entry Unprocessed in data-*.pl for this information.
* Display the first 50 in ms-planner show-html.
* Provide a new ms-planner invocation `unprocessed' to record one.
* Note unprocessed when we skip a client due to !feature-noalloc.
* Note unprocessed for remaining queue when we restart planning.

For now this algorithm can be rather unfortunately O(n^2) when
draining the planning queue, because each `ms-planner unprocessed'
invocation adds only one job but needs to read and write the whole
plan.   This will be fixed shortly.

Signed-off-by: Ian Jackson 
---
v2: New patch
---
 ms-planner |   27 +++
 ms-queuedaemon |   10 ++
 2 files changed, 37 insertions(+)

diff --git a/ms-planner b/ms-planner
index 053f330..495c8ff 100755
--- a/ms-planner
+++ b/ms-planner
@@ -238,6 +238,7 @@ sub cmd_reset () {
 
 $plan->{Start}= time;
 $plan->{Events}= { };
+$plan->{Unprocessed}= [ ];
 
 my %magictask;
 foreach my $taskrefkey (qw(preparing shared)) {
@@ -731,12 +732,38 @@ sub cmd_show_html () {
printf "\n";
 }
 printf "\n";
+printf "\n";
+my $unprocessed = $plan->{Unprocessed};
+if (@$unprocessed) {
+   printf "%d tasks not processed and therefore not shown:\n",
+   scalar @$unprocessed;
+   my $shown = 0;
+   printf "\n";
+   foreach my $unprocessed (@$unprocessed) {
+   if (++$shown > 50) {
+   printf "...\n";
+   last;
+   }
+   printf "%s\n", encode_entities($unprocessed->{Info});
+   }
+   printf "\n";
+   printf "\n";
+}
 printf "Report generated %s.\n",
 strftime("%Y-%b-%d %a %H:%M:%S", localtime $now);
 die $! if STDOUT->error;
 die $! unless STDOUT->flush;
 }
 
+sub cmd_unprocessed () {
+die unless @ARGV==1;
+my ($baseinfo) = @ARGV;
+
+get_current_plan();
+push @{ $plan->{Unprocessed} }, { Info => $baseinfo };
+check_write_new_plan();
+}
+
 die unless @ARGV;
 die if $ARGV[0] =~ m/^-/;
 my $subcmd= shift @ARGV;
diff --git a/ms-queuedaemon b/ms-queuedaemon
index 72e22d0..fba28a1 100755
--- a/ms-queuedaemon
+++ b/ms-queuedaemon
@@ -482,12 +482,21 @@ proc restarter-restart-now {} {
log-event "restarter-restart-now projection-running"
 }
 
+foreach skip [set plan/queue_running] {
+   for-chan $skip {
+   chan-note-unprocessed plan $skip
+   }
+}
 report-plan plan plan
 
 unset plan/queue_running
 runneeded-ensure-will 2
 }
 
+proc chan-note-unprocessed {w chan} {
+exec ./ms-planner -w$w unprocessed [chan-plan-info $chan {}]
+}
+
 proc notify-to-think {w thinking} {
 for-chan $thinking {
set noalloc [chan-get-info $thinking {$info(feature-noalloc)} {}]
@@ -496,6 +505,7 @@ proc notify-to-think {w thinking} {
projection.1 { puts-chan $thinking "!OK think noalloc" }
projection.* {
# oh well, can't include it in the projection; too bad
+   chan-note-unprocessed $w $thinking
queuerun-step-done $w "!feature-noalloc"
}
}
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 21/29] Planner: Remove O(n^2) problem from plan restart

2015-09-07 Thread Ian Jackson
Change `./ms-planner unprocessed' to take a file of infos on stdin,
and when we restart the planning, invoke it once.

(This would be an incompatible change to the planner, needing a
queuedaemon restart, if this patch were applied separately from the
previous "Report unprocessed planning clients".)

Signed-off-by: Ian Jackson 
---
v2: New patch
---
 ms-planner |   10 +++---
 ms-queuedaemon |   18 +++---
 2 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/ms-planner b/ms-planner
index 495c8ff..47e0ea7 100755
--- a/ms-planner
+++ b/ms-planner
@@ -756,11 +756,15 @@ sub cmd_show_html () {
 }
 
 sub cmd_unprocessed () {
-die unless @ARGV==1;
-my ($baseinfo) = @ARGV;
+die unless @ARGV==0;
 
 get_current_plan();
-push @{ $plan->{Unprocessed} }, { Info => $baseinfo };
+
+while () {
+   chomp;
+   push @{ $plan->{Unprocessed} }, { Info => $_ };
+}
+
 check_write_new_plan();
 }
 
diff --git a/ms-queuedaemon b/ms-queuedaemon
index 29b906f..a249b50 100755
--- a/ms-queuedaemon
+++ b/ms-queuedaemon
@@ -483,10 +483,10 @@ proc restarter-restart-now {} {
log-event "restarter-restart-now projection-running"
 }
 
-foreach skip [set plan/queue_running] {
-   for-chan $skip {
-   chan-note-unprocessed plan $skip
-   }
+if {[catch {
+   chans-note-unprocessed plan [set plan/queue_running]
+} emsg]} {
+   log "INTERNAL ERROR setting unprocessed: $emsg"
 }
 report-plan plan plan
 
@@ -494,8 +494,12 @@ proc restarter-restart-now {} {
 runneeded-ensure-will 2
 }
 
-proc chan-note-unprocessed {w chan} {
-exec ./ms-planner -w$w unprocessed [chan-plan-info $chan]
+proc chans-note-unprocessed {w chans} {
+set data {}
+foreach chan $chans {
+   append data [chan-plan-info $chan] "\n"
+}
+exec ./ms-planner -w$w unprocessed << $data
 }
 
 proc notify-to-think {w thinking} {
@@ -506,7 +510,7 @@ proc notify-to-think {w thinking} {
projection.1 { puts-chan $thinking "!OK think noalloc" }
projection.* {
# oh well, can't include it in the projection; too bad
-   chan-note-unprocessed $w $thinking
+   chans-note-unprocessed $w [list $thinking]
queuerun-step-done $w "!feature-noalloc"
}
}
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 01/29] Tcl: Use lshift instead of open-coding with lrange

2015-09-07 Thread Ian Jackson
In ms-queuedaemon, and JobDB-Executive, once each.  No functional
change.

Signed-off-by: Ian Jackson 
Acked-by: Ian Campbell 
---
 ms-queuedaemon  |3 +--
 tcl/JobDB-Executive.tcl |3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/ms-queuedaemon b/ms-queuedaemon
index e15bc79..d6d59ee 100755
--- a/ms-queuedaemon
+++ b/ms-queuedaemon
@@ -210,8 +210,7 @@ proc queuerun-perhaps-step {} {
 return
 }
 
-set thinking [lindex $queue_running 0]
-set queue_running [lrange $queue_running 1 end]
+set thinking [lshift queue_running]
 log-event "queuerun-perhaps-step selected"
 
 set thinking_after [after [expr {$c(QueueThoughtsTimeout) * 1000}] \
diff --git a/tcl/JobDB-Executive.tcl b/tcl/JobDB-Executive.tcl
index 7228712..d61d2a2 100644
--- a/tcl/JobDB-Executive.tcl
+++ b/tcl/JobDB-Executive.tcl
@@ -74,8 +74,7 @@ proc set-flight {} {
 set argv [lrange $argv 2 end]
 }
 
-set flight [lindex $argv 0]
-set argv [lrange $argv 1 end]
+set flight [lshift argv]
 set env(OSSTEST_FLIGHT) $flight
 }
 
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [OSSTEST PATCH 04/29] Planner: Fix indefinite holdoff

2015-09-07 Thread Ian Jackson
runneeded-ensure-will would always reset the runneeded_holdoff_after
timer.  So no new queue run would start until no runneeded-ensure-will
has occurred for (currently) 30s.

Instead, only start the timer if it's not already running.

Signed-off-by: Ian Jackson 
Acked-by: Ian Campbell 
---
 ms-queuedaemon |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/ms-queuedaemon b/ms-queuedaemon
index d6d59ee..1aa526c 100755
--- a/ms-queuedaemon
+++ b/ms-queuedaemon
@@ -86,10 +86,12 @@ proc runneeded-ensure-will {need} {
 log-event "runneeded-ensure-will $need (was $need_queue_run)"
 
 if {$need > $need_queue_run} { set need_queue_run $need }
-catch { after cancel $runneeded_holdoff_after }
-set runneeded_holdoff_after \
-[after [expr {$c(QueueDaemonHoldoff) * 1000}] \
- runneeded-perhaps-start]
+
+if {![info exists runneeded_holdoff_after]} {
+   set runneeded_holdoff_after \
+   [after [expr {$c(QueueDaemonHoldoff) * 1000}] \
+runneeded-perhaps-start]
+}
 }
 
 proc runneeded-perhaps-start {} {
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Questions related to the analysis of the Xen mailing lists

2015-09-07 Thread Lars Kurth
Daniel,

I am assuming it is OK to post this one to the devel-list, to get extra answers.

> On 7 Sep 2015, at 16:19, Daniel Izquierdo  wrote:
> 
> Hi Lars,
> 
> I promise this is the last email ;).
> 
> We've found that there are several flags launched by different
> developers depending on the status of the patchset.
> 
> By 'flag' I mean comments like 'Signed-off-by: xxx@xxx'.
> 
> However, I think we do not fully understand all of them. I'm listing all
> of them below adding some comments. Please, would you mind double
> checking this and adding others that we may have missed? Thanks in advance!.
> 
> * Signed-off-by: Jan Beulich : This is the author of
> the patch

Correct
See https://www.kernel.org/doc/Documentation/SubmittingPatches section 11 - we 
have exactly the same requirement in Xen

> * Reviewed-by: Andrew Cooper  : This is the
> reviewer of the patch

Correct
Although the definition is more loose compared to 
https://www.kernel.org/doc/Documentation/SubmittingPatches

> * Acked-by: Roger Pau Monnà : Is this another
> reviewer of the patch, but with more privileges?

Correct
Although the definition is more loose compared to 
https://www.kernel.org/doc/Documentation/SubmittingPatches

One of the things I would like to find out, is whether we do have a lot of 
cases where there are
a) comments
b) changes
for a patch after it was marked Acked-by and whether there are any frequent 
patterns.  

> * Release-acked-by: Wei Liu : Is this the developer
> in charge of merging the code into master?

That means the release manager signals that this particular patch can be 
committed. It is normally only relevant during times of a feature freeze. In 
other words, a patch can be acked (because it's technical sound), but it might 
not be appropriate to be committed (too much risk vs too little gain). 

I think you should just ignore that tag when counting contributions. But it may 
be worthwhile including it in the model.

> * Reported-by: Roger Pau Monnà : Is this the
> developer that reported the issue?

See https://www.kernel.org/doc/Documentation/SubmittingPatches - section 13. 
Use is informal right now. We don't require it to be used.

> 
> * Tested-by: Roger Pau Monnà : I guess this is a
> tester, but this does not take place that often.

See https://www.kernel.org/doc/Documentation/SubmittingPatches - section 13. 
Use is informal right now. We don't require it to be used.

> 
> Then, we have also noticed some variations such as:
> 
> * Reviewed-by: Andrew Cooper , with Konrads
> suggestion fixed.

This is common practice when something out of the ordinary happens and is a way 
to highlight it 

> Are you aware of extra 'flags'?

All the ones in https://www.kernel.org/doc/Documentation/SubmittingPatches can 
turn up occasionally, such as suggested-by:, fixes:, ... but these are rare

> In addition to this, we have also noticed several ways to identify a
> patch serie, but pretty similar:
> 
> * [PATCH v0 2/15]
> * [PATCH v0 02/15]
> * [PATCH v1 2 of 15]

I think probably 90% will be covered by the canonical form, with some slight 
variants. Another common variant is [RFC PATCH ...] with the RFC being dropped 
eventually.

Is there usually enough meta-information to map all the patches into a series?

> Given that this process is becoming with some probability more and more
> automated, we're in first place retrieving info for the last year, and
> then expanding the analysis for the rest of the previous years. This
> should help to understand the issue nowadays and then easily escalate
> the problem.
> 
> Regards,
> Daniel.
> 
> 
> 
> --
> Daniel Izquierdo Cortazar, PhD
> Chief Data Officer
> -
> "Software Analytics for your peace of mind"
> www.bitergia.com
> @bitergia
> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6] efi: introduce efi_arch_flush_dcache_area

2015-09-07 Thread Ian Campbell
On Mon, 2015-09-07 at 16:08 +0100, Stefano Stabellini wrote:
> On Mon, 7 Sep 2015, Jan Beulich wrote:
> > > > > On 07.09.15 at 16:13,  wrote:
> > > Objects loaded by FileHandle->Read need to be flushed to dcache,
> > > otherwise copy_from_paddr will read stale data when copying the 
> > > kernel,
> > > causing a failure to boot.
> > 
> > I have to admit that I'm a little puzzled by this description: If this
> > was a general requirement (which is how it reads to me)
> 
> Yes, it is
> 
> 
> > why does > ->Read() not do the necessary flushing? Or if it's not a
> > general requirement, perhaps the description could be changed to make
> > clear what exact dependency exists that does not exist for all users
> > of ->Read()?
> 
> It is a general requirement: anything that could be accessed without a
> cacheable mapping needs to be flushed. Unfortunately the UEFI spec is
> not clear enough on who should do the flushing on what, and in fact the
> Forum is working on improving it

I might be wrong and/or misremembering but I think this stems partly from
the fact that the ARM UEFI stub in Xen needs to turn off caches (and
paging?) before jumping to the usual Xen entry point.

Then when caches come back on you get inconsistencies because of stale
stuff in the cache which suddenly reappears etc.

Ideally we would flush the whole cache when we disable the cache, but the
mechanism for doing so for the whole cache (by set/way) is not
architecturally required to be snooped by external cache controllers. Only
flush by VA is required to be snooped but that would take a very long time
on a system of any size. At this level Xen is not yet aware of any external
cache controllers to flush them explicitly etc, so we are a bit stuck and
end up having to flush each thing we load by VA.

I'm not sure but an alternative could be to make ARM's head.S work
regardless of the cache being enabled or disabled. I don't know if that is
practical though.

Ian.

> 
> 
> > > --- a/xen/arch/arm/efi/efi-boot.h
> > > +++ b/xen/arch/arm/efi/efi-boot.h
> > > @@ -9,6 +9,7 @@
> > >  #include 
> > >  
> > >  void noreturn efi_xen_start(void *fdt_ptr, uint32_t fdt_size);
> > > +void __flush_dcache_area(const void * vaddr, unsigned long size);
> > 
> > While I see Ian ack-ed this, I find it wrong to replicate a declaration
> > here that now can get out of sync with the canonical one at any
> > point in time, without anyone noticing at build time. Or wait - this
> > looks to be a boot time only interface that so far didn't even have a
> > C declaration.
> 
> That's right
> 
> 
> > That's fine then except for the minor coding style issue
> > (stray blank between "*" and "vaddr").
> 
> Thanks for spotting that
> 
> 
> > > --- a/xen/common/efi/boot.c
> > > +++ b/xen/common/efi/boot.c
> > > @@ -526,7 +526,8 @@ static bool_t __init read_file(EFI_FILE_HANDLE 
> > > dir_handle, CHAR16 *name,
> > >  PrintErr(what);
> > >  PrintErr(L" failed for ");
> > >  PrintErrMesg(name, ret);
> > > -}
> > > +} else
> > > +efi_arch_flush_dcache_area(file->ptr, file->size);
> > 
> > No need for the (misplaced) "else" - PrintErrMesg() does not return.
> 
> Ah! Thanks!

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] XEN_NETIF_NR_SLOTS_MIN

2015-09-07 Thread Wei Liu
On Mon, Sep 07, 2015 at 09:15:40AM -0600, Jan Beulich wrote:
> >>> On 07.09.15 at 16:56,  wrote:
> > On Fri, Sep 04, 2015 at 06:50:45AM -0600, Jan Beulich wrote:
> >> Wei,
> >> 
> >> coming back to commit 723a375f4e introducing this constant along
> >> with some commentary: First of all - why 18 when in old Linux kernels
> >> MAX_SKB_FRAGS is 18 and the head usually requires another slot?
> > 
> > That indeed didn't count the head slot, which is handled separately. It
> > also doesn't include meta slots.
> > 
> > Maybe the comment needs to be updated to "slots that contain actual
> > data"?
> 
> That would take care of the meta slots, but the head to me also is
> "actual data".
> 

Right, I get what you mean now. Bumping that to 19 seems appropriate.

> >> And then - why mostly/only for the (guest) TX path? Don't we have
> >> two compatibility issues:
> >> 1) old netfront (max 18 frags) sending via new netback (max 17
> >> frags)
> > 
> > That's handled by netback by consolidating frags.
> 
> Right - that's the TX path that the description in netif.h refers to.
> 
> >> 2) old netback (max 18 frags) handing received packets to new
> >> frontend (max 17 frags)
> > 
> > That's fine because 18 > 17.
> 
> Exactly because of 18 > 17 this is not fine: How would netfront fit
> 18 slots handed to it by netback into 17 frags without special
> adjustments?
> 

I misread it as guest tx while you said guest rx, sorry.

> >> I.e. shouldn't modern netback be capable to (guest) send at least
> >> 19 slots despite there only being 17 frags, and shouldn't similarly
> > 
> > It does -- as said above, the head and extra slots are not counted
> > against that limit.
> > 
> >> modern netfront be able to receive at least 19 slots? In the latter
> >> case - if not, how should netback guess at what number of slots it
> >> can safely forward? (Yes, I am aware of David's 1650d5455b
> > 
> > Note that slot != frag. I admit netif.h conflates the two concepts in a
> > bad way. I think that's due to netback has evolved a lot overtime.
> > 
> > I'm pretty sure there is check in netback to not overrun the ring by
> > using too many slots.
> > 
> > If you're talking about number of frags, it's not on protocol level.
> > Linux netfront is able to consolidate several slots into reasonable
> > number of frags and make sure the number of frags won't exceed limit.
> > See xennet_fill_frags.
> 
> That's too late - xennet_get_responses() would have rejected such
> a packet already afaict (error message "Too many slots").
> 

Right, that's a bit unfortunate.  We might need to have that on the
protocol level, say, backend should not push more than X slots to
frontend.

OOI are you seeing this in real world? If you have a list of bugs that
you find that would be easier to check against source code. I admit I
often find it quite surprising that netback does things in a certain
ways -- that's mostly due to many assumptions were broken over the
years.

Wei.

> >> relying on hypervisor side improvements available only in 4.6, i.e.
> >> while in general considering this a reasonable approach, I'm not
> >> sure this should be done unconditionally, and hence I'm trying to
> >> figure reasonable criteria of when to do this on older hypervisors.)
> > 
> > I think with my above explanation it should be safe for you to not
> > include 1650d5455b.
> 
> Without an equivalent of David's change I don't see how netfront
> would cope with what _old_ netback may hand it.
> 
> Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netback grant copying issues

2015-09-07 Thread Wei Liu
On Mon, Sep 07, 2015 at 09:23:59AM -0600, Jan Beulich wrote:
> >>> On 07.09.15 at 17:10,  wrote:
> > On Mon, Sep 07, 2015 at 09:03:12AM -0600, Jan Beulich wrote:
> >> >>> On 07.09.15 at 16:47,  wrote:
> >> > With that in mind, even MAX_SKB_FRAGS + 1 is not enough. It would be
> >> > MAX_SKB_FRAGS + 64K / PAGE_SIZE, i.e. we count the most extreme
> >> > situation that we have 63K data in SKB, and 1 byte in each frag.
> >> 
> >> No, every component (head + every frag) can contribute
> >> exactly one copy operation to a RX ring slot (larger heads, just
> >> like larger frags) would get copied into multiple RX ring slots.
> >> 
> > 
> > I think you misunderstand copy_op slot vs RX ring slot.  The limitation
> > rs for copy_op slots, not RX ring slots. Note that multiplication of
> > XEN_NETIF_RX_RING_SIZE.
> > 
> > The skb header data len is not limited to one 4K page (4K because that's
> > what xen grant table interface supports), netback will break skb header
> > data into multiple copy_op slots.
> 
> But in that case it will not only use multiple copy_op slots, but also
> copy into multiple RX slots. I.e. still - per RX slot there can only be
> MAX_SKB_FRAGS+1 copy_op slots.
> 
> Two examples may help
> 
> head  1 byte
> frag1 1 byte
> ...
> fragN 1 byte
> 
> will need N+1 copy slots and one RX slot.
> 
> head  8k  2 copy slots, 2 RX slots
> frag1 1 byte  3rd copy slot, 3rd RX slot
> ...
> fragN 1 byte  N+2nd copy slot, 3rd RX slot
> 

Yes, indeed. I missed that. I should have notice this when I wrote "what
xen grant table interface supports".

> Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 10/20] xen/xenbus: Use Xen page definition

2015-09-07 Thread Julien Grall
All the ring (xenstore, and PV rings) are always based on the page
granularity of Xen.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 

Changes in v3:
- Fix errors reported by checkpatch.pl
- s/MFN/GFN base on the new naming
- Add David and Stefano's reviewed-by

Changes in v2:
- Also update the ring mapping function
---
 drivers/xen/xenbus/xenbus_client.c | 6 +++---
 drivers/xen/xenbus/xenbus_probe.c  | 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_client.c 
b/drivers/xen/xenbus/xenbus_client.c
index 2ba09c1..359e654 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -388,7 +388,7 @@ int xenbus_grant_ring(struct xenbus_device *dev, void 
*vaddr,
}
grefs[i] = err;
 
-   vaddr = vaddr + PAGE_SIZE;
+   vaddr = vaddr + XEN_PAGE_SIZE;
}
 
return 0;
@@ -555,7 +555,7 @@ static int xenbus_map_ring_valloc_pv(struct xenbus_device 
*dev,
if (!node)
return -ENOMEM;
 
-   area = alloc_vm_area(PAGE_SIZE * nr_grefs, ptes);
+   area = alloc_vm_area(XEN_PAGE_SIZE * nr_grefs, ptes);
if (!area) {
kfree(node);
return -ENOMEM;
@@ -750,7 +750,7 @@ static int xenbus_unmap_ring_vfree_pv(struct xenbus_device 
*dev, void *vaddr)
unsigned long addr;
 
memset(&unmap[i], 0, sizeof(unmap[i]));
-   addr = (unsigned long)vaddr + (PAGE_SIZE * i);
+   addr = (unsigned long)vaddr + (XEN_PAGE_SIZE * i);
unmap[i].host_addr = arbitrary_virt_to_machine(
lookup_address(addr, &level)).maddr;
unmap[i].dev_bus_addr = 0;
diff --git a/drivers/xen/xenbus/xenbus_probe.c 
b/drivers/xen/xenbus/xenbus_probe.c
index 3cbe055..33a31cf 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -802,7 +802,8 @@ static int __init xenbus_init(void)
goto out_error;
xen_store_gfn = (unsigned long)v;
xen_store_interface =
-   xen_remap(xen_store_gfn << PAGE_SHIFT, PAGE_SIZE);
+   xen_remap(xen_store_gfn << XEN_PAGE_SHIFT,
+ XEN_PAGE_SIZE);
break;
default:
pr_warn("Xenstore state unknown\n");
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 16/20] block/xen-blkback: Make it running on 64KB page granularity

2015-09-07 Thread Julien Grall
The PV block protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity behaving as a
block backend on a non-modified Xen.

It's only necessary to adapt the ring size and the number of request per
indirect frames. The rest of the code is relying on the grant table
code.

Note that the grant table code is allocating a Linux page per grant
which will result to waste 6OKB for every grant when Linux is using 64KB
page granularity. This could be improved by sharing the page between
multiple grants.

Signed-off-by: Julien Grall 
Acked-by: "Roger Pau Monné" 
---

Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

Improvement such as support of 64KB grant is not taken into
consideration in this patch because we have the requirement to run a
Linux using 64KB pages on a non-modified Xen.

This has been tested only with a loop device. I plan to test passing
hard drive partition but I didn't yet convert the swiotlb code.

Changes in v4:
- Add Roger's acked-by

Changes in v3:
- Use DIV_ROUND_UP in INDIRECT_PAGES to avoid a line over 80
characters
---
 drivers/block/xen-blkback/blkback.c |  5 +++--
 drivers/block/xen-blkback/common.h  | 17 +
 drivers/block/xen-blkback/xenbus.c  |  9 ++---
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 954c002..802319a 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -961,7 +961,7 @@ static int xen_blkbk_parse_indirect(struct blkif_request 
*req,
seg[n].nsec = segments[i].last_sect -
segments[i].first_sect + 1;
seg[n].offset = (segments[i].first_sect << 9);
-   if ((segments[i].last_sect >= (PAGE_SIZE >> 9)) ||
+   if ((segments[i].last_sect >= (XEN_PAGE_SIZE >> 9)) ||
(segments[i].last_sect < segments[i].first_sect)) {
rc = -EINVAL;
goto unmap;
@@ -1210,6 +1210,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 
req_operation = req->operation == BLKIF_OP_INDIRECT ?
req->u.indirect.indirect_op : req->operation;
+
if ((req->operation == BLKIF_OP_INDIRECT) &&
(req_operation != BLKIF_OP_READ) &&
(req_operation != BLKIF_OP_WRITE)) {
@@ -1268,7 +1269,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
seg[i].nsec = req->u.rw.seg[i].last_sect -
req->u.rw.seg[i].first_sect + 1;
seg[i].offset = (req->u.rw.seg[i].first_sect << 9);
-   if ((req->u.rw.seg[i].last_sect >= (PAGE_SIZE >> 9)) ||
+   if ((req->u.rw.seg[i].last_sect >= (XEN_PAGE_SIZE >> 
9)) ||
(req->u.rw.seg[i].last_sect <
 req->u.rw.seg[i].first_sect))
goto fail_response;
diff --git a/drivers/block/xen-blkback/common.h 
b/drivers/block/xen-blkback/common.h
index 45a044a..68e87a0 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -51,12 +52,20 @@ extern unsigned int xen_blkif_max_ring_order;
  */
 #define MAX_INDIRECT_SEGMENTS 256
 
-#define SEGS_PER_INDIRECT_FRAME \
-   (PAGE_SIZE/sizeof(struct blkif_request_segment))
+/*
+ * Xen use 4K pages. The guest may use different page size (4K or 64K)
+ * Number of Xen pages per segment
+ */
+#define XEN_PAGES_PER_SEGMENT   (PAGE_SIZE / XEN_PAGE_SIZE)
+
+#define XEN_PAGES_PER_INDIRECT_FRAME \
+   (XEN_PAGE_SIZE/sizeof(struct blkif_request_segment))
+#define SEGS_PER_INDIRECT_FRAME\
+   (XEN_PAGES_PER_INDIRECT_FRAME / XEN_PAGES_PER_SEGMENT)
+
 #define MAX_INDIRECT_PAGES \
((MAX_INDIRECT_SEGMENTS + SEGS_PER_INDIRECT_FRAME - 
1)/SEGS_PER_INDIRECT_FRAME)
-#define INDIRECT_PAGES(_segs) \
-   ((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
+#define INDIRECT_PAGES(_segs) DIV_ROUND_UP(_segs, XEN_PAGES_PER_INDIRECT_FRAME)
 
 /* Not a real protocol.  Used to generate ring structs which contain
  * the elements common to all protocols only.  This way we get a
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index deb3f00..edd27e4 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -176,21 +176,24 @@ static int xen_blkif_map(struct xen_blkif *blkif, 
grant_ref_t *gref,
{
struct blkif_sring *sring;
sring = (struct blkif_sring *)blkif->blk_ring;
-   BACK_RING_INIT(&blkif->blk_rings.native, sring, PAGE_SIZE * 
nr_grefs);
+   BACK_RING_INIT(&blkif->blk_rings.native, sring,
+ 

[Xen-devel] [PATCH v4 12/20] xen/balloon: Don't rely on the page granularity is the same for Xen and Linux

2015-09-07 Thread Julien Grall
For ARM64 guests, Linux is able to support either 64K or 4K page
granularity. Although, the hypercall interface is always based on 4K
page granularity.

With 64K page granularity, a single page will be spread over multiple
Xen frame.

To avoid splitting the page into 4K frame, take advantage of the
extent_order field to directly allocate/free chunk of the Linux page
size.

Note that PVMMU is only used for PV guest (which is x86) and the page
granularity is always 4KB. Some BUILD_BUG_ON has been added to ensure
that because the code has not been modified.

Signed-off-by: Julien Grall 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
Cc: Wei Liu 

Note that two BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE) in code built
for the PV MMU code is kept in order to have at least one even if we
ever decide to drop of code section.

Changes in v4:
- s/xen_page_to_pfn/page_to_xen_pfn/ based on the new naming
- Use the field lru in the page to get a list of pages when
decreasing the memory reservation. It avoids to use a static
array to store the pages (see v3).
- Update comment for EXTENT_ORDER.

Changes in v3:
- Fix errors reported by checkpatch.pl
- s/mfn/gfn/ based on the new naming
- Rather than splitting the page into 4KB chunk, use the
extent_order field to allocate directly a Linux page size. This
is avoid lots of code for no benefits.

Changes in v2:
- Use xen_apply_to_page to split a page in 4K chunk
- It's not necessary to have a smaller frame list. Re-use
PAGE_SIZE
- Convert reserve_additional_memory to use XEN_... macro
---
 drivers/xen/balloon.c | 59 ++-
 1 file changed, 44 insertions(+), 15 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index c79329f..3babf13 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -70,6 +70,11 @@
 #include 
 #include 
 
+/* Use one extent per PAGE_SIZE to avoid to break down the page into
+ * multiple frame.
+ */
+#define EXTENT_ORDER (fls(XEN_PFN_PER_PAGE) - 1)
+
 /*
  * balloon_process() state:
  *
@@ -230,6 +235,11 @@ static enum bp_state reserve_additional_memory(long credit)
nid = memory_add_physaddr_to_nid(hotplug_start_paddr);
 
 #ifdef CONFIG_XEN_HAVE_PVMMU
+   /* We don't support PV MMU when Linux and Xen is using
+* different page granularity.
+*/
+   BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
+
 /*
  * add_memory() will build page tables for the new memory so
  * the p2m must contain invalid entries so the correct
@@ -326,11 +336,11 @@ static enum bp_state reserve_additional_memory(long 
credit)
 static enum bp_state increase_reservation(unsigned long nr_pages)
 {
int rc;
-   unsigned long  pfn, i;
+   unsigned long i;
struct page   *page;
struct xen_memory_reservation reservation = {
.address_bits = 0,
-   .extent_order = 0,
+   .extent_order = EXTENT_ORDER,
.domid= DOMID_SELF
};
 
@@ -352,7 +362,11 @@ static enum bp_state increase_reservation(unsigned long 
nr_pages)
nr_pages = i;
break;
}
-   frame_list[i] = page_to_pfn(page);
+
+   /* XENMEM_populate_physmap requires a PFN based on Xen
+* granularity.
+*/
+   frame_list[i] = page_to_xen_pfn(page);
page = balloon_next_page(page);
}
 
@@ -366,10 +380,15 @@ static enum bp_state increase_reservation(unsigned long 
nr_pages)
page = balloon_retrieve(false);
BUG_ON(page == NULL);
 
-   pfn = page_to_pfn(page);
-
 #ifdef CONFIG_XEN_HAVE_PVMMU
+   /* We don't support PV MMU when Linux and Xen is using
+* different page granularity.
+*/
+   BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
+
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+   unsigned long pfn = page_to_pfn(page);
+
set_phys_to_machine(pfn, frame_list[i]);
 
/* Link back into the page tables if not highmem. */
@@ -396,14 +415,15 @@ static enum bp_state increase_reservation(unsigned long 
nr_pages)
 static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 {
enum bp_state state = BP_DONE;
-   unsigned long  pfn, i;
-   struct page   *page;
+   unsigned long i;
+   struct page *page, *tmp;
int ret;
struct xen_memory_reservation reservation = {
.address_bits = 0,
-   .extent_order = 0,
+   .extent_order = EXTENT_ORDER,
.domid= DOMID_SELF
};
+   LIST_HEAD(pages);
 
 #ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
 

[Xen-devel] [PATCH v4 18/20] net/xen-netback: Make it running on 64KB page granularity

2015-09-07 Thread Julien Grall
The PV network protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity working as a
network backend on a non-modified Xen.

It's only necessary to adapt the ring size and break skb data in small
chunk of 4KB. The rest of the code is relying on the grant table code.

Signed-off-by: Julien Grall 

---
Cc: Ian Campbell 
Cc: Wei Liu 
Cc: net...@vger.kernel.org

Improvement such as support of 64KB grant is not taken into
consideration in this patch because we have the requirement to run a
Linux using 64KB pages on a non-modified Xen.

Note that I haven't add a comment why the offset is 0 after the first
iteration. See [1] for more details.

[1] https://lkml.org/lkml/2015/8/10/456

Changes in v4:
- Add a comment to explain how we compute MAX_XEN_SKB_FRAGS

Changes in v3:
- Fix errors reported by checkpatch.pl
- s/mfn/gfn/ based on the new naming
- gnttab_foreach_grant has been renamed to gnttab_forach_grant_in_range
- The grant callback doesn't allow anymore to use less data. An
helpers has been added in netback to handle this.

Changes in v2:
- Correctly set MAX_GRANT_COPY_OPS and XEN_NETBK_RX_SLOTS_MAX
- Don't use XEN_PAGE_SIZE in handle_frag_list as we coalesce
fragment into a new skb
- Use gnntab_foreach_grant to split a Linux page into grant
---
 drivers/net/xen-netback/common.h  |  18 +++--
 drivers/net/xen-netback/netback.c | 153 --
 2 files changed, 110 insertions(+), 61 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8a495b3..24cb365 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 typedef unsigned int pending_ring_idx_t;
@@ -64,8 +65,8 @@ struct pending_tx_info {
struct ubuf_info callback_struct;
 };
 
-#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
+#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
 
 struct xenvif_rx_meta {
int id;
@@ -80,16 +81,21 @@ struct xenvif_rx_meta {
 /* Discriminate from any valid pending_idx value. */
 #define INVALID_PENDING_IDX 0x
 
-#define MAX_BUFFER_OFFSET PAGE_SIZE
+#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE
 
 #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
 
+/* The maximum number of frags is derived from the size of a grant (same
+ * as a Xen page size for now).
+ */
+#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)
+
 /* It's possible for an skb to have a maximal number of frags
  * but still be less than MAX_BUFFER_OFFSET in size. Thus the
- * worst-case number of copy operations is MAX_SKB_FRAGS per
+ * worst-case number of copy operations is MAX_XEN_SKB_FRAGS per
  * ring slot.
  */
-#define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
+#define MAX_GRANT_COPY_OPS (MAX_XEN_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
 
 #define NETBACK_INVALID_HANDLE -1
 
@@ -203,7 +209,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
 /* Maximum number of Rx slots a to-guest packet may use, including the
  * slot needed for GSO meta-data.
  */
-#define XEN_NETBK_RX_SLOTS_MAX (MAX_SKB_FRAGS + 1)
+#define XEN_NETBK_RX_SLOTS_MAX ((MAX_XEN_SKB_FRAGS + 1))
 
 enum state_bit_shift {
/* This bit marks that the vif is connected */
diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index d4c1bc7..b1649aa 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -263,6 +263,80 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct 
xenvif_queue *queue,
return meta;
 }
 
+struct gop_frag_copy {
+   struct xenvif_queue *queue;
+   struct netrx_pending_operations *npo;
+   struct xenvif_rx_meta *meta;
+   int head;
+   int gso_type;
+
+   struct page *page;
+};
+
+static void xenvif_setup_copy_gop(unsigned long gfn,
+ unsigned int offset,
+ unsigned int *len,
+ struct gop_frag_copy *info)
+{
+   struct gnttab_copy *copy_gop;
+   struct xen_page_foreign *foreign;
+   /* Convenient aliases */
+   struct xenvif_queue *queue = info->queue;
+   struct netrx_pending_operations *npo = info->npo;
+   struct page *page = info->page;
+
+   BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
+
+   if (npo->copy_off == MAX_BUFFER_OFFSET)
+   info->meta = get_next_rx_buffer(queue, npo);
+
+   if (npo->copy_off + *len > MAX_BUFFER_OFFSET)
+   *len = MAX_BUFFER_OFFSET - npo->copy_off;
+
+   copy_gop = npo->copy + npo->copy_prod++;
+   copy_gop->flags = GNTCOPY_de

[Xen-devel] [PATCH v4 17/20] net/xen-netfront: Make it running on 64KB page granularity

2015-09-07 Thread Julien Grall
The PV network protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity using network
device on a non-modified Xen.

It's only necessary to adapt the ring size and break skb data in small
chunk of 4KB. The rest of the code is relying on the grant table code.

Note that we allocate a Linux page for each rx skb but only the first
4KB is used. We may improve the memory usage by extending the size of
the rx skb.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: net...@vger.kernel.org

Improvement such as support of 64KB grant is not taken into
consideration in this patch because we have the requirement to run a Linux
using 64KB pages on a non-modified Xen.

Tested with workload such as ping, ssh, wget, git... I would happy if
someone give details how to test all the path.

Changes in v4:
- s/gnttab_one_grant/gnttab_for_one_grant/ based on the new naming
- Add David's reviewed-by

Changes in v3:
- Fix errors reported by checkpatch.pl
- s/mfn/gfn/ base on the new naming
- xennet_tx_setup_grant was calling itself resulting an
guest stall when using iperf.
- The grant callback doesn't allow anymore to change the len
(wasn't used here)
- gnttab_foreach_grant has been renamed to gnttab_foreach_grant_in_range
- gnttab_page_grant_foreign_ref has been renamed to
gnttab_foreach_grant_foreign_ref_one

Changes in v2:
- Use gnttab_foreach_grant to split a Linux page in grant
- Fix count slots
---
 drivers/net/xen-netfront.c | 122 -
 1 file changed, 86 insertions(+), 36 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 47f791e..17b1013 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -74,8 +74,8 @@ struct netfront_cb {
 
 #define GRANT_INVALID_REF  0
 
-#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
+#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
 
 /* Minimum number of Rx slots (includes slot for GSO metadata). */
 #define NET_RX_SLOTS_MIN (XEN_NETIF_NR_SLOTS_MIN + 1)
@@ -291,7 +291,7 @@ static void xennet_alloc_rx_buffers(struct netfront_queue 
*queue)
struct sk_buff *skb;
unsigned short id;
grant_ref_t ref;
-   unsigned long gfn;
+   struct page *page;
struct xen_netif_rx_request *req;
 
skb = xennet_alloc_one_rx_buffer(queue);
@@ -307,14 +307,13 @@ static void xennet_alloc_rx_buffers(struct netfront_queue 
*queue)
BUG_ON((signed short)ref < 0);
queue->grant_rx_ref[id] = ref;
 
-   gfn = 
xen_page_to_gfn(skb_frag_page(&skb_shinfo(skb)->frags[0]));
+   page = skb_frag_page(&skb_shinfo(skb)->frags[0]);
 
req = RING_GET_REQUEST(&queue->rx, req_prod);
-   gnttab_grant_foreign_access_ref(ref,
-   queue->info->xbdev->otherend_id,
-   gfn,
-   0);
-
+   gnttab_page_grant_foreign_access_ref_one(ref,
+
queue->info->xbdev->otherend_id,
+page,
+0);
req->id = id;
req->gref = ref;
}
@@ -415,25 +414,33 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
xennet_maybe_wake_tx(queue);
 }
 
-static struct xen_netif_tx_request *xennet_make_one_txreq(
-   struct netfront_queue *queue, struct sk_buff *skb,
-   struct page *page, unsigned int offset, unsigned int len)
+struct xennet_gnttab_make_txreq {
+   struct netfront_queue *queue;
+   struct sk_buff *skb;
+   struct page *page;
+   struct xen_netif_tx_request *tx; /* Last request */
+   unsigned int size;
+};
+
+static void xennet_tx_setup_grant(unsigned long gfn, unsigned int offset,
+ unsigned int len, void *data)
 {
+   struct xennet_gnttab_make_txreq *info = data;
unsigned int id;
struct xen_netif_tx_request *tx;
grant_ref_t ref;
-
-   len = min_t(unsigned int, PAGE_SIZE - offset, len);
+   /* convenient aliases */
+   struct page *page = info->page;
+   struct netfront_queue *queue = info->queue;
+   struct sk_buff *skb = info->skb;
 
id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);
ref = gnttab_claim_

[Xen-devel] [PATCH v4 13/20] xen/events: fifo: Make it running on 64KB granularity

2015-09-07 Thread Julien Grall
Only use the first 4KB of the page to store the events channel info. It
means that we will waste 60KB every time we allocate page for:
 * control block: a page is allocating per CPU
 * event array: a page is allocating everytime we need to expand it

I think we can reduce the memory waste for the 2 areas by:

* control block: sharing between multiple vCPUs. Although it will
require some bookkeeping in order to not free the page when the CPU
goes offline and the other CPUs sharing the page still there

* event array: always extend the array event by 64K (i.e 16 4K
chunk). That would require more care when we fail to expand the
event channel.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 

Note I haven't updated the suggestion to reduce the memory waste
after David's email [1]. I can do it if necessary.

Changes in v3:
- Add David and Stefano's reviewed-by

[1] http://lists.xen.org/archives/html/xen-devel/2015-07/msg04596.html
---
 drivers/xen/events/events_base.c | 2 +-
 drivers/xen/events/events_fifo.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index c49bb7a..00dd923 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -40,11 +40,11 @@
 #include 
 #include 
 #include 
-#include 
 #endif
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
diff --git a/drivers/xen/events/events_fifo.c b/drivers/xen/events/events_fifo.c
index 1d4baf5..e3e9e3d 100644
--- a/drivers/xen/events/events_fifo.c
+++ b/drivers/xen/events/events_fifo.c
@@ -54,7 +54,7 @@
 
 #include "events_internal.h"
 
-#define EVENT_WORDS_PER_PAGE (PAGE_SIZE / sizeof(event_word_t))
+#define EVENT_WORDS_PER_PAGE (XEN_PAGE_SIZE / sizeof(event_word_t))
 #define MAX_EVENT_ARRAY_PAGES (EVTCHN_FIFO_NR_CHANNELS / EVENT_WORDS_PER_PAGE)
 
 struct evtchn_fifo_queue {
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 14/20] xen/grant-table: Make it running on 64KB granularity

2015-09-07 Thread Julien Grall
The Xen interface is using 4KB page granularity. This means that each
grant is 4KB.

The current implementation allocates a Linux page per grant. On Linux
using 64KB page granularity, only the first 4KB of the page will be
used.

We could decrease the memory wasted by sharing the page with multiple
grant. It will require some care with the {Set,Clear}ForeignPage macro.

Note that no changes has been made in the x86 code because both Linux
and Xen will only use 4KB page granularity.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 
Reviewed-by: Stefano Stabellini 

---
Cc: Stefano Stabellini 
Cc: Russell King 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 

Changes in v3:
- Add Stefano's reviewed-by

Changes in v2
- Add David's reviewed-by
---
 arch/arm/xen/p2m.c| 6 +++---
 drivers/xen/grant-table.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm/xen/p2m.c b/arch/arm/xen/p2m.c
index 887596c..0ed01f2 100644
--- a/arch/arm/xen/p2m.c
+++ b/arch/arm/xen/p2m.c
@@ -93,8 +93,8 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref 
*map_ops,
for (i = 0; i < count; i++) {
if (map_ops[i].status)
continue;
-   set_phys_to_machine(map_ops[i].host_addr >> PAGE_SHIFT,
-   map_ops[i].dev_bus_addr >> PAGE_SHIFT);
+   set_phys_to_machine(map_ops[i].host_addr >> XEN_PAGE_SHIFT,
+   map_ops[i].dev_bus_addr >> XEN_PAGE_SHIFT);
}
 
return 0;
@@ -108,7 +108,7 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref 
*unmap_ops,
int i;
 
for (i = 0; i < count; i++) {
-   set_phys_to_machine(unmap_ops[i].host_addr >> PAGE_SHIFT,
+   set_phys_to_machine(unmap_ops[i].host_addr >> XEN_PAGE_SHIFT,
INVALID_P2M_ENTRY);
}
 
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 7b4e1cf..99ed9c2 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -642,7 +642,7 @@ int gnttab_setup_auto_xlat_frames(phys_addr_t addr)
if (xen_auto_xlat_grant_frames.count)
return -EINVAL;
 
-   vaddr = xen_remap(addr, PAGE_SIZE * max_nr_gframes);
+   vaddr = xen_remap(addr, XEN_PAGE_SIZE * max_nr_gframes);
if (vaddr == NULL) {
pr_warn("Failed to ioremap gnttab share frames (addr=%pa)!\n",
&addr);
@@ -654,7 +654,7 @@ int gnttab_setup_auto_xlat_frames(phys_addr_t addr)
return -ENOMEM;
}
for (i = 0; i < max_nr_gframes; i++)
-   pfn[i] = PFN_DOWN(addr) + i;
+   pfn[i] = XEN_PFN_DOWN(addr) + i;
 
xen_auto_xlat_grant_frames.vaddr = vaddr;
xen_auto_xlat_grant_frames.pfn = pfn;
@@ -1004,7 +1004,7 @@ static void gnttab_request_version(void)
 {
/* Only version 1 is used, which will always be available. */
grant_table_version = 1;
-   grefs_per_grant_frame = PAGE_SIZE / sizeof(struct grant_entry_v1);
+   grefs_per_grant_frame = XEN_PAGE_SIZE / sizeof(struct grant_entry_v1);
gnttab_interface = &gnttab_v1_ops;
 
pr_info("Grant tables using version %d layout\n", grant_table_version);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 11/20] tty/hvc: xen: Use xen page definition

2015-09-07 Thread Julien Grall
The console ring is always based on the page granularity of Xen.

Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Cc: Greg Kroah-Hartman 
Cc: Jiri Slaby 
Cc: David Vrabel 
Cc: Boris Ostrovsky 
Cc: linuxppc-...@lists.ozlabs.org

Changes in v4:
- The ring is always 4K (i.e XEN_PAGE_SIZE), so no need to
map with PAGE_SIZE. This was correctly done in v2 but lost with
the rebase to the "s/mfn/gfn/" series

Changes in v3:
- Some changes has been moved in the series "Use correctly the
Xen memory terminologies in Linux".
- Add Stefano's reviewed-by
---
 drivers/tty/hvc/hvc_xen.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 10beb15..fa816b7 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -230,7 +230,7 @@ static int xen_hvm_console_init(void)
if (r < 0 || v == 0)
goto err;
gfn = v;
-   info->intf = xen_remap(gfn << PAGE_SHIFT, PAGE_SIZE);
+   info->intf = xen_remap(gfn << XEN_PAGE_SHIFT, XEN_PAGE_SIZE);
if (info->intf == NULL)
goto err;
info->vtermno = HVC_COOKIE;
@@ -472,7 +472,7 @@ static int xencons_resume(struct xenbus_device *dev)
struct xencons_info *info = dev_get_drvdata(&dev->dev);
 
xencons_disconnect_backend(info);
-   memset(info->intf, 0, PAGE_SIZE);
+   memset(info->intf, 0, XEN_PAGE_SIZE);
return xencons_connect_backend(dev, info);
 }
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 20/20] arm/xen: Add support for 64KB page granularity

2015-09-07 Thread Julien Grall
The hypercall interface is always using 4KB page granularity. This is
requiring to use xen page definition macro when we deal with hypercall.

Note that pfn_to_gfn is working with a Xen pfn (i.e 4KB). We may want to
rename pfn_gfn to make this explicit.

We also allocate a 64KB page for the shared page even though only the
first 4KB is used. I don't think this is really important for now as it
helps to have the pointer 4KB aligned (XENMEM_add_to_physmap is taking a
Xen PFN).

Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Cc: Russell King 

Stefano, I've dropped your reviewed-by given I've updated the doc and do
changes to avoid usage of XEN_PAGE_SHIFT

Changes in v4:
- Add Stefano's Reviewed-by

Changes in v3:
- s/MFN/GFN/ base on the new naming
- Use virt_to_gfn to avoid use XEN_PAGE_SHIFT
- Drop Stefano's reviewed-by
- Add some docs in arch/arm/asm/xen/page.h

Changes in v2
- Add Stefano's reviewed-by
---
 arch/arm/include/asm/xen/page.h | 15 +--
 arch/arm/xen/enlighten.c|  6 +++---
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
index 98c9fc3..e3d94cf 100644
--- a/arch/arm/include/asm/xen/page.h
+++ b/arch/arm/include/asm/xen/page.h
@@ -28,6 +28,17 @@ typedef struct xpaddr {
 
 #define INVALID_P2M_ENTRY  (~0UL)
 
+/*
+ * The pseudo-physical frame (pfn) used in all the helpers is always based
+ * on Xen page granularity (i.e 4KB).
+ *
+ * A Linux page may be split across multiple non-contiguous Xen page so we
+ * have to keep track with frame based on 4KB page granularity.
+ *
+ * PV drivers should never make a direct usage of those helpers (particularly
+ * pfn_to_gfn and gfn_to_pfn).
+ */
+
 unsigned long __pfn_to_mfn(unsigned long pfn);
 extern struct rb_root phys_to_mach;
 
@@ -64,8 +75,8 @@ static inline unsigned long bfn_to_pfn(unsigned long bfn)
 #define bfn_to_local_pfn(bfn)  bfn_to_pfn(bfn)
 
 /* VIRT <-> GUEST conversion */
-#define virt_to_gfn(v) (pfn_to_gfn(virt_to_pfn(v)))
-#define gfn_to_virt(m) (__va(gfn_to_pfn(m) << PAGE_SHIFT))
+#define virt_to_gfn(v) (pfn_to_gfn(virt_to_phys(v) >> XEN_PAGE_SHIFT))
+#define gfn_to_virt(m) (__va(gfn_to_pfn(m) << XEN_PAGE_SHIFT))
 
 /* Only used in PV code. But ARM guests are always HVM. */
 static inline xmaddr_t arbitrary_virt_to_machine(void *vaddr)
diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index eeeab07..50b4769 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -89,8 +89,8 @@ static void xen_percpu_init(void)
pr_info("Xen: initializing cpu%d\n", cpu);
vcpup = per_cpu_ptr(xen_vcpu_info, cpu);
 
-   info.mfn = __pa(vcpup) >> PAGE_SHIFT;
-   info.offset = offset_in_page(vcpup);
+   info.mfn = virt_to_gfn(vcpup);
+   info.offset = xen_offset_in_page(vcpup);
 
err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info);
BUG_ON(err);
@@ -213,7 +213,7 @@ static int __init xen_guest_init(void)
xatp.domid = DOMID_SELF;
xatp.idx = 0;
xatp.space = XENMAPSPACE_shared_info;
-   xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+   xatp.gpfn = virt_to_gfn(shared_info_page);
if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
BUG();
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 15/20] block/xen-blkfront: Make it running on 64KB page granularity

2015-09-07 Thread Julien Grall
The PV block protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity using block
device on a non-modified Xen.

The block API is using segment which should at least be the size of a
Linux page. Therefore, the driver will have to break the page in chunk
of 4K before giving the page to the backend.

When breaking a 64KB segment in 4KB chunks, it is possible that some
chunks are empty. As the PV protocol always require to have data in the
chunk, we have to count the number of Xen page which will be in use and
avoid sending empty chunks.

Note that, a pre-defined number of grants are reserved before preparing
the request. This pre-defined number is based on the number and the
maximum size of the segments. If each segment contains a very small
amount of data, the driver may reserve too many grants (16 grants is
reserved per segment with 64KB page granularity).

Furthermore, in the case of persistent grants we allocate one Linux page
per grant although only the first 4KB of the page will be effectively
in use. This could be improved by sharing the page with multiple grants.

Signed-off-by: Julien Grall 
Acked-by: Roger Pau Monné 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

Improvement such as support 64KB grant is not taken into consideration in
this patch because we have the requirement to run a Linux using 64KB page
on a non-modified Xen.

Changes in v4:
- Rebase after d50babbe300eedf33ea5b00a12c5df3a05bd96c7 "
xen-blkfront: introduce blkfront_gather_backend_features()"
- Fix typoes
- Add Roger's acked-by

Changes in v3:
- Use DIV_ROUND_UP in INDIRECT_GREFS
- Split lines over 80 characters whenever it's possible
- s/mfn/gfn/ based on the new naming
- The grant callback doesn't allow anymore to change the len
(wasn't used here).
- gnttab_foreach_grant has been renamed to gnttab_foreach_grant_in_range
- Use gnttab_count_grant to get the number of grants in a sg
- Do some renaming to use the correct variable every time

Changes in v2:
- Use gnttab_foreach_grant to split a Linux page into grant
---
 drivers/block/xen-blkfront.c | 324 ---
 1 file changed, 213 insertions(+), 111 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 4232cbd..f2cdc73 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -78,6 +78,7 @@ struct blk_shadow {
struct grant **grants_used;
struct grant **indirect_grants;
struct scatterlist *sg;
+   unsigned int num_sg;
 };
 
 struct split_bio {
@@ -107,8 +108,12 @@ static unsigned int xen_blkif_max_ring_order;
 module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, 
S_IRUGO);
 MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for 
the shared ring");
 
-#define BLK_RING_SIZE(info) __CONST_RING_SIZE(blkif, PAGE_SIZE * 
(info)->nr_ring_pages)
-#define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * 
XENBUS_MAX_RING_PAGES)
+#define BLK_RING_SIZE(info)\
+   __CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * (info)->nr_ring_pages)
+
+#define BLK_MAX_RING_SIZE  \
+   __CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * XENBUS_MAX_RING_PAGES)
+
 /*
  * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
  * characters are enough. Define to 20 to keep consist with backend.
@@ -147,6 +152,7 @@ struct blkfront_info
unsigned int discard_granularity;
unsigned int discard_alignment;
unsigned int feature_persistent:1;
+   /* Number of 4KB segments handled */
unsigned int max_indirect_segments;
int is_ready;
struct blk_mq_tag_set tag_set;
@@ -175,10 +181,23 @@ static DEFINE_SPINLOCK(minor_lock);
 
 #define DEV_NAME   "xvd"   /* name in /dev */
 
-#define SEGS_PER_INDIRECT_FRAME \
-   (PAGE_SIZE/sizeof(struct blkif_request_segment))
-#define INDIRECT_GREFS(_segs) \
-   ((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
+/*
+ * Grants are always the same size as a Xen page (i.e 4KB).
+ * A physical segment is always the same size as a Linux page.
+ * Number of grants per physical segment
+ */
+#define GRANTS_PER_PSEG(PAGE_SIZE / XEN_PAGE_SIZE)
+
+#define GRANTS_PER_INDIRECT_FRAME \
+   (XEN_PAGE_SIZE / sizeof(struct blkif_request_segment))
+
+#define PSEGS_PER_INDIRECT_FRAME   \
+   (GRANTS_INDIRECT_FRAME / GRANTS_PSEGS)
+
+#define INDIRECT_GREFS(_grants)\
+   DIV_ROUND_UP(_grants, GRANTS_PER_INDIRECT_FRAME)
+
+#define GREFS(_psegs)  ((_psegs) * GRANTS_PER_PSEG)
 
 static int blkfront_setup_indirect(struct blkfront_info *info);
 static int blkfront_gather_backend_features(struct blkfront_info *info);
@@ -466,14 +485,100 @@ static int blkif_queue_discard_req(struct request *req)
return 0;
 }
 
+struct setu

[Xen-devel] [PATCH v4 19/20] xen/privcmd: Add support for Linux 64KB page granularity

2015-09-07 Thread Julien Grall
The hypercall interface (as well as the toolstack) is always using 4KB
page granularity. When the toolstack is asking for mapping a series of
guest PFN in a batch, it expects to have the page map contiguously in
its virtual memory.

When Linux is using 64KB page granularity, the privcmd driver will have
to map multiple Xen PFN in a single Linux page.

Note that this solution works on page granularity which is a multiple of
4KB.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 

I kept the hypercall arguments in remap_data to avoid allocating them on
the stack every time that remap_pte_fn is called.
I will keep like that unless someone is strongly disagree.

Changes in v4:
- s/xen_page_to_pfn/page_to_xen_pfn/ based on the new naming
- Add David's reviewed-by

Changes in v3:
- The function to split a Linux page in mutiple Xen page has
been moved internally. It was the only use (not used anymore in
the balloon) and it's not quite clear what should be the common
interface. Differ the question until someone need to use it.
- s/nr_pfn/numgfns/ to make clear that we are dealing with GFN
- Use DIV_ROUND_UP rather round_up and fix the usage in
xen_xlate_unmap_gfn_range

Changes in v2:
- Use xen_apply_to_page
---
 drivers/xen/privcmd.c   |   8 ++--
 drivers/xen/xlate_mmu.c | 124 
 2 files changed, 89 insertions(+), 43 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index c6deb87..c8798ee 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -446,7 +446,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, 
int version)
return -EINVAL;
}
 
-   nr_pages = m.num;
+   nr_pages = DIV_ROUND_UP(m.num, XEN_PFN_PER_PAGE);
if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
return -EINVAL;
 
@@ -494,7 +494,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, 
int version)
goto out_unlock;
}
if (xen_feature(XENFEAT_auto_translated_physmap)) {
-   ret = alloc_empty_pages(vma, m.num);
+   ret = alloc_empty_pages(vma, nr_pages);
if (ret < 0)
goto out_unlock;
} else
@@ -518,6 +518,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, 
int version)
state.global_error  = 0;
state.version   = version;
 
+   BUILD_BUG_ON(((PAGE_SIZE / sizeof(xen_pfn_t)) % XEN_PFN_PER_PAGE) != 0);
/* mmap_batch_fn guarantees ret == 0 */
BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t),
&pagelist, mmap_batch_fn, &state));
@@ -582,12 +583,13 @@ static void privcmd_close(struct vm_area_struct *vma)
 {
struct page **pages = vma->vm_private_data;
int numpgs = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+   int numgfns = (vma->vm_end - vma->vm_start) >> XEN_PAGE_SHIFT;
int rc;
 
if (!xen_feature(XENFEAT_auto_translated_physmap) || !numpgs || !pages)
return;
 
-   rc = xen_unmap_domain_gfn_range(vma, numpgs, pages);
+   rc = xen_unmap_domain_gfn_range(vma, numgfns, pages);
if (rc == 0)
free_xenballooned_pages(numpgs, pages);
else
diff --git a/drivers/xen/xlate_mmu.c b/drivers/xen/xlate_mmu.c
index cff2387..5063c5e 100644
--- a/drivers/xen/xlate_mmu.c
+++ b/drivers/xen/xlate_mmu.c
@@ -38,31 +38,28 @@
 #include 
 #include 
 
-/* map fgfn of domid to lpfn in the current domain */
-static int map_foreign_page(unsigned long lpfn, unsigned long fgfn,
-   unsigned int domid)
-{
-   int rc;
-   struct xen_add_to_physmap_range xatp = {
-   .domid = DOMID_SELF,
-   .foreign_domid = domid,
-   .size = 1,
-   .space = XENMAPSPACE_gmfn_foreign,
-   };
-   xen_ulong_t idx = fgfn;
-   xen_pfn_t gpfn = lpfn;
-   int err = 0;
+typedef void (*xen_gfn_fn_t)(unsigned long gfn, void *data);
 
-   set_xen_guest_handle(xatp.idxs, &idx);
-   set_xen_guest_handle(xatp.gpfns, &gpfn);
-   set_xen_guest_handle(xatp.errs, &err);
+/* Break down the pages in 4KB chunk and call fn for each gfn */
+static void xen_for_each_gfn(struct page **pages, unsigned nr_gfn,
+xen_gfn_fn_t fn, void *data)
+{
+   unsigned long xen_pfn = 0;
+   struct page *page;
+   int i;
 
-   rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
-   return rc < 0 ? rc : err;
+   for (i = 0; i < nr_gfn; i++) {
+   if ((i % XEN_PFN_PER_PAGE) == 0) {
+   page = pages[i / XEN_PFN_PER_PAGE];
+   xen_pfn = page_to_xen_pfn(page);
+

[Xen-devel] [PATCH v4 05/20] xen/grant: Add helper gnttab_page_grant_foreign_access_ref_one

2015-09-07 Thread Julien Grall
Many PV drivers contain the idiom:

pfn = page_to_gfn(...) /* Or similar */
gnttab_grant_foreign_access_ref

Replace it by a new helper. Note that when Linux is using a different
page granularity than Xen, the helper only gives access to the first 4KB
grant.

This is useful where drivers are allocating a full Linux page for each
grant.

Also include xen/interface/grant_table.h rather than xen/grant_table.h in
asm/page.h for x86 to fix a compilation issue [1]. Only the former is
useful in order to get the structure definition.

[1] Interdependency between asm/page.h and xen/grant_table.h which result
to page_mfn not being defined when necessary.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 

Changes in v3:
- Rename gnttab_page_grant_foreign_access_ref into
gnttab_page_grant_foreign_access_ref_one
- Fix typo in the commit message
- s/mfn/gfn based on the new naming
- Add David and Stefano's reviewed-by

Changes in v2:
- Patch added
---
 include/xen/grant_table.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 05b5b08..e17a4b3 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -131,6 +131,15 @@ void gnttab_cancel_free_callback(struct 
gnttab_free_callback *callback);
 void gnttab_grant_foreign_access_ref(grant_ref_t ref, domid_t domid,
 unsigned long frame, int readonly);
 
+/* Give access to the first 4K of the page */
+static inline void gnttab_page_grant_foreign_access_ref_one(
+   grant_ref_t ref, domid_t domid,
+   struct page *page, int readonly)
+{
+   gnttab_grant_foreign_access_ref(ref, domid, xen_page_to_gfn(page),
+   readonly);
+}
+
 void gnttab_grant_foreign_transfer_ref(grant_ref_t, domid_t domid,
   unsigned long pfn);
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux

2015-09-07 Thread Julien Grall
Hi all,

ARM64 Linux is supporting both 4KB and 64KB page granularity. Although, Xen
hypercall interface and PV protocol are always based on 4KB page granularity.

Any attempt to boot a Linux guest with 64KB pages enabled will result to a
guest crash.

This series is a first attempt to allow those Linux running with the current
hypercall interface and PV protocol.

This solution has been chosen because we want to run Linux 64KB in released
Xen ARM version or/and platform using an old version of Linux DOM0.

There is room for improvement, such as support of 64KB grant, modification
of PV protocol to support different page size... They will be explored in a
separate patch series later.

TODO list:
- Convert swiotlb to 64KB
- Convert xenfb to 64KB
- Support for multiple page ring support
- Support for 64KB in gnttdev
- Support of non-indirect grant with 64KB frontend
- It may be possible to move some common define between
netback/netfront and blkfront/blkback in an header

I've got most of the patches for the TODO items. I'm planning to send them as
a follow-up as it's not a requirement for a basic guests.

All patches has been built tested for ARM32, ARM64, x86. But I haven't tested
to run it on x86 as I don't have a box with Xen x86 running. I would be
happy if someone give a try and see possible regression for x86.

I know that Konrad as a test-suite for x86. Konrand, would it be possible to
give a run to for this series?

A branch based on the latest xentip/for-linus-4.3 can be found here:

git://xenbits.xen.org/people/julieng/linux-arm.git branch xen-64k-v4

Comments, suggestions are welcomed.

Sincerely yours,

Cc: david.vra...@citrix.com
Cc: konrad.w...@oracle.com
Cc: boris.ostrov...@oracle.com
Cc: wei.l...@citrix.com
Cc: roger@citrix.com

Status of each patch:

A: Reviewed-by - Acked-by
M: Patch modified in this series
m: Minor changes in this series (i.e renaming due to previous patches, typoes)
L: Missing Acked-by from a Linux maintainers (Boris, David, Konrad)
N: Missing Acked-by from a Netback maintainers (Ian or Wei)

Julien Grall (20):
A   net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop
A   arm/xen: Drop pte_mfn and mfn_pte
A M L   xen: Add Xen specific page definition
A M xen/grant: Introduce helpers to split a page into grant
A   xen/grant: Add helper gnttab_page_grant_foreign_access_ref_one
A   block/xen-blkfront: Split blkif_queue_request in 2
A m block/xen-blkfront: Store a page rather a pfn in the grant structure
A   block/xen-blkfront: split get_grant in 2
A m L   xen/biomerge: Don't allow biovec's to be merged when Linux is not
  using 4KB pages
A   xen/xenbus: Use Xen page definition
A m L   tty/hvc: xen: Use xen page definition
  M L   xen/balloon: Don't rely on the page granularity is the same for Xen
  and Linux
A   xen/events: fifo: Make it running on 64KB granularity
A   xen/grant-table: Make it running on 64KB granularity
A m block/xen-blkfront: Make it running on 64KB page granularity
A   block/xen-blkback: Make it running on 64KB page granularity
A m net/xen-netfront: Make it running on 64KB page granularity
  m  N  net/xen-netback: Make it running on 64KB page granularity
A m xen/privcmd: Add support for Linux 64KB page granularity
A   arm/xen: Add support for 64KB page granularity

 arch/arm/include/asm/xen/page.h |  18 +-
 arch/arm/xen/enlighten.c|   6 +-
 arch/arm/xen/p2m.c  |   6 +-
 arch/x86/include/asm/xen/page.h |   2 +-
 drivers/block/xen-blkback/blkback.c |   5 +-
 drivers/block/xen-blkback/common.h  |  17 +-
 drivers/block/xen-blkback/xenbus.c  |   9 +-
 drivers/block/xen-blkfront.c| 552 +++-
 drivers/net/xen-netback/common.h|  18 +-
 drivers/net/xen-netback/netback.c   | 163 +++
 drivers/net/xen-netfront.c  | 122 +---
 drivers/tty/hvc/hvc_xen.c   |   4 +-
 drivers/xen/balloon.c   |  59 +++-
 drivers/xen/biomerge.c  |   8 +
 drivers/xen/events/events_base.c|   2 +-
 drivers/xen/events/events_fifo.c|   2 +-
 drivers/xen/grant-table.c   |  32 ++-
 drivers/xen/privcmd.c   |   8 +-
 drivers/xen/xenbus/xenbus_client.c  |   6 +-
 drivers/xen/xenbus/xenbus_probe.c   |   3 +-
 drivers/xen/xlate_mmu.c | 124 +---
 include/xen/grant_table.h   |  51 
 include/xen/page.h  |  27 +-
 23 files changed, 855 insertions(+), 389 deletions(-)

-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 01/20] net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop

2015-09-07 Thread Julien Grall
The skb doesn't change within the function. Therefore it's only
necessary to check if we need GSO once at the beginning.

Signed-off-by: Julien Grall 
Acked-by: Wei Liu 

---
Cc: Ian Campbell 
Cc: net...@vger.kernel.org

Changes in v4:
- Add Wei's acked

Changes in v2:
- Patch added
---
 drivers/net/xen-netback/netback.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index 7c64c74..d4c1bc7 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -277,6 +277,13 @@ static void xenvif_gop_frag_copy(struct xenvif_queue 
*queue, struct sk_buff *skb
unsigned long bytes;
int gso_type = XEN_NETIF_GSO_TYPE_NONE;
 
+   if (skb_is_gso(skb)) {
+   if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
+   gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+   else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
+   gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+   }
+
/* Data must not cross a page boundary. */
BUG_ON(size + offset > PAGE_SIZEgso_type & SKB_GSO_TCPV6)
-   gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
-   }
-
if (*head && ((1 << gso_type) & queue->vif->gso_mask))
queue->rx.req_cons++;
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 04/20] xen/grant: Introduce helpers to split a page into grant

2015-09-07 Thread Julien Grall
Currently, a grant is always based on the Xen page granularity (i.e
4KB). When Linux is using a different page granularity, a single page
will be split between multiple grants.

The new helpers will be in charge of splitting the Linux page into grants
and call a function given by the caller on each grant.

Also provide an helper to count the number of grants within a given
contiguous region.

Note that the x86/include/asm/xen/page.h is now including
xen/interface/grant_table.h rather than xen/grant_table.h. It's
necessary because xen/grant_table.h depends on asm/xen/page.h and will
break the compilation. Furthermore, only definition in
interface/grant_table.h is required.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org

Changes in v4:
- Typoes
- Rename gnttab_one_grant into gnttab_for_one_grant
- Add Stefano and David's reviewed-by
- s/xen_page_to_pfn/page_to_xen_pfn/ based on the new naming

Changes in v3:
- Fix error reported by checkpatch.pl
- Typoes
- s/pfn/xen_pfn/ in gnttab_foreach_grant
- Drop the possibility to use less data. The complexity is moved
in netback which is the only user
- Rename gnttab_foreach_grant into gnttab_foreach_grant_in_range
- s/offset/start/ in gnttab_count_grant and update the
description of the parameter
- s/mfn/gfn base on the new terminologies
- Add EXPORT_SYMBOL_GPL for gnttab_foreach_grant_in_range
- Use xen_offset_in_page and XEN_PFN_DOWN whenever it's possible
- Fix compilation on x86.

Changes in v2:
- Patch added
---
 arch/x86/include/asm/xen/page.h |  2 +-
 drivers/xen/grant-table.c   | 26 +
 include/xen/grant_table.h   | 42 +
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 0b762f6..501479e 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -12,7 +12,7 @@
 #include 
 
 #include 
-#include 
+#include 
 #include 
 
 /* Xen machine address */
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 62f591f..7b4e1cf 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -776,6 +776,32 @@ void gnttab_batch_copy(struct gnttab_copy *batch, unsigned 
count)
 }
 EXPORT_SYMBOL_GPL(gnttab_batch_copy);
 
+void gnttab_foreach_grant_in_range(struct page *page,
+  unsigned int offset,
+  unsigned int len,
+  xen_grant_fn_t fn,
+  void *data)
+{
+   unsigned int goffset;
+   unsigned int glen;
+   unsigned long xen_pfn;
+
+   len = min_t(unsigned int, PAGE_SIZE - offset, len);
+   goffset = xen_offset_in_page(offset);
+
+   xen_pfn = page_to_xen_pfn(page) + XEN_PFN_DOWN(offset);
+
+   while (len) {
+   glen = min_t(unsigned int, XEN_PAGE_SIZE - goffset, len);
+   fn(pfn_to_gfn(xen_pfn), goffset, glen, data);
+
+   goffset = 0;
+   xen_pfn++;
+   len -= glen;
+   }
+}
+EXPORT_SYMBOL_GPL(gnttab_foreach_grant_in_range);
+
 int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
struct gnttab_map_grant_ref *kmap_ops,
struct page **pages, unsigned int count)
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 4478f4b..05b5b08 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -45,8 +45,10 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
+#include 
 
 #define GNTTAB_RESERVED_XENSTORE 1
 
@@ -224,4 +226,44 @@ static inline struct xen_page_foreign 
*xen_page_foreign(struct page *page)
 #endif
 }
 
+/* Split Linux page in chunk of the size of the grant and call fn
+ *
+ * Parameters of fn:
+ * gfn: guest frame number
+ * offset: offset in the grant
+ * len: length of the data in the grant.
+ * data: internal information
+ */
+typedef void (*xen_grant_fn_t)(unsigned long gfn, unsigned int offset,
+  unsigned int len, void *data);
+
+void gnttab_foreach_grant_in_range(struct page *page,
+  unsigned int offset,
+  unsigned int len,
+  xen_grant_fn_t fn,
+  void *data);
+
+/* Helper to get to call fn only on the first "grant chunk" */
+static inline void gnttab_for_one_grant(struct page *page, unsigned int offset,
+   unsigned len, xen_grant_fn_t fn,
+   void *data)
+{
+   /* The first request is 

[Xen-devel] [PATCH v4 08/20] block/xen-blkfront: split get_grant in 2

2015-09-07 Thread Julien Grall
Prepare the code to support 64KB page granularity. The first
implementation will use a full Linux page per indirect and persistent
grant. When non-persistent grant is used, each page of a bio request
may be split in multiple grant.

Furthermore, the field page of the grant structure is only used to copy
data from persistent grant or indirect grant. Avoid to set it for other
use case as it will have no meaning given the page will be split in
multiple grant.

Provide 2 functions, to setup indirect grant, the other for bio page.

Signed-off-by: Julien Grall 
Acked-by: Roger Pau Monné 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

Changes in v4:
- Add Roger's acked-by

Changes in v3:
- Fix errors reported by checkpatch.pl
- gnttab_page_grant_foreign_access_ref has been renamed to
gnttab_page_grant_foreign_access_ref_one
- Fix compilation by using get_indirect_grant rather than
get_grant (the changes was in a later patch...).
- Make grant_foreign_access static inline
- s/mfn/gfn/ based on the new naming

Changes in v2:
- Patch added
---
 drivers/block/xen-blkfront.c | 88 +---
 1 file changed, 59 insertions(+), 29 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 556475d..4232cbd 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -245,34 +245,77 @@ out_of_memory:
return -ENOMEM;
 }
 
-static struct grant *get_grant(grant_ref_t *gref_head,
-  struct page *page,
-  struct blkfront_info *info)
+static struct grant *get_free_grant(struct blkfront_info *info)
 {
struct grant *gnt_list_entry;
-   unsigned long buffer_gfn;
 
BUG_ON(list_empty(&info->grants));
gnt_list_entry = list_first_entry(&info->grants, struct grant,
- node);
+ node);
list_del(&gnt_list_entry->node);
 
-   if (gnt_list_entry->gref != GRANT_INVALID_REF) {
+   if (gnt_list_entry->gref != GRANT_INVALID_REF)
info->persistent_gnts_c--;
+
+   return gnt_list_entry;
+}
+
+static inline void grant_foreign_access(const struct grant *gnt_list_entry,
+   const struct blkfront_info *info)
+{
+   gnttab_page_grant_foreign_access_ref_one(gnt_list_entry->gref,
+info->xbdev->otherend_id,
+gnt_list_entry->page,
+0);
+}
+
+static struct grant *get_grant(grant_ref_t *gref_head,
+  unsigned long gfn,
+  struct blkfront_info *info)
+{
+   struct grant *gnt_list_entry = get_free_grant(info);
+
+   if (gnt_list_entry->gref != GRANT_INVALID_REF)
return gnt_list_entry;
+
+   /* Assign a gref to this page */
+   gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
+   BUG_ON(gnt_list_entry->gref == -ENOSPC);
+   if (info->feature_persistent)
+   grant_foreign_access(gnt_list_entry, info);
+   else {
+   /* Grant access to the GFN passed by the caller */
+   gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
+   info->xbdev->otherend_id,
+   gfn, 0);
}
 
+   return gnt_list_entry;
+}
+
+static struct grant *get_indirect_grant(grant_ref_t *gref_head,
+   struct blkfront_info *info)
+{
+   struct grant *gnt_list_entry = get_free_grant(info);
+
+   if (gnt_list_entry->gref != GRANT_INVALID_REF)
+   return gnt_list_entry;
+
/* Assign a gref to this page */
gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
BUG_ON(gnt_list_entry->gref == -ENOSPC);
if (!info->feature_persistent) {
-   BUG_ON(!page);
-   gnt_list_entry->page = page;
+   struct page *indirect_page;
+
+   /* Fetch a pre-allocated page to use for indirect grefs */
+   BUG_ON(list_empty(&info->indirect_pages));
+   indirect_page = list_first_entry(&info->indirect_pages,
+struct page, lru);
+   list_del(&indirect_page->lru);
+   gnt_list_entry->page = indirect_page;
}
-   buffer_gfn = xen_page_to_gfn(gnt_list_entry->page);
-   gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
-   info->xbdev->otherend_id,
-   buffer_gfn, 0);
+   grant_foreign_access(gnt_list_entry, info);
+
return gnt_list_entry;
 }
 
@@ -525,32 +568,19 @@ static 

[Xen-devel] [PATCH v4 09/20] xen/biomerge: Don't allow biovec's to be merged when Linux is not using 4KB pages

2015-09-07 Thread Julien Grall
On ARM all dma-capable devices on a same platform may not be protected
by an IOMMU. The DMA requests have to use the BFN (i.e MFN on ARM) in
order to use correctly the device.

While the DOM0 memory is allocated in a 1:1 fashion (PFN == MFN), grant
mapping will screw this contiguous mapping.

When Linux is using 64KB page granularitary, the page may be split
accross multiple non-contiguous MFN (Xen is using 4KB page
granularity). Therefore a DMA request will likely fail.

Checking that a 64KB page is using contiguous MFN is tedious. For
now, always says that biovec are not mergeable.

Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

There is some ideas to check whether two biovec could be merged
(see [1]) but it's not critical and can be consider as a performance
improvement.

Changes in v4:
- Fix typoes in the subject
- Add Stefano's reviewed-by

Changes in v3:
- Update commit message
- s/mfn/bfn/ base on the new renaming
- Update TODO

Changes in v2:
- Remove the workaround and check if the Linux page granularity
is the same as Xen or not

[1] https://lkml.org/lkml/2015/7/17/418
---
 drivers/xen/biomerge.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/xen/biomerge.c b/drivers/xen/biomerge.c
index 8ae2fc90..4da69db 100644
--- a/drivers/xen/biomerge.c
+++ b/drivers/xen/biomerge.c
@@ -6,10 +6,18 @@
 bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
   const struct bio_vec *vec2)
 {
+#if XEN_PAGE_SIZE == PAGE_SIZE
unsigned long bfn1 = pfn_to_bfn(page_to_pfn(vec1->bv_page));
unsigned long bfn2 = pfn_to_bfn(page_to_pfn(vec2->bv_page));
 
return __BIOVEC_PHYS_MERGEABLE(vec1, vec2) &&
((bfn1 == bfn2) || ((bfn1+1) == bfn2));
+#else
+   /*
+* XXX: Add support for merging bio_vec when using different page
+* size in Xen and Linux.
+*/
+   return 0;
+#endif
 }
 EXPORT_SYMBOL(xen_biovec_phys_mergeable);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 07/20] block/xen-blkfront: Store a page rather a pfn in the grant structure

2015-09-07 Thread Julien Grall
All the usage of the field pfn are done using the same idiom:

pfn_to_page(grant->pfn)

This will  return always the same page. Store directly the page in the
grant to clean up the code.

Signed-off-by: Julien Grall 
Acked-by: Roger Pau Monné 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

Roger, Stefano, I kept your Acked-by/Reviewed-by because the rebase was
minor. Let me know if you disagree.

Changes in v4:
- rebase after 7adf12b87f45a77d364464018fb8e9e1ac875152
"xen-blkfront: don't add indirect pages to list when
!feature_persistent"

Changes in v3:
- Use the correct indentation in get_grant. The current
indentation (i.e without this patch) was wrong because it was
using space rather than tabulation.
- Add Roger's acked and Stefano's reviewed
- s/mfn/gfn based on the new naming

Changes in v2:
- Patch added
---
 drivers/block/xen-blkfront.c | 39 +++
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index b11f084..556475d 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -68,7 +68,7 @@ enum blkif_state {
 
 struct grant {
grant_ref_t gref;
-   unsigned long pfn;
+   struct page *page;
struct list_head node;
 };
 
@@ -222,7 +222,7 @@ static int fill_grant_buffer(struct blkfront_info *info, 
int num)
kfree(gnt_list_entry);
goto out_of_memory;
}
-   gnt_list_entry->pfn = page_to_pfn(granted_page);
+   gnt_list_entry->page = granted_page;
}
 
gnt_list_entry->gref = GRANT_INVALID_REF;
@@ -237,7 +237,7 @@ out_of_memory:
 &info->grants, node) {
list_del(&gnt_list_entry->node);
if (info->feature_persistent)
-   __free_page(pfn_to_page(gnt_list_entry->pfn));
+   __free_page(gnt_list_entry->page);
kfree(gnt_list_entry);
i--;
}
@@ -246,8 +246,8 @@ out_of_memory:
 }
 
 static struct grant *get_grant(grant_ref_t *gref_head,
-   unsigned long pfn,
-   struct blkfront_info *info)
+  struct page *page,
+  struct blkfront_info *info)
 {
struct grant *gnt_list_entry;
unsigned long buffer_gfn;
@@ -266,10 +266,10 @@ static struct grant *get_grant(grant_ref_t *gref_head,
gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
BUG_ON(gnt_list_entry->gref == -ENOSPC);
if (!info->feature_persistent) {
-   BUG_ON(!pfn);
-   gnt_list_entry->pfn = pfn;
+   BUG_ON(!page);
+   gnt_list_entry->page = page;
}
-   buffer_gfn = pfn_to_gfn(gnt_list_entry->pfn);
+   buffer_gfn = xen_page_to_gfn(gnt_list_entry->page);
gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
info->xbdev->otherend_id,
buffer_gfn, 0);
@@ -525,7 +525,7 @@ static int blkif_queue_rw_req(struct request *req)
 
if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
(i % SEGS_PER_INDIRECT_FRAME == 0)) {
-   unsigned long uninitialized_var(pfn);
+   struct page *uninitialized_var(page);
 
if (segments)
kunmap_atomic(segments);
@@ -542,15 +542,15 @@ static int blkif_queue_rw_req(struct request *req)
indirect_page = 
list_first_entry(&info->indirect_pages,
 struct page, 
lru);
list_del(&indirect_page->lru);
-   pfn = page_to_pfn(indirect_page);
+   page = indirect_page;
}
-   gnt_list_entry = get_grant(&gref_head, pfn, info);
+   gnt_list_entry = get_grant(&gref_head, page, info);
info->shadow[id].indirect_grants[n] = gnt_list_entry;
-   segments = 
kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
+   segments = kmap_atomic(gnt_list_entry->page);
ring_req->u.indirect.indirect_grefs[n] = 
gnt_list_entry->gref;
}
 
-   gnt_list_entry = get_grant(&gref_head, 
page_to_pfn(sg_page(sg)), info);
+   gnt_list_entry = get_grant(&gref_head, sg_page(sg), info);
ref = gnt_list_entry->gref;
 
info->shadow[id].grants_used[i] = gnt_list_entry;
@@ -561,7 +5

[Xen-devel] [PATCH v4 02/20] arm/xen: Drop pte_mfn and mfn_pte

2015-09-07 Thread Julien Grall
They are not used in common code expect in one place in balloon.c which is
only compiled when Linux is using PV MMU. It's not the case on ARM.

Rather than worrying how to handle the 64KB case, drop them.

Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Cc: Russell King 

Changes in v4:
- Add Stefano's reviewed

Changes in v3:
- Patch added
---
 arch/arm/include/asm/xen/page.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
index 1279563..98c9fc3 100644
--- a/arch/arm/include/asm/xen/page.h
+++ b/arch/arm/include/asm/xen/page.h
@@ -13,9 +13,6 @@
 
 #define phys_to_machine_mapping_valid(pfn) (1)
 
-#define pte_mfnpte_pfn
-#define mfn_ptepfn_pte
-
 /* Xen machine address */
 typedef struct xmaddr {
phys_addr_t maddr;
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 03/20] xen: Add Xen specific page definition

2015-09-07 Thread Julien Grall
The Xen hypercall interface is always using 4K page granularity on ARM
and x86 architecture.

With the incoming support of 64K page granularity for ARM64 guest, it
won't be possible to re-use the Linux page definition in Xen drivers.

Introduce Xen page definition helpers based on the Linux page
definition. They have exactly the same name but prefixed with
XEN_/xen_ prefix.

Also modify xen_page_to_gfn to use new Xen page definition.

Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

Changes in v4:
- Typoes
- Rename xen_page_to_pfn to page_to_xen_pfn

Changes in v3:
- Fix errors reported by checkpatch.pl
- Rename pfn to xen_pfn in xen_pfn_to_page
- Add a comment that we assume PAGE_SIZE to be a multiple of
XEN_PAGE_SIZE
- s/MFN/GFN/ according to new naming
- Add Stefano's reviewed-by

Changes in v2:
- Add XEN_PFN_UP
- Add a comment describing the behavior of page_to_pfn
---
 include/xen/page.h | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/xen/page.h b/include/xen/page.h
index 1daae48..96294ac 100644
--- a/include/xen/page.h
+++ b/include/xen/page.h
@@ -1,11 +1,36 @@
 #ifndef _XEN_PAGE_H
 #define _XEN_PAGE_H
 
+#include 
+
+/* The hypercall interface supports only 4KB page */
+#define XEN_PAGE_SHIFT 12
+#define XEN_PAGE_SIZE  (_AC(1, UL) << XEN_PAGE_SHIFT)
+#define XEN_PAGE_MASK  (~(XEN_PAGE_SIZE-1))
+#define xen_offset_in_page(p)  ((unsigned long)(p) & ~XEN_PAGE_MASK)
+
+/*
+ * We assume that PAGE_SIZE is a multiple of XEN_PAGE_SIZE
+ * XXX: Add a BUILD_BUG_ON?
+ */
+
+#define xen_pfn_to_page(xen_pfn)   \
+   ((pfn_to_page(((unsigned long)(xen_pfn) << XEN_PAGE_SHIFT) >> 
PAGE_SHIFT)))
+#define page_to_xen_pfn(page)  \
+   (((page_to_pfn(page)) << PAGE_SHIFT) >> XEN_PAGE_SHIFT)
+
+#define XEN_PFN_PER_PAGE   (PAGE_SIZE / XEN_PAGE_SIZE)
+
+#define XEN_PFN_DOWN(x)((x) >> XEN_PAGE_SHIFT)
+#define XEN_PFN_UP(x)  (((x) + XEN_PAGE_SIZE-1) >> XEN_PAGE_SHIFT)
+#define XEN_PFN_PHYS(x)((phys_addr_t)(x) << XEN_PAGE_SHIFT)
+
 #include 
 
+/* Return the GFN associated to the first 4KB of the page */
 static inline unsigned long xen_page_to_gfn(struct page *page)
 {
-   return pfn_to_gfn(page_to_pfn(page));
+   return pfn_to_gfn(page_to_xen_pfn(page));
 }
 
 struct xen_memory_region {
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 06/20] block/xen-blkfront: Split blkif_queue_request in 2

2015-09-07 Thread Julien Grall
Currently, blkif_queue_request has 2 distinct execution path:
- Send a discard request
- Send a read/write request

The function is also allocating grants to use for generating the
request. Although, this is only used for read/write request.

Rather than having a function with 2 distinct execution path, separate
the function in 2. This will also remove one level of tabulation.

Signed-off-by: Julien Grall 
Reviewed-by: Roger Pau Monné 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

Roger, if you really want if can drop the else clause in
blkif_queue_request, IHMO it's more clear here. Although I've kept
your Reviewed-by. Let me know if it's not fine.

Changes in v3:
- Fix errors reported by checkpatch.pl
- Add Roger's Reviewed-by

Changes in v2:
- Patch added
---
 drivers/block/xen-blkfront.c | 277 ---
 1 file changed, 153 insertions(+), 124 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 432e105..b11f084 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -395,13 +395,35 @@ static int blkif_ioctl(struct block_device *bdev, fmode_t 
mode,
return 0;
 }
 
-/*
- * Generate a Xen blkfront IO request from a blk layer request.  Reads
- * and writes are handled as expected.
- *
- * @req: a request struct
- */
-static int blkif_queue_request(struct request *req)
+static int blkif_queue_discard_req(struct request *req)
+{
+   struct blkfront_info *info = req->rq_disk->private_data;
+   struct blkif_request *ring_req;
+   unsigned long id;
+
+   /* Fill out a communications ring structure. */
+   ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
+   id = get_id_from_freelist(info);
+   info->shadow[id].request = req;
+
+   ring_req->operation = BLKIF_OP_DISCARD;
+   ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
+   ring_req->u.discard.id = id;
+   ring_req->u.discard.sector_number = (blkif_sector_t)blk_rq_pos(req);
+   if ((req->cmd_flags & REQ_SECURE) && info->feature_secdiscard)
+   ring_req->u.discard.flag = BLKIF_DISCARD_SECURE;
+   else
+   ring_req->u.discard.flag = 0;
+
+   info->ring.req_prod_pvt++;
+
+   /* Keep a private copy so we can reissue requests when recovering. */
+   info->shadow[id].req = *ring_req;
+
+   return 0;
+}
+
+static int blkif_queue_rw_req(struct request *req)
 {
struct blkfront_info *info = req->rq_disk->private_data;
struct blkif_request *ring_req;
@@ -421,9 +443,6 @@ static int blkif_queue_request(struct request *req)
struct scatterlist *sg;
int nseg, max_grefs;
 
-   if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
-   return 1;
-
max_grefs = req->nr_phys_segments;
if (max_grefs > BLKIF_MAX_SEGMENTS_PER_REQUEST)
/*
@@ -453,139 +472,131 @@ static int blkif_queue_request(struct request *req)
id = get_id_from_freelist(info);
info->shadow[id].request = req;
 
-   if (unlikely(req->cmd_flags & (REQ_DISCARD | REQ_SECURE))) {
-   ring_req->operation = BLKIF_OP_DISCARD;
-   ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
-   ring_req->u.discard.id = id;
-   ring_req->u.discard.sector_number = 
(blkif_sector_t)blk_rq_pos(req);
-   if ((req->cmd_flags & REQ_SECURE) && info->feature_secdiscard)
-   ring_req->u.discard.flag = BLKIF_DISCARD_SECURE;
-   else
-   ring_req->u.discard.flag = 0;
+   BUG_ON(info->max_indirect_segments == 0 &&
+  req->nr_phys_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
+   BUG_ON(info->max_indirect_segments &&
+  req->nr_phys_segments > info->max_indirect_segments);
+   nseg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
+   ring_req->u.rw.id = id;
+   if (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
+   /*
+* The indirect operation can only be a BLKIF_OP_READ or
+* BLKIF_OP_WRITE
+*/
+   BUG_ON(req->cmd_flags & (REQ_FLUSH | REQ_FUA));
+   ring_req->operation = BLKIF_OP_INDIRECT;
+   ring_req->u.indirect.indirect_op = rq_data_dir(req) ?
+   BLKIF_OP_WRITE : BLKIF_OP_READ;
+   ring_req->u.indirect.sector_number = 
(blkif_sector_t)blk_rq_pos(req);
+   ring_req->u.indirect.handle = info->handle;
+   ring_req->u.indirect.nr_segments = nseg;
} else {
-   BUG_ON(info->max_indirect_segments == 0 &&
-  req->nr_phys_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
-   BUG_ON(info->max_indirect_segments &&
-  req->nr_phys_segments > info->max_indirect_segments);
-   nseg = blk_r

Re: [Xen-devel] [Patch V1 1/3] xen: introduce dummy system device

2015-09-07 Thread Stefano Stabellini
On Thu, 3 Sep 2015, Juergen Gross wrote:
> Introduce a new dummy system device serving as parent for virtual
> buses. This will enable new pv backends to introduce virtual buses
> which are removable again opposed to system buses which are meant
> to stay once added.
> 
> Signed-off-by: Juergen Gross 
> ---
>  hw/xenpv/xen_machine_pv.c| 39 +++
>  include/hw/xen/xen_backend.h |  1 +
>  2 files changed, 40 insertions(+)
> 
> diff --git a/hw/xenpv/xen_machine_pv.c b/hw/xenpv/xen_machine_pv.c
> index 2e545d2..57bc071 100644
> --- a/hw/xenpv/xen_machine_pv.c
> +++ b/hw/xenpv/xen_machine_pv.c
> @@ -24,10 +24,15 @@
>  
>  #include "hw/hw.h"
>  #include "hw/boards.h"
> +#include "hw/sysbus.h"
>  #include "hw/xen/xen_backend.h"
>  #include "xen_domainbuild.h"
>  #include "sysemu/block-backend.h"
>  
> +#define TYPE_XENSYSDEV "xensysdev"
> +
> +DeviceState *xen_sysdev;
> +
>  static void xen_init_pv(MachineState *machine)
>  {
>  const char *kernel_filename = machine->kernel_filename;
> @@ -59,6 +64,9 @@ static void xen_init_pv(MachineState *machine)
>  break;
>  }
>  
> +xen_sysdev = qdev_create(NULL, TYPE_XENSYSDEV);
> +qdev_init_nofail(xen_sysdev);
> +
>  xen_be_register("console", &xen_console_ops);
>  xen_be_register("vkbd", &xen_kbdmouse_ops);
>  xen_be_register("vfb", &xen_framebuffer_ops);
> @@ -93,6 +101,31 @@ static void xen_init_pv(MachineState *machine)
>  xen_init_display(xen_domid);
>  }
>  
> +static int xen_sysdev_init(SysBusDevice *dev)
> +{
> +return 0;
> +}
> +
> +static Property xen_sysdev_properties[] = {
> +{/* end of property list */},
> +};
> +
> +static void xen_sysdev_class_init(ObjectClass *klass, void *data)
> +{
> +DeviceClass *dc = DEVICE_CLASS(klass);
> +SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
> +
> +k->init = xen_sysdev_init;
> +dc->props = xen_sysdev_properties;
> +}
> +
> +static const TypeInfo xensysdev_info = {
> +.name  = TYPE_XENSYSDEV,
> +.parent= TYPE_SYS_BUS_DEVICE,
> +.instance_size = sizeof(SysBusDevice),
> +.class_init= xen_sysdev_class_init,
> +};
> +
>  static QEMUMachine xenpv_machine = {
>  .name = "xenpv",
>  .desc = "Xen Para-virtualized PC",
> @@ -101,9 +134,15 @@ static QEMUMachine xenpv_machine = {
>  .default_machine_opts = "accel=xen",
>  };
>  
> +static void xenpv_register_types(void)
> +{
> +type_register_static(&xensysdev_info);
> +}

Given that you need this just for usbback, I wonder if you could
move xen_sysdev and its initalization to usbback_init.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 24/31] xen/arm: ITS: Add GICR register emulation

2015-09-07 Thread Vijay Kilari
On Mon, Sep 7, 2015 at 7:50 PM, Julien Grall  wrote:
> Hi Vijay,
>
> On 31/08/15 12:06, vijay.kil...@gmail.com wrote:
>> +static int vgic_v3_gits_lpi_mmio_read(struct vcpu *v, mmio_info_t *info)
>
> [...]
>
>> +/* Neglect if not LPI. */
>> +if ( offset < FIRST_GIC_LPI )
>> +{
>> +*r = 0;
>> +return 1;
>> +}
>
> [...]
>
>> +static int vgic_v3_gits_lpi_mmio_write(struct vcpu *v, mmio_info_t *info)
>> +{
>
> [...]
>
>> +/* Neglect if not LPI. */
>> +if ( offset < FIRST_GIC_LPI )
>> +return 1;
>
> Based on the spec, those 2 checks are wrong and make impossible to use
> LPIs. Please test this patch series before sending it on the ML.

Why do you think so?.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netback grant copying issues

2015-09-07 Thread Jan Beulich
>>> On 07.09.15 at 17:10,  wrote:
> On Mon, Sep 07, 2015 at 09:03:12AM -0600, Jan Beulich wrote:
>> >>> On 07.09.15 at 16:47,  wrote:
>> > With that in mind, even MAX_SKB_FRAGS + 1 is not enough. It would be
>> > MAX_SKB_FRAGS + 64K / PAGE_SIZE, i.e. we count the most extreme
>> > situation that we have 63K data in SKB, and 1 byte in each frag.
>> 
>> No, every component (head + every frag) can contribute
>> exactly one copy operation to a RX ring slot (larger heads, just
>> like larger frags) would get copied into multiple RX ring slots.
>> 
> 
> I think you misunderstand copy_op slot vs RX ring slot.  The limitation
> rs for copy_op slots, not RX ring slots. Note that multiplication of
> XEN_NETIF_RX_RING_SIZE.
> 
> The skb header data len is not limited to one 4K page (4K because that's
> what xen grant table interface supports), netback will break skb header
> data into multiple copy_op slots.

But in that case it will not only use multiple copy_op slots, but also
copy into multiple RX slots. I.e. still - per RX slot there can only be
MAX_SKB_FRAGS+1 copy_op slots.

Two examples may help

head1 byte
frag1   1 byte
...
fragN   1 byte

will need N+1 copy slots and one RX slot.

head8k  2 copy slots, 2 RX slots
frag1   1 byte  3rd copy slot, 3rd RX slot
...
fragN   1 byte  N+2nd copy slot, 3rd RX slot

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] XEN_NETIF_NR_SLOTS_MIN

2015-09-07 Thread Jan Beulich
>>> On 07.09.15 at 16:56,  wrote:
> On Fri, Sep 04, 2015 at 06:50:45AM -0600, Jan Beulich wrote:
>> Wei,
>> 
>> coming back to commit 723a375f4e introducing this constant along
>> with some commentary: First of all - why 18 when in old Linux kernels
>> MAX_SKB_FRAGS is 18 and the head usually requires another slot?
> 
> That indeed didn't count the head slot, which is handled separately. It
> also doesn't include meta slots.
> 
> Maybe the comment needs to be updated to "slots that contain actual
> data"?

That would take care of the meta slots, but the head to me also is
"actual data".

>> And then - why mostly/only for the (guest) TX path? Don't we have
>> two compatibility issues:
>> 1) old netfront (max 18 frags) sending via new netback (max 17
>> frags)
> 
> That's handled by netback by consolidating frags.

Right - that's the TX path that the description in netif.h refers to.

>> 2) old netback (max 18 frags) handing received packets to new
>> frontend (max 17 frags)
> 
> That's fine because 18 > 17.

Exactly because of 18 > 17 this is not fine: How would netfront fit
18 slots handed to it by netback into 17 frags without special
adjustments?

>> I.e. shouldn't modern netback be capable to (guest) send at least
>> 19 slots despite there only being 17 frags, and shouldn't similarly
> 
> It does -- as said above, the head and extra slots are not counted
> against that limit.
> 
>> modern netfront be able to receive at least 19 slots? In the latter
>> case - if not, how should netback guess at what number of slots it
>> can safely forward? (Yes, I am aware of David's 1650d5455b
> 
> Note that slot != frag. I admit netif.h conflates the two concepts in a
> bad way. I think that's due to netback has evolved a lot overtime.
> 
> I'm pretty sure there is check in netback to not overrun the ring by
> using too many slots.
> 
> If you're talking about number of frags, it's not on protocol level.
> Linux netfront is able to consolidate several slots into reasonable
> number of frags and make sure the number of frags won't exceed limit.
> See xennet_fill_frags.

That's too late - xennet_get_responses() would have rejected such
a packet already afaict (error message "Too many slots").

>> relying on hypervisor side improvements available only in 4.6, i.e.
>> while in general considering this a reasonable approach, I'm not
>> sure this should be done unconditionally, and hence I'm trying to
>> figure reasonable criteria of when to do this on older hypervisors.)
> 
> I think with my above explanation it should be safe for you to not
> include 1650d5455b.

Without an equivalent of David's change I don't see how netfront
would cope with what _old_ netback may hand it.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netback grant copying issues

2015-09-07 Thread David Vrabel
On 04/09/15 10:27, Jan Beulich wrote:
> Ian, Wei,
> 
> I seem to be seeing two issues in the grant copy handling of netback,
> solely from code inspection:
> 
> 1) Shouldn't MAX_GRANT_COPY_OPS, to take care of the copying
> the header may require, be
> ((MAX_SKB_FRAGS + 1) * XEN_NETIF_RX_RING_SIZE)?

We really should have the copy batches be independent of ring size and
skb format.

e.g., Call xenvif_gnt_copy_add(ref, ptr, len) for each copy op needed.
This will issue the hypercall when the batch is full.  There would need
to be axenvif_gnt_copy_flush() at the end for last hypercall for the
remaining ops.

David


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6] efi: introduce efi_arch_flush_dcache_area

2015-09-07 Thread Stefano Stabellini
On Mon, 7 Sep 2015, Jan Beulich wrote:
> >>> On 07.09.15 at 16:13,  wrote:
> > Objects loaded by FileHandle->Read need to be flushed to dcache,
> > otherwise copy_from_paddr will read stale data when copying the kernel,
> > causing a failure to boot.
> 
> I have to admit that I'm a little puzzled by this description: If this
> was a general requirement (which is how it reads to me)

Yes, it is


> why does > ->Read() not do the necessary flushing? Or if it's not a
> general requirement, perhaps the description could be changed to make
> clear what exact dependency exists that does not exist for all users
> of ->Read()?

It is a general requirement: anything that could be accessed without a
cacheable mapping needs to be flushed. Unfortunately the UEFI spec is
not clear enough on who should do the flushing on what, and in fact the
Forum is working on improving it


> > --- a/xen/arch/arm/efi/efi-boot.h
> > +++ b/xen/arch/arm/efi/efi-boot.h
> > @@ -9,6 +9,7 @@
> >  #include 
> >  
> >  void noreturn efi_xen_start(void *fdt_ptr, uint32_t fdt_size);
> > +void __flush_dcache_area(const void * vaddr, unsigned long size);
> 
> While I see Ian ack-ed this, I find it wrong to replicate a declaration
> here that now can get out of sync with the canonical one at any
> point in time, without anyone noticing at build time. Or wait - this
> looks to be a boot time only interface that so far didn't even have a
> C declaration.

That's right


> That's fine then except for the minor coding style issue
> (stray blank between "*" and "vaddr").

Thanks for spotting that


> > --- a/xen/common/efi/boot.c
> > +++ b/xen/common/efi/boot.c
> > @@ -526,7 +526,8 @@ static bool_t __init read_file(EFI_FILE_HANDLE 
> > dir_handle, CHAR16 *name,
> >  PrintErr(what);
> >  PrintErr(L" failed for ");
> >  PrintErrMesg(name, ret);
> > -}
> > +} else
> > +efi_arch_flush_dcache_area(file->ptr, file->size);
> 
> No need for the (misplaced) "else" - PrintErrMesg() does not return.

Ah! Thanks!

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netback grant copying issues

2015-09-07 Thread Wei Liu
On Mon, Sep 07, 2015 at 09:03:12AM -0600, Jan Beulich wrote:
> >>> On 07.09.15 at 16:47,  wrote:
> > On Fri, Sep 04, 2015 at 03:27:42AM -0600, Jan Beulich wrote:
> >> Ian, Wei,
> >> 
> >> I seem to be seeing two issues in the grant copy handling of netback,
> >> solely from code inspection:
> >> 
> >> 1) Shouldn't MAX_GRANT_COPY_OPS, to take care of the copying
> >> the header may require, be
> >> ((MAX_SKB_FRAGS + 1) * XEN_NETIF_RX_RING_SIZE)?
> >> 
> > 
> > Not sure what you mean by "header". I take it you mean SKB itself? We do
> > need to use grant copy data inside SKB (not in frag list) to frontend.
> 
> Yes - with "header" is mean everything up to skb_headlen(skb).
> 
> > It sounds plausible that there is some sort of miscounting, just that no
> > SKB is seen to be so broken to trigger that.
> 
> Why "broken"?
> 

Broken because I think that's going to introduce some performance
regression. But I see why "broken" is a bad description. Let's say, a
bit "extreme".

> > With that in mind, even MAX_SKB_FRAGS + 1 is not enough. It would be
> > MAX_SKB_FRAGS + 64K / PAGE_SIZE, i.e. we count the most extreme
> > situation that we have 63K data in SKB, and 1 byte in each frag.
> 
> No, every component (head + every frag) can contribute
> exactly one copy operation to a RX ring slot (larger heads, just
> like larger frags) would get copied into multiple RX ring slots.
> 

I think you misunderstand copy_op slot vs RX ring slot.  The limitation
rs for copy_op slots, not RX ring slots. Note that multiplication of
XEN_NETIF_RX_RING_SIZE.

The skb header data len is not limited to one 4K page (4K because that's
what xen grant table interface supports), netback will break skb header
data into multiple copy_op slots.

Wei.

> Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen-unstable: Keyboard doesn't seem to work when using VNC to HVM with qemu-xen device model.

2015-09-07 Thread Sander Eikelenboom

Monday, September 7, 2015, 3:21:45 PM, you wrote:

> On Sun, Sep 06, 2015 at 12:46:01PM +0200, Sander Eikelenboom wrote:
>> Hi All,
>> 
>> Today i noticed that keyboard doesn't work when using VNC on a HVM guest
>> which runs under qemu-xen device-model.
>> Mouse still works with usbdevice='tablet', but keypress show no response
>> what so ever.
>> 
>> Unfortunately i'm on a tight time budget the next few weeks, so i just
>> switched to device_model_version='qemu-xen-traditional (with the same
>> config) and there all is fine.
>> 
>> This is on a xen-unstable of about 2 weeks ago, since i don't have time to
>> reply, and xen is already in the RC's hopefully someone else can re-test and
>> have a look.
>> 

> I just tried staging branch, keyboard worked. My guest is a my usual
> test guest which has Debian install. Note that I don't run desktop
> environment in that guest.

> I fail to see what changes in recent weeks could affect QEMU VNC
> keyboard.

> Wei.

>> Thanks,
>> 
>> Sander
>> 
>> ___
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel

Ok perhaps it's something special on my install then guest was linux console 
(clonezilla). Unfortunatly no time to retest any time soon, but since it works 
for you it will probably work for most installs :) thanks for testing.

--
Sander


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netback grant copying issues

2015-09-07 Thread Jan Beulich
>>> On 07.09.15 at 16:47,  wrote:
> On Fri, Sep 04, 2015 at 03:27:42AM -0600, Jan Beulich wrote:
>> Ian, Wei,
>> 
>> I seem to be seeing two issues in the grant copy handling of netback,
>> solely from code inspection:
>> 
>> 1) Shouldn't MAX_GRANT_COPY_OPS, to take care of the copying
>> the header may require, be
>> ((MAX_SKB_FRAGS + 1) * XEN_NETIF_RX_RING_SIZE)?
>> 
> 
> Not sure what you mean by "header". I take it you mean SKB itself? We do
> need to use grant copy data inside SKB (not in frag list) to frontend.

Yes - with "header" is mean everything up to skb_headlen(skb).

> It sounds plausible that there is some sort of miscounting, just that no
> SKB is seen to be so broken to trigger that.

Why "broken"?

> With that in mind, even MAX_SKB_FRAGS + 1 is not enough. It would be
> MAX_SKB_FRAGS + 64K / PAGE_SIZE, i.e. we count the most extreme
> situation that we have 63K data in SKB, and 1 byte in each frag.

No, every component (head + every frag) can contribute
exactly one copy operation to a RX ring slot (larger heads, just
like larger frags) would get copied into multiple RX ring slots.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] XEN_NETIF_NR_SLOTS_MIN

2015-09-07 Thread Wei Liu
On Fri, Sep 04, 2015 at 06:50:45AM -0600, Jan Beulich wrote:
> Wei,
> 
> coming back to commit 723a375f4e introducing this constant along
> with some commentary: First of all - why 18 when in old Linux kernels
> MAX_SKB_FRAGS is 18 and the head usually requires another slot?

That indeed didn't count the head slot, which is handled separately. It
also doesn't include meta slots.

Maybe the comment needs to be updated to "slots that contain actual
data"?

> And then - why mostly/only for the (guest) TX path? Don't we have
> two compatibility issues:
> 1) old netfront (max 18 frags) sending via new netback (max 17
> frags)

That's handled by netback by consolidating frags.

> 2) old netback (max 18 frags) handing received packets to new
> frontend (max 17 frags)

That's fine because 18 > 17.

> I.e. shouldn't modern netback be capable to (guest) send at least
> 19 slots despite there only being 17 frags, and shouldn't similarly

It does -- as said above, the head and extra slots are not counted
against that limit.

> modern netfront be able to receive at least 19 slots? In the latter
> case - if not, how should netback guess at what number of slots it
> can safely forward? (Yes, I am aware of David's 1650d5455b

Note that slot != frag. I admit netif.h conflates the two concepts in a
bad way. I think that's due to netback has evolved a lot overtime.

I'm pretty sure there is check in netback to not overrun the ring by
using too many slots.

If you're talking about number of frags, it's not on protocol level.
Linux netfront is able to consolidate several slots into reasonable
number of frags and make sure the number of frags won't exceed limit.
See xennet_fill_frags.

> relying on hypervisor side improvements available only in 4.6, i.e.
> while in general considering this a reasonable approach, I'm not
> sure this should be done unconditionally, and hence I'm trying to
> figure reasonable criteria of when to do this on older hypervisors.)
> 

I think with my above explanation it should be safe for you to not
include 1650d5455b.

Wei.

> Thanks, Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6] efi: introduce efi_arch_flush_dcache_area

2015-09-07 Thread Jan Beulich
>>> On 07.09.15 at 16:13,  wrote:
> Objects loaded by FileHandle->Read need to be flushed to dcache,
> otherwise copy_from_paddr will read stale data when copying the kernel,
> causing a failure to boot.

I have to admit that I'm a little puzzled by this description: If this
was a general requirement (which is how it reads to me), why does
->Read() not do the necessary flushing? Or if it's not a general
requirement, perhaps the description could be changed to make
clear what exact dependency exists that does not exist for all
users of ->Read()?

> --- a/xen/arch/arm/efi/efi-boot.h
> +++ b/xen/arch/arm/efi/efi-boot.h
> @@ -9,6 +9,7 @@
>  #include 
>  
>  void noreturn efi_xen_start(void *fdt_ptr, uint32_t fdt_size);
> +void __flush_dcache_area(const void * vaddr, unsigned long size);

While I see Ian ack-ed this, I find it wrong to replicate a declaration
here that now can get out of sync with the canonical one at any
point in time, without anyone noticing at build time. Or wait - this
looks to be a boot time only interface that so far didn't even have a
C declaration. That's fine then except for the minor coding style issue
(stray blank between "*" and "vaddr").

> --- a/xen/common/efi/boot.c
> +++ b/xen/common/efi/boot.c
> @@ -526,7 +526,8 @@ static bool_t __init read_file(EFI_FILE_HANDLE 
> dir_handle, CHAR16 *name,
>  PrintErr(what);
>  PrintErr(L" failed for ");
>  PrintErrMesg(name, ret);
> -}
> +} else
> +efi_arch_flush_dcache_area(file->ptr, file->size);

No need for the (misplaced) "else" - PrintErrMesg() does not return.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [raisin][PATCH] Gracefully handle an unsupported distro.

2015-09-07 Thread Stefano Stabellini
On Tue, 18 Aug 2015, Doug Goldstein wrote:
> When your distro is not supported, handle the case gracefully and let
> the user know instead of spitting out bash errors.
> 
> Signed-off-by: Doug Goldstein 

Thanks for the patch!

Reviewed-by: Stefano Stabellini 


>  lib/common-functions.sh | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/lib/common-functions.sh b/lib/common-functions.sh
> index efde84c..c52174a 100644
> --- a/lib/common-functions.sh
> +++ b/lib/common-functions.sh
> @@ -176,6 +176,7 @@ function get_distro() {
>  ;;
>  *)
>  DISTRO=$os_VENDOR
> +PKGTYPE="unknown"
>  ;;
>  esac
>  
> @@ -220,6 +221,16 @@ function _install-package-rpm() {
>  $SUDO yum install -y $* > /dev/null
>  }
>  
> +function _check-package-unknown() {
> +error_echo "I don't know distro $DISTRO. It might be missing packages."
> +return 1
> +}
> +
> +function _install-package-unknown() {
> +error_echo "I don't know distro $DISTRO. Cannot install packages."
> +return 1
> +}
> +
>  # Modifies inherited variable "missing"
>  function check-package() {
>  for p in $*

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netback grant copying issues

2015-09-07 Thread Wei Liu
On Fri, Sep 04, 2015 at 03:27:42AM -0600, Jan Beulich wrote:
> Ian, Wei,
> 
> I seem to be seeing two issues in the grant copy handling of netback,
> solely from code inspection:
> 
> 1) Shouldn't MAX_GRANT_COPY_OPS, to take care of the copying
> the header may require, be
> ((MAX_SKB_FRAGS + 1) * XEN_NETIF_RX_RING_SIZE)?
> 

Not sure what you mean by "header". I take it you mean SKB itself? We do
need to use grant copy data inside SKB (not in frag list) to frontend.

It sounds plausible that there is some sort of miscounting, just that no
SKB is seen to be so broken to trigger that.

With that in mind, even MAX_SKB_FRAGS + 1 is not enough. It would be
MAX_SKB_FRAGS + 64K / PAGE_SIZE, i.e. we count the most extreme
situation that we have 63K data in SKB, and 1 byte in each frag.

> 2) xenvif_gop_frag_copy() may set up any number of copy_gop-s
> per meta slot, yet xenvif_check_gop() checks only one.
> 

That's indeed something that need to be fixed.

> Apart from that I find it puzzling that the comment ahead of
> xenvif_gop_frag_copy() talks about flipping interfaces, while iirc
> all flipping code has long been deleted from the upstream driver.
> 

That should be removed IMO.

Wei.

> Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-07 Thread Jan Beulich
>>> On 07.09.15 at 16:24,  wrote:
> I believe this also has something to do with a windows guest boot hang 
> issue.
> 
> It randomly occured, when boot a guest has windows 2008 os and pv-driver 
> installed.
> The boot process hangs when wait xenstored replay event signal.
> 
> It can be reproduced after hundreds reboot using the xen staging branch.
> But after I changed this code the hang issue can not reproduce.

The change below (which I don't think was ever posted to xen-devel)
does not make any sense, as it prohibits timely delivery of guest
interrupts. If there is an issue, I think you'd need to start with clearly
describing the issue you see and what you think it is being caused by.

Jan

> On 2015/9/7 22:46, - wrote:
>> From: liuqiming 00178499 
>>
>> This set operation doesn't make any sense, and it will block later
>> interrupt injected into vm (by vcpu_kick or deliver_posted_intr),
>> which will cause a performance issue on CPU supporting posted-interrupt.
>>
>> Signed-off-by: Qiming Liu 
>> Cc: Yang Zhang 
>> ---
>>   xen/arch/x86/hvm/vmx/vmx.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>> index 2582cdd..9480b44 100644
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -1681,7 +1681,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu 
> *v)
>>   {
>>   unsigned int cpu = v->processor;
>>   
>> -if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>> +if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>>&& (cpu != smp_processor_id()) )
>>   send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>>   }
> 
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org 
> http://lists.xen.org/xen-devel 




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 1/2] xen: move perform_gunzip to common

2015-09-07 Thread Stefano Stabellini
The current gunzip code to decompress the Dom0 kernel is implemented in
inflate.c which is included by bzimage.c.

I am looking to doing the same on ARM64 but there is quite a bit of
boilerplate definitions that I would need to import in order for
inflate.c to work correctly.

Instead of copying/pasting the code from x86/bzimage.c, move those
definitions to a new common file, gunzip.c. Export only perform_gunzip
and gzip_check. Leave output_length where it is.

Signed-off-by: Stefano Stabellini 
Acked-by: Jan Beulich 
CC: andrew.coop...@citrix.com

---
Changes in v4:
- move gunzip.init.o to its alphabetically correct place

Changes in v3:
- build gunzip.c as gunzip.init.o
- remove #include 
- remove __init from declarations

Changes in v2:
- the patch has been reworked from scratch
---
 xen/arch/x86/bzimage.c   |  134 +
 xen/common/Makefile  |1 +
 xen/common/gunzip.c  |  137 ++
 xen/include/xen/gunzip.h |7 +++
 4 files changed, 146 insertions(+), 133 deletions(-)
 create mode 100644 xen/common/gunzip.c
 create mode 100644 xen/include/xen/gunzip.h

diff --git a/xen/arch/x86/bzimage.c b/xen/arch/x86/bzimage.c
index c86c39e..50ebb84 100644
--- a/xen/arch/x86/bzimage.c
+++ b/xen/arch/x86/bzimage.c
@@ -4,148 +4,16 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 
-#define HEAPORDER 3
-
-static unsigned char *__initdata window;
-#define memptr long
-static memptr __initdata free_mem_ptr;
-static memptr __initdata free_mem_end_ptr;
-
-#define WSIZE   0x8000
-
-static unsigned char *__initdata inbuf;
-static unsigned __initdata insize;
-
-/* Index of next byte to be processed in inbuf: */
-static unsigned __initdata inptr;
-
-/* Bytes in output buffer: */
-static unsigned __initdata outcnt;
-
-#define OF(args)args
-#define STATIC  static
-
-#define memzero(s, n)   memset((s), 0, (n))
-
-typedef unsigned char   uch;
-typedef unsigned short  ush;
-typedef unsigned long   ulg;
-
-#define INIT__init
-#define INITDATA__initdata
-
-#define get_byte()  (inptr < insize ? inbuf[inptr++] : fill_inbuf())
-
-/* Diagnostic functions */
-#ifdef DEBUG
-#  define Assert(cond, msg) do { if (!(cond)) error(msg); } while (0)
-#  define Trace(x)  do { fprintf x; } while (0)
-#  define Tracev(x) do { if (verbose) fprintf x ; } while (0)
-#  define Tracevv(x)do { if (verbose > 1) fprintf x ; } while (0)
-#  define Tracec(c, x)  do { if (verbose && (c)) fprintf x ; } while (0)
-#  define Tracecv(c, x) do { if (verbose > 1 && (c)) fprintf x ; } while (0)
-#else
-#  define Assert(cond, msg)
-#  define Trace(x)
-#  define Tracev(x)
-#  define Tracevv(x)
-#  define Tracec(c, x)
-#  define Tracecv(c, x)
-#endif
-
-static long __initdata bytes_out;
-static void flush_window(void);
-
-static __init void error(char *x)
-{
-panic("%s", x);
-}
-
-static __init int fill_inbuf(void)
-{
-error("ran out of input data");
-return 0;
-}
-
-
-#include "../../common/inflate.c"
-
-static __init void flush_window(void)
-{
-/*
- * The window is equal to the output buffer therefore only need to
- * compute the crc.
- */
-unsigned long c = crc;
-unsigned n;
-unsigned char *in, ch;
-
-in = window;
-for ( n = 0; n < outcnt; n++ )
-{
-ch = *in++;
-c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
-}
-crc = c;
-
-bytes_out += (unsigned long)outcnt;
-outcnt = 0;
-}
-
 static __init unsigned long output_length(char *image, unsigned long image_len)
 {
 return *(uint32_t *)&image[image_len - 4];
 }
 
-static __init int gzip_check(char *image, unsigned long image_len)
-{
-unsigned char magic0, magic1;
-
-if ( image_len < 2 )
-return 0;
-
-magic0 = (unsigned char)image[0];
-magic1 = (unsigned char)image[1];
-
-return (magic0 == 0x1f) && ((magic1 == 0x8b) || (magic1 == 0x9e));
-}
-
-static __init int perform_gunzip(char *output, char *image, unsigned long 
image_len)
-{
-int rc;
-
-if ( !gzip_check(image, image_len) )
-return 1;
-
-window = (unsigned char *)output;
-
-free_mem_ptr = (unsigned long)alloc_xenheap_pages(HEAPORDER, 0);
-free_mem_end_ptr = free_mem_ptr + (PAGE_SIZE << HEAPORDER);
-
-inbuf = (unsigned char *)image;
-insize = image_len;
-inptr = 0;
-
-makecrc();
-
-if ( gunzip() < 0 )
-{
-rc = -EINVAL;
-}
-else
-{
-rc = 0;
-}
-
-free_xenheap_pages((void *)free_mem_ptr, HEAPORDER);
-
-return rc;
-}
-
 struct __packed setup_header {
 uint8_t _pad0[0x1f1];   /* skip uninteresting stuff */
 uint8_t setup_sects;
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 3fdf931..e681aaa 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -10,6 +10,7 @@ obj-y += event_channel.o
 obj-y += event_fifo.o
 obj-y +

[Xen-devel] [PATCH v5 2/2] xen/arm: support gzip compressed kernels

2015-09-07 Thread Stefano Stabellini
Free the memory used for the compressed kernel and update the relative
mod->start and mod->size parameters with the uncompressed ones.

Signed-off-by: Stefano Stabellini 
Reviewed-by: Julien Grall 
CC: ian.campb...@citrix.com

---

Changes in v5:
- code style

Changes in v4:
- return uint32_t from output_length
- __init kernel_decompress
- code style
- add comment
- if kernel_decompress fails with error, return

Changes in v3:
- better error checks in kernel_decompress
- free unneeded pages between output_size and kernel_order_out
- alloc pages from domheap

Changes in v2:
- use gzip_check
- avoid useless casts
- free original kernel image and update the mod with the new address and
size
- remove changes to common Makefile
- remove CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
---
 xen/arch/arm/kernel.c   |   75 +++
 xen/arch/arm/setup.c|2 +-
 xen/include/asm-arm/setup.h |2 ++
 3 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index f641b12..baa5717 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -13,6 +13,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "kernel.h"
 
@@ -257,6 +259,63 @@ static int kernel_uimage_probe(struct kernel_info *info,
 return 0;
 }
 
+static __init uint32_t output_length(char *image, unsigned long image_len)
+{
+return *(uint32_t *)&image[image_len - 4];
+}
+
+static __init int kernel_decompress(struct kernel_info *info,
+ paddr_t *addr, paddr_t *size)
+{
+char *output, *input, *end;
+char magic[2];
+int rc;
+unsigned kernel_order_out;
+paddr_t output_size;
+struct page_info *pages;
+
+if ( *size < 2 )
+return -EINVAL;
+
+copy_from_paddr(magic, *addr, sizeof(magic));
+
+/* only gzip is supported */
+if ( !gzip_check(magic, *size) )
+return -EINVAL;
+
+input = ioremap_cache(*addr, *size);
+if ( input == NULL )
+return -EFAULT;
+
+output_size = output_length(input, *size);
+kernel_order_out = get_order_from_bytes(output_size);
+pages = alloc_domheap_pages(NULL, kernel_order_out, 0);
+if ( pages == NULL )
+{
+iounmap(input);
+return -ENOMEM;
+}
+output = page_to_virt(pages);
+
+rc = perform_gunzip(output, input, *size);
+clean_dcache_va_range(output, output_size);
+iounmap(input);
+
+*addr = virt_to_maddr(output);
+*size = output_size;
+
+end = output + (1 << (kernel_order_out + PAGE_SHIFT));
+/* 
+ * Need to free pages after output_size here because they won't be
+ * freed by discard_initial_modules
+ */
+output += (output_size + PAGE_SIZE - 1) & PAGE_MASK;
+for ( ; output < end; output += PAGE_SIZE )
+free_domheap_page(virt_to_page(output));
+
+return 0;
+}
+
 #ifdef CONFIG_ARM_64
 /*
  * Check if the image is a 64-bit Image.
@@ -463,6 +522,22 @@ int kernel_probe(struct kernel_info *info)
 printk("Loading ramdisk from boot module @ %"PRIpaddr"\n",
info->initrd_bootmodule->start);
 
+/* if it is a gzip'ed image, 32bit or 64bit, uncompress it */
+rc = kernel_decompress(info, &start, &size);
+if (rc < 0 && rc != -EINVAL)
+return rc;
+else if (!rc)
+{
+/* 
+ * Free the original kernel, update the pointers to the
+ * decompressed kernel
+ */
+dt_unreserved_regions(mod->start, mod->start + mod->size,
+  init_domheap_pages, 0);
+mod->start = start;
+mod->size = size;
+}
+
 #ifdef CONFIG_ARM_64
 rc = kernel_zimage64_probe(info, start, size);
 if (rc < 0)
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 6626eba..109c71c 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -165,7 +165,7 @@ static void __init processor_id(void)
 processor_setup();
 }
 
-static void dt_unreserved_regions(paddr_t s, paddr_t e,
+void dt_unreserved_regions(paddr_t s, paddr_t e,
   void (*cb)(paddr_t, paddr_t), int first)
 {
 int i, nr = fdt_num_mem_rsv(device_tree_flattened);
diff --git a/xen/include/asm-arm/setup.h b/xen/include/asm-arm/setup.h
index 81bb3da..30ac53b 100644
--- a/xen/include/asm-arm/setup.h
+++ b/xen/include/asm-arm/setup.h
@@ -54,6 +54,8 @@ void copy_from_paddr(void *dst, paddr_t paddr, unsigned long 
len);
 int construct_dom0(struct domain *d);
 
 void discard_initial_modules(void);
+void dt_unreserved_regions(paddr_t s, paddr_t e,
+   void (*cb)(paddr_t, paddr_t), int first);
 
 size_t __init boot_fdt_info(const void *fdt, paddr_t paddr);
 const char __init *boot_fdt_cmdline(const void *fdt);
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-07 Thread Liuqiming (John)
I believe this also has something to do with a windows guest boot hang 
issue.


It randomly occured, when boot a guest has windows 2008 os and pv-driver 
installed.

The boot process hangs when wait xenstored replay event signal.

It can be reproduced after hundreds reboot using the xen staging branch.
But after I changed this code the hang issue can not reproduce.

Any Ideas?

On 2015/9/7 22:46, - wrote:

From: liuqiming 00178499 

This set operation doesn't make any sense, and it will block later
interrupt injected into vm (by vcpu_kick or deliver_posted_intr),
which will cause a performance issue on CPU supporting posted-interrupt.

Signed-off-by: Qiming Liu 
Cc: Yang Zhang 
---
  xen/arch/x86/hvm/vmx/vmx.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 2582cdd..9480b44 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1681,7 +1681,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
  {
  unsigned int cpu = v->processor;
  
-if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))

+if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
   && (cpu != smp_processor_id()) )
  send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
  }




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


  1   2   >