Re: [Xen-devel] [PATCH V4] Decouple SandyBridge quirk from VTd timeout

2014-12-04 Thread Don Dugger
On Fri, Dec 05, 2014 at 05:30:52AM +, Tian, Kevin wrote:
> > From: Donald D. Dugger
> > Sent: Friday, November 21, 2014 1:12 PM
> > 
> > Currently the quirk code for SandyBridge uses the VTd timeout value when
> > writing to an IGD register.  This is the wrong timeout to use and, at
> > 1000 msec., is also much too large.  This patch changes the quirk code
> > to use a timeout that is specific to the IGD device and allows the user
> > control of the timeout.
> > 
> > Boolean settings for the boot parameter `snb_igd_quirk' keep their current
> > meaning, enabling or disabling the quirk code with a timeout of 1000 msec.
> > 
> > In addition specifying `snb_igd_quirk=default' will enable the code and
> > set the timeout to the theoretical maximum of 670 msec.  For finer control,
> > specifying `snb_igd_quirk=n', where `n' is a decimal number, will enable
> > the code and set the timeout to `n' msec.
> 
> I'm OK with the patch except one comment. User usually thinks bool option
> means default, but here an explicit 'default' actually means a different 
> value.
> It's a bit confused to me. How about changing 'default' to another more
> meaningful word, e.g. 'max'? and you may still keep a 'snb_igd_quirk=
> legacy' option, so all the possibilities are included in the assigned case, 
> with
> bool option alias to 'legacy'.

I don't see a need for a `legacy' option, merely enabling the option (which is
what current users are doing) will select the current 1 sec. timeout.  If you
don't like `default' (and I don't like `max' since that would select a value
less that the current default) how about the string `cap' (the cap on the
theoretical timeout) to select the 670 msec value?

> 
> Thanks
> Kevin

...

-- 
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
n0...@n0ano.com
Ph: 303/443-3786

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] tools/hotplug: update systemd dependency to use service instead of socket

2014-12-04 Thread Olaf Hering
On Thu, Dec 04, Konrad Rzeszutek Wilk wrote:

> On Thu, Dec 04, 2014 at 08:47:56AM +0100, Olaf Hering wrote:
> > Is that something the sysadmin has to adjust, or should the xen source
> > provide proper values?
> It would be rather cumbersome if the sysadmin had to adjust it. The goal
> here would be that distros could use it and package it neatly so that it
> works out of the box.
> 
> What are the proper values in SuSE?

I have no idea, we dont run with selinux. At least not per default.
So what is supposed to be there, why does it happen to work for me?

And if there are changes required to the config file, they should be
passed in via configure instead of doing a patch.

Olaf

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 10/17] hvmloader/mem_hole_alloc: skip any overlap with reserved device memory

2014-12-04 Thread Jan Beulich
>>> On 05.12.14 at 07:24,  wrote:
>>  From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Friday, December 05, 2014 12:29 AM
>> 
>> >>> On 01.12.14 at 10:24,  wrote:
>> > In some cases like igd_opregion_pgbase, guest will use mem_hole_alloc
>> > to allocate some memory to use in runtime cycle, so we alsoe need to
>> > make sure all reserved device memory don't overlap such a region.
>> 
>> While ideally this would get switched to the model outlined for
>> the previous two patches too, it's at least reasonable to keep
>> this simple allocator simple for the time being.
>> 
>> > --- a/tools/firmware/hvmloader/util.c
>> > +++ b/tools/firmware/hvmloader/util.c
>> > @@ -416,9 +416,29 @@ static uint32_t alloc_down =
>> RESERVED_MEMORY_DYNAMIC_END;
>> >
>> >  xen_pfn_t mem_hole_alloc(uint32_t nr_mfns)
>> >  {
>> > +unsigned int i, num = hvm_get_reserved_device_memory_map();
>> > +uint64_t rdm_start, rdm_end;
>> > +uint32_t alloc_start, alloc_end;
>> > +
>> >  alloc_down -= nr_mfns << PAGE_SHIFT;
>> > +alloc_start = alloc_down;
>> > +alloc_end = alloc_start + (nr_mfns << PAGE_SHIFT);
>> > +for ( i = 0; i < num; i++ )
>> > +{
>> > +rdm_start = (uint64_t)rdm_map[i].start_pfn << PAGE_SHIFT;
>> > +rdm_end = rdm_start + ((uint64_t)rdm_map[i].nr_pages <<
>> PAGE_SHIFT);
>> > +if ( check_rdm_hole_conflict((uint64_t)alloc_start,
>> > + (uint64_t)alloc_end,
>> 
>> Pointless casts.
>> 
>> > + rdm_start, rdm_end -
>> rdm_start) )
>> > +{
>> > +alloc_end = rdm_start;
>> > +alloc_start = alloc_end - (nr_mfns << PAGE_SHIFT);
>> > +BUG_ON(alloc_up >= alloc_start);
>> 
>> This is redundant with the BUG_ON() below afaict. Or at least it
>> would be, if you would properly update allow_down (if you don't
>> you may end up returning the same PFN for two allocations).
>> 
> 
> I'd like this being done once at init time. Once alloc_up/down is
> verified/adjusted, no need to add run-time overhead here.

I don't think that would work, as you can't predict where the holes
are, and limiting allocations to e.g. just the largest region between
any two holes may end up being too restrictive (without having
checked how much memory may get allocated this way in the
worst case). Of course there's also the problem to address that
the whole region used so far overlaps with an enforced hole.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 09/17] hvmloader/ram: check if guest memory is out of reserved device memory maps

2014-12-04 Thread Jan Beulich
>>> On 05.12.14 at 07:23,  wrote:
>>  From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Friday, December 05, 2014 12:20 AM
>> 
>> >>> On 01.12.14 at 10:24,  wrote:
>> > We need to check to reserve all reserved device memory maps in e820
>> > to avoid any potential guest memory conflict.
>> >
>> > Currently, if we can't insert RDM entries directly, we may need to handle
>> > several ranges as follows:
>> > a. Fixed Ranges --> BUG()
>> >  lowmem_reserved_base-0xA: reserved by BIOS implementation,
>> >  BIOS region,
>> >  RESERVED_MEMBASE ~ 0x1,
>> > b. RAM or RAM:Hole -> Try to reserve
>> 
>> I continue to be unconvinced of the overall approach: The domain
>> builder continues to populate these regions when it shouldn't. Yet
>> once it doesn't, it would be most natural to simply communicate the
> 
> doesn't -> does?

No. The domain builder currently populates these regions (at least
I didn't spot a change to make it not do so).

>> RAM regions to hvmloader, and hvmloader would use just that to
>> build the E820 table (and subsequently assign BARs).
>> 
> 
> My impression is that you didn't like extending hvm_info to carry
> sparse RAM regions. that's why the current tradeoff is taken, i.e.
> leaving domain builder unchanged for RAM, then preventing EPT 
> setup for reserved regions in hypervisor (means wasting memory), 
> and then having hvmloader to actually figure out the final e820. 
> and that's also why per-BDF design is introduced to minimize wasted 
> memory. We discussed to change domain builder to avoid populating 
> reserved regions as the next step after 4.5, but w/o extending 
> hvm_info we always need the logic in hvmloader to construct e820 
> from scratch.

Communicating this via hvm_info is not the only way. For example,
the XENMEM_{set_,}memory_map pair of hypercalls could be used
(and is readily available to be extended that way, since for HVM
domains XENMEM_set_memory_map returns -EPERM at present). The
only potentially problematic aspect I can see with using it might be
its limiting of the entry count to E820MAX.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] A few EFI code questions

2014-12-04 Thread Jan Beulich
>>> On 04.12.14 at 22:22,  wrote:
> On Thu, Dec 4, 2014 at 1:35 AM, Jan Beulich  wrote:
> On 03.12.14 at 22:02,  wrote:
>>> 3) Should not we change xen/arch/*/efi/efi-boot.h to
>>>xen/arch/*/efi/efi-boot.c? efi-boot.h contains more
>>>code than definitions, declarations and short static
>>>functions. So, I think that it is more regular *.c file
>>>than header file.
>>
>> That's a matter of taste - I'd probably have made it .c too, but
>> didn't mind it being .h as done by Roy (presumably on the basis
>> that #include directives are preferred to have .h files as their
>> operands). The only thing I regret is that I didn't ask for the
>> pointless efi- prefix to be dropped.
> 
> I don't mind a change here, and I agree that it is more like a .c file
> than a .h.  If a name change is done, is it worth dropping the "efi-" at
> the same time?

If we indeed want to change the name (post 4.5), making both
adjustments at once would be kind of a requirement of mine.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V4] Decouple SandyBridge quirk from VTd timeout

2014-12-04 Thread Tian, Kevin
> From: Don Dugger [mailto:n0...@n0ano.com]
> Sent: Friday, December 05, 2014 2:13 PM
> 
> On Fri, Dec 05, 2014 at 05:30:52AM +, Tian, Kevin wrote:
> > > From: Donald D. Dugger
> > > Sent: Friday, November 21, 2014 1:12 PM
> > >
> > > Currently the quirk code for SandyBridge uses the VTd timeout value when
> > > writing to an IGD register.  This is the wrong timeout to use and, at
> > > 1000 msec., is also much too large.  This patch changes the quirk code
> > > to use a timeout that is specific to the IGD device and allows the user
> > > control of the timeout.
> > >
> > > Boolean settings for the boot parameter `snb_igd_quirk' keep their current
> > > meaning, enabling or disabling the quirk code with a timeout of 1000
> msec.
> > >
> > > In addition specifying `snb_igd_quirk=default' will enable the code and
> > > set the timeout to the theoretical maximum of 670 msec.  For finer
> control,
> > > specifying `snb_igd_quirk=n', where `n' is a decimal number, will enable
> > > the code and set the timeout to `n' msec.
> >
> > I'm OK with the patch except one comment. User usually thinks bool option
> > means default, but here an explicit 'default' actually means a different
> value.
> > It's a bit confused to me. How about changing 'default' to another more
> > meaningful word, e.g. 'max'? and you may still keep a 'snb_igd_quirk=
> > legacy' option, so all the possibilities are included in the assigned case, 
> > with
> > bool option alias to 'legacy'.
> 
> I don't see a need for a `legacy' option, merely enabling the option (which is
> what current users are doing) will select the current 1 sec. timeout.  If you
> don't like `default' (and I don't like `max' since that would select a value
> less that the current default) how about the string `cap' (the cap on the
> theoretical timeout) to select the 670 msec value?
> 

that looks fine to me. You can have my ACK for your next version. :-)

Acked-by: Kevin Tian 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 10/17] hvmloader/mem_hole_alloc: skip any overlap with reserved device memory

2014-12-04 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, December 05, 2014 12:29 AM
> 
> >>> On 01.12.14 at 10:24,  wrote:
> > In some cases like igd_opregion_pgbase, guest will use mem_hole_alloc
> > to allocate some memory to use in runtime cycle, so we alsoe need to
> > make sure all reserved device memory don't overlap such a region.
> 
> While ideally this would get switched to the model outlined for
> the previous two patches too, it's at least reasonable to keep
> this simple allocator simple for the time being.
> 
> > --- a/tools/firmware/hvmloader/util.c
> > +++ b/tools/firmware/hvmloader/util.c
> > @@ -416,9 +416,29 @@ static uint32_t alloc_down =
> RESERVED_MEMORY_DYNAMIC_END;
> >
> >  xen_pfn_t mem_hole_alloc(uint32_t nr_mfns)
> >  {
> > +unsigned int i, num = hvm_get_reserved_device_memory_map();
> > +uint64_t rdm_start, rdm_end;
> > +uint32_t alloc_start, alloc_end;
> > +
> >  alloc_down -= nr_mfns << PAGE_SHIFT;
> > +alloc_start = alloc_down;
> > +alloc_end = alloc_start + (nr_mfns << PAGE_SHIFT);
> > +for ( i = 0; i < num; i++ )
> > +{
> > +rdm_start = (uint64_t)rdm_map[i].start_pfn << PAGE_SHIFT;
> > +rdm_end = rdm_start + ((uint64_t)rdm_map[i].nr_pages <<
> PAGE_SHIFT);
> > +if ( check_rdm_hole_conflict((uint64_t)alloc_start,
> > + (uint64_t)alloc_end,
> 
> Pointless casts.
> 
> > + rdm_start, rdm_end -
> rdm_start) )
> > +{
> > +alloc_end = rdm_start;
> > +alloc_start = alloc_end - (nr_mfns << PAGE_SHIFT);
> > +BUG_ON(alloc_up >= alloc_start);
> 
> This is redundant with the BUG_ON() below afaict. Or at least it
> would be, if you would properly update allow_down (if you don't
> you may end up returning the same PFN for two allocations).
> 

I'd like this being done once at init time. Once alloc_up/down is
verified/adjusted, no need to add run-time overhead here.

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 09/17] hvmloader/ram: check if guest memory is out of reserved device memory maps

2014-12-04 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, December 05, 2014 12:20 AM
> 
> >>> On 01.12.14 at 10:24,  wrote:
> > We need to check to reserve all reserved device memory maps in e820
> > to avoid any potential guest memory conflict.
> >
> > Currently, if we can't insert RDM entries directly, we may need to handle
> > several ranges as follows:
> > a. Fixed Ranges --> BUG()
> >  lowmem_reserved_base-0xA: reserved by BIOS implementation,
> >  BIOS region,
> >  RESERVED_MEMBASE ~ 0x1,
> > b. RAM or RAM:Hole -> Try to reserve
> 
> I continue to be unconvinced of the overall approach: The domain
> builder continues to populate these regions when it shouldn't. Yet
> once it doesn't, it would be most natural to simply communicate the

doesn't -> does?

> RAM regions to hvmloader, and hvmloader would use just that to
> build the E820 table (and subsequently assign BARs).
> 

My impression is that you didn't like extending hvm_info to carry
sparse RAM regions. that's why the current tradeoff is taken, i.e.
leaving domain builder unchanged for RAM, then preventing EPT 
setup for reserved regions in hypervisor (means wasting memory), 
and then having hvmloader to actually figure out the final e820. 
and that's also why per-BDF design is introduced to minimize wasted 
memory. We discussed to change domain builder to avoid populating 
reserved regions as the next step after 4.5, but w/o extending 
hvm_info we always need the logic in hvmloader to construct e820 
from scratch.

I did not catch all the discussion history between you and Tiejun, so 
may miss sth. here. (btw Tiejun is on an urgent leave, so his response
will be slow in a few days)

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH] xen: fix sparse warning in page.h

2014-12-04 Thread Juergen Gross
The inline function mfn_to_pfn_no_overrides() uses __get_user() to
access the machine_to_phys_mapping array to avoid crashes when
using out of bounds indices e.g. for I/O addresses.

Avoid sparse warnings by attributing the accessed address with __user.

Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/xen/page.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index f5d5de4..6deff84 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -113,6 +113,7 @@ static inline unsigned long 
mfn_to_pfn_no_overrides(unsigned long mfn)
 {
unsigned long pfn;
int ret;
+   unsigned long __user *addr;
 
if (xen_feature(XENFEAT_auto_translated_physmap))
return mfn;
@@ -125,7 +126,8 @@ static inline unsigned long 
mfn_to_pfn_no_overrides(unsigned long mfn)
 * In such cases it doesn't matter what we return (we return garbage),
 * but we must handle the fault without crashing!
 */
-   ret = __get_user(pfn, &machine_to_phys_mapping[mfn]);
+   addr = (unsigned long __user *)(&machine_to_phys_mapping[mfn]);
+   ret = __get_user(pfn, addr);
if (ret < 0)
return ~0;
 
-- 
2.1.2


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH] xen: fix sparse warning in p2m.c

2014-12-04 Thread Juergen Gross
The patch "Speed up set_phys_to_machine() by using read-only mappings"
introduced a sparse warning:

arch/x86/xen/p2m.c:628:13: sparse: incorrect type in argument 1
   (different address spaces)

Avoid the warning.

Signed-off-by: Juergen Gross 
---
 arch/x86/xen/p2m.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 8b5db51..2defca9 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -615,6 +615,7 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long 
mfn)
 {
pte_t *ptep;
unsigned int level;
+   unsigned long __user *addr;
 
/* don't track P2M changes in autotranslate guests */
if (unlikely(xen_feature(XENFEAT_auto_translated_physmap)))
@@ -625,7 +626,8 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long 
mfn)
return true;
}
 
-   if (likely(!__put_user(mfn, xen_p2m_addr + pfn)))
+   addr = (unsigned long __user *)xen_p2m_addr + pfn;
+   if (likely(!__put_user(mfn, addr)))
return true;
 
ptep = lookup_address((unsigned long)(xen_p2m_addr + pfn), &level);
-- 
2.1.2


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm

2014-12-04 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, December 04, 2014 11:33 PM
> > +if ( pcidevs == NULL )
> > +{
> > +rcu_unlock_domain(d);
> > +return -ENOMEM;
> > +}
> > +
> > +if ( copy_from_guest(pcidevs, xdsr->pcidevs,
> > +
> xdsr->num_pcidevs*sizeof(*pcidevs)) )
> > +{
> > +xfree(pcidevs);
> > +rcu_unlock_domain(d);
> > +return -EFAULT;
> > +}
> > +}
> > +
> > +d->arch.hvm_domain.pcidevs = pcidevs;
> 
> If the operation gets issued more than once for a given domain,
> you're leaking the old pointer here. Overall should think a bit
> more about this multiple use case (or outright disallow it).

from current discussion let's outright disallow it. the information
should be ready early enough before populating p2m.

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V4] Decouple SandyBridge quirk from VTd timeout

2014-12-04 Thread Tian, Kevin
> From: Donald D. Dugger
> Sent: Friday, November 21, 2014 1:12 PM
> 
> Currently the quirk code for SandyBridge uses the VTd timeout value when
> writing to an IGD register.  This is the wrong timeout to use and, at
> 1000 msec., is also much too large.  This patch changes the quirk code
> to use a timeout that is specific to the IGD device and allows the user
> control of the timeout.
> 
> Boolean settings for the boot parameter `snb_igd_quirk' keep their current
> meaning, enabling or disabling the quirk code with a timeout of 1000 msec.
> 
> In addition specifying `snb_igd_quirk=default' will enable the code and
> set the timeout to the theoretical maximum of 670 msec.  For finer control,
> specifying `snb_igd_quirk=n', where `n' is a decimal number, will enable
> the code and set the timeout to `n' msec.

I'm OK with the patch except one comment. User usually thinks bool option
means default, but here an explicit 'default' actually means a different value.
It's a bit confused to me. How about changing 'default' to another more
meaningful word, e.g. 'max'? and you may still keep a 'snb_igd_quirk=
legacy' option, so all the possibilities are included in the assigned case, with
bool option alias to 'legacy'.

Thanks
Kevin

> 
> Signed-off-by: Don Dugger 
> --
> diff -r 9d485e2c8339 xen/drivers/passthrough/vtd/quirks.c
> --- a/xen/drivers/passthrough/vtd/quirks.cMon Nov 10 12:03:36 2014 +
> +++ b/xen/drivers/passthrough/vtd/quirks.cThu Nov 20 22:00:51 2014
> -0700
> @@ -50,6 +50,11 @@
>  #define IS_ILK(id)(id == 0x00408086 || id == 0x00448086 || id==
> 0x00628086 || id == 0x006A8086)
>  #define IS_CPT(id)(id == 0x01008086 || id == 0x01048086)
> 
> +/* SandyBridge IGD timeouts in milliseconds */
> +#define SNB_IGD_TIMEOUT_LEGACY1000
> +#define SNB_IGD_TIMEOUT670
> +static u32 snb_igd_timeout = 0;
> +
>  static u32 __read_mostly ioh_id;
>  static u32 __initdata igd_id;
>  bool_t __read_mostly rwbf_quirk;
> @@ -158,6 +163,16 @@
>   * Workaround is to prevent graphics get into RC6
>   * state when doing VT-d IOTLB operations, do the VT-d
>   * IOTLB operation, and then re-enable RC6 state.
> + *
> + * This quirk is enabled with the snb_igd_quirk command
> + * line parameter.  Specifying snb_igd_quirk with no value
> + * (or any of the standard boolean values) enables this
> + * quirk and sets the timeout to the legacy timeout of
> + * 1000 msec.  Setting this parameter to the string
> + * "default" enables this quirk and sets the timeout to
> + * the theoretical maximum of 670 msec.  Setting this
> + * parameter to a numerical value enables the quirk and
> + * sets the timeout to that numerical number of msecs.
>   */
>  static void snb_vtd_ops_preamble(struct iommu* iommu)
>  {
> @@ -177,7 +192,7 @@
>  start_time = NOW();
>  while ( (*(volatile u32 *)(igd_reg_va + 0x22AC) & 0xF) != 0 )
>  {
> -if ( NOW() > start_time + DMAR_OPERATION_TIMEOUT )
> +if ( NOW() > start_time + snb_igd_timeout )
>  {
>  dprintk(XENLOG_INFO VTDPREFIX,
>  "snb_vtd_ops_preamble: failed to disable idle
> handshake\n");
> @@ -208,13 +223,10 @@
>   * call before VT-d translation enable and IOTLB flush operations.
>   */
> 
> -static int snb_igd_quirk;
> -boolean_param("snb_igd_quirk", snb_igd_quirk);
> -
>  void vtd_ops_preamble_quirk(struct iommu* iommu)
>  {
>  cantiga_vtd_ops_preamble(iommu);
> -if ( snb_igd_quirk )
> +if ( snb_igd_timeout != 0 )
>  {
>  spin_lock(&igd_lock);
> 
> @@ -228,7 +240,7 @@
>   */
>  void vtd_ops_postamble_quirk(struct iommu* iommu)
>  {
> -if ( snb_igd_quirk )
> +if ( snb_igd_timeout != 0 )
>  {
>  snb_vtd_ops_postamble(iommu);
> 
> @@ -237,6 +249,36 @@
>  }
>  }
> 
> +static void __init parse_snb_timeout(const char *s)
> +{
> +int t;
> +
> +t = parse_bool(s);
> +if ( t < 0 )
> +{
> +if ( *s == '\0' )
> +{
> +t = SNB_IGD_TIMEOUT_LEGACY;
> +}
> +else if ( strcmp(s, "default") == 0 )
> +{
> +t = SNB_IGD_TIMEOUT;
> +}
> +else
> +{
> +t = strtoul(s, NULL, 0);
> +}
> +}
> +else
> +{
> +t = t ? SNB_IGD_TIMEOUT_LEGACY : 0;
> +}
> +snb_igd_timeout = MILLISECS(t);
> +
> +return;
> +}
> +custom_param("snb_igd_quirk", parse_snb_timeout);
> +
>  /* 5500/5520/X58 Chipset Interrupt remapping errata, for stepping B-3.
>   * Fixed in stepping C-2. */
>  static void __init tylersburg_intremap_quirk(void)
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [seabios test] 32071: tolerable FAIL - PUSHED

2014-12-04 Thread xen . org
flight 32071 seabios real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/32071/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-sedf-pin  5 xen-boot  fail REGR. vs. 31885
 test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail like 31885

Tests which did not succeed, but are not blocking:
 test-amd64-i386-libvirt   9 guest-start  fail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-amd64-libvirt  9 guest-start  fail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3  14 guest-stop   fail   never pass

version targeted for testing:
 seabios  e5f43384be5cf5983f03e3f4c1cb7bbeaba2af3f
baseline version:
 seabios  b7f4a76a929ce4acd60e89aa273a8b208daa8233


People who touched revisions under test:
  Gerd Hoffmann 
  Patrick Georgi 


jobs:
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64  pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass
 test-amd64-amd64-xl-qemut-win7-amd64 fail
 test-amd64-i386-xl-qemut-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-win7-amd64   fail
 test-amd64-i386-xl-win7-amd64fail
 test-amd64-i386-xl-credit2   pass
 test-amd64-i386-freebsd10-i386   pass
 test-amd64-amd64-xl-pcipt-intel  fail
 test-amd64-i386-rhel6hvm-intel   pass
 test-amd64-i386-qemut-rhel6hvm-intel pass
 test-amd64-i386-qemuu-rhel6hvm-intel pass
 test-amd64-amd64-libvirt fail
 test-amd64-i386-libvirt  fail
 test-amd64-i386-xl-multivcpu pass
 test-amd64-amd64-pairpass
 test-amd64-i386-pair fail
 test-amd64-amd64-xl-sedf-pin fail
 test-amd64-amd64-xl-sedf pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 fail
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 fail
 test-amd64-i

Re: [Xen-devel] Question about arch/x86/xen/mmu.c

2014-12-04 Thread Jürgen Groß

On 12/04/2014 10:30 PM, Jan-Simon Moeller wrote:

Hi !

My name is Jan-Simon Moeller and I'm looking into compiling the kernel with
LLVM/Clang (see llvm.linuxfoundation.org) .

Right now we face this issue when compiling with clang:

  CC  arch/x86/xen/mmu.o
arch/x86/xen/mmu.c:1343:18: error: fields must have a constant size:
  'variable length array in structure' extension will never be
  supported
DECLARE_BITMAP(mask, num_processors);
   ^
include/linux/types.h:10:16: note: expanded from macro 'DECLARE_BITMAP'
unsigned long name[BITS_TO_LONGS(bits)]
  ^
1 error generated.


Question to the experts: why can't we just use NR_CPUS and be done with it ?
NR_CPUS will be setup by CONFIG_NR_CPUS and thus static.
( e.g. arch/x86/configs/x86_64_defconfig:CONFIG_NR_CPUS=64 )


This would expand the structure on kernels configured for many cpus
(e.g. 4096) but running on a smaller machine dramatically.


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [rumpuserxen test] 32084: all pass - PUSHED

2014-12-04 Thread xen . org
flight 32084 rumpuserxen real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/32084/

Perfect :-)
All tests in this flight passed
version targeted for testing:
 rumpuserxen  66b09d00078908e61d0f0c1b87fc140ed8c91e06
baseline version:
 rumpuserxen  f302fca6c18c1ccdefe3d0d5f583e2d0970c2e2d


People who touched revisions under test:
  Martin Lucina 


jobs:
 build-amd64  pass
 build-i386   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  pass
 build-i386-rumpuserxen   pass
 test-amd64-amd64-rumpuserxen-amd64   pass
 test-amd64-i386-rumpuserxen-i386 pass



sg-report-flight on osstest.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images

Logs, config files, etc. are available at
http://www.chiark.greenend.org.uk/~xensrcts/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=rumpuserxen
+ revision=66b09d00078908e61d0f0c1b87fc140ed8c91e06
+ . cri-lock-repos
++ . cri-common
+++ . cri-getconfig
+++ umask 002
+++ getconfig Repos
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
++ repos=/export/home/osstest/repos
++ repos_lock=/export/home/osstest/repos/lock
++ '[' x '!=' x/export/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/export/home/osstest/repos/lock
++ exec with-lock-ex -w /export/home/osstest/repos/lock ./ap-push rumpuserxen 
66b09d00078908e61d0f0c1b87fc140ed8c91e06
+ branch=rumpuserxen
+ revision=66b09d00078908e61d0f0c1b87fc140ed8c91e06
+ . cri-lock-repos
++ . cri-common
+++ . cri-getconfig
+++ umask 002
+++ getconfig Repos
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
++ repos=/export/home/osstest/repos
++ repos_lock=/export/home/osstest/repos/lock
++ '[' x/export/home/osstest/repos/lock '!=' x/export/home/osstest/repos/lock 
']'
+ . cri-common
++ . cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=rumpuserxen
+ xenbranch=xen-unstable
+ '[' xrumpuserxen = xlinux ']'
+ linuxbranch=
+ '[' x = x ']'
+ qemuubranch=qemu-upstream-unstable
+ : tested/2.6.39.x
+ . ap-common
++ : osst...@xenbits.xensource.com
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xensource.com:/home/xen/git/xen.git
++ : git://xenbits.xen.org/staging/qemu-xen-unstable.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/libvirt.git
++ : osst...@xenbits.xensource.com:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
+++ besteffort_repo git://git.sv.gnu.org/gnulib.git
+++ local repo=git://git.sv.gnu.org/gnulib.git
+++ cached_repo git://git.sv.gnu.org/gnulib.git '[fetch=try]'
+++ local repo=git://git.sv.gnu.org/gnulib.git
+++ local 'options=[fetch=try]'
 getconfig GitCacheProxy
 perl -e '
use Osstest;
readglobalconfig();
print $c{"GitCacheProxy"} or die $!;
'
+++ local cache=git://drall.uk.xensource.com:9419/
+++ '[' xgit://drall.uk.xensource.com:9419/ '!=' x ']'
+++ echo 
'git://drall.uk.xensource.com:9419/git://git.sv.gnu.org/gnulib.git%20[fetch=try]'
++ : 
'git://drall.uk.xensource.com:9419/git://git.sv.gnu.org/gnulib.git%20[fetch=try]'
++ : https://github.com/rumpkernel/rumprun-xen
++ : git
++ : git://xenbits.xen.org/rumpuser-xen.git
++ : osst...@xenbits.xensource.com:/home/xen/git/rumpuser-xen.git
+++ besteffort_repo https://github.com/rumpkernel/rumpkernel-netbsd-src
+++ local repo=https://github.com/rumpkernel/rumpkernel-netbsd-src
+++ cached_repo https://github.com/rumpkernel/rumpkernel-netbsd-src 
'[fetch=try]'
+++ local repo=https://github.com/rumpkernel/rumpkernel-netbsd-src
+++ local 'options=[fetch=try]'
 getconfig GitCacheProxy
 perl -e '
use Osstest;
readglobalconfig();
print $c{"GitCacheProxy"} or die $!;
'
+++ local cache=git://drall.uk.xensource.com:9419/
+++ '[' xgit://drall.uk.xensource.com:9419/ '!=' x ']'
+++ echo 
'git://drall.uk.xensource.com:9419/https://github.com/rumpkernel/rumpkernel-netbsd-src%20[fetch=try]'
++ : 
'git://drall.uk.xensource.com:9419/https://github.com/rumpkernel/rumpkernel-netbsd-src%20[fetch=try]'
++ : git
++ : git://git.seabios.org/seabios.git
++ : osst...@xenbits.xensource.com:/home/xen/git/osstest/seabios.git
++

Re: [Xen-devel] Xen-4.5 HVMOP ABI issues

2014-12-04 Thread Konrad Rzeszutek Wilk
On Thu, Dec 04, 2014 at 01:49:47PM +, Jan Beulich wrote:
> >>> On 28.11.14 at 16:46,  wrote:
> > On 28/11/14 15:18, Jan Beulich wrote:
> > On 28.11.14 at 14:55,  wrote:
> >>> The problem is with continuations which reuse the upper bits of the
> >>> input register, not with this HVMOP_op_mask specifically; the
> >>> HVMOP_op_mask simply adds to an existing problem.  This is something
> >>> which needs considering urgently because, as you identify above, we have
> >>> already suffered an accidental ABI breakage with the mem-op widening.
> >> Since we can't retroactively fix the mem-op widening, I still don't see
> >> what's urgent here: As long as we don't change any of these masks,
> >> nothing bad is going to happen. Of course one thing to consider with
> >> this aspect in mind is whether to change the hvm-op or gnttab-op
> >> masks again _before_ 4.5 goes out, based on whether we feel they're
> >> wide enough for the (un)foreseeable future.
> > 
> > By urgent, I mean exactly this, while we have the ability to tweak the
> > masks.
> 
> With no-one else voicing an opinion:
> 
> For hvmop, the mask currently is 8 bits and we've got 22 ops defined.
> 
> For gnttabop, the mask currently is 12 bits and we've got 12 ops defined.
> 
> For the latter, we're fine even without further consideration. For the
> former, the two operations actively using the continuation encoding
> are tools-only ones. Since we're fine to alter the tools only interfaces,
> and since it was intended for the tools-only HVM-ops to be split off
> to a separate hypercall (e.g. hvmctl) anyway, the range restriction
> would then no longer be a problem. Plus, in the worst case we could
> always introduce yet another hypercall if we ran out of numbers.
> 
> Consequently what I'd like to propose (and I guess I'll craft a patch
> as soon as I finished this mail) is that we add comments to these
> masks (also the memop one) to make clear that they mustn't change.
> Additionally for forward compatibility I'd also like to add checks for
> the upper bits to be zero for any of the sub-ops that don't actually
> use them to encode continuation information. Konrad, would you
> consider doing so acceptable for 4.5?

Yes.

> 
> Jan
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] tools/hotplug: update systemd dependency to use service instead of socket

2014-12-04 Thread Konrad Rzeszutek Wilk
On Thu, Dec 04, 2014 at 08:47:56AM +0100, Olaf Hering wrote:
> On Wed, Dec 03, M A Young wrote:
> 
> > On Wed, 3 Dec 2014, Konrad Rzeszutek Wilk wrote:
> > >Options=mode=755,context="$XENSTORED_MOUNT_CTX"
> > 
> > Yes, that was on my probable bug list, as context="none" isn't a valid mount
> > option (on Fedora at least), presumably because context has to be followed
> > by a valid selinux context.
> 
> Is that something the sysadmin has to adjust, or should the xen source
> provide proper values?

It would be rather cumbersome if the sysadmin had to adjust it. The goal
here would be that distros could use it and package it neatly so that it
works out of the box.

What are the proper values in SuSE?
> 
> Olaf
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 2/2] add a new p2m type - p2m_mmio_write_dm

2014-12-04 Thread Yu, Zhang



On 12/5/2014 12:04 AM, Tim Deegan wrote:

Hi,

At 21:13 +0800 on 04 Dec (1417724006), Yu Zhang wrote:

A new p2m type, p2m_mmio_write_dm, is added to trap and emulate
the write operations on GPU's page tables. Handling of this new
p2m type are similar with existing p2m_ram_ro in most condition
checks, with only difference on final policy of emulation vs. drop.
For p2m_ram_ro types, write operations will not trigger the device
model, and will be discarded later in __hvm_copy(); while for the
p2m_mmio_write_dm type pages, writes will go to the device model
via ioreq-server.

Signed-off-by: Yu Zhang 
Signed-off-by: Wei Ye 


Thanks for this -- only two comments:


@@ -5978,7 +5982,8 @@ long do_hvm_op(unsigned long op, 
XEN_GUEST_HANDLE_PARAM(void) arg)
  goto param_fail4;
  }
  if ( !p2m_is_ram(t) &&
- (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) )
+ (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) &&
+ t != p2m_mmio_write_dm )


I think that Jan already brough this up, and maybe I missed your
answer: this realaxation looks wrong to me. I would have thought that
transition between p2m_mmio_write_dm and p2m_ram_rw/p2m_ram_logdirty
would be the only ones you would want to allow.


Ha. Sorry, my negligence, and thanks for pointing out. :)
The transition we use now is only between p2m_mmio_write_dm and 
p2m_ram_rw. So how about this:


if ( !p2m_is_ram(t) &&
 (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) &&
 (t != p2m_mmio_write_dm || a.hvmmem_type != HVMMEM_ram_rw) )




@@ -111,7 +112,8 @@ typedef unsigned int p2m_query_t;
  #define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty) \
| p2m_to_mask(p2m_ram_ro) \
| p2m_to_mask(p2m_grant_map_ro)   \
-  | p2m_to_mask(p2m_ram_shared) )
+  | p2m_to_mask(p2m_ram_shared)   \
+  | p2m_to_mask(p2m_mmio_write_dm))


Nit: please align the '\' with the others above it.

Got it, and thanks.

B.R.
Yu


Cheers,

Tim.




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-xen-4.5 1/3] tools/hotplug: distclean target should remove files generated by configure

2014-12-04 Thread Konrad Rzeszutek Wilk
On Tue, Dec 02, 2014 at 01:36:20PM -0500, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 02, 2014 at 04:16:28PM +0100, Daniel Kiper wrote:
> > Signed-off-by: Daniel Kiper 
> 
> This usage scenario which I can see this being useful (and
> I've tripped over this) is when you rebuild a new version
> from the same repo. As in, this affects developers, but


Lets get it in. It fixes an issues that I keep on tripping
on and it is harmless enough that I don't see a way
for this to cause any regressions.

Release-Acked-by: Konrad Rzeszutek Wilk 
> not end-users and not distros. But perhaps I am missing
> one scenario?
> 
> As such I would lean towards deferring this (and the other
> two) to Xen 4.6.
> 
> Thank you!
> > ---
> >  tools/Makefile |3 +++
> >  tools/hotplug/Makefile |5 -
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/Makefile b/tools/Makefile
> > index af9798a..19b24f3 100644
> > --- a/tools/Makefile
> > +++ b/tools/Makefile
> > @@ -261,6 +261,9 @@ subdir-all-debugger/kdd: .phony
> >  subdir-distclean-firmware: .phony
> > $(MAKE) -C firmware distclean
> >  
> > +subdir-distclean-hotplug: .phony
> > +   $(MAKE) -C hotplug distclean
> > +
> >  subtree-force-update:
> >  ifeq ($(CONFIG_QEMU_XEN),y)
> > $(MAKE) qemu-xen-dir-force-update
> > diff --git a/tools/hotplug/Makefile b/tools/hotplug/Makefile
> > index 14ae9a8..a29a522 100644
> > --- a/tools/hotplug/Makefile
> > +++ b/tools/hotplug/Makefile
> > @@ -6,5 +6,8 @@ SUBDIRS-$(CONFIG_NetBSD) += NetBSD
> >  SUBDIRS-$(CONFIG_Linux) += Linux
> >  SUBDIRS-$(CONFIG_FreeBSD) += FreeBSD
> >  
> > -.PHONY: all clean install
> > +.PHONY: all clean distclean install
> >  all clean install: %: subdirs-%
> > +
> > +distclean:
> > +   rm -f Linux/init.d/sysconfig.xencommons Linux/init.d/xencommons 
> > NetBSD/rc.d/xencommons
> > -- 
> > 1.7.10.4
> > 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [qemu-upstream-unstable test] 32069: tolerable trouble: broken/fail/pass - PUSHED

2014-12-04 Thread xen . org
flight 32069 qemu-upstream-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/32069/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-qemuu-rhel6hvm-amd  7 redhat-installfail pass in 32024
 test-amd64-i386-freebsd10-amd64  3 host-install(3)broken pass in 32024
 test-armhf-armhf-libvirt  4 xen-installfail in 32024 pass in 32069

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail like 31848
 test-amd64-amd64-xl-qemut-winxpsp3  7 windows-install  fail like 31848

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt  9 guest-start  fail   never pass
 test-armhf-armhf-xl  10 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt  9 guest-start  fail   never pass
 test-amd64-i386-libvirt   9 guest-start  fail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass
 test-amd64-i386-xl-winxpsp3  14 guest-stop   fail   never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass

version targeted for testing:
 qemuu1ebb75b1fee779621b63e84fefa7b07354c43a99
baseline version:
 qemuua230ec3101ddda868252c036ea960af2b2d6cd5a


People who touched revisions under test:
  Jason Wang 
  Peter Maydell 


jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   fail
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64  broken  
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass
 test-amd64-amd64-xl-qemut-win7-amd64 fail
 test-amd64-i386-xl-qemut-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-win7-amd64   fail
 test-amd64-i386-xl-win7-amd64fail
 test-amd64-i386-xl-credit2   pass
 test-amd64-i386-freebsd10-i386   pass
 test-amd64-amd64-xl-pcipt-intel  fail
 test-amd64-i386-rhel6hvm-intel   pass
 test-amd64-i386-qemut-rhel6hvm-intel pass
 test-amd64-i386-qemuu-rhel6hvm-intel   

Re: [Xen-devel] Poor network performance between DomU with multiqueue support

2014-12-04 Thread Zhangleiqiang (Trump)
> -Original Message-
> From: Wei Liu [mailto:wei.l...@citrix.com]
> Sent: Thursday, December 04, 2014 6:50 PM
> To: Zhangleiqiang (Trump)
> Cc: Wei Liu; xen-devel@lists.xen.org; zhangleiqiang; Luohao (brian); Xiaoding
> (B); Yuzhou (C); Zhuangyuxin
> Subject: Re: [Xen-devel] Poor network performance between DomU with
> multiqueue support
> 
> On Wed, Dec 03, 2014 at 02:43:37PM +, Zhangleiqiang (Trump) wrote:
> > > -Original Message-
> > > From: Wei Liu [mailto:wei.l...@citrix.com]
> > > Sent: Tuesday, December 02, 2014 11:59 PM
> > > To: Zhangleiqiang (Trump)
> > > Cc: Wei Liu; zhangleiqiang; xen-devel@lists.xen.org; Luohao (brian);
> > > Xiaoding (B); Yuzhou (C); Zhuangyuxin
> > > Subject: Re: [Xen-devel] Poor network performance between DomU with
> > > multiqueue support
> > >
> > > On Tue, Dec 02, 2014 at 02:46:36PM +, Zhangleiqiang (Trump) wrote:
> > > > Thanks for your reply, Wei.
> > > >
> > > > I do the following testing just now and found the results as follows:
> > > >
> > > > There are three DomUs (4U4G) are running on Host A (6U6G) and one
> > > > DomU
> > > (4U4G) is running on Host B (6U6G), I send packets from three DomUs
> > > to the DomU on Host B simultaneously.
> > > >
> > > > 1. The "top" output of Host B as follows:
> > > >
> > > > top - 09:42:11 up  1:07,  2 users,  load average: 2.46, 1.90, 1.47
> > > > Tasks: 173 total,   4 running, 169 sleeping,   0 stopped,   0 zombie
> > > > %Cpu0  :  0.0 us,  0.0 sy,  0.0 ni, 97.3 id,  0.0 wa,  0.0 hi,
> > > > 0.8 si,  1.9 st
> > > > %Cpu1  :  0.0 us, 27.0 sy,  0.0 ni, 63.1 id,  0.0 wa,  0.0 hi,
> > > > 9.5 si,  0.4 st
> > > > %Cpu2  :  0.0 us, 90.0 sy,  0.0 ni,  8.3 id,  0.0 wa,  0.0 hi,
> > > > 1.7 si,  0.0 st
> > > > %Cpu3  :  0.4 us,  1.4 sy,  0.0 ni, 95.4 id,  0.0 wa,  0.0 hi,
> > > > 1.4 si,  1.4 st
> > > > %Cpu4  :  0.0 us, 60.2 sy,  0.0 ni, 39.5 id,  0.0 wa,  0.0 hi,
> > > > 0.3 si,  0.0 st
> > > > %Cpu5  :  0.0 us,  2.8 sy,  0.0 ni, 89.4 id,  0.0 wa,  0.0 hi,
> > > > 6.9 si,  0.9
> > > st
> > > > KiB Mem:   4517144 total,  3116480 used,  1400664 free,  876
> > > buffers
> > > > KiB Swap:  2103292 total,0 used,  2103292 free.  2374656
> > > cached Mem
> > > >
> > > >   PID USER  PR  NIVIRTRESSHR
> S  %CPU  %MEM
> > > TIME+ COMMAND
> > > >  7440 root  20   0   0  0  0 R 71.10 0.000
> > > 8:15.38 vif4.0-q3-guest
> > > >  7434 root  20   0   0  0  0 R 59.14 0.000
> > > 9:00.58 vif4.0-q0-guest
> > > >18 root  20   0   0  0  0 R 33.89 0.000
> > > 2:35.06 ksoftirqd/2
> > > >28 root  20   0   0  0  0 S 20.93 0.000
> > > 3:01.81 ksoftirqd/4
> > > >
> > > >
> > > > As shown above, only two netback related processes (vif4.0-*) are
> > > > running
> > > with high cpu usage, and the other 2 netback processes are idle. The "ps"
> > > result of vif4.0-* processes as follows:
> > > >
> > > > root  7434 50.5  0.0  0 0 ?R09:23
> 11:29
> > > [vif4.0-q0-guest]
> > > > root  7435  0.0  0.0  0 0 ?S09:23
> 0:00
> > > [vif4.0-q0-deall]
> > > > root  7436  0.0  0.0  0 0 ?S09:23
> 0:00
> > > [vif4.0-q1-guest]
> > > > root  7437  0.0  0.0  0 0 ?S09:23
> 0:00
> > > [vif4.0-q1-deall]
> > > > root  7438  0.0  0.0  0 0 ?S09:23
> 0:00
> > > [vif4.0-q2-guest]
> > > > root  7439  0.0  0.0  0 0 ?S09:23
> 0:00
> > > [vif4.0-q2-deall]
> > > > root  7440 48.1  0.0  0 0 ?R09:23
> 10:55
> > > [vif4.0-q3-guest]
> > > > root  7441  0.0  0.0  0 0 ?S09:23
> 0:00
> > > [vif4.0-q3-deall]
> > > > root  9724  0.0  0.0   9244  1520 pts/0S+   09:46
> 0:00
> > > grep --color=auto
> > > >
> > > >
> > > > 2. The "rx" related content in /proc/interupts in receiver DomU (on Host
> B):
> > > >
> > > > 73: 2   0   2925405 0   
> > > > xen-dyn-event
> > >   eth0-q0-rx
> > > > 75: 43  93  0   118 
> > > > xen-dyn-event
> > >   eth0-q1-rx
> > > > 77: 2   337614  1983
> > > > xen-dyn-event
> > >   eth0-q2-rx
> > > > 79: 2414666 0   9   0   
> > > > xen-dyn-event
> > >   eth0-q3-rx
> > > >
> > > > As shown above, it seems like that only q0 and q3 handles the
> > > > interrupt
> > > triggered by packet receving.
> > > >
> > > > Any advise? Thanks.
> > >
> > > Netback selects queue based on the return value of
> skb_get_queue_mapping.
> > > The queue mapping is set by core driver or ndo_select_queue (if
> > > specified by individual driver). In this case netback doesn't have
> > > its implementation of ndo_select_queue, so it's up to core driver to
> > > decide which queue to dispatch the packet to.  I think you need to
> > > inspect why Dom0 only steers 

Re: [Xen-devel] Removing the PVH assert in arch/x86/hvm/io.c:87

2014-12-04 Thread Mukesh Rathor
On Thu, 4 Dec 2014 17:35:59 +0100
Roger Pau Monné  wrote:

> Hello,
> 
> I've just stumbled upon this assert while testing PVH on different
> hardware. It was added in 7c4870 as a safe belt, but it turns out INS
> and OUTS go through handle_mmio. So using this instructions from a PVH
> guest basically kills Xen.

Right. Unf CR-moves/lmsw/clts intercepts also go thru handle_mmio, and
the suggestion was to clean it up first with another emulator function
for CR/IO intercepts. I attempted to do that before I left :

http://lists.xen.org/archives/html/xen-devel/2014-08/msg02204.html

See also:

http://lists.xen.org/archives/html/xen-devel/2014-08/msg00391.html

> I've removed it and everything seems fine, so I'm considering sending
> a patch for 4.5 in order to have it removed. I think the path that
> could trigger the crash because of the missing vioapic stuff is
> already guarded by the other chunk added in the same patch.

No, there used to be another path thru hvm_hap_nested_page_fault()
that would walk the back end handlers and crash xen. So you might
wanna check to make sure. I see hvm_hap_nested_page_fault() looks a
little different now, so not sure if its still broken... 

Mukesh


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] xen/arm: uart interrupts handling

2014-12-04 Thread Vijay Kilari
Hi Tim,

On Thu, Dec 4, 2014 at 9:10 PM, Tim Deegan  wrote:
> At 13:51 + on 04 Dec (1417697518), Julien Grall wrote:
>> On 04/12/14 03:50, Vijay Kilari wrote:
>> > Hi Tim,
>>
>> Hi Vijay,
>>
>> > I see that on uart interrupt, ICR is written to clear the all
>> > interrupts except TX, RX and RX timeout. With this, cpu always finds
>> > TX/RX is active and never
>> > comes out of the loop.
>>
>> FWIW, the PL011 serial code has been copied from the Linux drivers.
>>
>> Linux interrupt handler also clear all interrupts except TX, RX, RX
>> timeout. So do you see the issue on Linux?
>>
>> >
>> > With the below changes, TX, RX & RTI are cleared before handling this
>> > interrupts.
>> >
>> > Is my observation is correct?. If so I wonder how it is working on
>> > platforms that
>> > are using pl011. Without this for my cpu just keeps looping here.
>> >
>> >   index fba0a55..d21bce3 100644
>> > --- a/xen/drivers/char/pl011.c
>> > +++ b/xen/drivers/char/pl011.c
>> > @@ -63,7 +63,7 @@ static void pl011_interrupt(int irq, void *data,
>> > struct cpu_user_regs *regs)
>> >  {
>> >  do
>> >  {
>> > -pl011_write(uart, ICR, status & ~(TXI|RTI|RXI));
>> > +pl011_write(uart, ICR, status & (TXI|RTI|RXI));
>>
>>
>> This changes looks wrong to me. We want to clear the bit in status we
>> don't handle. Otherwise the interrupt will be fired in loop.
>
> Yes - even if we do want to explicitly clear the TXI|RTI|RXI bits, we
> must clear the others too!
>
>> If I'm not mistaken, TXI/RTI/RXI will be cleared when data is read or
>> write into the fifo. So we should not clear automatically.
>
> I've been looking at that and I think it's OK for the RX path -- we
> ought to keep calling serial_rx_interrupt() until we've drained the
> FIFO, at which point both RTI and RXI will be cleared.  Certainly we
> shouldn't unconditionally clear RXI/RTI without somehow making sure
> that we're not leaving bytes in the FIFO.
>
> The TX path looks more suspect -- if the uart raises TXI and we have
> nothing buffered to send, then TXI won't get cleared.
> Still, I wonder why you're seeing this and other people haven't.

Yes, this is the behaviour that Iam seeing. In Linux, uart driver
masks TXI interrupt
in IMSC if buffer is empty. However in xen, this scenario is not
handled. This is the reason why cpu does not come out of uart irq
routine if TX interrupt is raised but buffer is empty.

I have added below changes to fix this on top of your suggested change

diff --git a/xen/drivers/char/pl011.c b/xen/drivers/char/pl011.c
index dd19ce8..ad48df3 100644
--- a/xen/drivers/char/pl011.c
+++ b/xen/drivers/char/pl011.c
@@ -109,6 +109,8 @@ static void __init pl011_init_preirq(struct
serial_port *port)
 panic("pl011: No Baud rate configured\n");
 uart->baud = (uart->clock_hz << 2) / divisor;
 }
+/* Trigger RX interrupt at 1/2 full, TX interrupt at 7/8 empty */
+pl011_write(uart, IFLS, (2<<3 | 0));
 /* This write must follow FBRD and IBRD writes. */
 pl011_write(uart, LCR_H, (uart->data_bits - 5) << 5
 | FEN
@@ -197,6 +199,20 @@ static const struct vuart_info
*pl011_vuart(struct serial_port *port)
 return &uart->vuart;
 }

+static void pl011_tx_stop(struct serial_port *port)
+{
+struct pl011 *uart = port->uart;
+
+pl011_write(uart, IMSC, pl011_read(uart, IMSC) & ~(TXI));
+}
+
+static void pl011_tx_start(struct serial_port *port)
+{
+struct pl011 *uart = port->uart;
+
+pl011_write(uart, IMSC, pl011_read(uart, IMSC) | (TXI));
+}
+
 static struct uart_driver __read_mostly pl011_driver = {
 .init_preirq  = pl011_init_preirq,
 .init_postirq = pl011_init_postirq,
@@ -207,6 +223,8 @@ static struct uart_driver __read_mostly pl011_driver = {
 .putc = pl011_putc,
 .getc = pl011_getc,
 .irq  = pl011_irq,
+.start_tx = pl011_tx_start,
+.stop_tx  = pl011_tx_stop,
 .vuart_info   = pl011_vuart,
 };

diff --git a/xen/drivers/char/serial.c b/xen/drivers/char/serial.c
index 44026b1..0f26d40 100644
--- a/xen/drivers/char/serial.c
+++ b/xen/drivers/char/serial.c
@@ -76,6 +76,18 @@ void serial_tx_interrupt(struct serial_port *port,
struct cpu_user_regs *regs)
 cpu_relax();
 }

+if ( port->txbufc == port->txbufp )
+{
+/* Disable TX. nothing to send */
+port->driver->stop_tx(port);
+spin_unlock(&port->tx_lock);
+goto out;
+}
+else
+{
+if ( port->driver->tx_ready(port) )
+port->driver->start_tx(port);
+}
 for ( i = 0, n = port->driver->tx_ready(port); i < n; i++ )
 {
 if ( port->txbufc == port->txbufp )
@@ -117,6 +129,8 @@ static void __serial_putc(struct serial_port *port, char c)
 cpu_relax();
 if ( n > 0 )
 {
+/* Enable TX before sending chars */
+port->driver->start_tx(port);

[Xen-devel] [linux-linus test] 32066: regressions - trouble: broken/fail/pass

2014-12-04 Thread xen . org
flight 32066 linux-linus real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/32066/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-rumpuserxen-i386  8 guest-start   fail REGR. vs. 31241
 test-amd64-i386-xl   11 guest-saverestore fail REGR. vs. 31241
 test-amd64-i386-xl-multivcpu 11 guest-saverestore fail REGR. vs. 31241
 test-amd64-i386-xl-credit2   11 guest-saverestore fail REGR. vs. 31241
 test-amd64-amd64-rumpuserxen-amd64  8 guest-start fail REGR. vs. 31241
 test-amd64-i386-xl1 STARTINGrunning in 32019 [st=running!]
 test-amd64-i386-xl-credit2broken   in 32019
 test-amd64-i386-qemuu-rhel6hvm-intel 1 STARTING running in 32019 [st=running!]
 test-amd64-i386-xl-win7-amd64  blocked in 32019
 test-amd64-amd64-xl-qemut-winxpsp3  2 STARTING  running in 32019 [st=running!]
 test-amd64-i386-xl-qemut-debianhvm-amd64broken in 32019
 test-amd64-amd64-xl-sedf-pin  1 build-check(1)  running in 32019 [st=running!]
 test-amd64-amd64-xl-sedf  1 build-check(1)  running in 32019 [st=running!]
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1broken in 32019
 test-amd64-i386-xl-qemut-winxpsp3  blocked in 32019
 test-amd64-amd64-xl-winxpsp3  1 build-check(1)  running in 32019 [st=running!]

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-libvirt  5 xen-bootfail pass in 32019
 test-amd64-amd64-xl   3 host-install(3)   broken pass in 32019
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 3 host-install(3) broken pass in 
32019
 test-amd64-amd64-xl-qemut-win7-amd64  3 host-install(3)   broken pass in 32019
 test-amd64-i386-rhel6hvm-intel  5 xen-boot fail in 32019 pass in 32066
 test-amd64-amd64-xl-qemut-debianhvm-amd64 2 hosts-allocate broken in 32019 
pass in 32066
 test-amd64-amd64-xl-pcipt-intel 2 hosts-allocate broken in 32019 pass in 32066

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl   9 guest-start  fail   like 31241
 test-amd64-i386-freebsd10-i386  7 freebsd-install  fail like 31241
 test-amd64-i386-freebsd10-amd64  7 freebsd-install fail like 31241
 test-amd64-amd64-xl-pcipt-intel  3 host-install(3)  broken REGR. vs. 31241
 test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail like 31241
 test-amd64-amd64-xl-qemuu-winxpsp3  7 windows-install  fail like 31241

Tests which did not succeed, but are not blocking:
 test-amd64-i386-libvirt   9 guest-start  fail   never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-amd64-libvirt  9 guest-start  fail   never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3  14 guest-stop   fail   never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-armhf-armhf-libvirt  9 guest-start   fail in 32019 never pass
 test-amd64-i386-libvirt   1 build-check(1)blocked in 32019 n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1) blocked in 32019 n/a
 test-amd64-i386-qemut-rhel6hvm-intel  1 build-check(1)blocked in 32019 n/a
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stopfail in 32019 never pass

version targeted for testing:
 linux3a18ca061311f2f1ee9c44012f89c7436d392117
baseline version:
 linux9f76628da20f96a179ca62b504886f99ecc29223


700 people touched revisions under test,
not listing them all


jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt 

Re: [Xen-devel] RFC: Cleaning up the Mini-OS namespace

2014-12-04 Thread Antti Kantee

On 04/12/14 14:40, Andrew Cooper wrote:

There are already-identified issues such as MiniOS leaking things like
ARRAY_SIZE() into linked namespaces, which I havn't yet had enough tuits
to fix.

I think splitting things like the stub libc away from the "MiniOS Xen
Framework" is also a good idea.  Ideally, the result of a "MiniOS Build"
would be a small set of .a's which can then be linked against some
normal C to make a minios guest.  (How feasible this is in reality
remains to be seen.)


I've become increasingly convinced that we (rump kernels) would like to 
use MiniOS as a small firmware library that just takes care of bootstrap 
and provides a high-level interface to the Xen hypervisor.


A componentized MiniOS is not critical from our perspective, as long as 
you can can compile things out.  We always want the minimal version, and 
can use build flags to produce it.  Notably, thanks to recent work by 
Martin, MiniOS is already compiled to a .o in the rumprun-xen 
repository, and then just linked into the final image.


What is critical for us, however, is reducing the amount of support 
routines needed by MiniOS.  Currently, the software stack in rumprun-xen 
is confusing because MiniOS partially uses libc which runs on top of the 
rump kernel which runs on top of MiniOS...  This confusion springs from 
MiniOS providing its own libc, and while it's ok for standalone MiniOS 
to use its own libc, we do not.  To make things as simple as possible, I 
don't want our [compiled] version of MiniOS to depend on anything from libc.


So, while I agree with everyone that merging is a good idea, the realist 
in me remains in doubt just like you do; is it feasible to both trim 
MiniOS to be minimal enough for our needs and keep it maximal enough for 
yours?  Or does that result in too many ifdef noodles?



From a not-public-API point of view, all you have to worry about is that

the existing minios stuff in xen.git, including the stubdom stuff,
continues to work.  We have never made any guarantees to anyone using
minios out-of-tree.


I wonder if work is minimized if we attempt to merge before or after we 
(I?) take the carving knife for a second round in the rumprun-xen repo 
to minimize MiniOS to run only on top of itself.


  - antti

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] Question about arch/x86/xen/mmu.c

2014-12-04 Thread Jan-Simon Moeller
Hi !

My name is Jan-Simon Moeller and I'm looking into compiling the kernel with 
LLVM/Clang (see llvm.linuxfoundation.org) . 

Right now we face this issue when compiling with clang:

 CC  arch/x86/xen/mmu.o
arch/x86/xen/mmu.c:1343:18: error: fields must have a constant size:
 'variable length array in structure' extension will never be
 supported
   DECLARE_BITMAP(mask, num_processors);
  ^
include/linux/types.h:10:16: note: expanded from macro 'DECLARE_BITMAP'
   unsigned long name[BITS_TO_LONGS(bits)]
 ^
1 error generated.


Question to the experts: why can't we just use NR_CPUS and be done with it ?
NR_CPUS will be setup by CONFIG_NR_CPUS and thus static.
( e.g. arch/x86/configs/x86_64_defconfig:CONFIG_NR_CPUS=64 )


The code in question is:

static void xen_flush_tlb_others(const struct cpumask *cpus,
 struct mm_struct *mm, unsigned long start,
 unsigned long end)
{
struct {
struct mmuext_op op;
#ifdef CONFIG_SMP
DECLARE_BITMAP(mask, num_processors);
#else
DECLARE_BITMAP(mask, NR_CPUS);
#endif
} *args;
struct multicall_space mcs;

trace_xen_mmu_flush_tlb_others(cpus, mm, start, end);

if (cpumask_empty(cpus))
return; /* nothing to do */

mcs = xen_mc_entry(sizeof(*args));
args = mcs.args;
args->op.arg2.vcpumask = to_cpumask(args->mask);

/* Remove us, and any offline CPUS. */
cpumask_and(to_cpumask(args->mask), cpus, cpu_online_mask);
cpumask_clear_cpu(smp_processor_id(), to_cpumask(args->mask));

args->op.cmd = MMUEXT_TLB_FLUSH_MULTI;
if (end != TLB_FLUSH_ALL && (end - start) <= PAGE_SIZE) {
args->op.cmd = MMUEXT_INVLPG_MULTI;
args->op.arg1.linear_addr = start;
}

MULTI_mmuext_op(mcs.mc, &args->op, 1, NULL, DOMID_SELF);

xen_mc_issue(PARAVIRT_LAZY_MMU);
}


Pointers:
http://www.slideshare.net/linaroorg/lcu14-209-llvm-linux-39165110  # slide 19
http://lwn.net/Articles/441018/
http://stackoverflow.com/questions/14629504/variable-length-array-in-the-middle-of-struct-why-this-c-code-is-valid-for-gcc




Thanks!

Jan-Simon

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] A few EFI code questions

2014-12-04 Thread Roy Franz
On Thu, Dec 4, 2014 at 1:35 AM, Jan Beulich  wrote:
 On 03.12.14 at 22:02,  wrote:
>> Hey,
>>
>> 1) Why is there in EFI code so many functions (e.g. efi_start(),
>>efi_arch_edd(), ...) with local variables declared as a static?
>>Though some of them have also regular local variables. I do not
>>why it was decided that some of them must be the static and
>>some of do not. It is a bit confusing. As I can see there is
>>only one place which have to have local static (place_string()).
>>Other seems to me as thing to save space on the stack but I do
>>not think we need that. According to UEFI spec there will be
>>"128 KiB or more of available stack space" when system runs in
>>boot services mode. It is a lot of space. So, I think we can
>>safely convert most of local static variables to normal local
>>variables. Am I right?
>
> No. Consider what code results when you e.g. make an EFI_GUID
> instance non-static.
>
>> 2) I am going to add EDID support to EFI code. Should it be x86
>>specific code or common one? As I can see EDID is defined as
>>part of GOP so I think that EDID code should be placed in
>>xen/common/efi/boot.c.
>
> Yes.
>
>> 3) Should not we change xen/arch/*/efi/efi-boot.h to
>>xen/arch/*/efi/efi-boot.c? efi-boot.h contains more
>>code than definitions, declarations and short static
>>functions. So, I think that it is more regular *.c file
>>than header file.
>
> That's a matter of taste - I'd probably have made it .c too, but
> didn't mind it being .h as done by Roy (presumably on the basis
> that #include directives are preferred to have .h files as their
> operands). The only thing I regret is that I didn't ask for the
> pointless efi- prefix to be dropped.

I don't mind a change here, and I agree that it is more like a .c file
than a .h.  If a name change is done, is it worth dropping the "efi-" at
the same time?


>
> Jan
>

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5] xen/arm: Correct the opcode for BUG_INSTR on arm32

2014-12-04 Thread Konrad Rzeszutek Wilk
On Thu, Dec 04, 2014 at 07:26:55PM +, Julien Grall wrote:
> A 0 was forgotten when the arm32 BUG instruction opcode has been added in 
> commit
> 3e802c6ca1fb9a9549258c2855a57cad483f3cbd "xen/arm: Correctly support WARN_ON".
> 
> This will result to use a valid instruction (mcreq 0, 3, r0, cr15, cr0, {7}),
> and inhibit usage of BUG/WARN_ON and co.
> 
> Signed-off-by: Julien Grall 
> 
> ---
> 
> Not sure, why I dropped the 0 when I implemented the patch...
> This is a bug fixed for Xen 4.5. This is only affected ARM32 where the
> BUG opcode was malformed.
> 
> With the malformed opcode, the ASSERT/BUG_ON is skipped and the
> processor may execute another patch (because the compiler has optimized

s/patch/path/ ?

> due the unreachable in both macro).
> 
> The code modified is only executed when Xen is in bad state.

Release-Acked-by: Konrad Rzeszutek Wilk  

> ---
>  xen/include/asm-arm/arm32/bug.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/include/asm-arm/arm32/bug.h b/xen/include/asm-arm/arm32/bug.h
> index 155b420..3e66f35 100644
> --- a/xen/include/asm-arm/arm32/bug.h
> +++ b/xen/include/asm-arm/arm32/bug.h
> @@ -6,7 +6,7 @@
>  /* ARMv7 provides a list of undefined opcode (see A8.8.247 DDI 0406C.b)
>   * Use one them encoding A1 to go in exception mode
>   */
> -#define BUG_OPCODE  0xe7f00f0
> +#define BUG_OPCODE  0xe7f000f0
>  
>  #define BUG_INSTR ".word " __stringify(BUG_OPCODE)
>  
> -- 
> 2.1.3
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH for-4.5] xen/arm: Correct the opcode for BUG_INSTR on arm32

2014-12-04 Thread Julien Grall
A 0 was forgotten when the arm32 BUG instruction opcode has been added in commit
3e802c6ca1fb9a9549258c2855a57cad483f3cbd "xen/arm: Correctly support WARN_ON".

This will result to use a valid instruction (mcreq 0, 3, r0, cr15, cr0, {7}),
and inhibit usage of BUG/WARN_ON and co.

Signed-off-by: Julien Grall 

---

Not sure, why I dropped the 0 when I implemented the patch...
This is a bug fixed for Xen 4.5. This is only affected ARM32 where the
BUG opcode was malformed.

With the malformed opcode, the ASSERT/BUG_ON is skipped and the
processor may execute another patch (because the compiler has optimized
due the unreachable in both macro).

The code modified is only executed when Xen is in bad state.
---
 xen/include/asm-arm/arm32/bug.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/asm-arm/arm32/bug.h b/xen/include/asm-arm/arm32/bug.h
index 155b420..3e66f35 100644
--- a/xen/include/asm-arm/arm32/bug.h
+++ b/xen/include/asm-arm/arm32/bug.h
@@ -6,7 +6,7 @@
 /* ARMv7 provides a list of undefined opcode (see A8.8.247 DDI 0406C.b)
  * Use one them encoding A1 to go in exception mode
  */
-#define BUG_OPCODE  0xe7f00f0
+#define BUG_OPCODE  0xe7f000f0
 
 #define BUG_INSTR ".word " __stringify(BUG_OPCODE)
 
-- 
2.1.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

2014-12-04 Thread Konrad Rzeszutek Wilk
On Thu, Dec 04, 2014 at 02:31:11PM +, David Vrabel wrote:
> On 04/12/14 14:09, Sander Eikelenboom wrote:
> > 
> > Thursday, December 4, 2014, 2:43:06 PM, you wrote:
> > 
> >> On 04/12/14 13:10, Sander Eikelenboom wrote:
> >>>
> >>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
> >>>
>  On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
> >
> > On Dec 4, 2014 6:30 AM, David Vrabel  wrote:
> >>
> >> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
> >>>
> >>> Instead of doing all this complex dance, we depend on the toolstack 
> >>> doing the right thing. As such implement the 'do_flr' SysFS attribute 
> >>> which 'xl' uses when a device is detached or attached from/to a 
> >>> guest. 
> >>> It bypasses the need to worry about the PCI lock. 
> >>
> >> No.  Get pciback to add its own "reset" sysfs file (as I have 
> >> repeatedly 
> >> proposed). 
> >>
> >
> > Which does not work as the kobj will complain (as there is already an 
> > 'reset' associated with the PCI device).
> >>>
>  It is only needed if the core won't provide one.
> >>>
>  +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>  +{
>  +   struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>  +   struct device *dev = &pci->dev;
>  +   int ret;
>  +
>  +   /* Already have a per-function reset? */
>  +   if (pci_probe_reset_function(pci) == 0)
>  +   return 0;
>  +
>  +   ret = device_create_file(dev, &dev_attr_reset);
>  +   if (ret < 0)
>  +   return ret;
> >>> +   dev_data->>created_reset_file = true;
>  +   return 0;
>  +}
> >>>
> >>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
> >>> "pci.c:__pci_dev_reset" ?
> >>>
> >>> The problem with that function is that from my testing it seems that the 
> >>> first option "pci_dev_specific_reset" always seems to return succes, so 
> >>> all the
> >>> other options are skipped (flr, pm, slot, bus). However the device it 
> >>> self is 
> >>> not properly reset enough (perhaps the pci_dev_specific_reset is good 
> >>> enough for 
> >>> none virtualization purposes and it's probably the least intrusive. For 
> >>> virtualization however it would be nice to be sure it resets properly, or 
> >>> have a 
> >>> way to force a specific reset routine.)
> > 
> >> Then you need work with the maintainer for those specific devices or
> >> drivers to fix their specific reset function.
> > 
> >> I'm not adding stuff to pciback to workaround broken quirks.
> > 
> > OK that's a pretty clear message there, so if one wants to use pci and vga
> > passthrough one should better use KVM and vfio-pci.
> 
> Have you (or anyone else) ever raised the problem with the broken reset
> quirk for certain devices with the relevant maintainer?
> 
> > vfio-pci has:
> > - logic to do the try-slot-bus-reset logic
> 
> Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
> as well.
> 
> It makes no sense for both pciback and vfio-pci to workaround problems
> with pci_function_reset() in different ways -- it should be fixed in the
> core PCI code so both can benefit and make use of the same code.

We seem to be going in circles.

This thread: http://patchwork.ozlabs.org/patch/368668/

has an interesting discussion that pretty much touches all of this
and I believe it ends with David agreeing with Alex that an
bus-reset is a perfect use-case.

I believe the contention was on how to expose this interface
to the user-space. David's was thinking to use 'reset' while
I used 'do_flr' (which is a misleading name but hte toolstack
already does it). Perhaps we should just have (as David suggested)
an 'bus_reset' on the devices.

I am going to go on a limb and presume that David was thinking
that this 'bus_reset' would be exposed in the generic PCI land?

> 
> David
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [linux-next test] 32065: tolerable FAIL

2014-12-04 Thread xen . org
flight 32065 linux-next real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/32065/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-xl   11 guest-saverestore   fail baseline untested
 test-amd64-i386-xl-credit29 guest-start fail baseline untested
 test-amd64-i386-xl-multivcpu  9 guest-start fail baseline untested
 test-amd64-amd64-xl   9 guest-start fail baseline untested
 test-amd64-amd64-xl-sedf 13 guest-saverestore.2 fail baseline untested
 test-amd64-amd64-xl-sedf-pin 12 guest-localmigrate  fail baseline untested
 test-armhf-armhf-xl  12 guest-start.2   fail baseline untested
 test-amd64-amd64-pair16 guest-start/debian  fail baseline untested
 test-amd64-i386-rumpuserxen-i386  5 xen-bootfail baseline untested
 test-amd64-amd64-rumpuserxen-amd64  8 guest-start   fail baseline untested
 test-amd64-i386-pair 17 guest-migrate/src_host/dst_host fail baseline untested
 test-amd64-i386-xl-qemuu-win7-amd64  7 windows-install  fail baseline untested
 test-amd64-i386-xl-winxpsp3   7 windows-install fail baseline untested
 test-amd64-i386-xl-qemuu-winxpsp3  7 windows-installfail baseline untested
 test-amd64-amd64-xl-win7-amd64  7 windows-install   fail baseline untested
 test-amd64-i386-xl-winxpsp3-vcpus1  7 windows-install   fail baseline untested
 test-amd64-amd64-xl-qemuu-winxpsp3  7 windows-install   fail baseline untested
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 7 windows-install fail baseline 
untested
 test-amd64-amd64-xl-qemuu-win7-amd64  7 windows-install fail baseline untested
 test-amd64-i386-xl-win7-amd64  7 windows-installfail baseline untested
 test-amd64-amd64-xl-qemut-winxpsp3  7 windows-install   fail baseline untested
 test-amd64-amd64-xl-winxpsp3  7 windows-install fail baseline untested

Tests which did not succeed, but are not blocking:
 test-amd64-i386-libvirt   9 guest-start  fail   never pass
 test-amd64-amd64-libvirt  9 guest-start  fail   never pass
 test-armhf-armhf-libvirt  9 guest-start  fail   never pass
 test-amd64-i386-freebsd10-amd64  7 freebsd-install fail never pass
 test-amd64-i386-freebsd10-i386  7 freebsd-install  fail never pass
 test-armhf-armhf-xl  10 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass

version targeted for testing:
 linuxe85db7e77c6801b854864ee65c264be164dd02c1
baseline version:
 linux7a5a4f978750756755dc839014e13d1b088ccc8e

jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  pass
 build-i386-rumpuserxen   pass
 test-amd64-amd64-xl  fail
 test-armhf-armhf-xl  fail
 test-amd64-i386-xl   fail
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64  fail
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass
 test-amd64-amd64-rumpuserxen-amd64   fail
 test-amd64-amd64-xl-qemut-win7-amd64 fail
 test-amd64-i386-xl-qemut-win7-amd64  fail
 test-amd64-amd64

[Xen-devel] [libvirt test] 32083: regressions - FAIL

2014-12-04 Thread xen . org
flight 32083 libvirt real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/32083/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-i386-libvirt5 libvirt-build fail REGR. vs. 32005
 build-amd64-libvirt   5 libvirt-build fail REGR. vs. 32005
 build-armhf-libvirt   5 libvirt-build fail REGR. vs. 32005

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  bc3b5681039ce564e543b7e4d2bdc9acdd46f386
baseline version:
 libvirt  ff018e686a8a412255bc34d3dc558a1bcf74fac5


People who touched revisions under test:
  Daniel Hansel 
  Dmitry Guryanov 
  John Ferlan 
  Ján Tomko 
  Laine Stump 
  Luyao Huang 
  Martin Kletzander 
  Michal Privoznik 
  Nehal J Wani 
  Pavel Hrdina 
  Peter Krempa 
  Shanzhi Yu 
  Wang Rui 


jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  fail
 build-armhf-libvirt  fail
 build-i386-libvirt   fail
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt blocked
 test-armhf-armhf-libvirt blocked
 test-amd64-i386-libvirt  blocked



sg-report-flight on osstest.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images

Logs, config files, etc. are available at
http://www.chiark.greenend.org.uk/~xensrcts/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 491 lines long.)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5] Fixes for PCI backend for 3.19.

2014-12-04 Thread Konrad Rzeszutek Wilk
On Thu, Dec 04, 2014 at 03:46:42PM +, David Vrabel wrote:
> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
> > 
> > These patches fix some issues with PCI back and also add proper
> > bus/slot reset.
> 
> Applied 1-8 to devel/for-linus-3.19.  I did not apply #9.

Excellent. Thank you!
> 
> Thanks.
> 
> David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] PVH cleanups after 4.5

2014-12-04 Thread Tim Deegan
Hi all,

At the Hackathon in Rackspace we discussed a plan to tidy up the
hypervisor side of PVH code so that 'PVH' stops being a separate guest
type, becoming instead a HVM guest with various features disabled (and
one or two other tweaks). 

Although I had hoped to work on that in the meantime, various other
things have got in the way.  But I'd like to at least document the
plan in the hope of getting back on track in the 4.6 development
cycle.

The plan goes something like this:

1. Merge the PVH and HVM hypercall tables.  Patches were already
   posted for this but will need rebasing, and probably more scrutiny.

2. Make PVH-ness a flag on a HVM guest, retaining all the current
   has_hvm_container/is_pvh_domain tests as-is but dropping the
   three-way guest type.

3. Add feature flags to HVM guests that disable various features. 
   These flags should be set once, at domain creation/build, and
   for sanity of testing we should only allow two combinations
   of settings, corresponding to the current HVM and PVH types.
   Make sure that all PVH guests have the PVH feature set.

4. Replace tests for pvh-ness with feature-flag tests wherever
   possible.

5. Fix any outstanding is_pvh tests -- expect this mostly to be 
   feature compatibility with other HVM features (e.g. paging).
   Hopefully we can remove those constraints, or make them constraints
   on a feature set (e.g. can't have PCI passthrough w/out an
   emulated PCI controller).  This also needs an audit of is_hvm tests,
   which are implicitly !is_pvh at the moment.

6. Remove 'PVH' from the hypercall interface, and remove the PVH flag
   from the HVM domain struct.  At this point Xen no longer treats PVH
   and HVM as different (though the tools and the guests themselves
   can maintain the distinction).

Potential feature flags, based on whiteboard notes at the session.
Things that are 'Yes' in both columns might not need actual flags :)

 'HVM'   'PVH'
64bit hypercalls  Yes Yes
32bit hypercalls  Yes No
Paging assistance Yes Yes
ioreq-servers Yes No
HVM-style CPUID   Yes Yes
Interrupt controllers Yes No ([x2]apic, ioapic, pic &c)
TimersYes No (rtc, hpet, pit, pmtimer)
Machine Check regsYes Yes
Emulated PCI  Yes No (PVH to use pcifront?)

This plan doesn't explicitly add things that we might like to happen
for PVH in 4.6 (e.g. PCI passthrough, compat hypercalls) but it ought
to make some of them easier, and we might get some (e.g. shadow
pagetables) for free as the differences between PVH and HVM go away.

I would like to work on this stuff, but I can't really guarantee to
get anything done for 4.6 in the time I have available.  Any
volunteers to help out would be welcomed!  Likewise, any feedback on
the overall plan is welcome before any more work gets done. :)

Cheers,

Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data

2014-12-04 Thread Jan Beulich
>>> On 04.12.14 at 17:26,  wrote:
> On 12/04/2014 06:55 AM, Dario Faggioli wrote:
>> On Tue, 2014-12-02 at 16:34 -0500, Boris Ostrovsky wrote:
>>> This change requires bumping XEN_SYSCTL_INTERFACE_VERSION
>>>
>> Which would not be the case if we take the approach of adding a new,
>> iotopology specific, hcall, would it?
> 
> I would think that any changes to a public interface, including adding a 
> new function, require new version.

No, additions of new sub-ops don't require the version to be bumped.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xmalloc: add support for checking the pool integrity

2014-12-04 Thread Mihai Donțu
On Thursday 04 December 2014 19:01:40 Mihai Donțu wrote:
> Implemented xmem_pool_check(), xmem_pool_check_locked() and
> xmem_pool_check_unlocked() to verity the integrity of the TLSF matrix.
> 
> Signed-off-by: Mihai Donțu 
> ---
>  xen/common/xmalloc_tlsf.c | 119 
> +-
>  xen/include/xen/xmalloc.h |   7 +++
>  2 files changed, 124 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/common/xmalloc_tlsf.c b/xen/common/xmalloc_tlsf.c
> index a5769c9..009ba60 100644
> --- a/xen/common/xmalloc_tlsf.c
> +++ b/xen/common/xmalloc_tlsf.c
> @@ -120,9 +120,120 @@ struct xmem_pool {
>  char name[MAX_POOL_NAME_LEN];
>  };
>  
> +static struct xmem_pool *xenpool;
> +
> +static inline void MAPPING_INSERT(unsigned long r, int *fl, int *sl);
> +
>  /*
>   * Helping functions
>   */
> +#ifndef NDEBUG
> +static int xmem_pool_check_size(const struct bhdr *b, int fl, int sl)
> +{
> +while ( b )
> +{
> +int __fl;
> +int __sl;
> +
> +MAPPING_INSERT(b->size, &__fl, &__sl);
> +if ( __fl != fl || __sl != sl )
> +{
> +printk(XENLOG_ERR "xmem_pool: for block %p size = %u, { fl = %d, 
> sl = %d } should be { fl = %d, sl = %d }\n", b, b->size, fl, sl, __fl, __sl);
> +return 0;
> +}
> +b = b->ptr.free_ptr.next;
> +}
> +return 1;
> +}
> +
> +/*
> + * This function must be called from a context where pool->lock is
> + * already acquired
> + */
> +#define xmem_pool_check_unlocked(__pool) 
> __xmem_pool_check_unlocked(__FILE__, __LINE__, __pool)
> +static int __xmem_pool_check_unlocked(const char *file, int line, const 
> struct xmem_pool *pool)
> +{
> +int i;
> +int woops = 0;
> +static int once = 1;
> +
> +for ( i = 0; i < REAL_FLI; i++ )
> +{
> +int fl = ( pool->fl_bitmap & (1 << i) ) ? i : -1;
> +
> +if ( fl >= 0 )
> +{
> +int j;
> +int bitmap_empty = 1;
> +int matrix_empty = 1;
> +
> +for ( j = 0; j < MAX_SLI; j++ )
> +{
> +int sl = ( pool->sl_bitmap[fl] & (1 << j) ) ? j : -1;
> +
> +if ( sl < 0 )
> +continue;
> +
> +if ( once && !pool->matrix[fl][sl] )
> +{
> +/* The bitmap is corrupted */
> +printk(XENLOG_ERR "xmem_pool:%s:%d the TLSF bitmap is 
> corrupted\n", file, line);
> +__warn((char *)file, line);
> +once = 0;
> +woops = 1;
> +}
> +else if ( once && 
> !xmem_pool_check_size(pool->matrix[fl][sl], fl, sl))
> +{
> +printk(XENLOG_ERR "xmem_pool:%s:%d the TLSF chunk matrix 
> is corrupted\n", file, line);
> +__warn((char *)file, line);
> +once = 0;
> +woops = 1;
> +}
> +if ( pool->matrix[fl][sl] )
> +matrix_empty = 0;
> +bitmap_empty = 0;
> +}
> +if ( once && bitmap_empty )
> +{
> +/* The bitmap is corrupted */
> +printk(XENLOG_ERR "xmem_pool:%s:%d the TLSF bitmap is 
> corrupted (non-empty FL with empty SL)\n", file, line);
> +__warn((char *)file, line);
> +once = 0;
> +woops = 1;
> +}
> +if ( once && matrix_empty )
> +{
> +/* The bitmap is corrupted */
> +printk(XENLOG_ERR "xmem_pool:%s:%d the TLSF bitmap is 
> corrupted (empty matrix)\n", file, line);
> +__warn((char *)file, line);
> +once = 0;
> +woops = 1;
> +}
> +}
> +}
> +
> +return woops;
> +}
> +
> +#define xmem_pool_check_locked(__pool) __xmem_pool_check_locked(__FILE__, 
> __LINE__, __pool)
> +static int __xmem_pool_check_locked(const char *file, int line, struct 
> xmem_pool *pool)
> +{
> +int err;
> +
> +spin_lock(&pool->lock);
> +err = __xmem_pool_check_unlocked(file, line, pool);
> +spin_unlock(&pool->lock);
> +return err;
> +}
> +
> +int __xmem_pool_check(const char *file, int line)
> +{
> +return __xmem_pool_check_locked(file, line, xenpool);
> +}
> +#else
> +#define xmem_pool_check_locked(__pool) do { if ( 0 && (__pool) ); } while (0)
> +#define xmem_pool_check_unlocked(__pool) do { if ( 0 && (__pool) ); } while 
> (0)
> +#endif
>  
>  /**
>   * Returns indexes (fl, sl) of the list used to serve request of size r
> @@ -381,6 +492,8 @@ void *xmem_pool_alloc(unsigned long size, struct 
> xmem_pool *pool)
>  int fl, sl;
>  unsigned long tmp_size;
>  
> +xmem_pool_check_locked(pool);
> +
>  if ( pool->init_region == NULL )
>  {
>  if ( (region = pool->get_mem(pool->init_size)) == NULL )
> @@ -442,11 +555,13 @@ void *xmem_pool_alloc(unsigned lon

Re: [Xen-devel] [v8][PATCH 16/17] xen/vtd: group assigned device with RMRR

2014-12-04 Thread Jan Beulich
>>> On 01.12.14 at 10:24,  wrote:
> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -572,10 +572,11 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
>  {
>  struct acpi_dmar_reserved_memory *rmrr =
>  container_of(header, struct acpi_dmar_reserved_memory, header);
> -struct acpi_rmrr_unit *rmrru;
> +struct acpi_rmrr_unit *rmrru, *cur_rmrr;
>  void *dev_scope_start, *dev_scope_end;
>  u64 base_addr = rmrr->base_address, end_addr = rmrr->end_address;
>  int ret;
> +static unsigned int group_id = 0;

__initdata. Pointless initializer.

> @@ -682,7 +685,30 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
>  "So please set pci_rdmforce to reserve these ranges"
>  " if you need such a device in hotplug case.\n");
>  
> +list_for_each_entry(cur_rmrr, &acpi_rmrr_units, list)
> +{
> +/*
> + * Any same or overlap range mean they should be
> + * at same group.
> + */
> +if ( ((base_addr >= cur_rmrr->base_address) &&
> + (end_addr <= cur_rmrr->end_address)) ||
> + ((base_addr <= cur_rmrr->base_address) &&
> + (end_addr >= cur_rmrr->end_address)) )

This is both more complicated than needed and wrong. You want
an overlap (partial or complete doesn't matter) check, i.e.
start1 <= end2 && start2 <= end1.

> +{
> +rmrru->gid = cur_rmrr->gid;
> +continue;

break

Also this doesn't seem to handle cases where you see in this order

[2,3]
[4,6]
[3,5]

But the more fundamental question is: Are overlaps of RMRRs
actually allowed, or would it not better to bail in that case and
leave the IOMMU disabled?

But the code further down looks so broken that I'll leave to you
to discuss this with your colleagues acting as VT-d maintainers.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH] xmalloc: add support for checking the pool integrity

2014-12-04 Thread Mihai Donțu
Implemented xmem_pool_check(), xmem_pool_check_locked() and
xmem_pool_check_unlocked() to verity the integrity of the TLSF matrix.

Signed-off-by: Mihai Donțu 
---
 xen/common/xmalloc_tlsf.c | 119 +-
 xen/include/xen/xmalloc.h |   7 +++
 2 files changed, 124 insertions(+), 2 deletions(-)

diff --git a/xen/common/xmalloc_tlsf.c b/xen/common/xmalloc_tlsf.c
index a5769c9..009ba60 100644
--- a/xen/common/xmalloc_tlsf.c
+++ b/xen/common/xmalloc_tlsf.c
@@ -120,9 +120,120 @@ struct xmem_pool {
 char name[MAX_POOL_NAME_LEN];
 };
 
+static struct xmem_pool *xenpool;
+
+static inline void MAPPING_INSERT(unsigned long r, int *fl, int *sl);
+
 /*
  * Helping functions
  */
+#ifndef NDEBUG
+static int xmem_pool_check_size(const struct bhdr *b, int fl, int sl)
+{
+while ( b )
+{
+int __fl;
+int __sl;
+
+MAPPING_INSERT(b->size, &__fl, &__sl);
+if ( __fl != fl || __sl != sl )
+{
+printk(XENLOG_ERR "xmem_pool: for block %p size = %u, { fl = %d, 
sl = %d } should be { fl = %d, sl = %d }\n", b, b->size, fl, sl, __fl, __sl);
+return 0;
+}
+b = b->ptr.free_ptr.next;
+}
+return 1;
+}
+
+/*
+ * This function must be called from a context where pool->lock is
+ * already acquired
+ */
+#define xmem_pool_check_unlocked(__pool) __xmem_pool_check_unlocked(__FILE__, 
__LINE__, __pool)
+static int __xmem_pool_check_unlocked(const char *file, int line, const struct 
xmem_pool *pool)
+{
+int i;
+int woops = 0;
+static int once = 1;
+
+for ( i = 0; i < REAL_FLI; i++ )
+{
+int fl = ( pool->fl_bitmap & (1 << i) ) ? i : -1;
+
+if ( fl >= 0 )
+{
+int j;
+int bitmap_empty = 1;
+int matrix_empty = 1;
+
+for ( j = 0; j < MAX_SLI; j++ )
+{
+int sl = ( pool->sl_bitmap[fl] & (1 << j) ) ? j : -1;
+
+if ( sl < 0 )
+continue;
+
+if ( once && !pool->matrix[fl][sl] )
+{
+/* The bitmap is corrupted */
+printk(XENLOG_ERR "xmem_pool:%s:%d the TLSF bitmap is 
corrupted\n", file, line);
+__warn((char *)file, line);
+once = 0;
+woops = 1;
+}
+else if ( once && !xmem_pool_check_size(pool->matrix[fl][sl], 
fl, sl))
+{
+printk(XENLOG_ERR "xmem_pool:%s:%d the TLSF chunk matrix 
is corrupted\n", file, line);
+__warn((char *)file, line);
+once = 0;
+woops = 1;
+}
+if ( pool->matrix[fl][sl] )
+matrix_empty = 0;
+bitmap_empty = 0;
+}
+if ( once && bitmap_empty )
+{
+/* The bitmap is corrupted */
+printk(XENLOG_ERR "xmem_pool:%s:%d the TLSF bitmap is 
corrupted (non-empty FL with empty SL)\n", file, line);
+__warn((char *)file, line);
+once = 0;
+woops = 1;
+}
+if ( once && matrix_empty )
+{
+/* The bitmap is corrupted */
+printk(XENLOG_ERR "xmem_pool:%s:%d the TLSF bitmap is 
corrupted (empty matrix)\n", file, line);
+__warn((char *)file, line);
+once = 0;
+woops = 1;
+}
+}
+}
+
+return woops;
+}
+
+#define xmem_pool_check_locked(__pool) __xmem_pool_check_locked(__FILE__, 
__LINE__, __pool)
+static int __xmem_pool_check_locked(const char *file, int line, struct 
xmem_pool *pool)
+{
+int err;
+
+spin_lock(&pool->lock);
+err = __xmem_pool_check_unlocked(file, line, pool);
+spin_unlock(&pool->lock);
+return err;
+}
+
+int __xmem_pool_check(const char *file, int line)
+{
+return __xmem_pool_check_locked(file, line, xenpool);
+}
+#else
+#define xmem_pool_check_locked(__pool) do { if ( 0 && (__pool) ); } while (0)
+#define xmem_pool_check_unlocked(__pool) do { if ( 0 && (__pool) ); } while (0)
+#endif
 
 /**
  * Returns indexes (fl, sl) of the list used to serve request of size r
@@ -381,6 +492,8 @@ void *xmem_pool_alloc(unsigned long size, struct xmem_pool 
*pool)
 int fl, sl;
 unsigned long tmp_size;
 
+xmem_pool_check_locked(pool);
+
 if ( pool->init_region == NULL )
 {
 if ( (region = pool->get_mem(pool->init_size)) == NULL )
@@ -442,11 +555,13 @@ void *xmem_pool_alloc(unsigned long size, struct 
xmem_pool *pool)
 
 pool->used_size += (b->size & BLOCK_SIZE_MASK) + BHDR_OVERHEAD;
 
+xmem_pool_check_unlocked(pool);
 spin_unlock(&pool->lock);
 return (void *)b->ptr.buffer;
 
 /* Failed alloc */
  out_locked:
+xmem_pool_check_unlocked(pool);
 spin_unlock(&pool->lock);
 
  out:
@@ -464,6 +579,7 @@ void xmem_pool_free(void *p

Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

2014-12-04 Thread Alex Williamson
On Thu, 2014-12-04 at 17:25 +0100, Sander Eikelenboom wrote:
> Thursday, December 4, 2014, 4:39:06 PM, you wrote:
> 
> > On Thu, 2014-12-04 at 15:50 +0100, Sander Eikelenboom wrote:
> >> Thursday, December 4, 2014, 3:31:11 PM, you wrote:
> >> 
> >> > On 04/12/14 14:09, Sander Eikelenboom wrote:
> >> >> 
> >> >> Thursday, December 4, 2014, 2:43:06 PM, you wrote:
> >> >> 
> >> >>> On 04/12/14 13:10, Sander Eikelenboom wrote:
> >> 
> >>  Thursday, December 4, 2014, 1:24:47 PM, you wrote:
> >> 
> >> > On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
> >> >>
> >> >> On Dec 4, 2014 6:30 AM, David Vrabel  
> >> >> wrote:
> >> >>>
> >> >>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
> >> 
> >>  Instead of doing all this complex dance, we depend on the 
> >>  toolstack 
> >>  doing the right thing. As such implement the 'do_flr' SysFS 
> >>  attribute 
> >>  which 'xl' uses when a device is detached or attached from/to a 
> >>  guest. 
> >>  It bypasses the need to worry about the PCI lock. 
> >> >>>
> >> >>> No.  Get pciback to add its own "reset" sysfs file (as I have 
> >> >>> repeatedly 
> >> >>> proposed). 
> >> >>>
> >> >>
> >> >> Which does not work as the kobj will complain (as there is already 
> >> >> an 'reset' associated with the PCI device).
> >> 
> >> > It is only needed if the core won't provide one.
> >> 
> >> > +static int pcistub_try_create_reset_file(struct pci_dev *pci)
> >> > +{
> >> > +   struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
> >> > +   struct device *dev = &pci->dev;
> >> > +   int ret;
> >> > +
> >> > +   /* Already have a per-function reset? */
> >> > +   if (pci_probe_reset_function(pci) == 0)
> >> > +   return 0;
> >> > +
> >> > +   ret = device_create_file(dev, &dev_attr_reset);
> >> > +   if (ret < 0)
> >> > +   return ret;
> >>  +   dev_data->>created_reset_file = true;
> >> > +   return 0;
> >> > +}
> >> 
> >>  Wouldn't the "core-reset-sysfs-file" be still wired to the end up 
> >>  calling 
> >>  "pci.c:__pci_dev_reset" ?
> >> 
> >>  The problem with that function is that from my testing it seems that 
> >>  the 
> >>  first option "pci_dev_specific_reset" always seems to return succes, 
> >>  so all the
> >>  other options are skipped (flr, pm, slot, bus). However the device it 
> >>  self is 
> >>  not properly reset enough (perhaps the pci_dev_specific_reset is good 
> >>  enough for 
> >>  none virtualization purposes and it's probably the least intrusive. 
> >>  For 
> >>  virtualization however it would be nice to be sure it resets 
> >>  properly, or have a 
> >>  way to force a specific reset routine.)
> >> >> 
> >> >>> Then you need work with the maintainer for those specific devices or
> >> >>> drivers to fix their specific reset function.
> >> >> 
> >> >>> I'm not adding stuff to pciback to workaround broken quirks.
> >> >> 
> >> >> OK that's a pretty clear message there, so if one wants to use pci and 
> >> >> vga
> >> >> passthrough one should better use KVM and vfio-pci.
> >> 
> >> > Have you (or anyone else) ever raised the problem with the broken reset
> >> > quirk for certain devices with the relevant maintainer?
> >> 
> >> >> vfio-pci has:
> >> >> - logic to do the try-slot-bus-reset logic
> >> 
> >> > Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
> >> > as well.
> >> 
> >> Depends on what you call an "incorrect fix" .. it fixes a quirk .. 
> >> you can say that's incorrect, but then you would have to remove 50% of
> >> the kernel and Xen code as well.
> >> 
> >> (i do in general agree it's better to strive for a generic solution though,
> >> that's exactly why i brought up that that function doesn't seem to work 
> >> perfect
> >> for virtualization purposes) 
> >> 
> >> > It makes no sense for both pciback and vfio-pci to workaround problems
> >> > with pci_function_reset() in different ways -- it should be fixed in the
> >> > core PCI code so both can benefit and make use of the same code.
> >> 
> >> Well perhaps Bjorn knows why the order of resets and skipping the rest as
> >> implemented in "pci.c:__pci_dev_reset" was implemented in that way ?
> >> 
> >> Especially what is the expectation about pci_dev_specific_reset doing a 
> >> proper 
> >> reset for say a vga-card:
> >> - i know it doesn't work on a radeon card (doesn't blank screen, on next 
> >> guest 
> >>   boot reports it's already posted, powermanagement doesn't work).
> >> - while with a slot/bus reset, that all just works fine, screen blanks 
> >>   immediately and everything else also works.
> >> 
> >> Added Alex as well since he added this workaround for KVM/vfio-pci, 
> >> perhaps he knows why
> >> he introduced 

Re: [Xen-devel] [v8][PATCH 13/17] xen/mem_access: don't allow accessing reserved device memory

2014-12-04 Thread Jan Beulich
>>> On 01.12.14 at 10:24,  wrote:
> --- a/xen/common/mem_access.c
> +++ b/xen/common/mem_access.c
> @@ -55,6 +55,43 @@ void mem_access_resume(struct domain *d)
>  }
>  }
>  
> +/* We can't expose reserved device memory. */
> +static int mem_access_check_rdm(struct domain *d, uint64_aligned_t start,
> +uint32_t nr)
> +{
> +uint32_t i;
> +struct p2m_get_reserved_device_memory pgrdm;
> +int rc = 0;
> +
> +if ( !is_hardware_domain(d) && iommu_use_hap_pt(d) )

Why?

> +{
> +for ( i = 0; i < nr; i++ )
> +{
> +pgrdm.gfn = start + i;
> +pgrdm.domain = d;
> +rc = 
> iommu_get_reserved_device_memory(p2m_check_reserved_device_memory,
> +  &pgrdm);
> +if ( rc < 0 )
> +{
> +printk(XENLOG_WARNING
> +   "Domain %d can't check reserved device memory.\n",

If I saw this text in a log file, it wouldn't mean anything to me.
Additionally this is only partly useful without also listing the
offending domain (which isn't d afaict) and the GFN.

> +   d->domain_id);
> +return rc;
> +}
> +
> +if ( rc == 1 )
> +{
> +printk(XENLOG_WARNING
> +   "Domain %d: we shouldn't mem_access reserved device 
> memory.\n",

This one's only marginally better than the one above.

> +   d->domain_id);
> +return rc;
> +}
> +}
> +}
> +
> +return rc;
> +}
> +
>  int mem_access_memop(unsigned long cmd,
>   XEN_GUEST_HANDLE_PARAM(xen_mem_access_op_t) arg)
>  {
> @@ -99,6 +136,10 @@ int mem_access_memop(unsigned long cmd,
>((mao.pfn + mao.nr - 1) > domain_get_maximum_gpfn(d))) )
>  break;
>  
> +rc =  mem_access_check_rdm(d, mao.pfn, mao.nr);
> +if ( rc == 1 )
> +break;

So you decided to return 1 from the hypercall - what is that
supposed to mean to an unaware caller?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 12/17] xen/x86/ept: handle reserved device memory in ept_handle_violation

2014-12-04 Thread Jan Beulich
>>> On 01.12.14 at 10:24,  wrote:
> We always reserve these ranges since we never allow any stuff to
> poke them.
> 
> But in theory some untrusted VM can maliciously access them. So we
> need to intercept this approach. But we just don't want to leak
> anything or introduce any side affect since other OSs may touch them
> by careless behavior, so its enough to have a lightweight way, and
> it shouldn't be same as those broken pages which cause domain crush.

This needs a better explanation: If the devices associated with the
reserved region being touched are assigned to the guest, it is
permitted to touch them. If it touches regions of devices not yet or
not anymore assigned to it, the behavior should match real
hardware: Writes ignored and reads return all ones. I.e. such
accesses should get handed to the DM in that latter case.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen-4.5 HVMOP ABI issues

2014-12-04 Thread Andrew Cooper
On 04/12/14 16:11, Jan Beulich wrote:
 On 04.12.14 at 16:46,  wrote:
>> On 04/12/14 14:55, Jan Beulich wrote:
>> On 04.12.14 at 15:28,  wrote:
 On 04/12/14 13:49, Jan Beulich wrote:
 On 28.11.14 at 16:46,  wrote:
>> On 28/11/14 15:18, Jan Beulich wrote:
>> On 28.11.14 at 14:55,  wrote:
 The problem is with continuations which reuse the upper bits of the
 input register, not with this HVMOP_op_mask specifically; the
 HVMOP_op_mask simply adds to an existing problem.  This is something
 which needs considering urgently because, as you identify above, we 
 have
 already suffered an accidental ABI breakage with the mem-op widening.
>>> Since we can't retroactively fix the mem-op widening, I still don't see
>>> what's urgent here: As long as we don't change any of these masks,
>>> nothing bad is going to happen. Of course one thing to consider with
>>> this aspect in mind is whether to change the hvm-op or gnttab-op
>>> masks again _before_ 4.5 goes out, based on whether we feel they're
>>> wide enough for the (un)foreseeable future.
>> By urgent, I mean exactly this, while we have the ability to tweak the
>> masks.
> With no-one else voicing an opinion:
>
> For hvmop, the mask currently is 8 bits and we've got 22 ops defined.
>
> For gnttabop, the mask currently is 12 bits and we've got 12 ops defined.
>
> For the latter, we're fine even without further consideration. For the
> former, the two operations actively using the continuation encoding
> are tools-only ones. Since we're fine to alter the tools only interfaces,
> and since it was intended for the tools-only HVM-ops to be split off
> to a separate hypercall (e.g. hvmctl) anyway, the range restriction
> would then no longer be a problem. Plus, in the worst case we could
> always introduce yet another hypercall if we ran out of numbers.
 Are you suggesting that we make a new hvmctl now and remove the hvmop
 mask before 4.5?  If we ship 4.5 with the hvmop mask, we cannot
 subsequently remove it even if all continuable hypercalls move to a
 separate hypercall.
>>> Why? We certainly don't guarantee compatibility for undefined
>>> hypercalls to behave in a certain way.
>> A task is in the middle of a hypercall continuation.  The VM is migrated
>> to a newer Xen which has lost the op mask and gained a new hypercall
>> which would alias.
> Impossible: A continuation could be in progress only when we
> actually use the high bits (or else you have nowhere to encode
> it). Operations not using continuations consequently aren't
> susceptible to the mask disappearing.

Ah yes - if nothing guest usable is currently continuable, then this is ok.

~Andrew


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data

2014-12-04 Thread Andrew Cooper
On 04/12/14 16:26, Boris Ostrovsky wrote:
> On 12/04/2014 06:55 AM, Dario Faggioli wrote:
>> On Tue, 2014-12-02 at 16:34 -0500, Boris Ostrovsky wrote:
>>> Make XEN_SYSCTL_topologyinfo more generic so that it can return both
>>> CPU and IO topology (support for returning the latter is added in the
>>> subsequent patch)
>>>
>> Is it always the case that we need the full information (i.e., both CPU
>> and IO topology), at all levels above Xen?
>>
>> I appreciate the performance implications (one hcall instead of two) in
>> cases where we do, but I still think, as Andrew suggested, that having a
>> new XEN_SYSCTL_iotopology (or something like that) and leaving
>> *_topologyinfo alone would make a better low-level interface.
>>
>> All IMHO, of course.
>
> I don't feel strongly either way so I can go that route (and it will
> make patch 4 completely unnecessary).
>
> (I am not sure though I understood Andrew's reasoning for splitting
> into two sysctls.

The two bits of information are different, and it is perfectly
reasonable to want the cpu information at one point, but the io
information at another.

As it stands, you appear to mandate that the caller wants the cpu
information, which is not true.

>From a separate point of view, while the sysctl abi version does allow
us to make changes like this, it makes a mess for valgrind and strace to
have radically different hypercalls with the same name, separated by
only the interface version. 

>
>>> To do so move (and rename) previously used cpu_to_core/cpu_to_socket/
>>> cpu_to_node into struct xen_sysctl_cputopo and move it, together with
>>> newly added struct xen_sysctl_iotopo, to xen_sysctl_topologyinfo.
>>>
>>> Add libxl_get_topology() to handle new interface and modify
>>> libxl_get_cpu_topology() to be a wrapper around it.
>>>
>> This is, I think, useful, and I'd go for it, even if we adding a new
>> hypercall at the Xen and xc level.
>>
>> Of course, it would work the other way round: the new libxl function
>> would wrap the existing libxl_get_cpu_topology() and a new libxl call
>> (something like libxl_get_io_topology() ?).
>>
>>> Adjust xenpm and python's low-level C libraries for new interface.
>>>
>> All in one patch? Wouldn't splitting it at least in two (hypervisor and
>> tools parts) be better? If it were me, I'd probably split even more...
>
> I could not split it because this patch changes sysctl interface (more
> specifically, xen_sysctl_topologyinfo_t/xc_topologyinfo_t) so anyone
> who uses this struct needed to be updated at the same time.
>
> Of course, if I were to leave current interface intact and add another
> sysctl for IO topology then this will not be necessary

This is the other reason why change simply for changes sake in the
interface is best avoided.  The unstable API goes very deep into the
toolstack.

>
>>
>>> This change requires bumping XEN_SYSCTL_INTERFACE_VERSION
>>>
>> Which would not be the case if we take the approach of adding a new,
>> iotopology specific, hcall, would it?
>
> I would think that any changes to a public interface, including adding
> a new function, require new version.
>
> (And if we get a new version, even if we split topology into CPU and
> IO sysctls, I'd still like to put cpu_to_[core|socket||node] into a
> single structure).

This would certainly decrease the faff both the toostack and hypervisor
have to go to when passing the parameters, and I think it would be a
good improvement.

~Andrew


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 11/17] xen/x86/p2m: reject populating for reserved device memory mapping

2014-12-04 Thread Jan Beulich
>>> On 01.12.14 at 10:24,  wrote:
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -556,6 +556,40 @@ guest_physmap_remove_page(struct domain *d, unsigned 
> long gfn,
>  gfn_unlock(p2m, gfn, page_order);
>  }
>  
> +/* Check if we are accessing rdm. */

If a comment doesn't do anything but re-state a function name,
it's imo superfluous.

> +int p2m_check_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
> + u32 id, void *ctxt)
> +{
> +xen_pfn_t end = start + nr;
> +unsigned int i;
> +u32 sbdf;
> +struct p2m_get_reserved_device_memory *pgrdm = ctxt;
> +struct domain *d = pgrdm->domain;
> +
> +if ( d->arch.hvm_domain.pci_force )
> +{
> +if ( pgrdm->gfn >= start && pgrdm->gfn < end )
> +return 1;
> +}
> +else
> +{
> +for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
> +{
> +sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
> + d->arch.hvm_domain.pcidevs[i].bus,
> + d->arch.hvm_domain.pcidevs[i].devfn);
> +
> +if ( sbdf == id )
> +{
> +if ( pgrdm->gfn >= start && pgrdm->gfn < end )
> +return 1;
> +}

Please join together if()s like these.

> @@ -686,8 +721,28 @@ guest_physmap_add_entry(struct domain *d, unsigned long 
> gfn,
>  /* Now, actually do the two-way mapping */
>  if ( mfn_valid(_mfn(mfn)) ) 
>  {
> -rc = p2m_set_entry(p2m, gfn, _mfn(mfn), page_order, t,
> -   p2m->default_access);
> +pgrdm.gfn = gfn;
> +pgrdm.domain = d;
> +if ( !is_hardware_domain(d) && iommu_use_hap_pt(d) )
> +{
> +rc = 
> iommu_get_reserved_device_memory(p2m_check_reserved_device_memory,
> +  &pgrdm);
> +/* We always avoid populating reserved device memory. */
> +if ( rc == 1 )
> +{
> +rc = -EBUSY;
> +goto out;

Did I overlook something in the earlier tool stack patches? How does
the tool stack avoid populating these areas?

> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -709,6 +709,15 @@ static inline unsigned int 
> p2m_get_iommu_flags(p2m_type_t p2mt)
>  return flags;
>  }
>  
> +struct p2m_get_reserved_device_memory {
> +unsigned long gfn;
> +struct domain *domain;
> +};
> +
> +/* Check if we are accessing rdm. */
> +extern int p2m_check_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
> +u32 id, void *ctxt);

Are subsequent patches going to make use of this outside of p2m.c?
If not, these declarations don't belong here. And even if the
function was going to be used elsewhere, the structure wouldn't
need defining here.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Qemu-devel] [PATCH] increase maxmem before calling xc_domain_populate_physmap

2014-12-04 Thread Wei Liu
On Thu, Dec 04, 2014 at 11:26:58AM -0500, Don Slutz wrote:
[...]
> >those warnings less scary.
> 
> It was not so much that hvmloader is the one to change (but having it check
> for room first might be good), but more that a change to xen would be good
> (like changing the wording or maybe only output in debug=y builds, etc.).
> 
> Maybe a new XENMEMF to suppress the message?
> 

I don't think it should work like that. That message is perfectly
legitimate -- a domain is asking for more memory than it should and Xen
rightfully rejects that. Having a guest controlable way to suppress that
is a bad idea.

Wei.

> 
>-Don Slutz
> 
> 
> 
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] Removing the PVH assert in arch/x86/hvm/io.c:87

2014-12-04 Thread Roger Pau Monné
Hello,

I've just stumbled upon this assert while testing PVH on different
hardware. It was added in 7c4870 as a safe belt, but it turns out INS
and OUTS go through handle_mmio. So using this instructions from a PVH
guest basically kills Xen.

I've removed it and everything seems fine, so I'm considering sending a
patch for 4.5 in order to have it removed. I think the path that could
trigger the crash because of the missing vioapic stuff is already
guarded by the other chunk added in the same patch.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 10/17] hvmloader/mem_hole_alloc: skip any overlap with reserved device memory

2014-12-04 Thread Jan Beulich
>>> On 01.12.14 at 10:24,  wrote:
> In some cases like igd_opregion_pgbase, guest will use mem_hole_alloc
> to allocate some memory to use in runtime cycle, so we alsoe need to
> make sure all reserved device memory don't overlap such a region.

While ideally this would get switched to the model outlined for
the previous two patches too, it's at least reasonable to keep
this simple allocator simple for the time being.

> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -416,9 +416,29 @@ static uint32_t alloc_down = RESERVED_MEMORY_DYNAMIC_END;
>  
>  xen_pfn_t mem_hole_alloc(uint32_t nr_mfns)
>  {
> +unsigned int i, num = hvm_get_reserved_device_memory_map();
> +uint64_t rdm_start, rdm_end;
> +uint32_t alloc_start, alloc_end;
> +
>  alloc_down -= nr_mfns << PAGE_SHIFT;
> +alloc_start = alloc_down;
> +alloc_end = alloc_start + (nr_mfns << PAGE_SHIFT);
> +for ( i = 0; i < num; i++ )
> +{
> +rdm_start = (uint64_t)rdm_map[i].start_pfn << PAGE_SHIFT;
> +rdm_end = rdm_start + ((uint64_t)rdm_map[i].nr_pages << PAGE_SHIFT);
> +if ( check_rdm_hole_conflict((uint64_t)alloc_start,
> + (uint64_t)alloc_end,

Pointless casts.

> + rdm_start, rdm_end - rdm_start) )
> +{
> +alloc_end = rdm_start;
> +alloc_start = alloc_end - (nr_mfns << PAGE_SHIFT);
> +BUG_ON(alloc_up >= alloc_start);

This is redundant with the BUG_ON() below afaict. Or at least it
would be, if you would properly update allow_down (if you don't
you may end up returning the same PFN for two allocations).

> +}
> +}
> +
>  BUG_ON(alloc_up >= alloc_down);

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Qemu-devel] [PATCH] increase maxmem before calling xc_domain_populate_physmap

2014-12-04 Thread Don Slutz

On 12/03/14 09:50, Stefano Stabellini wrote:

On Wed, 3 Dec 2014, Don Slutz wrote:

On 12/03/14 07:20, Stefano Stabellini wrote:

On Wed, 3 Dec 2014, Wei Liu wrote:

On Tue, Dec 02, 2014 at 03:23:29PM -0500, Don Slutz wrote:
[...]

hw_error("xc_domain_getinfo failed");
}
-if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
-(nr_pfn * XC_PAGE_SIZE / 1024)) <
0) {
+max_pages = info.max_memkb * 1024 / XC_PAGE_SIZE;
+free_pages = max_pages - info.nr_pages;
+real_free = free_pages;
+if (free_pages > VGA_HOLE_SIZE) {
+free_pages -= VGA_HOLE_SIZE;
+} else {
+free_pages = 0;
+}

I don't think we need to subtract VGA_HOLE_SIZE.

If you do not use some slack (like 32 here), xen does report:


(d5) [2014-11-29 17:07:21] Loaded SeaBIOS
(d5) [2014-11-29 17:07:21] Creating MP tables ...
(d5) [2014-11-29 17:07:21] Loading ACPI ...
(XEN) [2014-11-29 17:07:21] page_alloc.c:1568:d5 Over-allocation for
domain
5: 1057417 > 1057416
(XEN) [2014-11-29 17:07:21] memory.c:158:d5 Could not allocate order=0

This message is a bit red herring.

It's hvmloader trying to populate ram for firmware data. The actual
amount of extra pages needed depends on the firmware.

In any case it's safe to disallow hvmloader from doing so, it will just
relocate some pages from ram (hence shrinking *mem_end).

That looks like a better solution


I went with a "leave some slack" so that the error message above is not
output.

When a change to hvmloader is done so that the message does not appear during
normal usage, the extra pages in QEMU can be dropped.

Although those messages look like fatal errors, they are not. It is
normal way for hvmloader to operate: firstly it tries to allocate extra
memory until it gets an error, then it continues with normal memory,
see:

void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns)
{
 static int over_allocated;
 struct xen_add_to_physmap xatp;
 struct xen_memory_reservation xmr;

 for ( ; nr_mfns-- != 0; mfn++ )
 {
 /* Try to allocate a brand new page in the reserved area. */
 if ( !over_allocated )
 {
 xmr.domid = DOMID_SELF;
 xmr.mem_flags = 0;
 xmr.extent_order = 0;
 xmr.nr_extents = 1;
 set_xen_guest_handle(xmr.extent_start, &mfn);
 if ( hypercall_memory_op(XENMEM_populate_physmap, &xmr) == 1 )
 continue;
 over_allocated = 1;
 }

 /* Otherwise, relocate a page from the ordinary RAM map. */

I think there is really nothing to change there. Maybe we want to make
those warnings less scary.


It was not so much that hvmloader is the one to change (but having it check
for room first might be good), but more that a change to xen would be good
(like changing the wording or maybe only output in debug=y builds, etc.).

Maybe a new XENMEMF to suppress the message?


   -Don Slutz






___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

2014-12-04 Thread Sander Eikelenboom

Thursday, December 4, 2014, 4:39:06 PM, you wrote:

> On Thu, 2014-12-04 at 15:50 +0100, Sander Eikelenboom wrote:
>> Thursday, December 4, 2014, 3:31:11 PM, you wrote:
>> 
>> > On 04/12/14 14:09, Sander Eikelenboom wrote:
>> >> 
>> >> Thursday, December 4, 2014, 2:43:06 PM, you wrote:
>> >> 
>> >>> On 04/12/14 13:10, Sander Eikelenboom wrote:
>> 
>>  Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>> 
>> > On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>> >>
>> >> On Dec 4, 2014 6:30 AM, David Vrabel  wrote:
>> >>>
>> >>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
>> 
>>  Instead of doing all this complex dance, we depend on the toolstack 
>>  doing the right thing. As such implement the 'do_flr' SysFS 
>>  attribute 
>>  which 'xl' uses when a device is detached or attached from/to a 
>>  guest. 
>>  It bypasses the need to worry about the PCI lock. 
>> >>>
>> >>> No.  Get pciback to add its own "reset" sysfs file (as I have 
>> >>> repeatedly 
>> >>> proposed). 
>> >>>
>> >>
>> >> Which does not work as the kobj will complain (as there is already an 
>> >> 'reset' associated with the PCI device).
>> 
>> > It is only needed if the core won't provide one.
>> 
>> > +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>> > +{
>> > +   struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>> > +   struct device *dev = &pci->dev;
>> > +   int ret;
>> > +
>> > +   /* Already have a per-function reset? */
>> > +   if (pci_probe_reset_function(pci) == 0)
>> > +   return 0;
>> > +
>> > +   ret = device_create_file(dev, &dev_attr_reset);
>> > +   if (ret < 0)
>> > +   return ret;
>>  +   dev_data->>created_reset_file = true;
>> > +   return 0;
>> > +}
>> 
>>  Wouldn't the "core-reset-sysfs-file" be still wired to the end up 
>>  calling 
>>  "pci.c:__pci_dev_reset" ?
>> 
>>  The problem with that function is that from my testing it seems that 
>>  the 
>>  first option "pci_dev_specific_reset" always seems to return succes, so 
>>  all the
>>  other options are skipped (flr, pm, slot, bus). However the device it 
>>  self is 
>>  not properly reset enough (perhaps the pci_dev_specific_reset is good 
>>  enough for 
>>  none virtualization purposes and it's probably the least intrusive. For 
>>  virtualization however it would be nice to be sure it resets properly, 
>>  or have a 
>>  way to force a specific reset routine.)
>> >> 
>> >>> Then you need work with the maintainer for those specific devices or
>> >>> drivers to fix their specific reset function.
>> >> 
>> >>> I'm not adding stuff to pciback to workaround broken quirks.
>> >> 
>> >> OK that's a pretty clear message there, so if one wants to use pci and vga
>> >> passthrough one should better use KVM and vfio-pci.
>> 
>> > Have you (or anyone else) ever raised the problem with the broken reset
>> > quirk for certain devices with the relevant maintainer?
>> 
>> >> vfio-pci has:
>> >> - logic to do the try-slot-bus-reset logic
>> 
>> > Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
>> > as well.
>> 
>> Depends on what you call an "incorrect fix" .. it fixes a quirk .. 
>> you can say that's incorrect, but then you would have to remove 50% of
>> the kernel and Xen code as well.
>> 
>> (i do in general agree it's better to strive for a generic solution though,
>> that's exactly why i brought up that that function doesn't seem to work 
>> perfect
>> for virtualization purposes) 
>> 
>> > It makes no sense for both pciback and vfio-pci to workaround problems
>> > with pci_function_reset() in different ways -- it should be fixed in the
>> > core PCI code so both can benefit and make use of the same code.
>> 
>> Well perhaps Bjorn knows why the order of resets and skipping the rest as
>> implemented in "pci.c:__pci_dev_reset" was implemented in that way ?
>> 
>> Especially what is the expectation about pci_dev_specific_reset doing a 
>> proper 
>> reset for say a vga-card:
>> - i know it doesn't work on a radeon card (doesn't blank screen, on next 
>> guest 
>>   boot reports it's already posted, powermanagement doesn't work).
>> - while with a slot/bus reset, that all just works fine, screen blanks 
>>   immediately and everything else also works.
>> 
>> Added Alex as well since he added this workaround for KVM/vfio-pci, perhaps 
>> he knows why
>> he introduced the workaround in vfio-pci instead of trying to fix it in core 
>> pci 
>> code ?

> I don't know what workaround you're talking about.  As devices are
> released from the user, vfio-pci attempts to reset them.  If
> pci_reset_function() returns success we mark the device clean, otherwise
> it gets marked dirty.  Each time a device is r

Re: [Xen-devel] [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data

2014-12-04 Thread Boris Ostrovsky

On 12/04/2014 06:55 AM, Dario Faggioli wrote:

On Tue, 2014-12-02 at 16:34 -0500, Boris Ostrovsky wrote:

Make XEN_SYSCTL_topologyinfo more generic so that it can return both
CPU and IO topology (support for returning the latter is added in the
subsequent patch)


Is it always the case that we need the full information (i.e., both CPU
and IO topology), at all levels above Xen?

I appreciate the performance implications (one hcall instead of two) in
cases where we do, but I still think, as Andrew suggested, that having a
new XEN_SYSCTL_iotopology (or something like that) and leaving
*_topologyinfo alone would make a better low-level interface.

All IMHO, of course.


I don't feel strongly either way so I can go that route (and it will 
make patch 4 completely unnecessary).


(I am not sure though I understood Andrew's reasoning for splitting into 
two sysctls.



To do so move (and rename) previously used cpu_to_core/cpu_to_socket/
cpu_to_node into struct xen_sysctl_cputopo and move it, together with
newly added struct xen_sysctl_iotopo, to xen_sysctl_topologyinfo.

Add libxl_get_topology() to handle new interface and modify
libxl_get_cpu_topology() to be a wrapper around it.


This is, I think, useful, and I'd go for it, even if we adding a new
hypercall at the Xen and xc level.

Of course, it would work the other way round: the new libxl function
would wrap the existing libxl_get_cpu_topology() and a new libxl call
(something like libxl_get_io_topology() ?).


Adjust xenpm and python's low-level C libraries for new interface.


All in one patch? Wouldn't splitting it at least in two (hypervisor and
tools parts) be better? If it were me, I'd probably split even more...


I could not split it because this patch changes sysctl interface (more 
specifically, xen_sysctl_topologyinfo_t/xc_topologyinfo_t) so anyone who 
uses this struct needed to be updated at the same time.


Of course, if I were to leave current interface intact and add another 
sysctl for IO topology then this will not be necessary





This change requires bumping XEN_SYSCTL_INTERFACE_VERSION


Which would not be the case if we take the approach of adding a new,
iotopology specific, hcall, would it?


I would think that any changes to a public interface, including adding a 
new function, require new version.


(And if we get a new version, even if we split topology into CPU and IO 
sysctls, I'd still like to put cpu_to_[core|socket||node] into a single 
structure).


Thanks.
-boris

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 6/9] libxl: add libxl__domain_soft_reset_destroy_old()

2014-12-04 Thread Wei Liu
On Wed, Dec 03, 2014 at 06:16:18PM +0100, Vitaly Kuznetsov wrote:
> New libxl__domain_soft_reset_destroy_old() is an internal-only
> version of libxl_domain_destroy() which follows the same domain
> destroy path with the only difference: xc_domain_destroy() is
> being avoided so the domain is not actually being destroyed.
> 
> Add soft_reset flag to libxl__domain_destroy_state structure
> to support the change.
> 
> The original libxl_domain_destroy() function could be easily
> modified to support new flag but I'm trying to avoid that as
> it is part of public API.
> 
> Signed-off-by: Vitaly Kuznetsov 
> ---
>  tools/libxl/libxl.c  | 32 +++-
>  tools/libxl/libxl_internal.h |  4 
>  2 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index f84f7c2..c2bd730 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -1437,6 +1437,23 @@ int libxl_domain_destroy(libxl_ctx *ctx, uint32_t 
> domid,
>  return AO_INPROGRESS;
>  }
>  
> +int libxl__domain_soft_reset_destroy_old(libxl_ctx *ctx, uint32_t domid,
> + const libxl_asyncop_how *ao_how)
> +{

Internal function takes gc, not ctx.

> +AO_CREATE(ctx, domid, ao_how);

If you want to use libxl context, use CTX macro.

> +libxl__domain_destroy_state *dds;
> +
> +GCNEW(dds);
> +dds->ao = ao;
> +dds->domid = domid;
> +dds->callback = domain_destroy_cb;
> +dds->soft_reset = 1;
> +libxl__domain_destroy(egc, dds);
> +
> +return AO_INPROGRESS;
> +}
> +
> +
>  static void domain_destroy_cb(libxl__egc *egc, libxl__domain_destroy_state 
> *dds,
>int rc)
>  {
> @@ -1612,6 +1629,7 @@ static void devices_destroy_cb(libxl__egc *egc,
>  {
>  STATE_AO_GC(drs->ao);
>  libxl__destroy_domid_state *dis = CONTAINER_OF(drs, *dis, drs);
> +libxl__domain_destroy_state *dds = CONTAINER_OF(dis, *dds, domain);
>  libxl_ctx *ctx = CTX;
>  uint32_t domid = dis->domid;
>  char *dom_path;
> @@ -1650,11 +1668,15 @@ static void devices_destroy_cb(libxl__egc *egc,
>  }
>  libxl__userdata_destroyall(gc, domid);
>  
> -rc = xc_domain_destroy(ctx->xch, domid);
> -if (rc < 0) {
> -LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_domain_destroy 
> failed for %d", domid);
> -rc = ERROR_FAIL;
> -goto out;
> +if (!dds->soft_reset)
> +{

Coding style.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 09/17] hvmloader/ram: check if guest memory is out of reserved device memory maps

2014-12-04 Thread Jan Beulich
>>> On 01.12.14 at 10:24,  wrote:
> We need to check to reserve all reserved device memory maps in e820
> to avoid any potential guest memory conflict.
> 
> Currently, if we can't insert RDM entries directly, we may need to handle
> several ranges as follows:
> a. Fixed Ranges --> BUG()
>  lowmem_reserved_base-0xA: reserved by BIOS implementation,
>  BIOS region,
>  RESERVED_MEMBASE ~ 0x1,
> b. RAM or RAM:Hole -> Try to reserve

I continue to be unconvinced of the overall approach: The domain
builder continues to populate these regions when it shouldn't. Yet
once it doesn't, it would be most natural to simply communicate the
RAM regions to hvmloader, and hvmloader would use just that to
build the E820 table (and subsequently assign BARs).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 8/9] libxl: soft reset support

2014-12-04 Thread Wei Liu
(I've skipped the internal implementation since I don't know what's
 required to fulfil soft reset.)

On Wed, Dec 03, 2014 at 06:16:20PM +0100, Vitaly Kuznetsov wrote:
[...]
> + libxl__domain_create_state *dcs);
>  
>  /* Each time the dm needs to be saved, we must call suspend and then save */
>  _hidden int libxl__domain_suspend_device_model(libxl__gc *gc,
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 53611dc..eb833f0 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -2043,7 +2043,8 @@ static void reload_domain_config(uint32_t domid,
>  }
>  
>  /* Returns 1 if domain should be restarted,
> - * 2 if domain should be renamed then restarted, or 0
> + * 2 if domain should be renamed then restarted,
> + * 3 if domain performed soft reset, or 0
>   * Can update r_domid if domain is destroyed etc */
>  static int handle_domain_death(uint32_t *r_domid,
> libxl_event *event,
> @@ -2069,6 +2070,9 @@ static int handle_domain_death(uint32_t *r_domid,
>  case LIBXL_SHUTDOWN_REASON_WATCHDOG:
>  action = d_config->on_watchdog;
>  break;
> +case LIBXL_SHUTDOWN_REASON_SOFT_RESET:
> +LOG("Domain performed soft reset.");
> +return 3;

Would it be useful to provide "on_soft_reset" option in xl? Will the
admin be interested in performing some other action when domain does
soft reset? Say, for security reason admin want to prohibit domain from
soft resetting itself.

>  default:
>  LOG("Unknown shutdown reason code %d. Destroying domain.",
>  event->u.domain_shutdown.shutdown_reason);
> @@ -2285,6 +2289,7 @@ static void 
> evdisable_disk_ejects(libxl_evgen_disk_eject **diskws,
>  static uint32_t create_domain(struct domain_create *dom_info)
>  {
>  uint32_t domid = INVALID_DOMID;
> +uint32_t domid_old = INVALID_DOMID;
>  
>  libxl_domain_config d_config;
>  
> @@ -2510,7 +2515,18 @@ start:
>   * restore/migrate-receive it again.
>   */
>  restoring = 0;
> -}else{
> +} else if (domid_old != INVALID_DOMID) {
> +/* Do soft reset */
> +d_config.b_info.nodemap.size = 0;

What's the reason for doing this?

If you encounter problem with this it should probably be fixed in libxl.

Wei.

> +ret = libxl_domain_soft_reset(ctx, &d_config,
> +  &domid, domid_old,
> +  0, 0);
> +
> +if ( ret ) {
> +goto error_out;
> +}
> +domid_old = INVALID_DOMID;
> +} else {
>  ret = libxl_domain_create_new(ctx, &d_config, &domid,
>0, autoconnect_console_how);
>  }
> @@ -2574,6 +2590,8 @@ start:
>  event->u.domain_shutdown.shutdown_reason,
>  event->u.domain_shutdown.shutdown_reason);
>  switch (handle_domain_death(&domid, event, &d_config)) {
> +case 3:
> +domid_old = domid;
>  case 2:
>  if (!preserve_domain(&domid, event, &d_config)) {
>  /* If we fail then exit leaving the old domain in place. 
> */
> -- 
> 1.9.3
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen-4.5 HVMOP ABI issues

2014-12-04 Thread Jan Beulich
>>> On 04.12.14 at 16:46,  wrote:
> On 04/12/14 14:55, Jan Beulich wrote:
> On 04.12.14 at 15:28,  wrote:
>>> On 04/12/14 13:49, Jan Beulich wrote:
>>> On 28.11.14 at 16:46,  wrote:
> On 28/11/14 15:18, Jan Beulich wrote:
> On 28.11.14 at 14:55,  wrote:
>>> The problem is with continuations which reuse the upper bits of the
>>> input register, not with this HVMOP_op_mask specifically; the
>>> HVMOP_op_mask simply adds to an existing problem.  This is something
>>> which needs considering urgently because, as you identify above, we have
>>> already suffered an accidental ABI breakage with the mem-op widening.
>> Since we can't retroactively fix the mem-op widening, I still don't see
>> what's urgent here: As long as we don't change any of these masks,
>> nothing bad is going to happen. Of course one thing to consider with
>> this aspect in mind is whether to change the hvm-op or gnttab-op
>> masks again _before_ 4.5 goes out, based on whether we feel they're
>> wide enough for the (un)foreseeable future.
> By urgent, I mean exactly this, while we have the ability to tweak the
> masks.
 With no-one else voicing an opinion:

 For hvmop, the mask currently is 8 bits and we've got 22 ops defined.

 For gnttabop, the mask currently is 12 bits and we've got 12 ops defined.

 For the latter, we're fine even without further consideration. For the
 former, the two operations actively using the continuation encoding
 are tools-only ones. Since we're fine to alter the tools only interfaces,
 and since it was intended for the tools-only HVM-ops to be split off
 to a separate hypercall (e.g. hvmctl) anyway, the range restriction
 would then no longer be a problem. Plus, in the worst case we could
 always introduce yet another hypercall if we ran out of numbers.
>>> Are you suggesting that we make a new hvmctl now and remove the hvmop
>>> mask before 4.5?  If we ship 4.5 with the hvmop mask, we cannot
>>> subsequently remove it even if all continuable hypercalls move to a
>>> separate hypercall.
>> Why? We certainly don't guarantee compatibility for undefined
>> hypercalls to behave in a certain way.
> 
> A task is in the middle of a hypercall continuation.  The VM is migrated
> to a newer Xen which has lost the op mask and gained a new hypercall
> which would alias.

Impossible: A continuation could be in progress only when we
actually use the high bits (or else you have nowhere to encode
it). Operations not using continuations consequently aren't
susceptible to the mask disappearing.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 2/2] add a new p2m type - p2m_mmio_write_dm

2014-12-04 Thread Tim Deegan
Hi, 

At 21:13 +0800 on 04 Dec (1417724006), Yu Zhang wrote:
> A new p2m type, p2m_mmio_write_dm, is added to trap and emulate
> the write operations on GPU's page tables. Handling of this new
> p2m type are similar with existing p2m_ram_ro in most condition
> checks, with only difference on final policy of emulation vs. drop.
> For p2m_ram_ro types, write operations will not trigger the device
> model, and will be discarded later in __hvm_copy(); while for the
> p2m_mmio_write_dm type pages, writes will go to the device model
> via ioreq-server.
> 
> Signed-off-by: Yu Zhang 
> Signed-off-by: Wei Ye 

Thanks for this -- only two comments:

> @@ -5978,7 +5982,8 @@ long do_hvm_op(unsigned long op, 
> XEN_GUEST_HANDLE_PARAM(void) arg)
>  goto param_fail4;
>  }
>  if ( !p2m_is_ram(t) &&
> - (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) )
> + (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) &&
> + t != p2m_mmio_write_dm )

I think that Jan already brough this up, and maybe I missed your
answer: this realaxation looks wrong to me. I would have thought that
transition between p2m_mmio_write_dm and p2m_ram_rw/p2m_ram_logdirty
would be the only ones you would want to allow.

> @@ -111,7 +112,8 @@ typedef unsigned int p2m_query_t;
>  #define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty) \
>| p2m_to_mask(p2m_ram_ro) \
>| p2m_to_mask(p2m_grant_map_ro)   \
> -  | p2m_to_mask(p2m_ram_shared) )
> +  | p2m_to_mask(p2m_ram_shared)   \
> +  | p2m_to_mask(p2m_mmio_write_dm))

Nit: please align the '\' with the others above it. 

Cheers,

Tim. 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 08/17] hvmloader/mmio: reconcile guest mmio with reserved device memory

2014-12-04 Thread Jan Beulich
>>> On 01.12.14 at 10:24,  wrote:
> We need to make sure all mmio allocation don't overlap
> any rdm, reserved device memory. Here we just skip
> all reserved device memory range in mmio space.

I think someone else already suggested that this and patch 9 should
be swapped, and the BAR allocation be changed to use the E820
map as input. That may end up being a bigger change, but will yield
ultimately better (and namely better maintainable) code.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 04/17] update the existing hypercall to support XEN_DOMCTL_set_rdm

2014-12-04 Thread Jan Beulich
>>> On 01.12.14 at 10:24,  wrote:
> --- a/xen/common/compat/memory.c
> +++ b/xen/common/compat/memory.c
> @@ -22,27 +22,66 @@ struct get_reserved_device_memory {
>  unsigned int used_entries;
>  };
>  
> -static int get_reserved_device_memory(xen_pfn_t start,
> -  xen_ulong_t nr, void *ctxt)
> +static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
> +  u32 id, void *ctxt)
>  {
>  struct get_reserved_device_memory *grdm = ctxt;
> +struct domain *d;
> +unsigned int i;
> +u32 sbdf;
> +struct compat_reserved_device_memory rdm = {
> +.start_pfn = start, .nr_pages = nr
> +};
>  
> -if ( grdm->used_entries < grdm->map.nr_entries )
> -{
> -struct compat_reserved_device_memory rdm = {
> -.start_pfn = start, .nr_pages = nr
> -};
> +if ( rdm.start_pfn != start || rdm.nr_pages != nr )
> +return -ERANGE;
>  
> -if ( rdm.start_pfn != start || rdm.nr_pages != nr )
> -return -ERANGE;
> +d = rcu_lock_domain_by_any_id(grdm->map.domid);
> +if ( d == NULL )
> +return -ESRCH;

So why are you doing this in the call back (potentially many times)
instead of just once in compat_memory_op(), storing the pointer in
the context structure?

>  
> -if ( __copy_to_compat_offset(grdm->map.buffer, grdm->used_entries,
> - &rdm, 1) )
> -return -EFAULT;
> +if ( d )
> +{
> +if ( d->arch.hvm_domain.pci_force )

You didn't verify that the domain is a HVM/PVH one.

> +{
> +if ( grdm->used_entries < grdm->map.nr_entries )
> +{
> +if ( __copy_to_compat_offset(grdm->map.buffer,
> + grdm->used_entries,
> + &rdm, 1) )
> +{
> +rcu_unlock_domain(d);
> +return -EFAULT;
> +}
> +}
> +++grdm->used_entries;
> +}
> +else
> +{
> +for ( i = 0; i < d->arch.hvm_domain.num_pcidevs; i++ )
> +{
> +sbdf = PCI_SBDF2(d->arch.hvm_domain.pcidevs[i].seg,
> + d->arch.hvm_domain.pcidevs[i].bus,
> + d->arch.hvm_domain.pcidevs[i].devfn);
> +if ( sbdf == id )
> +{
> +if ( grdm->used_entries < grdm->map.nr_entries )
> +{
> +if ( __copy_to_compat_offset(grdm->map.buffer,
> + grdm->used_entries,
> + &rdm, 1) )
> +{
> +rcu_unlock_domain(d);
> +return -EFAULT;
> +}
> +}
> +++grdm->used_entries;

break;

Also trying to fold code identical on the if and else branches would
seem pretty desirable.

> @@ -319,9 +358,13 @@ int compat_memory_op(unsigned int cmd, 
> XEN_GUEST_HANDLE_PARAM(void) compat)
>  
>  if ( !rc && grdm.map.nr_entries < grdm.used_entries )
>  rc = -ENOBUFS;
> +
>  grdm.map.nr_entries = grdm.used_entries;
> -if ( __copy_to_guest(compat, &grdm.map, 1) )
> -rc = -EFAULT;
> +if ( grdm.map.nr_entries )
> +{
> +if ( __copy_to_guest(compat, &grdm.map, 1) )
> +rc = -EFAULT;
> +}

Why do you need this change?

> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -904,17 +904,33 @@ int platform_supports_x2apic(void)
>  
>  int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
>  {
> -struct acpi_rmrr_unit *rmrr;
> +struct acpi_rmrr_unit *rmrr, *rmrr_cur = NULL;
>  int rc = 0;
> +unsigned int i;
> +u16 bdf;
>  
> -list_for_each_entry(rmrr, &acpi_rmrr_units, list)
> +for_each_rmrr_device ( rmrr, bdf, i )
>  {
> -rc = func(PFN_DOWN(rmrr->base_address),
> -  PFN_UP(rmrr->end_address) - PFN_DOWN(rmrr->base_address),
> -  ctxt);
> -if ( rc )
> -break;
> +if ( rmrr != rmrr_cur )
> +{
> +rc = func(PFN_DOWN(rmrr->base_address),
> +  PFN_UP(rmrr->end_address) -
> +PFN_DOWN(rmrr->base_address),
> +  PCI_SBDF(rmrr->segment, bdf),
> +  ctxt);
> +
> +if ( unlikely(rc < 0) )
> +return rc;
> +
> +/* Just go next. */
> +if ( !rc )
> +rmrr_cur = rmrr;
> +
> +/* Now just return specific to user requirement. */
> +if ( rc > 0 )
> + 

Re: [Xen-devel] [PATCH v5 1/2] add a new p2m type class - P2M_DISCARD_WRITE_TYPES

2014-12-04 Thread Tim Deegan
At 21:13 +0800 on 04 Dec (1417724005), Yu Zhang wrote:
> From: Yu Zhang 
> 
> Currently, the P2M_RO_TYPES bears 2 meanings: one is
> "_PAGE_RW bit is clear in their PTEs", and another is
> to discard the write operations on these pages. This
> patch adds a p2m type class, P2M_DISCARD_WRITE_TYPES,
> to bear the second meaning, so we can use this type
> class instead of the P2M_RO_TYPES, to decide if a write
> operation is to be ignored.
> 
> Signed-off-by: Yu Zhang 

Reviewed-by: Tim Deegan 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 4/9] xen: introduce XEN_DOMCTL_devour

2014-12-04 Thread Julien Grall
On 04/12/14 15:12, Vitaly Kuznetsov wrote:
> Julien Grall  writes:
> 
>> Hi Vitaly,
>>
>> On 03/12/2014 17:16, Vitaly Kuznetsov wrote:
>>> New operation sets the 'recipient' domain which will recieve all
>>
>> s/recieve/receive/
>>
>>> memory pages from a particular domain and kills the original domain.
>>>
>>> Signed-off-by: Vitaly Kuznetsov 
>>> ---
>>> @@ -1764,13 +1765,32 @@ void free_domheap_pages(struct page_info *pg, 
>>> unsigned int order)
>>
>> [..]
>>
>>> +else
>>> +{
>>> +mfn = page_to_mfn(pg);
>>> +gmfn = mfn_to_gmfn(d, mfn);
>>> +
>>> +page_set_owner(pg, NULL);
>>> +if ( assign_pages(d->recipient, pg, order, 0) )
>>> +/* assign_pages reports the error by itself */
>>> +goto out;
>>> +
>>> +if ( guest_physmap_add_page(d->recipient, gmfn, mfn, order) )
>>
>> On ARM, mfn_to_gmfn will always return the mfn. This would result to
>> add a 1:1 mapping in the recipient domain.
>>
>> But ... only DOM0 has its memory mapped 1:1. So this code may blow up
>> the P2M of the recipient domain.
> 
> I know almost nothing about ARM so please bear with me. So, for a guest
> domain the mapping is not 1:1 (so guest sees different addresses) but
> mfn_to_gmfn() doesn't returs these addresses?

Yes. I guess it's because this macro is not really used (see why below).
Ian, Stefano: any idea why mfn_to_gfmn is not correctly implemented on ARM?

> I was under an impression
> it is not x86-specific and I can see its usage in e.g. getdomaininfo(),
> memory_exchange(),..

I can find two places in the common code:
- getdomaininfo: The return is obviously buggy. Though, it doesn't seem
to be used in the toolstack side for ARM
- memory_exchange: AFAIK this is not supported right now.

> How can one figure out the mapping then?

AFAIK, there is no way on ARM to get a GMFN from an MFN in Xen.

Maybe we should implement it correctly mfn_to_gmfn on ARM? Ian, Stefano,
any though?

> Anyway, what I want to do here is: when this page is freed I want to
> reassign it to our newly-created guest at the exact same address it was
> mapped in the original domain. 

> BTW, what's the current state of affairs with kexec and ARM guest?

AFAIK, nobody has worked on it for Xen ARM. I don't even know the status
for Kexec on ARM.

> I
> suppose we should have similar problems: vcpu_info, event channels,
> .. 

ARM guest is very similar to PVH guest. All the problems you will fixed
now means less work when we will add support for ARM ;).

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0

2014-12-04 Thread David Vrabel
On 04/12/14 15:36, Anthony Wright wrote:
>> On 01/12/14 14:22, David Vrabel wrote:
>> This VIF protocol is weird. The first slot contains a txreq with a
>> size
>> for the total length of the packet, subsequent slots have sizes for
>> that
>> fragment only.
>>
>> netback then has to calculate how long the first slot is, by
>> subtracting
>> all the size from the following slots.
>>
>> So something has gone wrong but it's not obvious what it is. Any
>> chance
>> you can dump the ring state when it happens?
> 
> We think we've worked out how to dump the ring state, please see below.

We need the full contents of the ring which isn't currently available
via debugfs and I haven't had time to put together a debug patch to make
it available.

David



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v15 11/21] x86/VPMU: Interface for setting PMU mode and flags

2014-12-04 Thread Boris Ostrovsky

On 12/04/2014 03:51 AM, Jan Beulich wrote:

On 03.12.14 at 21:13,  wrote:

On 11/27/2014 03:57 AM, Jan Beulich wrote:

On 26.11.14 at 15:32,  wrote:

On 11/25/2014 08:49 AM, Jan Beulich wrote:

On 17.11.14 at 00:07,  wrote:

@@ -244,19 +256,19 @@ void vpmu_initialise(struct vcpu *v)
switch ( vendor )
{
case X86_VENDOR_AMD:
-if ( svm_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
-opt_vpmu_enabled = 0;
+if ( svm_vpmu_initialise(v) != 0 )
+vpmu_mode = XENPMU_MODE_OFF;
break;

case X86_VENDOR_INTEL:

-if ( vmx_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
-opt_vpmu_enabled = 0;
+if ( vmx_vpmu_initialise(v) != 0 )
+vpmu_mode = XENPMU_MODE_OFF;
break;

So this turns off the vPMU globally upon failure of initializing
some random vCPU. Why is that? I see this was the case even
before your entire series, but shouldn't this be fixed _before_
enhancing the whole thing to support PV/PVH?

Yes, that's probably too strong. Do you want to fix this as an early
patch (before PV(H)) support gets in? I'd rather do it in the patch that
moves things into initcalls.

Yes, I think this should be fixed in a prereq patch, thus allowing it
to be easily backported if so desired.

I started to make this change and then I realized that the next patch
(12/21) is actually already taking care of this problem: most of the
*_vpmu_initialise() will be executed as initcalls during host boot and
if any of those fail then we do want to disable VPMU globally (those
failures would not be VCPU-specific).

Funny that you say that - it was actually while reviewing that next
patch when I made that observation: svm_vpmu_initialise() get a
-ENOMEM return _added_ there for example, which is contrary to
what I asked for above.



Oh, *that* initialization code (I was looking at vpmu_init()). Yes, VPMU 
should not be turned off there.


-boris


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 07/17] hvmloader/util: get reserved device memory maps

2014-12-04 Thread Jan Beulich
>>> On 01.12.14 at 10:24,  wrote:
> We need to use reserved device memory maps with multiple times, so
> provide just one common function should be friend.

I'm not going to repeat earlier comments; the way this is done right
now it's neither a proper runtime function nor a proper init time one.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen-4.5 HVMOP ABI issues

2014-12-04 Thread Tim Deegan
At 14:28 + on 04 Dec (1417699730), Andrew Cooper wrote:
> On 04/12/14 13:49, Jan Beulich wrote:
>  On 28.11.14 at 16:46,  wrote:
> >> On 28/11/14 15:18, Jan Beulich wrote:
> >> On 28.11.14 at 14:55,  wrote:
>  The problem is with continuations which reuse the upper bits of the
>  input register, not with this HVMOP_op_mask specifically; the
>  HVMOP_op_mask simply adds to an existing problem.  This is something
>  which needs considering urgently because, as you identify above, we have
>  already suffered an accidental ABI breakage with the mem-op widening.
> >>> Since we can't retroactively fix the mem-op widening, I still don't see
> >>> what's urgent here: As long as we don't change any of these masks,
> >>> nothing bad is going to happen. Of course one thing to consider with
> >>> this aspect in mind is whether to change the hvm-op or gnttab-op
> >>> masks again _before_ 4.5 goes out, based on whether we feel they're
> >>> wide enough for the (un)foreseeable future.
> >> By urgent, I mean exactly this, while we have the ability to tweak the
> >> masks.
> > With no-one else voicing an opinion:
> >
> > For hvmop, the mask currently is 8 bits and we've got 22 ops defined.
> >
> > For gnttabop, the mask currently is 12 bits and we've got 12 ops defined.
> >
> > For the latter, we're fine even without further consideration. For the
> > former, the two operations actively using the continuation encoding
> > are tools-only ones. Since we're fine to alter the tools only interfaces,
> > and since it was intended for the tools-only HVM-ops to be split off
> > to a separate hypercall (e.g. hvmctl) anyway, the range restriction
> > would then no longer be a problem. Plus, in the worst case we could
> > always introduce yet another hypercall if we ran out of numbers.
> 
> Are you suggesting that we make a new hvmctl now and remove the hvmop
> mask before 4.5?  If we ship 4.5 with the hvmop mask, we cannot
> subsequently remove it even if all continuable hypercalls move to a
> separate hypercall.

I think we can if the only hypercalls that use continuations are
tools-only (and so not liable to work across migration anyway).

Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5] Fixes for PCI backend for 3.19.

2014-12-04 Thread David Vrabel
On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
> 
> These patches fix some issues with PCI back and also add proper
> bus/slot reset.

Applied 1-8 to devel/for-linus-3.19.  I did not apply #9.

Thanks.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen-4.5 HVMOP ABI issues

2014-12-04 Thread Andrew Cooper
On 04/12/14 14:55, Jan Beulich wrote:
 On 04.12.14 at 15:28,  wrote:
>> On 04/12/14 13:49, Jan Beulich wrote:
>> On 28.11.14 at 16:46,  wrote:
 On 28/11/14 15:18, Jan Beulich wrote:
 On 28.11.14 at 14:55,  wrote:
>> The problem is with continuations which reuse the upper bits of the
>> input register, not with this HVMOP_op_mask specifically; the
>> HVMOP_op_mask simply adds to an existing problem.  This is something
>> which needs considering urgently because, as you identify above, we have
>> already suffered an accidental ABI breakage with the mem-op widening.
> Since we can't retroactively fix the mem-op widening, I still don't see
> what's urgent here: As long as we don't change any of these masks,
> nothing bad is going to happen. Of course one thing to consider with
> this aspect in mind is whether to change the hvm-op or gnttab-op
> masks again _before_ 4.5 goes out, based on whether we feel they're
> wide enough for the (un)foreseeable future.
 By urgent, I mean exactly this, while we have the ability to tweak the
 masks.
>>> With no-one else voicing an opinion:
>>>
>>> For hvmop, the mask currently is 8 bits and we've got 22 ops defined.
>>>
>>> For gnttabop, the mask currently is 12 bits and we've got 12 ops defined.
>>>
>>> For the latter, we're fine even without further consideration. For the
>>> former, the two operations actively using the continuation encoding
>>> are tools-only ones. Since we're fine to alter the tools only interfaces,
>>> and since it was intended for the tools-only HVM-ops to be split off
>>> to a separate hypercall (e.g. hvmctl) anyway, the range restriction
>>> would then no longer be a problem. Plus, in the worst case we could
>>> always introduce yet another hypercall if we ran out of numbers.
>> Are you suggesting that we make a new hvmctl now and remove the hvmop
>> mask before 4.5?  If we ship 4.5 with the hvmop mask, we cannot
>> subsequently remove it even if all continuable hypercalls move to a
>> separate hypercall.
> Why? We certainly don't guarantee compatibility for undefined
> hypercalls to behave in a certain way.

A task is in the middle of a hypercall continuation.  The VM is migrated
to a newer Xen which has lost the op mask and gained a new hypercall
which would alias.

At this point, the task has transparently swapped hypercall subops,
through no fault of its own.

As I have said before, once an op mask has found its way into a released
version of Xen, it is irrevocably part of the ABI, and cannot change in
the future.

~Andrew


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V4 00/10] xen: Switch to virtual mapped linear p2m list

2014-12-04 Thread David Vrabel
On 28/11/14 10:53, Juergen Gross wrote:
> Paravirtualized kernels running on Xen use a three level tree for
> translation of guest specific physical addresses to machine global
> addresses. This p2m tree is used for construction of page table
> entries, so the p2m tree walk is performance critical.
> 
> By using a linear virtual mapped p2m list accesses to p2m elements
> can be sped up while even simplifying code. To achieve this goal
> some p2m related initializations have to be performed later in the
> boot process, as the final p2m list can be set up only after basic
> memory management functions are available.

Applied to devel/for-linus-3.19.

Thanks.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] xen/arm: uart interrupts handling

2014-12-04 Thread Tim Deegan
At 13:51 + on 04 Dec (1417697518), Julien Grall wrote:
> On 04/12/14 03:50, Vijay Kilari wrote:
> > Hi Tim,
> 
> Hi Vijay,
> 
> > I see that on uart interrupt, ICR is written to clear the all
> > interrupts except TX, RX and RX timeout. With this, cpu always finds
> > TX/RX is active and never
> > comes out of the loop.
> 
> FWIW, the PL011 serial code has been copied from the Linux drivers.
> 
> Linux interrupt handler also clear all interrupts except TX, RX, RX
> timeout. So do you see the issue on Linux?
> 
> > 
> > With the below changes, TX, RX & RTI are cleared before handling this
> > interrupts.
> > 
> > Is my observation is correct?. If so I wonder how it is working on
> > platforms that
> > are using pl011. Without this for my cpu just keeps looping here.
> > 
> >   index fba0a55..d21bce3 100644
> > --- a/xen/drivers/char/pl011.c
> > +++ b/xen/drivers/char/pl011.c
> > @@ -63,7 +63,7 @@ static void pl011_interrupt(int irq, void *data,
> > struct cpu_user_regs *regs)
> >  {
> >  do
> >  {
> > -pl011_write(uart, ICR, status & ~(TXI|RTI|RXI));
> > +pl011_write(uart, ICR, status & (TXI|RTI|RXI));
> 
> 
> This changes looks wrong to me. We want to clear the bit in status we
> don't handle. Otherwise the interrupt will be fired in loop.

Yes - even if we do want to explicitly clear the TXI|RTI|RXI bits, we
must clear the others too!

> If I'm not mistaken, TXI/RTI/RXI will be cleared when data is read or
> write into the fifo. So we should not clear automatically.

I've been looking at that and I think it's OK for the RX path -- we
ought to keep calling serial_rx_interrupt() until we've drained the
FIFO, at which point both RTI and RXI will be cleared.  Certainly we
shouldn't unconditionally clear RXI/RTI without somehow making sure
that we're not leaving bytes in the FIFO.

The TX path looks more suspect -- if the uart raises TXI and we have
nothing buffered to send, then TXI won't get cleared.
Still, I wonder why you're seeing this and other people haven't.

And again, we ought not to just clear the interrupt:
serial_tx_interrupt might not send any bytes (and indeed by my reading
it won't do anything unless the FIFO is actually empty), in which case
we need to keep something around to tell us to try again.

Actually, looking at the current behaviour, I wonder whether we ought
to push the interrupt trigger back from firing at half-empty.  Like so:

diff --git a/xen/drivers/char/pl011.c b/xen/drivers/char/pl011.c
index dd19ce8..2e97234 100644
--- a/xen/drivers/char/pl011.c
+++ b/xen/drivers/char/pl011.c
@@ -109,6 +109,8 @@ static void __init pl011_init_preirq(struct
serial_port *port)
 panic("pl011: No Baud rate configured\n");
 uart->baud = (uart->clock_hz << 2) / divisor;
 }
+/* Trigger RX interrupt at 1/2 full, TX interrupt at 7/8 empty */
+pl011_write(uart, IFLS, (2<<3 | 0));
 /* This write must follow FBRD and IBRD writes. */
 pl011_write(uart, LCR_H, (uart->data_bits - 5) << 5
 | FEN


Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

2014-12-04 Thread Alex Williamson
On Thu, 2014-12-04 at 15:50 +0100, Sander Eikelenboom wrote:
> Thursday, December 4, 2014, 3:31:11 PM, you wrote:
> 
> > On 04/12/14 14:09, Sander Eikelenboom wrote:
> >> 
> >> Thursday, December 4, 2014, 2:43:06 PM, you wrote:
> >> 
> >>> On 04/12/14 13:10, Sander Eikelenboom wrote:
> 
>  Thursday, December 4, 2014, 1:24:47 PM, you wrote:
> 
> > On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
> >>
> >> On Dec 4, 2014 6:30 AM, David Vrabel  wrote:
> >>>
> >>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
> 
>  Instead of doing all this complex dance, we depend on the toolstack 
>  doing the right thing. As such implement the 'do_flr' SysFS 
>  attribute 
>  which 'xl' uses when a device is detached or attached from/to a 
>  guest. 
>  It bypasses the need to worry about the PCI lock. 
> >>>
> >>> No.  Get pciback to add its own "reset" sysfs file (as I have 
> >>> repeatedly 
> >>> proposed). 
> >>>
> >>
> >> Which does not work as the kobj will complain (as there is already an 
> >> 'reset' associated with the PCI device).
> 
> > It is only needed if the core won't provide one.
> 
> > +static int pcistub_try_create_reset_file(struct pci_dev *pci)
> > +{
> > +   struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
> > +   struct device *dev = &pci->dev;
> > +   int ret;
> > +
> > +   /* Already have a per-function reset? */
> > +   if (pci_probe_reset_function(pci) == 0)
> > +   return 0;
> > +
> > +   ret = device_create_file(dev, &dev_attr_reset);
> > +   if (ret < 0)
> > +   return ret;
>  +   dev_data->>created_reset_file = true;
> > +   return 0;
> > +}
> 
>  Wouldn't the "core-reset-sysfs-file" be still wired to the end up 
>  calling 
>  "pci.c:__pci_dev_reset" ?
> 
>  The problem with that function is that from my testing it seems that the 
>  first option "pci_dev_specific_reset" always seems to return succes, so 
>  all the
>  other options are skipped (flr, pm, slot, bus). However the device it 
>  self is 
>  not properly reset enough (perhaps the pci_dev_specific_reset is good 
>  enough for 
>  none virtualization purposes and it's probably the least intrusive. For 
>  virtualization however it would be nice to be sure it resets properly, 
>  or have a 
>  way to force a specific reset routine.)
> >> 
> >>> Then you need work with the maintainer for those specific devices or
> >>> drivers to fix their specific reset function.
> >> 
> >>> I'm not adding stuff to pciback to workaround broken quirks.
> >> 
> >> OK that's a pretty clear message there, so if one wants to use pci and vga
> >> passthrough one should better use KVM and vfio-pci.
> 
> > Have you (or anyone else) ever raised the problem with the broken reset
> > quirk for certain devices with the relevant maintainer?
> 
> >> vfio-pci has:
> >> - logic to do the try-slot-bus-reset logic
> 
> > Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
> > as well.
> 
> Depends on what you call an "incorrect fix" .. it fixes a quirk .. 
> you can say that's incorrect, but then you would have to remove 50% of
> the kernel and Xen code as well.
> 
> (i do in general agree it's better to strive for a generic solution though,
> that's exactly why i brought up that that function doesn't seem to work 
> perfect
> for virtualization purposes) 
> 
> > It makes no sense for both pciback and vfio-pci to workaround problems
> > with pci_function_reset() in different ways -- it should be fixed in the
> > core PCI code so both can benefit and make use of the same code.
> 
> Well perhaps Bjorn knows why the order of resets and skipping the rest as
> implemented in "pci.c:__pci_dev_reset" was implemented in that way ?
> 
> Especially what is the expectation about pci_dev_specific_reset doing a 
> proper 
> reset for say a vga-card:
> - i know it doesn't work on a radeon card (doesn't blank screen, on next 
> guest 
>   boot reports it's already posted, powermanagement doesn't work).
> - while with a slot/bus reset, that all just works fine, screen blanks 
>   immediately and everything else also works.
> 
> Added Alex as well since he added this workaround for KVM/vfio-pci, perhaps 
> he knows why
> he introduced the workaround in vfio-pci instead of trying to fix it in core 
> pci 
> code ?

I don't know what workaround you're talking about.  As devices are
released from the user, vfio-pci attempts to reset them.  If
pci_reset_function() returns success we mark the device clean, otherwise
it gets marked dirty.  Each time a device is released, if there are
dirty devices we test whether we can try a bus/slot reset to clean them.
In the case of assigning a GPU this typically means that the GPU or
audio function come th

Re: [Xen-devel] PV DomU running linux 3.17.3 causing xen-netback fatal error in Dom0

2014-12-04 Thread Anthony Wright
> On 01/12/14 14:22, David Vrabel wrote:
> This VIF protocol is weird. The first slot contains a txreq with a
> size
> for the total length of the packet, subsequent slots have sizes for
> that
> fragment only.
> 
> netback then has to calculate how long the first slot is, by
> subtracting
> all the size from the following slots.
> 
> So something has gone wrong but it's not obvious what it is. Any
> chance
> you can dump the ring state when it happens?

We think we've worked out how to dump the ring state, please see below.

dmesg output

[76571.687014] vif vif-6-0 vif6.0: txreq.offset: a5e, size: 4002, end: 6656
[76571.687035] vif vif-6-0 vif6.0: fatal error; disabling device
[76571.700304] br-primary-1: port 2(vif6.0) entered disabled state

/sys/kernel/debug/xen-netback/vif6.0/io_ring_q0
===
Queue 0
TX: nr_ents 256
req prod 10164 (39) cons 10127 (2) event 10126 (1)
rsp prod 10125 (base) pvt 10125 (0) event 10145 (20)
pending prod 9589 pending cons 9333 nr_pending_reqs 0
dealloc prod 8501 dealloc cons 8501 dealloc_queue 0

RX: nr_ents 256
req prod 1321 (41) cons 1280 (0) event 1 (-1279)
rsp prod 1280 (base) pvt 1280 (0) event 1281 (1)

NAPI state: 1 NAPI weight: 64 TX queue len 0
Credit timer_pending: 0, credit: 18446744073709551615, usec: 0
remaining: 18446744073678062682, expires: 0, now: 4314107964


/sys/kernel/debug/xen-netback/vif6.0/io_ring_q1
===
Queue 1
TX: nr_ents 256
req prod 10106 (0) cons 10106 (0) event 10107 (1)
rsp prod 10106 (base) pvt 10106 (0) event 10107 (1)
pending prod 9573 pending cons 9317 nr_pending_reqs 0
dealloc prod 8503 dealloc cons 8503 dealloc_queue 0

RX: nr_ents 256
req prod 594 (39) cons 555 (0) event 1 (-554)
rsp prod 555 (base) pvt 555 (0) event 556 (1)

NAPI state: 1 NAPI weight: 64 TX queue len 0
Credit timer_pending: 0, credit: 18446744073709551615, usec: 0
remaining: 18446744073678038030, expires: 0, now: 4314118667


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 02/17] introduce XEN_DOMCTL_set_rdm

2014-12-04 Thread Jan Beulich
>>> On 01.12.14 at 10:24,  wrote:
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -34,6 +34,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

Please don't - we use bool_t in the hypervisor, not bool. The header
only exists for source code shared with the tools.

> @@ -1553,6 +1554,44 @@ int iommu_do_pci_domctl(
>  }
>  break;
>  
> +case XEN_DOMCTL_set_rdm:
> +{
> +struct xen_domctl_set_rdm *xdsr = &domctl->u.set_rdm;
> +struct xen_guest_pcidev_info *pcidevs = NULL;
> +struct domain *d = rcu_lock_domain_by_any_id(domctl->domain);

"d" gets passed into this function - no need to shadow the variable
and (wrongly) re-obtain the pointer.

> +
> +if ( d == NULL )
> +return -ESRCH;
> +
> +d->arch.hvm_domain.pci_force =
> +xdsr->flags & PCI_DEV_RDM_CHECK ? true : false;
> +d->arch.hvm_domain.num_pcidevs = xdsr->num_pcidevs;

You shouldn't set the count before setting the pointer.

> +d->arch.hvm_domain.pcidevs = NULL;
> +
> +if ( xdsr->num_pcidevs )
> +{
> +pcidevs = xmalloc_array(xen_guest_pcidev_info_t,
> +xdsr->num_pcidevs);

New domctl-s must not represent security risks: xdsr->num_pcidevs
can be (almost) arbitrarily large - do you really want to allow such
huge allocations? A reasonable upper bound could for example be
the total number of PCI devices the hypervisor knows about.

> +if ( pcidevs == NULL )
> +{
> +rcu_unlock_domain(d);
> +return -ENOMEM;
> +}
> +
> +if ( copy_from_guest(pcidevs, xdsr->pcidevs,
> + xdsr->num_pcidevs*sizeof(*pcidevs)) )
> +{
> +xfree(pcidevs);
> +rcu_unlock_domain(d);
> +return -EFAULT;
> +}
> +}
> +
> +d->arch.hvm_domain.pcidevs = pcidevs;

If the operation gets issued more than once for a given domain,
you're leaking the old pointer here. Overall should think a bit
more about this multiple use case (or outright disallow it).

> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -674,6 +674,14 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
>  "  RMRR region: base_addr %"PRIx64
>  " end_address %"PRIx64"\n",
>  rmrru->base_address, rmrru->end_address);
> +/*
> + * TODO: we may provide a precise paramter just to reserve
> + * RMRR range specific to one device.
> + */
> +dprintk(XENLOG_WARNING VTDPREFIX,
> +"So please set pci_rdmforce to reserve these ranges"
> +" if you need such a device in hotplug case.\n");

It makes no sense to use dprintk() here. I also don't see how this
message relates to whatever may have been logged immediately
before, so the wording ("So please set ...") is questionable. Nor is the
reference to "hotplug case" meaningful here - in this context, only
physical (host) device hotplug can be meant without further
qualification. In the end I think trying to log something here is just
wrong - simply drop the message and make sure whatever you want
to say can be found easily by looking elsewhere.

> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -90,6 +90,10 @@ struct hvm_domain {
>  /* Cached CF8 for guest PCI config cycles */
>  uint32_tpci_cf8;
>  
> +bool_t  pci_force;
> +uint32_tnum_pcidevs;
> +struct xen_guest_pcidev_info  *pcidevs;

Without a comment all these field names are pretty questionable.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 4/9] xen: introduce XEN_DOMCTL_devour

2014-12-04 Thread Vitaly Kuznetsov
Julien Grall  writes:

> Hi Vitaly,
>
> On 03/12/2014 17:16, Vitaly Kuznetsov wrote:
>> New operation sets the 'recipient' domain which will recieve all
>
> s/recieve/receive/
>
>> memory pages from a particular domain and kills the original domain.
>>
>> Signed-off-by: Vitaly Kuznetsov 
>> ---
>> @@ -1764,13 +1765,32 @@ void free_domheap_pages(struct page_info *pg, 
>> unsigned int order)
>
> [..]
>
>> +else
>> +{
>> +mfn = page_to_mfn(pg);
>> +gmfn = mfn_to_gmfn(d, mfn);
>> +
>> +page_set_owner(pg, NULL);
>> +if ( assign_pages(d->recipient, pg, order, 0) )
>> +/* assign_pages reports the error by itself */
>> +goto out;
>> +
>> +if ( guest_physmap_add_page(d->recipient, gmfn, mfn, order) )
>
> On ARM, mfn_to_gmfn will always return the mfn. This would result to
> add a 1:1 mapping in the recipient domain.
>
> But ... only DOM0 has its memory mapped 1:1. So this code may blow up
> the P2M of the recipient domain.

I know almost nothing about ARM so please bear with me. So, for a guest
domain the mapping is not 1:1 (so guest sees different addresses) but
mfn_to_gmfn() doesn't returs these addresses? I was under an impression
it is not x86-specific and I can see its usage in e.g. getdomaininfo(),
memory_exchange(),.. How can one figure out the mapping then?
Anyway, what I want to do here is: when this page is freed I want to
reassign it to our newly-created guest at the exact same address it was
mapped in the original domain. 

>
> I'm not an x86 expert, but this may also happen when the recipient
> domain is using translated page mode (i.e HVM/PVHM).

PVHVM is the main target here (as kexec is unsupported for PV) and it
kinda works. mfn_to_gmfn() returns gmfn != mfn.

BTW, what's the current state of affairs with kexec and ARM guest? I
suppose we should have similar problems: vcpu_info, event channels,
.. 

>
> Regards,

-- 
  Vitaly

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 0/9] toolstack-based approach to pvhvm guest kexec

2014-12-04 Thread Wei Liu
On Thu, Dec 04, 2014 at 03:46:59PM +0100, Vitaly Kuznetsov wrote:
> Wei Liu  writes:
> 
> > On Wed, Dec 03, 2014 at 06:16:12PM +0100, Vitaly Kuznetsov wrote:
> >> Changes from RFCv3:
> >> This is the first non-RFC series as no major concerns were expressed. I'm 
> >> trying
> >> to address Jan's comments. Changes are:
> >> - Move from XEN_DOMCTL_set_recipient to XEN_DOMCTL_devour (I don't really 
> >> like
> >>   the name but nothing more appropriate came to my mind) which incorporates
> >>   former XEN_DOMCTL_set_recipient and XEN_DOMCTL_destroydomain to prevent
> >>   original domain from changing its allocations during transfer procedure.
> >> - Check in free_domheap_pages() that assign_pages() succeeded.
> >> - Change printk() in free_domheap_pages().
> >> - DOMDYING_locked state was introduced to support XEN_DOMCTL_devour.
> >> - xc_domain_soft_reset() got simplified a bit. Now we just wait for the 
> >> original
> >>   domain to die or loose all its pages.
> >> - rebased on top of current master branch.
> >> 
> >> Changes from RFC/WIPv2:
> >> 
> >> Here is a slightly different approach to memory reassignment. Instead of
> >> introducing new (and very doubtful) XENMEM_transfer operation introduce
> >> simple XEN_DOMCTL_set_recipient operation and do everything in 
> >> free_domheap_pages()
> >> handler utilizing normal domain destroy path. This is better because:
> >> - The approach is general-enough
> >> - All memory pages are usually being freed when the domain is destroyed
> >> - No special grants handling required
> >> - Better supportability
> >> 
> >> With regards to PV:
> >> Though XEN_DOMCTL_set_recipient works for both PV and HVM this patchset 
> >> does not
> >> bring PV kexec/kdump support. xc_domain_soft_reset() is limited to work 
> >> with HVM
> >> domains only. The main reason for that is: it is (in theory) possible to 
> >> save p2m
> >> and rebuild them with the new domain but that would only allow us to 
> >> resume execution
> >> from where we stopped. If we want to execute new kernel we need to build 
> >> the same
> >> kernel/initrd/bootstrap_pagetables/... structure we build to boot PV 
> >> domain initially.
> >> That however would destroy the original domain's memory thus making kdump 
> >> impossible.
> >> To make everything work additional support from kexec userspace/linux 
> >> kernel is
> >> required and I'm not sure it makes sense to implement all this stuff in 
> >> the light of
> >> PVH.
> >> 
> >
> > What would happen if you soft reset a PV guest? At the very least soft
> > reset should be explicitly forbidden in PV case.
> 
> Well, nothing particulary bad from hypervisor point of view
> happens. But when PV guest dies page tables are being destroyed (and we
> lose the knowledge anyway). It would be possible for the toolstack to
> start from the beggining - build bootstrap tables, put kernel/initrd and
> start everything. I however don't see any value in doing so: our memory
> gets destroyed and it is going to be the same as plain reboot.
> 

OK. As long as hypervisor is safe I'm OK with it. I was thinking
from a security PoV, i.e. there is no guest triggerable crash of
hypervisor, guest cannot escape by providing arbitrary new kernel, etc.

> Why do we need to forbid the call for PV? (xc_domain_soft_reset()
> already forbids PV).
> 
> >
> >> Original description:
> >> 
> >> When a PVHVM linux guest performs kexec there are lots of things which
> >> require taking care of:
> >> - shared info, vcpu_info
> >> - grants
> >> - event channels
> >> - ...
> >
> > Is there a complete list?
> >
> 
> Konrad tried to assemble it here:
> http://lists.xen.org/archives/html/xen-devel/2014-06/msg03889.html
> 
> 
> >> Instead of taking care of all these things we can rebuild the domain
> >> performing kexec from scratch doing so-called soft-reboot.
> >> 
> >> The idea was suggested by David Vrabel, Jan Beulich, and Konrad Rzeszutek 
> >> Wilk.
> >> 
> >
> > As this approach requires toolstack do complex interaction with
> > hypervisor and preserve / throw away a bunch of states. I think the
> > whole procedure should be documented.
> 
> Sure.. Where would you expect such doc to appear?
> 

I think libxl_internal.h is the right place. Just group the
documentation with other internal functions you introduced.

Re all the links, thanks, I will have a look.

Wei.

> >
> > It would also be helpful if you link to previous discussions in your
> > cover letter.
> 
> Sure, will do next time. For now:
> on resetting VCPU_info (and that's where 'rebuild everything with the
> toolstack solution' was suggested):
> http://lists.xen.org/archives/html/xen-devel/2014-08/msg01869.html
> Previous versions:
> http://lists.xen.org/archives/html/xen-devel/2014-08/msg01630.html
> http://lists.xen.org/archives/html/xen-devel/2014-08/msg00603.html
> 
> EVTCHNOP_reset (got merged):
> http://lists.xen.org/archives/html/xen-devel/2014-07/msg03979.html
> Previous:
> http://lists.xen.org/archives/html/xen-devel/20

Re: [Xen-devel] Xen-4.5 HVMOP ABI issues

2014-12-04 Thread Jan Beulich
>>> On 04.12.14 at 15:28,  wrote:
> On 04/12/14 13:49, Jan Beulich wrote:
> On 28.11.14 at 16:46,  wrote:
>>> On 28/11/14 15:18, Jan Beulich wrote:
>>> On 28.11.14 at 14:55,  wrote:
> The problem is with continuations which reuse the upper bits of the
> input register, not with this HVMOP_op_mask specifically; the
> HVMOP_op_mask simply adds to an existing problem.  This is something
> which needs considering urgently because, as you identify above, we have
> already suffered an accidental ABI breakage with the mem-op widening.
 Since we can't retroactively fix the mem-op widening, I still don't see
 what's urgent here: As long as we don't change any of these masks,
 nothing bad is going to happen. Of course one thing to consider with
 this aspect in mind is whether to change the hvm-op or gnttab-op
 masks again _before_ 4.5 goes out, based on whether we feel they're
 wide enough for the (un)foreseeable future.
>>> By urgent, I mean exactly this, while we have the ability to tweak the
>>> masks.
>> With no-one else voicing an opinion:
>>
>> For hvmop, the mask currently is 8 bits and we've got 22 ops defined.
>>
>> For gnttabop, the mask currently is 12 bits and we've got 12 ops defined.
>>
>> For the latter, we're fine even without further consideration. For the
>> former, the two operations actively using the continuation encoding
>> are tools-only ones. Since we're fine to alter the tools only interfaces,
>> and since it was intended for the tools-only HVM-ops to be split off
>> to a separate hypercall (e.g. hvmctl) anyway, the range restriction
>> would then no longer be a problem. Plus, in the worst case we could
>> always introduce yet another hypercall if we ran out of numbers.
> 
> Are you suggesting that we make a new hvmctl now and remove the hvmop
> mask before 4.5?  If we ship 4.5 with the hvmop mask, we cannot
> subsequently remove it even if all continuable hypercalls move to a
> separate hypercall.

Why? We certainly don't guarantee compatibility for undefined
hypercalls to behave in a certain way.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

2014-12-04 Thread Sander Eikelenboom

Thursday, December 4, 2014, 3:31:11 PM, you wrote:

> On 04/12/14 14:09, Sander Eikelenboom wrote:
>> 
>> Thursday, December 4, 2014, 2:43:06 PM, you wrote:
>> 
>>> On 04/12/14 13:10, Sander Eikelenboom wrote:

 Thursday, December 4, 2014, 1:24:47 PM, you wrote:

> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>>
>> On Dec 4, 2014 6:30 AM, David Vrabel  wrote:
>>>
>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 

 Instead of doing all this complex dance, we depend on the toolstack 
 doing the right thing. As such implement the 'do_flr' SysFS attribute 
 which 'xl' uses when a device is detached or attached from/to a guest. 
 It bypasses the need to worry about the PCI lock. 
>>>
>>> No.  Get pciback to add its own "reset" sysfs file (as I have 
>>> repeatedly 
>>> proposed). 
>>>
>>
>> Which does not work as the kobj will complain (as there is already an 
>> 'reset' associated with the PCI device).

> It is only needed if the core won't provide one.

> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
> +{
> +   struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
> +   struct device *dev = &pci->dev;
> +   int ret;
> +
> +   /* Already have a per-function reset? */
> +   if (pci_probe_reset_function(pci) == 0)
> +   return 0;
> +
> +   ret = device_create_file(dev, &dev_attr_reset);
> +   if (ret < 0)
> +   return ret;
 +   dev_data->>created_reset_file = true;
> +   return 0;
> +}

 Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
 "pci.c:__pci_dev_reset" ?

 The problem with that function is that from my testing it seems that the 
 first option "pci_dev_specific_reset" always seems to return succes, so 
 all the
 other options are skipped (flr, pm, slot, bus). However the device it self 
 is 
 not properly reset enough (perhaps the pci_dev_specific_reset is good 
 enough for 
 none virtualization purposes and it's probably the least intrusive. For 
 virtualization however it would be nice to be sure it resets properly, or 
 have a 
 way to force a specific reset routine.)
>> 
>>> Then you need work with the maintainer for those specific devices or
>>> drivers to fix their specific reset function.
>> 
>>> I'm not adding stuff to pciback to workaround broken quirks.
>> 
>> OK that's a pretty clear message there, so if one wants to use pci and vga
>> passthrough one should better use KVM and vfio-pci.

> Have you (or anyone else) ever raised the problem with the broken reset
> quirk for certain devices with the relevant maintainer?

>> vfio-pci has:
>> - logic to do the try-slot-bus-reset logic

> Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
> as well.

Depends on what you call an "incorrect fix" .. it fixes a quirk .. 
you can say that's incorrect, but then you would have to remove 50% of
the kernel and Xen code as well.

(i do in general agree it's better to strive for a generic solution though,
that's exactly why i brought up that that function doesn't seem to work perfect
for virtualization purposes) 

> It makes no sense for both pciback and vfio-pci to workaround problems
> with pci_function_reset() in different ways -- it should be fixed in the
> core PCI code so both can benefit and make use of the same code.

Well perhaps Bjorn knows why the order of resets and skipping the rest as
implemented in "pci.c:__pci_dev_reset" was implemented in that way ?

Especially what is the expectation about pci_dev_specific_reset doing a proper 
reset for say a vga-card:
- i know it doesn't work on a radeon card (doesn't blank screen, on next guest 
  boot reports it's already posted, powermanagement doesn't work).
- while with a slot/bus reset, that all just works fine, screen blanks 
  immediately and everything else also works.

Added Alex as well since he added this workaround for KVM/vfio-pci, perhaps he 
knows why
he introduced the workaround in vfio-pci instead of trying to fix it in core 
pci 
code ?

--
Sander


> David




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 4/9] xen: introduce XEN_DOMCTL_devour

2014-12-04 Thread Vitaly Kuznetsov
Julien Grall  writes:

> On 03/12/2014 17:16, Vitaly Kuznetsov wrote:
>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>> index a42d0b8..552e4a3 100644
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -366,6 +366,8 @@ struct domain
>>   bool_t   is_privileged;
>>   /* Which guest this guest has privileges on */
>>   struct domain   *target;
>> +/* Which guest receives freed memory pages */
>
> It took me a while to understand that the recipient domain is a newly
> created domain, right? It might be worth to add a word here (and maybe
> in assign_pages).

Sure, will add. In case you think renaming 'recipient' to something else
makes sense I'm all in.

>
> With that in mind, the code in assign_pages makes more sense.
>
> Regards,

-- 
  Vitaly

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 0/9] toolstack-based approach to pvhvm guest kexec

2014-12-04 Thread Vitaly Kuznetsov
Wei Liu  writes:

> On Wed, Dec 03, 2014 at 06:16:12PM +0100, Vitaly Kuznetsov wrote:
>> Changes from RFCv3:
>> This is the first non-RFC series as no major concerns were expressed. I'm 
>> trying
>> to address Jan's comments. Changes are:
>> - Move from XEN_DOMCTL_set_recipient to XEN_DOMCTL_devour (I don't really 
>> like
>>   the name but nothing more appropriate came to my mind) which incorporates
>>   former XEN_DOMCTL_set_recipient and XEN_DOMCTL_destroydomain to prevent
>>   original domain from changing its allocations during transfer procedure.
>> - Check in free_domheap_pages() that assign_pages() succeeded.
>> - Change printk() in free_domheap_pages().
>> - DOMDYING_locked state was introduced to support XEN_DOMCTL_devour.
>> - xc_domain_soft_reset() got simplified a bit. Now we just wait for the 
>> original
>>   domain to die or loose all its pages.
>> - rebased on top of current master branch.
>> 
>> Changes from RFC/WIPv2:
>> 
>> Here is a slightly different approach to memory reassignment. Instead of
>> introducing new (and very doubtful) XENMEM_transfer operation introduce
>> simple XEN_DOMCTL_set_recipient operation and do everything in 
>> free_domheap_pages()
>> handler utilizing normal domain destroy path. This is better because:
>> - The approach is general-enough
>> - All memory pages are usually being freed when the domain is destroyed
>> - No special grants handling required
>> - Better supportability
>> 
>> With regards to PV:
>> Though XEN_DOMCTL_set_recipient works for both PV and HVM this patchset does 
>> not
>> bring PV kexec/kdump support. xc_domain_soft_reset() is limited to work with 
>> HVM
>> domains only. The main reason for that is: it is (in theory) possible to 
>> save p2m
>> and rebuild them with the new domain but that would only allow us to resume 
>> execution
>> from where we stopped. If we want to execute new kernel we need to build the 
>> same
>> kernel/initrd/bootstrap_pagetables/... structure we build to boot PV domain 
>> initially.
>> That however would destroy the original domain's memory thus making kdump 
>> impossible.
>> To make everything work additional support from kexec userspace/linux kernel 
>> is
>> required and I'm not sure it makes sense to implement all this stuff in the 
>> light of
>> PVH.
>> 
>
> What would happen if you soft reset a PV guest? At the very least soft
> reset should be explicitly forbidden in PV case.

Well, nothing particulary bad from hypervisor point of view
happens. But when PV guest dies page tables are being destroyed (and we
lose the knowledge anyway). It would be possible for the toolstack to
start from the beggining - build bootstrap tables, put kernel/initrd and
start everything. I however don't see any value in doing so: our memory
gets destroyed and it is going to be the same as plain reboot.

Why do we need to forbid the call for PV? (xc_domain_soft_reset()
already forbids PV).

>
>> Original description:
>> 
>> When a PVHVM linux guest performs kexec there are lots of things which
>> require taking care of:
>> - shared info, vcpu_info
>> - grants
>> - event channels
>> - ...
>
> Is there a complete list?
>

Konrad tried to assemble it here:
http://lists.xen.org/archives/html/xen-devel/2014-06/msg03889.html


>> Instead of taking care of all these things we can rebuild the domain
>> performing kexec from scratch doing so-called soft-reboot.
>> 
>> The idea was suggested by David Vrabel, Jan Beulich, and Konrad Rzeszutek 
>> Wilk.
>> 
>
> As this approach requires toolstack do complex interaction with
> hypervisor and preserve / throw away a bunch of states. I think the
> whole procedure should be documented.

Sure.. Where would you expect such doc to appear?

>
> It would also be helpful if you link to previous discussions in your
> cover letter.

Sure, will do next time. For now:
on resetting VCPU_info (and that's where 'rebuild everything with the
toolstack solution' was suggested):
http://lists.xen.org/archives/html/xen-devel/2014-08/msg01869.html
Previous versions:
http://lists.xen.org/archives/html/xen-devel/2014-08/msg01630.html
http://lists.xen.org/archives/html/xen-devel/2014-08/msg00603.html

EVTCHNOP_reset (got merged):
http://lists.xen.org/archives/html/xen-devel/2014-07/msg03979.html
Previous:
http://lists.xen.org/archives/html/xen-devel/2014-07/msg03925.html
http://lists.xen.org/archives/html/xen-devel/2014-07/msg03322.html
http://lists.xen.org/archives/html/xen-devel/2014-07/msg02500.html

This patch series:
http://lists.xen.org/archives/html/xen-devel/2014-10/msg00764.html
http://lists.xen.org/archives/html/xen-devel/2014-09/msg03623.html
http://lists.xen.org/archives/html/xen-devel/2014-08/msg02309.html

>
> Wei.

-- 
  Vitaly

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: Cleaning up the Mini-OS namespace

2014-12-04 Thread Andrew Cooper
On 04/12/14 14:27, Martin Lucina wrote:
> Hi,
>
> In rumprun-xen [1] we use Mini-OS as a "base firmware" layer in our stack.
> Currently we are using a slightly bastardized fork of the xen.git Mini-OS.
>
> We would like to avoid this turning into a permanent fork. Following
> previous discussion [2] on and openmirage-devel I would like to coordinate
> upstreaming our changes if possible.
>
> The biggest change which needs to be done is cleaning up the Mini-OS
> namespace. This is necessary as we link Mini-OS with the application, rump
> kernel and NetBSD libc to get a full application stack.
>
> The changes as implemented in a semi-automated fashion for the Mini-OS used
> by rumprun-xen can be viewed in the (since merged) pull request at [3]:
>
> - All Mini-OS functions called by rumprun-xen are renamed to minios_* or
>   _minios_* for strictly internal functions, except those in the
>   blkfront_*, netfront_*, pcifront_* and xenbus_* driver namespaces.
> - In the case of drivers, eg. init|shutdown_blkfront are renamed to
>   blkfront_*.
> - All global variables are either manually made local, or placed under
>   the _minios_* namespace, with the exception of HYPERVISOR_shared_info,
>   and those variables under driver namespaces kept above.
> - All callers are updated to use the new names. Where it makes sense,
>   macros such as alloc_page are also renamed into the minios_ namespace.
>
> Questions:
>
> - Is there a general interest in upstreaming this work?
> - Upstream Mini-OS provides things we (rumpkernel.org) don't need (stub
>   "libc", ...). If we are to move to upstream then we will need to do a
>   more general cleanup of Mini-OS to make these pluggable. Again, is there
>   upstream interest in this work?
> - As there is no formal API for Mini-OS. My changes only address the
>   "public" functionality used by rumprun-xen. Other users mileage will
>   vary; who else should I coordinate with?
> - I have not namespaced macros such as local_irq_save(). Should this be
>   done? 
>   
>   Also, the driver namespaces have been preserved (since I was lazy), these
>   should probably be renamed under the minios namespace; it's plausible
>   some day someone will try to link an application with a function called
>   blkfront_init().
>
> All comments and review welcome.
>
> Martin

I think this is a very good idea, and I am completely in favour of it.

There are already-identified issues such as MiniOS leaking things like
ARRAY_SIZE() into linked namespaces, which I havn't yet had enough tuits
to fix.

I think splitting things like the stub libc away from the "MiniOS Xen
Framework" is also a good idea.  Ideally, the result of a "MiniOS Build"
would be a small set of .a's which can then be linked against some
normal C to make a minios guest.  (How feasible this is in reality
remains to be seen.)

>From a not-public-API point of view, all you have to worry about is that
the existing minios stuff in xen.git, including the stubdom stuff,
continues to work.  We have never made any guarantees to anyone using
minios out-of-tree.

~Andrew


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Poor network performance between DomU with multiqueue support

2014-12-04 Thread Zhangleiqiang (Trump)
Thanks for your detailed explanation, Wei. 

I am wondering if netfront/netback can be optimized to reach the 10Gbps 
throughout between DomUs running on different hosts connected with 10GE 
network. Currently, it seems like the RX is the bottleneck, which also consist 
with the testing result in xenwiki: 
http://wiki.xen.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing

I am wondering what factors prevent RX to reach the higher throughout? You have 
mentioned that one reason is that guest RX data path still uses grant_copy 
while guest TX uses grant_map to do zero-copy transmit. Do you know any other 
factors or ongoing work to optimize the RX data path?

--
zhangleiqiang (Trump)

Best Regards


> -Original Message-
> From: Wei Liu [mailto:wei.l...@citrix.com]
> Sent: Thursday, December 04, 2014 9:06 PM
> To: Zhangleiqiang (Trump)
> Cc: Wei Liu; xen-devel@lists.xen.org; zhangleiqiang; Luohao (brian); Xiaoding
> (B); Yuzhou (C); Zhuangyuxin
> Subject: Re: [Xen-devel] Poor network performance between DomU with
> multiqueue support
> 
> On Thu, Dec 04, 2014 at 12:09:33PM +, Zhangleiqiang (Trump) wrote:
> [...]
> > > > However, I find another issue. Even using 6 queues and making sure
> > > > that all of these 6 netback processes running with high cpu usage
> > > > (indeed, any of it running with 87% cpu usage), the whole VM
> > > > receive throughout is not very higher than results when using 4
> > > > queues. The results are from 4.5Gbps to 5.04 Gbps using TCP with
> > > > 512 bytes length and 4.3Gbps to 5.78Gbps using TCP with 1460 bytes
> length.
> > > >
> > >
> > > I would like to ask if you're still using 4U4G (4 CPU 4 G?)
> > > configuration? If so, please make sure there are at least the same number
> of vcpus as queues.
> >
> 
> > Sorry for misleading you, 4U4G means 4 CPU and 4 G memory, :). I also
> > found that the max_queue of netback is determinated by min(online_cpu,
> > module_param) yesterday, so when using 6 queues in the previous
> > testing, I used VM with 6 CPU and 6 G Memory.
> 
> >
> > > > According to the testing result from WIKI:
> > > > http://wiki.xen.org/wiki/Xen-netback_and_xen-netfront_multi-queue_
> > > > perf ormance_testing, The VM receive throughput is also more lower
> > > > than VM transmit.
> > > >
> > >
> > > I think that's expected, because guest RX data path still uses
> > > grant_copy while guest TX uses grant_map to do zero-copy transmit.
> >
> > As I understand, the RX process is as follows:
> > 1. Phy NIC receive packet
> > 2. XEN Hypervisor trigger interrupt to Dom0 3. Dom0' s NIC driver do
> > the "RX" operation, and the packet is stored into SKB which is also
> > owned/shared with netback 4. NetBack notify netfront through event
> > channel that a packet is receiving 5. Netfront grant a buffer for
> > receiving and notify netback the GR (if using grant-resue mechanism,
> > netfront just notify the GR to netback) through IO Ring 6. NetBack do
> > the grant_copy to copy packet from its SKB to the buffer referenced by
> > GR, and notify netfront through event channel 7. Netfront copy the
> > data from buffer to user-level app's SKB
> >
> > Am I right?
> 
> Step 4 is not correct, netback won't notify netfront at that point.
> 
> Step 5 is not correct, all grant refs are pre-allocated and granted before 
> that.
> 
> Other steps look correct.
> 
> > Why not using zero-copy transmit in guest RX data pash too ?
> >
> 
> A rogue / buggy guest might hold the mapping for arbitrary long period of 
> time.
> 
> >
> > > > I am wondering why the VM receive throughout cannot be up to
> > > > 8-10Gbps as VM transmit under multi-queue?  I also tried to send
> > > > packets directly from Local Dom0 to DomU, the DomU receive
> > > > throughput can reach about 8-12Gbps, so I am also wondering why
> > > > transmitting packets from Dom0 to Remote DomU can only reach about
> 4-5Gbps throughout?
> > >
> > > If data is from Dom0 to DomU then SKB is probably not fragmented by
> > > network stack.  You can use tcpdump to check that.
> >
> > In our testing , the MTU is set to 1600. However, even testing with
> > packets whose length are 1024 (small than 1600), the throughout
> > between Dom0 to Local DomU is more higher than that between Dom0 to
> > Remote DomU. So maybe the fragment is not the reason for it.
> >
> 
> Don't have much idea about this, sorry.
> 
> Wei.
> 
> >
> > > Wei.
> > >
> > > >
> > > > > Wei.
> > > > >
> > > > > > --
> > > > > > zhangleiqiang (Trump)
> > > > > >
> > > > > > Best Regards
> > > > > >
> > > > > >
> > > > > > > -Original Message-
> > > > > > > From: Wei Liu [mailto:wei.l...@citrix.com]
> > > > > > > Sent: Tuesday, December 02, 2014 8:12 PM
> > > > > > > To: Zhangleiqiang (Trump)
> > > > > > > Cc: Wei Liu; zhangleiqiang; xen-devel@lists.xen.org; Luohao
> > > > > > > (brian); Xiaoding (B); Yuzhou (C); Zhuangyuxin
> > > > > > > Subject: Re: [Xen-devel] Poor network performance between
> > > > > > > DomU 

Re: [Xen-devel] Poor network performance between DomU with multiqueue support

2014-12-04 Thread Zhangleiqiang (Trump)
> -Original Message-
> From: Zoltan Kiss [mailto:zoltan.k...@linaro.org]
> Sent: Thursday, December 04, 2014 9:35 PM
> To: Zhangleiqiang (Trump); Wei Liu; xen-devel@lists.xen.org
> Cc: Xiaoding (B); Zhuangyuxin; zhangleiqiang; Luohao (brian); Yuzhou (C)
> Subject: Re: [Xen-devel] Poor network performance between DomU with
> multiqueue support
> 
> 
> 
> On 04/12/14 12:09, Zhangleiqiang (Trump) wrote:
> >> I think that's expected, because guest RX data path still uses
> >> grant_copy while
> >> >guest TX uses grant_map to do zero-copy transmit.
> > As I understand, the RX process is as follows:
> > 1. Phy NIC receive packet
> > 2. XEN Hypervisor trigger interrupt to Dom0 3. Dom0' s NIC driver do
> > the "RX" operation, and the packet is stored into SKB which is also
> > owned/shared with netback
> Not that easy. There is something between the NIC driver and netback which
> directs the packets, e.g. the old bridge driver, ovs, or the IP stack of the 
> kernel.
> > 4. NetBack notify netfront through event channel that a packet is
> > receiving 5. Netfront grant a buffer for receiving and notify netback
> > the GR (if using grant-resue mechanism, netfront just notify the GR to
> > netback) through IO Ring
> It looks a bit confusing in the code, but netfront put "requests" on the ring
> buffer, which contains the grant ref of the guest page where the backend can
> copy. When the packet comes, netback consumes these requests and send
> back a response telling the guest the grant copy of the packet finished, it 
> can
> start handling the data. (sending a response means it's placing a response in
> the ring and trigger the event channel) And ideally netback should always have
> requests in the ring, so it doesn't have to wait for the guest to fill it up.

> > 6. NetBack do the grant_copy to copy packet from its SKB to the buffer
> > referenced by GR, and notify netfront through event channel 7.
> > Netfront copy the data from buffer to user-level app's SKB
> Or wherever that SKB should go, yes. Like with any received packet on a real
> network interface.
> >
> > Am I right? Why not using zero-copy transmit in guest RX data pash too ?
> Because that means you are mapping that memory to the guest, and you won't
> have any guarantee when the guest will release them. And netback can't just
> unmap them forcibly after a timeout, because finding a correct timeout value
> would be quite impossible.
> A malicious/buggy/overloaded guest can hold on to Dom0 memory indefinitely,
> but it even becomes worse if the memory came from another
> guest: you can't shutdown that guest for example, until all its memory is
> returned to him.

Thanks for your detailed explanation about RX data path, I have get it, :)

About the issue that poor performance between DomU to DomU, but high throughout 
between Dom0 to remote Dom0/DomU mentioned in my previous mail, do you have any 
idea about it? 

I am wondering if netfront/netback can be optimized to reach the 10Gbps 
throughout between DomUs running on different hosts connected with 10GE 
network. Currently, it seems like the TX is not the bottleneck, because we can 
reach the aggregate throughout of 9Gbps when sending packets from one DomU to 
other 3 DomUs running on different host. So I think the bottleneck maybe the 
RX, are you agreed with me?

I am wondering what is the main reason that prevent RX to reach the higher 
throughout? Compared to KVM+virtio+vhost, which can reach high throughout, the 
RX has extra grantcopy operation, and the grantcopy operation may be one reason 
for it. Do you have any idea about it too?

> 
> Regards,
> 
> Zoli

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

2014-12-04 Thread David Vrabel
On 04/12/14 14:09, Sander Eikelenboom wrote:
> 
> Thursday, December 4, 2014, 2:43:06 PM, you wrote:
> 
>> On 04/12/14 13:10, Sander Eikelenboom wrote:
>>>
>>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>>>
 On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>
> On Dec 4, 2014 6:30 AM, David Vrabel  wrote:
>>
>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
>>>
>>> Instead of doing all this complex dance, we depend on the toolstack 
>>> doing the right thing. As such implement the 'do_flr' SysFS attribute 
>>> which 'xl' uses when a device is detached or attached from/to a guest. 
>>> It bypasses the need to worry about the PCI lock. 
>>
>> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly 
>> proposed). 
>>
>
> Which does not work as the kobj will complain (as there is already an 
> 'reset' associated with the PCI device).
>>>
 It is only needed if the core won't provide one.
>>>
 +static int pcistub_try_create_reset_file(struct pci_dev *pci)
 +{
 +   struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
 +   struct device *dev = &pci->dev;
 +   int ret;
 +
 +   /* Already have a per-function reset? */
 +   if (pci_probe_reset_function(pci) == 0)
 +   return 0;
 +
 +   ret = device_create_file(dev, &dev_attr_reset);
 +   if (ret < 0)
 +   return ret;
>>> +   dev_data->>created_reset_file = true;
 +   return 0;
 +}
>>>
>>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
>>> "pci.c:__pci_dev_reset" ?
>>>
>>> The problem with that function is that from my testing it seems that the 
>>> first option "pci_dev_specific_reset" always seems to return succes, so all 
>>> the
>>> other options are skipped (flr, pm, slot, bus). However the device it self 
>>> is 
>>> not properly reset enough (perhaps the pci_dev_specific_reset is good 
>>> enough for 
>>> none virtualization purposes and it's probably the least intrusive. For 
>>> virtualization however it would be nice to be sure it resets properly, or 
>>> have a 
>>> way to force a specific reset routine.)
> 
>> Then you need work with the maintainer for those specific devices or
>> drivers to fix their specific reset function.
> 
>> I'm not adding stuff to pciback to workaround broken quirks.
> 
> OK that's a pretty clear message there, so if one wants to use pci and vga
> passthrough one should better use KVM and vfio-pci.

Have you (or anyone else) ever raised the problem with the broken reset
quirk for certain devices with the relevant maintainer?

> vfio-pci has:
> - logic to do the try-slot-bus-reset logic

Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
as well.

It makes no sense for both pciback and vfio-pci to workaround problems
with pci_function_reset() in different ways -- it should be fixed in the
core PCI code so both can benefit and make use of the same code.

David


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 0/9] toolstack-based approach to pvhvm guest kexec

2014-12-04 Thread Vitaly Kuznetsov
Olaf Hering  writes:

> On Wed, Dec 03, Vitaly Kuznetsov wrote:
>
>> Original description:
>> 
>> When a PVHVM linux guest performs kexec there are lots of things which
>> require taking care of:
>> - shared info, vcpu_info
>> - grants
>> - event channels
>> - ...
>> Instead of taking care of all these things we can rebuild the domain
>> performing kexec from scratch doing so-called soft-reboot.
>> 
>> The idea was suggested by David Vrabel, Jan Beulich, and Konrad Rzeszutek 
>> Wilk.
>> 
>> P.S. The patch series can be tested with PVHVM Linux guest with the following
>> modifications:
>
> Its not clear to me how thew old kernel starts the new kernel.
> How and where is that done?

It is done by linux kernel itself, I bring nothing new into the
picture. It all works like this:
1) Original guest does HYPERVISOR_sched_op(SCHEDOP_shutdown, r = { .reason =
SHUTDOWN_soft_reset})
2) All this rebuild machinery happens including copying HVM context
3) New guest resumes from where old did the hypercall
4) Kernel does kexec and new kernel is being booted.

>
> Olaf

-- 
  Vitaly

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen-4.5 HVMOP ABI issues

2014-12-04 Thread Andrew Cooper
On 04/12/14 13:49, Jan Beulich wrote:
 On 28.11.14 at 16:46,  wrote:
>> On 28/11/14 15:18, Jan Beulich wrote:
>> On 28.11.14 at 14:55,  wrote:
 The problem is with continuations which reuse the upper bits of the
 input register, not with this HVMOP_op_mask specifically; the
 HVMOP_op_mask simply adds to an existing problem.  This is something
 which needs considering urgently because, as you identify above, we have
 already suffered an accidental ABI breakage with the mem-op widening.
>>> Since we can't retroactively fix the mem-op widening, I still don't see
>>> what's urgent here: As long as we don't change any of these masks,
>>> nothing bad is going to happen. Of course one thing to consider with
>>> this aspect in mind is whether to change the hvm-op or gnttab-op
>>> masks again _before_ 4.5 goes out, based on whether we feel they're
>>> wide enough for the (un)foreseeable future.
>> By urgent, I mean exactly this, while we have the ability to tweak the
>> masks.
> With no-one else voicing an opinion:
>
> For hvmop, the mask currently is 8 bits and we've got 22 ops defined.
>
> For gnttabop, the mask currently is 12 bits and we've got 12 ops defined.
>
> For the latter, we're fine even without further consideration. For the
> former, the two operations actively using the continuation encoding
> are tools-only ones. Since we're fine to alter the tools only interfaces,
> and since it was intended for the tools-only HVM-ops to be split off
> to a separate hypercall (e.g. hvmctl) anyway, the range restriction
> would then no longer be a problem. Plus, in the worst case we could
> always introduce yet another hypercall if we ran out of numbers.

Are you suggesting that we make a new hvmctl now and remove the hvmop
mask before 4.5?  If we ship 4.5 with the hvmop mask, we cannot
subsequently remove it even if all continuable hypercalls move to a
separate hypercall.

>
> Consequently what I'd like to propose (and I guess I'll craft a patch
> as soon as I finished this mail) is that we add comments to these
> masks (also the memop one) to make clear that they mustn't change.
> Additionally for forward compatibility I'd also like to add checks for
> the upper bits to be zero for any of the sub-ops that don't actually
> use them to encode continuation information. Konrad, would you
> consider doing so acceptable for 4.5?

I will have to re-work around this new check for XenServer
compatibility, but I suppose that is fine.  I am certainly in favour of
leaving an ABI comment with the op masks.

~Andrew


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] RFC: Cleaning up the Mini-OS namespace

2014-12-04 Thread Martin Lucina
Hi,

In rumprun-xen [1] we use Mini-OS as a "base firmware" layer in our stack.
Currently we are using a slightly bastardized fork of the xen.git Mini-OS.

We would like to avoid this turning into a permanent fork. Following
previous discussion [2] on and openmirage-devel I would like to coordinate
upstreaming our changes if possible.

The biggest change which needs to be done is cleaning up the Mini-OS
namespace. This is necessary as we link Mini-OS with the application, rump
kernel and NetBSD libc to get a full application stack.

The changes as implemented in a semi-automated fashion for the Mini-OS used
by rumprun-xen can be viewed in the (since merged) pull request at [3]:

- All Mini-OS functions called by rumprun-xen are renamed to minios_* or
  _minios_* for strictly internal functions, except those in the
  blkfront_*, netfront_*, pcifront_* and xenbus_* driver namespaces.
- In the case of drivers, eg. init|shutdown_blkfront are renamed to
  blkfront_*.
- All global variables are either manually made local, or placed under
  the _minios_* namespace, with the exception of HYPERVISOR_shared_info,
  and those variables under driver namespaces kept above.
- All callers are updated to use the new names. Where it makes sense,
  macros such as alloc_page are also renamed into the minios_ namespace.

Questions:

- Is there a general interest in upstreaming this work?
- Upstream Mini-OS provides things we (rumpkernel.org) don't need (stub
  "libc", ...). If we are to move to upstream then we will need to do a
  more general cleanup of Mini-OS to make these pluggable. Again, is there
  upstream interest in this work?
- As there is no formal API for Mini-OS. My changes only address the
  "public" functionality used by rumprun-xen. Other users mileage will
  vary; who else should I coordinate with?
- I have not namespaced macros such as local_irq_save(). Should this be
  done? 
  
  Also, the driver namespaces have been preserved (since I was lazy), these
  should probably be renamed under the minios namespace; it's plausible
  some day someone will try to link an application with a function called
  blkfront_init().

All comments and review welcome.

Martin

[1] http://repo.rumpkernel.org/rumprun-xen
[2] http://thread.gmane.org/gmane.comp.rumpkernel.user/514
[3] https://github.com/rumpkernel/rumprun-xen/pull/10

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5] x86/stack: Avoid peeking into unmapped guard pages when dumping Xens stack

2014-12-04 Thread Andrew Cooper
On 04/12/14 13:32, Jan Beulich wrote:
 On 04.12.14 at 13:58,  wrote:
>> Konrad: I am requesting a release ack for this change.  It aids the clarity 
>> of
>> certain crash information, and prevents cascade pagefaults in certain
>> circumstances, which would prevent execution of the crash kernel or a system
>> reboot.
> With
>
> #ifndef NDEBUG
> #define MEMORY_GUARD
> #endif
>
> I don't think this qualifies as a necessary bug fix at this point of 4.5.

Without frame pointers, you get a call trace of almost complete junk,
spanning 6 pages.  This is admittedly less bad, but still a bug.

>
>> +unsigned long get_stack_trace_bottom(unsigned long sp)
>> +{
>> +switch ( get_stack_page(sp) )
>> +{
>> +case 0 ... 2:
>> +return ROUNDUP(sp, PAGE_SIZE) -
>> +offsetof(struct cpu_user_regs, es) - sizeof(unsigned long);
> The only concern I have here is that this may wrongly truncate a
> dump/trace when one of the IST stacks overflowed. But I think we
> can live with that - an overflow of the first one would already have
> similar behavior today.

I was already planning to do some improvements for 4.6, based on work
developed for the xen-crashdump-analyser which has proved very useful
when investigating XenServer crashes.  This includes the ability to spot
valid exception frames, and follow them back to the primary stack
through a set of nested IST faults.

Along the bugfix line, there are some other issues which would be nice
to fix.  Putting a guard pages in stack pages 0 and 7 to completely
prevent a stack overflow on cpu0 from sliding into Xens BSS.  This would
leave 3 unmapped guard pages, 2 pages of primary stack and 3 IST
stacks.  The syscall trampoline can safely live at the bottom of the #DF
stack.

For AMD hardware, the lack of TR switch on vmexit causes any interaction
with MEMORY_GUARD to be a triple fault as we currently must disable IST
for security reasons.  I am hoping that the overhead of switching the
task register will prove to be negligible, and that we can run without
playing any IST games in the slightest.  If not, then the switch can
reasonably be gated on MEMORY_GUARD, in which case we still
conditionally keep the IST games, but don't take the TR switch overhead
in non-debug builds.

>
>> +
>> +#ifndef MEMORY_GUARD
>> +case 3 ... 5:
>> +#endif
>> +case 6 ... 7:
>> +return ROUNDUP(sp, STACK_SIZE) -
>> +sizeof(struct cpu_info) - sizeof(unsigned long);
>> +
>> +#ifdef MEMORY_GUARD
>> +case 3 ... 5:
>> +#endif
>> +default:
> What is the #ifdef good for when this is "default:" anyway? With
> this dropped (also from the other function)
> Reviewed-by: Jan Beulich 

Ah yes - a side effect of the way the patch was developed.  They can be
dropped.

~Andrew


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

2014-12-04 Thread Sander Eikelenboom
Hello Sander,

Thursday, December 4, 2014, 3:09:09 PM, you wrote:


> Thursday, December 4, 2014, 2:43:06 PM, you wrote:

>> On 04/12/14 13:10, Sander Eikelenboom wrote:
>>> 
>>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>>> 
 On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>
> On Dec 4, 2014 6:30 AM, David Vrabel  wrote:
>>
>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
>>>
>>> Instead of doing all this complex dance, we depend on the toolstack 
>>> doing the right thing. As such implement the 'do_flr' SysFS attribute 
>>> which 'xl' uses when a device is detached or attached from/to a guest. 
>>> It bypasses the need to worry about the PCI lock. 
>>
>> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly 
>> proposed). 
>>
>
> Which does not work as the kobj will complain (as there is already an 
> 'reset' associated with the PCI device).
>>> 
 It is only needed if the core won't provide one.
>>> 
 +static int pcistub_try_create_reset_file(struct pci_dev *pci)
 +{
 +   struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
 +   struct device *dev = &pci->dev;
 +   int ret;
 +
 +   /* Already have a per-function reset? */
 +   if (pci_probe_reset_function(pci) == 0)
 +   return 0;
 +
 +   ret = device_create_file(dev, &dev_attr_reset);
 +   if (ret < 0)
 +   return ret;
>>> +   dev_data->>created_reset_file = true;
 +   return 0;
 +}
>>> 
>>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
>>> "pci.c:__pci_dev_reset" ?
>>> 
>>> The problem with that function is that from my testing it seems that the 
>>> first option "pci_dev_specific_reset" always seems to return succes, so all 
>>> the
>>> other options are skipped (flr, pm, slot, bus). However the device it self 
>>> is 
>>> not properly reset enough (perhaps the pci_dev_specific_reset is good 
>>> enough for 
>>> none virtualization purposes and it's probably the least intrusive. For 
>>> virtualization however it would be nice to be sure it resets properly, or 
>>> have a 
>>> way to force a specific reset routine.)

>> Then you need work with the maintainer for those specific devices or
>> drivers to fix their specific reset function.

>> I'm not adding stuff to pciback to workaround broken quirks.

> OK that's a pretty clear message there, so if one wants to use pci and vga
> passthrough one should better use KVM and vfio-pci. 

> vfio-pci has:
> - logic to do the try-slot-bus-reset logic
> - it has quirks specific to vga passthrough
Hrmm have to correct my self because the vga-pt quirks are part of the vfio-pci 
part in qemu.

The try-slot-bus-reset logic is part of the kernel vfio-pci driver though and 
they faced the same locking issue it 
seems:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=61cf16d8bd38c3dc52033ea75d5b1f8368514a17
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=890ed578df82f5b7b5a874f9f2fa4f117305df5f

> implemented in from the looks of it a quite clean driver.
> (the main issue with it so far was you could only seize devices based on 
> vendor and device id, which can be a problem when you have multiple devices.
> However that was resolved recently if i am correct.)

> And neither of those will be supported by xen-pciback if i get your message 
> right ?

> --
> Sander

>> David



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

2014-12-04 Thread Sander Eikelenboom

Thursday, December 4, 2014, 2:43:06 PM, you wrote:

> On 04/12/14 13:10, Sander Eikelenboom wrote:
>> 
>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>> 
>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:

 On Dec 4, 2014 6:30 AM, David Vrabel  wrote:
>
> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
>>
>> Instead of doing all this complex dance, we depend on the toolstack 
>> doing the right thing. As such implement the 'do_flr' SysFS attribute 
>> which 'xl' uses when a device is detached or attached from/to a guest. 
>> It bypasses the need to worry about the PCI lock. 
>
> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly 
> proposed). 
>

 Which does not work as the kobj will complain (as there is already an 
 'reset' associated with the PCI device).
>> 
>>> It is only needed if the core won't provide one.
>> 
>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>>> +{
>>> +   struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>>> +   struct device *dev = &pci->dev;
>>> +   int ret;
>>> +
>>> +   /* Already have a per-function reset? */
>>> +   if (pci_probe_reset_function(pci) == 0)
>>> +   return 0;
>>> +
>>> +   ret = device_create_file(dev, &dev_attr_reset);
>>> +   if (ret < 0)
>>> +   return ret;
>> +   dev_data->>created_reset_file = true;
>>> +   return 0;
>>> +}
>> 
>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
>> "pci.c:__pci_dev_reset" ?
>> 
>> The problem with that function is that from my testing it seems that the 
>> first option "pci_dev_specific_reset" always seems to return succes, so all 
>> the
>> other options are skipped (flr, pm, slot, bus). However the device it self 
>> is 
>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough 
>> for 
>> none virtualization purposes and it's probably the least intrusive. For 
>> virtualization however it would be nice to be sure it resets properly, or 
>> have a 
>> way to force a specific reset routine.)

> Then you need work with the maintainer for those specific devices or
> drivers to fix their specific reset function.

> I'm not adding stuff to pciback to workaround broken quirks.

OK that's a pretty clear message there, so if one wants to use pci and vga
passthrough one should better use KVM and vfio-pci. 

vfio-pci has:
- logic to do the try-slot-bus-reset logic
- it has quirks specific to vga passthrough
implemented in from the looks of it a quite clean driver.
(the main issue with it so far was you could only seize devices based on 
vendor and device id, which can be a problem when you have multiple devices.
However that was resolved recently if i am correct.)

And neither of those will be supported by xen-pciback if i get your message 
right ?

--
Sander

> David



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH] Mini-OS: netfront: Fix rx ring starvation in network_rx

2014-12-04 Thread Martin Lucina
Hi,

following is a patch against vanilla Mini-OS in upstream xen.git for a
problem we have found in the netfront driver. When subjected to load
network receive would freeze due to the rx ring running out of free
request slots.

With this patch a httpd running on rumprun-xen [1] survives all the load I
can throw at it.

Note that I have only compile-tested this patch against the upstream
Mini-OS. It also does not address the -DHAVE_LIBC codepath as I don't know
how to test that or if anyone is using that code?

Further, I would like to clean this function up; the code is an impossible
to follow mess. What I'd like to arrive at is at [2], however this also
needs(?) to address the -DHAVE_LIBC codepath.

Martin

[1] http://repo.rumpkernel.org/rumprun-xen and
https://github.com/mato/rump-mathopd
[2] https://gist.github.com/mato/512eb70f39565b030f73



In network_rx() we must push the same amount of requests back onto the
ring in the second loop that we consumed in the first loop. Otherwise
the ring will eventually starve itself of free request slots and no
packets will be delivered.

Signed-off-by: Martin Lucina 
---
 extras/mini-os/netfront.c |   17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/extras/mini-os/netfront.c b/extras/mini-os/netfront.c
index 44c3995..33afc9c 100644
--- a/extras/mini-os/netfront.c
+++ b/extras/mini-os/netfront.c
@@ -99,14 +99,14 @@ void network_rx(struct netfront_dev *dev)
 struct netif_rx_response *rx;
 int nr_consumed, some, more, i, notify;
 
-
+nr_consumed = 0;
 moretodo:
 rp = dev->rx.sring->rsp_prod;
 rmb(); /* Ensure we see queued responses up to 'rp'. */
 cons = dev->rx.rsp_cons;
 
-for (nr_consumed = 0, some = 0;
- (cons != rp) && !some;
+for (some = 0;
+ (cons != rp);
  nr_consumed++, cons++)
 {
 struct net_buffer* buf;
@@ -115,17 +115,8 @@ moretodo:
 
 rx = RING_GET_RESPONSE(&dev->rx, cons);
 
-if (rx->flags & NETRXF_extra_info)
-{
-printk("+ we have extras!\n");
-continue;
-}
-
-
-if (rx->status == NETIF_RSP_NULL) continue;
-
 id = rx->id;
-BUG_ON(id >= NET_TX_RING_SIZE);
+BUG_ON(id >= NET_RX_RING_SIZE);
 
 buf = &dev->rx_buffers[id];
 page = (unsigned char*)buf->page;
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] xen/arm: uart interrupts handling

2014-12-04 Thread Julien Grall
On 04/12/14 03:50, Vijay Kilari wrote:
> Hi Tim,

Hi Vijay,

> I see that on uart interrupt, ICR is written to clear the all
> interrupts except TX, RX and RX timeout. With this, cpu always finds
> TX/RX is active and never
> comes out of the loop.

FWIW, the PL011 serial code has been copied from the Linux drivers.

Linux interrupt handler also clear all interrupts except TX, RX, RX
timeout. So do you see the issue on Linux?

> 
> With the below changes, TX, RX & RTI are cleared before handling this
> interrupts.
> 
> Is my observation is correct?. If so I wonder how it is working on
> platforms that
> are using pl011. Without this for my cpu just keeps looping here.
> 
>   index fba0a55..d21bce3 100644
> --- a/xen/drivers/char/pl011.c
> +++ b/xen/drivers/char/pl011.c
> @@ -63,7 +63,7 @@ static void pl011_interrupt(int irq, void *data,
> struct cpu_user_regs *regs)
>  {
>  do
>  {
> -pl011_write(uart, ICR, status & ~(TXI|RTI|RXI));
> +pl011_write(uart, ICR, status & (TXI|RTI|RXI));


This changes looks wrong to me. We want to clear the bit in status we
don't handle. Otherwise the interrupt will be fired in loop.

If I'm not mistaken, TXI/RTI/RXI will be cleared when data is read or
write into the fifo. So we should not clear automatically.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen-4.5 HVMOP ABI issues

2014-12-04 Thread Jan Beulich
>>> On 28.11.14 at 16:46,  wrote:
> On 28/11/14 15:18, Jan Beulich wrote:
> On 28.11.14 at 14:55,  wrote:
>>> The problem is with continuations which reuse the upper bits of the
>>> input register, not with this HVMOP_op_mask specifically; the
>>> HVMOP_op_mask simply adds to an existing problem.  This is something
>>> which needs considering urgently because, as you identify above, we have
>>> already suffered an accidental ABI breakage with the mem-op widening.
>> Since we can't retroactively fix the mem-op widening, I still don't see
>> what's urgent here: As long as we don't change any of these masks,
>> nothing bad is going to happen. Of course one thing to consider with
>> this aspect in mind is whether to change the hvm-op or gnttab-op
>> masks again _before_ 4.5 goes out, based on whether we feel they're
>> wide enough for the (un)foreseeable future.
> 
> By urgent, I mean exactly this, while we have the ability to tweak the
> masks.

With no-one else voicing an opinion:

For hvmop, the mask currently is 8 bits and we've got 22 ops defined.

For gnttabop, the mask currently is 12 bits and we've got 12 ops defined.

For the latter, we're fine even without further consideration. For the
former, the two operations actively using the continuation encoding
are tools-only ones. Since we're fine to alter the tools only interfaces,
and since it was intended for the tools-only HVM-ops to be split off
to a separate hypercall (e.g. hvmctl) anyway, the range restriction
would then no longer be a problem. Plus, in the worst case we could
always introduce yet another hypercall if we ran out of numbers.

Consequently what I'd like to propose (and I guess I'll craft a patch
as soon as I finished this mail) is that we add comments to these
masks (also the memop one) to make clear that they mustn't change.
Additionally for forward compatibility I'd also like to add checks for
the upper bits to be zero for any of the sub-ops that don't actually
use them to encode continuation information. Konrad, would you
consider doing so acceptable for 4.5?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 0/2] xen/pci: Use APIC for MSIs when APIC virtualization is supported

2014-12-04 Thread David Vrabel
On 04/12/14 10:56, David Vrabel wrote:
> On 02/12/14 20:19, Boris Ostrovsky wrote:
>> Changes in v4:
>> * Added comment describing what we check for in pci_xen_init()
> 
> I applied v3 ages ago to the devel/for-linus-3.19 branch and I'm not
> going to mess about replacing it for a comment change.

I had to mess with it anyway to fix a build problem.  I've applied this
version.

Thanks.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 2/2] add a new p2m type - p2m_mmio_write_dm

2014-12-04 Thread Yu Zhang
From: Yu Zhang 

A new p2m type, p2m_mmio_write_dm, is added to trap and emulate
the write operations on GPU's page tables. Handling of this new
p2m type are similar with existing p2m_ram_ro in most condition
checks, with only difference on final policy of emulation vs. drop.
For p2m_ram_ro types, write operations will not trigger the device
model, and will be discarded later in __hvm_copy(); while for the
p2m_mmio_write_dm type pages, writes will go to the device model
via ioreq-server.

Signed-off-by: Yu Zhang 
Signed-off-by: Wei Ye 
---
 xen/arch/x86/hvm/hvm.c  | 11 ---
 xen/arch/x86/mm/p2m-ept.c   |  1 +
 xen/arch/x86/mm/p2m-pt.c|  1 +
 xen/include/asm-x86/p2m.h   |  4 +++-
 xen/include/public/hvm/hvm_op.h |  1 +
 5 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 967f822..b4bdfab 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2837,7 +2837,8 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long 
gla,
  * to the mmio handler.
  */
 if ( (p2mt == p2m_mmio_dm) || 
- (npfec.write_access && (p2m_is_discard_write(p2mt))) )
+ (npfec.write_access &&
+  (p2m_is_discard_write(p2mt) || (p2mt == p2m_mmio_write_dm))) )
 {
 put_gfn(p2m->domain, gfn);
 
@@ -5904,6 +5905,8 @@ long do_hvm_op(unsigned long op, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 get_gfn_query_unlocked(d, a.pfn, &t);
 if ( p2m_is_mmio(t) )
 a.mem_type =  HVMMEM_mmio_dm;
+else if ( t == p2m_mmio_write_dm )
+a.mem_type = HVMMEM_mmio_write_dm;
 else if ( p2m_is_readonly(t) )
 a.mem_type =  HVMMEM_ram_ro;
 else if ( p2m_is_ram(t) )
@@ -5931,7 +5934,8 @@ long do_hvm_op(unsigned long op, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 static const p2m_type_t memtype[] = {
 [HVMMEM_ram_rw]  = p2m_ram_rw,
 [HVMMEM_ram_ro]  = p2m_ram_ro,
-[HVMMEM_mmio_dm] = p2m_mmio_dm
+[HVMMEM_mmio_dm] = p2m_mmio_dm,
+[HVMMEM_mmio_write_dm] = p2m_mmio_write_dm
 };
 
 if ( copy_from_guest(&a, arg, 1) )
@@ -5978,7 +5982,8 @@ long do_hvm_op(unsigned long op, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 goto param_fail4;
 }
 if ( !p2m_is_ram(t) &&
- (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) )
+ (!p2m_is_hole(t) || a.hvmmem_type != HVMMEM_mmio_dm) &&
+ t != p2m_mmio_write_dm )
 {
 put_gfn(d, pfn);
 goto param_fail4;
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 15c6e83..e21a92d 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -136,6 +136,7 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, 
p2m_type_t type, p2m_acces
 entry->x = 0;
 break;
 case p2m_grant_map_ro:
+case p2m_mmio_write_dm:
 entry->r = 1;
 entry->w = entry->x = 0;
 break;
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index e48b63a..26fb18d 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -94,6 +94,7 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t 
mfn)
 default:
 return flags | _PAGE_NX_BIT;
 case p2m_grant_map_ro:
+case p2m_mmio_write_dm:
 return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
 case p2m_ram_ro:
 case p2m_ram_logdirty:
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 42de75d..866fb0d 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -72,6 +72,7 @@ typedef enum {
 p2m_ram_shared = 12,  /* Shared or sharable memory */
 p2m_ram_broken = 13,  /* Broken page, access cause domain crash */
 p2m_map_foreign  = 14,/* ram pages from foreign domain */
+p2m_mmio_write_dm = 15,   /* Read-only; writes go to the device model 
*/
 } p2m_type_t;
 
 /* Modifiers to the query */
@@ -111,7 +112,8 @@ typedef unsigned int p2m_query_t;
 #define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty) \
   | p2m_to_mask(p2m_ram_ro) \
   | p2m_to_mask(p2m_grant_map_ro)   \
-  | p2m_to_mask(p2m_ram_shared) )
+  | p2m_to_mask(p2m_ram_shared)   \
+  | p2m_to_mask(p2m_mmio_write_dm))
 
 /* Write-discard types, which should discard the write operations */
 #define P2M_DISCARD_WRITE_TYPES (p2m_to_mask(p2m_ram_ro) \
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index eeb0a60..a4e5345 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -81,6 +81,7 @@ typedef enum {
 HVMMEM_ram_rw, /* Normal read/write guest RAM */
 HVMMEM_ram_ro, /* Read-only; writes are 

[Xen-devel] [PATCH v5 1/2] add a new p2m type class - P2M_DISCARD_WRITE_TYPES

2014-12-04 Thread Yu Zhang
From: Yu Zhang 

Currently, the P2M_RO_TYPES bears 2 meanings: one is
"_PAGE_RW bit is clear in their PTEs", and another is
to discard the write operations on these pages. This
patch adds a p2m type class, P2M_DISCARD_WRITE_TYPES,
to bear the second meaning, so we can use this type
class instead of the P2M_RO_TYPES, to decide if a write
operation is to be ignored.

Signed-off-by: Yu Zhang 
---
 xen/arch/x86/hvm/hvm.c | 16 +++-
 xen/arch/x86/mm/shadow/multi.c |  2 +-
 xen/include/asm-x86/p2m.h  |  5 +
 3 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 51ffc90..967f822 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2837,7 +2837,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long 
gla,
  * to the mmio handler.
  */
 if ( (p2mt == p2m_mmio_dm) || 
- (npfec.write_access && (p2mt == p2m_ram_ro)) )
+ (npfec.write_access && (p2m_is_discard_write(p2mt))) )
 {
 put_gfn(p2m->domain, gfn);
 
@@ -2882,16 +2882,6 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long 
gla,
 goto out_put_gfn;
 }
 
-/* Shouldn't happen: Maybe the guest was writing to a r/o grant mapping? */
-if ( npfec.write_access && (p2mt == p2m_grant_map_ro) )
-{
-gdprintk(XENLOG_WARNING,
- "trying to write to read-only grant mapping\n");
-hvm_inject_hw_exception(TRAP_gp_fault, 0);
-rc = 1;
-goto out_put_gfn;
-}
-
 /* If we fell through, the vcpu will retry now that access restrictions 
have
  * been removed. It may fault again if the p2m entry type still requires 
so.
  * Otherwise, this is an error condition. */
@@ -3941,7 +3931,7 @@ static enum hvm_copy_result __hvm_copy(
 
 if ( flags & HVMCOPY_to_guest )
 {
-if ( p2mt == p2m_ram_ro )
+if ( p2m_is_discard_write(p2mt) )
 {
 static unsigned long lastpage;
 if ( xchg(&lastpage, gfn) != gfn )
@@ -4035,7 +4025,7 @@ static enum hvm_copy_result __hvm_clear(paddr_t addr, int 
size)
 
 p = (char *)__map_domain_page(page) + (addr & ~PAGE_MASK);
 
-if ( p2mt == p2m_ram_ro )
+if ( p2m_is_discard_write(p2mt) )
 {
 static unsigned long lastpage;
 if ( xchg(&lastpage, gfn) != gfn )
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 225290e..94cf06d 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -4575,7 +4575,7 @@ static mfn_t emulate_gva_to_mfn(struct vcpu *v,
 {
 return _mfn(BAD_GFN_TO_MFN);
 }
-if ( p2m_is_readonly(p2mt) )
+if ( p2m_is_discard_write(p2mt) )
 {
 put_page(page);
 return _mfn(READONLY_GFN);
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 5f7fe71..42de75d 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -113,6 +113,10 @@ typedef unsigned int p2m_query_t;
   | p2m_to_mask(p2m_grant_map_ro)   \
   | p2m_to_mask(p2m_ram_shared) )
 
+/* Write-discard types, which should discard the write operations */
+#define P2M_DISCARD_WRITE_TYPES (p2m_to_mask(p2m_ram_ro) \
+  | p2m_to_mask(p2m_grant_map_ro))
+
 /* Types that can be subject to bulk transitions. */
 #define P2M_CHANGEABLE_TYPES (p2m_to_mask(p2m_ram_rw) \
   | p2m_to_mask(p2m_ram_logdirty) )
@@ -145,6 +149,7 @@ typedef unsigned int p2m_query_t;
 #define p2m_is_hole(_t) (p2m_to_mask(_t) & P2M_HOLE_TYPES)
 #define p2m_is_mmio(_t) (p2m_to_mask(_t) & P2M_MMIO_TYPES)
 #define p2m_is_readonly(_t) (p2m_to_mask(_t) & P2M_RO_TYPES)
+#define p2m_is_discard_write(_t) (p2m_to_mask(_t) & P2M_DISCARD_WRITE_TYPES)
 #define p2m_is_changeable(_t) (p2m_to_mask(_t) & P2M_CHANGEABLE_TYPES)
 #define p2m_is_pod(_t) (p2m_to_mask(_t) & P2M_POD_TYPES)
 #define p2m_is_grant(_t) (p2m_to_mask(_t) & P2M_GRANT_TYPES)
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 0/2] add new p2m type class and new p2m type

2014-12-04 Thread Yu Zhang
XenGT (Intel Graphics Virtualization technology, please refer to
https://01.org/xen/blogs/srclarkx/2013/graphics-virtualization-
xengt) driver runs inside Dom0 as a virtual graphics device model,
and needs to trap and emulate the guest's write operations to some
specific memory pages, like memory pages used by guest graphics
driver as PPGTT(per-process graphics translation table). We added
a new p2m type, p2m_mmio_write_dm, to trap and emulate the write
operations on these graphic page tables. 

Handling of this new p2m type are similar with existing p2m_ram_ro
in most condition checks, with only difference on final policy of
emulation vs. drop. For p2m_ram_ro types, write operations will not
trigger the device model, and will be discarded later in __hvm_copy();
while for the p2m_mmio_write_dm type pages, writes will go to the
device model via ioreq-server.

Previously, the conclusion in our v3 patch review is to provide a
more generalized HVMOP_map_io_range_to_ioreq_server hypercall, by
seperating rangesets inside a ioreq server to read-protected/write-
protected/both-prtected. Yet, after offline discussion with Paul,
we believe a more simplified solution may suffice. We can keep the 
existing HVMOP_map_io_range_to_ioreq_server hypercall, and let the 
user decide whether or not a p2m type change is necessary, because
in most cases the emulator will already use the p2m_mmio_dm type.

Changes from v4:
 - A new p2m type class, P2M_DISCARD_WRITE_TYPES, is added;
 - A new predicate, p2m_is_discard_write, is used in __hvm_copy()/
   __hvm_clear()/emulate_gva_to_mfn()/hvm_hap_nested_page_fault(),
   to discard the write operations;
 - The new p2m type, p2m_mmio_write_dm, is added to P2M_RO_TYPES;
 - Coding style changes;

Changes from v3:
 - Use the existing HVMOP_map_io_range_to_ioreq_server hypercall
   to add write protected range;
 - Modify the HVMOP_set_mem_type hypercall to support the new p2m
   type for this range.

Changes from v2:
 - Remove excute attribute of the new p2m type p2m_mmio_write_dm;
 - Use existing rangeset for keeping the write protection page range
   instead of introducing hash table;
 - Some code style fix.

Changes from v1:
 - Changes the new p2m type name from p2m_ram_wp to p2m_mmio_write_dm.
   This means that we treat the pages as a special mmio range instead
   of ram;
 - Move macros to c file since only this file is using them.
 - Address various comments from Jan.

Yu Zhang (2):
  Add a new p2m type class - P2M_DISCARD_WRITE_TYPES
  add a new p2m type - p2m_mmio_write_dm

 xen/arch/x86/hvm/hvm.c  | 25 ++---
 xen/arch/x86/mm/p2m-ept.c   |  1 +
 xen/arch/x86/mm/p2m-pt.c|  1 +
 xen/arch/x86/mm/shadow/multi.c  |  2 +-
 xen/include/asm-x86/p2m.h   |  9 -
 xen/include/public/hvm/hvm_op.h |  1 +
 6 files changed, 22 insertions(+), 17 deletions(-)

-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

2014-12-04 Thread David Vrabel
On 04/12/14 13:10, Sander Eikelenboom wrote:
> 
> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
> 
>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>>>
>>> On Dec 4, 2014 6:30 AM, David Vrabel  wrote:

 On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
>
> Instead of doing all this complex dance, we depend on the toolstack 
> doing the right thing. As such implement the 'do_flr' SysFS attribute 
> which 'xl' uses when a device is detached or attached from/to a guest. 
> It bypasses the need to worry about the PCI lock. 

 No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly 
 proposed). 

>>>
>>> Which does not work as the kobj will complain (as there is already an 
>>> 'reset' associated with the PCI device).
> 
>> It is only needed if the core won't provide one.
> 
>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>> +{
>> +   struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>> +   struct device *dev = &pci->dev;
>> +   int ret;
>> +
>> +   /* Already have a per-function reset? */
>> +   if (pci_probe_reset_function(pci) == 0)
>> +   return 0;
>> +
>> +   ret = device_create_file(dev, &dev_attr_reset);
>> +   if (ret < 0)
>> +   return ret;
> +   dev_data->>created_reset_file = true;
>> +   return 0;
>> +}
> 
> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
> "pci.c:__pci_dev_reset" ?
> 
> The problem with that function is that from my testing it seems that the 
> first option "pci_dev_specific_reset" always seems to return succes, so all 
> the
> other options are skipped (flr, pm, slot, bus). However the device it self is 
> not properly reset enough (perhaps the pci_dev_specific_reset is good enough 
> for 
> none virtualization purposes and it's probably the least intrusive. For 
> virtualization however it would be nice to be sure it resets properly, or 
> have a 
> way to force a specific reset routine.)

Then you need work with the maintainer for those specific devices or
drivers to fix their specific reset function.

I'm not adding stuff to pciback to workaround broken quirks.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Poor network performance between DomU with multiqueue support

2014-12-04 Thread Zoltan Kiss



On 04/12/14 12:09, Zhangleiqiang (Trump) wrote:

I think that's expected, because guest RX data path still uses grant_copy while
>guest TX uses grant_map to do zero-copy transmit.

As I understand, the RX process is as follows:
1. Phy NIC receive packet
2. XEN Hypervisor trigger interrupt to Dom0
3. Dom0' s NIC driver do the "RX" operation, and the packet is stored into SKB 
which is also owned/shared with netback
Not that easy. There is something between the NIC driver and netback 
which directs the packets, e.g. the old bridge driver, ovs, or the IP 
stack of the kernel.

4. NetBack notify netfront through event channel that a packet is receiving
5. Netfront grant a buffer for receiving and notify netback the GR (if using 
grant-resue mechanism, netfront just notify the GR to netback) through IO Ring
It looks a bit confusing in the code, but netfront put "requests" on the 
ring buffer, which contains the grant ref of the guest page where the 
backend can copy. When the packet comes, netback consumes these requests 
and send back a response telling the guest the grant copy of the packet 
finished, it can start handling the data. (sending a response means it's 
placing a response in the ring and trigger the event channel)
And ideally netback should always have requests in the ring, so it 
doesn't have to wait for the guest to fill it up.



6. NetBack do the grant_copy to copy packet from its SKB to the buffer 
referenced by GR, and notify netfront through event channel
7. Netfront copy the data from buffer to user-level app's SKB
Or wherever that SKB should go, yes. Like with any received packet on a 
real network interface.


Am I right? Why not using zero-copy transmit in guest RX data pash too ?
Because that means you are mapping that memory to the guest, and you 
won't have any guarantee when the guest will release them. And netback 
can't just unmap them forcibly after a timeout, because finding a 
correct timeout value would be quite impossible.
A malicious/buggy/overloaded guest can hold on to Dom0 memory 
indefinitely, but it even becomes worse if the memory came from another 
guest: you can't shutdown that guest for example, until all its memory 
is returned to him.


Regards,

Zoli

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5] x86/stack: Avoid peeking into unmapped guard pages when dumping Xens stack

2014-12-04 Thread Jan Beulich
>>> On 04.12.14 at 13:58,  wrote:
> Konrad: I am requesting a release ack for this change.  It aids the clarity of
> certain crash information, and prevents cascade pagefaults in certain
> circumstances, which would prevent execution of the crash kernel or a system
> reboot.

With

#ifndef NDEBUG
#define MEMORY_GUARD
#endif

I don't think this qualifies as a necessary bug fix at this point of 4.5.

> +unsigned long get_stack_trace_bottom(unsigned long sp)
> +{
> +switch ( get_stack_page(sp) )
> +{
> +case 0 ... 2:
> +return ROUNDUP(sp, PAGE_SIZE) -
> +offsetof(struct cpu_user_regs, es) - sizeof(unsigned long);

The only concern I have here is that this may wrongly truncate a
dump/trace when one of the IST stacks overflowed. But I think we
can live with that - an overflow of the first one would already have
similar behavior today.

> +
> +#ifndef MEMORY_GUARD
> +case 3 ... 5:
> +#endif
> +case 6 ... 7:
> +return ROUNDUP(sp, STACK_SIZE) -
> +sizeof(struct cpu_info) - sizeof(unsigned long);
> +
> +#ifdef MEMORY_GUARD
> +case 3 ... 5:
> +#endif
> +default:

What is the #ifdef good for when this is "default:" anyway? With
this dropped (also from the other function)
Reviewed-by: Jan Beulich 

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/4] sysctl/libxl: Provide information about IO topology

2014-12-04 Thread Ian Campbell
On Thu, 2014-12-04 at 13:22 +0100, Dario Faggioli wrote:
> On Tue, 2014-12-02 at 16:34 -0500, Boris Ostrovsky wrote:
> 
> > --- a/tools/libxl/libxl_internal.h
> > +++ b/tools/libxl/libxl_internal.h
> > @@ -1168,6 +1168,11 @@ _hidden int libxl__try_phy_backend(mode_t st_mode);
> >  
> >  _hidden char *libxl__devid_to_localdev(libxl__gc *gc, int devid);
> >  
> > +_hidden int libxl_pci_numdevs(libxl_ctx *ctx);
> > +_hidden int libxl_pci_topology_init(libxl_ctx *ctx,
> > +xen_sysctl_iotopo_t *iotopo,
> > +int numdev);
> > +
> IIRC, internal/hidden function should have libxl__* as prefix.

And, generally, take a gc instead of a ctx.




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5] systemd: use pkg-config to determine systemd library availability

2014-12-04 Thread Ian Campbell
On Thu, 2014-12-04 at 07:13 -0500, Konrad Rzeszutek Wilk wrote:
> On Dec 4, 2014 6:55 AM, Ian Campbell  wrote:
> >
> > On Wed, 2014-12-03 at 10:51 +, Ian Campbell wrote: 
> > > On Wed, 2014-12-03 at 11:49 +0100, Olaf Hering wrote: 
> > > > On Wed, Dec 03, Ian Campbell wrote: 
> > > > 
> > > > > Ah I didn't know about the sd_listen_fds thing, so I think that what 
> > > > > we 
> > > > > need then is to use pkg-config first to determine if systemd-daemon 
> > > > > is 
> > > > > present at all, and then check for specific symbols we require using 
> > > > > the 
> > > > > pkg-config supplied CFLAGS and LDFLAGS rather than assuming 
> > > > > -lsystemd-daemon. 
> > > > 
> > > > Correction: sd_listen_fds is available since at least v1. 
> > > >  git describe --contains abbbea81a8fa70badb7a05e518d6b07c360fc09d 
> > > >  v1~390 
> > > 
> > > In that case I don't think we realistically need to check for it? 
> >
> > The implication here being that Wei's original patch is correct. Konrad, 
> > do you think this is OK for the release? 
> >
> > > 
> 
> Yes. Release-Axked-By: Konrad Rzeszutek Wilk 

Applied, thanks.
> > > Ian. 
> > > 
> > > 
> > > 
> > > ___ 
> > > Xen-devel mailing list 
> > > Xen-devel@lists.xen.org 
> > > http://lists.xen.org/xen-devel 
> >
> >
> >
> > ___ 
> > Xen-devel mailing list 
> > Xen-devel@lists.xen.org 
> > http://lists.xen.org/xen-devel 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] libxc: check in xc_get_tot_pages() that the proper domain is reported

2014-12-04 Thread Ian Campbell
On Tue, 2014-12-02 at 13:43 -0500, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 02, 2014 at 04:04:44PM +, Wei Liu wrote:
> > On Tue, Dec 02, 2014 at 04:18:08PM +0100, Vitaly Kuznetsov wrote:
> > > XEN_DOMCTL_getdomaininfo, which is being used by xc_domain_getinfo(), has
> > > strange interface: it reports first domain which has domid >= requested 
> > > domid
> > > so all callers are supposed to check that the proper domain(s) was queried
> > > by checking domid. xc_get_tot_pages() doesn't do that. In case the 
> > > requested
> > > domain was destroyed it will report first domain with domid > requested 
> > > domid
> > > which is apparently misleading as there is no way xc_get_tot_pages() 
> > > callers
> > > can figure out that they got tot_pages for some other domain.
> > > 
> > > Signed-off-by: Vitaly Kuznetsov 
> > 
> > Acked-by: Wei Liu 
> 
> Lets add another Ack to this party..
> 
> Release-Acked-by: Konrad Rzeszutek Wilk 

Applied, thanks.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.5] pygrub: Fix regression from c/s d1b93ea, attempt 2

2014-12-04 Thread Ian Campbell
On Tue, 2014-12-02 at 10:09 -0500, Konrad Rzeszutek Wilk wrote:
> > At this point in a freeze I'm much happier with:
> > 
> >  tools/pygrub/src/pygrub |4 +++-
> >  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> The same here. This is now the season of handing out band-aids.
> 
> As such Release-Acked-by: Konrad Rzeszutek Wilk 

Applied.

> > 
> > than
> >  tools/pygrub/src/ExtLinuxConf.py |6 +++---
> >  tools/pygrub/src/GrubConf.py |7 ++-
> >  tools/pygrub/src/LiloConf.py |6 +++---
> >  3 files changed, 8 insertions(+), 11 deletions(-)
> > 
> > Plus Boris' patch is far easier to reason about in isolation in a
> > dynamically/duck typed language.
> > 
> > Ian.
> > 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


  1   2   >