date:20160322

Re: [Xen-devel] confirm

2016-03-22 Thread muralidhar dixit

On Wed, Mar 23, 2016, 10:25 Mina Naghshnejad 
wrote:

>
> ___You Only Live Once
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v7 2/2] VT-d: Fix vt-d Device-TLB flush timeout issue

2016-03-22 Thread Xu, Quan

On March 23, 2016 1:37pm,  wrote:
> > From: Xu, Quan
> > Sent: Wednesday, March 23, 2016 11:30 AM
> >
> > >
> > > Yes, still inconsistent. As I said, you put invalidation sync within
> > > dev_invalidate_iotlb, while for all other IOMMU invalidations the
> > > sync is put after. Below would be consistent then:
> > >
> > > if ( flush_dev_iotlb )
> > > ret = dev_invalidate_iotlb(iommu, did, addr, size_order,
> type);
> > > rc = dev_invalidate_iotlb_sync(...);
> > > if ( !ret )
> > > ret = rc;
> > >
> >   Kevin,
> >   now I doubt that I should put invalidation sync within
> > dev_invalidate_iotlb, which was also your suggestion.
> > As the dev_invalidate_iotlb() is invalidation for all of domain's ATS
> > devices. If in this consistent way, we couldn't Find which ATS device
> > flush timed out, then we need to hide all of domain's ATS devices.
> > Do you recall it?
> >   Also I think it is reluctant to put invalidate_sync within
> > queue_invalidate_iotlb() for consistent issue.
> > Quan
> 
> Yes I recall this story.
> 
> What about doing this? Let's wrap a _sync version for all flush interfaces, 
> like
> below:
> 
> static int queue_invalidate_context_sync(...)
> {
>   queue_invalidate_context(...);
>   return invalidate_sync(...);
> }
> 
> Then invoke _sync version at all callers, e.g.:
> static int flush_context_qi(...)
> {
>   ...
>   if ( qi_ctrl->qinval_maddr != 0 )
>   ret = queue_invalidate_context_sync(...);
> }
> 
> similarly we'll have dev_invalidate_iotlb_sync for device IOTLB flush.
> 
> It simplifies caller logic and make code more readable. :-)
> 

Agreed, I would follow it and send out v8 ASAP, then I could focus on prereq 
patch set. :)


Quan



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v7 2/2] VT-d: Fix vt-d Device-TLB flush timeout issue

2016-03-22 Thread Tian, Kevin

> From: Xu, Quan
> Sent: Wednesday, March 23, 2016 11:30 AM
> 
> >
> > Yes, still inconsistent. As I said, you put invalidation sync within
> > dev_invalidate_iotlb, while for all other IOMMU invalidations the sync is 
> > put
> > after. Below would be consistent then:
> >
> > if ( flush_dev_iotlb )
> > ret = dev_invalidate_iotlb(iommu, did, addr, size_order, type);
> > rc = dev_invalidate_iotlb_sync(...);
> > if ( !ret )
> > ret = rc;
> >
>   Kevin,
>   now I doubt that I should put invalidation sync within 
> dev_invalidate_iotlb, which was also
> your suggestion.
> As the dev_invalidate_iotlb() is invalidation for all of domain's ATS 
> devices. If in this
> consistent way, we couldn't
> Find which ATS device flush timed out, then we need to hide all of domain's 
> ATS devices.
> Do you recall it?
>   Also I think it is reluctant to put invalidate_sync within 
> queue_invalidate_iotlb() for
> consistent issue.
> Quan

Yes I recall this story.

What about doing this? Let's wrap a _sync version for all flush interfaces, 
like below:

static int queue_invalidate_context_sync(...)
{
queue_invalidate_context(...);
return invalidate_sync(...);
}

Then invoke _sync version at all callers, e.g.:
static int flush_context_qi(...)
{
...
if ( qi_ctrl->qinval_maddr != 0 )
ret = queue_invalidate_context_sync(...);
}

similarly we'll have dev_invalidate_iotlb_sync for device IOTLB flush.

It simplifies caller logic and make code more readable. :-)

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] Fwd: Last minute submission for OUTREACHY!

2016-03-22 Thread Mina Naghshnejad

-- Forwarded message --
From: Mina Naghshnejad 
Date: Tue, Mar 22, 2016 at 2:36 PM
Subject: Re: Last minute submission for OUTREACHY!
To: Brenton Leanhardt 



Great!
For "stages" class variable, "question", "action" types are already defined
elsewhere or should I define them? and also from what I understand Workflow
class only holds information about "question", "action",.. and is not
launching them,

-Mina


On Tue, Mar 22, 2016 at 12:18 PM, Brenton Leanhardt 
wrote:

> Hi Mina,
>
> There's still time to apply but you'll need to act fast.  Here's a task!
>
> Add a workflow module and Workflow class to the ftl_installer module
> that will eventually represent the Workflow format proposed here:
> https://github.com/ftl-toolbox/ftl_installer/pull/4
>
> The Workflow class should:
>
> 1) Take a string argument that represents the location of the workflow
> yaml file.
> 2) Store the contents of the workflow file as a native python datastructure
> 3) Have a method for returning a yaml representation of the internal
> workflow representation.
>
> Don't get too hung up on #2.  We realize it's not very well defined.
> The idea is that this module will represent the API of the workflow
> class and you might find that certain accessor methods are needed to
> make the class easier to work with.  As you get in to the details I'm
> sure you'll have questions and we can iron that out when the time
> comes.  I suspect we'll update the Workflow format based on questions
> you ask.
>
> --Brenton
>
>
> On Tue, Mar 22, 2016 at 1:49 PM, Mina Naghshnejad
>  wrote:
> > Hello!
> > I am applying for Outreachy and will have to submit my patch in the last
> > minute,
> > I am interested in
> >
> > Project 3: Learn Ansible internals and help us create FTL.
> >
> > I am a PhD student researching on Cloud infrastructure and want to do
> some
> > open source projects at this point to get more in depth insight of
> > real-world clouds.
> > I have experience working with Python and always curious about learning
> new
> > things,
> >
> > regards,
> > Mina
> >
> > ___
> > dev mailing list
> > d...@lists.openshift.redhat.com
> > http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
> >
>
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] confirm

2016-03-22 Thread Mina Naghshnejad


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [V4] x86/xsaves: calculate the xstate_comp_offsets base on xstate_bv

2016-03-22 Thread Shuai Ruan

On Wed, Mar 23, 2016 at 09:28:03AM +0800, Shuai Ruan wrote:
> On Tue, Mar 22, 2016 at 04:53:02AM -0600, Jan Beulich wrote:
> > >>> On 16.03.16 at 13:12,  wrote:
> > 
> > Don't you need to use xcomp_bv here? That's what "Extended
> > Region of an XSAVE Area" in SDM Vol 1 suggests to me.
> > 
> "OPERATION OF XRSTORS" and "OPERATION OF XSAVES" in SDM Vol1.
> 
> For xsaves:
> "Execution of XSAVES performs the init optimization to reduce the amount
> of data written to memory."
> 
> For xrstors:
> "XRSTORS updates state component i based on the value of bit i in the
> XSTATE_BV field of the XSAVE header".
> 
Ignore this. You are right.
I misunderstand some XSAVES/XRSTORS behaviors in SDM . In the previous versions,
I use xcom_bv to caculate comp_offset (but changing to xstate_bv in this
version:( ).
Thanks.
> > >  /* Set XSTATE_BV and XCOMP_BV.  */
> > >  xsave->xsave_hdr.xstate_bv = xstate_bv;
> > >  xsave->xsave_hdr.xcomp_bv = v->arch.xcr0_accum | 
> > > XSTATE_COMPACTION_ENABLED;
> > > +setup_xstate_comp(xstate_comp_offsets, xstate_bv);
> > 
> > Same here then I think.
> > 
> > Jan
> > 
> > ___
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v7 2/2] VT-d: Fix vt-d Device-TLB flush timeout issue

2016-03-22 Thread Xu, Quan

On March 21, 2016 11:27am, Tian, Kevin  wrote:
> > From: Xu, Quan
> > Sent: Friday, March 18, 2016 8:22 PM

> > > >  static void queue_invalidate_iec(struct iommu *iommu, u8 granu,
> > > > u8 im, u16 iidx)  {
> > > >  unsigned long flags;
> > > > @@ -342,8 +393,6 @@ static int flush_iotlb_qi(
> > > >
> > > >  if ( qi_ctrl->qinval_maddr != 0 )
> > > >  {
> > > > -int rc;
> > > > -
> > > >  /* use queued invalidation */
> > > >  if (cap_write_drain(iommu->cap))
> > > >  dw = 1;
> > > > @@ -353,11 +402,17 @@ static int flush_iotlb_qi(
> > > >  queue_invalidate_iotlb(iommu,
> > > > type >>
> > > DMA_TLB_FLUSH_GRANU_OFFSET, dr,
> > > > dw, did, size_order, 0, addr);
> > > > +
> > > > +/*
> > > > + * Before Device-TLB invalidation we need to synchronize
> > > > + * invalidation completions with hardware.
> > > > + */
> > > > +ret = invalidate_sync(iommu);
> > > > +if ( ret )
> > > > + return ret;
> > > > +
> > > >  if ( flush_dev_iotlb )
> > > >  ret = dev_invalidate_iotlb(iommu, did, addr, size_order,
> type);
> > > > -rc = invalidate_sync(iommu);
> > > > -if ( !ret )
> > > > -ret = rc;
> > >
> > > Current change looks not consistent. For IOMMU iotlb flush, we have
> > > invalidate_sync out of invalidate operation, however below...
> > >
> >
> > Now, does it still look not consistent?
> >
> 
> Yes, still inconsistent. As I said, you put invalidation sync within
> dev_invalidate_iotlb, while for all other IOMMU invalidations the sync is put
> after. Below would be consistent then:
> 
> if ( flush_dev_iotlb )
> ret = dev_invalidate_iotlb(iommu, did, addr, size_order, type);
> rc = dev_invalidate_iotlb_sync(...);
> if ( !ret )
> ret = rc;
> 
  Kevin,
  now I doubt that I should put invalidation sync within dev_invalidate_iotlb, 
which was also your suggestion.
As the dev_invalidate_iotlb() is invalidation for all of domain's ATS devices. 
If in this consistent way, we couldn't
Find which ATS device flush timed out, then we need to hide all of domain's ATS 
devices.
Do you recall it?
  Also I think it is reluctant to put invalidate_sync within 
queue_invalidate_iotlb() for consistent issue.
Quan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [linux-3.4 test] 86898: regressions - FAIL

2016-03-22 Thread osstest service owner

flight 86898 linux-3.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/86898/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-rumpuserxen   6 xen-build fail REGR. vs. 64451
 build-i386-rumpuserxen6 xen-build fail REGR. vs. 64451

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-qemuu-win7-amd64 20 leak-check/check fail blocked in 64451
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail like 64451
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 64451

Tests which did not succeed, but are not blocking:
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass

version targeted for testing:
 linux3389604d77540abf738b486d650c1745b2d663ca
baseline version:
 linux3edd6224c2a677bb59efe0b083a51fc2b3b5c64d

Last test of basis64451  2015-11-16 12:12:15 Z  127 days
Testing same since86823  2016-03-21 14:17:05 Z1 days2 attempts


People who touched revisions under test:
  Al Viro 
  Albert Huitsing 
  Aleksei Mamlin 
  Alex Deucher 
  Alexei Potashnik 
  Aman Deep 
  Andrew Morton 
  Andrey Vagin 
  Andy Lutomirski 
  Anssi Hannula 
  Arkadiusz MiÅkiewicz 
  Arne Fitzenreiter 
  Bart Van Assche 
  Ben Hutchings 
  Ben Zhang 
  Bernhard Bender 
  bingtian...@taobao.com 
  Borislav Petkov 
  Brian Campbell 
  Chris Mason 
  Chris Metcalf 
  Claudio Cappelli 
  Clemens Ladisch 
  Cong Wang 
  Dan Carpenter 
  Daniel Borkmann 
  Daniel Vetter 
  Darren Lavender 
  David Ahern 
  David Daney 
  David Howells 
  David S. Miller 
  Dennis Yang 
  Dirk Behme 
  Dmitry Torokhov 
  Dmitry Vyukov 
  Dominic SacrÃ© 
  Don Zickus 
  Doug Ledford 
  Edward Hyunkoo Jee 
  Eric Dumazet 
  Eric Dumazet 
  Eric Northup 
  Felipe Balbi 
  Felipe Balbi 
  Felix Fietkau 
  Filipe Manana 
  Gioh Kim 
  Greg Kroah-Hartman 
  Hannes Frederic Sowa 
  Heiko Carstens 
  Herbert Xu 
  Herton R. Krzesinski 
  Huang Rui 
  Ingo Molnar 
  James Bottomley 
  Jan Beulich 
  Jason Gunthorpe 
  Jason Wang 
  Jiri Pirko 
  Joe Perches 
  Joe Stringer 
  Joe Thornber 
  Johan Hovold 
  Johannes Berg 
  John Soni Jose 
  John Youn 
  John Youn 
  Joseph Qi 
  Juergen Gross 
  Julian Anastasov 
  Kai MÃ¤kisara

Re: [Xen-devel] [PATCH v7 2/2] VT-d: Fix vt-d Device-TLB flush timeout issue

2016-03-22 Thread Xu, Quan

(( __ sorry, I was out of office on Mon./Tues. __))

On March 21, 2016 11:27am, Tian, Kevin  wrote:
> > From: Xu, Quan
> > Sent: Friday, March 18, 2016 8:22 PM
> > > >  static void queue_invalidate_iec(struct iommu *iommu, u8 granu,
> > > > u8 im, u16 iidx)  {
> > > >  unsigned long flags;
> > > > @@ -342,8 +393,6 @@ static int flush_iotlb_qi(
> > > >
> > > >  if ( qi_ctrl->qinval_maddr != 0 )
> > > >  {
> > > > -int rc;
> > > > -
> > > >  /* use queued invalidation */
> > > >  if (cap_write_drain(iommu->cap))
> > > >  dw = 1;
> > > > @@ -353,11 +402,17 @@ static int flush_iotlb_qi(
> > > >  queue_invalidate_iotlb(iommu,
> > > > type >>
> > > DMA_TLB_FLUSH_GRANU_OFFSET, dr,
> > > > dw, did, size_order, 0, addr);
> > > > +
> > > > +/*
> > > > + * Before Device-TLB invalidation we need to synchronize
> > > > + * invalidation completions with hardware.
> > > > + */
> > > > +ret = invalidate_sync(iommu);
> > > > +if ( ret )
> > > > + return ret;
> > > > +
> > > >  if ( flush_dev_iotlb )
> > > >  ret = dev_invalidate_iotlb(iommu, did, addr, size_order,
> type);
> > > > -rc = invalidate_sync(iommu);
> > > > -if ( !ret )
> > > > -ret = rc;
> > >
> > > Current change looks not consistent. For IOMMU iotlb flush, we have
> > > invalidate_sync out of invalidate operation, however below...
> > >
> >
> > Now, does it still look not consistent?
> >
> 
> Yes, still inconsistent. As I said, you put invalidation sync within
> dev_invalidate_iotlb, while for all other IOMMU invalidations the sync is put
> after. Below would be consistent then:
> 
> if ( flush_dev_iotlb )
> ret = dev_invalidate_iotlb(iommu, did, addr, size_order, type);
> rc = dev_invalidate_iotlb_sync(...);
> if ( !ret )
> ret = rc;
> 

Make sense, I will fix it in next v8.
Now this patch set looks clear, I'd better summarize and send out v8. Of 
course, I would continue to
explain and fix the prereq patch set.


Quan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [V5] x86/xsaves: fix overwriting between non-lazy/lazy xsaves

2016-03-22 Thread Shuai Ruan

On Tue, Mar 22, 2016 at 08:34:33AM -0600, Jan Beulich wrote:
> >>> On 18.03.16 at 04:01,  wrote:
> >   * Copy legacy XSAVE area, to avoid complications with CPUID
> >   * leaves 0 and 1 in the loop below.
> >   */
> >  memcpy(xsave, src, FXSAVE_SIZE);
> >  
> > -/* Set XSTATE_BV and XCOMP_BV.  */
> > +/* Set XSTATE_BV.  */
> >  xsave->xsave_hdr.xstate_bv = xstate_bv;
> > -xsave->xsave_hdr.xcomp_bv = v->arch.xcr0_accum | 
> > XSTATE_COMPACTION_ENABLED;
> >  setup_xstate_comp(xstate_comp_offsets, xstate_bv);
> 
> I see you set xcomp_bv (and hence the compaction bit) in xrstor()
> now, but afaict that doesn't allow you to completely drop initializing
> the field here, as the code there looks at the compaction bit.
> 
> > +else
> > +XRSTOR("0x48,","0x0f,0xae,0x2f"); /* xrstor */
> 
> At this point, what guarantees that xcomp_bv is zero, no matter
> where the state to be loaded originates from? I think at least in
> arch_set_info_guest(), hvm_load_cpu_ctxt(), and
> hvm_vcpu_reset_state() you went too far deleting code, and you
> really need to keep the storing of zero there. Did you draw, just
> for yourself, mentally or on a sheet of paper, a diagram illustrating
> the various state transitions?
> 
For above two comments.

The patch is base on [v4]x86/xsaves: calculate the xstate_comp_offsets base
on xstate_bv and I answer your question on why caculate xstate_comp_offset
based on xstate_bv in previous thread. If that is right, drop
initializing xcomp_bv is ok. Now xcomp_bv can guarantee to be zero for 
arch_set_info_guest() and hvm_load_cpu_ctxt(). If the drop is wrong
(due to my misunderstand of the SDM), I will change the if () here.

For hvm_vcpu_reset_state(), we should depend on whether xsaves is used 
to decide whether to init xcomp_bv or not. And currently we use
xcr0_accum to indicate the use of xsaves, when hvm_vcpu_reset_state()
is called , can vcpu->xcr0_accum indicate using of xsaves ?
I think in hvm_vcpu_reset_state(), we should leave xcomp_bv zero.

> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [V4] x86/xsaves: calculate the xstate_comp_offsets base on xstate_bv

2016-03-22 Thread Shuai Ruan

On Tue, Mar 22, 2016 at 04:53:02AM -0600, Jan Beulich wrote:
> >>> On 16.03.16 at 13:12,  wrote:
> 
> Don't you need to use xcomp_bv here? That's what "Extended
> Region of an XSAVE Area" in SDM Vol 1 suggests to me.
> 
"OPERATION OF XRSTORS" and "OPERATION OF XSAVES" in SDM Vol1.

For xsaves:
"Execution of XSAVES performs the init optimization to reduce the amount
of data written to memory."

For xrstors:
"XRSTORS updates state component i based on the value of bit i in the
XSTATE_BV field of the XSAVE header".

> >  /* Set XSTATE_BV and XCOMP_BV.  */
> >  xsave->xsave_hdr.xstate_bv = xstate_bv;
> >  xsave->xsave_hdr.xcomp_bv = v->arch.xcr0_accum | 
> > XSTATE_COMPACTION_ENABLED;
> > +setup_xstate_comp(xstate_comp_offsets, xstate_bv);
> 
> Same here then I think.
> 
> Jan
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v11]xen: sched: convert RTDS from time to event driven model

2016-03-22 Thread Dario Faggioli

On Fri, 2016-03-18 at 01:45 -0600, Jan Beulich wrote:
> > 
> > > > On 18.03.16 at 05:09,  wrote:
> > Great job! However, we still have 1 mile in the 100-mile journey.
> > :-D
> > 
> > I applied the patch on staging and tried some test cases. One of
> > them
> > is as follows:
> > 
> > I tried to create a cpupool and then migrate a VM to the new
> > cpupool;
>
BTW, Meng:

(XEN)[] schedule_cpu_switch+0x250/0x28a
(XEN)[] cpupool.c#cpupool_assign_cpu_locked+0x31/0x11f

I think you mean "and then move a CPU from a cpupool to another". Or
perhaps what you said is what your script does, and you weren't sure at
what stage it explodes.

Well, let me tell you: it's when you move a CPU between pools that have
schedulers that remaps the scheduler locks (such as Credit2-->RTDS and
vice versa).

> > However, the system triggers the bug as below. I guess this is some
> > kind of bug that are known to us,  and Dario had some uncommitted
> > patch to fix it, IIRC?
> In the context of this patch the most relevant question is: Is this
> an issue with the patch, or one that existed already before? 
>
Exactly!

And the answer is:
 - it's pre-existing
 - it's an even bigger issue than that ASSERT triggering (i.e., there
   are potential races even when things works)
 - I'm taking care of it.

> After
> all that's what we're in need to know whether the change can go
> in. 
>
Yep.

> And skimming over the patch, it doesn't seem to alter code
> related to where you see things blow up.
> 
Indeed it does not.

Thanks and Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)



signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2] arm: Fix asynchronous aborts (SError exceptions) due to bogus PTEs

2016-03-22 Thread Shanker Donthineni



On 03/22/2016 05:21 PM, Julien Grall wrote:
> (CC some ARM folks)
>
> On 21/03/2016 23:18, Shanker Donthineni wrote:
>> Hi Julien,
>
> Hello Shanker,
>
> Sorry for the late answer.
>
>> Do you have any other comments to be addressed?
>
> I have a question regarding the implication for what you wrote in the commit.
>
> As far as I understand, any speculative table walk might cause an imprecise 
> asynchronous abort. So if a guest is using page tables that contain garbage, 
> it would be possible to receive an SError. Am I right?
>

Yes, you are right (applies to EL1 TTBR0/TTBR1, EL2 TTBR0/TTBR1 and EL3 TTBR0 
tables).

>>
>> On 03/16/2016 02:08 PM, Shanker Donthineni wrote:
>>> From: Vikram Sethi 
>>>
>>> ARMv8 architecture allows performing prefetch data/instructions
>>> from memory locations marked as normal memory. Prefetch does not
>>> mean that the data/instruction has to be used/executed in code
>>> flow. All PTEs that appear to be valid to MMU must contain valid
>>> physical address with proper attributes otherwise MMU table walk
>>> might cause imprecise asynchronous aborts.
>>>
>>> The way current XEN code is preparing page tables for frametable
>>> and xenheap memory can create bogus PTEs. This patch fixes the
>>> issue by clearing page table memory before populating EL2 L0/L1
>>> PTEs. Without this patch XEN crashes on Qualcomm Technologies
>>> server chips due to asynchronous aborts.
>>>
>>> The speculative/prefetch feature explanation is scattered everywhere
>>> in ARM specification but below two sections have useful information.
>>>
>>> E2.8 Memory types and attributes
>>> G4.12.6 External abort on a translation table walk
>
> As said on an earlier version of this patch, please mention the version of 
> the spec when you quote it.
>

Sure, should I post V3 patch mentioning ARM spec version?
>>>
>>> Signed-off-by: Vikram Sethi 
>>> Signed-off-by: Shanker Donthineni 
>>> ---
>>> Changes since v1:
>>>  Replace memset() with clear_page()
>>>  Edit commit description
>>>
>>>   xen/arch/arm/mm.c | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
>>> index 81f9e2e..3fda8f3 100644
>>> --- a/xen/arch/arm/mm.c
>>> +++ b/xen/arch/arm/mm.c
>>> @@ -730,6 +730,8 @@ void __init setup_xenheap_mappings(unsigned long 
>>> base_mfn,
>>>   else
>>>   {
>>>   unsigned long first_mfn = alloc_boot_pages(1, 1);
>>> +
>>> +clear_page(mfn_to_virt(first_mfn));
>>>   pte = mfn_to_xen_entry(first_mfn, WRITEALLOC);
>>>   pte.pt.table = 1;
>>>   write_pte(p, pte);
>>> @@ -773,6 +775,7 @@ void __init setup_frametable_mappings(paddr_t ps, 
>>> paddr_t pe)
>>>   second = mfn_to_virt(second_base);
>>>   for ( i = 0; i < nr_second; i++ )
>>>   {
>>> +clear_page(mfn_to_virt(second_base + i));
>>>   pte = mfn_to_xen_entry(second_base + i, WRITEALLOC);
>>>   pte.pt.table = 1;
>>>   
>>> write_pte(_first[first_table_offset(FRAMETABLE_VIRT_START)+i], pte);
>>
>
> Regards,
>

-- 
Shanker Donthineni
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2] arm: Fix asynchronous aborts (SError exceptions) due to bogus PTEs

2016-03-22 Thread Julien Grall


(CC some ARM folks)

On 21/03/2016 23:18, Shanker Donthineni wrote:

Hi Julien,


Hello Shanker,

Sorry for the late answer.


Do you have any other comments to be addressed?


I have a question regarding the implication for what you wrote in the 
commit.


As far as I understand, any speculative table walk might cause an 
imprecise asynchronous abort. So if a guest is using page tables that 
contain garbage, it would be possible to receive an SError. Am I right?




On 03/16/2016 02:08 PM, Shanker Donthineni wrote:

From: Vikram Sethi 

ARMv8 architecture allows performing prefetch data/instructions
from memory locations marked as normal memory. Prefetch does not
mean that the data/instruction has to be used/executed in code
flow. All PTEs that appear to be valid to MMU must contain valid
physical address with proper attributes otherwise MMU table walk
might cause imprecise asynchronous aborts.

The way current XEN code is preparing page tables for frametable
and xenheap memory can create bogus PTEs. This patch fixes the
issue by clearing page table memory before populating EL2 L0/L1
PTEs. Without this patch XEN crashes on Qualcomm Technologies
server chips due to asynchronous aborts.

The speculative/prefetch feature explanation is scattered everywhere
in ARM specification but below two sections have useful information.

E2.8 Memory types and attributes
G4.12.6 External abort on a translation table walk


As said on an earlier version of this patch, please mention the version 
of the spec when you quote it.




Signed-off-by: Vikram Sethi 
Signed-off-by: Shanker Donthineni 
---
Changes since v1:
 Replace memset() with clear_page()
 Edit commit description

  xen/arch/arm/mm.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 81f9e2e..3fda8f3 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -730,6 +730,8 @@ void __init setup_xenheap_mappings(unsigned long base_mfn,
  else
  {
  unsigned long first_mfn = alloc_boot_pages(1, 1);
+
+clear_page(mfn_to_virt(first_mfn));
  pte = mfn_to_xen_entry(first_mfn, WRITEALLOC);
  pte.pt.table = 1;
  write_pte(p, pte);
@@ -773,6 +775,7 @@ void __init setup_frametable_mappings(paddr_t ps, paddr_t 
pe)
  second = mfn_to_virt(second_base);
  for ( i = 0; i < nr_second; i++ )
  {
+clear_page(mfn_to_virt(second_base + i));
  pte = mfn_to_xen_entry(second_base + i, WRITEALLOC);
  pte.pt.table = 1;
  write_pte(_first[first_table_offset(FRAMETABLE_VIRT_START)+i], 
pte);




Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 6/6] x86/pat: Document PAT initializations

2016-03-22 Thread Toshi Kani

On Tue, 2016-03-22 at 18:02 +0100, Borislav Petkov wrote:
> On Wed, Mar 16, 2016 at 06:46:59PM -0600, Toshi Kani wrote:
> > Update PAT documentation to describe how PAT is initialized under
> > various configurations.
> > 
 :
> >  
> > +PAT Initialization
> > +--
> > +
> > +The following table describes how PAT is initialized under various
> > +configurations. PAT must be set to enable to initialize PAT MSR in
> > order
> 
> Err "PAT MSR must be updated by Linux in order to support WC and WT" ...
> or so?

Right. Will do.

> > +to support WC and WT attributes. Otherwise, PAT keeps PAT MSR value
> > set
> > +by BIOS.
> 
> "Otherwise, the PAT MSR has the value programmed in it by the firmware."

Will do.

> > Note, Xen enables WC attribute in BIOS setup for guests.
> > +
> > + MTRR PAT   Call Sequence   PAT State  PAT MSR
> > + =
> > + EE MTRR -> pat_init()  Enable OS
> 
> s/Enable/Enabled/

Will do.

> MTRR->pat_init() - either use function names for both or do pseudo like
> so:
> 
> MTRR init -> PAT init

OK, I will change all to pseudo.  MTRR has multiple caller functions, and
we do not have enough space to write them all.

> > + ED MTRR -> pat_init()  Disable-
> 
> s/Disable/Disabled/. Ditto for the rest.

Will do.

> > + DE MTRR -> pat_disable()   DisableBIOS
> > + DD MTRR -> pat_disable()   Disable-
> > + -np/E  nopat() -> pat_disable()DisableBIOS
> > + -np/D  nopat() -> pat_disable()Disable-
> > + E!P/E  MTRR -> pat_init()  DisableBIOS
> > + D!P/E  MTRR -> pat_disable()   DisableBIOS
> > + !M   !P/E  MTRR stub -> pat_disable()  DisableBIOS
> > +
> > + Legend
> > + 
> > + EFeature enabled in CPU
> > + D   Feature disabled/unsupported in CPU
> > + np  "nopat" boot option specified
> > + !P  CONFIG_X86_PAT option unset
> > + !M  CONFIG_MTRR option unset
> > + Enable   PAT state set to enable
> > + Disable  PAT state set to disable
> > + OS   PAT initializes PAT MSR with OS setup
> > + BIOS PAT keeps PAT MSR with BIOS setup
> > +

Thanks,
-Toshi

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [linux-3.4 baseline-only test] 44273: regressions - FAIL

2016-03-22 Thread Platform Team regression test user

This run is configured for baseline tests only.

flight 44273 linux-3.4 real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/44273/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-rumpuserxen   6 xen-build fail REGR. vs. 38303
 build-i386-rumpuserxen6 xen-build fail REGR. vs. 38303
 test-amd64-amd64-xl  19 guest-start/debian.repeat fail REGR. vs. 38303

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-qemut-winxpsp3  9 windows-install  fail like 38303

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail never pass
 test-amd64-amd64-qemuu-nested-intel 16 debian-hvm-install/l1/l2 fail never pass

version targeted for testing:
 linux3389604d77540abf738b486d650c1745b2d663ca
baseline version:
 linux3edd6224c2a677bb59efe0b083a51fc2b3b5c64d

Last test of basis38303  2015-11-17 23:02:20 Z  125 days
Testing same since44273  2016-03-22 11:52:37 Z0 days1 attempts


People who touched revisions under test:
  Al Viro 
  Albert Huitsing 
  Aleksei Mamlin 
  Alex Deucher 
  Alexei Potashnik 
  Aman Deep 
  Andrew Morton 
  Andrey Vagin 
  Andy Lutomirski 
  Anssi Hannula 
  Arkadiusz MiÅkiewicz 
  Arne Fitzenreiter 
  Bart Van Assche 
  Ben Hutchings 
  Ben Zhang 
  Bernhard Bender 
  bingtian...@taobao.com 
  Borislav Petkov 
  Brian Campbell 
  Chris Mason 
  Chris Metcalf 
  Claudio Cappelli 
  Clemens Ladisch 
  Cong Wang 
  Dan Carpenter 
  Daniel Borkmann 
  Daniel Vetter 
  Darren Lavender 
  David Ahern 
  David Daney 
  David Howells 
  David S. Miller 
  Dennis Yang 
  Dirk Behme 
  Dmitry Torokhov 
  Dmitry Vyukov 
  Dominic SacrÃ© 
  Don Zickus 
  Doug Ledford 
  Edward Hyunkoo Jee 
  Eric Dumazet 
  Eric Dumazet 
  Eric Northup 
  Felipe Balbi 
  Felipe Balbi 
  Felix Fietkau 
  Filipe Manana 
  Gioh Kim 
  Greg Kroah-Hartman 
  Hannes Frederic Sowa 
  Heiko Carstens 
  Herbert Xu 
  Herton R. Krzesinski 
  Huang Rui 
  Ingo Molnar 
  James Bottomley 
  Jan Beulich 
  Jason Gunthorpe 
  Jason Wang 
  Jiri Pirko 
  Joe Perches 
  Joe Stringer 
  Joe Thornber 
  Johan Hovold

Re: [Xen-devel] [PATCH v2 4/6] x86/mtrr: Fix PAT init handling when MTRR MSR is disabled

2016-03-22 Thread Toshi Kani

On Tue, 2016-03-22 at 18:01 +0100, Borislav Petkov wrote:
> Subject: [PATCH v2 4/6] x86/mtrr: Fix PAT init handling when MTRR MSR is
> disabled
> 
> s/ MSR//

Will do.

> On Wed, Mar 16, 2016 at 06:46:57PM -0600, Toshi Kani wrote:
> > get_mtrr_state() calls pat_init() on BSP even if MTRR is disabled
> > by its MSR.
> 
> s/by its MSR//

Will do.

> > This causes pat_init() to be called on BSP only since APs do not call
> 
> This doesn't cause that - get_mtrr_state() is called only on the BSP by
> mtrr_bp_init().

Right, I will change it to "results" or something.

> > pat_init() when MTRR is disabled. This inconsistency between BSP and
> > APs leads undefined behavior.
> 
>   leads to

Will do.

> > Move BSP's PAT init code from get_mtrr_state() to mtrr_bp_pat_init().
> > Change mtrr_bp_init() to call mtrr_bp_pat_init() if MTRR is enabled.
> 
> No need for those.

OK.

> > This keeps BSP's calling condition to pat_init() consistent with AP's,
> > mtrr_ap_init() and mtrr_aps_init().
> 
> This one is fine.

:-)

Thanks,
-Toshi

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 3/6] x86/mtrr: Fix Xorg crashes in Qemu sessions

2016-03-22 Thread Toshi Kani

On Tue, 2016-03-22 at 18:00 +0100, Borislav Petkov wrote:
> On Wed, Mar 16, 2016 at 06:46:56PM -0600, Toshi Kani wrote:
> > A Xorg failure on qemu32 was reported as a regression caused
> > by 'commit 9cd25aac1f44 ("x86/mm/pat: Emulate PAT when it is
> > disabled")'. [1]  This patch fixes the regression.
> 
> I hope so.
> 
> > Negative effects of this regression were two failures in Xorg
> > on qemu32 env, which were triggered by the fact that its virtual
> 
> "... with QEMU CPU model "qemu32" (-cpu qemu32) ... "

Will add this description.

> > CPU does not support MTRR. [2]
> > 
> >  #1. copy_process() failed in the check in reserve_pfn_range()
> > 
> > copy_process
> >  copy_mm
> >   dup_mm
> >    dup_mmap
> > copy_page_range
> >  track_pfn_copy
> >   reserve_pfn_range
> 
> Here's where you describe why it failed the check and which check.

Will do.

> >  #2. error path in copy_process() then hit WARN_ON_ONCE in
> >  untrack_pfn().
> > 
> >  x86/PAT: Xorg:509 map pfn expected mapping type uncached-
> >  minus for [mem 0xfd00-0xfdff], got write-combining
> >   Call Trace:
> >  dump_stack+0x58/0x79
> >  warn_slowpath_common+0x8b/0xc0
> >  ? untrack_pfn+0x9f/0xb0
> >  ? untrack_pfn+0x9f/0xb0
> >  warn_slowpath_null+0x22/0x30
> >  untrack_pfn+0x9f/0xb0
> >  ? __kunmap_atomic+0x54/0x110
> >  unmap_single_vma+0x56f/0x580
> >  ? pagevec_move_tail_fn+0xa0/0xa0
> >  unmap_vmas+0x43/0x60
> >  exit_mmap+0x5f/0xf0
> >  mmput+0x2d/0xa0
> >  copy_process.part.47+0x1229/0x1430
> >  _do_fork+0xb4/0x3b0
> >  SyS_clone+0x2c/0x30
> >  do_syscall_32_irqs_on+0x54/0xb0
> >  entry_INT80_32+0x2a/0x2a
> 
> You can delete the offsets after the "+" - they're useless.

Will do.

> > These negative effects are caused by two separate bugs, but they
> > can be dealt in lower priority.
> 
> ??

Will change to "they can be addressed in separate patches."

> > Fixing the pat_init() issue below
> > avoids Xorg to hit these cases.
> > 
> > When the CPU does not support MTRR, MTRR does not call pat_init(),
> > which leaves PAT enabled without initializing PAT.  This pat_init()
> > issue is a long-standing issue, but manifested as issue #1 (and then
> > hit issue #2) with the commit
> 
> commit 9cd25aac1f44 ?

Yes. I had to remove this number since checkpatch complained that I needed
to quote the whole patch tile again.  I will ignore this checkpatch error
and add this commit number here.

> > because the memtype now tracks cache
> > attribute with 'page_cache_mode'.  A WC map request is tracked as WC
> > in memtype, but sets a PTE as UC (pgprot) per __cachemode2pte_tbl[].
> > This caused the error in reserve_pfn_range() when it was called from
> > track_pfn_copy(), which obtained pgprot from a PTE.  It converts
> > pgprot to page_cache_mode, which does not necessarily result in
> > the original page_cache_mode since __cachemode2pte_tbl[] redirects
> > multiple types to UC.  This is a separate issue in reserve_pfn_range().
> 
> Good.
> 
> > This pat_init() issue existed before the commit, but we used pgprot
> > in memtype.  Hence, we did not have issue #1 before.  But WC request
> > resulted in WT in effect because WC pgrot is actually WT when PAT
> > is not initialized.  This is not how it was designed to work.  When
> > PAT is set to disable properly, WC is converted to UC.  The use of
> > WT can result in a system crash if the target range does not support
> > WT.  Fortunately, nobody ran into such issue before.
> > 
> > To fix this pat_init() issue, PAT code has been enhanced to provide
> > pat_disable() interface, which disables the OS to initialize PAT MSR,
> 
>   ... prevents the OS from initializing the
> PAT MSR.

Will do.

> > and sets PAT table to the BIOS handoff state.
> 
> > This patch changes
> > MTRR code to call pat_disable() when MTRR is disabled as PAT cannot
> > be initialized in this case.  This sets PAT to disable properly, and
> > makes PAT code to bypass the memtype check.  This avoids issue #1
> > (which can be dealt in lower priority).
> 
> You don't need all that text from "This patch ..." on - we can see that
> in the diff. The commit message needs to contain "why" not "what".

OK.

> >  
> >  /*
> > @@ -83,9 +84,12 @@ static inline int mtrr_trim_uncached_memory(unsigned
> > long end_pfn)
> >  static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
> >  {
> >  }
> > +static inline void mtrr_bp_init(void)
> > +{
> > +   pat_disable("Skip PAT initialization");
> 
> Make that more user-friendly:
> 
>      "MTRRs disabled, skipping PAT initialization too."

Agreed. Will do.

> > +}
> >  
> >  #define mtrr_ap_init() do {} while (0)
> > -#define mtrr_bp_init() do {} while (0)
> >  #define set_mtrr_aps_delayed_init() do {} while (0)
> >  #define mtrr_aps_init() do {} while (0)
> >  #define mtrr_bp_restore() do {} while (0)
> > diff --git

Re: [Xen-devel] [PATCH v2] docs: update FLASK cmd line instructions

2016-03-22 Thread Konrad Rzeszutek Wilk

On Fri, Mar 18, 2016 at 11:46:03AM -0500, Doug Goldstein wrote:
> The command line instructions for FLASK include a note on how to compile
> Xen with FLASK but the note was out of date after the change to Kconfig.
> 
> Signed-off-by: Doug Goldstein 

Reviewed-by: Konrad Rzeszutek Wilk 

> ---
> CC: Ian Jackson 
> CC: Jan Beulich 
> CC: Keir Fraser 
> CC: Tim Deegan 
> CC: Konrad Rzeszutek Wilk 
> CC: Daniel De Graaf 
> 
> change since v1:
> - add menuconfig and config entries as suggested by Konrad
> - caught another place mentioning XSM_ENABLE
> ---
>  docs/misc/xen-command-line.markdown | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.markdown 
> b/docs/misc/xen-command-line.markdown
> index ca77e3b..e4e4437 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -665,8 +665,10 @@ to use the default.
>  > Default: `permissive`
>  
>  Specify how the FLASK security server should be configured.  This option is 
> only
> -available if the hypervisor was compiled with XSM support (which can be 
> enabled
> -by setting XSM\_ENABLE = y in .config).
> +available if the hypervisor was compiled with FLASK support.  This can be
> +enabled by running either:
> +- make -C xen config and enabling XSM and FLASK.
> +- make -C xen menuconfig and enabling 'FLux Advanced Security Kernel 
> support' and 'Xen Security Modules support'
>  
>  * `permissive`: This is intended for development and is not suitable for use
>with untrusted guests.  If a policy is provided by the bootloader, it will 
> be
> @@ -805,7 +807,7 @@ Paging (HAP).
>  Enable late hardware domain creation using the specified domain ID.  This is
>  intended to be used when domain 0 is a stub domain which builds a 
> disaggregated
>  system including a hardware domain with the specified domain ID.  This 
> option is
> -supported only when compiled with XSM\_ENABLE=y on x86.
> +supported only when compiled with XSM on x86.
>  
>  ### hest\_disable
>  > ` = `
> -- 
> 2.7.3
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] xenconsole: update help message

2016-03-22 Thread Konrad Rzeszutek Wilk

On Tue, Mar 22, 2016 at 07:36:58PM +, Wei Liu wrote:
> On Tue, Mar 22, 2016 at 12:00:24PM -0500, Doug Goldstein wrote:
> > The help message did not include information about the --type parameter.
> > 
> > Signed-off-by: Doug Goldstein 
> > ---
> > CC: Ian Jackson 
> > CC: Stefano Stabellini 
> > CC: Wei Liu 
> 
> Acked-by: Wei Liu 

Applied.
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] pygrub: remove isconfig option

2016-03-22 Thread Boris Ostrovsky


On 03/22/2016 04:46 PM, Firas Azar wrote:

Andrew:

Currently if we use pygrub --isconfig , there is a good chance 
that this command could fail with an "out of memory" error. This can 
happen especially if the image file in question is at least a few 
gigabytes in size and the dom0_mem setting is relatively small. So 
even if any software used this option it is likely it would fail. This 
problem is due to the parser using "readlines" function in 
GrubConf.py, which attempts to load the entire file into memory.


FWIW, the same problem is observed on unstable. Running this option 
results in very high disk IO rate and eventually OOM killer wakes up and 
whacks pygrub.


So chances are this has been broken for a while since I believe Firas is 
working with 4.4. I haven't tried anything other than unstable though.


-boris




The advantage of "-l -n" option is that it exercises the "real" code 
path, which boots the PV guests.


Regards

--Firas

On 03/22/2016 04:22 PM, Andrew Cooper wrote:

On 22/03/16 20:06, Firas Azar wrote:
The pygrub command option "isconfig" is broken and obsolete. This 
patch removes it since the alternate options "-l -n" provide the 
same functionality.

Broken how?

Are you sure that no existing software is using that option?

~Andrew



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 2/6] x86/mm/pat: Add pat_disable() interface

2016-03-22 Thread Toshi Kani

On Tue, 2016-03-22 at 17:59 +0100, Borislav Petkov wrote:
> On Wed, Mar 16, 2016 at 06:46:55PM -0600, Toshi Kani wrote:
> > In preparation to fix a regression caused by 'commit 9cd25aac1f44
> > ("x86/mm/pat: Emulate PAT when it is disabled")', PAT needs to
> > provide an interface that disables the OS to initialize PAT MSR.
> 
>   prevents the OS from initializing the PAT MSR.

Right. Will do.

> > 
> > PAT MSR initialization must be done on all CPUs with the specific
> 
> s/with/using/

Ditto.

> > sequence of operations defined in Intel SDM.  This requires MTRR
>      ^
>     the
> 
> s/MTRR/MTRRs/

Ditto.

> > to be enabled since pat_init() is called as part of MTRR init
> > from mtrr_rendezvous_handler().
> > 
> > Change pat_disable() as the interface to disable the OS to initialize
> > PAT MSR, and set PAT table with pat_keep_handoff_state().  This
> > interface can be called when PAT initialization may not be performed.
> 
> This paragraph reads funky and I can't really parse what it is trying to
> say.

Sorry... Here is a retry:

Make pat_disable() as the interface that prevents the OS from initializing
the PAT MSR.  MTRR will call this interface when it cannot provide the SDM-
defined sequence to initialize PAT.

> > This also assures that pat_disable() called from pat_bsp_init()
> > to set PAT table properly when CPU does not support PAT.
> > 
 :
> >  
> > -static inline void pat_disable(const char *reason)
> > +/**
> > + * pat_disable() - Disable the OS to initialize PAT MSR
> 
>   
> 
> Err, what? The function name can't be more clear.

Will change to "Prevent the OS from initializing the PAT MSR".

I wanted to clarify that "disable" does not mean to disable PAT MSR.

> > + *
> > + * This function disables the OS to initialize PAT MSR, and calls
> 
>   "prevents the OS from initializing the PAT MSR..."

Will do.

> > + * pat_keep_handoff_state() to set PAT table to the handoff state.
> 
> We can see what is calls. You're explaining *what* the code does instead
> of *why* again.

Right...

> > + */
> > +void pat_disable(const char *reason)
> >  {
> 
> Why aren't you checking __pat_enabled here?
> 
>   if (!__pat_enabled)
>   return;

pat_keep_handoff_state() is a no-op after the initial call, but I agree
that having this check is better.  Will do.

> You can save yourself the other guards in that function, especially that
> pr_err() below.

The pr_err() below is for a difference case -- PAT is enabled, but a call
is made to disable it after pat_init() is called.  We cannot allow this
case.

> > +   if (boot_cpu_done) {
> > +   pr_err("x86/PAT: PAT cannot be disabled after
> > initialization "
> > +      "(attempting: %s)\n", reason);
> 
> Please integrate checkpatch.pl into your patch creation workflow as it
> sometimes has valid complaints:
> 
> WARNING: quoted string split across lines
> #79: FILE: arch/x86/mm/pat.c:55:
> +   pr_err("x86/PAT: PAT cannot be disabled after
> initialization "
> +  "(attempting: %s)\n", reason);

I've run checkpatch.pl and thought it was OK to have this warning (instead
of a >80 warning) since the error message part was not split.  The
"attempting" part is for debugging and its string is passed from the
caller. 

> More to the point: why do we need that pr_err() call? What is that
> supposed to tell the user?
> 
> I think it is more for the programmer to catch wrong use of
> pat_disable() and then it should be WARN_ONCE() or so...

Yes, this case is for the programmer to catch wrong use.  I will change it
to use WARN_ONCE() and remove the "(attempting: %s)\n" part of the message.

> > +   return;
> > +   }
> > +
> >     __pat_enabled = 0;
> >     pr_info("x86/PAT: %s\n", reason);
> > +
> > +   pat_keep_handoff_state();
> >  }
> >  
> >  static int __init nopat(char *str)
> > @@ -202,7 +217,7 @@ static void pat_bsp_init(u64 pat)
> >  {
> >     u64 tmp_pat;
> >  
> > -   if (!cpu_has_pat) {
> > +   if (!boot_cpu_has(X86_FEATURE_PAT)) {
> >     pat_disable("PAT not supported by CPU.");
> >     return;
> >     }
> > @@ -220,7 +235,7 @@ static void pat_bsp_init(u64 pat)
> >  
> >  static void pat_ap_init(u64 pat)
> >  {
> > -   if (!cpu_has_pat) {
> > +   if (!boot_cpu_has(X86_FEATURE_PAT)) {
> >     /*
> >      * If this happens we are on a secondary CPU, but
> > switched to
> >      * PAT on the boot CPU. We have no way to undo PAT.
> 
> Those last two hunks are unrelated changes and should be a separate
> patch.

Will do.

Thanks,
-Toshi

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] pygrub: remove isconfig option

2016-03-22 Thread Firas Azar


Andrew:

Currently if we use pygrub --isconfig , there is a good chance 
that this command could fail with an "out of memory" error. This can 
happen especially if the image file in question is at least a few 
gigabytes in size and the dom0_mem setting is relatively small. So even 
if any software used this option it is likely it would fail. This 
problem is due to the parser using "readlines" function in GrubConf.py, 
which attempts to load the entire file into memory.


The advantage of "-l -n" option is that it exercises the "real" code 
path, which boots the PV guests.


Regards

--Firas

On 03/22/2016 04:22 PM, Andrew Cooper wrote:

On 22/03/16 20:06, Firas Azar wrote:

The pygrub command option "isconfig" is broken and obsolete. This patch removes it since 
the alternate options "-l -n" provide the same functionality.

Broken how?

Are you sure that no existing software is using that option?

~Andrew



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 2/5] x86/time: implement tsc as clocksource

2016-03-22 Thread Joao Martins



On 03/22/2016 04:02 PM, Jan Beulich wrote:
 On 22.03.16 at 16:51,  wrote:
> 
>>
>> On 03/22/2016 12:46 PM, Jan Beulich wrote:
>> On 22.03.16 at 13:41,  wrote:
>>>

 On 03/18/2016 08:21 PM, Andrew Cooper wrote:
> On 17/03/16 16:12, Joao Martins wrote:
>> Introduce support for using TSC as platform time which is the highest
>> resolution time and most performant to get (~20 nsecs).  Though there
>> are also several problems associated with its usage, and there isn't a
>> complete (and architecturally defined) guarantee that all machines
>> will provide reliable and monotonic TSC across all CPUs, on different
>> sockets and different P/C states.  I believe Intel to be the only that
>> can guarantee that. For this reason it's set with less priority when
>> compared to HPET unless adminstrator changes "clocksource" boot option
>> to "tsc". Upon initializing it, we also check for time warps and
>> appropriate CPU features i.e.  TSC_RELIABLE, CONSTANT_TSC and
>> NONSTOP_TSC. And in case none of these conditions are met, we fail in
>> order to fallback to the next available clocksource.
>>
>> It is also worth noting that with clocksource=tsc there isn't any
>> need to synchronise with another clocksource, and I could verify that
>> great portion the time skew was eliminated and seeing much less time
>> warps happening. With HPET I observed ~500 warps in the period
>> of 1h of around 27 us, and with TSC down to 50 warps in the same
>> period having each warp < 100 ns. The warps still exist though but
>> are only related to cross CPU calibration (being how much it takes to
>> rendezvous with master), in which a later patch in this series aims to
>> solve.
>>
>> Signed-off-by: Joao Martins 
>> ---
>> Cc: Keir Fraser 
>> Cc: Jan Beulich 
>> Cc: Andrew Cooper 
>
> Some style corrections, but no functional problems.
>
> Reviewed-by Andrew Cooper 
>
 I found out one issue in the tsc clocksource initialization path with 
 respect to
 the reliability check. This check is running when initializing platform 
 timer,
 but not all CPUS are up at that point (init_xen_time()) which means that 
 the
 check will always succeed. So for clocksource=tsc I need to defer 
 initialization
 to a later point (on verify_tsc_reliability()) and if successful switch to 
 that
 clocksource. Unless you disagree, v2 would have this and just requires one
 additional preparatory change prior to this patch.
>>>
>>> Hmm, that's suspicious when thinking about CPU hotplug: What
>>> do you intend to do when a CPU comes online later, failing the
>>> check?
>> Good point, but I am not sure whether that would happen. The initcall
>> verify_tsc_reliability seems to be called only for the boot processor. Or are
>> you saying that it's case that initcalls are called too when hotplugging cpus
>> later on? If that's the case then my suggestion wouldn't work as you point 
>> out -
>> or rather without having runtime switching support as you point out below.
> 
> Looks like I didn't express myself clearly enough: "failing the check"
> wasn't meant to imply the failure would actually occur, but rather
> that failure would occur if the check was run on that CPU. IOW the
> case of a CPU getting hotplugged which doesn't satisfy the needs
> for selecting TSC as the clock source.
Ah, I see. I believe this wouldn't be an issue with the current rendezvous
mechanism (std_rendezvous), as the delta is computed between local_tsc_stamp
taken in that (hotplugged) CPU in the calibration and the rdtsc() in the guest
same CPU, even though having CPU0 TSC (master) as system_time.

However it can be a problem with PVCLOCK_TSC_STABLE_BIT as the hotplugged CPU
could have its TSC drifted, and since setting this flag relies on
synchronization of TSCs we would need to clear the flag enterily.

>>> So far we don't do runtime switching of the clock source,
>>> and I'm not sure that's a good idea to do when there are running
>>> guests.
>> Totally agree, but I would be proposing to be at initialization phase where
>> there wouldn't be guests running. We would start with HPET, then only switch 
>> to
>> TSC if that check has passed (and would happen once).
>>
>> Simpler alternative would be to call init_xen_time after all CPUs are 
>> brought up
>> (and would also keep this patch as is), but I am not sure about the
>> repercussions of that.
> 
> I don't see how either would help with the hotplug case.
This was in response to what I thought you meant with your earlier question
(which I misunderstood). But my question is still valid I believe. The reason
for choosing between my suggested approaches is that

Re: [Xen-devel] [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.

2016-03-22 Thread Konrad Rzeszutek Wilk

On Tue, Mar 22, 2016 at 07:28:57PM +, Andrew Cooper wrote:
> On 22/03/16 18:57, Konrad Rzeszutek Wilk wrote:
> > --- a/xen/include/xsm/dummy.h
> > +++ b/xen/include/xsm/dummy.h
> > @@ -751,3 +751,22 @@ static XSM_INLINE int xsm_xen_version 
> > (XSM_DEFAULT_ARG uint32_t op)
> >  return xsm_default_action(XSM_PRIV, current->domain, NULL);
> >  }
> >  }
> > +
> > +static XSM_INLINE int xsm_version_op (XSM_DEFAULT_ARG uint32_t op)
> > +{
> > +XSM_ASSERT_ACTION(XSM_OTHER);
> > +switch ( op )
> > +{
> > +case XEN_VERSION_version:
> > +case XEN_VERSION_extraversion:
> > +case XEN_VERSION_capabilities:
> > +case XEN_VERSION_platform_parameters:
> > +case XEN_VERSION_get_features:
> > +case XEN_VERSION_pagesize:
> > +case XEN_VERSION_guest_handle:
> > +/* These MUST always be accessible to any guest by default. */
> > +return xsm_default_action(XSM_HOOK, current->domain, NULL);
> > +default:
> > +return xsm_default_action(XSM_PRIV, current->domain, NULL);
>  Considering that we seem to have settled on some exceptions here
>  for the change adding XSM check to the legacy version op, do you
>  really think going with no exception at all here is the right approach?
>  Because if we do, that'll prevent guests getting fully converted over
>  to the new interface. Of course, we could also make this conversion
>  specifically a non-goal, and omit e.g. XEN_VERSION_VERSION from
>  this new interface.
> >>> No no. I think convesion is the right long-term goal. 
> >>>
> >>> However the nice thing about this hypercall is that it can return -EPERM.
> >>>
> >>> Making it always return an value for XEN_VERSION_version,
> >>> XEN_VERSION_platform_parameters, XEN_VERSION_get_features means that
> >>> there are some exceptions to this "can return -EPERM" as they will
> >>> be guaranteed an postive return value. They can ignore the -EPERM
> >>> case.
> >>>
> >>> And means that guest can still take shortcuts.
> >>>
> >>> I agree with you that guests need these hypercalls but at the same
> >>> time I am not sure what can be done so they don't fall flat on their
> >>> faces if they are presented with -EPERM. The Linux xenbus_init can be
> >>> modified to deal with this returning -EPERM where it makes some 
> >>> assumptions.
> >>>
> >>> But in a likelyhood it is the bad assumption!
> >> I'm afraid I can't conclude what you mean to say with all of the
> >> above.
> > That I am waffling.
> >
> > Andrew, what is your opinion?
> 
> Nothing good can come from failing a XEN_VERSION_version hypercall. 
> There are a number easy ways for a guest to infer such information.
> 
> XEN_VERSION_platform_parameters is only useful for 32bit PV guests, and
> the toolstack.  Given that it is returning a fixed number in the ABI,
> nothing good can come of failing this either.
> 
> get_features can effectively be failed for permission reasons by
> returning 0.  As such, explicitly failing with -EPERM is similarly
> pointless.

Allright. So here it is:

From 0f82c5f4cd1537ab8555f2b94bb3fe738df69733 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk 
Date: Tue, 22 Mar 2016 12:03:21 -0400
Subject: [PATCH] HYPERCALL_version_op. New hypercall mirroring XENVER_ but
 sane.

This hypercall mirrors the XENVER_ in that it has similar functionality.
However it is designed differently:
 - No compat layer. The data structures are the same size on 32
   as on 64-bit.
 - The hypercall accepts three arguments - the command, pointer to
   an buffer, and the length of the buffer.
 - Each sub-ops can be "probed" for size by returning the size of
   buffer that will be needed - if the buffer is NULL.
 - Subops can complete even if the buffer is too small - truncated
   data will be filled and hypercall will return -ENOBUFS.
 - VERSION_commandline, VERSION_changeset are privileged.
 - There is no XENVER_compile_info equivalent.
 - The hypercall can return -EPERM and toolstack/OSes are expected
   to deal with. However there are three subops: XEN_VERSION_version,
   XEN_VERSION_platform_parameters and XEN_VERSION_get_features
   that will always return an value as guests cannot survive without them.

While we combine some of the common code between XENVER_ and VERSION_
take the liberty of moving pae_extended_cr3 in x86 area.

Suggested-by: Andrew Cooper 
Signed-off-by: Konrad Rzeszutek Wilk 
Acked-by: Daniel De Graaf  [XSM bits]

---
Cc: Daniel De Graaf 
Cc: Ian Jackson 
Cc: Stefano Stabellini 
Cc: Wei Liu 
Cc: Stefano Stabellini 
Cc: Julien Grall 
Cc: Keir Fraser 
Cc: Jan Beulich 
Cc:

Re: [Xen-devel] [PATCH v3 21/28] xen+tools: Export maximum host and guest cpu featuresets via SYSCTL

2016-03-22 Thread Andrew Cooper

On 22/03/16 08:43, Jan Beulich wrote:
>
> Going through the HAP-only features, btw, revealed another
> possible dependency missing from patch 10: I would think that
> INVPCID depends on PCID.

And, as it turns out PCID on LM.

Attempting to set CR4.PCIDE while IA32_EFER.LMA = 0 is specified to
incur a #GP fault.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] pygrub: remove isconfig option

2016-03-22 Thread Andrew Cooper

On 22/03/16 20:06, Firas Azar wrote:
> The pygrub command option "isconfig" is broken and obsolete. This patch 
> removes it since the alternate options "-l -n" provide the same functionality.

Broken how?

Are you sure that no existing software is using that option?

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables

2016-03-22 Thread Konrad Rzeszutek Wilk

. snip..
> > +struct virtual_region kernel_text = {
> 
> static
> 
> > +.list = LIST_HEAD_INIT(kernel_text.list),
> > +.start = (unsigned long)_stext,
> > +.end = (unsigned long)_etext,
> > +#ifdef CONFIG_X86
> > +.ex = (struct exception_table_entry *)__start___ex_table,
> > +.ex_end = (struct exception_table_entry *)__stop___ex_table,
> > +#endif
> 
> Is this together with ...
> 
> > +/*
> > + * Becomes irrelevant when __init sections are cleared.
> > + */
> > +struct virtual_region kernel_inittext  = {
> > +.list = LIST_HEAD_INIT(kernel_inittext.list),
> > +.skip = ignore_if_active,
> > +.start = (unsigned long)_sinittext,
> > +.end = (unsigned long)_einittext,
> > +#ifdef CONFIG_X86
> > +/* Even if they are __init their exception entry still gets stuck 
> > here. */
> > +.ex = (struct exception_table_entry *)__start___ex_table,
> > +.ex_end = (struct exception_table_entry *)__stop___ex_table,
> > +#endif
> 
> ... this really a good idea? I.e. are there not going to be any
> odd side effects because of that redundancy?

None. If the EIP falls within _stext and _etext then its 'ex' table
will be scanned. If _inittext and _einittext then this one. Both of
them end up scanning the same exact exception table (which is kind
silly) - but there are no side-effect.

It would be good to only have __init related exceptions on the __inittext
(and also ditch the __init exception tables once the boot is completed)
but I am not exactly sure how to automatically make the macros resolve
what sections it should go in. That is a further TODO though.
> 
> Also note that the comment preceding this object is a single line one.
> 
> > +/*
> > + * No locking. Additions are done either at startup (when there is only
> > + * one CPU) or when all CPUs are running without IRQs.
> > + *
> > + * Deletions are big tricky. We MUST make sure all but one CPU
> > + * are running cpu_relax().
> > + *
> > + */
> > +LIST_HEAD(virtual_region_list);
> 
> I wonder whether this wouldn't better be static, with the iterator
> that the various parties need getting put here as an out-of-line
> function (instead of getting open coded in a couple of places).

There are three users of this list:
 * search_exception_table - which ends up calling search_one_table
if within start->end and if region->ex is defined.
 * do_invalid_trap (x86 and ARM) - where both scan start->end.
 * symbols_lookup - where we scan start->end.

The last two could be unified in a:

struct virtual_region search_virtual_region(unsigned long addr);

And the first one can use this and then check if region->ex is set as well.

[trying it out]

.. snip..
> > @@ -108,13 +127,17 @@ const char *symbols_lookup(unsigned long addr,
> >  {
> >  unsigned long i, low, high, mid;
> >  unsigned long symbol_end = 0;
> > +symbols_lookup_t symbol_lookup = NULL;
> 
> Pointless initializer.

I believe we need it. That is the contents on the stack can be garbage
and the __is_active_kernel_text won't update symbol_lookup unless
it finds a match. Ah, and also the compiler is unhappy:

symbols.c: In function ‘symbols_lookup’:
symbols.c:136:8: error: ‘symbol_lookup’ may be used uninitialized in this 
function [-Werror=maybe-uninitialized]
 if (symbol_lookup)
^
cc1: all warnings being treated as errors

> 
> >  namebuf[KSYM_NAME_LEN] = 0;
> >  namebuf[0] = 0;
> >  
> > -if (!is_active_kernel_text(addr))
> > +if (!__is_active_kernel_text(addr, _lookup))
> >  return NULL;
> >  
> > +if (symbol_lookup)
> > +return (symbol_lookup)(addr, symbolsize, offset, namebuf);
> 
> Note that there are few coding style issues here (missing blanks,
> superfluous parentheses).

That file uses a different StyleGuide. The Linux one. Which reminds me
that __is_active_kernel_text needs to adhere to different StyleGuide.

> 
> > --- /dev/null
> > +++ b/xen/include/xen/bug_ex_symbols.h
> > @@ -0,0 +1,74 @@
> > +/*
> > + * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
> > + *
> > + */
> > +
> > +#ifndef __BUG_EX_SYMBOL_LIST__
> > +#define __BUG_EX_SYMBOL_LIST__
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#ifdef CONFIG_X86
> > +#include 
> > +#endif
> 
> Why?

Otherwise the compilation will fail on ARM as they do not have exceptions
(and no asm/uaccess.h file)
> 
> > +#include 
> > +
> > +struct virtual_region
> > +{
> > +struct list_head list;
> > +
> > +#define CHECKING_SYMBOL (1<<1)
> > +#define CHECKING_BUG_FRAME  (1<<2)
> > +#define CHECKING_EXCEPTION  (1<<3)
> > +/*
> > + * Whether to skip this region for particular searches. The flag
> > + * can be CHECKING_[SYMBOL|BUG_FRAMES|EXCEPTION].
> > + *
> > + * If the function returns 1 this region will be skipped.
> > + */
> > +bool_t (*skip)(unsigned int flag, unsigned long priv);
> > +
> > +unsigned long start;/* Virtual address start. */
> > +unsigned long end;

[Xen-devel] [PATCH] pygrub: remove isconfig option

2016-03-22 Thread Firas Azar

The pygrub command option "isconfig" is broken and obsolete. This patch removes 
it since the alternate options "-l -n" provide the same functionality.

Signed-off-by: Firas Azar 
---
 tools/pygrub/src/pygrub | 21 +
 1 file changed, 5 insertions(+), 16 deletions(-)

diff --git a/tools/pygrub/src/pygrub b/tools/pygrub/src/pygrub
index 40f9584..82d4f01 100755
--- a/tools/pygrub/src/pygrub
+++ b/tools/pygrub/src/pygrub
@@ -776,10 +776,11 @@ if __name__ == "__main__":
 
 try:
 opts, args = getopt.gnu_getopt(sys.argv[1:], 'qilnh::',
-   ["quiet", "interactive", "list-entries", 
"not-really", "help",
-"output=", "output-format=", 
"output-directory=", "offset=",
-"entry=", "kernel=", 
-"ramdisk=", "args=", "isconfig", "debug"])
+   ["quiet", "interactive", "list-entries",
+"not-really", "help", "output=",
+"output-format=", "output-directory=",
+"offset=", "entry=", "kernel=",
+"ramdisk=", "args=", "debug"])
 except getopt.GetoptError:
 usage()
 sys.exit(1)
@@ -793,7 +794,6 @@ if __name__ == "__main__":
 entry = None
 interactive = True
 list_entries = False
-isconfig = False
 part_offs = None
 debug = False
 not_really = False
@@ -838,8 +838,6 @@ if __name__ == "__main__":
 entry = a
 # specifying the entry to boot implies non-interactive
 interactive = False
-elif o in ("--isconfig",):
-isconfig = True
 elif o in ("--debug",):
 debug = True
 elif o in ("--output-format",):
@@ -871,15 +869,6 @@ if __name__ == "__main__":
 else:
 fd = os.open(output, os.O_WRONLY)
 
-# debug
-if isconfig:
-chosencfg = run_grub(file, entry, fs, incfg["args"])
-print "  kernel: %s" % chosencfg["kernel"]
-if chosencfg["ramdisk"]:
-print "  initrd: %s" % chosencfg["ramdisk"]
-print "  args: %s" % chosencfg["args"]
-sys.exit(0)
-
 # if boot filesystem is set then pass to fsimage.open
 bootfsargs = '"%s"' % incfg["args"]
 bootfsgroup = re.findall('zfs-bootfs=(.*?)[\s\,\"]', bootfsargs)
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] tools: Restrict configuration of qemu processes

2016-03-22 Thread Jim Fehlig

On 03/15/2016 06:28 AM, Wei Liu wrote:
> On Mon, Mar 14, 2016 at 07:14:12PM -0600, Jim Fehlig wrote:
>> Opps, forgot to cc the tools maintainers, sorry. I can resend if needed.
>>
>> Regards,
>> Jim
>>
>> On 03/14/2016 07:08 PM, Jim Fehlig wrote:
>>> Commit 6ef823fd added '-nodefaults' to the qemu args created by
>>> libxl, which is a good step in restricting qemu's default
>>> configuration. This change takes another step by adding
>>> -no-user-config, which ignores any user-provided config files in
>>> sysconfdir. Together, -nodefaults and -no-user-config allow Xen
>>> to avoid unkown and uncontrolled qemu configuration.
>>>
>>> Both options are also added to the qemu invocation in the
>>> xen-qemu-dom0-disk-backend systemd service file.
>>>
>>> Signed-off-by: Jim Fehlig 
> Acked-by: Wei Liu 

If there are no remaining issues, can this patch be applied?  Thanks!

Regards,
Jim

>
>>> ---
>>>  tools/hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service.in | 1 +
>>>  tools/libxl/libxl_dm.c| 6 
>>> ++
>>>  2 files changed, 7 insertions(+)
>>>
>>> diff --git 
>>> a/tools/hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service.in 
>>> b/tools/hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service.in
>>> index acf61a8..f56775b 100644
>>> --- a/tools/hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service.in
>>> +++ b/tools/hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service.in
>>> @@ -14,6 +14,7 @@ ExecStartPre=/bin/mkdir -p @XEN_RUN_DIR@
>>>  ExecStart=@qemu_xen_systemd@ -xen-domid 0 \
>>> -xen-attach -name dom0 -nographic -M xenpv -daemonize \
>>> -monitor /dev/null -serial /dev/null -parallel /dev/null \
>>> +   -nodefaults -no-user-config \
>>> -pidfile @XEN_RUN_DIR@/qemu-dom0.pid
>>>  
>>>  [Install]
>>> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
>>> index 4aca38e..cfda24c 100644
>>> --- a/tools/libxl/libxl_dm.c
>>> +++ b/tools/libxl/libxl_dm.c
>>> @@ -828,6 +828,12 @@ static int 
>>> libxl__build_device_model_args_new(libxl__gc *gc,
>>>   */
>>>  flexarray_append(dm_args, "-nodefaults");
>>>  
>>> +/*
>>> + * Do not use any of the user-provided config files in sysconfdir,
>>> + * avoiding unkown and uncontrolled configuration.
>>> + */
>>> +flexarray_append(dm_args, "-no-user-config");
>>> +
>>>  if (b_info->type == LIBXL_DOMAIN_TYPE_PV) {
>>>  flexarray_append(dm_args, "-xen-attach");
>>>  }
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] xenconsole: update help message

2016-03-22 Thread Wei Liu

On Tue, Mar 22, 2016 at 12:00:24PM -0500, Doug Goldstein wrote:
> The help message did not include information about the --type parameter.
> 
> Signed-off-by: Doug Goldstein 
> ---
> CC: Ian Jackson 
> CC: Stefano Stabellini 
> CC: Wei Liu 

Acked-by: Wei Liu 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v6 17/22] arm/gic: Add a new callback to deny Dom0 access to GIC regions

2016-03-22 Thread Julien Grall


Hi Shannon,

On 17/03/16 09:41, Shannon Zhao wrote:

From: Shannon Zhao 

Add a new member in gic_hw_operations which is used to deny Dom0 access
to GIC regions.

Signed-off-by: Shannon Zhao 
---
v6: use SZ_64K for GICv3 distributor
---
  xen/arch/arm/gic-v2.c | 31 +++
  xen/arch/arm/gic-v3.c | 45 +
  xen/arch/arm/gic.c|  5 +
  xen/include/asm-arm/gic.h |  3 +++
  4 files changed, 84 insertions(+)

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 02db5f2..186f944 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -22,6 +22,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -714,6 +715,31 @@ static u32 gicv2_make_hwdom_madt(const struct domain *d, 
u32 offset)
  return table_len;
  }

+static int gicv2_iomem_deny_access(const struct domain *d)
+{
+int rc;
+unsigned long gfn, nr;
+
+gfn = dbase >> PAGE_SHIFT;
+rc = iomem_deny_access(d, gfn, gfn + 1);
+if ( rc )
+return rc;
+
+gfn = hbase >> PAGE_SHIFT;
+rc = iomem_deny_access(d, gfn, gfn + 1);
+if ( rc )
+return rc;
+
+gfn = cbase >> PAGE_SHIFT;
+nr = DIV_ROUND_UP(csize, PAGE_SIZE);
+rc = iomem_deny_access(d, gfn, gfn + nr);
+if ( rc )
+return rc;
+
+gfn = vbase >> PAGE_SHIFT;
+return iomem_deny_access(d, gfn, gfn + nr);
+}
+
  static int __init
  gic_acpi_parse_madt_cpu(struct acpi_subtable_header *header,
  const unsigned long end)
@@ -809,6 +835,10 @@ static u32 gicv2_make_hwdom_madt(const struct domain *d, 
u32 offset)
  {
  return 0;
  }
+static int gicv2_iomem_deny_access(const struct domain *d)
+{
+return 0;
+}


I don't see any benefits to have iomem_deny_access only implemented when 
CONFIG_ACPI is built.


Because in this case, you will also deny the iomem when Xen is booting 
using device tree.



  #endif

  static int __init gicv2_init(void)
@@ -902,6 +932,7 @@ const static struct gic_hw_operations gicv2_ops = {
  .read_apr= gicv2_read_apr,
  .make_hwdom_dt_node  = gicv2_make_hwdom_dt_node,
  .make_hwdom_madt = gicv2_make_hwdom_madt,
+.iomem_deny_access   = gicv2_iomem_deny_access,
  };

  /* Set up the GIC */
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index d9fce4b..7f9634d 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -27,6 +27,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -1278,6 +1279,45 @@ static u32 gicv3_make_hwdom_madt(const struct domain *d, 
u32 offset)
  return table_len;
  }

+static int gicv3_iomem_deny_access(const struct domain *d)
+{
+int rc, i;
+unsigned long gfn, nr;
+
+gfn = dbase >> PAGE_SHIFT;
+nr = DIV_ROUND_UP(SZ_64K, PAGE_SIZE);
+rc = iomem_deny_access(d, gfn, gfn + nr);
+if ( rc )
+return rc;
+
+for ( i = 0; i < gicv3.rdist_count; i++ )
+{
+gfn = gicv3.rdist_regions[i].base >> PAGE_SHIFT;
+nr = DIV_ROUND_UP(gicv3.rdist_regions[i].size, PAGE_SIZE);
+rc = iomem_deny_access(d, gfn, gfn + nr);
+if ( rc )
+return rc;
+}
+
+if ( cbase != INVALID_PADDR )
+{
+gfn = cbase >> PAGE_SHIFT;
+nr = DIV_ROUND_UP(csize, PAGE_SIZE);
+rc = iomem_deny_access(d, gfn, gfn + nr);
+if ( rc )
+return rc;
+}
+
+if ( vbase != INVALID_PADDR )
+{
+gfn = vbase >> PAGE_SHIFT;
+nr = DIV_ROUND_UP(csize, PAGE_SIZE);
+return iomem_deny_access(d, gfn, gfn + nr);
+}
+
+return 0;
+}
+
  static int __init
  gic_acpi_parse_madt_cpu(struct acpi_subtable_header *header,
  const unsigned long end)
@@ -1426,6 +1466,10 @@ static u32 gicv3_make_hwdom_madt(const struct domain *d, 
u32 offset)
  {
  return 0;
  }
+static int gicv3_iomem_deny_access(const struct domain *d)
+{
+return 0;
+}


Ditto


  #endif

  /* Set up the GIC */
@@ -1521,6 +1565,7 @@ static const struct gic_hw_operations gicv3_ops = {
  .secondary_init  = gicv3_secondary_cpu_init,
  .make_hwdom_dt_node  = gicv3_make_hwdom_dt_node,
  .make_hwdom_madt = gicv3_make_hwdom_madt,
+.iomem_deny_access   = gicv3_iomem_deny_access,
  };

  static int __init gicv3_dt_preinit(struct dt_device_node *node, const void 
*data)
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 6d32432..65022ee 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -744,6 +744,11 @@ u32 gic_make_hwdom_madt(const struct domain *d, u32 offset)
  return gic_hw_ops->make_hwdom_madt(d, offset);
  }

+int gic_iomem_deny_access(const struct domain *d)
+{
+return gic_hw_ops->iomem_deny_access(d);
+}
+
  /*
   * Local variables:
   * mode: C
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
index 4cf003d..932fc02 100644
---

Re: [Xen-devel] [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.

2016-03-22 Thread Andrew Cooper

On 22/03/16 18:57, Konrad Rzeszutek Wilk wrote:
> --- a/xen/include/xsm/dummy.h
> +++ b/xen/include/xsm/dummy.h
> @@ -751,3 +751,22 @@ static XSM_INLINE int xsm_xen_version 
> (XSM_DEFAULT_ARG uint32_t op)
>  return xsm_default_action(XSM_PRIV, current->domain, NULL);
>  }
>  }
> +
> +static XSM_INLINE int xsm_version_op (XSM_DEFAULT_ARG uint32_t op)
> +{
> +XSM_ASSERT_ACTION(XSM_OTHER);
> +switch ( op )
> +{
> +case XEN_VERSION_version:
> +case XEN_VERSION_extraversion:
> +case XEN_VERSION_capabilities:
> +case XEN_VERSION_platform_parameters:
> +case XEN_VERSION_get_features:
> +case XEN_VERSION_pagesize:
> +case XEN_VERSION_guest_handle:
> +/* These MUST always be accessible to any guest by default. */
> +return xsm_default_action(XSM_HOOK, current->domain, NULL);
> +default:
> +return xsm_default_action(XSM_PRIV, current->domain, NULL);
 Considering that we seem to have settled on some exceptions here
 for the change adding XSM check to the legacy version op, do you
 really think going with no exception at all here is the right approach?
 Because if we do, that'll prevent guests getting fully converted over
 to the new interface. Of course, we could also make this conversion
 specifically a non-goal, and omit e.g. XEN_VERSION_VERSION from
 this new interface.
>>> No no. I think convesion is the right long-term goal. 
>>>
>>> However the nice thing about this hypercall is that it can return -EPERM.
>>>
>>> Making it always return an value for XEN_VERSION_version,
>>> XEN_VERSION_platform_parameters, XEN_VERSION_get_features means that
>>> there are some exceptions to this "can return -EPERM" as they will
>>> be guaranteed an postive return value. They can ignore the -EPERM
>>> case.
>>>
>>> And means that guest can still take shortcuts.
>>>
>>> I agree with you that guests need these hypercalls but at the same
>>> time I am not sure what can be done so they don't fall flat on their
>>> faces if they are presented with -EPERM. The Linux xenbus_init can be
>>> modified to deal with this returning -EPERM where it makes some assumptions.
>>>
>>> But in a likelyhood it is the bad assumption!
>> I'm afraid I can't conclude what you mean to say with all of the
>> above.
> That I am waffling.
>
> Andrew, what is your opinion?

Nothing good can come from failing a XEN_VERSION_version hypercall. 
There are a number easy ways for a guest to infer such information.

XEN_VERSION_platform_parameters is only useful for 32bit PV guests, and
the toolstack.  Given that it is returning a fixed number in the ABI,
nothing good can come of failing this either.

get_features can effectively be failed for permission reasons by
returning 0.  As such, explicitly failing with -EPERM is similarly
pointless.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 1/6] x86/mm/pat: Change PAT to support non-default PAT MSR

2016-03-22 Thread Thomas Gleixner

On Tue, 22 Mar 2016, Borislav Petkov wrote:
> > +static void pat_keep_handoff_state(void)
> 
> Static function, no need for "pat_" prefix. Also, no need for the
> kernel-doc comment.

I have to disagree. kernel-doc comments are not limited to global
functions.

I realy prefer kernel-doc style for static functions over randomly formatted
ones for consistency reasons.

Thanks,

tglx

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v6 16/22] arm/acpi: Configure SPI interrupt type and route to Dom0 dynamically

2016-03-22 Thread Julien Grall


Hi Shannon,

On 17/03/16 09:41, Shannon Zhao wrote:

From: Shannon Zhao 

Interrupt information is described in DSDT and is not available at the
time of booting. Check if the interrupt is permitted to access and set
the interrupt type, route it to guest dynamically only for SPI
and Dom0.

Signed-off-by: Parth Dixit 
Signed-off-by: Shannon Zhao 
---
v6: coding style
---
  xen/arch/arm/vgic.c | 32 
  1 file changed, 32 insertions(+)

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index ee35683..39d858c 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -25,6 +25,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 

  #include 

@@ -334,6 +336,19 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
  }
  }

+#define VGIC_ICFG_MASK(intr) (1 << ((2 * ((intr) % 16)) + 1))
+
+static inline unsigned int get_the_irq_type(struct vcpu *v, int n, int index)
+{
+struct vgic_irq_rank *vr = vgic_get_rank(v, n);
+uint32_t tr = vr->icfg[index >> 4];
+
+if ( tr & VGIC_ICFG_MASK(index) )
+return IRQ_TYPE_EDGE_BOTH;
+else
+return IRQ_TYPE_LEVEL_MASK;
+}
+
  void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
  {
  const unsigned long mask = r;
@@ -342,9 +357,26 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
  unsigned long flags;
  int i = 0;
  struct vcpu *v_target;
+struct domain *d = v->domain;
+int ret;

  while ( (i = find_next_bit(, 32, i)) < 32 ) {
  irq = i + (32 * n);
+/* Set the irq type and route it to guest only for SPI and Dom0 */
+if( irq_access_permitted(d, irq) && is_hardware_domain(d) &&
+( irq >= 32 ) && ( !acpi_disabled ) )
+{
+ret = irq_set_spi_type(irq, get_the_irq_type(v, n, i));
+if ( ret )
+printk(XENLOG_WARNING "The irq type is not correct\n");


XENLOG_WARNING (and XENLOG_ERR) are not rate-limited. Therefore a domain 
(even if it's dom0) could flood the console.


Please use XENLOG_G_* instead. However in this case, "v" is the current 
vCPU so you can use gprintk(XENLOG_WARNING, ...);



+
+vgic_reserve_virq(d, irq);
+
+ret = route_irq_to_guest(d, irq, irq, NULL);
+if ( ret )
+printk(XENLOG_ERR "Unable to route IRQ %u to domain %u\n",
+   irq, d->domain_id);


I'm not against this solution for short term. But in long term we would 
benefit to split the IRQ configuration from the routing.


The routing will be done before DOM0 is booting. The IRQ configuration 
will just end up to write in the ICFGR register.


This will also help for PCI-passthrough as the guest will have to 
configure the SPIs (we can't expect DOM0 doing it for it). But the 
routing will be done ahead.



+}
  v_target = __vgic_get_target_vcpu(v, irq);
  p = irq_to_pending(v_target, irq);
  set_bit(GIC_IRQ_GUEST_ENABLED, >status);



Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.

2016-03-22 Thread Konrad Rzeszutek Wilk

> >> > --- a/xen/include/xsm/dummy.h
> >> > +++ b/xen/include/xsm/dummy.h
> >> > @@ -751,3 +751,22 @@ static XSM_INLINE int xsm_xen_version 
> >> > (XSM_DEFAULT_ARG uint32_t op)
> >> >  return xsm_default_action(XSM_PRIV, current->domain, NULL);
> >> >  }
> >> >  }
> >> > +
> >> > +static XSM_INLINE int xsm_version_op (XSM_DEFAULT_ARG uint32_t op)
> >> > +{
> >> > +XSM_ASSERT_ACTION(XSM_OTHER);
> >> > +switch ( op )
> >> > +{
> >> > +case XEN_VERSION_version:
> >> > +case XEN_VERSION_extraversion:
> >> > +case XEN_VERSION_capabilities:
> >> > +case XEN_VERSION_platform_parameters:
> >> > +case XEN_VERSION_get_features:
> >> > +case XEN_VERSION_pagesize:
> >> > +case XEN_VERSION_guest_handle:
> >> > +/* These MUST always be accessible to any guest by default. */
> >> > +return xsm_default_action(XSM_HOOK, current->domain, NULL);
> >> > +default:
> >> > +return xsm_default_action(XSM_PRIV, current->domain, NULL);
> >> 
> >> Considering that we seem to have settled on some exceptions here
> >> for the change adding XSM check to the legacy version op, do you
> >> really think going with no exception at all here is the right approach?
> > 
> >> Because if we do, that'll prevent guests getting fully converted over
> >> to the new interface. Of course, we could also make this conversion
> >> specifically a non-goal, and omit e.g. XEN_VERSION_VERSION from
> >> this new interface.
> > 
> > No no. I think convesion is the right long-term goal. 
> > 
> > However the nice thing about this hypercall is that it can return -EPERM.
> > 
> > Making it always return an value for XEN_VERSION_version,
> > XEN_VERSION_platform_parameters, XEN_VERSION_get_features means that
> > there are some exceptions to this "can return -EPERM" as they will
> > be guaranteed an postive return value. They can ignore the -EPERM
> > case.
> > 
> > And means that guest can still take shortcuts.
> > 
> > I agree with you that guests need these hypercalls but at the same
> > time I am not sure what can be done so they don't fall flat on their
> > faces if they are presented with -EPERM. The Linux xenbus_init can be
> > modified to deal with this returning -EPERM where it makes some assumptions.
> > 
> > But in a likelyhood it is the bad assumption!
> 
> I'm afraid I can't conclude what you mean to say with all of the
> above.

That I am waffling.

Andrew, what is your opinion?
> 
> Jan
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] identify a Xen PV domU to fix devmem_is_allowed

2016-03-22 Thread Boris Ostrovsky


On 03/21/2016 05:29 PM, Boris Ostrovsky wrote:

On 03/15/2016 12:57 PM, Olaf Hering wrote:

On Tue, Mar 01, Boris Ostrovsky wrote:


on domU:

[root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# od -N 1 -j 4096 
/dev/mem

od: /dev/mem: read error: Bad address
001
[root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]#

with

(XEN) mm.c:1767:d14v0 Bad L1 flags 10

How should we proceed with this bug?



I can't see any way to avoid calling xen_pv_domain() so what you 
suggested should work. The only problem is that this will now cause 
reserved areas to also return 0:


# head /proc/iomem
-0fff : reserved <
1000-0009 : System RAM
000a-000f : reserved   <=
  000f-000f : System ROM
0010-7fff : System RAM
  0100-0172a065 : Kernel code
  0172a066-01d32b3f : Kernel data
  01ec5000-02026fff : Kernel bss
fee0-fee00fff : Local APIC

which I don't think they really should.

How about this:

if (pagenr < 256 && !xen_pv_domain())
return 1;
if (iomem_is_exclusive(pagenr << PAGE_SHIFT))
return 0;
if (!page_is_ram(pagenr))
return 1;


Also, while looking into this I noticed that pat_x_mtrr_type() will 
make us switch from _PAGE_CACHE_MODE_WB to _PAGE_CACHE_MODE_UC_MINUS 
when trying to mmap and this is what causes the hypervisor error 
message and the splat in Linux. We make this switch despite the fact 
that MTRR is disabled and therefore mtrr_type_lookup() returns 
MTRR_TYPE_INVALID.


Should we return req_type is MTRR is disabled?


Olaf,


Can you apply PAT changes from 
http://lists.xenproject.org/archives/html/xen-devel/2016-03/msg02127.html and 
see if it helps?


It should at least get rid of the splat (patch 3 is the one addresses 
no-MTRR problem that I mentioned above). We should still fix 
devmem_is_allowed() though.


-boris


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] xsm: only define XSM_MAGIC in xsm.h

2016-03-22 Thread Doug Goldstein

On 3/16/16 2:18 PM, Doug Goldstein wrote:
> Rather than have XSM_MAGIC set in the global xen/config.h and set in
> xsm.h if it's unset, just set it once in xsm.h since its only used in
> files that already include xsm.h
> 
> Signed-off-by: Doug Goldstein 
> ---
> CC: Daniel De Graaf 
> ---
>  xen/include/xen/config.h | 1 -
>  xen/include/xsm/xsm.h| 7 +--
>  2 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/include/xen/config.h b/xen/include/xen/config.h
> index 96f5539..3f8c53d 100644
> --- a/xen/include/xen/config.h
> +++ b/xen/include/xen/config.h
> @@ -79,7 +79,6 @@
>  #define STR(...) __STR(__VA_ARGS__)
>  
>  #ifdef CONFIG_FLASK
> -#define XSM_MAGIC 0xf97cff8c
>  /* Maintain statistics on the access vector cache */
>  #define FLASK_AVC_STATS 1
>  #endif
> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
> index 3afed70..37a102a 100644
> --- a/xen/include/xsm/xsm.h
> +++ b/xen/include/xsm/xsm.h
> @@ -23,8 +23,11 @@ DEFINE_XEN_GUEST_HANDLE(xsm_op_t);
>  
>  /* policy magic number (defined by XSM_MAGIC) */
>  typedef u32 xsm_magic_t;
> -#ifndef XSM_MAGIC
> -#define XSM_MAGIC 0x
> +
> +#ifdef CONFIG_FLASK
> +#define XSM_MAGIC 0xf97cff8c
> +#else
> +#define XSM_MAGIC 0x0
>  #endif
>  
>  /* These annotations are used by callers and in dummy.h to document the
> 

ping?

-- 
Doug Goldstein



signature.asc
Description: OpenPGP digital signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 1/6] x86/mm/pat: Change PAT to support non-default PAT MSR

2016-03-22 Thread Toshi Kani

On Tue, 2016-03-22 at 17:57 +0100, Borislav Petkov wrote:
> $Subject is misleading - there's no non-default PAT MSR - the setting is
> non-default.

Right.  Will change to "Add support of non-default PAT MSR setting at
handoff".

> On Wed, Mar 16, 2016 at 06:44:57PM -0600, Toshi Kani wrote:
> > In preparation to fix a regression caused by 'commit 9cd25aac1f44
> > ("x86/mm/pat: Emulate PAT when it is disabled")', PAT needs to
> > support a case that PAT MSR is initialized with a non-default
> > value.
> > 
> > When pat_init() is called in PAT disable state, it initializes
> 
>     is called and PAT is disabled

Will do.

> > PAT table with the BIOS default value. Xen, however, sets PAT MSR
> > with a non-default value to enable WC. This causes inconsistency
> > between PAT table and PAT MSR when PAT is set to disable on Xen.
> > 
> > Change pat_init() to handle the PAT disable cases properly.  Add
> > pat_keep_handoff_state() to handle two cases when PAT is set to
> > disable.
> >  1. CPU supports PAT: Set PAT table to be consistent with PAT MSR.
> >  2. CPU does not support PAT: Set PAT table to be consistent with
> > PWT and PCD bits in a PTE.
> > 
 :
> > +/**
> > + * pat_keep_handoff_state - Set PAT table to the handoff state
> > + *
> > + * This function keeps PAT in the BIOS handoff state. When CPU
> > supports
> > + * PAT, it sets PAT table to be consistent with PAT MSR. When CPU does
> > not
> > + * support PAT, it emulates PAT by setting PAT table consistent with
> > PWT
> > + * and PCD bits in a PTE.
> > + *
> > + * The PAT table is global to all CPUs, which is initialized once at
> > + * boot-time. Any subsequent calls to this function have no effect.
> > + */
> > +static void pat_keep_handoff_state(void)
> 
> Static function, no need for "pat_" prefix. Also, no need for the
> kernel-doc comment.
> 
> Also, no need for all that handoff nomenclature etc, just call it
> setup_pat(). Because it does exactly that - it sets up the PAT bits
> unconditionally, regardless of enabled or not.

I'd like to make it clear that this function does not set PAT MSR, unlike
what pat_init() does.  When CPU supports PAT, it keeps PAT MSR in whatever
the setting at handoff, and initializes PAT table to match with this
setting.

I am open to a better name, but I am afraid that setup_pat() can be
confusing as if it sets PAT MSR.

> >  {
> > -   u64 pat;
> > -   struct cpuinfo_x86 *c = _cpu_data;
> > +   u64 pat = 0;
> > +   static int set_handoff_done;
> 
> s/set_handoff_done/pat_setup_done/

I will match it with a func name once we decided.

Thanks,
-Toshi

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v6 15/22] arm/acpi: Permit access all Xen unused SPIs for Dom0

2016-03-22 Thread Julien Grall


Hi Shannon,

On 17/03/16 09:41, Shannon Zhao wrote:

From: Shannon Zhao 

Permit access all Xen unused SPIs for Dom0 except the interrupts that
Xen uses.


You say exactly the same things with all "Xen unused SPIs for Dom0" and 
"except the interrupts that Xen uses".


I would instead say:

"Allow DOM0 to use all SPIs but the ones used by Xen."


Then when Dom0 configures the interrupt, it could set the
interrupt type and route it to Dom0.

Signed-off-by: Shannon Zhao 
Reviewed-by: Stefano Stabellini 
---
  xen/arch/arm/domain_build.c | 31 +++
  1 file changed, 31 insertions(+)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 6726e45..1e5ee0e 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1359,6 +1359,33 @@ static int prepare_dtb(struct domain *d, struct 
kernel_info *kinfo)
  #ifdef CONFIG_ACPI
  #define ACPI_DOM0_FDT_MIN_SIZE 4096

+static int acpi_permit_spi_access(struct domain *d)
+{
+int i, res;
+struct irq_desc *desc;
+
+/* Here just permit Dom0 to access the SPIs which Xen doesn't use. Then 
when


Coding style:

/*
 * FOo
 * Bar
 */


+ * Dom0 configures the interrupt, set the interrupt type and route it to
+ * Dom0.
+ */
+for( i = NR_LOCAL_IRQS; i < vgic_num_irqs(d); i++ )
+{
+desc = irq_to_desc(i);
+if( desc->action != NULL)


Well some of the SPIs used by Xen may not be registered yet. For 
instance the SMMU driver doesn't register any SPIs until it's necessary 
(i.e a device is assigned to a domain).



+continue;
+
+res = irq_permit_access(d, i);
+if ( res )
+{
+printk(XENLOG_ERR "Unable to permit to dom%u access to IRQ %u\n",
+   d->domain_id, i);
+return res;
+}
+}
+
+return 0;
+}
+
  static int make_chosen_node(const struct kernel_info *kinfo,
  struct membank tbl_add[])
  {
@@ -1849,6 +1876,10 @@ static int prepare_acpi(struct domain *d, struct 
kernel_info *kinfo)
  if ( rc != 0 )
  return rc;

+rc = acpi_permit_spi_access(d);
+if ( rc != 0 )
+return rc;
+
  return 0;
  }
  #else



--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [GIT PULL] xen: features and fixes for 4.6-rc0

2016-03-22 Thread David Vrabel

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Linus,

Please git pull the following tag:

 git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git for-linus-4.6-rc0-tag

xen: features and fixes for 4.6-rc0

- - Make earlyprintk=xen work for HVM guests.
- - Remove module support for things never built as modules.

Thanks.

David

 arch/arm/include/asm/xen/hypercall.h |  2 +
 arch/x86/xen/xen-head.S  | 19 +---
 drivers/tty/hvc/hvc_xen.c| 75 
 drivers/xen/balloon.c|  4 --
 drivers/xen/events/events_2l.c   |  1 -
 drivers/xen/events/events_base.c |  2 +-
 drivers/xen/events/events_fifo.c |  1 -
 drivers/xen/features.c   |  2 +-
 drivers/xen/grant-table.c|  1 -
 drivers/xen/platform-pci.c   | 22 --
 drivers/xen/sys-hypervisor.c | 59 -
 drivers/xen/xen-balloon.c| 14 ++
 drivers/xen/xen-pciback/conf_space.c |  2 +-
 drivers/xen/xen-pciback/pciback_ops.c|  2 +-
 drivers/xen/xen-pciback/xenbus.c |  2 +-
 drivers/xen/xen-selfballoon.c|  1 -
 drivers/xen/xenbus/xenbus_dev_backend.c  | 13 +-
 drivers/xen/xenbus/xenbus_dev_frontend.c | 13 +-
 drivers/xen/xenbus/xenbus_xs.c   |  1 -
 drivers/xen/xenfs/xensyms.c  |  1 -
 20 files changed, 101 insertions(+), 136 deletions(-)

Boris Ostrovsky (2):
  xen/x86: Zero out .bss for PV guests
  xen/x86: Drop mode-selecting ifdefs in startup_xen()

Paul Gortmaker (5):
  xen: audit usages of module.h ; remove unnecessary instances
  drivers/xen: make [xen-]ballon explicitly non-modular
  drivers/xen: make xenbus_dev_[front/back]end explicitly non-modular
  drivers/xen: make sys-hypervisor.c explicitly non-modular
  drivers/xen: make platform-pci.c explicitly non-modular

Stefano Stabellini (3):
  hvc_xen: add earlycon support
  hvc_xen: fix xenboot for DomUs
  hvc_xen: make early_printk work with HVM guests
-BEGIN PGP SIGNATURE-
Version: GnuPG v2

iQEcBAEBCAAGBQJW8YjsAAoJEFxbo/MsZsTRE6YIAJbwtILQ0RLFqy9tm5ykUnT9
CVsGqZyfSdijQR5U+hoxlHWeDWsMN1VAWD2jRBppEFUM//v4RoxsnWDRdF7owOkg
nE5/cbcdTLcWjcWXLYMwDALegXW36SdObtjeXmi/KfU8l9yHJEYar3RJqVvBauj/
RDuCDIra5V6vvQGahfwb4QsxE8I0zdqJbTN4+v4amkcjioBtahRkbv6uO3Xuv4P4
k6kCo/gSWe2VjsFwlNv/6P5BIUEALHfAc8SCyHYvFgxJ8QF3WQ2lZeIi0yXw9+n7
hGEC3szFfE3hQ+wxCYMlZm7eQHvP227uE1/Mb+iP8i7AUJwmnhyO6ZWwREcrIJM=
=RzEK
-END PGP SIGNATURE-

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 03/34] xsm/xen_version: Add XSM for the xen_version hypercall

2016-03-22 Thread Daniel De Graaf


On 03/22/2016 12:10 PM, Konrad Rzeszutek Wilk wrote:

On Mon, Mar 21, 2016 at 05:22:09AM -0600, Jan Beulich wrote:

On 18.03.16 at 18:26,  wrote:

On Fri, Mar 18, 2016 at 05:55:55AM -0600, Jan Beulich wrote:

On 15.03.16 at 18:56,  wrote:

@@ -223,12 +224,15 @@ void __init do_initcalls(void)
  /*
   * Simple hypercalls.
   */
-
  DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)


Please retain the blank line, as it relates to more than just this
one function.


Done! (stray change).


Considering this I'm not puzzled by ...


  case XENVER_guest_handle:
-if ( copy_to_guest(arg, current->domain->handle,
-   ARRAY_SIZE(current->domain->handle)) )
+{
+xen_domain_handle_t hdl;
+
+if ( deny )
+memset(, 0, ARRAY_SIZE(hdl));
+
+BUILD_BUG_ON(ARRAY_SIZE(current->domain->handle) != ARRAY_SIZE(hdl));
+
+if ( copy_to_guest(arg, deny ? hdl : current->domain->handle,
+   ARRAY_SIZE(hdl) ) )
  return -EFAULT;
  return 0;
-
+}
  case XENVER_commandline:


... this.


Wow. That is some sharp eyes!



--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -727,3 +727,27 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct 
domain *d, unsigned int
  }

  #endif /* CONFIG_X86 */
+
+#include 
+static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
+{
+XSM_ASSERT_ACTION(XSM_OTHER);
+switch ( op )
+{
+case XENVER_version:
+case XENVER_platform_parameters:
+case XENVER_get_features:
+/* The sub-ops ignores the permission check and returns data. */


ignore ... and return ...

With those minor things addressed I think the patch can have my ack.


Thank you!

Now I just need Daniel's Ack again.


From 1ccf59abdd2cd9228f0159dce77fe404d98c7300 Mon Sep 17 00:00:00 2001

From: Konrad Rzeszutek Wilk 
Date: Fri, 11 Mar 2016 21:40:43 -0500
Subject: [PATCH] xsm/xen_version: Add XSM for most of xen_version hypercall

Most of XENVER_* have now an XSM check for their sub-ops.

The subop for XENVER_commandline is now a priviliged operation.
To not break guests we still return an string - but it is
just '\0'.

The XENVER_[version|platform_parameters|get_features] - will
always return an value to the guest.

The rest: XENVER_[extraversion|capabilities|page_size|
guest_handle|changeset| compile_info] behave as before -
allowed by default for all guests if using the XSM default
policy or with the dummy one. And if the system admin
wants to curtail access to some of them - they can do
that now with a non-default XSM policy.

Also we add a local variable block.

Signed-off-by: Konrad Rzeszutek Wilk 
Acked-by: Jan Beulich 


Replied to the wrong email before; this one is actually:

Acked-by: Daniel De Graaf 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 3/3] Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server

2016-03-22 Thread Paul Durrant

> -Original Message-
> From: George Dunlap [mailto:george.dun...@citrix.com]
> Sent: 22 March 2016 17:27
> To: Yu Zhang; xen-devel@lists.xen.org
> Cc: Keir (Xen.org); jbeul...@suse.com; Andrew Cooper; George Dunlap;
> jun.nakaj...@intel.com; Kevin Tian; Tim (Xen.org); Paul Durrant;
> zhiyuan...@intel.com
> Subject: Re: [PATCH 3/3] Add HVMOP to map guest ram with
> p2m_ioreq_server to an ioreq server
> 
> On 16/03/16 12:22, Yu Zhang wrote:
> > A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to
> > let one ioreq server claim its responsibility for the handling
> > of guest pages with p2m type p2m_ioreq_server. Users of this
> > HVMOP can specify whether the p2m_ioreq_server is supposed to
> > handle write accesses or read ones in a flag. By now, we only
> > support one ioreq server for this p2m type, so once an ioreq
> > server has claimed its ownership, following calls of the
> > HVMOP_map_mem_type_to_ioreq_server will fail.
> >
> > Note: this HVMOP does not change the p2m type of any guest ram
> > page, until the HVMOP_set_mem_type is triggered. So normally
> > the steps would be the backend driver first claims its ownership
> > of guest ram pages with p2m_ioreq_server type. At then sets the
> > memory type to p2m_ioreq_server for specified guest ram pages.
> 
> Yu, thanks for this work.  I think it's heading in the right direction.
> 
> A couple of comments:
> 
> There's not much documentation in the code about how this is expected to
> be used.
> 
> For instance, having separate flags seems to imply that you can
> independently select either read intercept, write intercept, or both;
> but [ept_]p2m_type_to_flags() seems to assume that READ_ACCESS implies
> WRITE_ACCESS.  Do you plan to implement them separately in the future?
> If not, would it be better to make the interface an enum instead?
> 
> At very least it should be documented that READ_ACCESS implies
> WRITE_ACCESS.

That's not true. If WRITE_ACCESS has not been requested then writes are handled 
directly in Xen rather than being forwarded to the ioreq server. If h/w were to 
allow pages to be marked write-only then we wouldn't need to do that.

> 
> Calling the function with no flags appears to mean, "Detach the ioreq
> server from this type" -- this should also be documented.
> 

That was documented in the design, but it's true that it does need to pulled 
into comments.

> Furthermore, you appear to deal with p2m_ioreq_server types when there
> is no ioreq server by making the "flags=0" case RW in
> [ept_]p2m_type_to_flags.  This should be spelled out somewhere (probably
> in the p2m structure definition).
> 
> Finally, you call p2m_memory_type_changed() when changing the ioreq
> server; but this appears to be only implemented for ept; meaning that if
> you're either running HAP on AMD or running in shadow mode, if the
> caller isn't careful (i.e., doesn't remove all p2m_ioreq_server types
> before removing itself as a server), they may end up with bogus
> permissions left over in the p2m table.
> 
> Maybe you should check for the existence of p2m->memory_type_changed
> and
> return -ENOTSUPP or something if it's not present?
> 
> More comments inline...
> 
> > Signed-off-by: Paul Durrant 
> > Signed-off-by: Yu Zhang 
> 
> > @@ -168,13 +226,65 @@ static int hvmemul_do_io(
> >  break;
> >  case X86EMUL_UNHANDLEABLE:
> >  {
> > -struct hvm_ioreq_server *s =
> > -hvm_select_ioreq_server(curr->domain, );
> > +struct hvm_ioreq_server *s;
> > +p2m_type_t p2mt;
> > +
> > +if ( is_mmio )
> > +{
> > +unsigned long gmfn = paddr_to_pfn(addr);
> > +
> > +(void) get_gfn_query_unlocked(currd, gmfn, );
> > +
> > +switch ( p2mt )
> > +{
> > +case p2m_ioreq_server:
> > +{
> > +unsigned long flags;
> > +
> > +p2m_get_ioreq_server(currd, p2mt, , );
> > +
> > +if ( !s )
> > +break;
> > +
> > +if ( (dir == IOREQ_READ &&
> > +  !(flags & P2M_IOREQ_HANDLE_READ_ACCESS)) ||
> > + (dir == IOREQ_WRITE &&
> > +  !(flags & P2M_IOREQ_HANDLE_WRITE_ACCESS)) )
> > +s = NULL;
> > +
> > +break;
> > +}
> > +case p2m_ram_rw:
> > +s = NULL;
> > +break;
> > +
> > +default:
> > +s = hvm_select_ioreq_server(currd, );
> > +break;
> > +}
> > +}
> > +else
> > +{
> > +p2mt = p2m_invalid;
> > +
> > +s = hvm_select_ioreq_server(currd, );
> > +}
> >
> >  /* If there is no suitable backing DM, just ignore accesses */
> >  if ( !s )
> >  {
> >

Re: [Xen-devel] [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.

2016-03-22 Thread Daniel De Graaf


On 03/15/2016 01:56 PM, Konrad Rzeszutek Wilk wrote:

This hypercall mirrors the XENVER_ in that it has similar functionality.
However it is designed differently:
  - No compat layer. The data structures are the same size on 32
as on 64-bit.
  - The hypercall accepts three arguments - the command, pointer to
an buffer, and the length of the buffer.
  - Each sub-ops can be "probed" for size by returning the size of
buffer that will be needed - if the buffer is NULL.
  - Subops can complete even if the buffer is too slow - truncated
data will be filled and hypercall will return -ENOBUFS.
  - VERSION_OP_commandline, VERSION_OP_changeset are privileged.
  - There are no XENVER_compile_info equivalent.
  - The hypercall can return -EPERM and toolstack/OSes are expected
to deal with it.

Suggested-by: Andrew Cooper 
Signed-off-by: Konrad Rzeszutek Wilk 


Acked-by: Daniel De Graaf 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 03/34] xsm/xen_version: Add XSM for the xen_version hypercall

2016-03-22 Thread Daniel De Graaf


On 03/15/2016 01:56 PM, Konrad Rzeszutek Wilk wrote:

All of XENVER_* have now an XSM check for their sub-ops.

The subop for XENVER_commandline is now a priviliged operation.
To not break guests we still return an string - but it is
just '\0'.

The rest: XENVER_[version|extraversion|capabilities|
parameters|get_features|page_size|guest_handle|changeset|
compile_info] behave as before - allowed by default for all
guests if using the XSM default policy or with the dummy one.

The admin can choose to change the sub-ops to be denied
as they see fit.

Also we add a local variable block.

Signed-off-by: Konrad Rzeszutek Wilk 


Acked-by: Daniel De Graaf 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC PATCH] blkif.h: document scsi/0x12/0x83 node

2016-03-22 Thread Paul Durrant

> -Original Message-
> From: Ian Jackson [mailto:ian.jack...@eu.citrix.com]
> Sent: 22 March 2016 16:15
> To: Paul Durrant
> Cc: David Vrabel; Konrad Rzeszutek Wilk; Bob Liu; jgr...@suse.com; xen-
> de...@lists.xen.org; annie...@oracle.com; Roger Pau Monne
> Subject: RE: [Xen-devel] [RFC PATCH] blkif.h: document scsi/0x12/0x83 node
> 
> Paul Durrant writes ("RE: [Xen-devel] [RFC PATCH] blkif.h: document
> scsi/0x12/0x83 node"):
> > It's getting hard to parse the thread at this point but, as I've
> > mentioned in a previous response in the thread, Windows basically
> > assumes disks are SCSI and it's up to the controller driver to make
> > it look that way.
> 
> Right.  That makes perfect sense.
> 
> > To that end the XENVBD Windows PV driver synthesizes VPD pages 80
> > and 83 but also have the ability pull base64 encoded VPD data from
> > xenstore. The synthesis is required to make Windows work properly
> 
> Right.
> 
> > but the reason for overriding it with data from xenstore is not
> > apparent.
> 
> Thanks, this is the part that we are struggling with and which it is
> proposed to now document.
> 
> > In XenServer the storage backend code does populate the
> > VPD information (with UUID information for the storage volume
> > created by XAPI) but I don't believe that this is necessary
> > behaviour in the normal case.
> 
> So the VPD here is a UUID from Xapi ?  Not anything do with the
> underlying storage ?
> 
> I think that is a plausible thing for Xapi to do.  Although I don't
> fully understand why it would be necessary, it doesn't seem
> implausible that there are guest OS's which look for and do something
> useful with such data (because they could use it, for example, to
> reliably identify a storage volume other than by reading its
> contents).
> 

I think that in the general case the VPD data is created by XAPI (or some 
element of the storage code) from the UUID. My belief is, though, there were 
cases where XAPI instead got the VPD data from the underlying storage.

> > My *assumption* is that, at some point
> > in the past, XenServer had OEM specific storage backends and the
> > requirement to run OEM s/w in guest which relied on VPD supplied by
> > the storage, and as is commonly the case xenstore was the easiest
> > way to get this information from the backend and into the guest.
> 
> Right, but the VPD information would be Xapi-generated and generic,
> rather than containing OEM information from the disk ?
> 

Not in the case of these (mythical) OEM backends.

I don't know whether it meets Oracle's use-case but I'm happy to add code in 
XENVBD to synthesize page 80 from a straightforward serial number in xenstore.

  Paul

> Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Xen components.

2016-03-22 Thread Jason Long

Can you create a Diagram?



On Monday, March 21, 2016 7:26 AM, Jason Long  wrote:
It is not a good diagram. Looking at my OS diagram.
For example, Xen have Dom0, DomU and




On Monday, March 21, 2016 6:34 AM, Andrew Cooper  
wrote:
On 21/03/16 13:28, Jason Long wrote:

Don't top post.

> Thank you but I need a diagram like below. It is an diagram about OS 
> components :
>
> http://www.c-jump.com/CIS24/Slides/Booting/images/os_components.png
>
> Can you show me similar diagram for Xen?

http://lmgtfy.com/?q=diagram+of+xen+components


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 3/3] Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server

2016-03-22 Thread George Dunlap

On 16/03/16 12:22, Yu Zhang wrote:
> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to
> let one ioreq server claim its responsibility for the handling
> of guest pages with p2m type p2m_ioreq_server. Users of this
> HVMOP can specify whether the p2m_ioreq_server is supposed to
> handle write accesses or read ones in a flag. By now, we only
> support one ioreq server for this p2m type, so once an ioreq
> server has claimed its ownership, following calls of the
> HVMOP_map_mem_type_to_ioreq_server will fail.
> 
> Note: this HVMOP does not change the p2m type of any guest ram
> page, until the HVMOP_set_mem_type is triggered. So normally
> the steps would be the backend driver first claims its ownership
> of guest ram pages with p2m_ioreq_server type. At then sets the
> memory type to p2m_ioreq_server for specified guest ram pages.

Yu, thanks for this work.  I think it's heading in the right direction.

A couple of comments:

There's not much documentation in the code about how this is expected to
be used.

For instance, having separate flags seems to imply that you can
independently select either read intercept, write intercept, or both;
but [ept_]p2m_type_to_flags() seems to assume that READ_ACCESS implies
WRITE_ACCESS.  Do you plan to implement them separately in the future?
If not, would it be better to make the interface an enum instead?

At very least it should be documented that READ_ACCESS implies WRITE_ACCESS.

Calling the function with no flags appears to mean, "Detach the ioreq
server from this type" -- this should also be documented.

Furthermore, you appear to deal with p2m_ioreq_server types when there
is no ioreq server by making the "flags=0" case RW in
[ept_]p2m_type_to_flags.  This should be spelled out somewhere (probably
in the p2m structure definition).

Finally, you call p2m_memory_type_changed() when changing the ioreq
server; but this appears to be only implemented for ept; meaning that if
you're either running HAP on AMD or running in shadow mode, if the
caller isn't careful (i.e., doesn't remove all p2m_ioreq_server types
before removing itself as a server), they may end up with bogus
permissions left over in the p2m table.

Maybe you should check for the existence of p2m->memory_type_changed and
return -ENOTSUPP or something if it's not present?

More comments inline...

> Signed-off-by: Paul Durrant 
> Signed-off-by: Yu Zhang 

> @@ -168,13 +226,65 @@ static int hvmemul_do_io(
>  break;
>  case X86EMUL_UNHANDLEABLE:
>  {
> -struct hvm_ioreq_server *s =
> -hvm_select_ioreq_server(curr->domain, );
> +struct hvm_ioreq_server *s;
> +p2m_type_t p2mt;
> +
> +if ( is_mmio )
> +{
> +unsigned long gmfn = paddr_to_pfn(addr);
> +
> +(void) get_gfn_query_unlocked(currd, gmfn, );
> +
> +switch ( p2mt )
> +{
> +case p2m_ioreq_server:
> +{
> +unsigned long flags;
> +
> +p2m_get_ioreq_server(currd, p2mt, , );
> +
> +if ( !s )
> +break;
> +
> +if ( (dir == IOREQ_READ &&
> +  !(flags & P2M_IOREQ_HANDLE_READ_ACCESS)) ||
> + (dir == IOREQ_WRITE &&
> +  !(flags & P2M_IOREQ_HANDLE_WRITE_ACCESS)) )
> +s = NULL;
> +
> +break;
> +}
> +case p2m_ram_rw:
> +s = NULL;
> +break;
> +
> +default:
> +s = hvm_select_ioreq_server(currd, );
> +break;
> +}
> +}
> +else
> +{
> +p2mt = p2m_invalid;
> +
> +s = hvm_select_ioreq_server(currd, );
> +}
>  
>  /* If there is no suitable backing DM, just ignore accesses */
>  if ( !s )
>  {
> -rc = hvm_process_io_intercept(_handler, );
> +switch ( p2mt )
> +{
> +case p2m_ioreq_server:
> +case p2m_ram_rw:
> +rc = hvm_process_io_intercept(_handler, );
> +break;
> +
> +default:
> +rc = hvm_process_io_intercept(_handler, );
> +break;
> +}

Is it actually possible to get here with "is_mmio" true and p2mt ==
p2m_ram_rw?

If so, it would appear that this changes the handling in that case.
Before this patch, on p2m_ram_rw, you'd call hvm_select_ioreq_server(),
which would either give you a server (perhaps the default one), or you'd
be called with null_handler.

After this patch, on p2m_ram_rw, you'll always be called with mem_handler.

If this is an intentional change, you need to explan why in the
changelog (and probably also a comment here).

> @@ -1411,6 +1413,55 @@ static int

Re: [Xen-devel] [PATCH v4 16/34] xsplice: Implement payload loading

2016-03-22 Thread Konrad Rzeszutek Wilk

.. snip..
> +static void* xsplice_map_rx(const mfn_t *mfn, unsigned int pages)
> +{
> +unsigned long cur;
> +unsigned long start, end;
> +
> +start = (unsigned long)avail_virt_start;
> +end = start + pages * PAGE_SIZE;
> +
> +ASSERT(find_space_fnc);
> +
> +if ( (find_space_fnc)(pages, , ) )
> +return NULL;
> +
> +if ( end >= avail_virt_end )
> +return NULL;
> +
> +for ( cur = start; pages--; ++mfn, cur += PAGE_SIZE )
> +{
> +/*
> + * We would like to to RX, but we need to copy data in it first.
> + * See arch_xsplice_secure for how we lockdown.
> + */
> +if ( map_pages_to_xen(start, mfn_x(*mfn), 1, PAGE_HYPERVISOR_RWX) )

s/start/cur/
That sneaked in as I can see earlier versions having the right iterator.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] libxc/x86: XSAVE related adjustments

2016-03-22 Thread Andrew Cooper

On 22/03/16 14:46, Jan Beulich wrote:
 On 22.03.16 at 14:48,  wrote:
>>> @@ -300,9 +304,9 @@ static void xc_cpuid_config_xsave(xc_int
>>>  {
>>>  case 0: 
>>>  /* EAX: low 32bits of xfeature_enabled_mask */
>>> -regs[0] = info->xfeature_mask & 0x;
>>> +regs[0] &= info->xfeature_mask;
>>>  /* EDX: high 32bits of xfeature_enabled_mask */
>>> -regs[3] = (info->xfeature_mask >> 32) & 0x;
>>> +regs[3] &= info->xfeature_mask >> 32;
>>>  /* ECX: max size required by all HW features */
>>>  {
>>>  unsigned int _input[2] = {0xd, 0x0}, _regs[4];
>> This is an improvement on the code currently present, but is still
>> superseded by the final patch of my cpuid series.
> Is it? I did check your tree before sending, and you do only
> mechanical adjustments. In particular you don't switch to
> &= and you don't drop the pointless and-ing with 0x.

Using &= is specifically wrong and buggy.  My patch replaces
info->xfeature_mask with guest_xfeature_mask, which itself is calculated
from the guest feature availability.

The value in regs[] is dom0's view of the cpuid leaves, and are
inappropriate to be combined to make the guests view.  Observe that I
have specifically been replacing masks with assignments.

Consider the (admittedly contrived scenario of) dom0 being denied access
to xsave, while domU is intended to have access.  A less contrived
scenario is a 32bit dom0 trying to construct a 64bit PV guest.  It only
worked previously because dom0 used native cpuid which bypassed Xen
hiding the LM bit.

>
>>> @@ -325,16 +329,20 @@ static void xc_cpuid_config_xsave(xc_int
>> Between these two hunks, there is a loop bound which is also wrong.
> But seeing that your patches fix it I didn't bother stealing the fix
> from your patches.
>
>>>  if ( !info->hvm )
>>>  regs[0] &= ~XSAVES;
>>>  regs[2] &= info->xfeature_mask;
>>> -regs[3] = 0;
>>> +regs[3] &= info->xfeature_mask >> 32;
>>>  break;
>>> -case 2 ... 63: /* sub-leaves */
>>> +case 2 ... 62: /* per-component sub-leaves */
>>>  if ( !(info->xfeature_mask & (1ULL << input[1])) )
>> Now I think about it, this check is incomplete.  xfeature_mask doesn't
>> contain xss values.
> For now the XSS bitmask is blank. Looking at everything together I
> do think though that once it becomes non-zero, info->xfeature_mask
> will need to become the OR of both masks.
>
>>>  {
>>>  regs[0] = regs[1] = regs[2] = regs[3] = 0;
>>>  break;
>>>  }
>>>  /* Don't touch EAX, EBX. Also cleanup ECX and EDX */
>>> -regs[2] = regs[3] = 0;
>>> +regs[2] &= XSTATE_XSS | XSTATE_ALIGN64;
>>> +regs[3] = 0;
>>> +break;
>>> +default:
>>> +regs[0] = regs[1] = regs[2] = regs[3] = 0;
>>>  break;
>> If you wish, I can fold this patch into the final patch of my cpuid series.
> I'd be fine with that, albeit (as said in the submission) the changes
> are independent of one another despite them causing conflicts.

It would be clearer than having two different patches both fixing part
of the code.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 15/28] x86/cpu: Sysctl and common infrastructure for levelling context switching

2016-03-22 Thread Jan Beulich

>>> On 22.03.16 at 16:57,  wrote:
> On 21/03/16 16:23, Jan Beulich wrote:
> On 15.03.16 at 16:35,  wrote:
>>> --- a/xen/arch/x86/cpu/common.c
>>> +++ b/xen/arch/x86/cpu/common.c
>>> @@ -36,6 +36,12 @@ integer_param("cpuid_mask_ext_ecx", 
>>> opt_cpuid_mask_ext_ecx);
>>>  unsigned int opt_cpuid_mask_ext_edx = ~0u;
>>>  integer_param("cpuid_mask_ext_edx", opt_cpuid_mask_ext_edx);
>>>  
>>> +unsigned int __initdata expected_levelling_cap;
>> Especially for an __initdata item to be non-static, its use(s) should
>> be visible in a patch adding such a variable.
> 
> From the commit message:
>> This interface will currently report no capabilities.  This change is
>> scaffolding for future patches, which will introduce detection and switching
>> logic, after which the interface will report hardware capabilities 
> correctly.
> 
> This is specifically to reduce the complexity of the following patches. 
> How much more clearly do you want this stating?

The problem isn't that this wasn't stated, but that (as said) namely
for __initdata objects, which put restrictions on where use of them
is valid, introducing them in one patch and adding uses only later is
difficult to review. Quite a bit more difficult than if such variables
got added only once needed (which "bloats" the affected patch by
just a handful of lines at most).

But anyway, I've got through it, and I'm not insisting on this needing
to change. I merely wanted to point out that while there are cases
where it's reasonable to add unused objects or functions to make
later stuff easier to review, I don't think this is such a case (at least
not as far as the variables here are concerned).

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 6/6] x86/pat: Document PAT initializations

2016-03-22 Thread Borislav Petkov

On Wed, Mar 16, 2016 at 06:46:59PM -0600, Toshi Kani wrote:
> Update PAT documentation to describe how PAT is initialized under
> various configurations.
> 
> Signed-off-by: Toshi Kani 
> Cc: Borislav Petkov 
> Cc: Luis R. Rodriguez 
> Cc: Juergen Gross 
> Cc: Ingo Molnar 
> Cc: H. Peter Anvin 
> Cc: Thomas Gleixner 
> ---
>  Documentation/x86/pat.txt |   32 
>  1 file changed, 32 insertions(+)
> 
> diff --git a/Documentation/x86/pat.txt b/Documentation/x86/pat.txt
> index 54944c7..f619e1d 100644
> --- a/Documentation/x86/pat.txt
> +++ b/Documentation/x86/pat.txt
> @@ -196,3 +196,35 @@ Another, more verbose way of getting PAT related debug 
> messages is with
>  "debugpat" boot parameter. With this parameter, various debug messages are
>  printed to dmesg log.
>  
> +PAT Initialization
> +--
> +
> +The following table describes how PAT is initialized under various
> +configurations. PAT must be set to enable to initialize PAT MSR in order

Err "PAT MSR must be updated by Linux in order to support WC and WT" ... or so?

> +to support WC and WT attributes. Otherwise, PAT keeps PAT MSR value set
> +by BIOS.

"Otherwise, the PAT MSR has the value programmed in it by the firmware."

> Note, Xen enables WC attribute in BIOS setup for guests.
> +
> + MTRR PAT   Call Sequence   PAT State  PAT MSR
> + =
> + EE MTRR -> pat_init()  Enable OS

s/Enable/Enabled/

MTRR->pat_init() - either use function names for both or do pseudo like
so:

MTRR init -> PAT init

> + ED MTRR -> pat_init()  Disable-

s/Disable/Disabled/. Ditto for the rest.

> + DE MTRR -> pat_disable()   DisableBIOS
> + DD MTRR -> pat_disable()   Disable-
> + -np/E  nopat() -> pat_disable()DisableBIOS
> + -np/D  nopat() -> pat_disable()Disable-
> + E!P/E  MTRR -> pat_init()  DisableBIOS
> + D!P/E  MTRR -> pat_disable()   DisableBIOS
> + !M   !P/E  MTRR stub -> pat_disable()  DisableBIOS
> +
> + Legend
> + 
> + EFeature enabled in CPU
> + D Feature disabled/unsupported in CPU
> + np"nopat" boot option specified
> + !PCONFIG_X86_PAT option unset
> + !MCONFIG_MTRR option unset
> + Enable   PAT state set to enable
> + Disable  PAT state set to disable
> + OS   PAT initializes PAT MSR with OS setup
> + BIOS PAT keeps PAT MSR with BIOS setup
> +

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 5/6] x86/xen, pat: Remove PAT table init code from Xen

2016-03-22 Thread Borislav Petkov

On Wed, Mar 16, 2016 at 06:46:58PM -0600, Toshi Kani wrote:
> Xen supports PAT without MTRR for its guests.  In order to
> enable WC attribute, it was necessary for xen_start_kernel()
> to call pat_init_cache_modes() to update PAT table before
> starting guest kernel.
> 
> Now that the kernel initializes PAT table to the BIOS handoff
> state when MTRR is disabled, this Xen-specific PAT init code
> is no longer necessary.  Delete it from xen_start_kernel().
> 
> Also change pat_init_cache_modes() to a static function since
> PAT table should not be tweaked by other modules.
> 
> Signed-off-by: Toshi Kani 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Borislav Petkov 
> Cc: Luis R. Rodriguez 
> Cc: Juergen Gross 
> Cc: Ingo Molnar 
> Cc: H. Peter Anvin 
> Cc: Thomas Gleixner 
> ---
>  arch/x86/include/asm/pat.h |1 -
>  arch/x86/mm/pat.c  |2 +-
>  arch/x86/xen/enlighten.c   |9 -
>  3 files changed, 1 insertion(+), 11 deletions(-)

Jürgen, ack?

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 4/6] x86/mtrr: Fix PAT init handling when MTRR MSR is disabled

2016-03-22 Thread Borislav Petkov

Subject: [PATCH v2 4/6] x86/mtrr: Fix PAT init handling when MTRR MSR is 
disabled

s/ MSR//

On Wed, Mar 16, 2016 at 06:46:57PM -0600, Toshi Kani wrote:
> get_mtrr_state() calls pat_init() on BSP even if MTRR is disabled
> by its MSR.

s/by its MSR//

> This causes pat_init() to be called on BSP only since APs do not call

This doesn't cause that - get_mtrr_state() is called only on the BSP by
mtrr_bp_init().

> pat_init() when MTRR is disabled. This inconsistency between BSP and
> APs leads undefined behavior.

  leads to


> Move BSP's PAT init code from get_mtrr_state() to mtrr_bp_pat_init().
> Change mtrr_bp_init() to call mtrr_bp_pat_init() if MTRR is enabled.

No need for those.

> This keeps BSP's calling condition to pat_init() consistent with AP's,
> mtrr_ap_init() and mtrr_aps_init().

This one is fine.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH] xenconsole: update help message

2016-03-22 Thread Doug Goldstein

The help message did not include information about the --type parameter.

Signed-off-by: Doug Goldstein 
---
CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Wei Liu 
---
 tools/console/client/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/console/client/main.c b/tools/console/client/main.c
index d006fdc..f660e10 100644
--- a/tools/console/client/main.c
+++ b/tools/console/client/main.c
@@ -76,6 +76,7 @@ static void usage(const char *program) {
   "\n"
   "  -h, --help   display this help and exit\n"
   "  -n, --num N  use console number N\n"
+  "  -t, --type TYPE  console type. must be 'pv' or 'serial'\n"
   , program);
 }
 
-- 
2.7.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 3/6] x86/mtrr: Fix Xorg crashes in Qemu sessions

2016-03-22 Thread Borislav Petkov

On Wed, Mar 16, 2016 at 06:46:56PM -0600, Toshi Kani wrote:
> A Xorg failure on qemu32 was reported as a regression caused
> by 'commit 9cd25aac1f44 ("x86/mm/pat: Emulate PAT when it is
> disabled")'. [1]  This patch fixes the regression.

I hope so.

> Negative effects of this regression were two failures in Xorg
> on qemu32 env, which were triggered by the fact that its virtual

"... with QEMU CPU model "qemu32" (-cpu qemu32) ... "

> CPU does not support MTRR. [2]
> 
>  #1. copy_process() failed in the check in reserve_pfn_range()
> 
> copy_process
>  copy_mm
>   dup_mm
>dup_mmap
> copy_page_range
>  track_pfn_copy
>   reserve_pfn_range

Here's where you describe why it failed the check and which check.

> 
>  #2. error path in copy_process() then hit WARN_ON_ONCE in
>  untrack_pfn().
> 
>  x86/PAT: Xorg:509 map pfn expected mapping type uncached-
>  minus for [mem 0xfd00-0xfdff], got write-combining
>   Call Trace:
>  dump_stack+0x58/0x79
>  warn_slowpath_common+0x8b/0xc0
>  ? untrack_pfn+0x9f/0xb0
>  ? untrack_pfn+0x9f/0xb0
>  warn_slowpath_null+0x22/0x30
>  untrack_pfn+0x9f/0xb0
>  ? __kunmap_atomic+0x54/0x110
>  unmap_single_vma+0x56f/0x580
>  ? pagevec_move_tail_fn+0xa0/0xa0
>  unmap_vmas+0x43/0x60
>  exit_mmap+0x5f/0xf0
>  mmput+0x2d/0xa0
>  copy_process.part.47+0x1229/0x1430
>  _do_fork+0xb4/0x3b0
>  SyS_clone+0x2c/0x30
>  do_syscall_32_irqs_on+0x54/0xb0
>  entry_INT80_32+0x2a/0x2a

You can delete the offsets after the "+" - they're useless.

> These negative effects are caused by two separate bugs, but they
> can be dealt in lower priority.

??

> Fixing the pat_init() issue below
> avoids Xorg to hit these cases.
> 
> When the CPU does not support MTRR, MTRR does not call pat_init(),
> which leaves PAT enabled without initializing PAT.  This pat_init()
> issue is a long-standing issue, but manifested as issue #1 (and then
> hit issue #2) with the commit

commit 9cd25aac1f44 ?

> because the memtype now tracks cache
> attribute with 'page_cache_mode'.  A WC map request is tracked as WC
> in memtype, but sets a PTE as UC (pgprot) per __cachemode2pte_tbl[].
> This caused the error in reserve_pfn_range() when it was called from
> track_pfn_copy(), which obtained pgprot from a PTE.  It converts
> pgprot to page_cache_mode, which does not necessarily result in
> the original page_cache_mode since __cachemode2pte_tbl[] redirects
> multiple types to UC.  This is a separate issue in reserve_pfn_range().

Good.

> This pat_init() issue existed before the commit, but we used pgprot
> in memtype.  Hence, we did not have issue #1 before.  But WC request
> resulted in WT in effect because WC pgrot is actually WT when PAT
> is not initialized.  This is not how it was designed to work.  When
> PAT is set to disable properly, WC is converted to UC.  The use of
> WT can result in a system crash if the target range does not support
> WT.  Fortunately, nobody ran into such issue before.
> 
> To fix this pat_init() issue, PAT code has been enhanced to provide
> pat_disable() interface, which disables the OS to initialize PAT MSR,

... prevents the OS from initializing the PAT 
MSR.

> and sets PAT table to the BIOS handoff state.

> This patch changes
> MTRR code to call pat_disable() when MTRR is disabled as PAT cannot
> be initialized in this case.  This sets PAT to disable properly, and
> makes PAT code to bypass the memtype check.  This avoids issue #1
> (which can be dealt in lower priority).

You don't need all that text from "This patch ..." on - we can see that
in the diff. The commit message needs to contain "why" not "what".

> 
> [1]: https://lkml.org/lkml/2016/3/3/828
> [2]: https://lkml.org/lkml/2016/3/4/775

I believe I already mentioned that links should be supplied like this:

Link: http://lkml.kernel.org/r/

> Signed-off-by: Toshi Kani 
> Cc: Borislav Petkov 
> Cc: Luis R. Rodriguez 
> Cc: Juergen Gross 
> Cc: Ingo Molnar 
> Cc: H. Peter Anvin 
> Cc: Thomas Gleixner 
> ---
>  arch/x86/include/asm/mtrr.h |6 +-
>  arch/x86/kernel/cpu/mtrr/main.c |   10 +-
>  2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
> index b94f6f6..634c593 100644
> --- a/arch/x86/include/asm/mtrr.h
> +++ b/arch/x86/include/asm/mtrr.h
> @@ -24,6 +24,7 @@
>  #define _ASM_X86_MTRR_H
>  
>  #include 
> +#include 
>  
>  
>  /*
> @@ -83,9 +84,12 @@ static inline int mtrr_trim_uncached_memory(unsigned long 
> end_pfn)
>  static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
>  {
>  }
> +static inline void mtrr_bp_init(void)
> +{
> + pat_disable("Skip PAT initialization");

Make that more user-friendly:

   "MTRRs

Re: [Xen-devel] [PATCH v2 2/6] x86/mm/pat: Add pat_disable() interface

2016-03-22 Thread Borislav Petkov

On Wed, Mar 16, 2016 at 06:46:55PM -0600, Toshi Kani wrote:
> In preparation to fix a regression caused by 'commit 9cd25aac1f44
> ("x86/mm/pat: Emulate PAT when it is disabled")', PAT needs to
> provide an interface that disables the OS to initialize PAT MSR.

prevents the OS from initializing the PAT MSR.

> 
> PAT MSR initialization must be done on all CPUs with the specific

s/with/using/

> sequence of operations defined in Intel SDM.  This requires MTRR
   ^
  the

s/MTRR/MTRRs/

> to be enabled since pat_init() is called as part of MTRR init
> from mtrr_rendezvous_handler().
> 
> Change pat_disable() as the interface to disable the OS to initialize
> PAT MSR, and set PAT table with pat_keep_handoff_state().  This
> interface can be called when PAT initialization may not be performed.

This paragraph reads funky and I can't really parse what it is trying to
say.

> This also assures that pat_disable() called from pat_bsp_init()
> to set PAT table properly when CPU does not support PAT.
> 
> Signed-off-by: Toshi Kani 
> Cc: Borislav Petkov 
> Cc: Luis R. Rodriguez 
> Cc: Juergen Gross 
> Cc: Robert Elliott 
> Cc: Ingo Molnar 
> Cc: H. Peter Anvin 
> Cc: Thomas Gleixner 
> ---
>  arch/x86/include/asm/pat.h |1 +
>  arch/x86/mm/pat.c  |   21 ++---
>  2 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pat.h b/arch/x86/include/asm/pat.h
> index ca6c228..016142b 100644
> --- a/arch/x86/include/asm/pat.h
> +++ b/arch/x86/include/asm/pat.h
> @@ -5,6 +5,7 @@
>  #include 
>  
>  bool pat_enabled(void);
> +void pat_disable(const char *reason);
>  extern void pat_init(void);
>  void pat_init_cache_modes(u64);
>  
> diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
> index e0a34b0..48d1619 100644
> --- a/arch/x86/mm/pat.c
> +++ b/arch/x86/mm/pat.c
> @@ -40,11 +40,26 @@
>  static bool boot_cpu_done;
>  
>  static int __read_mostly __pat_enabled = IS_ENABLED(CONFIG_X86_PAT);
> +static void pat_keep_handoff_state(void);
>  
> -static inline void pat_disable(const char *reason)
> +/**
> + * pat_disable() - Disable the OS to initialize PAT MSR



Err, what? The function name can't be more clear.

> + *
> + * This function disables the OS to initialize PAT MSR, and calls

"prevents the OS from initializing the PAT MSR..."

> + * pat_keep_handoff_state() to set PAT table to the handoff state.

We can see what is calls. You're explaining *what* the code does instead
of *why* again.

> + */
> +void pat_disable(const char *reason)
>  {

Why aren't you checking __pat_enabled here?

if (!__pat_enabled)
return;

You can save yourself the other guards in that function, especially that
pr_err() below.

> + if (boot_cpu_done) {
> + pr_err("x86/PAT: PAT cannot be disabled after initialization "
> +"(attempting: %s)\n", reason);

Please integrate checkpatch.pl into your patch creation workflow as it
sometimes has valid complaints:

WARNING: quoted string split across lines
#79: FILE: arch/x86/mm/pat.c:55:
+   pr_err("x86/PAT: PAT cannot be disabled after initialization "
+  "(attempting: %s)\n", reason);

More to the point: why do we need that pr_err() call? What is that
supposed to tell the user?

I think it is more for the programmer to catch wrong use of
pat_disable() and then it should be WARN_ONCE() or so...

> + return;
> + }
> +
>   __pat_enabled = 0;
>   pr_info("x86/PAT: %s\n", reason);
> +
> + pat_keep_handoff_state();
>  }
>  
>  static int __init nopat(char *str)
> @@ -202,7 +217,7 @@ static void pat_bsp_init(u64 pat)
>  {
>   u64 tmp_pat;
>  
> - if (!cpu_has_pat) {
> + if (!boot_cpu_has(X86_FEATURE_PAT)) {
>   pat_disable("PAT not supported by CPU.");
>   return;
>   }
> @@ -220,7 +235,7 @@ static void pat_bsp_init(u64 pat)
>  
>  static void pat_ap_init(u64 pat)
>  {
> - if (!cpu_has_pat) {
> + if (!boot_cpu_has(X86_FEATURE_PAT)) {
>   /*
>* If this happens we are on a secondary CPU, but switched to
>* PAT on the boot CPU. We have no way to undo PAT.

Those last two hunks are unrelated changes and should be a separate
patch.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] help .. xl :command not found

2016-03-22 Thread Marwa Hamza

i builded xen from the source
git clone git://xenbits.xen.org/xen.git
then i used linux as a dom0
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
with ubuntu as file system

it worked successfully .. i get a shell but when i write sudo xl list --->
xl command not found
i'm wondering why .. is there something i have missed ..
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 20/28] x86/domctl: Update PV domain cpumasks when setting cpuid policy

2016-03-22 Thread Andrew Cooper

On 21/03/16 17:06, Jan Beulich wrote:
 On 15.03.16 at 16:35,  wrote:
>> @@ -87,6 +88,129 @@ static void update_domain_cpuid_info(struct domain *d,
>>  d->arch.x86_model = (ctl->eax >> 4) & 0xf;
>>  if ( d->arch.x86 >= 0x6 )
>>  d->arch.x86_model |= (ctl->eax >> 12) & 0xf0;
>> +
>> +if ( is_pv_domain(d) && ((levelling_caps & LCAP_1cd) == LCAP_1cd) )
>> +{
>> +uint64_t mask = cpuidmask_defaults._1cd;
>> +uint32_t ecx = ctl->ecx & pv_featureset[FEATURESET_1c];
>> +uint32_t edx = ctl->edx & pv_featureset[FEATURESET_1d];
>> +
>> +/*
>> + * Must expose hosts HTT value so a guest using native CPU can
> DYM "native CPUID" here?

Yes.

>
>> + * correctly interpret other leaves which cannot be masked.
>> + */
>> +edx &= ~cpufeat_mask(X86_FEATURE_HTT);
>> +if ( cpu_has_htt )
>> +edx |= cpufeat_mask(X86_FEATURE_HTT);
>> +
>> +switch ( boot_cpu_data.x86_vendor )
>> +{
>> +case X86_VENDOR_INTEL:
>> +mask &= ((uint64_t)edx << 32) | ecx;
>> +break;
>> +
>> +case X86_VENDOR_AMD:
>> +mask &= ((uint64_t)ecx << 32) | edx;
>> +
>> +/* Fast-forward bits - Must be set. */
>> +if (ecx & cpufeat_mask(X86_FEATURE_XSAVE))
> Missing blanks inside parens. Also I think the comment should be
> expanded to include the fact that the need to do this here but
> not in the Intel case was empirically(?) determined

It was empirically determined that AMD did have to behave like this.

>  - just in case
> something isn't quite right with this on some future hardware,
> and readers (possibly including ourselves) wonder.

Intel and AMD masking MSRs are fundamentally different, and function
differently.

Intel MSRs are documented strictly an & mask, and experimentally, are
applied before OSXSAVE/APIC are fast-forwarded from hardware.

AMD MSRs are documented strictly as an override, with a caution to avoid
setting bits which aren't actually supported in the hardware, and
experimentally need the fast-forward bits set if fast-forwarding is to
occur.

I suppose I could split this up and move this logic into
cpu/{amd,intel}.c as appropriate, and call a vendor specific function to
appropriately translate a guest cpuid value into a mask value.  However,
a lot of common logic would end up needing to be duplicated.

>
>> +case X86_VENDOR_AMD:
>> +/*
>> + * Must expose hosts CMP_LEGACY value so a guest using 
>> native
>> + * CPU can correctly interpret other leaves which cannot be
> CPUID again?

Yes

>
>> + * masked.
>> + */
>> +ecx &= ~cpufeat_mask(X86_FEATURE_CMP_LEGACY);
>> +if ( cpu_has_cmp_legacy )
>> +ecx |= cpufeat_mask(X86_FEATURE_CMP_LEGACY);
>> +
>> +mask &= ((uint64_t)ecx << 32) | edx;
>> +
>> +/* Fast-forward bits - Must be set. */
>> +ecx = 0;
>> +edx = cpufeat_mask(X86_FEATURE_APIC);
>> +
>> +mask |= ((uint64_t)ecx << 32) | edx;
> This certainly looks strange (as in more complicated than it needs to
> be) - why not just mask |= cpufeat_mask(X86_FEATURE_APIC)?

Copy and paste from the leaf 1 case.  I specifically wanted to avoid
directly manipulating the mask value, as it is already confusing enough
which way ecx and edx are.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 20/28] x86/domctl: Update PV domain cpumasks when setting cpuid policy

2016-03-22 Thread Jan Beulich

>>> On 22.03.16 at 17:37,  wrote:
> On 21/03/16 17:06, Jan Beulich wrote:
> On 15.03.16 at 16:35,  wrote:
>>> +switch ( boot_cpu_data.x86_vendor )
>>> +{
>>> +case X86_VENDOR_INTEL:
>>> +mask &= ((uint64_t)edx << 32) | ecx;
>>> +break;
>>> +
>>> +case X86_VENDOR_AMD:
>>> +mask &= ((uint64_t)ecx << 32) | edx;
>>> +
>>> +/* Fast-forward bits - Must be set. */
>>> +if (ecx & cpufeat_mask(X86_FEATURE_XSAVE))
>> Missing blanks inside parens. Also I think the comment should be
>> expanded to include the fact that the need to do this here but
>> not in the Intel case was empirically(?) determined
> 
> It was empirically determined that AMD did have to behave like this.
> 
>>  - just in case
>> something isn't quite right with this on some future hardware,
>> and readers (possibly including ourselves) wonder.
> 
> Intel and AMD masking MSRs are fundamentally different, and function
> differently.
> 
> Intel MSRs are documented strictly an & mask, and experimentally, are
> applied before OSXSAVE/APIC are fast-forwarded from hardware.
> 
> AMD MSRs are documented strictly as an override, with a caution to avoid
> setting bits which aren't actually supported in the hardware, and
> experimentally need the fast-forward bits set if fast-forwarding is to
> occur.
> 
> I suppose I could split this up and move this logic into
> cpu/{amd,intel}.c as appropriate, and call a vendor specific function to
> appropriately translate a guest cpuid value into a mask value.  However,
> a lot of common logic would end up needing to be duplicated.

I'm certainly against code duplication. And just to re-iterate - all
I have been asking for was to write some of what you've said in
a source comment.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC PATCH] blkif.h: document scsi/0x12/0x83 node

2016-03-22 Thread Konrad Rzeszutek Wilk

On Tue, Mar 22, 2016 at 04:14:40PM +, Ian Jackson wrote:
> Paul Durrant writes ("RE: [Xen-devel] [RFC PATCH] blkif.h: document 
> scsi/0x12/0x83 node"):
> > It's getting hard to parse the thread at this point but, as I've
> > mentioned in a previous response in the thread, Windows basically
> > assumes disks are SCSI and it's up to the controller driver to make
> > it look that way.
> 
> Right.  That makes perfect sense.
> 
> > To that end the XENVBD Windows PV driver synthesizes VPD pages 80
> > and 83 but also have the ability pull base64 encoded VPD data from
> > xenstore. The synthesis is required to make Windows work properly
> 
> Right.
> 
> > but the reason for overriding it with data from xenstore is not
> > apparent.
> 
> Thanks, this is the part that we are struggling with and which it is
> proposed to now document.
> 
> > In XenServer the storage backend code does populate the
> > VPD information (with UUID information for the storage volume
> > created by XAPI) but I don't believe that this is necessary
> > behaviour in the normal case.
> 
> So the VPD here is a UUID from Xapi ?  Not anything do with the
> underlying storage ?

For this to be 'correct' the SCSI INQ binary result data (which is
what the base64 would encode) would need to follow the SCSI op codes.

There are bits in it that MUST be defined to make any sense.
The serial number can be an UUID or such.
> 
> I think that is a plausible thing for Xapi to do.  Although I don't
> fully understand why it would be necessary, it doesn't seem
> implausible that there are guest OS's which look for and do something
> useful with such data (because they could use it, for example, to
> reliably identify a storage volume other than by reading its
> contents).
> 
> > My *assumption* is that, at some point
> > in the past, XenServer had OEM specific storage backends and the
> > requirement to run OEM s/w in guest which relied on VPD supplied by
> > the storage, and as is commonly the case xenstore was the easiest
> > way to get this information from the backend and into the guest.
> 
> Right, but the VPD information would be Xapi-generated and generic,
> rather than containing OEM information from the disk ?
> 
> Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC PATCH] blkif.h: document scsi/0x12/0x83 node

2016-03-22 Thread Konrad Rzeszutek Wilk

On Tue, Mar 22, 2016 at 04:11:35PM +, Ian Jackson wrote:
> Konrad Rzeszutek Wilk writes ("Re: [Xen-devel] [RFC PATCH] blkif.h: document 
> scsi/0x12/0x83 node"):
> > > I don't think this is a sterile academic conversation which would (if
> > > satisfactorily answered) have no real implications.  Rather, if we
> > > understood the use case properly, there would probably be implications
> > > for whether how this functionality ought to be exposed via the libxl
> > > API; whether blkfront should consume this information by default; and
> > > so on.
> > 
> > Lets take one step back.
> > 
> > What is the problem with documenting an existing use-case in blkif.h?
> > 
> > Regardless of whether it is viewed "legacy" by some folks and by others
> > drivers it is still in use?
> 
> Firstly, if you want to document it then the documentation needs to
> explain what the semantics are supposed to be.  At the moment I'm not
> sure I see a correct and useful set of semantics.  I had a go at
> writing something but several of the things I wrote in it seem to have
> been contradicted by comments from Bob Liu and others.
> 
> It is difficult to write a suitable document.  Ideally the document
> would give an example use case.  But it appears that the only use case
> is super secret!

:-)

> 
> Secondly, apparently this is not regarded as deprecated.  Bob Liu
> says he plans to add support for it to blkfront.

One step at a time. First lets document this XenStore key in blkif.h
so it is know what it does and the semantics of it.

> 
> Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC PATCH] blkif.h: document scsi/0x12/0x83 node

2016-03-22 Thread Jan Beulich

>>> On 22.03.16 at 16:27,  wrote:
> On Tue, Mar 22, 2016 at 03:12:02PM +, Ian Jackson wrote:
>> David Vrabel writes ("Re: [Xen-devel] [RFC PATCH] blkif.h: document 
> scsi/0x12/0x83 node"):
>> > On 22/03/16 14:10, Konrad Rzeszutek Wilk wrote:
>> > > Just think of it as a black box.
>> > 
>> > This isn't sufficient.
>> > 
>> > You are presenting a solution but have not properly described the
>> > problem so no one can evaluate whether the solution is appropriate.
>> 
>> I'm sorry to be awkward, but I'm afraid I do agree with David Vrabel's
>> messages in this thread.
>> 
>> I don't think this is a sterile academic conversation which would (if
>> satisfactorily answered) have no real implications.  Rather, if we
>> understood the use case properly, there would probably be implications
>> for whether how this functionality ought to be exposed via the libxl
>> API; whether blkfront should consume this information by default; and
>> so on.
> 
> Lets take one step back.
> 
> What is the problem with documenting an existing use-case in blkif.h?

Namely taking into account Paul's almost simultaneous reply I'm
in doubt whether this is a use case or an abuse of some readily
available mechanism.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC PATCH] blkif.h: document scsi/0x12/0x83 node

2016-03-22 Thread Ian Jackson

Paul Durrant writes ("RE: [Xen-devel] [RFC PATCH] blkif.h: document 
scsi/0x12/0x83 node"):
> It's getting hard to parse the thread at this point but, as I've
> mentioned in a previous response in the thread, Windows basically
> assumes disks are SCSI and it's up to the controller driver to make
> it look that way.

Right.  That makes perfect sense.

> To that end the XENVBD Windows PV driver synthesizes VPD pages 80
> and 83 but also have the ability pull base64 encoded VPD data from
> xenstore. The synthesis is required to make Windows work properly

Right.

> but the reason for overriding it with data from xenstore is not
> apparent.

Thanks, this is the part that we are struggling with and which it is
proposed to now document.

> In XenServer the storage backend code does populate the
> VPD information (with UUID information for the storage volume
> created by XAPI) but I don't believe that this is necessary
> behaviour in the normal case.

So the VPD here is a UUID from Xapi ?  Not anything do with the
underlying storage ?

I think that is a plausible thing for Xapi to do.  Although I don't
fully understand why it would be necessary, it doesn't seem
implausible that there are guest OS's which look for and do something
useful with such data (because they could use it, for example, to
reliably identify a storage volume other than by reading its
contents).

> My *assumption* is that, at some point
> in the past, XenServer had OEM specific storage backends and the
> requirement to run OEM s/w in guest which relied on VPD supplied by
> the storage, and as is commonly the case xenstore was the easiest
> way to get this information from the backend and into the guest.

Right, but the VPD information would be Xapi-generated and generic,
rather than containing OEM information from the disk ?

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v6 13/22] arm/acpi: Map the new created EFI and ACPI tables to Dom0

2016-03-22 Thread Julien Grall


Hi Shannon,

On 22/03/16 13:18, Shannon Zhao wrote:

On 2016年03月22日 08:42, Julien Grall wrote:

Hi Shannon,

On 17/03/2016 09:40, Shannon Zhao wrote:

From: Shannon Zhao 

Map the UEFI and ACPI tables which we created to non-RAM space in Dom0.

Signed-off-by: Shannon Zhao 
Reviewed-by: Stefano Stabellini 
---
   xen/arch/arm/domain_build.c | 15 +++
   1 file changed, 15 insertions(+)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 008fc76..e036887 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1691,6 +1691,21 @@ static int prepare_acpi(struct domain *d,
struct kernel_info *kinfo)
   acpi_create_efi_mmap_table(d->arch.efi_acpi_gpa,
d->arch.efi_acpi_len,
  d->arch.efi_acpi_table, >mem,
tbl_add);

+/* Map the EFI and ACPI tables to Dom0 */
+rc = map_regions_rw(d,
+paddr_to_pfn(d->arch.efi_acpi_gpa),
+PFN_UP(d->arch.efi_acpi_len),
+
paddr_to_pfn(virt_to_maddr(d->arch.efi_acpi_table)));


The ACPI/EFI tables could potentially have data in the cache but are not
written into the memory (because Xen is mapping the RAM with caching
enabled). However, DOM0 may decide to map it with cache disabled.
Therefore it would be possible for the domain to see wrong data.

So I think you need to clean the cache for this region.

Oh, that would be good. Is there any existing function I can use?


You could reuse p2m_cache_flush. However this function will only flush 
cache for p2m_ram_* entries.


I think the best way would be to extend the CACHEFLUSH operations and 
maybe p2m_cache_flush (?).


Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC PATCH] blkif.h: document scsi/0x12/0x83 node

2016-03-22 Thread Ian Jackson

Konrad Rzeszutek Wilk writes ("Re: [Xen-devel] [RFC PATCH] blkif.h: document 
scsi/0x12/0x83 node"):
> > I don't think this is a sterile academic conversation which would (if
> > satisfactorily answered) have no real implications.  Rather, if we
> > understood the use case properly, there would probably be implications
> > for whether how this functionality ought to be exposed via the libxl
> > API; whether blkfront should consume this information by default; and
> > so on.
> 
> Lets take one step back.
> 
> What is the problem with documenting an existing use-case in blkif.h?
> 
> Regardless of whether it is viewed "legacy" by some folks and by others
> drivers it is still in use?

Firstly, if you want to document it then the documentation needs to
explain what the semantics are supposed to be.  At the moment I'm not
sure I see a correct and useful set of semantics.  I had a go at
writing something but several of the things I wrote in it seem to have
been contradicted by comments from Bob Liu and others.

It is difficult to write a suitable document.  Ideally the document
would give an example use case.  But it appears that the only use case
is super secret!

Secondly, apparently this is not regarded as deprecated.  Bob Liu
says he plans to add support for it to blkfront.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 03/34] xsm/xen_version: Add XSM for the xen_version hypercall

2016-03-22 Thread Konrad Rzeszutek Wilk

On Mon, Mar 21, 2016 at 05:22:09AM -0600, Jan Beulich wrote:
> >>> On 18.03.16 at 18:26,  wrote:
> > On Fri, Mar 18, 2016 at 05:55:55AM -0600, Jan Beulich wrote:
> >> >>> On 15.03.16 at 18:56,  wrote:
> >> > @@ -223,12 +224,15 @@ void __init do_initcalls(void)
> >> >  /*
> >> >   * Simple hypercalls.
> >> >   */
> >> > -
> >> >  DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >> 
> >> Please retain the blank line, as it relates to more than just this
> >> one function.
> > 
> > Done! (stray change).
> 
> Considering this I'm not puzzled by ...
> 
> >  case XENVER_guest_handle:
> > -if ( copy_to_guest(arg, current->domain->handle,
> > -   ARRAY_SIZE(current->domain->handle)) )
> > +{
> > +xen_domain_handle_t hdl;
> > +
> > +if ( deny )
> > +memset(, 0, ARRAY_SIZE(hdl));
> > +
> > +BUILD_BUG_ON(ARRAY_SIZE(current->domain->handle) != 
> > ARRAY_SIZE(hdl));
> > +
> > +if ( copy_to_guest(arg, deny ? hdl : current->domain->handle,
> > +   ARRAY_SIZE(hdl) ) )
> >  return -EFAULT;
> >  return 0;
> > -
> > +}
> >  case XENVER_commandline:
> 
> ... this.

Wow. That is some sharp eyes!
> 
> > --- a/xen/include/xsm/dummy.h
> > +++ b/xen/include/xsm/dummy.h
> > @@ -727,3 +727,27 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG 
> > struct domain *d, unsigned int
> >  }
> >  
> >  #endif /* CONFIG_X86 */
> > +
> > +#include 
> > +static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
> > +{
> > +XSM_ASSERT_ACTION(XSM_OTHER);
> > +switch ( op )
> > +{
> > +case XENVER_version:
> > +case XENVER_platform_parameters:
> > +case XENVER_get_features:
> > +/* The sub-ops ignores the permission check and returns data. */
> 
> ignore ... and return ...
> 
> With those minor things addressed I think the patch can have my ack.

Thank you!

Now I just need Daniel's Ack again.

From 1ccf59abdd2cd9228f0159dce77fe404d98c7300 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk 
Date: Fri, 11 Mar 2016 21:40:43 -0500
Subject: [PATCH] xsm/xen_version: Add XSM for most of xen_version hypercall

Most of XENVER_* have now an XSM check for their sub-ops.

The subop for XENVER_commandline is now a priviliged operation.
To not break guests we still return an string - but it is
just '\0'.

The XENVER_[version|platform_parameters|get_features] - will
always return an value to the guest.

The rest: XENVER_[extraversion|capabilities|page_size|
guest_handle|changeset| compile_info] behave as before -
allowed by default for all guests if using the XSM default
policy or with the dummy one. And if the system admin
wants to curtail access to some of them - they can do
that now with a non-default XSM policy.

Also we add a local variable block.

Signed-off-by: Konrad Rzeszutek Wilk 
Acked-by: Jan Beulich 

---
Cc: Daniel De Graaf 
Cc: Ian Jackson 
Cc: Stefano Stabellini 
Cc: Wei Liu 

v2: Do XSM check for all the XENVER_ ops.
 - Add empty data conditions.
 - Return  for priv subops.
 - Move extraversion from priv to normal. Drop the XSM check
for the non-priv subops.
v3:
 - Add +1 for strlen(xen_deny()) to include NULL. Move changeset,
compile_info to non-priv subops.
 - Remove the \0 on xen_deny()
 - Add new XSM domain for xenver hypercall. Add all subops to it.
 - Remove the extra line, Add Ack from Daniel
v4:
 - Rename the XSM from xen_version_op to xsm_xen_version.
   Prefix the types with 'xen' to distinguish it from another
   hypercall performing similar operation. Removed Ack from Daniel
   as it was so large. Add local variable block.
v5:
 - Make XENVER_platform_parameters,get_features,version be excluded
   from the XSM check per Jans' review. Add BUILD_BUG_CHECK and fix
   odd line removals. Remove stray changes and fix spelling.
---
 tools/flask/policy/policy/modules/xen/xen.te | 14 ++
 xen/common/kernel.c  | 42 +---
 xen/common/version.c | 15 ++
 xen/include/xen/version.h|  1 +
 xen/include/xsm/dummy.h  | 24 
 xen/include/xsm/xsm.h|  6 
 xen/xsm/dummy.c  |  1 +
 xen/xsm/flask/hooks.c| 39 ++
 xen/xsm/flask/policy/access_vectors  | 25 +
 xen/xsm/flask/policy/security_classes|  1 +
 10 files changed, 158 insertions(+), 10 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te 
b/tools/flask/policy/policy/modules/xen/xen.te
index d35ae22..18f49b5 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++

Re: [Xen-devel] [PATCH v3 09/28] xen/x86: Calculate maximum host and guest featuresets

2016-03-22 Thread Jan Beulich

>>> On 22.03.16 at 16:01,  wrote:
> On 22/03/16 14:52, Jan Beulich wrote:
> On 22.03.16 at 15:37,  wrote:
>>> On 22/03/16 12:39, Jan Beulich wrote:
>>> On 22.03.16 at 12:23,  wrote:
> On 18/03/16 17:09, Jan Beulich wrote:
> On 15.03.16 at 16:35,  wrote:
>>> +static void __init calculate_hvm_featureset(void)
>>> +{
>>> +unsigned int i;
>>> +const uint32_t *hvm_featuremask;
>>> +
>>> +if ( !hvm_enabled )
>>> +return;
>>> +
>>> +hvm_featuremask = hvm_funcs.hap_supported ?
>>> +hvm_hap_featuremask : hvm_shadow_featuremask;
>> I know I asked about this before, and it still puzzles me. Could you
>> add some explanation of this to the commit message or a comment?
> I am not sure what more I can say about it.
>
> The toolstack needs the be able to see the difference between a guest
> started in shadow mode on hap hardware, to be able to correctly
> calculate whether it can migrate to hap-incapable hardware. 
 Difference to what? A HAP guest? How would that difference be
 invisible if you surfaced two feature sets? Even more - with just
 one feature set exposed, how would the tool stack see that very
 difference?
>>> At the moment, a toolstack creates a domain, and has no clue what the
>>> domain can actually see in cpuid.  In particular, it can't retrieve the
>>> "lost bits" which Xen dynamically disables.
>> One more reason to expose the shadow and HAP feature sets
>> separately, it would seem to me. Then the tool stack can have
>> a clue.
> 
> When "get_cpuid" works properly, it will no longer be needed, which is
> specifically why I am not putting it in the API.

This is a sysctl iirc, so not set in stone if added.

Jan

> The toolstack will be able to construct an arbitrary domain, call
> get_cpuid to see exactly what the guest will see, and use that as the
> basis of migrateability decisions.
> 
> It is just awkward that this information is needed at the moment when a
> suitable interface from Xen isn't available.
> 
> ~Andrew




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.

2016-03-22 Thread Jan Beulich

>>> On 22.03.16 at 16:52,  wrote:
> On Mon, Mar 21, 2016 at 06:45:28AM -0600, Jan Beulich wrote:
>> >>> On 18.03.16 at 20:22,  wrote:
>> > @@ -380,6 +388,133 @@ DO(xen_version)(int cmd, 
>> > XEN_GUEST_HANDLE_PARAM(void) arg)
>> >  return -ENOSYS;
>> >  }
>> >  
>> > +static const char *capabilities_info(unsigned int *len)
>> > +{
>> > +static xen_capabilities_info_t __read_mostly cached_cap;
>> > +static unsigned int __read_mostly cached_cap_len;
>> > +static bool_t cached;
>> > +
>> > +if ( unlikely(!cached) )
>> > +{
>> > +arch_get_xen_caps(_cap);
>> > +cached_cap_len = strlen(cached_cap) + 1;
>> > +cached = 1;
>> > +}
>> 
>> I'm sorry for noticing this only now, but without any locking this is
>> unsafe: x86's arch_get_xen_caps() using safe_strcat() to fill the
>> buffer, simultaneous invocations would possibly produce garbled
>> output to all (i.e. also subsequently started) guests. Either use a
>> real lock here, or make the guard a tristate one, which gets
>> transitioned e.g. from 0 to -1 by the first one coming here (doing
>> the initialization), with everyone else waiting for it to become +1
>> (to which the initializing party sets it once it is done).
> 
> That would indeed be bad.
> 
> What if an _init_ code called this to 'pre-cache' it?

That's one of the options you have.

>> > --- a/xen/include/xsm/dummy.h
>> > +++ b/xen/include/xsm/dummy.h
>> > @@ -751,3 +751,22 @@ static XSM_INLINE int xsm_xen_version 
>> > (XSM_DEFAULT_ARG uint32_t op)
>> >  return xsm_default_action(XSM_PRIV, current->domain, NULL);
>> >  }
>> >  }
>> > +
>> > +static XSM_INLINE int xsm_version_op (XSM_DEFAULT_ARG uint32_t op)
>> > +{
>> > +XSM_ASSERT_ACTION(XSM_OTHER);
>> > +switch ( op )
>> > +{
>> > +case XEN_VERSION_version:
>> > +case XEN_VERSION_extraversion:
>> > +case XEN_VERSION_capabilities:
>> > +case XEN_VERSION_platform_parameters:
>> > +case XEN_VERSION_get_features:
>> > +case XEN_VERSION_pagesize:
>> > +case XEN_VERSION_guest_handle:
>> > +/* These MUST always be accessible to any guest by default. */
>> > +return xsm_default_action(XSM_HOOK, current->domain, NULL);
>> > +default:
>> > +return xsm_default_action(XSM_PRIV, current->domain, NULL);
>> 
>> Considering that we seem to have settled on some exceptions here
>> for the change adding XSM check to the legacy version op, do you
>> really think going with no exception at all here is the right approach?
> 
>> Because if we do, that'll prevent guests getting fully converted over
>> to the new interface. Of course, we could also make this conversion
>> specifically a non-goal, and omit e.g. XEN_VERSION_VERSION from
>> this new interface.
> 
> No no. I think convesion is the right long-term goal. 
> 
> However the nice thing about this hypercall is that it can return -EPERM.
> 
> Making it always return an value for XEN_VERSION_version,
> XEN_VERSION_platform_parameters, XEN_VERSION_get_features means that
> there are some exceptions to this "can return -EPERM" as they will
> be guaranteed an postive return value. They can ignore the -EPERM
> case.
> 
> And means that guest can still take shortcuts.
> 
> I agree with you that guests need these hypercalls but at the same
> time I am not sure what can be done so they don't fall flat on their
> faces if they are presented with -EPERM. The Linux xenbus_init can be
> modified to deal with this returning -EPERM where it makes some assumptions.
> 
> But in a likelyhood it is the bad assumption!

I'm afraid I can't conclude what you mean to say with all of the
above.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [seabios baseline-only test] 44274: tolerable FAIL

2016-03-22 Thread Platform Team regression test user

This run is configured for baseline tests only.

flight 44274 seabios real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/44274/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 44231
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail like 44231
 test-amd64-amd64-qemuu-nested-intel 16 debian-hvm-install/l1/l2 fail like 44231

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass

version targeted for testing:
 seabios  fc0878926bfe644f07fba27228432cada68ec3ba
baseline version:
 seabios  dce99e01b6bfc51175bdf32612fd4f2738e5c3c8

Last test of basis44231  2016-03-08 05:22:26 Z   14 days
Testing same since44274  2016-03-22 11:51:33 Z0 days1 attempts


People who touched revisions under test:
  Kevin O'Connor 
  Matt DeVillier 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmpass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass
 test-amd64-amd64-qemuu-nested-amdfail
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-qemuu-nested-intel  fail
 test-amd64-i386-qemuu-rhel6hvm-intel pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 pass
 test-amd64-amd64-xl-qemuu-winxpsp3   pass
 test-amd64-i386-xl-qemuu-winxpsp3pass



sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at
http://osstest.xs.citrite.net/~osstest/testlogs/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Push not applicable.


commit fc0878926bfe644f07fba27228432cada68ec3ba
Author: Matt DeVillier 
Date:   Wed Mar 16 20:37:35 2016 -0500

sdcard: skip detection of PCI sdhci controllers if etc/sdcard used

Some BayTrail ChromeOS devices have the eMMC controller hidden (thus
requiring the use of etc/sdcard), while others do not, making it
problematic to have a single payload which serves all devices
properly.   Therefore, if the CBFS contains etc/sdcard entries, skip
detection of any visible PCI sdhci controllers in order to avoid
duplicate entries in the boot menu.

Signed-off-by: Matt DeVillier 
Signed-off-by: Kevin O'Connor 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 15/28] x86/cpu: Sysctl and common infrastructure for levelling context switching

2016-03-22 Thread Andrew Cooper

On 21/03/16 16:23, Jan Beulich wrote:
 On 15.03.16 at 16:35,  wrote:
>> --- a/xen/arch/x86/cpu/common.c
>> +++ b/xen/arch/x86/cpu/common.c
>> @@ -36,6 +36,12 @@ integer_param("cpuid_mask_ext_ecx", 
>> opt_cpuid_mask_ext_ecx);
>>  unsigned int opt_cpuid_mask_ext_edx = ~0u;
>>  integer_param("cpuid_mask_ext_edx", opt_cpuid_mask_ext_edx);
>>  
>> +unsigned int __initdata expected_levelling_cap;
> Especially for an __initdata item to be non-static, its use(s) should
> be visible in a patch adding such a variable.

From the commit message:
> This interface will currently report no capabilities.  This change is
> scaffolding for future patches, which will introduce detection and switching
> logic, after which the interface will report hardware capabilities correctly.

This is specifically to reduce the complexity of the following patches. 
How much more clearly do you want this stating?

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v6 12/22] arm/acpi: Prepare EFI memory descriptor for Dom0

2016-03-22 Thread Julien Grall


Hi Shannon,

On 22/03/16 13:16, Shannon Zhao wrote:

On 2016年03月22日 00:51, Julien Grall wrote:

+memory_map[offset].Attribute = EFI_MEMORY_WB;
+}
+
+for( i = 0; i < acpi_mem.nr_banks; i++, offset++ )
+{
+memory_map[offset].Type = EfiACPIReclaimMemory;
+memory_map[offset].PhysicalStart = acpi_mem.bank[i].start;
+memory_map[offset].NumberOfPages =
PFN_UP(acpi_mem.bank[i].size);


Ditto

You are also assuming that acpi_mem.bank[i].size will always be aligned
to 4KB. If so, we may expose unwanted data to the guest.

Based on how the field is set, I would add a BUG_ON to ensure this
condition.

UEFI spec says
"EFI memory descriptors of type EfiACPIReclaimMemory and
EfiACPIMemoryNVS must be aligned on a 4 KiB boundary and must be a
multiple of 4 KiB in size."

So I think the size is aligned to 4kb, right?


Right. I was suggested to add a BUG_ON to document the constraint and 
ensure nobody will play with acpi_mem outside EFI.


A such check would also be nice for mem->bank[i].size;

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 2/5] x86/time: implement tsc as clocksource

2016-03-22 Thread Jan Beulich

>>> On 22.03.16 at 16:51,  wrote:

> 
> On 03/22/2016 12:46 PM, Jan Beulich wrote:
> On 22.03.16 at 13:41,  wrote:
>> 
>>>
>>> On 03/18/2016 08:21 PM, Andrew Cooper wrote:
 On 17/03/16 16:12, Joao Martins wrote:
> Introduce support for using TSC as platform time which is the highest
> resolution time and most performant to get (~20 nsecs).  Though there
> are also several problems associated with its usage, and there isn't a
> complete (and architecturally defined) guarantee that all machines
> will provide reliable and monotonic TSC across all CPUs, on different
> sockets and different P/C states.  I believe Intel to be the only that
> can guarantee that. For this reason it's set with less priority when
> compared to HPET unless adminstrator changes "clocksource" boot option
> to "tsc". Upon initializing it, we also check for time warps and
> appropriate CPU features i.e.  TSC_RELIABLE, CONSTANT_TSC and
> NONSTOP_TSC. And in case none of these conditions are met, we fail in
> order to fallback to the next available clocksource.
>
> It is also worth noting that with clocksource=tsc there isn't any
> need to synchronise with another clocksource, and I could verify that
> great portion the time skew was eliminated and seeing much less time
> warps happening. With HPET I observed ~500 warps in the period
> of 1h of around 27 us, and with TSC down to 50 warps in the same
> period having each warp < 100 ns. The warps still exist though but
> are only related to cross CPU calibration (being how much it takes to
> rendezvous with master), in which a later patch in this series aims to
> solve.
>
> Signed-off-by: Joao Martins 
> ---
> Cc: Keir Fraser 
> Cc: Jan Beulich 
> Cc: Andrew Cooper 

 Some style corrections, but no functional problems.

 Reviewed-by Andrew Cooper 

>>> I found out one issue in the tsc clocksource initialization path with 
>>> respect to
>>> the reliability check. This check is running when initializing platform 
>>> timer,
>>> but not all CPUS are up at that point (init_xen_time()) which means that the
>>> check will always succeed. So for clocksource=tsc I need to defer 
>>> initialization
>>> to a later point (on verify_tsc_reliability()) and if successful switch to 
>>> that
>>> clocksource. Unless you disagree, v2 would have this and just requires one
>>> additional preparatory change prior to this patch.
>> 
>> Hmm, that's suspicious when thinking about CPU hotplug: What
>> do you intend to do when a CPU comes online later, failing the
>> check?
> Good point, but I am not sure whether that would happen. The initcall
> verify_tsc_reliability seems to be called only for the boot processor. Or are
> you saying that it's case that initcalls are called too when hotplugging cpus
> later on? If that's the case then my suggestion wouldn't work as you point 
> out -
> or rather without having runtime switching support as you point out below.

Looks like I didn't express myself clearly enough: "failing the check"
wasn't meant to imply the failure would actually occur, but rather
that failure would occur if the check was run on that CPU. IOW the
case of a CPU getting hotplugged which doesn't satisfy the needs
for selecting TSC as the clock source.

>> So far we don't do runtime switching of the clock source,
>> and I'm not sure that's a good idea to do when there are running
>> guests.
> Totally agree, but I would be proposing to be at initialization phase where
> there wouldn't be guests running. We would start with HPET, then only switch 
> to
> TSC if that check has passed (and would happen once).
> 
> Simpler alternative would be to call init_xen_time after all CPUs are 
> brought up
> (and would also keep this patch as is), but I am not sure about the
> repercussions of that.

I don't see how either would help with the hotplug case.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v6 09/22] arm/p2m: Add helper functions to map memory regions

2016-03-22 Thread Julien Grall


Hi Shannon,

On 22/03/16 13:05, Shannon Zhao wrote:

On 2016年03月21日 23:52, Julien Grall wrote:

Title: to map/unmap

On 17/03/2016 09:40, Shannon Zhao wrote:

diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 433952a..17be6ad 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -144,6 +144,16 @@ int p2m_cache_flush(struct domain *d, xen_pfn_t
start_mfn, xen_pfn_t end_mfn);
   /* Setup p2m RAM mapping for domain d from start-end. */
   int p2m_populate_ram(struct domain *d, paddr_t start, paddr_t end);

+int map_regions_rw(struct domain *d,
+unsigned long start_gfn,
+unsigned long nr_mfns,
+unsigned long mfn);


 From the commit message, this function will map the region read-write
and cacheable.

But it's not clear from the name. Please either rename the function or
document it in the code.

So map_regions_rw_cache?


It sounds good.

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 06/34] x86/arm: Add BUGFRAME_NR define and BUILD checks.

2016-03-22 Thread Jan Beulich

>>> On 22.03.16 at 16:39,  wrote:
> On Mon, Mar 21, 2016 at 06:49:03AM -0600, Jan Beulich wrote:
>> >>> On 18.03.16 at 20:59,  wrote:
>> > I know I copied and pasted it and I must have done something uncanny.
>> > 
>> > Anyhow this is what the change looks like now (I've retained the Reviewed
>> > and Ack as I think this change is mostly cosmetical in nature?)
>> 
>> I think that's okay.
>> 
>> > v5: Add Acks, make BUILD_BUG_ON checks look correct. Position the
>> > BUGFRAME_NR properly.
>> 
>> Almost, that is.
>> 
>> > --- a/xen/include/asm-x86/bug.h
>> > +++ b/xen/include/asm-x86/bug.h
>> > @@ -10,6 +10,7 @@
>> >  #define BUGFRAME_bug2
>> >  #define BUGFRAME_assert 3
>> >  
>> > +#define BUGFRAME_NR 4
>> >  #ifndef __ASSEMBLY__
>> 
>> The insertion wants to go _before_ the blank line. (And in the
>> ARM case you then may consider removing the preceding blank
>> line too; in any event the ARM and x86 ones should look similar
>> in the end.)
>> 
> 
> Here it is. Last call :-)

Thanks, looks fine now.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.

2016-03-22 Thread Konrad Rzeszutek Wilk

On Mon, Mar 21, 2016 at 06:45:28AM -0600, Jan Beulich wrote:
> >>> On 18.03.16 at 20:22,  wrote:
> >> > + * return the number of bytes requested for the operation. Or an
> >> > + * negative value if an error is encountered.
> >> > + */
> >> > +
> >> > +typedef uint64_t xen_version_op_val_t;
> >> > +DEFINE_XEN_GUEST_HANDLE(xen_version_op_val_t);
> >> > +
> >> > +typedef void xen_version_op_buf_t;
> >> > +DEFINE_XEN_GUEST_HANDLE(xen_version_op_buf_t);
> >> 
> >> Are these actually useful for anything? And for the various strings,
> > 
> > The xen_version_op_val_t is definitly used by the toolstack.
> > 
> >> wouldn't a "char" handle be more natural?
> > 
> > Heh. It was char[] before but Andrew liked it as void.
> 
> But that was because you used it for non string types too,
> wasn't it?

Yes. For the build-id which is a binary blob. And as you noticed  - also
the domain handle which can be anything.

> 
> > @@ -380,6 +388,133 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
> > arg)
> >  return -ENOSYS;
> >  }
> >  
> > +static const char *capabilities_info(unsigned int *len)
> > +{
> > +static xen_capabilities_info_t __read_mostly cached_cap;
> > +static unsigned int __read_mostly cached_cap_len;
> > +static bool_t cached;
> > +
> > +if ( unlikely(!cached) )
> > +{
> > +arch_get_xen_caps(_cap);
> > +cached_cap_len = strlen(cached_cap) + 1;
> > +cached = 1;
> > +}
> 
> I'm sorry for noticing this only now, but without any locking this is
> unsafe: x86's arch_get_xen_caps() using safe_strcat() to fill the
> buffer, simultaneous invocations would possibly produce garbled
> output to all (i.e. also subsequently started) guests. Either use a
> real lock here, or make the guard a tristate one, which gets
> transitioned e.g. from 0 to -1 by the first one coming here (doing
> the initialization), with everyone else waiting for it to become +1
> (to which the initializing party sets it once it is done).

That would indeed be bad.

What if an _init_ code called this to 'pre-cache' it?

> 
> > +DO(version_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg,
> > +   unsigned int len)
> > +{
> > +union {
> > +xen_version_op_val_t val;
> > +xen_feature_info_t fi;
> > +} u = {};
> > +unsigned int sz = 0;
> > +const void *ptr = NULL;
> > +int rc = xsm_version_op(XSM_OTHER, cmd);
> > +
> > +/* We can safely return -EPERM! */
> > +if ( rc )
> > +return rc;
> > +
> > +/*
> > + * The HYPERVISOR_xen_version differs in that some return the value,
> > + * and some copy it on back on argument. We follow the same rule for 
> > all
> > + * sub-ops: return 0 on success, positive value of bytes returned, and
> > + * always copy the result in arg. Yeey sanity!
> > + */
> > +switch ( cmd )
> > +{
> > +case XEN_VERSION_version:
> > +sz = sizeof(xen_version_op_val_t);
> > +u.val = (xen_major_version() << 16) | xen_minor_version();
> > +break;
> > +
> > +case XEN_VERSION_extraversion:
> > +sz = strlen(xen_extra_version()) + 1;
> > +ptr = xen_extra_version();
> > +break;
> > +
> > +case XEN_VERSION_capabilities:
> > +ptr = capabilities_info();
> > +break;
> > +
> > +case XEN_VERSION_changeset:
> > +sz = strlen(xen_changeset()) + 1;
> > +ptr = xen_changeset();
> > +break;
> > +
> > +case XEN_VERSION_platform_parameters:
> > +sz = sizeof(xen_version_op_val_t);
> > +u.val = HYPERVISOR_VIRT_START;
> > +break;
> > +
> > +case XEN_VERSION_get_features:
> > +if ( copy_from_guest(, arg, 1) )
> 
> Afaict this is incompatible with the null handle check further down (i.e.
> you also need to check for a null handle here).

Oh my. Indeed.

> 
> > --- a/xen/include/public/arch-arm.h
> > +++ b/xen/include/public/arch-arm.h
> > @@ -128,6 +128,9 @@
> >   ** VCPUOP_register_vcpu_info
> >   ** VCPUOP_register_runstate_memory_area
> >   *
> > + *  HYPERVISOR_version_op
> > + *   All generic sub-operations
> > + *
> >   *
> >   * Other notes on the ARM ABI:
> 
> I don't think the extra almost blank line is warranted here.
> 
> > --- a/xen/include/public/version.h
> > +++ b/xen/include/public/version.h
> > @@ -30,7 +30,15 @@
> >  
> >  #include "xen.h"
> >  
> > -/* NB. All ops return zero on success, except XENVER_{version,pagesize} */
> > +/*
> > + * There are two hypercalls mentioned in here. The XENVER_ are for
> > + * HYPERCALL_xen_version (17), while VERSION_ are for the
> > + * HYPERCALL_version_op (41).
> > + *
> > + * The subops are very similar except that the later hypercall has a
> > + * sane interface.
> > + */
> > +
> >  
> >  /* arg == NULL; returns major:minor (16:16). */
> 
> Nor is the extra blank one here.
> 
> > @@ -87,6 +95,66 @@ typedef struct xen_feature_info xen_feature_info_t;
> >  #define

Re: [Xen-devel] [PATCH 2/5] x86/time: implement tsc as clocksource

2016-03-22 Thread Joao Martins



On 03/22/2016 12:46 PM, Jan Beulich wrote:
 On 22.03.16 at 13:41,  wrote:
> 
>>
>> On 03/18/2016 08:21 PM, Andrew Cooper wrote:
>>> On 17/03/16 16:12, Joao Martins wrote:
 Introduce support for using TSC as platform time which is the highest
 resolution time and most performant to get (~20 nsecs).  Though there
 are also several problems associated with its usage, and there isn't a
 complete (and architecturally defined) guarantee that all machines
 will provide reliable and monotonic TSC across all CPUs, on different
 sockets and different P/C states.  I believe Intel to be the only that
 can guarantee that. For this reason it's set with less priority when
 compared to HPET unless adminstrator changes "clocksource" boot option
 to "tsc". Upon initializing it, we also check for time warps and
 appropriate CPU features i.e.  TSC_RELIABLE, CONSTANT_TSC and
 NONSTOP_TSC. And in case none of these conditions are met, we fail in
 order to fallback to the next available clocksource.

 It is also worth noting that with clocksource=tsc there isn't any
 need to synchronise with another clocksource, and I could verify that
 great portion the time skew was eliminated and seeing much less time
 warps happening. With HPET I observed ~500 warps in the period
 of 1h of around 27 us, and with TSC down to 50 warps in the same
 period having each warp < 100 ns. The warps still exist though but
 are only related to cross CPU calibration (being how much it takes to
 rendezvous with master), in which a later patch in this series aims to
 solve.

 Signed-off-by: Joao Martins 
 ---
 Cc: Keir Fraser 
 Cc: Jan Beulich 
 Cc: Andrew Cooper 
>>>
>>> Some style corrections, but no functional problems.
>>>
>>> Reviewed-by Andrew Cooper 
>>>
>> I found out one issue in the tsc clocksource initialization path with 
>> respect to
>> the reliability check. This check is running when initializing platform 
>> timer,
>> but not all CPUS are up at that point (init_xen_time()) which means that the
>> check will always succeed. So for clocksource=tsc I need to defer 
>> initialization
>> to a later point (on verify_tsc_reliability()) and if successful switch to 
>> that
>> clocksource. Unless you disagree, v2 would have this and just requires one
>> additional preparatory change prior to this patch.
> 
> Hmm, that's suspicious when thinking about CPU hotplug: What
> do you intend to do when a CPU comes online later, failing the
> check?
Good point, but I am not sure whether that would happen. The initcall
verify_tsc_reliability seems to be called only for the boot processor. Or are
you saying that it's case that initcalls are called too when hotplugging cpus
later on? If that's the case then my suggestion wouldn't work as you point out -
or rather without having runtime switching support as you point out below.

> So far we don't do runtime switching of the clock source,
> and I'm not sure that's a good idea to do when there are running
> guests.
Totally agree, but I would be proposing to be at initialization phase where
there wouldn't be guests running. We would start with HPET, then only switch to
TSC if that check has passed (and would happen once).

Simpler alternative would be to call init_xen_time after all CPUs are brought up
(and would also keep this patch as is), but I am not sure about the
repercussions of that.

Joao

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Interested to participate in Outreachy Program

2016-03-22 Thread Doug Goldstein

On 3/21/16 7:12 AM, sabiya kazi wrote:
> Hi Doug,
> Can you please help on questions/problems where I am stuck?
> Deadline for applying for outreachy is coming closer.
> Regards,
> Sabiya
> 
> Hi Doug,
>  I am done with building of xen source.Now, I have started looking
> at source files and identifying changes required for given task.
> 
>As you suggested, I went through virsh command source to get idea how
> escape sequence char option is implemented. Based on that,I came up with
> following findings:
> *To complete this task, Steps will be*
> 
> 
>1. Do help related change in libxl/xl_cmdtable.c

yes

>2. Add handling of one more option for command 'xl console' in file
>xl_cmdimpl.c in method  main_console.  From this method, call is delegated
>to libxl_console_exec() in libxl.c

yes or libxl_primary_console_exec() so if the -e arg is specified you'll
need it to go into libxl_console_exec() as well.

>3. libxl_console_exec delegates to execl(), Do some changes in this
>method.

yes.

> 
> 
> *Question/Doubts:*
> 
>-  Where I need to use this escape char?

it will be passed to the xenconsole program from step 3 above.

>-  libxl_console_exec() is called by many methods Will it affect other
>flows as well?

You will have to add a parameter and default it to NULL and that will
cause the underlying call to execl() to not specify that option.

>- Could not find execl method implementation where it actually prints
>data

Not sure I understand.

>- Changes for handling escape char are not cleared

tools/console/client/main.c you'll find ESCAPE_CHARACTER hard coded.
You'll have to add an option here that execl() above in step 3 will
specify to pass the argument and utilize it.


> 
> Can you please help me on this?

Hopefully my comments above made sense. I'll be in the office the rest
of the week so I should be able to help if you run into any other
roadblocks.

> Regards,
> -Sabiya
> 
> 
> 
> On Sat, Mar 19, 2016 at 11:31 PM, sabiya kazi  wrote:
> 
>> Hi Doug,
>> I have started building xen source.
>> Can you help me on issue of creating guest domain?
>>
>>
>> Regards,
>> -Sabiya
>>
>> On Fri, Mar 18, 2016 at 1:18 AM, sabiya kazi  wrote:
>>
>>> Hi Doug,
>>>
>>> I have proceeded on xen installation. I have configured grub to use xen,
>>> I could see xl command and tried it's few options.
>>>
>>> However, I am getting failure while creating a Debian PV Guest using
>>>
>>> xen-create-image --hostname=tutorial-pv-guest \
>>>   --memory=512mb \
>>>   --vcpus=2 \
>>>   --lvm=vg0 \
>>>   --dhcp \
>>>   --pygrub \
>>>   --dist=wheezy
>>>
>>> I have attached logs for reference.
>>>  Before this step, I could not setup Linux Bridge Network for Guest
>>> networking.
>>> I am using my router for internet connection.Can this will be problem?
>>> Can you please have a look and let me know.
>>>Now, One more question to make changes for "Xl command" I need to
>>> check it's source, I have checked out source from xen repository.
>>> Can you share path of xl and virsh command source, I can start looking at
>>> source..
>>>
>>> Rega
>>>
>>
> 


-- 
Doug Goldstein



signature.asc
Description: OpenPGP digital signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC PATCH] blkif.h: document scsi/0x12/0x83 node

2016-03-22 Thread Paul Durrant

> -Original Message-
> From: Ian Jackson [mailto:ian.jack...@eu.citrix.com]
> Sent: 22 March 2016 15:09
> To: Paul Durrant
> Cc: David Vrabel; Konrad Rzeszutek Wilk; Bob Liu; jgr...@suse.com; xen-
> de...@lists.xen.org; annie...@oracle.com; Roger Pau Monne
> Subject: RE: [Xen-devel] [RFC PATCH] blkif.h: document scsi/0x12/0x83 node
> 
> Paul Durrant writes ("RE: [Xen-devel] [RFC PATCH] blkif.h: document
> scsi/0x12/0x83 node"):
> > AFAIK XenServer still very much makes use of it.
> 
> Can you answer, for XenServer's use case, some of the questions that
> David and I have asked ?
> 

It's getting hard to parse the thread at this point but, as I've mentioned in a 
previous response in the thread, Windows basically assumes disks are SCSI and 
it's up to the controller driver to make it look that way.
To that end the XENVBD Windows PV driver synthesizes VPD pages 80 and 83 but 
also have the ability pull base64 encoded VPD data from xenstore. The synthesis 
is required to make Windows work properly but the reason for overriding it with 
data from xenstore is not apparent. In XenServer the storage backend code does 
populate the VPD information (with UUID information for the storage volume 
created by XAPI) but I don't believe that this is necessary behaviour in the 
normal case. My *assumption* is that, at some point in the past, XenServer had 
OEM specific storage backends and the requirement to run OEM s/w in guest which 
relied on VPD supplied by the storage, and as is commonly the case xenstore was 
the easiest way to get this information from the backend and into the guest.

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 06/34] x86/arm: Add BUGFRAME_NR define and BUILD checks.

2016-03-22 Thread Konrad Rzeszutek Wilk

On Mon, Mar 21, 2016 at 06:49:03AM -0600, Jan Beulich wrote:
> >>> On 18.03.16 at 20:59,  wrote:
> > I know I copied and pasted it and I must have done something uncanny.
> > 
> > Anyhow this is what the change looks like now (I've retained the Reviewed
> > and Ack as I think this change is mostly cosmetical in nature?)
> 
> I think that's okay.
> 
> > v5: Add Acks, make BUILD_BUG_ON checks look correct. Position the
> > BUGFRAME_NR properly.
> 
> Almost, that is.
> 
> > --- a/xen/include/asm-x86/bug.h
> > +++ b/xen/include/asm-x86/bug.h
> > @@ -10,6 +10,7 @@
> >  #define BUGFRAME_bug2
> >  #define BUGFRAME_assert 3
> >  
> > +#define BUGFRAME_NR 4
> >  #ifndef __ASSEMBLY__
> 
> The insertion wants to go _before_ the blank line. (And in the
> ARM case you then may consider removing the preceding blank
> line too; in any event the ARM and x86 ones should look similar
> in the end.)
> 

Here it is. Last call :-)

From f97548200461b9eb4d8187eb9e1f021c74160759 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk 
Date: Thu, 10 Mar 2016 16:45:31 -0500
Subject: [PATCH] x86/arm: Add BUGFRAME_NR define and BUILD checks.

So that we have a nice mechansim to figure out the upper
bounds of bug.frames and also catch compiler errors in case
one tries to use a higher frame number.

Signed-off-by: Konrad Rzeszutek Wilk 
Reviewed-by: Andrew Cooper 
Acked-by: Julien Grall 

---
Cc: Stefano Stabellini 
Cc: Julien Grall 
Cc: Keir Fraser 
Cc: Jan Beulich 
Cc: Andrew Cooper 

v3: First time included.
v4: Add BUG_FRAME check also in the assembler version of the macro.
v5: Add Acks, make BUILD_BUG_ON checks look correct. Position the
BUGFRAME_NR properly. Reposition the BUGFRAME_NR again.
---
---
 xen/include/asm-arm/bug.h | 3 +++
 xen/include/asm-x86/bug.h | 8 
 2 files changed, 11 insertions(+)

diff --git a/xen/include/asm-arm/bug.h b/xen/include/asm-arm/bug.h
index ab9e811..68353e1 100644
--- a/xen/include/asm-arm/bug.h
+++ b/xen/include/asm-arm/bug.h
@@ -32,6 +32,8 @@ struct bug_frame {
 #define BUGFRAME_bug1
 #define BUGFRAME_assert 2
 
+#define BUGFRAME_NR 3
+
 /* Many versions of GCC doesn't support the asm %c parameter which would
  * be preferable to this unpleasantness. We use mergeable string
  * sections to avoid multiple copies of the string appearing in the
@@ -39,6 +41,7 @@ struct bug_frame {
  */
 #define BUG_FRAME(type, line, file, has_msg, msg) do {  \
 BUILD_BUG_ON((line) >> 16); \
+BUILD_BUG_ON((type) >= BUGFRAME_NR);\
 asm ("1:"BUG_INSTR"\n"  \
  ".pushsection .rodata.str, \"aMS\", %progbits, 1\n"\
  "2:\t.asciz " __stringify(file) "\n"   \
diff --git a/xen/include/asm-x86/bug.h b/xen/include/asm-x86/bug.h
index e868e85..c5d2d4c 100644
--- a/xen/include/asm-x86/bug.h
+++ b/xen/include/asm-x86/bug.h
@@ -10,6 +10,8 @@
 #define BUGFRAME_bug2
 #define BUGFRAME_assert 3
 
+#define BUGFRAME_NR 4
+
 #ifndef __ASSEMBLY__
 
 struct bug_frame {
@@ -51,6 +53,7 @@ struct bug_frame {
 
 #define BUG_FRAME(type, line, ptr, second_frame, msg) do {   \
 BUILD_BUG_ON((line) >> (BUG_LINE_LO_WIDTH + BUG_LINE_HI_WIDTH)); \
+BUILD_BUG_ON((type) >= BUGFRAME_NR); \
 asm volatile ( _ASM_BUGFRAME_TEXT(second_frame)  \
:: _ASM_BUGFRAME_INFO(type, line, ptr, msg) );\
 } while (0)
@@ -83,6 +86,11 @@ extern const struct bug_frame __start_bug_frames[],
  * in .rodata
  */
 .macro BUG_FRAME type, line, file_str, second_frame, msg
+
+.if \type >= BUGFRAME_NR
+.error "Invalid BUGFRAME index"
+.endif
+
 .L\@ud: ud2a
 
 .pushsection .rodata.str1, "aMS", @progbits, 1
-- 
2.5.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [linux-mingo-tip-master test] 86881: regressions - FAIL

2016-03-22 Thread osstest service owner

flight 86881 linux-mingo-tip-master real [real]
http://logs.test-lab.xenproject.org/osstest/logs/86881/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-i386-rumpuserxen6 xen-build fail REGR. vs. 60684
 test-amd64-amd64-libvirt 15 guest-saverestore.2   fail REGR. vs. 60684
 build-amd64-rumpuserxen   6 xen-build fail REGR. vs. 60684
 test-amd64-amd64-xl-xsm  15 guest-localmigratefail REGR. vs. 60684
 test-amd64-amd64-xl  14 guest-saverestore fail REGR. vs. 60684
 test-amd64-amd64-libvirt-xsm 15 guest-saverestore.2   fail REGR. vs. 60684
 test-amd64-amd64-xl-multivcpu 17 guest-localmigrate/x10   fail REGR. vs. 60684
 test-amd64-amd64-xl-credit2  17 guest-localmigrate/x10fail REGR. vs. 60684
 test-amd64-amd64-pair  22 guest-migrate/dst_host/src_host fail REGR. vs. 60684

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 15 guest-localmigratefail REGR. vs. 60684
 test-amd64-i386-libvirt-xsm  15 guest-saverestore.2  fail blocked in 60684
 test-amd64-i386-xl-xsm   15 guest-localmigrate   fail blocked in 60684
 test-amd64-i386-xl   15 guest-localmigrate   fail blocked in 60684
 test-amd64-i386-libvirt  15 guest-saverestore.2  fail blocked in 60684
 test-amd64-i386-libvirt-pair 22 guest-migrate/dst_host/src_host fail blocked 
in 60684
 test-amd64-i386-pair  22 guest-migrate/dst_host/src_host fail blocked in 60684
 test-amd64-amd64-libvirt-pair 22 guest-migrate/dst_host/src_host fail blocked 
in 60684
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop   fail blocked in 60684
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail like 60684
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail like 60684
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 60684

Tests which did not succeed, but are not blocking:
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-xl-pvh-intel 14 guest-saverestorefail  never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-intel 13 xen-boot/l1 fail never pass
 test-amd64-amd64-qemuu-nested-amd 13 xen-boot/l1   fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass

version targeted for testing:
 linux1085298b22bb09f6497d7a003f0377f6d12ca583
baseline version:
 linux69f75ebe3b1d1e636c4ce0a0ee248edacc69cbe0

Last test of basis60684  2015-08-13 04:21:46 Z  222 days
Failing since 60712  2015-08-15 18:33:48 Z  219 days  165 attempts
Testing same since86881  2016-03-22 04:22:06 Z0 days1 attempts

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  fail
 build-i386-rumpuserxen   fail
 test-amd64-amd64-xl  fail
 test-amd64-i386-xl   fail
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsmpass
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmpass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsmpass
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm pass
 test-amd64-amd64-libvirt-xsm fail
 test-amd64-i386-libvirt-xsm

Re: [Xen-devel] [PATCH v2 0/3] libxl: add support for qemu base pvusb backend

2016-03-22 Thread Juergen Gross

On 22/03/16 08:29, Juergen Gross wrote:
> This patch series is meant to be applied on top of Chunyan's series
> to support pvusb in libxl.
> 
> It is adding support for an alternative pvusb backend "qusb" via qemu.
> 
> Changes in V2:
> - patch 1: Return false if libxl__get_domid() fails as requested by
>   George Dunlap
> - Swapped patches 2 and 3 as former patch 2 has been questioned to make
>   sense for 4.7. This will remove an obstacle for former patch 3 to go in.
> 
> Juergen Gross (3):
>   libxl: make libxl__need_xenpv_qemu() operate on domain config
>   libxl: add new pvusb backend "qusb" provided by qemu
>   libxl: add domain config parameter to force start of qemu

I had a try with detecting a missing device model when adding a device
to an already running domain. Instead of silently failing it would now
print an appropriate error message. This could be changed later to
start the device model in that situation.

Would you like me to send another version of the complete series (the
v2 patches are not changed), or the new patches only?

Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 13/28] xen/x86: Improvements to in-hypervisor cpuid sanity checks

2016-03-22 Thread Andrew Cooper

On 21/03/16 16:11, Jan Beulich wrote:
>
>> +/* OSXSAVE cleared by pv_featureset.  Fast-forward CR4 back in. */
>> +if ( (is_pv_domain(currd) && guest_kernel_mode(curr, regs) &&
>> +  (this_cpu(cr4) & X86_CR4_OSXSAVE)) ||
>> + (curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE) )
>> +c |= cpufeat_mask(X86_FEATURE_OSXSAVE);
>> +
>> +c |= cpufeat_mask(X86_FEATURE_HYPERVISOR);
>>  break;
> Is this correct for PVH (which calls here out of vmx.c)?

Probably not. Not that PVH functioned correctly before either.

>  At least the ->arch.pv_vcpu use unlikely is.

I will guard the entire clause, leaving a note about PVH.  This will
take the behaviour back to how it was before.

>
>> +/* OSPKE cleared by pv_featureset.  Fast-forward CR4 back in. */
>> +if ( curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_PKE )
>> +c |= cpufeat_mask(X86_FEATURE_OSPKE);
> That's kind of pointless for PV

Not at all.  Fixing Xen's (ab)use of the PTE bits blocking PKE is easy,
and there is already a thread on xen-devel about doing so.

> , and similarly to the above one likely
> wrong for PVH.

I will guard it similarly.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC PATCH] blkif.h: document scsi/0x12/0x83 node

2016-03-22 Thread Konrad Rzeszutek Wilk

On Tue, Mar 22, 2016 at 03:12:02PM +, Ian Jackson wrote:
> David Vrabel writes ("Re: [Xen-devel] [RFC PATCH] blkif.h: document 
> scsi/0x12/0x83 node"):
> > On 22/03/16 14:10, Konrad Rzeszutek Wilk wrote:
> > > Just think of it as a black box.
> > 
> > This isn't sufficient.
> > 
> > You are presenting a solution but have not properly described the
> > problem so no one can evaluate whether the solution is appropriate.
> 
> I'm sorry to be awkward, but I'm afraid I do agree with David Vrabel's
> messages in this thread.
> 
> I don't think this is a sterile academic conversation which would (if
> satisfactorily answered) have no real implications.  Rather, if we
> understood the use case properly, there would probably be implications
> for whether how this functionality ought to be exposed via the libxl
> API; whether blkfront should consume this information by default; and
> so on.

Lets take one step back.

What is the problem with documenting an existing use-case in blkif.h?

Regardless of whether it is viewed "legacy" by some folks and by others
drivers it is still in use?

> 
> Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] x86/domctl: don't waste domain CPUID slot for all zero data

2016-03-22 Thread Andrew Cooper

On 22/03/16 12:56, Jan Beulich wrote:

slot => slots in the subject.

> domain_cpuid() returns all zeroes anyway when not finding a match, so
> there's no need to explicitly store such a set of values.
>
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC PATCH] blkif.h: document scsi/0x12/0x83 node

2016-03-22 Thread Ian Jackson

David Vrabel writes ("Re: [Xen-devel] [RFC PATCH] blkif.h: document 
scsi/0x12/0x83 node"):
> On 22/03/16 14:10, Konrad Rzeszutek Wilk wrote:
> > Just think of it as a black box.
> 
> This isn't sufficient.
> 
> You are presenting a solution but have not properly described the
> problem so no one can evaluate whether the solution is appropriate.

I'm sorry to be awkward, but I'm afraid I do agree with David Vrabel's
messages in this thread.

I don't think this is a sterile academic conversation which would (if
satisfactorily answered) have no real implications.  Rather, if we
understood the use case properly, there would probably be implications
for whether how this functionality ought to be exposed via the libxl
API; whether blkfront should consume this information by default; and
so on.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC PATCH] blkif.h: document scsi/0x12/0x83 node

2016-03-22 Thread Ian Jackson

Paul Durrant writes ("RE: [Xen-devel] [RFC PATCH] blkif.h: document 
scsi/0x12/0x83 node"):
> AFAIK XenServer still very much makes use of it.

Can you answer, for XenServer's use case, some of the questions that
David and I have asked ?

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 09/28] xen/x86: Calculate maximum host and guest featuresets

2016-03-22 Thread Andrew Cooper

On 22/03/16 14:52, Jan Beulich wrote:
 On 22.03.16 at 15:37,  wrote:
>> On 22/03/16 12:39, Jan Beulich wrote:
>> On 22.03.16 at 12:23,  wrote:
 On 18/03/16 17:09, Jan Beulich wrote:
 On 15.03.16 at 16:35,  wrote:
>> +static void __init calculate_hvm_featureset(void)
>> +{
>> +unsigned int i;
>> +const uint32_t *hvm_featuremask;
>> +
>> +if ( !hvm_enabled )
>> +return;
>> +
>> +hvm_featuremask = hvm_funcs.hap_supported ?
>> +hvm_hap_featuremask : hvm_shadow_featuremask;
> I know I asked about this before, and it still puzzles me. Could you
> add some explanation of this to the commit message or a comment?
 I am not sure what more I can say about it.

 The toolstack needs the be able to see the difference between a guest
 started in shadow mode on hap hardware, to be able to correctly
 calculate whether it can migrate to hap-incapable hardware. 
>>> Difference to what? A HAP guest? How would that difference be
>>> invisible if you surfaced two feature sets? Even more - with just
>>> one feature set exposed, how would the tool stack see that very
>>> difference?
>> At the moment, a toolstack creates a domain, and has no clue what the
>> domain can actually see in cpuid.  In particular, it can't retrieve the
>> "lost bits" which Xen dynamically disables.
> One more reason to expose the shadow and HAP feature sets
> separately, it would seem to me. Then the tool stack can have
> a clue.

When "get_cpuid" works properly, it will no longer be needed, which is
specifically why I am not putting it in the API.

The toolstack will be able to construct an arbitrary domain, call
get_cpuid to see exactly what the guest will see, and use that as the
basis of migrateability decisions.

It is just awkward that this information is needed at the moment when a
suitable interface from Xen isn't available.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 09/28] xen/x86: Calculate maximum host and guest featuresets

2016-03-22 Thread Jan Beulich

>>> On 22.03.16 at 15:37,  wrote:
> On 22/03/16 12:39, Jan Beulich wrote:
> On 22.03.16 at 12:23,  wrote:
>>> On 18/03/16 17:09, Jan Beulich wrote:
>>> On 15.03.16 at 16:35,  wrote:
> +static void __init calculate_hvm_featureset(void)
> +{
> +unsigned int i;
> +const uint32_t *hvm_featuremask;
> +
> +if ( !hvm_enabled )
> +return;
> +
> +hvm_featuremask = hvm_funcs.hap_supported ?
> +hvm_hap_featuremask : hvm_shadow_featuremask;
 I know I asked about this before, and it still puzzles me. Could you
 add some explanation of this to the commit message or a comment?
>>> I am not sure what more I can say about it.
>>>
>>> The toolstack needs the be able to see the difference between a guest
>>> started in shadow mode on hap hardware, to be able to correctly
>>> calculate whether it can migrate to hap-incapable hardware. 
>> Difference to what? A HAP guest? How would that difference be
>> invisible if you surfaced two feature sets? Even more - with just
>> one feature set exposed, how would the tool stack see that very
>> difference?
> 
> At the moment, a toolstack creates a domain, and has no clue what the
> domain can actually see in cpuid.  In particular, it can't retrieve the
> "lost bits" which Xen dynamically disables.

One more reason to expose the shadow and HAP feature sets
separately, it would seem to me. Then the tool stack can have
a clue.

Jan

> As a result, creating a shadow domain on a hap-capable host results in
> misinformation about which features the guest was advertised at boot. 
> This in turn leads to an erronious decision that the domain can't be
> migrated to a hap-incapable host.
> 
>>
>>> There is no "get_cpuid_policy" hypercall, so a toolstack cannot query
>>> what a guest can actually see, after Xen's dynamic checks have taken place.
>>>
>>> Implementing get_cpuid_policy depends on Xen having a full model of
>>> cpuid state, which is too much work to be rolled together into this series.
>>>
>>> Like all of the dynamic checks later, it is only an intermediate step,
>>> and I do have plans to remove them longterm when Xen has a better model
>>> of cpuid.
>> I understand this is subject to further changes down the road. But
>> we all know that getting there may take time, so getting things
>> right for the time until then is quite necessary (the more that we're
>> going to release 4.7 in such intermediate state aiui).
> 
> This is all an internal implementation detail.  It (very deliberately)
> doesn't manifest in the public API.
> 
> It exists only internally to Xen, and in the libxc wrapper exposing the
> autogenerated information.
> 
> ~Andrew




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 09/28] xen/x86: Calculate maximum host and guest featuresets

2016-03-22 Thread Andrew Cooper

On 22/03/16 12:39, Jan Beulich wrote:
 On 22.03.16 at 12:23,  wrote:
>> On 18/03/16 17:09, Jan Beulich wrote:
>> On 15.03.16 at 16:35,  wrote:
 +static void __init calculate_hvm_featureset(void)
 +{
 +unsigned int i;
 +const uint32_t *hvm_featuremask;
 +
 +if ( !hvm_enabled )
 +return;
 +
 +hvm_featuremask = hvm_funcs.hap_supported ?
 +hvm_hap_featuremask : hvm_shadow_featuremask;
>>> I know I asked about this before, and it still puzzles me. Could you
>>> add some explanation of this to the commit message or a comment?
>> I am not sure what more I can say about it.
>>
>> The toolstack needs the be able to see the difference between a guest
>> started in shadow mode on hap hardware, to be able to correctly
>> calculate whether it can migrate to hap-incapable hardware. 
> Difference to what? A HAP guest? How would that difference be
> invisible if you surfaced two feature sets? Even more - with just
> one feature set exposed, how would the tool stack see that very
> difference?

At the moment, a toolstack creates a domain, and has no clue what the
domain can actually see in cpuid.  In particular, it can't retrieve the
"lost bits" which Xen dynamically disables.

As a result, creating a shadow domain on a hap-capable host results in
misinformation about which features the guest was advertised at boot. 
This in turn leads to an erronious decision that the domain can't be
migrated to a hap-incapable host.

>
>> There is no "get_cpuid_policy" hypercall, so a toolstack cannot query
>> what a guest can actually see, after Xen's dynamic checks have taken place.
>>
>> Implementing get_cpuid_policy depends on Xen having a full model of
>> cpuid state, which is too much work to be rolled together into this series.
>>
>> Like all of the dynamic checks later, it is only an intermediate step,
>> and I do have plans to remove them longterm when Xen has a better model
>> of cpuid.
> I understand this is subject to further changes down the road. But
> we all know that getting there may take time, so getting things
> right for the time until then is quite necessary (the more that we're
> going to release 4.7 in such intermediate state aiui).

This is all an internal implementation detail.  It (very deliberately)
doesn't manifest in the public API.

It exists only internally to Xen, and in the libxc wrapper exposing the
autogenerated information.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] libxc/x86: XSAVE related adjustments

2016-03-22 Thread Jan Beulich

>>> On 22.03.16 at 14:48,  wrote:
>> @@ -300,9 +304,9 @@ static void xc_cpuid_config_xsave(xc_int
>>  {
>>  case 0: 
>>  /* EAX: low 32bits of xfeature_enabled_mask */
>> -regs[0] = info->xfeature_mask & 0x;
>> +regs[0] &= info->xfeature_mask;
>>  /* EDX: high 32bits of xfeature_enabled_mask */
>> -regs[3] = (info->xfeature_mask >> 32) & 0x;
>> +regs[3] &= info->xfeature_mask >> 32;
>>  /* ECX: max size required by all HW features */
>>  {
>>  unsigned int _input[2] = {0xd, 0x0}, _regs[4];
> 
> This is an improvement on the code currently present, but is still
> superseded by the final patch of my cpuid series.

Is it? I did check your tree before sending, and you do only
mechanical adjustments. In particular you don't switch to
&= and you don't drop the pointless and-ing with 0x.

>> @@ -325,16 +329,20 @@ static void xc_cpuid_config_xsave(xc_int
> 
> Between these two hunks, there is a loop bound which is also wrong.

But seeing that your patches fix it I didn't bother stealing the fix
from your patches.

>>  if ( !info->hvm )
>>  regs[0] &= ~XSAVES;
>>  regs[2] &= info->xfeature_mask;
>> -regs[3] = 0;
>> +regs[3] &= info->xfeature_mask >> 32;
>>  break;
>> -case 2 ... 63: /* sub-leaves */
>> +case 2 ... 62: /* per-component sub-leaves */
>>  if ( !(info->xfeature_mask & (1ULL << input[1])) )
> 
> Now I think about it, this check is incomplete.  xfeature_mask doesn't
> contain xss values.

For now the XSS bitmask is blank. Looking at everything together I
do think though that once it becomes non-zero, info->xfeature_mask
will need to become the OR of both masks.

>>  {
>>  regs[0] = regs[1] = regs[2] = regs[3] = 0;
>>  break;
>>  }
>>  /* Don't touch EAX, EBX. Also cleanup ECX and EDX */
>> -regs[2] = regs[3] = 0;
>> +regs[2] &= XSTATE_XSS | XSTATE_ALIGN64;
>> +regs[3] = 0;
>> +break;
>> +default:
>> +regs[0] = regs[1] = regs[2] = regs[3] = 0;
>>  break;
> 
> If you wish, I can fold this patch into the final patch of my cpuid series.

I'd be fine with that, albeit (as said in the submission) the changes
are independent of one another despite them causing conflicts.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC PATCH] blkif.h: document scsi/0x12/0x83 node

2016-03-22 Thread Paul Durrant

> -Original Message-
> From: David Vrabel [mailto:david.vra...@citrix.com]
> Sent: 22 March 2016 14:38
> To: Konrad Rzeszutek Wilk
> Cc: Bob Liu; Ian Jackson; jgr...@suse.com; xen-devel@lists.xen.org;
> annie...@oracle.com; Paul Durrant; Roger Pau Monne
> Subject: Re: [Xen-devel] [RFC PATCH] blkif.h: document scsi/0x12/0x83 node
> 
> On 22/03/16 14:10, Konrad Rzeszutek Wilk wrote:
> > On Tue, Mar 22, 2016 at 01:41:43PM +, David Vrabel wrote:
> >> On 22/03/16 12:55, Bob Liu wrote:
> >>>
> >>> On 03/17/2016 07:12 PM, Ian Jackson wrote:
>  David Vrabel writes ("Re: [Xen-devel] [RFC PATCH] blkif.h: document
> scsi/0x12/0x83 node"):
> > On 16/03/16 13:59, Bob Liu wrote:
> >> But we'd like to get the VPD information(of underlying storage
> device) also in Linux blkfront, even blkfront is not a SCSI device.
> >
> > Why does blkback/blkfront need to involved here?  This is just some
> > xenstore keys that can be written by the toolstack and directly read
> by
> > the relevant application in the guest.
> 
> >>>
> >>> They want a more generic way because the application may run on all
> kinds of environment including baremetal.
> >>> So they prefers to just call ioctl(SG_IO) against a storage device.
> >>>
>  I'm getting rather a different picture here than at first.  Previously
>  I thought you had some 3rd-party application, not under your control,
>  which expected to see this VPD data.
> 
>  But now I think that you're saying the application is under your own
>  control.  I don't understand why synthetic VPD data is the best way to
>  give your application the information it needs.
> 
>  What is the application doing with this VPD data ?  I mean,
>  which specific application functions, and how do they depend on the
>  VPD data ?
> 
> >>>
> >>> From the feedbacks I just got, they do *not* want the details to be in
> public.
> >>
> >> It is difficult to suggest how it should be done correctly without this
> >> information.
> >
> > Just think of it as a black box.
> 
> This isn't sufficient.
> 
> You are presenting a solution but have not properly described the
> problem so no one can evaluate whether the solution is appropriate.
> 
> >> I also find it difficult to see a use case where running the storage
> >> software in the guest (instead of in the backend) is sensible or desirable.
> >
> > Are you suggesting that doing backend drivers is not sensible?
> 
> I do not understand your question.
> 
> >>> Anyway, I think this is not a block of this patch.
> >>> In Windows PV block driver, we already use the same way to get the raw
> INQUIRY data.
> >>>  * The Windows PV block driver accepts ioctl(SG_IO).
> >>>  * Then it reads this /scsi/0x12/0x83 node.
> >>>  * Then return the raw INQURIY data back to ioctl.
> >>>
> >>> Since Linux guest also wants to do the same thing, let's making this
> mechanism to be a generic interface!
> >>> I'll post a patch adding ioctl(SG_IO) support to xen-blkfront together
> with a updated version of this patch soon.
> >>
> >> I do not think this feature is generally useful outside of this
> >> unspecified use case.  I do not think that supplying details about
> >> underlying storage device (beyond generic properties) to guests is
> >> sensible (e.g., what if the guest snapshot is restored on different
> >> storage?).
> >
> > The restore process (xl) can update the XenStore key with the new
> storage.
> 
> And how is this going to be communicated to the application using the data?
> 
> >> And thus I do not not think we should either: a) make this part of the
> >> blkif ABI; or b) add support to xen-blkfront or xen-blkback.
> >
> > It is already coded in Windows PV drivers so I am not following why
> > codyfing this in the blkif.h is harmful?
> 
> My understanding was that this was a legacy hack that should be removed
> (we do not currently make use of it).
>

AFAIK XenServer still very much makes use of it.

  Paul
 
> David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC PATCH] blkif.h: document scsi/0x12/0x83 node

2016-03-22 Thread David Vrabel

On 22/03/16 14:10, Konrad Rzeszutek Wilk wrote:
> On Tue, Mar 22, 2016 at 01:41:43PM +, David Vrabel wrote:
>> On 22/03/16 12:55, Bob Liu wrote:
>>>
>>> On 03/17/2016 07:12 PM, Ian Jackson wrote:
 David Vrabel writes ("Re: [Xen-devel] [RFC PATCH] blkif.h: document 
 scsi/0x12/0x83 node"):
> On 16/03/16 13:59, Bob Liu wrote:
>> But we'd like to get the VPD information(of underlying storage device) 
>> also in Linux blkfront, even blkfront is not a SCSI device.
>
> Why does blkback/blkfront need to involved here?  This is just some
> xenstore keys that can be written by the toolstack and directly read by
> the relevant application in the guest.

>>>
>>> They want a more generic way because the application may run on all kinds 
>>> of environment including baremetal.
>>> So they prefers to just call ioctl(SG_IO) against a storage device.
>>>
 I'm getting rather a different picture here than at first.  Previously
 I thought you had some 3rd-party application, not under your control,
 which expected to see this VPD data.

 But now I think that you're saying the application is under your own
 control.  I don't understand why synthetic VPD data is the best way to
 give your application the information it needs.

 What is the application doing with this VPD data ?  I mean,
 which specific application functions, and how do they depend on the
 VPD data ?

>>>
>>> From the feedbacks I just got, they do *not* want the details to be in 
>>> public.
>>
>> It is difficult to suggest how it should be done correctly without this
>> information.
> 
> Just think of it as a black box.

This isn't sufficient.

You are presenting a solution but have not properly described the
problem so no one can evaluate whether the solution is appropriate.

>> I also find it difficult to see a use case where running the storage
>> software in the guest (instead of in the backend) is sensible or desirable.
> 
> Are you suggesting that doing backend drivers is not sensible?

I do not understand your question.

>>> Anyway, I think this is not a block of this patch.
>>> In Windows PV block driver, we already use the same way to get the raw 
>>> INQUIRY data.
>>>  * The Windows PV block driver accepts ioctl(SG_IO).
>>>  * Then it reads this /scsi/0x12/0x83 node.
>>>  * Then return the raw INQURIY data back to ioctl.
>>>
>>> Since Linux guest also wants to do the same thing, let's making this 
>>> mechanism to be a generic interface!
>>> I'll post a patch adding ioctl(SG_IO) support to xen-blkfront together with 
>>> a updated version of this patch soon.
>>
>> I do not think this feature is generally useful outside of this
>> unspecified use case.  I do not think that supplying details about
>> underlying storage device (beyond generic properties) to guests is
>> sensible (e.g., what if the guest snapshot is restored on different
>> storage?).
> 
> The restore process (xl) can update the XenStore key with the new storage.

And how is this going to be communicated to the application using the data?

>> And thus I do not not think we should either: a) make this part of the
>> blkif ABI; or b) add support to xen-blkfront or xen-blkback.
> 
> It is already coded in Windows PV drivers so I am not following why
> codyfing this in the blkif.h is harmful?

My understanding was that this was a legacy hack that should be removed
(we do not currently make use of it).

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 01/28] xen/x86: Drop unused and non-useful feature definitions

2016-03-22 Thread Jan Beulich

>>> On 22.03.16 at 15:06,  wrote:
> On 3/15/16 10:34 AM, Andrew Cooper wrote:
>> None of these features are interesting for Xen to use, or to be advertised 
> to
>> guests.  Doing so identifies further areas of code which can be removed now
>> that 32bit support has been dropped.
>> 
>> IA64 has a sole user in microcode_intel.c.  While it is plausible for a 
> 32bit
>> x86 hypervisor to get there via IA64's x86 emulation, a 64bit x86 hypervisor
>> most certainly won't.
>> 
>> MP proves to be more complicated.  It is only advertised on some K7
>> processors, not on K8 or newer, and now listed as reserved in the AMD 
> manual.
>> Cleaning this up reveals two chunks of common SMP code which was only
>> applicable to K7 processors, which are 32bit only.
>> 
>> While cleaning this area up, remove the incosistent use of newlines in the
> 
> Konrad already pointed this out.
> 
>> cpu_has_* definition block.
>> 
>> Signed-off-by: Andrew Cooper 
>> ---
> 
> Reviewed-by: Doug Goldstein 
> 
> This patch seems to be a general improvement and doesn't cause other
> regressions without the rest of the series so can we merged this one.

Please check the staging tree before making such requests.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [V5] x86/xsaves: fix overwriting between non-lazy/lazy xsaves

2016-03-22 Thread Jan Beulich

>>> On 18.03.16 at 04:01,  wrote:
> v5: Address comments from Jan
> 1. Add XSTATE_XSAVES_ONLY and using xsaves depend on whether this bits are
>set in xcr0_accum
> 2. Change compress logic in compress_xsave_states() depend on 
>!(v->arch.xcr0_accum & XSTATE_XSAVES_ONLY) && !xsave_area_compressed(src)).
> 3. XSTATE_COMPACTION_ENABLED only set in xrstor().
> 4. Rebase the code on
>[V4] x86/xsaves: calculate the xstate_comp_offsets base on xstate_bv
>(already sent out) For they both change same code. 
>(I am not sure whether this rebase is ok or not).

Such re-basing is okay, but the dependency on the other patch
shouldn't get hidden in the revision log. Best would really be if this
was a series (consisting of both patches).

> @@ -222,22 +222,21 @@ void compress_xsave_states(struct vcpu *v, const void 
> *src, unsigned int size)
>  u64 xstate_bv = ((const struct xsave_struct *)src)->xsave_hdr.xstate_bv;
>  u64 valid;
>  
> -if ( !cpu_has_xsaves && !cpu_has_xsavec )
> +if ( !(v->arch.xcr0_accum & XSTATE_XSAVES_ONLY) &&
> + !xsave_area_compressed(src) )
>  {
>  memcpy(xsave, src, size);
>  return;
>  }
>  
> -ASSERT(!xsave_area_compressed(src));

This is bogus: The function here gets called only after
validate_xstate() already succeeded. Hence the ASSERT()
should imo simply get moved ahead of the if().

>  /*
>   * Copy legacy XSAVE area, to avoid complications with CPUID
>   * leaves 0 and 1 in the loop below.
>   */
>  memcpy(xsave, src, FXSAVE_SIZE);
>  
> -/* Set XSTATE_BV and XCOMP_BV.  */
> +/* Set XSTATE_BV.  */
>  xsave->xsave_hdr.xstate_bv = xstate_bv;
> -xsave->xsave_hdr.xcomp_bv = v->arch.xcr0_accum | 
> XSTATE_COMPACTION_ENABLED;
>  setup_xstate_comp(xstate_comp_offsets, xstate_bv);

I see you set xcomp_bv (and hence the compaction bit) in xrstor()
now, but afaict that doesn't allow you to completely drop initializing
the field here, as the code there looks at the compaction bit.

> @@ -267,31 +266,35 @@ void xsave(struct vcpu *v, uint64_t mask)
>  uint32_t hmask = mask >> 32;
>  uint32_t lmask = mask;
>  unsigned int fip_width = v->domain->arch.x87_fip_width;
> -#define XSAVE(pfx) \
> -alternative_io_3(".byte " pfx "0x0f,0xae,0x27\n", /* xsave */ \
> - ".byte " pfx "0x0f,0xae,0x37\n", /* xsaveopt */ \
> - X86_FEATURE_XSAVEOPT, \
> - ".byte " pfx "0x0f,0xc7,0x27\n", /* xsavec */ \
> - X86_FEATURE_XSAVEC, \
> - ".byte " pfx "0x0f,0xc7,0x2f\n", /* xsaves */ \
> - X86_FEATURE_XSAVES, \
> - "=m" (*ptr), \
> - "a" (lmask), "d" (hmask), "D" (ptr))
> +#define XSAVE(pfx, xsave_ins) \
> +asm volatile ( ".byte " pfx xsave_ins \
> +   : "=m" (*ptr) \
> +   : "a" (lmask), "d" (hmask), "D" (ptr) )
>  
>  if ( fip_width == 8 || !(mask & XSTATE_FP) )
>  {
> -XSAVE("0x48,");
> +if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY )
> +XSAVE("0x48,", "0x0f,0xc7,0x2f"); /* xsaves */
> +else if ( cpu_has_xsaveopt )
> +XSAVE("0x48,", "0x0f,0xae,0x37"); /* xsaveopt */
> +else
> +XSAVE("0x48,", "0x0f,0xae,0x27"); /* xsave */

The latter two should still use alternative insn patching.

>  }
>  else if ( fip_width == 4 )
>  {
> -XSAVE("");
> +if ( v->arch.xcr0_accum & XSTATE_XSAVES_ONLY )
> +XSAVE("", "0x0f,0xc7,0x2f");
> +else if ( cpu_has_xsaveopt )
> +XSAVE("", "0x0f,0xae,0x37");
> +else
> +XSAVE("", "0x0f,0xae,0x27");

And this logic being repeated here (and another time below) should
have made obvious that you want to leave the code here untouched
and do all of your changes just to the XSAVE() macro definition.

> @@ -378,25 +387,42 @@ void xrstor(struct vcpu *v, uint64_t mask)
>  switch ( __builtin_expect(ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET], 8) )
>  {
>  BUILD_BUG_ON(sizeof(faults) != 4); /* Clang doesn't support %z 
> in asm. */
> -#define XRSTOR(pfx) \
> -alternative_io("1: .byte " pfx "0x0f,0xae,0x2f\n" \
> +#define XRSTOR(pfx, xrstor_ins) \
> +asm volatile ( "1: .byte " pfx xrstor_ins"\n" \
> "3:\n" \
> "   .section .fixup,\"ax\"\n" \
> "2: incl %[faults]\n" \
> "   jmp 3b\n" \
> "   .previous\n" \
> -   _ASM_EXTABLE(1b, 2b), \
> -   ".byte " pfx "0x0f,0xc7,0x1f\n", \
> -   X86_FEATURE_XSAVES, \
> -   ASM_OUTPUT2([mem] "+m" (*ptr), [faults] "+g" 
> (faults)), \
> -   [lmask] "a" (lmask), [hmask] "d" (hmask), \
> -

Re: [Xen-devel] [PATCH 2/3] Rename p2m_mmio_write_dm to p2m_ioreq_server

2016-03-22 Thread George Dunlap

On 16/03/16 12:21, Yu Zhang wrote:
> Previously p2m type p2m_mmio_write_dm was introduced for write-
> protected memory pages whose write operations are supposed to be
> forwarded to and emulated by an ioreq server. Yet limitations of
> rangeset restricts the number of guest pages to be write-protected.
> 
> This patch replace the p2m type p2m_mmio_write_dm with a new name:
> p2m_ioreq_server, which means this p2m type can be claimed by one
> ioreq server, instead of being tracked inside the rangeset of ioreq
> server. Patches following up will add the related hvmop handling
> code which maps type p2m_ioreq_server to an ioreq server.
> 
> Signed-off-by: Paul Durrant 
> Signed-off-by: Yu Zhang 
> ---

[snip]

> @@ -174,7 +174,8 @@ typedef unsigned int p2m_query_t;
>  
>  #define p2m_is_any_ram(_t)  (p2m_to_mask(_t) &   \
>   (P2M_RAM_TYPES | P2M_GRANT_TYPES |  \
> -  p2m_to_mask(p2m_map_foreign)))
> +  p2m_to_mask(p2m_map_foreign) | \
> +  p2m_to_mask(p2m_ioreq_server)))

So this is the only bit that doesn't seem to be a straight rename.
What's the rationale for adding this in this patch?

And in any case, we want to add this to P2M_RAM_TYPES, don't we, so that
p2m_is_ram() returns true?

For example, if p2m_ioreq_server is *not* marked as p2m_is_ram(), then
when it's ballooned out the m2p mapping won't be updated properly (see
xen/arch/x86/mm/p2m.c:set_typed_p2m_entry()); and it looks like there
could be issues with emulation when running in shadow mode (see
xen/arch/x86/mm/common.c:emulate_gva_to_mfn()).  Other examples of this
sort of thing abound.

Everything else looks fine to me.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC PATCH] blkif.h: document scsi/0x12/0x83 node

2016-03-22 Thread Konrad Rzeszutek Wilk

On Tue, Mar 22, 2016 at 01:41:43PM +, David Vrabel wrote:
> On 22/03/16 12:55, Bob Liu wrote:
> > 
> > On 03/17/2016 07:12 PM, Ian Jackson wrote:
> >> David Vrabel writes ("Re: [Xen-devel] [RFC PATCH] blkif.h: document 
> >> scsi/0x12/0x83 node"):
> >>> On 16/03/16 13:59, Bob Liu wrote:
>  But we'd like to get the VPD information(of underlying storage device) 
>  also in Linux blkfront, even blkfront is not a SCSI device.
> >>>
> >>> Why does blkback/blkfront need to involved here?  This is just some
> >>> xenstore keys that can be written by the toolstack and directly read by
> >>> the relevant application in the guest.
> >>
> > 
> > They want a more generic way because the application may run on all kinds 
> > of environment including baremetal.
> > So they prefers to just call ioctl(SG_IO) against a storage device.
> > 
> >> I'm getting rather a different picture here than at first.  Previously
> >> I thought you had some 3rd-party application, not under your control,
> >> which expected to see this VPD data.
> >>
> >> But now I think that you're saying the application is under your own
> >> control.  I don't understand why synthetic VPD data is the best way to
> >> give your application the information it needs.
> >>
> >> What is the application doing with this VPD data ?  I mean,
> >> which specific application functions, and how do they depend on the
> >> VPD data ?
> >>
> > 
> > From the feedbacks I just got, they do *not* want the details to be in 
> > public.
> 
> It is difficult to suggest how it should be done correctly without this
> information.

Just think of it as a black box.
> 
> I also find it difficult to see a use case where running the storage
> software in the guest (instead of in the backend) is sensible or desirable.

Are you suggesting that doing backend drivers is not sensible?
> 
> > Anyway, I think this is not a block of this patch.
> > In Windows PV block driver, we already use the same way to get the raw 
> > INQUIRY data.
> >  * The Windows PV block driver accepts ioctl(SG_IO).
> >  * Then it reads this /scsi/0x12/0x83 node.
> >  * Then return the raw INQURIY data back to ioctl.
> > 
> > Since Linux guest also wants to do the same thing, let's making this 
> > mechanism to be a generic interface!
> > I'll post a patch adding ioctl(SG_IO) support to xen-blkfront together with 
> > a updated version of this patch soon.
> 
> I do not think this feature is generally useful outside of this
> unspecified use case.  I do not think that supplying details about
> underlying storage device (beyond generic properties) to guests is
> sensible (e.g., what if the guest snapshot is restored on different
> storage?).

The restore process (xl) can update the XenStore key with the new storage.
> 
> And thus I do not not think we should either: a) make this part of the
> blkif ABI; or b) add support to xen-blkfront or xen-blkback.

It is already coded in Windows PV drivers so I am not following why
codyfing this in the blkif.h is harmful?


> 
> David
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 01/28] xen/x86: Drop unused and non-useful feature definitions

2016-03-22 Thread Doug Goldstein

On 3/15/16 10:34 AM, Andrew Cooper wrote:
> None of these features are interesting for Xen to use, or to be advertised to
> guests.  Doing so identifies further areas of code which can be removed now
> that 32bit support has been dropped.
> 
> IA64 has a sole user in microcode_intel.c.  While it is plausible for a 32bit
> x86 hypervisor to get there via IA64's x86 emulation, a 64bit x86 hypervisor
> most certainly won't.
> 
> MP proves to be more complicated.  It is only advertised on some K7
> processors, not on K8 or newer, and now listed as reserved in the AMD manual.
> Cleaning this up reveals two chunks of common SMP code which was only
> applicable to K7 processors, which are 32bit only.
> 
> While cleaning this area up, remove the incosistent use of newlines in the

Konrad already pointed this out.

> cpu_has_* definition block.
> 
> Signed-off-by: Andrew Cooper 
> ---

Reviewed-by: Doug Goldstein 

This patch seems to be a general improvement and doesn't cause other
regressions without the rest of the series so can we merged this one.


-- 
Doug Goldstein



signature.asc
Description: OpenPGP digital signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [libvirt test] 86883: tolerable FAIL - PUSHED

2016-03-22 Thread osstest service owner

flight 86883 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/86883/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-qcow2 13 guest-saverestorefail never pass
 test-armhf-armhf-libvirt-raw 13 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  dc56475f72b2c220443154b33fee221099eeec7f
baseline version:
 libvirt  9a0c7f5f834185db9017c34aabc03ad99cf37bed

Last test of basis86536  2016-03-18 04:25:19 Z4 days
Failing since 86625  2016-03-19 04:23:22 Z3 days4 attempts
Testing same since86883  2016-03-22 04:23:22 Z0 days1 attempts


People who touched revisions under test:
  Bjoern Walk 
  Christophe Fergeau 
  Cole Robinson 
  Cristian Klein 
  Jim Fehlig 
  Jiri Denemark 
  John Ferlan 
  Martin Kletzander 
  Michal Privoznik 
  Richard Laager 
  Roman Bogorodskiy 

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-armhf-armhf-libvirt-xsm fail
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-armhf-armhf-libvirt fail
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-i386-libvirt-pair pass
 test-armhf-armhf-libvirt-qcow2   fail
 test-armhf-armhf-libvirt-raw fail
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=libvirt
+ revision=dc56475f72b2c220443154b33fee221099eeec7f
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or

Re: [Xen-devel] [PATCH] libxc/x86: XSAVE related adjustments

2016-03-22 Thread Andrew Cooper

On 22/03/16 13:05, Jan Beulich wrote:
> - don't unintentionally increase features reported by sub-leaf 0
>   EDX:EAX
> - don't discard the known flags in sub-leaves 2..63 ECX
> - handle components 32...62 (EDX) in sub-leaf 1 consistently with
>   0...31 (ECX)
> - zap sub-leaves beyond 62
>
> Signed-off-by: Jan Beulich 
> ---
> While obviously requiring re-basing on either end when taking Andrew's
> CPUID levelling series into account, the changes done here appear to
> be orthogonal to those done in his series.
>
> --- a/tools/libxc/xc_cpuid_x86.c
> +++ b/tools/libxc/xc_cpuid_x86.c
> @@ -281,10 +281,14 @@ static void intel_xc_cpuid_policy(xc_int
>  }
>  }
>  
> +/* Leaf 1, EAX: */
>  #define XSAVEOPT(1 << 0)
>  #define XSAVEC  (1 << 1)
>  #define XGETBV1 (1 << 2)
>  #define XSAVES  (1 << 3)

Hmm - I should convert these to be X86_FEATURESET_xxx values.

> +/* Leaves beyond 1, ECX: */
> +#define XSTATE_XSS  (1 << 0)
> +#define XSTATE_ALIGN64  (1 << 1)
>  /* Configure extended state enumeration leaves (0x000D for xsave) */
>  static void xc_cpuid_config_xsave(xc_interface *xch,
>const struct cpuid_domain_info *info,
> @@ -300,9 +304,9 @@ static void xc_cpuid_config_xsave(xc_int
>  {
>  case 0: 
>  /* EAX: low 32bits of xfeature_enabled_mask */
> -regs[0] = info->xfeature_mask & 0x;
> +regs[0] &= info->xfeature_mask;
>  /* EDX: high 32bits of xfeature_enabled_mask */
> -regs[3] = (info->xfeature_mask >> 32) & 0x;
> +regs[3] &= info->xfeature_mask >> 32;
>  /* ECX: max size required by all HW features */
>  {
>  unsigned int _input[2] = {0xd, 0x0}, _regs[4];

This is an improvement on the code currently present, but is still
superseded by the final patch of my cpuid series.

> @@ -325,16 +329,20 @@ static void xc_cpuid_config_xsave(xc_int

Between these two hunks, there is a loop bound which is also wrong.

>  if ( !info->hvm )
>  regs[0] &= ~XSAVES;
>  regs[2] &= info->xfeature_mask;
> -regs[3] = 0;
> +regs[3] &= info->xfeature_mask >> 32;
>  break;
> -case 2 ... 63: /* sub-leaves */
> +case 2 ... 62: /* per-component sub-leaves */
>  if ( !(info->xfeature_mask & (1ULL << input[1])) )

Now I think about it, this check is incomplete.  xfeature_mask doesn't
contain xss values.

For now its fine, but it will cause problems when support for Processor
Trace is added.

>  {
>  regs[0] = regs[1] = regs[2] = regs[3] = 0;
>  break;
>  }
>  /* Don't touch EAX, EBX. Also cleanup ECX and EDX */
> -regs[2] = regs[3] = 0;
> +regs[2] &= XSTATE_XSS | XSTATE_ALIGN64;
> +regs[3] = 0;
> +break;
> +default:
> +regs[0] = regs[1] = regs[2] = regs[3] = 0;
>  break;

If you wish, I can fold this patch into the final patch of my cpuid series.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

1 2 >

1 - 100 of 154 matches

Mail list logo