Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-08 Thread Ian Campbell
On Wed, 2015-07-08 at 17:06 +0800, Chen, Tiejun wrote:

> #2. Don't expose "ignore" to user and just keep "host" as the default
> 
> He told me he would discuss this with you, but sounds he didn't do this, 
> or I'm missing something here?

My question was regarding how xl rdm="type=none" differed from not
saying anything (i.e. getting the default). You explained that this was
useful to allow the default to be changed, which I agreed with.

The question regarding the actually naming of the options at either the
xl level or the libxl (which seems to be what Ian J's comments were on)
are orthogonal to the question of whether there should be a way to
explicitly ask for the default (as opposed to implicitly asking for it
by omission of the option).

Ian.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-08 Thread Chen, Tiejun

I'll update this next revision. And also rephrase this doc to address
your comments below.


FTR I think I indicated yesterday that I was satisfied with your
explanation for why type=none exists as an option even at the xl level,
namely that it allows us to change the default in the future.



Campbell,

Jackson had some different comments at this point,

#1. Rename "type" to "strategy " and "none" to "ignore"

--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -76,6 +76,17 @@ libxl_domain_type = Enumeration("domain_type", [
 (2, "PV"),
 ], init_val = "LIBXL_DOMAIN_TYPE_INVALID")

+libxl_rdm_reserve_strategy = Enumeration("rdm_reserve_strategy", [
+(0, "ignore"),
+(1, "host"),
+])
+
...
 libxl_channel_connection = Enumeration("channel_connection", [
 (0, "UNKNOWN"),
 (1, "PTY"),
@@ -369,6 +380,11 @@ libxl_vnode_info = Struct("vnode_info", [
 ("vcpus", libxl_bitmap), # vcpus in this node
 ])

+libxl_rdm_reserve = Struct("rdm_reserve", [
+("strategy",libxl_rdm_reserve_strategy),
+("reserve", libxl_rdm_reserve_flag),
+])
+

#2. Don't expose "ignore" to user and just keep "host" as the default

He told me he would discuss this with you, but sounds he didn't do this, 
or I'm missing something here?


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-08 Thread Ian Campbell
On Wed, 2015-07-08 at 08:54 +0800, Chen, Tiejun wrote:
> >> +"none" is the default value and it means we don't check any reserved 
> >> regions
> >> +and then all rdm policies would be ignored. Guest just works as before and
> >> +the conflict of RDM and guest address space wouldn't be handled, and then
> >> +this may result in the associated device not being able to work or even 
> >> crash
> >> +the VM. So if you're assigning this kind of device, this option is not
> >> +recommended unless you can make sure any conflict doesn't exist.
> >> +
> >
> > One issue didn't come to conclusion during last round of review. Ian was
> > asking what's the difference with type=none vs not specifying rdm option
> > at all.
> >
> > You need to either convince Ian or remove "type=none" in *xl* level.
> > I.e. don't touch the libxl IDL. It still needs a none type.
> 
> I'll update this next revision. And also rephrase this doc to address 
> your comments below.

FTR I think I indicated yesterday that I was satisfied with your
explanation for why type=none exists as an option even at the xl level,
namely that it allows us to change the default in the future.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-07 Thread Chen, Tiejun

+"none" is the default value and it means we don't check any reserved regions
+and then all rdm policies would be ignored. Guest just works as before and
+the conflict of RDM and guest address space wouldn't be handled, and then
+this may result in the associated device not being able to work or even crash
+the VM. So if you're assigning this kind of device, this option is not
+recommended unless you can make sure any conflict doesn't exist.
+


One issue didn't come to conclusion during last round of review. Ian was
asking what's the difference with type=none vs not specifying rdm option
at all.

You need to either convince Ian or remove "type=none" in *xl* level.
I.e. don't touch the libxl IDL. It still needs a none type.


I'll update this next revision. And also rephrase this doc to address 
your comments below.


Thanks
Tiejun




+For example, you're trying to set "memory = 2800" to allocate memory to one
+given VM but the platform owns two RDM regions like,
+
+RMRR region: base_addr ac6d3000 end_address ac6e6fff
+RMRR region: base_addr ad80 end_address afff
+
+In this conflict case,
+
+#1. If the type options is set with "none",
+


If B is set to "none", for example,


+rdm = "type=none,reserve=strict" or rdm = "type=none,reserve=relaxed"
+
+mean we don't handle any conflict just to make VM keep running as before.
+Note this is our default behavior.
+


It means we don't handle any conflict to make VM run as before. This is
the default behavior.


+#2. If the type options is set with "host",
+


If B is set to "host", for example,


+rdm = "type=host,reserve=strict" or rdm = "type=host,reserve=relaxed"
+
+mean all conflict would be handled according to our policies which is
+introduced by the reserve option as described below.
+


It means all conflicts will be handled according to the policy
introduced by B as described below.


+=item B
+
+Specifies how to deal with conflicts discovered when reserving reserved device
+memory in the guest address space.
+


Specifies how to deal with conflicts when reserving reserved device
memory in guest address space.


+When that conflict is unsolved,
+
+"strict" means this VM can't be created successfully, or the associated device
+can't be attached in the case of hotplug;
+


"strict" means VM can't be created, or the associated device can't be
attached in the case of hotplug.


+"relaxed" allows a VM to be created to keep running with a warning message
+thrown out. But this may crash this VM if this device accesses RDM. For 
example,
+Windows IGD GFX driver always access these regions so this lead to a blue 
screen
+to crash VM in such a case.
+


"relaxed" allows VM to be created but may cause VM to crash if
pass-through device accesses RDM. For exampl,e Windows IGD GFX driver
always accessed RDM regions so it leads to VM crash.

Wei.




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-07 Thread Chen, Tiejun

On 2015/7/8 1:08, Ian Jackson wrote:

Chen, Tiejun writes ("Re: [v5][PATCH 10/16] tools: introduce some new parameters to 
set rdm policy"):

Its always fine to me but I just think, is it a good time to start to
seek another *optional* approach to overturn current design and
implementation ? Unless you're very sure we're doing something wrong. I
noticed you should be CCed when we posted this associated design.


I don't think I'm trying to overturn the design.  I have read the
design documents and they don't go into this kind of detail about the
libxl API and the xl configuration interface.


Understood.

This design just started concerning everything in high level since RMRR 
is complicated, and its hard to go into details at that moment. And in 
fact this is one reason why we posted two RFC revisions. You know, its 
better to finalize that design based on actual codes.




Questions of defaults (and of exact API names) are matters of detail.

   > But then I also don't understand why your comment "the hypervisor
   > or tools can't prevent from accessing RDM by a VM" explains why
   > "none" is a good default.

   I mean if you don't set these mappings, these devices can't work at all
   and then crash VM like IGD passthrough. But I'm also saying we don't
   pass through any devices in most cases, and those devices which own RDM
   are very rare. So why should we set 'none' to Xen by default?

(I guess you mean "why _shouldn't we".)


Yeah, this is my typo. (I corrected this on another ensuing email :) )



The answer I would give is that defaults should be safe.  That is, the


safe and efficient to most cases. As I said previously, the RDM problem 
rarely occurs in real world.



default configuration settings should not lead to uncontrolled memory
accesses or crashes, even in less common setups.


Yes.



If the default were changed to `host', what would go wrong ?



Some behaviors are different.


AFAICT nothing would change for guests which do not have devices
assigned at build time (and which haven't had `rdm' set).


This seems not be correct partially.

More precisely, we have two conditions approaching to our policy,

#1. "host" means we always concern all RDMs resided on this host, no 
matter if either you're passing through those devices associated to 
RDMs, or you don't pass though any devices.


#2. "!host" means we just check if we're passing through some devices 
owning RDM. If yes, we would step into our policy just to each 
per-device rdm. If not, nothing is changed.




And, AFACIT, if there are no devices which advertise these RMRRs,
again, there is no difference.



Just see above.


So the only difference occurs when 1. a guest is configured for device


In the case of "host",  this is not a precondition to handle RDMs. 
Instead, "host" indicates we *always" handle RDM issues.



assginemnt 2. some device on the system (perhaps not the one being
assigned) has an RMRR.



#2 is right.

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-07 Thread Ian Jackson
Chen, Tiejun writes ("Re: [v5][PATCH 10/16] tools: introduce some new 
parameters to set rdm policy"):
> Its always fine to me but I just think, is it a good time to start to 
> seek another *optional* approach to overturn current design and 
> implementation ? Unless you're very sure we're doing something wrong. I 
> noticed you should be CCed when we posted this associated design.

I don't think I'm trying to overturn the design.  I have read the
design documents and they don't go into this kind of detail about the
libxl API and the xl configuration interface.

Questions of defaults (and of exact API names) are matters of detail.

  > But then I also don't understand why your comment "the hypervisor
  > or tools can't prevent from accessing RDM by a VM" explains why
  > "none" is a good default.

  I mean if you don't set these mappings, these devices can't work at all 
  and then crash VM like IGD passthrough. But I'm also saying we don't 
  pass through any devices in most cases, and those devices which own RDM 
  are very rare. So why should we set 'none' to Xen by default?

(I guess you mean "why _shouldn't we".)

The answer I would give is that defaults should be safe.  That is, the
default configuration settings should not lead to uncontrolled memory
accesses or crashes, even in less common setups.

If the default were changed to `host', what would go wrong ?

AFAICT nothing would change for guests which do not have devices
assigned at build time (and which haven't had `rdm' set).

And, AFACIT, if there are no devices which advertise these RMRRs,
again, there is no difference.

So the only difference occurs when 1. a guest is configured for device
assginemnt 2. some device on the system (perhaps not the one being
assigned) has an RMRR.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-07 Thread Chen, Tiejun

I mean if you don't set these mappings, these devices can't work at all
and then crash VM like IGD passthrough. But I'm also saying we don't
pass through any devices in most cases, and those devices which own RDM
are very rare. So why should we set 'none' to Xen by default?


One typo, s/should/shouldn't/

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-07 Thread Chen, Tiejun

On 2015/7/7 22:40, Ian Jackson wrote:

Chen, Tiejun writes ("Re: [v5][PATCH 10/16] tools: introduce some new parameters to 
set rdm policy"):

On 2015/7/7 21:26, Ian Jackson wrote:

Is "none" not "hoping the user can ignore the problem" ?


Its impossible since the hypervisor or tools can't prevent from
accessing RDM by a VM. So as I said early, "none" is just suitable to
two cases,

#1. Those devices don't own any RDM
#2. Guest OS doesn't access RDM

Compared to other cases, these two cases are more popular in real world
and actual usage. So we'd like to keep "none" as a default.


I have read your 00/ description, and these two emails:
  http://lists.xen.org/archives/html/xen-devel/2015-01/msg01580.html
  http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg00524.html
I have also reread the documentation you provide in this patch.

I'm afraid I still don't understand why it is safe for the default to
be `none'.  My view is that the default setting should avoid a
possibility of memory corruption or system malfunction.


RMRR is used to pass through a device. And in this case this require 
this sort of 1:1 mapping. And especially RMRR is always masked as 
RESERVED in e820 table. So originally, any VM can't create these 
mappings, right? So as I said, if these devices you're trying to pass 
though don't own RDM or you don't pass through any devices at all, so 
there's no any memory corruption.




Your description in the document says this:

  +"none" is the default value and it means we don't check any reserved
  +regions and then all rdm policies would be ignored. Guest just works
  +as before and the conflict of RDM and guest address space wouldn't
  +be handled, and then this may result in the associated device not
  +being able to work or even crash the VM. So if you're assigning this
  +kind of device, this option is not recommended unless you can make
  +sure any conflict doesn't exist.

So you do not recommend the use of `none', however you make it the
default.

I'm afraid also that I don't quite understand the interaction between
none-vs-host on the one hand and strict-vs-relaxed on the other.  The
documentation would suggest that the only difference between
type=none
and
type=host,reserve=relaxed
is that the latter may print some extra warning messages.  But the
code appears to do a lot of work to move guest memory about, when
type=none is specified.


Also I don't understand this:


Its impossible since the hypervisor or tools can't prevent from
accessing RDM by a VM. So as I said early, "none" is just suitable to
two cases,


Perhaps I am missing something here.

The hypervisor can obviously prevent a VM from accessing RDM by not
setting up a mapping for it.  The problem is then that the VM might
try to make the access anyway, and then crash or malfunction.  But


Yes.


presumably the VM can be instructed via the E820 or some such not to
access these regions.


As I said above we need to create them as 1:1.



For a VM which has been given passthrough access to a device which
does DMA things are more complicated but again I think the hypervisor
and tools should be able to deny accesses using the iommu tables.


RMRR is a special case, and we have to set these mapping as 1:1. This is 
why we introduce these patches to make sure RMRR don't overlap normal 
RAM and MMIO.




But then I also don't understand why your comment "the hypervisor or
tools can't prevent from accessing RDM by a VM" explains why "none" is
a good default.


I mean if you don't set these mappings, these devices can't work at all 
and then crash VM like IGD passthrough. But I'm also saying we don't 
pass through any devices in most cases, and those devices which own RDM 
are very rare. So why should we set 'none' to Xen by default?





Sorry if I'm being dense.



Ian,

Its always fine to me but I just think, is it a good time to start to 
seek another *optional* approach to overturn current design and 
implementation ? Unless you're very sure we're doing something wrong. I 
noticed you should be CCed when we posted this associated design.


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-07 Thread Ian Jackson
Chen, Tiejun writes ("Re: [v5][PATCH 10/16] tools: introduce some new 
parameters to set rdm policy"):
> On 2015/7/7 21:26, Ian Jackson wrote:
> > Is "none" not "hoping the user can ignore the problem" ?
> 
> Its impossible since the hypervisor or tools can't prevent from 
> accessing RDM by a VM. So as I said early, "none" is just suitable to 
> two cases,
> 
> #1. Those devices don't own any RDM
> #2. Guest OS doesn't access RDM
> 
> Compared to other cases, these two cases are more popular in real world 
> and actual usage. So we'd like to keep "none" as a default.

I have read your 00/ description, and these two emails:
 http://lists.xen.org/archives/html/xen-devel/2015-01/msg01580.html
 http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg00524.html
I have also reread the documentation you provide in this patch.

I'm afraid I still don't understand why it is safe for the default to
be `none'.  My view is that the default setting should avoid a
possibility of memory corruption or system malfunction.

Your description in the document says this:

 +"none" is the default value and it means we don't check any reserved
 +regions and then all rdm policies would be ignored. Guest just works
 +as before and the conflict of RDM and guest address space wouldn't
 +be handled, and then this may result in the associated device not
 +being able to work or even crash the VM. So if you're assigning this
 +kind of device, this option is not recommended unless you can make
 +sure any conflict doesn't exist.

So you do not recommend the use of `none', however you make it the
default.

I'm afraid also that I don't quite understand the interaction between
none-vs-host on the one hand and strict-vs-relaxed on the other.  The
documentation would suggest that the only difference between
   type=none
and
   type=host,reserve=relaxed
is that the latter may print some extra warning messages.  But the
code appears to do a lot of work to move guest memory about, when
type=none is specified.


Also I don't understand this:

> Its impossible since the hypervisor or tools can't prevent from 
> accessing RDM by a VM. So as I said early, "none" is just suitable to 
> two cases,

Perhaps I am missing something here.

The hypervisor can obviously prevent a VM from accessing RDM by not
setting up a mapping for it.  The problem is then that the VM might
try to make the access anyway, and then crash or malfunction.  But
presumably the VM can be instructed via the E820 or some such not to
access these regions.

For a VM which has been given passthrough access to a device which
does DMA things are more complicated but again I think the hypervisor
and tools should be able to deny accesses using the iommu tables.

But then I also don't understand why your comment "the hypervisor or
tools can't prevent from accessing RDM by a VM" explains why "none" is
a good default.


Sorry if I'm being dense.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-07 Thread Ian Jackson
Chen, Tiejun writes ("Re: [v5][PATCH 10/16] tools: introduce some new 
parameters to set rdm policy"):
> On 2015/7/7 21:26, Ian Jackson wrote:
> > I was suggesting (in text that you have snipped) that "none" in the
> > API should be remamed "ignore".
> 
> So sounds you're saying these two changes,

I think so, yes.

But I think I need to go away and read your 00/ message and the design
document.  I still feel confused.  I will get back to you RSN.

> > Is "none" not "hoping the user can ignore the problem" ?
> 
> Its impossible since the hypervisor or tools can't prevent from 
> accessing RDM by a VM. So as I said early, "none" is just suitable to 
> two cases,
> 
> #1. Those devices don't own any RDM
> #2. Guest OS doesn't access RDM
> 
> Compared to other cases, these two cases are more popular in real world 
> and actual usage. So we'd like to keep "none" as a default.

Hopefully this answer will make more sense to me after I have read the
design info.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-07 Thread Chen, Tiejun

On 2015/7/7 21:26, Ian Jackson wrote:

Chen, Tiejun writes ("Re: [v5][PATCH 10/16] tools: introduce some new parameters to 
set rdm policy"):

[Later:]

As I discussed with Campbell we'd like not to expose "none" in xl level
since this is equivalent to that case we don't set anything.


I think this observation of mine applies to the libxl API level too.


Sorry I don't know what I should do at this point.


I was suggesting (in text that you have snipped) that "none" in the
API should be remamed "ignore".


So sounds you're saying these two changes,

@@ -77,7 +77,7 @@ libxl_domain_type = Enumeration("domain_type", [
 ], init_val = "LIBXL_DOMAIN_TYPE_INVALID")

 libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [
-(0, "none"),
+(0, "ignore"),
 (1, "host"),
 ])

@@ -381,8 +381,8 @@ libxl_vnode_info = Struct("vnode_info", [
 ])

 libxl_rdm_reserve = Struct("rdm_reserve", [
-("type",libxl_rdm_reserve_type),
-("reserve",   libxl_rdm_reserve_flag),
+("strategy",libxl_rdm_reserve_type),
+("reserve", libxl_rdm_reserve_flag),
 ])

 libxl_domain_build_info = Struct("domain_build_info",[

Right?




This suggests that the default is "do the dangerous thing".  That
doesn't seem right.


As I discussed with Campbell we'd like not to expose "none" in xl level
since this is equivalent to that case we don't set anything.


That's not really an answer to what I have said, I think.

Why is the default the option that the documentation recommends to
avoid ?


I mean not all devices really needs this option and actually these
devices are very rare. Currently just IGD GFX needs this consideration
so we'd like to make "none" as a default value.


What happens if "host" is used as a default with other devices ?


"host" just means we should concern all RDMS no matter if these devices 
own any RDM, and even actually you don't pass though any devices.



AFAICT such other decices do not have any RDM so "host" would work
fine.


Yes but its really not pointless to keep setting host if you already 
make sure nothing is involved to RDM. So why we still set 'host' to cost 
those instructions cycles in this case? I don't think its not better to 
set 'host' by default.





"none" means we have a chance to work as before since not all devices
own RDM. But as I said above this is same as !rdm.


Are we expecting many existing devices, and existing setups, to break
if we don't make the default be to ignore the problem ?


If we don't set anything we don't hope you can ignore this existing problem.


Is "none" not "hoping the user can ignore the problem" ?


Its impossible since the hypervisor or tools can't prevent from 
accessing RDM by a VM. So as I said early, "none" is just suitable to 
two cases,


#1. Those devices don't own any RDM
#2. Guest OS doesn't access RDM

Compared to other cases, these two cases are more popular in real world 
and actual usage. So we'd like to keep "none" as a default.


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-07 Thread Ian Jackson
Chen, Tiejun writes ("Re: [v5][PATCH 10/16] tools: introduce some new 
parameters to set rdm policy"):
> > [Later:]
> >> As I discussed with Campbell we'd like not to expose "none" in xl level
> >> since this is equivalent to that case we don't set anything.
> >
> > I think this observation of mine applies to the libxl API level too.
> 
> Sorry I don't know what I should do at this point.

I was suggesting (in text that you have snipped) that "none" in the
API should be remamed "ignore".

> >>> This suggests that the default is "do the dangerous thing".  That
> >>> doesn't seem right.
> >>
> >> As I discussed with Campbell we'd like not to expose "none" in xl level
> >> since this is equivalent to that case we don't set anything.
> >
> > That's not really an answer to what I have said, I think.
> >
> > Why is the default the option that the documentation recommends to
> > avoid ?
> 
> I mean not all devices really needs this option and actually these 
> devices are very rare. Currently just IGD GFX needs this consideration 
> so we'd like to make "none" as a default value.

What happens if "host" is used as a default with other devices ?
AFAICT such other decices do not have any RDM so "host" would work
fine.

> >> "none" means we have a chance to work as before since not all devices
> >> own RDM. But as I said above this is same as !rdm.
> >
> > Are we expecting many existing devices, and existing setups, to break
> > if we don't make the default be to ignore the problem ?
> 
> If we don't set anything we don't hope you can ignore this existing problem.

Is "none" not "hoping the user can ignore the problem" ?

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-07 Thread Chen, Tiejun

[Later:]

As I discussed with Campbell we'd like not to expose "none" in xl level
since this is equivalent to that case we don't set anything.


I think this observation of mine applies to the libxl API level too.


Sorry I don't know what I should do at this point.





Default per-device RDM policy is 'strict', while default global
RDM policy is 'relaxed'. And the per-device policy would override
the global policy like others.


I don't think these two sentences can both be true.  If the per-device
policy overrides the global policy, then the per-device policy must
have at least three values: `strict', `relaxed', and `not set'
(implicitly using global policy).


Yeah. The per-device policy is first set as the global policy. This
behavior is same as other pci option. Please see the last patch,


OK, then can you please amend the commit message to be true :-).


Sure.





This suggests that the default is "do the dangerous thing".  That
doesn't seem right.


As I discussed with Campbell we'd like not to expose "none" in xl level
since this is equivalent to that case we don't set anything.


That's not really an answer to what I have said, I think.

Why is the default the option that the documentation recommends to
avoid ?


I mean not all devices really needs this option and actually these 
devices are very rare. Currently just IGD GFX needs this consideration 
so we'd like to make "none" as a default value.





"none" means we have a chance to work as before since not all devices
own RDM. But as I said above this is same as !rdm.


Are we expecting many existing devices, and existing setups, to break
if we don't make the default be to ignore the problem ?



If we don't set anything we don't hope you can ignore this existing problem.

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-07 Thread Ian Jackson
Chen, Tiejun writes ("Re: [v5][PATCH 10/16] tools: introduce some new 
parameters to set rdm policy"):
> [Ian Jackson:]
> > As I understand this feature, I don't think the name `type' is right.
> > This is a method for handling or overriding/ignoring the RDM problem.
> >
> > Perhaps `handling=' or `strategy=' ?
> 
> Okay, I prefer to use 'strategy='.

(We are discussing this on irc.)

> > And `none' seems quite an innocuous name for such a dangerous
> > setting.  How about `ignore' ?
...

[Later:]
> As I discussed with Campbell we'd like not to expose "none" in xl level 
> since this is equivalent to that case we don't set anything.

I think this observation of mine applies to the libxl API level too.


> >> Default per-device RDM policy is 'strict', while default global
> >> RDM policy is 'relaxed'. And the per-device policy would override
> >> the global policy like others.
> >
> > I don't think these two sentences can both be true.  If the per-device
> > policy overrides the global policy, then the per-device policy must
> > have at least three values: `strict', `relaxed', and `not set'
> > (implicitly using global policy).
> 
> Yeah. The per-device policy is first set as the global policy. This 
> behavior is same as other pci option. Please see the last patch,

OK, then can you please amend the commit message to be true :-).


> > This suggests that the default is "do the dangerous thing".  That
> > doesn't seem right.
> 
> As I discussed with Campbell we'd like not to expose "none" in xl level 
> since this is equivalent to that case we don't set anything.

That's not really an answer to what I have said, I think.

Why is the default the option that the documentation recommends to
avoid ?

> "none" means we have a chance to work as before since not all devices 
> own RDM. But as I said above this is same as !rdm.

Are we expecting many existing devices, and existing setups, to break
if we don't make the default be to ignore the problem ?

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-07 Thread Chen, Tiejun

Global RDM parameter, "type", allows user to specify reserved regions
explicitly, e.g. using 'host' to include all reserved regions reported


As I understand this feature, I don't think the name `type' is right.
This is a method for handling or overriding/ignoring the RDM problem.

Perhaps `handling=' or `strategy=' ?


Okay, I prefer to use 'strategy='.



And `none' seems quite an innocuous name for such a dangerous
setting.  How about `ignore' ?


'strict/relaxed' policy decides how to handle conflict when reserving RDM
regions in pfn space. If conflict exists, 'strict' means an immediate error
so VM can't keep running, while 'relaxed' allows moving forward with a
warning message thrown out.

Default per-device RDM policy is 'strict', while default global RDM policy
is 'relaxed'. And the per-device policy would override the global policy like
others.


I don't think these two sentences can both be true.  If the per-device
policy overrides the global policy, then the per-device policy must
have at least three values: `strict', `relaxed', and `not set'
(implicitly using global policy).


Yeah. The per-device policy is first set as the global policy. This 
behavior is same as other pci option. Please see the last patch,


@@ -1937,6 +1945,11 @@ skip_vfb:
 pcidev->power_mgmt = pci_power_mgmt;
 pcidev->permissive = pci_permissive;
 pcidev->seize = pci_seize;
+/*
+ * Like other pci option, the per-device policy always follows
+ * the global policy by default.
+ */
+pcidev->rdm_reserve = b_info->u.hvm.rdm.reserve;
 if (!xlu_pci_parse_bdf(config, pcidev, buf))
 d_config->num_pcidevs++;
 }



Perhaps I have misunderstood.


diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..a2dc343 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,83 @@ assigned slave device.

...

+=item B


The documentation on this is rather vague about what the default is.
But you do say:


+In this conflict case,
+
+#1. If the type options is set with "none",
+
+rdm = "type=none,reserve=strict" or rdm = "type=none,reserve=relaxed"
+
+mean we don't handle any conflict just to make VM keep running as before.
+Note this is our default behavior.


This suggests that the default is "do the dangerous thing".  That
doesn't seem right.


As I discussed with Campbell we'd like not to expose "none" in xl level 
since this is equivalent to that case we don't set anything.





+#2. If the type options is set with "host",
+
+rdm = "type=host,reserve=strict" or rdm = "type=host,reserve=relaxed"

...

+"relaxed" allows a VM to be created to keep running with a warning message
+thrown out. But this may crash this VM if this device accesses RDM. For 
example,
+Windows IGD GFX driver always access these regions so this lead to a blue 
screen
+to crash VM in such a case.


The difference between
type=host,reserve=relaxed
and
type=none
is simply whether a warning is printed ?



"none" means we have a chance to work as before since not all devices 
own RDM. But as I said above this is same as !rdm.


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-07 Thread Ian Jackson
Tiejun Chen writes ("[v5][PATCH 10/16] tools: introduce some new parameters to 
set rdm policy"):
> This patch introduces user configurable parameters to specify RDM
> resource and according policies,
...
> Global RDM parameter, "type", allows user to specify reserved regions
> explicitly, e.g. using 'host' to include all reserved regions reported

As I understand this feature, I don't think the name `type' is right.
This is a method for handling or overriding/ignoring the RDM problem.

Perhaps `handling=' or `strategy=' ?

And `none' seems quite an innocuous name for such a dangerous
setting.  How about `ignore' ?

> 'strict/relaxed' policy decides how to handle conflict when reserving RDM
> regions in pfn space. If conflict exists, 'strict' means an immediate error
> so VM can't keep running, while 'relaxed' allows moving forward with a
> warning message thrown out.
> 
> Default per-device RDM policy is 'strict', while default global RDM policy
> is 'relaxed'. And the per-device policy would override the global policy like
> others.

I don't think these two sentences can both be true.  If the per-device
policy overrides the global policy, then the per-device policy must
have at least three values: `strict', `relaxed', and `not set'
(implicitly using global policy).

Perhaps I have misunderstood.

> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> index a3e0e2e..a2dc343 100644
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -655,6 +655,83 @@ assigned slave device.
...
> +=item B

The documentation on this is rather vague about what the default is.
But you do say:

> +In this conflict case,
> +
> +#1. If the type options is set with "none",
> +
> +rdm = "type=none,reserve=strict" or rdm = "type=none,reserve=relaxed"
> +
> +mean we don't handle any conflict just to make VM keep running as before.
> +Note this is our default behavior.

This suggests that the default is "do the dangerous thing".  That
doesn't seem right.

> +#2. If the type options is set with "host",
> +
> +rdm = "type=host,reserve=strict" or rdm = "type=host,reserve=relaxed"
...
> +"relaxed" allows a VM to be created to keep running with a warning message
> +thrown out. But this may crash this VM if this device accesses RDM. For 
> example,
> +Windows IGD GFX driver always access these regions so this lead to a blue 
> screen
> +to crash VM in such a case.

The difference between
   type=host,reserve=relaxed
and
   type=none
is simply whether a warning is printed ?

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-07 Thread Wei Liu
On Tue, Jul 07, 2015 at 02:17:11PM +0800, Tiejun Chen wrote:
> This patch introduces user configurable parameters to specify RDM
> resource and according policies,
> 
> Global RDM parameter:
> rdm = "type=none/host,reserve=strict/relaxed"
> Per-device RDM parameter:
> pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]
> 
> Global RDM parameter, "type", allows user to specify reserved regions
> explicitly, e.g. using 'host' to include all reserved regions reported
> on this platform which is good to handle hotplug scenario. In the future
> this parameter may be further extended to allow specifying random regions,
> e.g. even those belonging to another platform as a preparation for live
> migration with passthrough devices. Instead, 'none' means we have nothing
> to do all reserved regions and ignore all policies, so guest work as before.
> 
> 'strict/relaxed' policy decides how to handle conflict when reserving RDM
> regions in pfn space. If conflict exists, 'strict' means an immediate error
> so VM can't keep running, while 'relaxed' allows moving forward with a
> warning message thrown out.
> 
> Default per-device RDM policy is 'strict', while default global RDM policy
> is 'relaxed'. And the per-device policy would override the global policy like
> others.
> 
> CC: Ian Jackson 
> CC: Stefano Stabellini 
> CC: Ian Campbell 
> CC: Wei Liu 
> Signed-off-by: Tiejun Chen 
> ---
> v5:
> 
> * Just make sure the per-device plicy always override the global policy,
>   and so cleanup some associated comments and the patch head description.
> * A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED.
> * Improve all descriptions in doc.
> * Make all rdm variables specific to .hvm
> 
> v4:
> 
> * No need to define init_val for libxl_rdm_reserve_type since its just zero
> * Grab those changes to xl/libxlu to as a final patch
> 
>  docs/man/xl.cfg.pod.5| 84 
> 
>  docs/misc/vtd.txt| 24 +
>  tools/libxl/libxl_create.c   |  7 
>  tools/libxl/libxl_internal.h |  2 ++
>  tools/libxl/libxl_pci.c  |  9 +
>  tools/libxl/libxl_types.idl  | 18 ++
>  6 files changed, 144 insertions(+)
> 
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> index a3e0e2e..a2dc343 100644
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -655,6 +655,83 @@ assigned slave device.
>  
>  =back
>  
> +=item B
> +
> +(HVM/x86 only) Specifies information about Reserved Device Memory (RDM),
> +which is necessary to enable robust device passthrough. One example of RDM
> +is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
> +on x86 platform.
> +
> +B has the form C<[KEY=VALUE,KEY=VALUE,...> where:
> +
> +=over 4
> +
> +=item B
> +
> +Possible Bs are:
> +
> +=over 4
> +
> +=item B
> +
> +Currently there are only two valid types:
> +
> +"host" means all reserved device memory on this platform should be checked to
> +reserve regions in this VM's guest address space. This global RDM parameter
> +allows user to specify reserved regions explicitly, and using "host" includes
> +all reserved regions reported on this platform, which is useful when doing
> +hotplug.
> +
> +"none" is the default value and it means we don't check any reserved regions
> +and then all rdm policies would be ignored. Guest just works as before and
> +the conflict of RDM and guest address space wouldn't be handled, and then
> +this may result in the associated device not being able to work or even crash
> +the VM. So if you're assigning this kind of device, this option is not
> +recommended unless you can make sure any conflict doesn't exist.
> +

One issue didn't come to conclusion during last round of review. Ian was
asking what's the difference with type=none vs not specifying rdm option
at all.

You need to either convince Ian or remove "type=none" in *xl* level.
I.e. don't touch the libxl IDL. It still needs a none type.

> +For example, you're trying to set "memory = 2800" to allocate memory to one
> +given VM but the platform owns two RDM regions like,
> +
> +RMRR region: base_addr ac6d3000 end_address ac6e6fff
> +RMRR region: base_addr ad80 end_address afff
> +
> +In this conflict case,
> +
> +#1. If the type options is set with "none",
> +

If B is set to "none", for example,

> +rdm = "type=none,reserve=strict" or rdm = "type=none,reserve=relaxed"
> +
> +mean we don't handle any conflict just to make VM keep running as before.
> +Note this is our default behavior.
> +

It means we don't handle any conflict to make VM run as before. This is
the default behavior.

> +#2. If the type options is set with "host",
> +

If B is set to "host", for example,

> +rdm = "type=host,reserve=strict" or rdm = "type=host,reserve=relaxed"
> +
> +mean all conflict would be handled according to our policies which is
> +introduced by the reserve option as described below.
> +

It means all conflicts will be handled according to the policy
int

[Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-06 Thread Tiejun Chen
This patch introduces user configurable parameters to specify RDM
resource and according policies,

Global RDM parameter:
rdm = "type=none/host,reserve=strict/relaxed"
Per-device RDM parameter:
pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]

Global RDM parameter, "type", allows user to specify reserved regions
explicitly, e.g. using 'host' to include all reserved regions reported
on this platform which is good to handle hotplug scenario. In the future
this parameter may be further extended to allow specifying random regions,
e.g. even those belonging to another platform as a preparation for live
migration with passthrough devices. Instead, 'none' means we have nothing
to do all reserved regions and ignore all policies, so guest work as before.

'strict/relaxed' policy decides how to handle conflict when reserving RDM
regions in pfn space. If conflict exists, 'strict' means an immediate error
so VM can't keep running, while 'relaxed' allows moving forward with a
warning message thrown out.

Default per-device RDM policy is 'strict', while default global RDM policy
is 'relaxed'. And the per-device policy would override the global policy like
others.

CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Signed-off-by: Tiejun Chen 
---
v5:

* Just make sure the per-device plicy always override the global policy,
  and so cleanup some associated comments and the patch head description.
* A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED.
* Improve all descriptions in doc.
* Make all rdm variables specific to .hvm

v4:

* No need to define init_val for libxl_rdm_reserve_type since its just zero
* Grab those changes to xl/libxlu to as a final patch

 docs/man/xl.cfg.pod.5| 84 
 docs/misc/vtd.txt| 24 +
 tools/libxl/libxl_create.c   |  7 
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_pci.c  |  9 +
 tools/libxl/libxl_types.idl  | 18 ++
 6 files changed, 144 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..a2dc343 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,83 @@ assigned slave device.
 
 =back
 
+=item B
+
+(HVM/x86 only) Specifies information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+
+B has the form C<[KEY=VALUE,KEY=VALUE,...> where:
+
+=over 4
+
+=item B
+
+Possible Bs are:
+
+=over 4
+
+=item B
+
+Currently there are only two valid types:
+
+"host" means all reserved device memory on this platform should be checked to
+reserve regions in this VM's guest address space. This global RDM parameter
+allows user to specify reserved regions explicitly, and using "host" includes
+all reserved regions reported on this platform, which is useful when doing
+hotplug.
+
+"none" is the default value and it means we don't check any reserved regions
+and then all rdm policies would be ignored. Guest just works as before and
+the conflict of RDM and guest address space wouldn't be handled, and then
+this may result in the associated device not being able to work or even crash
+the VM. So if you're assigning this kind of device, this option is not
+recommended unless you can make sure any conflict doesn't exist.
+
+For example, you're trying to set "memory = 2800" to allocate memory to one
+given VM but the platform owns two RDM regions like,
+
+RMRR region: base_addr ac6d3000 end_address ac6e6fff
+RMRR region: base_addr ad80 end_address afff
+
+In this conflict case,
+
+#1. If the type options is set with "none",
+
+rdm = "type=none,reserve=strict" or rdm = "type=none,reserve=relaxed"
+
+mean we don't handle any conflict just to make VM keep running as before.
+Note this is our default behavior.
+
+#2. If the type options is set with "host",
+
+rdm = "type=host,reserve=strict" or rdm = "type=host,reserve=relaxed"
+
+mean all conflict would be handled according to our policies which is
+introduced by the reserve option as described below.
+
+=item B
+
+Specifies how to deal with conflicts discovered when reserving reserved device
+memory in the guest address space.
+
+When that conflict is unsolved,
+
+"strict" means this VM can't be created successfully, or the associated device
+can't be attached in the case of hotplug;
+
+"relaxed" allows a VM to be created to keep running with a warning message
+thrown out. But this may crash this VM if this device accesses RDM. For 
example,
+Windows IGD GFX driver always access these regions so this lead to a blue 
screen
+to crash VM in such a case.
+
+Note this may be overridden by rdm_reserve option in PCI device configuration.
+
+=back
+
+=back
+
 =item B
 
 Specifies the host PCI devices to passthrough to this guest. Each 
B
@@ -717,6 +794,13 @@ dom0 without confirmation.  Please use with care.
 D0-D3h