Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-16 Thread Chen, Tiejun

On 2015/7/16 17:40, George Dunlap wrote:

On Thu, Jul 16, 2015 at 3:05 AM, Chen, Tiejun  wrote:

Could you take a look at the original patch #06? Although Jan thought it was
complicated, that is really the only version I can refine in the current time
slot.


When you say "original", which version are you talking about?  You
mean the one at the base of this thread (v7)?



Yes, I'm referring to patch #6 in v7. Sorry for making this confusing.

Thanks
Tiejun



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-16 Thread George Dunlap
On Thu, Jul 16, 2015 at 3:05 AM, Chen, Tiejun  wrote:
> Could you take a look at the original patch #06? Although Jan thought it was
> complicated, that is really the only version I can refine in the current time
> slot.

When you say "original", which version are you talking about?  You
mean the one at the base of this thread (v7)?

 -George



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

On 2015/7/16 0:14, George Dunlap wrote:

On Wed, Jul 15, 2015 at 2:56 PM, George Dunlap
 wrote:

Would it be possible, on a collision, to have one last "stab" at
allocating the BAR somewhere else, without relocating memory (or
relocating any other BARs)?  At very least then an administrator could
work around this kind of thing by setting the mmio_hole larger in the
domain config.


If it's not possible to have this last-ditch relocation effort, then


Could you take a look at the original patch #06? Although Jan thought it was
complicated, that is really the only version I can refine in the current time
slot.



yes, I'd be OK with just disabling the device for the time being.



Let me send out a new patch series based on this idea. We can continue
discussing it there, but we also need to further review the other
remaining comments against the new revision.


Thanks
Tiejun



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread George Dunlap
On Wed, Jul 15, 2015 at 2:56 PM, George Dunlap
 wrote:
> Would it be possible, on a collision, to have one last "stab" at
> allocating the BAR somewhere else, without relocating memory (or
> relocating any other BARs)?  At very least then an administrator could
> work around this kind of thing by setting the mmio_hole larger in the
> domain config.

If it's not possible to have this last-ditch relocation effort, then
yes, I'd be OK with just disabling the device for the time being.

 -George



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread George Dunlap
On Wed, Jul 15, 2015 at 3:00 PM, Jan Beulich  wrote:
 On 15.07.15 at 15:40,  wrote:
>> On Mon, Jul 13, 2015 at 2:12 PM, Jan Beulich  wrote:
>>> Therefore I'll not make any further comments on the rest of the
>>> patch, but instead outline an allocation model that I think would
>>> fit our needs: Subject to the constraints mentioned above, set up
>>> a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19
>>> bits], i.e. reasonably small a memory block). Each bit represents a
>>> page usable for MMIO: First of all you remove the range from
>>> PCI_MEM_END upwards. Then remove all RDM pages. Now do a
>>> first pass over all devices, allocating (in the bitmap) space for only
>>> the 32-bit MMIO BARs, starting with the biggest one(s), by finding
>>> a best fit (i.e. preferably a range not usable by any bigger BAR)
>>> from top down. For example, if you have available
>>>
>>> [f0000000,f8000000)
>>> [f9000000,f9001000)
>>> [fa000000,fa003000)
>>> [fa010000,fa012000)
>>>
>>> and you're looking for a single page slot, you should end up
>>> picking fa002000.
>>>
>>> After this pass you should be able to do RAM relocation in a
>>> single attempt just like we do today (you may still grow the MMIO
>>> window if you know you need to and can fit some of the 64-bit
>>> BARs in there, subject to said constraints; this is in an attempt
>>> to help OSes not comfortable with 64-bit resources).
>>>
>>> In a 2nd pass you'd then assign 64-bit resources: If you can fit
>>> them below 4G (you still have the bitmap left of what you've got
>>> available), put them there. Allocation strategy could be the same
>>> as above (biggest first), perhaps allowing for some factoring out
>>> of logic, but here smallest first probably could work equally well.
>>> The main thought to decide between the two is whether it is
>>> better to fit as many (small) or as big (in total) as possible a set
>>> under 4G. I'd generally expect the former (as many as possible,
>>> leaving only a few huge ones to go above 4G) to be the better
>>> approach, but that's more a gut feeling than based on hard data.
>>
>> I agree that it would be more sensible for hvmloader to make a "plan"
>> first, and then do the memory reallocation (if it's possible) at one
>> time, then go through and actually update the device BARs according to
>> the "plan".
>>
>> However, I don't really see how having a bitmap really helps in this
>> case.  I would think having a list of free ranges (perhaps aligned by
>> powers of two?), sorted small->large, makes the most sense.
>
> I view bitmap vs list as just two different representations, and I
> picked the bitmap approach as being more compact storage wise
> in case there are many regions to deal with. I'd be fine with a list
> approach too, provided lookup times don't become prohibitive.

Sure, you can obviously translate one into the other.  The main reason
I dislike the idea of a bitmap is having to write code to determine
where the next free region is, and how big that region is, rather than
just going down to the next entry on the list and reading range.start and
range.len.

Also, in your suggestion each bit is a page (4k); so assuming a 64-bit
pointer, a 64-bit starting point, and a 64-bit length (just to make
things simple), a single "range" takes up enough bitmap to reserve
(64+64+64)*4k = 768k.  So if we make the bitmap big enough for 2GiB,
then the break-even point for storage is 2,730 ranges.  It's even
higher if we have an array instead of a linked list.

I'm pretty sure that having such a large number of ranges will be
vanishingly rare; I'd expect the number of ranges to be small, so the "range"
representation will not only be easier to code and read, but will in
the common case (I believe) be far more compact.
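
(Spelling the arithmetic out: one range record is 64+64+64 = 192 bits, and
192 bitmap bits cover 192 * 4k = 768k of hole; a 2GiB hole needs a 64k bitmap,
and 64k divided by 24 bytes per range is roughly 2,730 ranges at break-even.)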

 -George



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Jan Beulich
>>> On 15.07.15 at 15:40,  wrote:
> On Mon, Jul 13, 2015 at 2:12 PM, Jan Beulich  wrote:
>> Therefore I'll not make any further comments on the rest of the
>> patch, but instead outline an allocation model that I think would
>> fit our needs: Subject to the constraints mentioned above, set up
>> a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19
>> bits], i.e. reasonably small a memory block). Each bit represents a
>> page usable for MMIO: First of all you remove the range from
>> PCI_MEM_END upwards. Then remove all RDM pages. Now do a
>> first pass over all devices, allocating (in the bitmap) space for only
>> the 32-bit MMIO BARs, starting with the biggest one(s), by finding
>> a best fit (i.e. preferably a range not usable by any bigger BAR)
>> from top down. For example, if you have available
>>
>> [f0000000,f8000000)
>> [f9000000,f9001000)
>> [fa000000,fa003000)
>> [fa010000,fa012000)
>>
>> and you're looking for a single page slot, you should end up
>> picking fa002000.
>>
>> After this pass you should be able to do RAM relocation in a
>> single attempt just like we do today (you may still grow the MMIO
>> window if you know you need to and can fit some of the 64-bit
>> BARs in there, subject to said constraints; this is in an attempt
>> to help OSes not comfortable with 64-bit resources).
>>
>> In a 2nd pass you'd then assign 64-bit resources: If you can fit
>> them below 4G (you still have the bitmap left of what you've got
>> available), put them there. Allocation strategy could be the same
>> as above (biggest first), perhaps allowing for some factoring out
>> of logic, but here smallest first probably could work equally well.
>> The main thought to decide between the two is whether it is
>> better to fit as many (small) or as big (in total) as possible a set
>> under 4G. I'd generally expect the former (as many as possible,
>> leaving only a few huge ones to go above 4G) to be the better
>> approach, but that's more a gut feeling than based on hard data.
> 
> I agree that it would be more sensible for hvmloader to make a "plan"
> first, and then do the memory reallocation (if it's possible) at one
> time, then go through and actually update the device BARs according to
> the "plan".
> 
> However, I don't really see how having a bitmap really helps in this
> case.  I would think having a list of free ranges (perhaps aligned by
> powers of two?), sorted small->large, makes the most sense.

I view bitmap vs list as just two different representations, and I
picked the bitmap approach as being more compact storage wise
in case there are many regions to deal with. I'd be fine with a list
approach too, provided lookup times don't become prohibitive.
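
Purely for illustration, and not code from this series: a minimal sketch of
such a bitmap, assuming 4 KiB pages, a hole that can reach down to 0x80000000,
and made-up names; it does a crude top-down first fit rather than a real best
fit.

#include <stdint.h>

#define MMIO_LOW_BASE  0x80000000ULL          /* lowest possible hole start */
#define MMIO_PAGES     (1UL << 19)            /* 2Gb / 4k pages */

static uint8_t mmio_bitmap[MMIO_PAGES / 8];   /* 64k; bit set = page usable */

/* Mark [start, start+size) usable or not; assumes start >= MMIO_LOW_BASE. */
static void mark_range(uint64_t start, uint64_t size, int usable)
{
    uint64_t pfn = (start - MMIO_LOW_BASE) >> 12;
    uint64_t end = (start - MMIO_LOW_BASE + size + 0xfff) >> 12;

    for ( ; pfn < end && pfn < MMIO_PAGES; pfn++ )
    {
        if ( usable )
            mmio_bitmap[pfn / 8] |= 1u << (pfn % 8);
        else
            mmio_bitmap[pfn / 8] &= ~(1u << (pfn % 8));
    }
}

/* Return the highest run of 'pages' usable pages, or 0 if there is none. */
static uint64_t find_slot(uint64_t pages)
{
    uint64_t run = 0;

    for ( uint64_t pfn = MMIO_PAGES; pfn-- > 0; )
    {
        if ( !(mmio_bitmap[pfn / 8] & (1u << (pfn % 8))) )
        {
            run = 0;
            continue;
        }
        if ( ++run == pages )
            return MMIO_LOW_BASE + (pfn << 12);
    }
    return 0;
}

The usage described above would then be: mark the whole hole usable, clear the
pages from PCI_MEM_END upwards and every RDM page, and call find_slot() once
per 32-bit BAR, biggest first.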

Jan




Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread George Dunlap
On Wed, Jul 15, 2015 at 12:34 PM, Chen, Tiejun  wrote:
>>> Maybe I can move this chunk of code down below the actual allocation,
>>> to check whether RDM conflicts with the final allocation, and then just
>>> disable the associated devices by writing PCI_COMMAND, without BUG(),
>>> as in this draft code,
>>
>>
>> And what would keep the guest from re-enabling the device?
>>
>
> We can't, but IMO:
>
> #1. We're already printing that warning message.
>
> #2. Actually this is similar to the case where, as you know, even on a
> native platform some broken devices are disabled by the BIOS, right? So I
> think it is the OS's responsibility, and its risk, if it forces such a
> broken device back on.
>
> #3. It's really a rare possibility in the real world.

Well the real question is if the guest re-enabling the device would
cause a security problem; I think the answer to that must be "no" (or
at least, "it's not any worse than it was before").

The guest OS has the device disabled, and the RMRRs in the e820 map;
if it re-enables the device over the "BIOS" (which hvmloader sort of
acts as), then it should know what it's doing.

Would it be possible, on a collision, to have one last "stab" at
allocating the BAR somewhere else, without relocating memory (or
relocating any other BARs)?  At very least then an administrator could
work around this kind of thing by setting the mmio_hole larger in the
domain config.

 -George



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread George Dunlap
On Mon, Jul 13, 2015 at 2:12 PM, Jan Beulich  wrote:
> Therefore I'll not make any further comments on the rest of the
> patch, but instead outline an allocation model that I think would
> fit our needs: Subject to the constraints mentioned above, set up
> a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19
> bits], i.e. reasonably small a memory block). Each bit represents a
> page usable for MMIO: First of all you remove the range from
> PCI_MEM_END upwards. Then remove all RDM pages. Now do a
> first pass over all devices, allocating (in the bitmap) space for only
> the 32-bit MMIO BARs, starting with the biggest one(s), by finding
> a best fit (i.e. preferably a range not usable by any bigger BAR)
> from top down. For example, if you have available
>
> [f0000000,f8000000)
> [f9000000,f9001000)
> [fa000000,fa003000)
> [fa010000,fa012000)
>
> and you're looking for a single page slot, you should end up
> picking fa002000.
>
> After this pass you should be able to do RAM relocation in a
> single attempt just like we do today (you may still grow the MMIO
> window if you know you need to and can fit some of the 64-bit
> BARs in there, subject to said constraints; this is in an attempt
> to help OSes not comfortable with 64-bit resources).
>
> In a 2nd pass you'd then assign 64-bit resources: If you can fit
> them below 4G (you still have the bitmap left of what you've got
> available), put them there. Allocation strategy could be the same
> as above (biggest first), perhaps allowing for some factoring out
> of logic, but here smallest first probably could work equally well.
> The main thought to decide between the two is whether it is
> better to fit as many (small) or as big (in total) as possible a set
> under 4G. I'd generally expect the former (as many as possible,
> leaving only a few huge ones to go above 4G) to be the better
> approach, but that's more a gut feeling than based on hard data.

I agree that it would be more sensible for hvmloader to make a "plan"
first, and then do the memory reallocation (if it's possible) at one
time, then go through and actually update the device BARs according to
the "plan".

However, I don't really see how having a bitmap really helps in this
case.  I would think having a list of free ranges (perhaps aligned by
powers of two?), sorted small->large, makes the most sense.

So suppose we had the above example, but with the range
[fa000000,fa005000) instead, and we're looking for a 4-page region.
Then our "free list" initially would look like this:

[f9000000,f9001000)
[fa010000,fa012000)
[fa000000,fa005000)
[f0000000,f8000000)

After skipping the first two because they aren't big enough, we'd take
0x4000 from the third one, placing the BAR at [fa000000,fa004000), and
putting the remainder [fa004000,fa005000) back on the free list in
order, thus:

[f9000000,f9001000)
[fa004000,fa005000)
[fa010000,fa012000)
[f0000000,f8000000)

If we got to the end and hadn't found a region large enough, *and* we
could still expand the MMIO hole, we could lower pci_mem_start until
it could fit.
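
A minimal sketch of that free-range list, for illustration only (fixed-size
array, illustrative names, no bounds checking):

#include <stdint.h>

struct mmio_range { uint64_t start, len; };

static struct mmio_range free_list[64];   /* sketch only: no overflow check */
static unsigned int nr_free;

/* Keep the array sorted by ascending length. */
static void insert_range(struct mmio_range r)
{
    unsigned int i = nr_free++;

    while ( i > 0 && free_list[i - 1].len > r.len )
    {
        free_list[i] = free_list[i - 1];
        i--;
    }
    free_list[i] = r;
}

/* Take 'len' bytes from the smallest range that fits; 0 means no fit. */
static uint64_t alloc_range(uint64_t len)
{
    for ( unsigned int i = 0; i < nr_free; i++ )
    {
        struct mmio_range r = free_list[i];

        if ( r.len < len )
            continue;                     /* too small, try the next one */

        /* Remove the chosen range, then give back any unused tail. */
        for ( unsigned int j = i; j + 1 < nr_free; j++ )
            free_list[j] = free_list[j + 1];
        nr_free--;

        if ( r.len > len )
            insert_range((struct mmio_range){ r.start + len, r.len - len });

        return r.start;
    }
    return 0;
}

alloc_range() returning 0 would be the point at which we could lower
pci_mem_start and insert the newly opened space as one more range.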

What do you think?

 -George



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

Is not booting worse than what we have now -- which is, booting
successfully but (probably) having issues due to MMIO ranges
overlapping RMRRs?


It's really a rare possibility here since in the real world we haven't
seen any RMRR regions >= 0xF0000000 (the default pci memory start). And I
already sent out a slightly better revision in the ensuing email, so
please also take a look if possible :)


Do remember the context we're talking about. :-)  Jan said, *if* there
was a device that had an RMRR conflict with the "default" MMIO
placement, then the guest simply wouldn't boot.  I was saying, in that


I understand what you guys mean. Yes, "BUG" is not good in our case :)


case, we move from "silently ignore the conflict, possibly making the
device not work" to "guest refuses to boot".  Which, if it was
guaranteed that a conflict would cause the device to no longer work,
would be an improvement.


This is really what I did with the "slightly better revision" I
mentioned above. Maybe you missed it, so I'd like to paste it
below (if you already saw it, please ignore this).


/* If pci bars conflict with RDM we need to disable this pci device. */
for ( devfn = 0; devfn < 256; devfn++ )
{
    bar_sz = pci_readl(devfn, bar_reg);
    bar_data = pci_readl(devfn, bar_reg);
    bar_data_upper = pci_readl(devfn, bar_reg + 4);
    /* Until here we don't conflict high memory. */
    if ( bar_data_upper )
        continue;

    for ( i = 0; i < memory_map.nr_map ; i++ )
    {
        uint64_t reserved_start, reserved_size;
        reserved_start = memory_map.map[i].addr;
        reserved_size = memory_map.map[i].size;
        if ( check_overlap(bar_data & ~(bar_sz - 1), bar_sz,
                           reserved_start, reserved_size) )
        {
            printf("Reserved device memory conflicts with this pci bar,"
                   " so just disable this device.\n");
            /* Now disable this device */
            cmd = pci_readw(devfn, PCI_COMMAND);
            pci_writew(devfn, PCI_COMMAND, ~cmd);
        }
    }
}

Here I don't change the original allocation mechanism. Instead, I just
check whether we really have such a conflict and then disable the
associated device.
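
As an aside, and not part of the series: writing ~cmd inverts every bit of the
command register; clearing just the decode bits, using the standard
PCI_COMMAND_IO and PCI_COMMAND_MEMORY definitions (bits 0 and 1), would be the
more usual way to keep the BARs from responding:

            cmd = pci_readw(devfn, PCI_COMMAND);
            /* Clear only I/O and memory decoding; leave other bits alone. */
            cmd &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY);
            pci_writew(devfn, PCI_COMMAND, cmd);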





This patch series as a whole represents a lot of work and a lot of
tangible improvements to the situation; and (unless the situation has
changed) it's almost entirely acked apart from the MMIO placement
part.  If there is a simple way that we can change hvmloader so that
most (or even many) VM/device combinations work properly for the 4.6
release, then I think it's worth considering.



The current MMIO allocation mechanism is not good, so we really need to
reshape it, but we'd better do that, with some further discussion, in the
next release :)


Absolutely; I was saying, if we can put in a "good enough" measure for
this release, then we can get the rest of the patch series in with our
"good enough" hvmloader fix, and then work on fixing it properly next
release.

But if you're not aiming for this release anymore, then our aims are
something different.  (See my other e-mail.)



I'm really still striving for the 4.6 release if possible :)

Thanks
Tiejun




Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Wei Liu
On Wed, Jul 15, 2015 at 09:32:34AM +0100, Jan Beulich wrote:
> >>> On 15.07.15 at 02:55,  wrote:
> >> > I agree we'd better overhaul this since we already found something
> >>> unreasonable here. But one or two weeks is really not enough to fix this
> >>> with a bitmap framework, and although a bitmap can make mmio allocation
> >>> better, but more complicated if we just want to allocate PCI mmio.
> >>>
> >>> So could we do this next? I just feel if you'd like to pay more time
> >>> help me refine our current solution, its relatively realistic to this
> >>> case :) And then we can go into bitmap in details or work out a better
> >>> solution in sufficient time slot.
> >>
> >> Looking at how long it took to get here (wasn't this series originally
> >> even meant to go into 4.5?) and how much time I already spent
> > 
> > Certainly appreciate your time.
> > 
> > I didn't mean its wasting time at this point. I just want to express 
> > that its hard to implement that solution in one or two weeks to walking 
> > into 4.6 as an exception.
> > 
> > Note I know this feature is still not accepted as an exception to 4.6 
> > right now so I'm making an assumption.
> 
> After all this is a bug fix (and would have been allowed into 4.5 had
> it been ready in time), so doesn't necessarily need a freeze
> exception (but of course the bar raises the later it gets). Rather
> than rushing in something that's cumbersome to maintain, I'd much
> prefer this to be done properly.
> 

This series is twofold. I consider the tools-side RDM change (not
limited to RMRR) a new feature. It introduces a new feature to fix a
bug, so it would still be subject to a freeze exception from my point of
view.

Wei.

> Jan



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread George Dunlap
On 07/15/2015 12:20 PM, Chen, Tiejun wrote:
> On 2015/7/15 19:05, George Dunlap wrote:
>> On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich  wrote:
>> On 15.07.15 at 10:59,  wrote:
 On 2015/7/15 16:34, Jan Beulich wrote:
 On 15.07.15 at 06:27,  wrote:
>> Furthermore, could we have this solution as follows?
>
> Yet more special casing code you want to add. I said no to this
> model, and unless you can address the issue _without_ adding
> a lot of special casing code, the answer will remain no (subject

 What about this?

 @@ -301,6 +301,19 @@ void pci_setup(void)
pci_mem_start <<= 1;
}

 +for ( i = 0; i < memory_map.nr_map ; i++ )
 +{
 +uint64_t reserved_start, reserved_size;
 +reserved_start = memory_map.map[i].addr;
 +reserved_size = memory_map.map[i].size;
 +if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
 +   reserved_start, reserved_size) )
 +{
 +printf("Reserved device memory conflicts current PCI
 memory.\n");
 +BUG();
 +}
 +}
>>>
>>> So what would the cure be if someone ran into this BUG() (other
>>> than removing the device associated with the conflicting RMRR)?
>>> Afaics such a guest would remain permanently unbootable, which
>>> of course is not an option.
>>
>> Is not booting worse than what we have now -- which is, booting
>> successfully but (probably) having issues due to MMIO ranges
>> overlapping RMRRs?
> 
> It's really a rare possibility here since in the real world we haven't
> seen any RMRR regions >= 0xF0000000 (the default pci memory start). And I
> already sent out a slightly better revision in the ensuing email, so
> please also take a look if possible :)

Do remember the context we're talking about. :-)  Jan said, *if* there
was a device that had an RMRR conflict with the "default" MMIO
placement, then the guest simply wouldn't boot.  I was saying, in that
case, we move from "silently ignore the conflict, possibly making the
device not work" to "guest refuses to boot".  Which, if it was
guaranteed that a conflict would cause the device to no longer work,
would be an improvement.

>> This patch series as a whole represents a lot of work and a lot of
>> tangible improvements to the situation; and (unless the situation has
>> changed) it's almost entirely acked apart from the MMIO placement
>> part.  If there is a simple way that we can change hvmloader so that
>> most (or even many) VM/device combinations work properly for the 4.6
>> release, then I think it's worth considering.
>>
> 
> The current MMIO allocation mechanism is not good, so we really need to
> reshape it, but we'd better do that, with some further discussion, in the
> next release :)

Absolutely; I was saying, if we can put in a "good enough" measure for
this release, then we can get the rest of the patch series in with our
"good enough" hvmloader fix, and then work on fixing it properly next
release.

But if you're not aiming for this release anymore, then our aims are
something different.  (See my other e-mail.)

 -George





Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

On 2015/7/15 19:27, Jan Beulich wrote:

On 15.07.15 at 13:05,  wrote:

This patch series as a whole represents a lot of work and a lot of
tangible improvements to the situation; and (unless the situation has
changed) it's almost entirely acked apart from the MMIO placement
part.  If there is a simple way that we can change hvmloader so that
most (or even many) VM/device combinations work properly for the 4.6
release, then I think it's worth considering.


And I think the simplest way is to replace the allocation mechanism


This is the best way, not the simplest way.

A bitmap approach that matches all our requirements is not easy or realistic
to design/write/test/review in a short time, and the wholesale replacement
would also bring more potential risk to the 4.6 release.


Thanks
Tiejun


as outlined rather than trying to make the current model cope with
the new requirement.

Jan







Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread George Dunlap
On 07/15/2015 12:24 PM, Jan Beulich wrote:
 On 15.07.15 at 13:05,  wrote:
>> On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich  wrote:
>> On 15.07.15 at 10:59,  wrote:
 What about this?

 @@ -301,6 +301,19 @@ void pci_setup(void)
   pci_mem_start <<= 1;
   }

 +for ( i = 0; i < memory_map.nr_map ; i++ )
 +{
 +uint64_t reserved_start, reserved_size;
 +reserved_start = memory_map.map[i].addr;
 +reserved_size = memory_map.map[i].size;
 +if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
 +   reserved_start, reserved_size) )
 +{
 +printf("Reserved device memory conflicts current PCI 
 memory.\n");
 +BUG();
 +}
 +}
>>>
>>> So what would the cure be if someone ran into this BUG() (other
>>> than removing the device associated with the conflicting RMRR)?
>>> Afaics such a guest would remain permanently unbootable, which
>>> of course is not an option.
>>
>> Is not booting worse than what we have now -- which is, booting
>> successfully but (probably) having issues due to MMIO ranges
>> overlapping RMRRs?
> 
> Again a matter of perspective: For devices (USB!) where the RMRR
> exists solely for boot time (or outdated OS) use, this would be a
> plain regression. For the graphics device Tiejun needs this for, it of
> course would make little difference, I agree.

So it's the case that, for some devices, not only is it functionally OK
to have RAM in the RMRR with no issues, it's also functionally OK to
have PCI BARs in the RMRR with no issues?

If so, then yes, I agree having a device fail to work with no ability to
work around it is an unacceptable regression.

If we're not targeting 4.6 for this, then the situation changes
somewhat.  One thing worth considering (which perhaps may be what tiejun
is looking at) is the cost of keeping a large number of
working-and-already-acked patches out of tree -- both the psychological
cost, the cost of rebasing, and the cost of re-reviewing rebases.  Given
how independent the hvmloader changes are to the rest of the series,
it's probably worth trying to see if we can check in the other patches
as soon as we branch.

If we can check in those patches with hvmloader still ignoring RMRRs,
then there's no problem.  But if we need the issue addressed *somehow*
in hvmloader before checking the rest of the series in, then I think it
makes sense to consider a minimal change that would make the series
somewhat functional, before moving on to overhauling the hvmloader MMIO
placement code.

 -George





Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

Maybe I can move this chunk of code down below the actual allocation,
to check whether RDM conflicts with the final allocation, and then just
disable the associated devices by writing PCI_COMMAND, without BUG(),
as in this draft code,


And what would keep the guest from re-enabling the device?



We can't, but IMO:

#1. We're already printing that warning message.

#2. Actually this is similar to the case where, as you know, even on
a native platform some broken devices are disabled by the BIOS,
right? So I think it is the OS's responsibility, and its risk, if it
forces such a broken device back on.


#3. It's really a rare possibility in the real world.

Thanks
Tiejun



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Jan Beulich
>>> On 15.07.15 at 13:05,  wrote:
> This patch series as a whole represents a lot of work and a lot of
> tangible improvements to the situation; and (unless the situation has
> changed) it's almost entirely acked apart from the MMIO placement
> part.  If there is a simple way that we can change hvmloader so that
> most (or even many) VM/device combinations work properly for the 4.6
> release, then I think it's worth considering.

And I think the simplest way is to replace the allocation mechanism
as outlined rather than trying to make the current model cope with
the new requirement.

Jan




Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Jan Beulich
>>> On 15.07.15 at 12:34,  wrote:
 Yet more special casing code you want to add. I said no to this
 model, and unless you can address the issue _without_ adding
 a lot of special casing code, the answer will remain no (subject
>>>
>>> What about this?
>>>
>>> @@ -301,6 +301,19 @@ void pci_setup(void)
>>>pci_mem_start <<= 1;
>>>}
>>>
>>> +for ( i = 0; i < memory_map.nr_map ; i++ )
>>> +{
>>> +uint64_t reserved_start, reserved_size;
>>> +reserved_start = memory_map.map[i].addr;
>>> +reserved_size = memory_map.map[i].size;
>>> +if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
>>> +   reserved_start, reserved_size) )
>>> +{
>>> +printf("Reserved device memory conflicts current PCI 
>>> memory.\n");
>>> +BUG();
>>> +}
>>> +}
>>
>> So what would the cure be if someone ran into this BUG() (other
>> than removing the device associated with the conflicting RMRR)?
> 
> Maybe I can move this chunk of code down below the actual allocation,
> to check whether RDM conflicts with the final allocation, and then just
> disable the associated devices by writing PCI_COMMAND, without BUG(),
> as in this draft code,

And what would keep the guest from re-enabling the device?

Jan




Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Jan Beulich
>>> On 15.07.15 at 13:05,  wrote:
> On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich  wrote:
> On 15.07.15 at 10:59,  wrote:
>>> What about this?
>>>
>>> @@ -301,6 +301,19 @@ void pci_setup(void)
>>>   pci_mem_start <<= 1;
>>>   }
>>>
>>> +for ( i = 0; i < memory_map.nr_map ; i++ )
>>> +{
>>> +uint64_t reserved_start, reserved_size;
>>> +reserved_start = memory_map.map[i].addr;
>>> +reserved_size = memory_map.map[i].size;
>>> +if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
>>> +   reserved_start, reserved_size) )
>>> +{
>>> +printf("Reserved device memory conflicts current PCI 
>>> memory.\n");
>>> +BUG();
>>> +}
>>> +}
>>
>> So what would the cure be if someone ran into this BUG() (other
>> than removing the device associated with the conflicting RMRR)?
>> Afaics such a guest would remain permanently unbootable, which
>> of course is not an option.
> 
> Is not booting worse than what we have now -- which is, booting
> successfully but (probably) having issues due to MMIO ranges
> overlapping RMRRs?

Again a matter of perspective: For devices (USB!) where the RMRR
exists solely for boot time (or outdated OS) use, this would be a
plain regression. For the graphics device Tiejun needs this for, it of
course would make little difference, I agree.

Jan




Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

On 2015/7/15 19:05, George Dunlap wrote:

On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich  wrote:

On 15.07.15 at 10:59,  wrote:

On 2015/7/15 16:34, Jan Beulich wrote:

On 15.07.15 at 06:27,  wrote:

Furthermore, could we have this solution as follows?


Yet more special casing code you want to add. I said no to this
model, and unless you can address the issue _without_ adding
a lot of special casing code, the answer will remain no (subject


What about this?

@@ -301,6 +301,19 @@ void pci_setup(void)
   pci_mem_start <<= 1;
   }

+for ( i = 0; i < memory_map.nr_map ; i++ )
+{
+uint64_t reserved_start, reserved_size;
+reserved_start = memory_map.map[i].addr;
+reserved_size = memory_map.map[i].size;
+if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+   reserved_start, reserved_size) )
+{
+printf("Reserved device memory conflicts current PCI memory.\n");
+BUG();
+}
+}


So what would the cure be if someone ran into this BUG() (other
than removing the device associated with the conflicting RMRR)?
Afaics such a guest would remain permanently unbootable, which
of course is not an option.


Is not booting worse than what we have now -- which is, booting
successfully but (probably) having issues due to MMIO ranges
overlapping RMRRs?


It's really a rare possibility here since in the real world we haven't
seen any RMRR regions >= 0xF0000000 (the default pci memory start). And I
already sent out a slightly better revision in the ensuing email, so
please also take a look if possible :)




This patch series as a whole represents a lot of work and a lot of
tangible improvements to the situation; and (unless the situation has
changed) it's almost entirely acked apart from the MMIO placement
part.  If there is a simple way that we can change hvmloader so that
most (or even many) VM/device combinations work properly for the 4.6
release, then I think it's worth considering.



The current MMIO allocation mechanism is not good, so we really need to
reshape it, but we'd better do that, with some further discussion, in the
next release :)


Thanks
Tiejun





Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread George Dunlap
On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich  wrote:
 On 15.07.15 at 10:59,  wrote:
>> On 2015/7/15 16:34, Jan Beulich wrote:
>> On 15.07.15 at 06:27,  wrote:
 Furthermore, could we have this solution as follows?
>>>
>>> Yet more special casing code you want to add. I said no to this
>>> model, and unless you can address the issue _without_ adding
>>> a lot of special casing code, the answer will remain no (subject
>>
>> What about this?
>>
>> @@ -301,6 +301,19 @@ void pci_setup(void)
>>   pci_mem_start <<= 1;
>>   }
>>
>> +for ( i = 0; i < memory_map.nr_map ; i++ )
>> +{
>> +uint64_t reserved_start, reserved_size;
>> +reserved_start = memory_map.map[i].addr;
>> +reserved_size = memory_map.map[i].size;
>> +if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
>> +   reserved_start, reserved_size) )
>> +{
>> +printf("Reserved device memory conflicts current PCI 
>> memory.\n");
>> +BUG();
>> +}
>> +}
>
> So what would the cure be if someone ran into this BUG() (other
> than removing the device associated with the conflicting RMRR)?
> Afaics such a guest would remain permanently unbootable, which
> of course is not an option.

Is not booting worse than what we have now -- which is, booting
successfully but (probably) having issues due to MMIO ranges
overlapping RMRRs?

This patch series as a whole represents a lot of work and a lot of
tangible improvements to the situation; and (unless the situation has
changed) it's almost entirely acked apart from the MMIO placement
part.  If there is a simple way that we can change hvmloader so that
most (or even many) VM/device combinations work properly for the 4.6
release, then I think it's worth considering.

 -George



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

Yet more special casing code you want to add. I said no to this
model, and unless you can address the issue _without_ adding
a lot of special casing code, the answer will remain no (subject


What about this?

@@ -301,6 +301,19 @@ void pci_setup(void)
   pci_mem_start <<= 1;
   }

+for ( i = 0; i < memory_map.nr_map ; i++ )
+{
+uint64_t reserved_start, reserved_size;
+reserved_start = memory_map.map[i].addr;
+reserved_size = memory_map.map[i].size;
+if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+   reserved_start, reserved_size) )
+{
+printf("Reserved device memory conflicts current PCI memory.\n");
+BUG();
+}
+}


So what would the cure be if someone ran into this BUG() (other
than removing the device associated with the conflicting RMRR)?


Maybe I can move this chunk of code down below the actual allocation,
to check whether RDM conflicts with the final allocation, and then just
disable the associated devices by writing PCI_COMMAND, without BUG(),
as in this draft code,


/* If pci bars conflict with RDM we need to disable this pci device. */
for ( devfn = 0; devfn < 256; devfn++ )
{
    bar_sz = pci_readl(devfn, bar_reg);
    bar_data = pci_readl(devfn, bar_reg);
    bar_data_upper = pci_readl(devfn, bar_reg + 4);
    /* Until here we don't conflict high memory. */
    if ( bar_data_upper )
        continue;

    for ( i = 0; i < memory_map.nr_map ; i++ )
    {
        uint64_t reserved_start, reserved_size;
        reserved_start = memory_map.map[i].addr;
        reserved_size = memory_map.map[i].size;
        if ( check_overlap(bar_data & ~(bar_sz - 1), bar_sz,
                           reserved_start, reserved_size) )
        {
            printf("Reserved device memory conflicts with this pci bar,"
                   " so just disable this device.\n");
            /* Now disable this device */
            cmd = pci_readw(devfn, PCI_COMMAND);
            pci_writew(devfn, PCI_COMMAND, ~cmd);
        }
    }
}

If this is still not fine with you, then I'll have to raise a request to the
co-maintainers, since it's hard to make the next step in practice.


Hi all, what is your opinion?

Thanks
Tiejun



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Jan Beulich
>>> On 15.07.15 at 10:59,  wrote:
> On 2015/7/15 16:34, Jan Beulich wrote:
> On 15.07.15 at 06:27,  wrote:
>>> Furthermore, could we have this solution as follows?
>>
>> Yet more special casing code you want to add. I said no to this
>> model, and unless you can address the issue _without_ adding
>> a lot of special casing code, the answer will remain no (subject
> 
> What about this?
> 
> @@ -301,6 +301,19 @@ void pci_setup(void)
>   pci_mem_start <<= 1;
>   }
> 
> +for ( i = 0; i < memory_map.nr_map ; i++ )
> +{
> +uint64_t reserved_start, reserved_size;
> +reserved_start = memory_map.map[i].addr;
> +reserved_size = memory_map.map[i].size;
> +if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
> +   reserved_start, reserved_size) )
> +{
> +printf("Reserved device memory conflicts current PCI memory.\n");
> +BUG();
> +}
> +}

So what would the cure be if someone ran into this BUG() (other
than removing the device associated with the conflicting RMRR)?
Afaics such a guest would remain permanently unbootable, which
of course is not an option.

Jan




Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

This is very similar to our current policy to
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6
since actually this is also another rare possibility in real world. Even
I can do this as well when we handle that conflict with
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6.


Sorry, here is one typo, s/#6/#5

Thanks
Tiejun



Note its not necessary to concern high memory since we already handle
this case in the hv code previously, and its also not affected by those
relocated memory later since our previous policy can make sure RAM isn't
overlapping with RDM.

Thanks
Tiejun


to co-maintainers overriding me).

Jan











Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

Certainly appreciate your time.

I didn't mean its wasting time at this point. I just want to express
that its hard to implement that solution in one or two weeks to walking
into 4.6 as an exception.

Note I know this feature is still not accepted as an exception to 4.6
right now so I'm making an assumption.


After all this is a bug fix (and would have been allowed into 4.5 had
it been ready in time), so doesn't necessarily need a freeze
exception (but of course the bar raises the later it gets). Rather


Yes, this is not a bug fix again into 4.6.


than rushing in something that's cumbersome to maintain, I'd much
prefer this to be done properly.



Indeed, we'd like to finalize this properly, as you said. But apparently
there is not enough time to allow that to happen, so I just suggest we
keep seeking the best solution in the next phase.


Thanks
Tiejun



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

On 2015/7/15 16:34, Jan Beulich wrote:

On 15.07.15 at 06:27,  wrote:

Furthermore, could we have this solution as follows?


Yet more special casing code you want to add. I said no to this
model, and unless you can address the issue _without_ adding
a lot of special casing code, the answer will remain no (subject


What about this?

@@ -301,6 +301,19 @@ void pci_setup(void)
         pci_mem_start <<= 1;
     }
 
+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        uint64_t reserved_start, reserved_size;
+        reserved_start = memory_map.map[i].addr;
+        reserved_size = memory_map.map[i].size;
+        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+                           reserved_start, reserved_size) )
+        {
+            printf("Reserved device memory conflicts current PCI memory.\n");
+            BUG();
+        }
+    }
+
     if ( mmio_total > (pci_mem_end - pci_mem_start) )
     {
         printf("Low MMIO hole not large enough for all devices,"

This is very similar to our current policy for
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6,
since this is also another rare possibility in the real world. I could
even do the same when we handle the conflict with
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6.


Note it's not necessary to worry about high memory since we already handle
that case in the hypervisor code, and it's also not affected by the memory
relocated later, since our earlier policy makes sure RAM isn't
overlapping with RDM.


Thanks
Tiejun


to co-maintainers overriding me).

Jan







Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Jan Beulich
>>> On 15.07.15 at 06:27,  wrote:
> Furthermore, could we have this solution as follows?

Yet more special casing code you want to add. I said no to this
model, and unless you can address the issue _without_ adding
a lot of special casing code, the answer will remain no (subject
to co-maintainers overriding me).

Jan




Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Jan Beulich
>>> On 15.07.15 at 02:55,  wrote:
>> > I agree we'd better overhaul this since we already found something
>>> unreasonable here. But one or two weeks is really not enough to fix this
>>> with a bitmap framework, and although a bitmap can make mmio allocation
>>> better, but more complicated if we just want to allocate PCI mmio.
>>>
>>> So could we do this next? I just feel if you'd like to pay more time
>>> help me refine our current solution, its relatively realistic to this
>>> case :) And then we can go into bitmap in details or work out a better
>>> solution in sufficient time slot.
>>
>> Looking at how long it took to get here (wasn't this series originally
>> even meant to go into 4.5?) and how much time I already spent
> 
> Certainly appreciate your time.
> 
> I didn't mean its wasting time at this point. I just want to express 
> that its hard to implement that solution in one or two weeks to walking 
> into 4.6 as an exception.
> 
> Note I know this feature is still not accepted as an exception to 4.6 
> right now so I'm making an assumption.

After all this is a bug fix (and would have been allowed into 4.5 had
it been ready in time), so doesn't necessarily need a freeze
exception (but of course the bar raises the later it gets). Rather
than rushing in something that's cumbersome to maintain, I'd much
prefer this to be done properly.

Jan




Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-14 Thread Chen, Tiejun

On 2015/7/15 8:55, Chen, Tiejun wrote:

I agree we'd better overhaul this since we already found
something unreasonable here. But one or two weeks is really not
enough to fix this with a bitmap framework, and although a bitmap
can make mmio allocation better, but more complicated if we just
want to allocate PCI mmio.

So could we do this next? I just feel if you'd like to pay more
time help me refine our current solution, its relatively
realistic to this case :) And then we can go into bitmap in
details or work out a better solution in sufficient time slot.


Looking at how long it took to get here (wasn't this series
originally even meant to go into 4.5?) and how much time I already
spent


Certainly appreciate your time.

I didn't mean its wasting time at this point. I just want to express
 that its hard to implement that solution in one or two weeks to
walking into 4.6 as an exception.

Note I know this feature is still not accepted as an exception to 4.6
 right now so I'm making an assumption.


reviewing all the previous versions, I don't see a point in
wasting even more time on working out details of an approach that's
getting too complex/ugly already anyway.


Here I'm trying to seek such a kind of two-steps approach if
possible.



Furthermore, could we have this solution as follows?

@@ -407,8 +408,29 @@ void pci_setup(void)
         }

         base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+ reallocate_bar:
         bar_data |= (uint32_t)base;
         bar_data_upper = (uint32_t)(base >> 32);
+        /*
+         * We need to skip those reserved regions like RMRR but this probably
+         * leads to the remaining allocation failing.
+         */
+        for ( j = 0; j < memory_map.nr_map ; j++ )
+        {
+            if ( memory_map.map[j].type != E820_RAM )
+            {
+                reserved_end = memory_map.map[j].addr +
+                               memory_map.map[j].size;
+                if ( check_overlap(base, bar_sz,
+                                   memory_map.map[j].addr,
+                                   memory_map.map[j].size) )
+                {
+                    if ( !mmio_conflict )
+                        mmio_conflict = true;
+                    base = (reserved_end + bar_sz - 1) &
+                           ~(uint64_t)(bar_sz - 1);
+                    goto reallocate_bar;
+                }
+            }
+        }
         base += bar_sz;

         if ( (base < resource->base) || (base > resource->max) )
@@ -416,6 +438,11 @@ void pci_setup(void)
             printf("pci dev %02x:%x bar %02x size "PRIllx": no space for "
                    "resource!\n", devfn>>3, devfn&7, bar_reg,
                    PRIllx_arg(bar_sz));
+            if ( mmio_conflict )
+            {
+                printf("MMIO conflicts with RDM.\n");
+                BUG();
+            }
             continue;
         }

Here we still check RDMs and skip them if any conflict exists, but at the
end, if 1) there is not enough space to allocate all BARs for this VM && 2)
that shortage was triggered by an RDM, we stop creating this VM.

This changes the current mechanism slightly and also guarantees we don't
ignore any conflicts. I admit this is not a good solution because the
alignment issue still exists, but I think it may be acceptable if all
PCI devices assigned to this VM can work. And in the real world most RDMs
just fall somewhere below 0xF0000000 (the current default pci memory
start address), so it is not common to see this kind of conflict.


Thanks
Tiejun



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-14 Thread Chen, Tiejun

I agree we'd better overhaul this since we already found something
unreasonable here. But one or two weeks is really not enough to fix this
with a bitmap framework, and although a bitmap can make mmio allocation
better, but more complicated if we just want to allocate PCI mmio.

So could we do this next? I just feel if you'd like to pay more time
help me refine our current solution, its relatively realistic to this
case :) And then we can go into bitmap in details or work out a better
solution in sufficient time slot.


Looking at how long it took to get here (wasn't this series originally
even meant to go into 4.5?) and how much time I already spent


Certainly, I appreciate your time.

I didn't mean it's wasting time at this point. I just want to express
that it's hard to implement that solution in one or two weeks so it can
make it into 4.6 as an exception.


Note I know this feature is still not accepted as an exception for 4.6
right now, so I'm making an assumption.



reviewing all the previous versions, I don't see a point in wasting
even more time on working out details of an approach that's getting
too complex/ugly already anyway.


Here I'm trying to seek a two-step approach, if possible.

Thanks
Tiejun



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-14 Thread Jan Beulich
>>> On 14.07.15 at 12:54,  wrote:
>>> I think bitmap mechanism is a good idea but honestly, its not easy to
>>> cover all requirements here. And just like bootmem on Linux side, so its
>>> a little complicated to implement this entirely. So I prefer not to
>>> introduce this way in current phase.
>>
>> I'm afraid it's going to be hard to convince me of any approaches
>> further complicating the current mechanism instead of overhauling
>> it.
> 
> I agree we'd better overhaul this since we already found something 
> unreasonable here. But one or two weeks is really not enough to fix this 
> with a bitmap framework, and although a bitmap can make mmio allocation 
> better, but more complicated if we just want to allocate PCI mmio.
> 
> So could we do this next? I just feel if you'd like to pay more time 
> help me refine our current solution, its relatively realistic to this 
> case :) And then we can go into bitmap in details or work out a better 
> solution in sufficient time slot.

Looking at how long it took to get here (wasn't this series originally
even meant to go into 4.5?) and how much time I already spent
reviewing all the previous versions, I don't see a point in wasting
even more time on working out details of an approach that's getting
too complex/ugly already anyway.

Jan




Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-14 Thread Chen, Tiejun


Note that I don't address your comments above here, since I think we
should reach an agreement first.



I think bitmap mechanism is a good idea but honestly, its not easy to
cover all requirements here. And just like bootmem on Linux side, so its
a little complicated to implement this entirely. So I prefer not to
introduce this way in current phase.


I'm afraid it's going to be hard to convince me of any approaches
further complicating the current mechanism instead of overhauling
it.


I agree we'd better overhaul this since we already found something
unreasonable here. But one or two weeks is really not enough to fix this
with a bitmap framework, and although a bitmap can make MMIO allocation
better, it is more complicated if we just want to allocate PCI MMIO.


So could we do this next? I just feel that if you'd like to spend more time
helping me refine our current solution, that is relatively realistic for this
case :) And then we can go into the bitmap in detail, or work out a better
solution, in a sufficient time slot.


Thanks
Tiejun



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-14 Thread Jan Beulich
>>> On 14.07.15 at 08:39,  wrote:
>>> -} *resource, mem_resource, high_mem_resource, io_resource;
>>> +} *resource, mem_resource, high_mem_resource, io_resource, exp_mem_resource;
>>
>> Despite having gone through description and the rest of the patch I
>> can't seem to be able to guess what "exp_mem" stands for.
>> Meaningful variable names are quite helpful though, often avoiding
>> the need for comments.
> 
> exp_mem_resource() is the expanded mem_resource in the case of 
> populating RAM.
> 
> Maybe I should use the whole word, expand_mem_resource.

And what does "expand" here mean then?

>>> @@ -309,29 +339,31 @@ void pci_setup(void)
>>>   }
>>>
>>>   /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
>>> +cur_pci_mem_start = pci_mem_start;
>>>   while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
>>> +relocate_ram_for_pci_memory(cur_pci_mem_start);
>>
>> Please be consistent which variable to want to use in the loop
>> (pci_mem_start vs cur_pci_mem_start).
> 
> Overall I just call relocate_ram_for_pci_memory() twice and each I 
> always pass cur_pci_mem_start. Any inconsistent place?

In the quoted code you use pci_mem_start in the while()
condition and cur_pci_mem_start in that same while()'s body.

>> Also, this being the first substantial change to the function makes
>> clear that you _still_ leave the sizing loop untouched, and instead
>> make the allocation logic below more complicated. I said before a
> 
> But this may be more reasonable than it used to do. In my point of view 
> we always need to first allocate 32bit mmio and then allocate 64bit mmio 
> since as you said we don't want to expand high memory if possible.
> 
>> number of times that I don't think this helps maintainability of this
>> already convoluted code. Among other things this manifests itself
>> in your second call to relocate_ram_for_pci_memory() in no way
>> playing by the constraints explained a few lines up from here in an
>> extensive comment.
> 
> Can't all variables/comments express what I intend to do here? Except 
> for that exp_mem_resource.

I'm not talking about a lack of comments, but about your added
use of the function not being in line with what is being explained
in an earlier (pre-existing) comment.

>> Therefore I'll not make any further comments on the rest of the
>> patch, but instead outline an allocation model that I think would
>> fit our needs: Subject to the constraints mentioned above, set up
>> a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19
>> bits], i.e. reasonably small a memory block). Each bit represents a
>> page usable for MMIO: First of all you remove the range from
>> PCI_MEM_END upwards. Then remove all RDM pages. Now do a
>> first pass over all devices, allocating (in the bitmap) space for only
>> the 32-bit MMIO BARs, starting with the biggest one(s), by finding
>> a best fit (i.e. preferably a range not usable by any bigger BAR)
>> from top down. For example, if you have available
>>
>> [f0000000,f8000000)
>> [f9000000,f9001000)
>> [fa000000,fa003000)
>> [fa010000,fa012000)
>>
>> and you're looking for a single page slot, you should end up
>> picking fa002000.
> 
> Why is this [f9000000,f9001000)? Just one page in this slot.

This was just a simple example I gave. Or maybe I don't understand
your question...

>> After this pass you should be able to do RAM relocation in a
>> single attempt just like we do today (you may still grow the MMIO
>> window if you know you need to and can fit some of the 64-bit
>> BARs in there, subject to said constraints; this is in an attempt
>> to help OSes not comfortable with 64-bit resources).
>>
>> In a 2nd pass you'd then assign 64-bit resources: If you can fit
>> them below 4G (you still have the bitmap left of what you've got
>> available), put them there. Allocation strategy could be the same
> 
> I think basically, your logic is similar to what I did as I described in 
> changelog,

The goal is the same, but the approaches look quite different to
me. In particular my approach avoids calculating mmio_total up
front, then basing RAM relocation on it, only to find subsequently
that more RAM may need to be relocated.

> I think bitmap mechanism is a good idea but honestly, its not easy to 
> cover all requirements here. And just like bootmem on Linux side, so its 
> a little complicated to implement this entirely. So I prefer not to 
> introduce this way in current phase.

I'm afraid it's going to be hard to convince me of any approaches
further complicating the current mechanism instead of overhauling
it.

Jan



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-13 Thread Chen, Tiejun

-} *resource, mem_resource, high_mem_resource, io_resource;
+} *resource, mem_resource, high_mem_resource, io_resource, exp_mem_resource;


Despite having gone through description and the rest of the patch I
can't seem to be able to guess what "exp_mem" stands for.
Meaningful variable names are quite helpful though, often avoiding
the need for comments.


exp_mem_resource is the expanded mem_resource used in the case where we 
populate more RAM.


Maybe I should use the whole word, expand_mem_resource.




  /* Create a list of device BARs in descending order of size. */


[snip]


@@ -309,29 +339,31 @@ void pci_setup(void)
  }

  /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
+cur_pci_mem_start = pci_mem_start;
  while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
+relocate_ram_for_pci_memory(cur_pci_mem_start);


Please be consistent about which variable you want to use in the loop
(pci_mem_start vs cur_pci_mem_start).


Overall I just call relocate_ram_for_pci_memory() twice, and each time I 
pass cur_pci_mem_start. Where is the inconsistency?




Also, this being the first substantial change to the function makes
clear that you _still_ leave the sizing loop untouched, and instead
make the allocation logic below more complicated. I said before a


But this may be more reasonable than what the code used to do. From my 
point of view we always need to allocate the 32-bit MMIO BARs first and 
only then the 64-bit ones since, as you said, we don't want to expand 
high memory if we can avoid it.



number of times that I don't think this helps maintainability of this
already convoluted code. Among other things this manifests itself
in your second call to relocate_ram_for_pci_memory() in no way
playing by the constraints explained a few lines up from here in an
extensive comment.


Don't the variables and comments express what I intend to do here? Except 
for that exp_mem_resource.
    /*
     * We have to populate more RAM to further allocate
     * the remaining 32bars.
     */
    if ( mmio32_unallocated_total )
    {
        cur_pci_mem_start = pci_mem_start - mmio32_unallocated_total;
        relocate_ram_for_pci_memory(cur_pci_mem_start);
        exp_mem_resource.base = cur_pci_mem_start;
        exp_mem_resource.max = pci_mem_start;
    }



Therefore I'll not make any further comments on the rest of the
patch, but instead outline an allocation model that I think would
fit our needs: Subject to the constraints mentioned above, set up
a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19
bits], i.e. reasonably small a memory block). Each bit represents a
page usable for MMIO: First of all you remove the range from
PCI_MEM_END upwards. Then remove all RDM pages. Now do a
first pass over all devices, allocating (in the bitmap) space for only
the 32-bit MMIO BARs, starting with the biggest one(s), by finding
a best fit (i.e. preferably a range not usable by any bigger BAR)
from top down. For example, if you have available

[f0000000,f8000000)
[f9000000,f9001000)
[fa000000,fa003000)
[fa010000,fa012000)

and you're looking for a single page slot, you should end up
picking fa002000.


Why not [f9000000,f9001000)? That slot is exactly one page.



After this pass you should be able to do RAM relocation in a
single attempt just like we do today (you may still grow the MMIO
window if you know you need to and can fit some of the 64-bit
BARs in there, subject to said constraints; this is in an attempt
to help OSes not comfortable with 64-bit resources).

In a 2nd pass you'd then assign 64-bit resources: If you can fit
them below 4G (you still have the bitmap left of what you've got
available), put them there. Allocation strategy could be the same


I think your logic is basically similar to what I did, as described in 
the changelog:


  1>. The first allocation round covers only the 32-bit BARs.

      If we can finish allocating all 32-bit BARs, we go straight on to
      allocate the 64-bit BARs with all remaining resources, including
      low PCI memory.

      If not, we calculate how much RAM needs to be populated to allocate
      the remaining 32-bit BARs, populate that much RAM as
      exp_mem_resource, and go to the second allocation round 2>.

  2>. The second allocation round covers the remaining 32-bit BARs.

      In theory we can now finish allocating all 32-bit BARs, and then we
      go to the third allocation round 3>.

  3>. The third allocation round covers the 64-bit BARs.

      We first try to allocate from the remaining low memory resource. If
      that isn't enough, we try to expand high memory to allocate the
      64-bit BARs. This process is the same as the original one. (A rough
      sketch of this three-round flow follows below.)
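
To make that control flow concrete, here is a small, self-contained toy 
model of the three rounds. Everything in it is hypothetical: the structs, 
the helper names (alloc_from(), three_round_assign()) and the simple bump 
allocation policy are stand-ins invented for illustration, not hvmloader 
code and not the actual patch.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct resource { uint64_t base, max; };
    struct bar { uint64_t sz; bool is_64bit, allocated; };

    /*
     * Carve a naturally aligned block of 'sz' bytes (a power of two, as
     * BAR sizes are) out of 'r', growing upwards; false if it doesn't fit.
     */
    static bool alloc_from(struct resource *r, uint64_t sz, uint64_t *addr)
    {
        uint64_t base = (r->base + sz - 1) & ~(sz - 1);

        if ( !sz || base + sz > r->max )
            return false;
        *addr = base;
        r->base = base + sz;
        return true;
    }

    /*
     * low:      the initial low MMIO hole
     * expanded: space gained by relocating (populating) more RAM
     * high:     the window above 4Gb
     */
    static void three_round_assign(struct bar *bars, size_t nr,
                                   struct resource *low,
                                   struct resource *expanded,
                                   struct resource *high)
    {
        uint64_t addr;
        size_t i;

        /* Round 1: 32-bit BARs only, from the initial low hole. */
        for ( i = 0; i < nr; i++ )
            if ( !bars[i].is_64bit )
                bars[i].allocated = alloc_from(low, bars[i].sz, &addr);

        /* Round 2: 32-bit leftovers go into the freshly expanded space. */
        for ( i = 0; i < nr; i++ )
            if ( !bars[i].is_64bit && !bars[i].allocated )
                bars[i].allocated = alloc_from(expanded, bars[i].sz, &addr);

        /* Round 3: 64-bit BARs, low hole first, then the high window. */
        for ( i = 0; i < nr; i++ )
            if ( bars[i].is_64bit &&
                 !(bars[i].allocated = alloc_from(low, bars[i].sz, &addr)) )
                bars[i].allocated = alloc_from(high, bars[i].sz, &addr);
    }

The real code additionally has to skip the reserved (RDM) ranges and cope 
with the alignment of the hole itself, which this toy ignores.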


as above (biggest first), perhaps allowing for some factoring out
of logic, but here smallest first probably could work equally well.
The main thought to decide between the two is whether it is
better to fit as many (small) or as big (in total) as possible a set
under 4G. 

Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-13 Thread Jan Beulich
>>> On 09.07.15 at 07:33,  wrote:
> @@ -50,17 +75,22 @@ void pci_setup(void)
>  /* Resources assignable to PCI devices via BARs. */
>  struct resource {
>  uint64_t base, max;
> -} *resource, mem_resource, high_mem_resource, io_resource;
> +} *resource, mem_resource, high_mem_resource, io_resource, exp_mem_resource;

Despite having gone through description and the rest of the patch I
can't seem to be able to guess what "exp_mem" stands for.
Meaningful variable names are quite helpful though, often avoiding
the need for comments.

>  /* Create a list of device BARs in descending order of size. */
>  struct bars {
> -uint32_t is_64bar;
> +#define PCI_BAR_IS_64BIT0x1
> +#define PCI_BAR_IS_ALLOCATED0x2
> +uint32_t flag;

flags (you already have two)

>  uint32_t devfn;
>  uint32_t bar_reg;
>  uint64_t bar_sz;
>  } *bars = (struct bars *)scratch_start;
> -unsigned int i, nr_bars = 0;
> -uint64_t mmio_hole_size = 0;
> +unsigned int i, j, n, nr_bars = 0;
> +uint64_t mmio_hole_size = 0, reserved_start, reserved_end, reserved_size;
> +bool bar32_allocating = 0;
> +uint64_t mmio32_unallocated_total = 0;
> +unsigned long cur_pci_mem_start = 0;
>  
>  const char *s;
>  /*
> @@ -222,7 +252,7 @@ void pci_setup(void)
>  if ( i != nr_bars )
>  memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars));
>  
> -bars[i].is_64bar = is_64bar;
> +bars[i].flag = is_64bar ? PCI_BAR_IS_64BIT : 0;
>  bars[i].devfn   = devfn;
>  bars[i].bar_reg = bar_reg;
>  bars[i].bar_sz  = bar_sz;
> @@ -309,29 +339,31 @@ void pci_setup(void)
>  }
>  
>  /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
> +cur_pci_mem_start = pci_mem_start;
>  while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
> +relocate_ram_for_pci_memory(cur_pci_mem_start);

Please be consistent about which variable you want to use in the loop
(pci_mem_start vs cur_pci_mem_start).
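
Purely as an illustration of the consistency being asked for (whichever
semantics relocate_ram_for_pci_memory() ends up having, a single variable
should drive both the loop condition and the call), the hunk might for
instance read:

    cur_pci_mem_start = pci_mem_start;
    while ( (cur_pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
        relocate_ram_for_pci_memory(cur_pci_mem_start);

assuming the helper, like the open-coded loop it replaces, lowers
hvm_info->low_mem_pgend as it relocates RAM.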

Also, this being the first substantial change to the function makes
clear that you _still_ leave the sizing loop untouched, and instead
make the allocation logic below more complicated. I said before a
number of times that I don't think this helps maintainability of this
already convoluted code. Among other things this manifests itself
in your second call to relocate_ram_for_pci_memory() in no way
playing by the constraints explained a few lines up from here in an
extensive comment.

Therefore I'll not make any further comments on the rest of the
patch, but instead outline an allocation model that I think would
fit our needs: Subject to the constraints mentioned above, set up
a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19
bits], i.e. reasonably small a memory block). Each bit represents a
page usable for MMIO: First of all you remove the range from
PCI_MEM_END upwards. Then remove all RDM pages. Now do a
first pass over all devices, allocating (in the bitmap) space for only
the 32-bit MMIO BARs, starting with the biggest one(s), by finding
a best fit (i.e. preferably a range not usable by any bigger BAR)
from top down. For example, if you have available

[f0000000,f8000000)
[f9000000,f9001000)
[fa000000,fa003000)
[fa010000,fa012000)

and you're looking for a single page slot, you should end up
picking fa002000.

After this pass you should be able to do RAM relocation in a
single attempt just like we do today (you may still grow the MMIO
window if you know you need to and can fit some of the 64-bit
BARs in there, subject to said constraints; this is in an attempt
to help OSes not comfortable with 64-bit resources).

In a 2nd pass you'd then assign 64-bit resources: If you can fit
them below 4G (you still have the bitmap left of what you've got
available), put them there. Allocation strategy could be the same
as above (biggest first), perhaps allowing for some factoring out
of logic, but here smallest first probably could work equally well.
The main thought to decide between the two is whether it is
better to fit as many (small) or as big (in total) as possible a set
under 4G. I'd generally expect the former (as many as possible,
leaving only a few huge ones to go above 4G) to be the better
approach, but that's more a gut feeling than based on hard data.
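
To make the bookkeeping concrete, here is a small, self-contained sketch
of such a page bitmap with a top-down fit. It is deliberately simplified:
4k pages, a set bit means "usable for MMIO", the policy is "smallest free
run that still fits, topmost on ties, allocate from its top", and neither
BAR alignment nor the finer preference for ranges no bigger BAR could use
is modelled. All identifiers (mmio_bitmap, bitmap_alloc(), ...) are made
up for this sketch.

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12

    /*
     * 2Gb of MMIO space is 2^^19 pages, so one bit per page needs at most
     * 64k.  The bits are assumed to have been populated beforehand as
     * described above: set for usable pages, cleared for everything from
     * PCI_MEM_END upwards and for all RDM pages.
     */
    static uint8_t mmio_bitmap[1u << 16];
    static uint64_t window_start;        /* page aligned base of the hole */

    static bool page_usable(uint64_t i)
    {
        return mmio_bitmap[i / 8] & (1u << (i % 8));
    }

    static void page_take(uint64_t i)
    {
        mmio_bitmap[i / 8] &= ~(1u << (i % 8));
    }

    /*
     * Best fit, top down: find the smallest free run of at least 'pages'
     * pages (the topmost such run on ties) and allocate 'pages' pages
     * from its top.  Returns the address, or 0 if nothing fits.
     */
    static uint64_t bitmap_alloc(uint64_t total_pages, uint64_t pages)
    {
        uint64_t best_base = 0, best_len = UINT64_MAX, i = total_pages, p;

        while ( i-- )
        {
            uint64_t end, len;

            if ( !page_usable(i) )
                continue;
            end = i + 1;
            while ( i && page_usable(i - 1) )
                i--;                      /* walk down to the run's start */
            len = end - i;
            if ( len >= pages && len < best_len )
            {
                best_len = len;
                best_base = end - pages;  /* top of the candidate run */
            }
        }

        if ( best_len == UINT64_MAX )
            return 0;
        for ( p = best_base; p < best_base + pages; p++ )
            page_take(p);
        return window_start + (best_base << PAGE_SHIFT);
    }

Note that under this simplified policy the single page in the example
above would come out of the exactly fitting [f9000000,f9001000) run;
honouring the "not usable by any bigger BAR" preference additionally
needs the sizes and alignment of the BARs still to be placed.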

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel