Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On 2015/7/16 17:40, George Dunlap wrote: On Thu, Jul 16, 2015 at 3:05 AM, Chen, Tiejun wrote: Could you take a look at the original patch #06? Although Jan thought it is complicated, that is really the one version I can refine in the current time slot. When you say "original", which version are you talking about? You mean the one at the base of this thread (v7)? Yes, I'm referring to patch #6 in v7. Sorry for the confusion. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On Thu, Jul 16, 2015 at 3:05 AM, Chen, Tiejun wrote: > Could you take a look at the original patch #06 ? Although Jan thought that > is complicated, that is really one version that I can refine in current time > slot. When you say "original", which version are you talking about? You mean the one at the base of this thread (v7)? -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On 2015/7/16 0:14, George Dunlap wrote: On Wed, Jul 15, 2015 at 2:56 PM, George Dunlap wrote: Would it be possible, on a collision, to have one last "stab" at allocating the BAR somewhere else, without relocating memory (or relocating any other BARs)? At very least then an administrator could work around this kind of thing by setting the mmio_hole larger in the domain config. If it's not possible to have this last-ditch relocation effort, then Could you take a look at the original patch #06? Although Jan thought it is complicated, that is really the one version I can refine in the current time slot. yes, I'd be OK with just disabling the device for the time being. Let me send out a new patch series based on this idea. We can continue discussing this over there, but we also need to further review the other remaining comments based on a new revision. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On Wed, Jul 15, 2015 at 2:56 PM, George Dunlap wrote: > Would it be possible, on a collision, to have one last "stab" at > allocating the BAR somewhere else, without relocating memory (or > relocating any other BARs)? At very least then an administrator could > work around this kind of thing by setting the mmio_hole larger in the > domain config. If it's not possible to have this last-ditch relocation effort, then yes, I'd be OK with just disabling the device for the time being. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On Wed, Jul 15, 2015 at 3:00 PM, Jan Beulich wrote: On 15.07.15 at 15:40, wrote: >> On Mon, Jul 13, 2015 at 2:12 PM, Jan Beulich wrote: >>> Therefore I'll not make any further comments on the rest of the >>> patch, but instead outline an allocation model that I think would >>> fit our needs: Subject to the constraints mentioned above, set up >>> a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19 >>> bits], i.e. reasonably small a memory block). Each bit represents a >>> page usable for MMIO: First of all you remove the range from >>> PCI_MEM_END upwards. Then remove all RDM pages. Now do a >>> first pass over all devices, allocating (in the bitmap) space for only >>> the 32-bit MMIO BARs, starting with the biggest one(s), by finding >>> a best fit (i.e. preferably a range not usable by any bigger BAR) >>> from top down. For example, if you have available >>> >>> [f0000000,f8000000) >>> [f9000000,f9001000) >>> [fa000000,fa003000) >>> [fa010000,fa012000) >>> >>> and you're looking for a single page slot, you should end up >>> picking fa002000. >>> >>> After this pass you should be able to do RAM relocation in a >>> single attempt just like we do today (you may still grow the MMIO >>> window if you know you need to and can fit some of the 64-bit >>> BARs in there, subject to said constraints; this is in an attempt >>> to help OSes not comfortable with 64-bit resources). >>> >>> In a 2nd pass you'd then assign 64-bit resources: If you can fit >>> them below 4G (you still have the bitmap left of what you've got >>> available), put them there. Allocation strategy could be the same >>> as above (biggest first), perhaps allowing for some factoring out >>> of logic, but here smallest first probably could work equally well. >>> The main thought to decide between the two is whether it is >>> better to fit as many (small) or as big (in total) as possible a set >>> under 4G. I'd generally expect the former (as many as possible, >>> leaving only a few huge ones to go above 4G) to be the better >>> approach, but that's more a gut feeling than based on hard data. >> >> I agree that it would be more sensible for hvmloader to make a "plan" >> first, and then do the memory reallocation (if it's possible) at one >> time, then go through and actually update the device BARs according to >> the "plan". >> >> However, I don't really see how having a bitmap really helps in this >> case. I would think having a list of free ranges (perhaps aligned by >> powers of two?), sorted small->large, makes the most sense. > > I view bitmap vs list as just two different representations, and I > picked the bitmap approach as being more compact storage wise > in case there are many regions to deal with. I'd be fine with a list > approach too, provided lookup times don't become prohibitive. Sure, you can obviously translate one into the other. The main reason I dislike the idea of a bitmap is having to write code to determine where the next free region is, and how big that region is, rather than just going down the next on the list and reading range.start and range.len. Also, in your suggestion each bit is a page (4k); so assuming a 64-bit pointer, a 64-bit starting point, and a 64-bit length (just to make things simple), a single "range" takes up enough bitmap to reserve (64+64+64)*4k = 768k. So if we make the bitmap big enough for 2GiB, then the break-even point for storage is 2,730 ranges. It's even higher if we have an array instead of a linked list. 
I'm pretty sure that having such a large number of ranges will be vanishingly rare; I'd expect the number of ranges to be small, so the "range" representation will not only be easier to code and read, but will in the common case (I believe) be far more compact. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
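To make the storage comparison concrete, here is a small stand-alone calculation along the lines of the paragraph above (illustrative only; the 24-byte figure assumes a linked-list node holding a 64-bit next pointer, a 64-bit start and a 64-bit length):

#include <stdio.h>

int main(void)
{
    unsigned long long hole   = 2ULL << 30;  /* 2GiB MMIO hole              */
    unsigned long long pages  = hole >> 12;  /* 4KiB pages -> 2^19 of them  */
    unsigned long long bitmap = pages / 8;   /* one bit per page -> 64KiB   */
    unsigned long long range  = 3 * 8;       /* next + start + len, 8B each */

    /* 192 bits of bitmap cover 192 pages, i.e. 768KiB of address space. */
    printf("bitmap size: %llu bytes\n", bitmap);
    printf("break-even:  %llu ranges\n", bitmap / range);  /* ~2730 */
    return 0;
}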
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
>>> On 15.07.15 at 15:40, wrote: > On Mon, Jul 13, 2015 at 2:12 PM, Jan Beulich wrote: >> Therefore I'll not make any further comments on the rest of the >> patch, but instead outline an allocation model that I think would >> fit our needs: Subject to the constraints mentioned above, set up >> a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19 >> bits], i.e. reasonably small a memory block). Each bit represents a >> page usable for MMIO: First of all you remove the range from >> PCI_MEM_END upwards. Then remove all RDM pages. Now do a >> first pass over all devices, allocating (in the bitmap) space for only >> the 32-bit MMIO BARs, starting with the biggest one(s), by finding >> a best fit (i.e. preferably a range not usable by any bigger BAR) >> from top down. For example, if you have available >> >> [f0000000,f8000000) >> [f9000000,f9001000) >> [fa000000,fa003000) >> [fa010000,fa012000) >> >> and you're looking for a single page slot, you should end up >> picking fa002000. >> >> After this pass you should be able to do RAM relocation in a >> single attempt just like we do today (you may still grow the MMIO >> window if you know you need to and can fit some of the 64-bit >> BARs in there, subject to said constraints; this is in an attempt >> to help OSes not comfortable with 64-bit resources). >> >> In a 2nd pass you'd then assign 64-bit resources: If you can fit >> them below 4G (you still have the bitmap left of what you've got >> available), put them there. Allocation strategy could be the same >> as above (biggest first), perhaps allowing for some factoring out >> of logic, but here smallest first probably could work equally well. >> The main thought to decide between the two is whether it is >> better to fit as many (small) or as big (in total) as possible a set >> under 4G. I'd generally expect the former (as many as possible, >> leaving only a few huge ones to go above 4G) to be the better >> approach, but that's more a gut feeling than based on hard data. > > I agree that it would be more sensible for hvmloader to make a "plan" > first, and then do the memory reallocation (if it's possible) at one > time, then go through and actually update the device BARs according to > the "plan". > > However, I don't really see how having a bitmap really helps in this > case. I would think having a list of free ranges (perhaps aligned by > powers of two?), sorted small->large, makes the most sense. I view bitmap vs list as just two different representations, and I picked the bitmap approach as being more compact storage wise in case there are many regions to deal with. I'd be fine with a list approach too, provided lookup times don't become prohibitive. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On Wed, Jul 15, 2015 at 12:34 PM, Chen, Tiejun wrote: >>> Maybe I can move this chunk of codes downward those actual allocation >>> to check if RDM conflicts with the final allocation, and then just >>> disable those associated devices by writing PCI_COMMAND without BUG() >>> like this draft code, >> >> >> And what would keep the guest from re-enabling the device? >> > > We can't but IMO, > > #1. We're already posting that warning messages. > > #2. Actually this is like this sort of case like, as you known, even in the > native platform, some broken devices are also disabled by BIOS, right? So I > think this is OS's responsibility or risk to force enabling such a broken > device. > > #3. Its really rare possibility in real world. Well the real question is if the guest re-enabling the device would cause a security problem; I think the answer to that must be "no" (or at least, "it's not any worse than it was before"). The guest OS has the device disabled, and the RMRRs in the e820 map; if it re-enables the device over the "BIOS" (which hvmloader sort of acts as), then it should know what it's doing. Would it be possible, on a collision, to have one last "stab" at allocating the BAR somewhere else, without relocating memory (or relocating any other BARs)? At very least then an administrator could work around this kind of thing by setting the mmio_hole larger in the domain config. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
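A minimal sketch of the kind of last-ditch retry being suggested here (purely illustrative: memory_map, E820_RAM and check_overlap() are the structures and helpers used in the drafts elsewhere in this thread, tracking of space already handed out to other BARs is omitted, and nothing like this is in the posted series):

/* Illustrative only: look for an alternative slot for a single BAR of size
 * bar_sz inside [hole_start, hole_end), scanning top-down and skipping any
 * reserved (RDM) region.  Returns the new base, or 0 if nothing fits, in
 * which case the device would be disabled instead. */
static uint64_t retry_bar_placement(uint64_t hole_start, uint64_t hole_end,
                                    uint64_t bar_sz)
{
    uint64_t base;
    unsigned int i;

    if ( bar_sz > hole_end - hole_start )
        return 0;

    for ( base = (hole_end - bar_sz) & ~(bar_sz - 1);
          base >= hole_start;
          base -= bar_sz )
    {
        int conflict = 0;

        for ( i = 0; i < memory_map.nr_map; i++ )
            if ( memory_map.map[i].type != E820_RAM &&
                 check_overlap(base, bar_sz,
                               memory_map.map[i].addr,
                               memory_map.map[i].size) )
            {
                conflict = 1;
                break;
            }

        if ( !conflict )
            return base;   /* found a reserved-range-free, aligned slot */
    }

    return 0;
}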
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On Mon, Jul 13, 2015 at 2:12 PM, Jan Beulich wrote: > Therefore I'll not make any further comments on the rest of the > patch, but instead outline an allocation model that I think would > fit our needs: Subject to the constraints mentioned above, set up > a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19 > bits], i.e. reasonably small a memory block). Each bit represents a > page usable for MMIO: First of all you remove the range from > PCI_MEM_END upwards. Then remove all RDM pages. Now do a > first pass over all devices, allocating (in the bitmap) space for only > the 32-bit MMIO BARs, starting with the biggest one(s), by finding > a best fit (i.e. preferably a range not usable by any bigger BAR) > from top down. For example, if you have available > > [f0000000,f8000000) > [f9000000,f9001000) > [fa000000,fa003000) > [fa010000,fa012000) > > and you're looking for a single page slot, you should end up > picking fa002000. > > After this pass you should be able to do RAM relocation in a > single attempt just like we do today (you may still grow the MMIO > window if you know you need to and can fit some of the 64-bit > BARs in there, subject to said constraints; this is in an attempt > to help OSes not comfortable with 64-bit resources). > > In a 2nd pass you'd then assign 64-bit resources: If you can fit > them below 4G (you still have the bitmap left of what you've got > available), put them there. Allocation strategy could be the same > as above (biggest first), perhaps allowing for some factoring out > of logic, but here smallest first probably could work equally well. > The main thought to decide between the two is whether it is > better to fit as many (small) or as big (in total) as possible a set > under 4G. I'd generally expect the former (as many as possible, > leaving only a few huge ones to go above 4G) to be the better > approach, but that's more a gut feeling than based on hard data. I agree that it would be more sensible for hvmloader to make a "plan" first, and then do the memory reallocation (if it's possible) at one time, then go through and actually update the device BARs according to the "plan". However, I don't really see how having a bitmap really helps in this case. I would think having a list of free ranges (perhaps aligned by powers of two?), sorted small->large, makes the most sense. So suppose we had the above example, but with the range [fa000000,fa005000) instead, and we're looking for a 4-page region. Then our "free list" initially would look like this: [f9000000,f9001000) [fa010000,fa012000) [fa000000,fa005000) [f0000000,f8000000) After skipping the first two because they aren't big enough, we'd take 0x4000 from the third one, placing the BAR at [fa000000,fa004000), and putting the remainder [fa004000,fa005000) back on the free list in order, thus: [f9000000,f9001000) [fa004000,fa005000) [fa010000,fa012000) [f0000000,f8000000) If we got to the end and hadn't found a region large enough, *and* we could still expand the MMIO hole, we could lower pci_mem_start until it could fit. What do you think? -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
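A rough, self-contained sketch of the free-range list George describes (the names and the fixed-size array are illustrative, BAR alignment handling is omitted, and the list is kept sorted by increasing size so the first range that fits is also the smallest one that fits):

#include <stdint.h>
#include <string.h>

struct mem_range {
    uint64_t start, len;
};

/* Free ranges, kept sorted by increasing length. */
static struct mem_range free_list[64];
static unsigned int nr_free;

/* Allocate 'size' bytes from the smallest range that can hold it.
 * Returns the base address, or 0 if no range is big enough. */
static uint64_t range_alloc(uint64_t size)
{
    unsigned int i, j;

    for ( i = 0; i < nr_free; i++ )
    {
        uint64_t base;
        struct mem_range rem;

        if ( free_list[i].len < size )
            continue;

        base = free_list[i].start;
        rem.start = base + size;
        rem.len = free_list[i].len - size;

        /* Remove the range we allocated from ... */
        memmove(&free_list[i], &free_list[i + 1],
                (nr_free - i - 1) * sizeof(free_list[0]));
        nr_free--;

        /* ... and re-insert the remainder (if any) in size order. */
        if ( rem.len )
        {
            for ( j = 0; j < nr_free && free_list[j].len < rem.len; j++ )
                continue;
            memmove(&free_list[j + 1], &free_list[j],
                    (nr_free - j) * sizeof(free_list[0]));
            free_list[j] = rem;
            nr_free++;
        }

        return base;
    }

    return 0;
}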
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
Is not booting worse than what we have now -- which is, booting successfully but (probably) having issues due to MMIO ranges overlapping RMRRs? Its really so rare possibility here since in the real world we didn't see any RMRR regions >= 0xF000 (the default pci memory start.) And I already sent out a little better revision in that ensuing email so also please take a review if possible :) Do remember the context we're talking about. :-) Jan said, *if* there was a device that had an RMRR conflict with the "default" MMIO placement, then the guest simply wouldn't boot. I was saying, in that I understand what you guys mean. Yes, "BUG" is not good in our case :) case, we move from "silently ignore the conflict, possibly making the device not work" to "guest refuses to boot". Which, if it was guaranteed that a conflict would cause the device to no longer work, would be an improvement. This is really what I did this with "a little better revision" as I mentioned above. Maybe you're missing this, so I'd like to paste that below ( but if you already saw this please ignore this below ) /* If pci bars conflict with RDM we need to disable this pci device. */ for ( devfn = 0; devfn < 256; devfn++ ) { bar_sz = pci_readl(devfn, bar_reg); bar_data = pci_readl(devfn, bar_reg); bar_data_upper = pci_readl(devfn, bar_reg + 4); /* Until here we don't conflict high memory. */ if ( bar_data_upper ) continue; for ( i = 0; i < memory_map.nr_map ; i++ ) { uint64_t reserved_start, reserved_size; reserved_start = memory_map.map[i].addr; reserved_size = memory_map.map[i].size; if ( check_overlap(bar_data & ~(bar_sz - 1), bar_sz, reserved_start, reserved_size) ) { printf("Reserved device memory conflicts with this pci bar," " so just disable this device.\n"); /* Now disable this device */ cmd = pci_readw(devfn, PCI_COMMAND); pci_writew(devfn, PCI_COMMAND, ~cmd); } } } Here I don't break that original allocation mechanism. Instead, I just check if we really have that conflict case and then disable that associated device. This patch series as a whole represents a lot of work and a lot of tangible improvements to the situation; and (unless the situation has changed) it's almost entirely acked apart from the MMIO placement part. If there is a simple way that we can change hvmloader so that most (or even many) VM/device combinations work properly for the 4.6 release, then I think it's worth considering. Current MMIO allocation mechanism is not good. So we really need to reshape that, but we'd better do this with some further discussion in next release :) Absolutely; I was saying, if we can put in a "good enough" measure for this release, then we can get the rest of the patch series in with our "good enough" hvmloader fix, and then work on fixing it properly next release. But if you're not aiming for this release anymore, then our aims are something different. (See my other e-mail.) I really still try to strive for 4.6 release if possible :) Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On Wed, Jul 15, 2015 at 09:32:34AM +0100, Jan Beulich wrote: > >>> On 15.07.15 at 02:55, wrote: > >> > I agree we'd better overhaul this since we already found something > >>> unreasonable here. But one or two weeks is really not enough to fix this > >>> with a bitmap framework, and although a bitmap can make mmio allocation > >>> better, but more complicated if we just want to allocate PCI mmio. > >>> > >>> So could we do this next? I just feel if you'd like to pay more time > >>> help me refine our current solution, its relatively realistic to this > >>> case :) And then we can go into bitmap in details or work out a better > >>> solution in sufficient time slot. > >> > >> Looking at how long it took to get here (wasn't this series originally > >> even meant to go into 4.5?) and how much time I already spent > > > > Certainly appreciate your time. > > > > I didn't mean its wasting time at this point. I just want to express > > that its hard to implement that solution in one or two weeks to walking > > into 4.6 as an exception. > > > > Note I know this feature is still not accepted as an exception to 4.6 > > right now so I'm making an assumption. > > After all this is a bug fix (and would have been allowed into 4.5 had > it been ready in time), so doesn't necessarily need a freeze > exception (but of course the bar raises the later it gets). Rather > than rushing in something that's cumbersome to maintain, I'd much > prefer this to be done properly. > This series is twofold. I consider the tools side change RDM (not limited to RMRR) a new feature. It introduces a new feature to fix a bug. It would still be subject to freeze exception from my point of view. Wei. > Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On 07/15/2015 12:20 PM, Chen, Tiejun wrote: > On 2015/7/15 19:05, George Dunlap wrote: >> On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich wrote: >> On 15.07.15 at 10:59, wrote: On 2015/7/15 16:34, Jan Beulich wrote: On 15.07.15 at 06:27, wrote: >> Furthermore, could we have this solution as follows? > > Yet more special casing code you want to add. I said no to this > model, and unless you can address the issue _without_ adding > a lot of special casing code, the answer will remain no (subject What about this? @@ -301,6 +301,19 @@ void pci_setup(void) pci_mem_start <<= 1; } +for ( i = 0; i < memory_map.nr_map ; i++ ) +{ +uint64_t reserved_start, reserved_size; +reserved_start = memory_map.map[i].addr; +reserved_size = memory_map.map[i].size; +if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start, + reserved_start, reserved_size) ) +{ +printf("Reserved device memory conflicts current PCI memory.\n"); +BUG(); +} +} >>> >>> So what would the cure be if someone ran into this BUG() (other >>> than removing the device associated with the conflicting RMRR)? >>> Afaics such a guest would remain permanently unbootable, which >>> of course is not an option. >> >> Is not booting worse than what we have now -- which is, booting >> successfully but (probably) having issues due to MMIO ranges >> overlapping RMRRs? > > Its really so rare possibility here since in the real world we didn't > see any RMRR regions >= 0xF000 (the default pci memory start.) And I > already sent out a little better revision in that ensuing email so also > please take a review if possible :) Do remember the context we're talking about. :-) Jan said, *if* there was a device that had an RMRR conflict with the "default" MMIO placement, then the guest simply wouldn't boot. I was saying, in that case, we move from "silently ignore the conflict, possibly making the device not work" to "guest refuses to boot". Which, if it was guaranteed that a conflict would cause the device to no longer work, would be an improvement. >> This patch series as a whole represents a lot of work and a lot of >> tangible improvements to the situation; and (unless the situation has >> changed) it's almost entirely acked apart from the MMIO placement >> part. If there is a simple way that we can change hvmloader so that >> most (or even many) VM/device combinations work properly for the 4.6 >> release, then I think it's worth considering. >> > > Current MMIO allocation mechanism is not good. So we really need to > reshape that, but we'd better do this with some further discussion in > next release :) Absolutely; I was saying, if we can put in a "good enough" measure for this release, then we can get the rest of the patch series in with our "good enough" hvmloader fix, and then work on fixing it properly next release. But if you're not aiming for this release anymore, then our aims are something different. (See my other e-mail.) -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On 2015/7/15 19:27, Jan Beulich wrote: On 15.07.15 at 13:05, wrote: This patch series as a whole represents a lot of work and a lot of tangible improvements to the situation; and (unless the situation has changed) it's almost entirely acked apart from the MMIO placement part. If there is a simple way that we can change hvmloader so that most (or even many) VM/device combinations work properly for the 4.6 release, then I think it's worth considering. And I think the simplest way is to replace the allocation mechanism This is the best way, not the simplest way. Making the bitmap approach satisfy all our requirements is not easy or realistic to design/write/test/review in a short time, and the entire replacement would also bring more potential risks to the 4.6 release. Thanks Tiejun as outlined rather than trying to make the current model cope with the new requirement. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On 07/15/2015 12:24 PM, Jan Beulich wrote: On 15.07.15 at 13:05, wrote: >> On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich wrote: >> On 15.07.15 at 10:59, wrote: What about this? @@ -301,6 +301,19 @@ void pci_setup(void) pci_mem_start <<= 1; } +for ( i = 0; i < memory_map.nr_map ; i++ ) +{ +uint64_t reserved_start, reserved_size; +reserved_start = memory_map.map[i].addr; +reserved_size = memory_map.map[i].size; +if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start, + reserved_start, reserved_size) ) +{ +printf("Reserved device memory conflicts current PCI memory.\n"); +BUG(); +} +} >>> >>> So what would the cure be if someone ran into this BUG() (other >>> than removing the device associated with the conflicting RMRR)? >>> Afaics such a guest would remain permanently unbootable, which >>> of course is not an option. >> >> Is not booting worse than what we have now -- which is, booting >> successfully but (probably) having issues due to MMIO ranges >> overlapping RMRRs? > > Again a matter of perspective: For devices (USB!) where the RMRR > exists solely for boot time (or outdated OS) use, this would be a > plain regression. For the graphics device Tiejun needs this for, it of > course would make little difference, I agree. So it's the case that, for some devices, not only is it functionally OK to have RAM in the RMRR with no issues, it's also functionally OK to have PCI BARs in the RMRR with no issues? If so, then yes, I agree having a device fail to work with no ability to work around it is an unacceptable regression. If we're not targeting 4.6 for this, then the situation changes somewhat. One thing worth considering (which perhaps may be what tiejun is looking at) is the cost of keeping a large number of working-and-already-acked patches out of tree -- both the psychological cost, the cost of rebasing, and the cost of re-reviewing rebases. Given how independent the hvmloader changes are to the rest of the series, it's probably worth trying to see if we can check in the other patches as soon as we branch. If we can check in those patches with hvmloader still ignoring RMRRs, then there's no problem. But if we need the issue addressed *somehow* in hvmloader before checking the rest of the series in, then I think it makes sense to consider a minimal change that would make the series somewhat functional, before moving on to overhauling the hvmloader MMIO placement code. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
Maybe I can move this chunk of codes downward those actual allocation to check if RDM conflicts with the final allocation, and then just disable those associated devices by writing PCI_COMMAND without BUG() like this draft code, And what would keep the guest from re-enabling the device? We can't but IMO, #1. We're already posting that warning messages. #2. Actually this is like this sort of case like, as you known, even in the native platform, some broken devices are also disabled by BIOS, right? So I think this is OS's responsibility or risk to force enabling such a broken device. #3. Its really rare possibility in real world. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
>>> On 15.07.15 at 13:05, wrote: > This patch series as a whole represents a lot of work and a lot of > tangible improvements to the situation; and (unless the situation has > changed) it's almost entirely acked apart from the MMIO placement > part. If there is a simple way that we can change hvmloader so that > most (or even many) VM/device combinations work properly for the 4.6 > release, then I think it's worth considering. And I think the simplest way is to replace the allocation mechanism as outlined rather than trying to make the current model cope with the new requirement. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
>>> On 15.07.15 at 12:34, wrote: Yet more special casing code you want to add. I said no to this model, and unless you can address the issue _without_ adding a lot of special casing code, the answer will remain no (subject >>> >>> What about this? >>> >>> @@ -301,6 +301,19 @@ void pci_setup(void) >>>pci_mem_start <<= 1; >>>} >>> >>> +for ( i = 0; i < memory_map.nr_map ; i++ ) >>> +{ >>> +uint64_t reserved_start, reserved_size; >>> +reserved_start = memory_map.map[i].addr; >>> +reserved_size = memory_map.map[i].size; >>> +if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start, >>> + reserved_start, reserved_size) ) >>> +{ >>> +printf("Reserved device memory conflicts current PCI >>> memory.\n"); >>> +BUG(); >>> +} >>> +} >> >> So what would the cure be if someone ran into this BUG() (other >> than removing the device associated with the conflicting RMRR)? > > Maybe I can move this chunk of codes downward those actual allocation > to check if RDM conflicts with the final allocation, and then just > disable those associated devices by writing PCI_COMMAND without BUG() > like this draft code, And what would keep the guest from re-enabling the device? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
>>> On 15.07.15 at 13:05, wrote: > On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich wrote: > On 15.07.15 at 10:59, wrote: >>> What about this? >>> >>> @@ -301,6 +301,19 @@ void pci_setup(void) >>> pci_mem_start <<= 1; >>> } >>> >>> +for ( i = 0; i < memory_map.nr_map ; i++ ) >>> +{ >>> +uint64_t reserved_start, reserved_size; >>> +reserved_start = memory_map.map[i].addr; >>> +reserved_size = memory_map.map[i].size; >>> +if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start, >>> + reserved_start, reserved_size) ) >>> +{ >>> +printf("Reserved device memory conflicts current PCI >>> memory.\n"); >>> +BUG(); >>> +} >>> +} >> >> So what would the cure be if someone ran into this BUG() (other >> than removing the device associated with the conflicting RMRR)? >> Afaics such a guest would remain permanently unbootable, which >> of course is not an option. > > Is not booting worse than what we have now -- which is, booting > successfully but (probably) having issues due to MMIO ranges > overlapping RMRRs? Again a matter of perspective: For devices (USB!) where the RMRR exists solely for boot time (or outdated OS) use, this would be a plain regression. For the graphics device Tiejun needs this for, it of course would make little difference, I agree. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On 2015/7/15 19:05, George Dunlap wrote: On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich wrote: On 15.07.15 at 10:59, wrote: On 2015/7/15 16:34, Jan Beulich wrote: On 15.07.15 at 06:27, wrote: Furthermore, could we have this solution as follows? Yet more special casing code you want to add. I said no to this model, and unless you can address the issue _without_ adding a lot of special casing code, the answer will remain no (subject What about this? @@ -301,6 +301,19 @@ void pci_setup(void) pci_mem_start <<= 1; } +for ( i = 0; i < memory_map.nr_map ; i++ ) +{ +uint64_t reserved_start, reserved_size; +reserved_start = memory_map.map[i].addr; +reserved_size = memory_map.map[i].size; +if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start, + reserved_start, reserved_size) ) +{ +printf("Reserved device memory conflicts current PCI memory.\n"); +BUG(); +} +} So what would the cure be if someone ran into this BUG() (other than removing the device associated with the conflicting RMRR)? Afaics such a guest would remain permanently unbootable, which of course is not an option. Is not booting worse than what we have now -- which is, booting successfully but (probably) having issues due to MMIO ranges overlapping RMRRs? It's really a rare possibility here since in the real world we didn't see any RMRR regions >= 0xF0000000 (the default PCI memory start). And I already sent out a slightly better revision in the ensuing email, so please also take a look if possible :) This patch series as a whole represents a lot of work and a lot of tangible improvements to the situation; and (unless the situation has changed) it's almost entirely acked apart from the MMIO placement part. If there is a simple way that we can change hvmloader so that most (or even many) VM/device combinations work properly for the 4.6 release, then I think it's worth considering. The current MMIO allocation mechanism is not good, so we really need to reshape it, but we'd better do this with some further discussion in the next release :) Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich wrote: On 15.07.15 at 10:59, wrote: >> On 2015/7/15 16:34, Jan Beulich wrote: >> On 15.07.15 at 06:27, wrote: Furthermore, could we have this solution as follows? >>> >>> Yet more special casing code you want to add. I said no to this >>> model, and unless you can address the issue _without_ adding >>> a lot of special casing code, the answer will remain no (subject >> >> What about this? >> >> @@ -301,6 +301,19 @@ void pci_setup(void) >> pci_mem_start <<= 1; >> } >> >> +for ( i = 0; i < memory_map.nr_map ; i++ ) >> +{ >> +uint64_t reserved_start, reserved_size; >> +reserved_start = memory_map.map[i].addr; >> +reserved_size = memory_map.map[i].size; >> +if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start, >> + reserved_start, reserved_size) ) >> +{ >> +printf("Reserved device memory conflicts current PCI >> memory.\n"); >> +BUG(); >> +} >> +} > > So what would the cure be if someone ran into this BUG() (other > than removing the device associated with the conflicting RMRR)? > Afaics such a guest would remain permanently unbootable, which > of course is not an option. Is not booting worse than what we have now -- which is, booting successfully but (probably) having issues due to MMIO ranges overlapping RMRRs? This patch series as a whole represents a lot of work and a lot of tangible improvements to the situation; and (unless the situation has changed) it's almost entirely acked apart from the MMIO placement part. If there is a simple way that we can change hvmloader so that most (or even many) VM/device combinations work properly for the 4.6 release, then I think it's worth considering. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
Yet more special casing code you want to add. I said no to this model, and unless you can address the issue _without_ adding a lot of special casing code, the answer will remain no (subject What about this?

@@ -301,6 +301,19 @@ void pci_setup(void)
         pci_mem_start <<= 1;
     }
 
+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        uint64_t reserved_start, reserved_size;
+        reserved_start = memory_map.map[i].addr;
+        reserved_size = memory_map.map[i].size;
+        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+                           reserved_start, reserved_size) )
+        {
+            printf("Reserved device memory conflicts current PCI memory.\n");
+            BUG();
+        }
+    }

So what would the cure be if someone ran into this BUG() (other than removing the device associated with the conflicting RMRR)? Maybe I can move this chunk of code down, after the actual allocation, to check whether any RDM conflicts with the final allocation, and then just disable the associated devices by writing PCI_COMMAND, without BUG(), like this draft code:

/* If pci bars conflict with RDM we need to disable this pci device. */
for ( devfn = 0; devfn < 256; devfn++ )
{
    bar_sz = pci_readl(devfn, bar_reg);
    bar_data = pci_readl(devfn, bar_reg);
    bar_data_upper = pci_readl(devfn, bar_reg + 4);
    /* Until here we don't conflict high memory. */
    if ( bar_data_upper )
        continue;

    for ( i = 0; i < memory_map.nr_map ; i++ )
    {
        uint64_t reserved_start, reserved_size;
        reserved_start = memory_map.map[i].addr;
        reserved_size = memory_map.map[i].size;
        if ( check_overlap(bar_data & ~(bar_sz - 1), bar_sz,
                           reserved_start, reserved_size) )
        {
            printf("Reserved device memory conflicts with this pci bar,"
                   " so just disable this device.\n");
            /* Now disable this device */
            cmd = pci_readw(devfn, PCI_COMMAND);
            pci_writew(devfn, PCI_COMMAND, ~cmd);
        }
    }
}

If this is still not fine to you, I have to raise a request to the co-maintainers, since it's hard to make any further progress in practice. Hi all, what is your opinion? Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
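One remark on the draft above: writing ~cmd inverts every bit of the command register rather than just switching decoding off. A more conventional way to disable such a device (a sketch only, not what the posted draft does; the two bit values are the architectural PCI command bits and may already be defined in hvmloader's pci_regs.h) would be to clear just the I/O and memory decode bits:

#define PCI_COMMAND_IO      0x1   /* respond to I/O space accesses    */
#define PCI_COMMAND_MEMORY  0x2   /* respond to memory space accesses */

/* Disable a device whose BAR collides with a reserved range by turning
 * off its I/O and memory decoding. */
static void disable_conflicting_device(uint32_t devfn)
{
    uint16_t cmd = pci_readw(devfn, PCI_COMMAND);

    cmd &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY);
    pci_writew(devfn, PCI_COMMAND, cmd);
}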
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
>>> On 15.07.15 at 10:59, wrote: > On 2015/7/15 16:34, Jan Beulich wrote: > On 15.07.15 at 06:27, wrote: >>> Furthermore, could we have this solution as follows? >> >> Yet more special casing code you want to add. I said no to this >> model, and unless you can address the issue _without_ adding >> a lot of special casing code, the answer will remain no (subject > > What about this? > > @@ -301,6 +301,19 @@ void pci_setup(void) > pci_mem_start <<= 1; > } > > +for ( i = 0; i < memory_map.nr_map ; i++ ) > +{ > +uint64_t reserved_start, reserved_size; > +reserved_start = memory_map.map[i].addr; > +reserved_size = memory_map.map[i].size; > +if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start, > + reserved_start, reserved_size) ) > +{ > +printf("Reserved device memory conflicts current PCI memory.\n"); > +BUG(); > +} > +} So what would the cure be if someone ran into this BUG() (other than removing the device associated with the conflicting RMRR)? Afaics such a guest would remain permanently unbootable, which of course is not an option. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
This is very similar to our current policy to [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6 since actually this is also another rare possibility in real world. Even I can do this as well when we handle that conflict with [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6. Sorry, here is one typo, s/#6/#5 Thanks Tiejun Note its not necessary to concern high memory since we already handle this case in the hv code previously, and its also not affected by those relocated memory later since our previous policy can make sure RAM isn't overlapping with RDM. Thanks Tiejun to co-maintainers overriding me). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
Certainly appreciate your time. I didn't mean its wasting time at this point. I just want to express that its hard to implement that solution in one or two weeks to walking into 4.6 as an exception. Note I know this feature is still not accepted as an exception to 4.6 right now so I'm making an assumption. After all this is a bug fix (and would have been allowed into 4.5 had it been ready in time), so doesn't necessarily need a freeze exception (but of course the bar raises the later it gets). Rather Yes, this is not a bug fix again into 4.6. than rushing in something that's cumbersome to maintain, I'd much prefer this to be done properly. Indeed, we'd like to finalize this properly as you said. But apparently time is not sufficient to allow this happened. So I just suggest we can further seek the best solution in next phase. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On 2015/7/15 16:34, Jan Beulich wrote: On 15.07.15 at 06:27, wrote: Furthermore, could we have this solution as follows? Yet more special casing code you want to add. I said no to this model, and unless you can address the issue _without_ adding a lot of special casing code, the answer will remain no (subject What about this?

@@ -301,6 +301,19 @@ void pci_setup(void)
         pci_mem_start <<= 1;
     }
 
+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        uint64_t reserved_start, reserved_size;
+        reserved_start = memory_map.map[i].addr;
+        reserved_size = memory_map.map[i].size;
+        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+                           reserved_start, reserved_size) )
+        {
+            printf("Reserved device memory conflicts current PCI memory.\n");
+            BUG();
+        }
+    }
+
     if ( mmio_total > (pci_mem_end - pci_mem_start) )
     {
         printf("Low MMIO hole not large enough for all devices,"

This is very similar to our current policy for [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6, since this is also another rare possibility in the real world. I can even do this as well when we handle that conflict with [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6. Note it's not necessary to be concerned about high memory since we already handle that case in the hypervisor code, and it's also not affected by memory relocated later since our previous policy makes sure RAM isn't overlapping with RDM. Thanks Tiejun to co-maintainers overriding me). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
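For reference, the overlap test relied on in this hunk is just a half-open interval intersection; a minimal version (shown only to make the semantics of check_overlap() explicit, whatever the series' actual helper looks like) is:

/* Do [start, start+size) and [reserved_start, reserved_start+reserved_size)
 * intersect?  Assumes neither range wraps around. */
static int check_overlap(uint64_t start, uint64_t size,
                         uint64_t reserved_start, uint64_t reserved_size)
{
    return (start + size > reserved_start) &&
           (reserved_start + reserved_size > start);
}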
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
>>> On 15.07.15 at 06:27, wrote: > Furthermore, could we have this solution as follows? Yet more special casing code you want to add. I said no to this model, and unless you can address the issue _without_ adding a lot of special casing code, the answer will remain no (subject to co-maintainers overriding me). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
>>> On 15.07.15 at 02:55, wrote: >> > I agree we'd better overhaul this since we already found something >>> unreasonable here. But one or two weeks is really not enough to fix this >>> with a bitmap framework, and although a bitmap can make mmio allocation >>> better, but more complicated if we just want to allocate PCI mmio. >>> >>> So could we do this next? I just feel if you'd like to pay more time >>> help me refine our current solution, its relatively realistic to this >>> case :) And then we can go into bitmap in details or work out a better >>> solution in sufficient time slot. >> >> Looking at how long it took to get here (wasn't this series originally >> even meant to go into 4.5?) and how much time I already spent > > Certainly appreciate your time. > > I didn't mean its wasting time at this point. I just want to express > that its hard to implement that solution in one or two weeks to walking > into 4.6 as an exception. > > Note I know this feature is still not accepted as an exception to 4.6 > right now so I'm making an assumption. After all this is a bug fix (and would have been allowed into 4.5 had it been ready in time), so doesn't necessarily need a freeze exception (but of course the bar raises the later it gets). Rather than rushing in something that's cumbersome to maintain, I'd much prefer this to be done properly. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On 2015/7/15 8:55, Chen, Tiejun wrote: I agree we'd better overhaul this since we already found something unreasonable here. But one or two weeks is really not enough to fix this with a bitmap framework, and although a bitmap can make mmio allocation better, but more complicated if we just want to allocate PCI mmio. So could we do this next? I just feel if you'd like to pay more time help me refine our current solution, its relatively realistic to this case :) And then we can go into bitmap in details or work out a better solution in sufficient time slot. Looking at how long it took to get here (wasn't this series originally even meant to go into 4.5?) and how much time I already spent Certainly appreciate your time. I didn't mean its wasting time at this point. I just want to express that its hard to implement that solution in one or two weeks to walking into 4.6 as an exception. Note I know this feature is still not accepted as an exception to 4.6 right now so I'm making an assumption. reviewing all the previous versions, I don't see a point in wasting even more time on working out details of an approach that's getting too complex/ugly already anyway. Here I'm trying to seek such a kind of two-step approach if possible. Furthermore, could we have this solution as follows?

@@ -407,8 +408,29 @@ void pci_setup(void)
         }
 
         base = (resource->base + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+ reallocate_bar:
         bar_data |= (uint32_t)base;
         bar_data_upper = (uint32_t)(base >> 32);
+        /*
+         * We need to skip those reserved regions like RMRR, but this probably
+         * leads to the remaining allocation failing.
+         */
+        for ( j = 0; j < memory_map.nr_map ; j++ )
+        {
+            if ( memory_map.map[j].type != E820_RAM )
+            {
+                reserved_end = memory_map.map[j].addr + memory_map.map[j].size;
+                if ( check_overlap(base, bar_sz,
+                                   memory_map.map[j].addr,
+                                   memory_map.map[j].size) )
+                {
+                    if ( !mmio_conflict )
+                        mmio_conflict = true;
+                    base = (reserved_end + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+                    goto reallocate_bar;
+                }
+            }
+        }
         base += bar_sz;
 
         if ( (base < resource->base) || (base > resource->max) )
@@ -416,6 +438,11 @@
             printf("pci dev %02x:%x bar %02x size "PRIllx": no space for "
                    "resource!\n", devfn>>3, devfn&7, bar_reg,
                    PRIllx_arg(bar_sz));
+            if ( mmio_conflict )
+            {
+                printf("MMIO conflicts with RDM.\n");
+                BUG();
+            }
             continue;
         }

Here we still check RDMs and skip them if any conflict exists, but at the end, if 1) there is not sufficient space to allocate all BARs to this VM && 2) that shortage is triggered by RDM, we stop creating this VM. This changes the current mechanism only slightly and also guarantees we don't ignore any conflicts. But I admit this is not a good solution because the alignment issue still exists; I think it may be acceptable, though, if all PCI devices assigned to this VM can work. And in the real world most RDMs just fall somewhere below 0xF0000000 (the current default PCI memory start address), so it's not a common case to see this kind of conflict. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
I agree we'd better overhaul this since we already found something unreasonable here. But one or two weeks is really not enough to fix this with a bitmap framework, and although a bitmap can make mmio allocation better, but more complicated if we just want to allocate PCI mmio. So could we do this next? I just feel if you'd like to pay more time help me refine our current solution, its relatively realistic to this case :) And then we can go into bitmap in details or work out a better solution in sufficient time slot. Looking at how long it took to get here (wasn't this series originally even meant to go into 4.5?) and how much time I already spent Certainly appreciate your time. I didn't mean its wasting time at this point. I just want to express that its hard to implement that solution in one or two weeks to walking into 4.6 as an exception. Note I know this feature is still not accepted as an exception to 4.6 right now so I'm making an assumption. reviewing all the previous versions, I don't see a point in wasting even more time on working out details of an approach that's getting too complex/ugly already anyway. Here I'm trying to seek such a kind of two-steps approach if possible. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
>>> On 14.07.15 at 12:54, wrote: >>> I think bitmap mechanism is a good idea but honestly, its not easy to >>> cover all requirements here. And just like bootmem on Linux side, so its >>> a little complicated to implement this entirely. So I prefer not to >>> introduce this way in current phase. >> >> I'm afraid it's going to be hard to convince me of any approaches >> further complicating the current mechanism instead of overhauling >> it. > > I agree we'd better overhaul this since we already found something > unreasonable here. But one or two weeks is really not enough to fix this > with a bitmap framework, and although a bitmap can make mmio allocation > better, but more complicated if we just want to allocate PCI mmio. > > So could we do this next? I just feel if you'd like to pay more time > help me refine our current solution, its relatively realistic to this > case :) And then we can go into bitmap in details or work out a better > solution in sufficient time slot. Looking at how long it took to get here (wasn't this series originally even meant to go into 4.5?) and how much time I already spent reviewing all the previous versions, I don't see a point in wasting even more time on working out details of an approach that's getting too complex/ugly already anyway. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
Note here I don't address your comments above since I think we should achieve an agreement firstly. I think bitmap mechanism is a good idea but honestly, its not easy to cover all requirements here. And just like bootmem on Linux side, so its a little complicated to implement this entirely. So I prefer not to introduce this way in current phase. I'm afraid it's going to be hard to convince me of any approaches further complicating the current mechanism instead of overhauling it. I agree we'd better overhaul this since we already found something unreasonable here. But one or two weeks is really not enough to fix this with a bitmap framework, and although a bitmap can make mmio allocation better, but more complicated if we just want to allocate PCI mmio. So could we do this next? I just feel if you'd like to pay more time help me refine our current solution, its relatively realistic to this case :) And then we can go into bitmap in details or work out a better solution in sufficient time slot. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
>>> On 14.07.15 at 08:39, wrote: >> > -} *resource, mem_resource, high_mem_resource, io_resource; >>> +} *resource, mem_resource, high_mem_resource, io_resource, > exp_mem_resource; >> >> Despite having gone through description and the rest of the patch I >> can't seem to be able to guess what "exp_mem" stands for. >> Meaningful variable names are quite helpful though, often avoiding >> the need for comments. > > exp_mem_resource() is the expanded mem_resource in the case of > populating RAM. > > Maybe I should use the whole word, expand_mem_resource. And what does "expand" here mean then? >>> @@ -309,29 +339,31 @@ void pci_setup(void) >>> } >>> >>> /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */ >>> +cur_pci_mem_start = pci_mem_start; >>> while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend ) >>> +relocate_ram_for_pci_memory(cur_pci_mem_start); >> >> Please be consistent about which variable you want to use in the loop >> (pci_mem_start vs cur_pci_mem_start). > > Overall I just call relocate_ram_for_pci_memory() twice and each time I > always pass cur_pci_mem_start. Any inconsistent place? In the quoted code you use pci_mem_start in the while() condition and cur_pci_mem_start in that same while()'s body. >> Also, this being the first substantial change to the function makes >> clear that you _still_ leave the sizing loop untouched, and instead >> make the allocation logic below more complicated. I said before a > > But this may be more reasonable than it used to do. In my point of view > we always need to first allocate 32bit mmio and then allocate 64bit mmio > since as you said we don't want to expand high memory if possible. > >> number of times that I don't think this helps maintainability of this >> already convoluted code. Among other things this manifests itself >> in your second call to relocate_ram_for_pci_memory() in no way >> playing by the constraints explained a few lines up from here in an >> extensive comment. > > Can't all variables/comments express what I intend to do here? Except > for that exp_mem_resource. I'm not talking about a lack of comments, but about your added use of the function not being in line with what is being explained in an earlier (pre-existing) comment. >> Therefore I'll not make any further comments on the rest of the >> patch, but instead outline an allocation model that I think would >> fit our needs: Subject to the constraints mentioned above, set up >> a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19 >> bits], i.e. reasonably small a memory block). Each bit represents a >> page usable for MMIO: First of all you remove the range from >> PCI_MEM_END upwards. Then remove all RDM pages. Now do a >> first pass over all devices, allocating (in the bitmap) space for only >> the 32-bit MMIO BARs, starting with the biggest one(s), by finding >> a best fit (i.e. preferably a range not usable by any bigger BAR) >> from top down. For example, if you have available >> >> [f0000000,f8000000) >> [f9000000,f9001000) >> [fa000000,fa003000) >> [fa010000,fa012000) >> >> and you're looking for a single page slot, you should end up >> picking fa002000. > > Why is this [f9000000,f9001000)? Just one page in this slot. This was just a simple example I gave. Or maybe I don't understand your question... 
>> After this pass you should be able to do RAM relocation in a >> single attempt just like we do today (you may still grow the MMIO >> window if you know you need to and can fit some of the 64-bit >> BARs in there, subject to said constraints; this is in an attempt >> to help OSes not comfortable with 64-bit resources). >> >> In a 2nd pass you'd then assign 64-bit resources: If you can fit >> them below 4G (you still have the bitmap left of what you've got >> available), put them there. Allocation strategy could be the same > > I think basically, your logic is similar to what I did as I described in > changelog, The goal is the same, but the approaches look quite different to me. In particular my approach avoids calculating mmio_total up front, then basing RAM relocation on it, only to find subsequently that more RAM may need to be relocated. > I think bitmap mechanism is a good idea but honestly, its not easy to > cover all requirements here. And just like bootmem on Linux side, so its > a little complicated to implement this entirely. So I prefer not to > introduce this way in current phase. I'm afraid it's going to be hard to convince me of any approaches further complicating the current mechanism instead of overhauling it. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
-} *resource, mem_resource, high_mem_resource, io_resource; +} *resource, mem_resource, high_mem_resource, io_resource, exp_mem_resource; Despite having gone through description and the rest of the patch I can't seem to be able to guess what "exp_mem" stands for. Meaningful variable names are quite helpful though, often avoiding the need for comments. exp_mem_resource() is the expanded mem_resource in the case of populating RAM. Maybe I should use the whole word, expand_mem_resource. /* Create a list of device BARs in descending order of size. */ [snip] @@ -309,29 +339,31 @@ void pci_setup(void) } /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */ +cur_pci_mem_start = pci_mem_start; while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend ) +relocate_ram_for_pci_memory(cur_pci_mem_start); Please be consistent about which variable you want to use in the loop (pci_mem_start vs cur_pci_mem_start). Overall I just call relocate_ram_for_pci_memory() twice and each time I always pass cur_pci_mem_start. Any inconsistent place? Also, this being the first substantial change to the function makes clear that you _still_ leave the sizing loop untouched, and instead make the allocation logic below more complicated. I said before a But this may be more reasonable than it used to do. In my point of view we always need to first allocate 32bit mmio and then allocate 64bit mmio since as you said we don't want to expand high memory if possible. number of times that I don't think this helps maintainability of this already convoluted code. Among other things this manifests itself in your second call to relocate_ram_for_pci_memory() in no way playing by the constraints explained a few lines up from here in an extensive comment. Can't all variables/comments express what I intend to do here? Except for that exp_mem_resource.

/*
 * We have to populate more RAM to further allocate
 * the remaining 32-bit BARs.
 */
if ( mmio32_unallocated_total )
{
    cur_pci_mem_start = pci_mem_start - mmio32_unallocated_total;
    relocate_ram_for_pci_memory(cur_pci_mem_start);
    exp_mem_resource.base = cur_pci_mem_start;
    exp_mem_resource.max = pci_mem_start;
}

Therefore I'll not make any further comments on the rest of the patch, but instead outline an allocation model that I think would fit our needs: Subject to the constraints mentioned above, set up a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19 bits], i.e. reasonably small a memory block). Each bit represents a page usable for MMIO: First of all you remove the range from PCI_MEM_END upwards. Then remove all RDM pages. Now do a first pass over all devices, allocating (in the bitmap) space for only the 32-bit MMIO BARs, starting with the biggest one(s), by finding a best fit (i.e. preferably a range not usable by any bigger BAR) from top down. For example, if you have available [f0000000,f8000000) [f9000000,f9001000) [fa000000,fa003000) [fa010000,fa012000) and you're looking for a single page slot, you should end up picking fa002000. Why is this [f9000000,f9001000)? Just one page in this slot. After this pass you should be able to do RAM relocation in a single attempt just like we do today (you may still grow the MMIO window if you know you need to and can fit some of the 64-bit BARs in there, subject to said constraints; this is in an attempt to help OSes not comfortable with 64-bit resources). In a 2nd pass you'd then assign 64-bit resources: If you can fit them below 4G (you still have the bitmap left of what you've got available), put them there. 
Allocation strategy could be the same I think basically, your logic is similar to what I did as I described in changelog, 1>. The first allocation round just to 32bit-bar If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar with all remaining resources including low pci memory. If not, we need to calculate how much RAM should be populated to allocate the remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go to the second allocation round 2>. 2>. The second allocation round to the remaining 32bit-bar We should can finish allocating all 32bit-bar in theory, then go to the third allocation round 3>. 3>. The third allocation round to 64bit-bar We'll try to first allocate from the remaining low memory resource. If that isn't enough, we try to expand highmem to allocate for 64bit-bar. This process should be same as the original. as above (biggest first), perhaps allowing for some factoring out of logic, but here smallest first probably could work equally well. The main thought to decide between the two is whether it is better to fit as many (small) or as big (in total) as possible a set under 4G.
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
>>> On 09.07.15 at 07:33, wrote: > @@ -50,17 +75,22 @@ void pci_setup(void) > /* Resources assignable to PCI devices via BARs. */ > struct resource { > uint64_t base, max; > -} *resource, mem_resource, high_mem_resource, io_resource; > +} *resource, mem_resource, high_mem_resource, io_resource, > exp_mem_resource; Despite having gone through description and the rest of the patch I can't seem to be able to guess what "exp_mem" stands for. Meaningful variable names are quite helpful though, often avoiding the need for comments. > /* Create a list of device BARs in descending order of size. */ > struct bars { > -uint32_t is_64bar; > +#define PCI_BAR_IS_64BIT 0x1 > +#define PCI_BAR_IS_ALLOCATED 0x2 > +uint32_t flag; flags (you already have two) > uint32_t devfn; > uint32_t bar_reg; > uint64_t bar_sz; > } *bars = (struct bars *)scratch_start; > -unsigned int i, nr_bars = 0; > -uint64_t mmio_hole_size = 0; > +unsigned int i, j, n, nr_bars = 0; > +uint64_t mmio_hole_size = 0, reserved_start, reserved_end, reserved_size; > +bool bar32_allocating = 0; > +uint64_t mmio32_unallocated_total = 0; > +unsigned long cur_pci_mem_start = 0; > > const char *s; > /* > @@ -222,7 +252,7 @@ void pci_setup(void) > if ( i != nr_bars ) > memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars)); > > -bars[i].is_64bar = is_64bar; > +bars[i].flag = is_64bar ? PCI_BAR_IS_64BIT : 0; > bars[i].devfn = devfn; > bars[i].bar_reg = bar_reg; > bars[i].bar_sz = bar_sz; > @@ -309,29 +339,31 @@ void pci_setup(void) > } > > /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */ > +cur_pci_mem_start = pci_mem_start; > while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend ) > +relocate_ram_for_pci_memory(cur_pci_mem_start); Please be consistent about which variable you want to use in the loop (pci_mem_start vs cur_pci_mem_start). Also, this being the first substantial change to the function makes clear that you _still_ leave the sizing loop untouched, and instead make the allocation logic below more complicated. I said before a number of times that I don't think this helps maintainability of this already convoluted code. Among other things this manifests itself in your second call to relocate_ram_for_pci_memory() in no way playing by the constraints explained a few lines up from here in an extensive comment. Therefore I'll not make any further comments on the rest of the patch, but instead outline an allocation model that I think would fit our needs: Subject to the constraints mentioned above, set up a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19 bits], i.e. reasonably small a memory block). Each bit represents a page usable for MMIO: First of all you remove the range from PCI_MEM_END upwards. Then remove all RDM pages. Now do a first pass over all devices, allocating (in the bitmap) space for only the 32-bit MMIO BARs, starting with the biggest one(s), by finding a best fit (i.e. preferably a range not usable by any bigger BAR) from top down. For example, if you have available [f0000000,f8000000) [f9000000,f9001000) [fa000000,fa003000) [fa010000,fa012000) and you're looking for a single page slot, you should end up picking fa002000. After this pass you should be able to do RAM relocation in a single attempt just like we do today (you may still grow the MMIO window if you know you need to and can fit some of the 64-bit BARs in there, subject to said constraints; this is in an attempt to help OSes not comfortable with 64-bit resources). 
In a 2nd pass you'd then assign 64-bit resources: If you can fit them below 4G (you still have the bitmap left of what you've got available), put them there. Allocation strategy could be the same as above (biggest first), perhaps allowing for some factoring out of logic, but here smallest first probably could work equally well. The main thought to decide between the two is whether it is better to fit as many (small) or as big (in total) as possible a set under 4G. I'd generally expect the former (as many as possible, leaving only a few huge ones to go above 4G) to be the better approach, but that's more a gut feeling than based on hard data. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
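To make the model above concrete, here is a condensed sketch of the page bitmap and a top-down allocator (illustrative only: naturally aligned power-of-two sizes are assumed, the "best fit into the smallest suitable range" refinement is left out, and processing BARs biggest-first would still come from the existing descending-size BAR list):

#include <stdint.h>

#define MMIO_PAGES (1UL << 19)               /* 2GiB hole at 4KiB granularity */
static uint8_t mmio_bitmap[MMIO_PAGES / 8];  /* bit set => page free for MMIO */

static int page_free(unsigned long pg)
{
    return mmio_bitmap[pg / 8] & (1u << (pg % 8));
}

static void mark_allocated(unsigned long pg, unsigned long nr)
{
    for ( ; nr--; pg++ )
        mmio_bitmap[pg / 8] &= ~(1u << (pg % 8));
}

/*
 * Find 'nr' (a power of two) naturally aligned free pages, scanning from the
 * top of the hole downwards, and mark them allocated.  The bitmap is assumed
 * to have been initialised by setting the bits of the usable MMIO hole and
 * then clearing the pages from PCI_MEM_END upwards plus all RDM pages.
 * Returns the first page, or -1 if nothing fits.
 */
static long bitmap_alloc(unsigned long nr)
{
    unsigned long pg, i;

    if ( !nr || nr > MMIO_PAGES )
        return -1;

    for ( pg = MMIO_PAGES - nr; ; pg -= nr )
    {
        for ( i = 0; i < nr && page_free(pg + i); i++ )
            continue;
        if ( i == nr )
        {
            mark_allocated(pg, nr);
            return pg;
        }
        if ( pg < nr )
            return -1;
    }
}

A 32-bit BAR of size bar_sz would then land at hole_base + (bitmap_alloc(bar_sz >> 12) << 12), where hole_base is whatever address page 0 of the bitmap is taken to represent, and the 64-bit pass could reuse whatever bits remain set before falling back to the window above 4G.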