Re: [Xen-devel] Question about partitioning shared cache in Xen

2015-01-15 Thread Meng Xu
2015-01-15 3:23 GMT-05:00 Jan Beulich :
 On 14.01.15 at 22:19,  wrote:
>> So when Xen allocates memory to a PV guest with 256MB of memory and a
>> 4KB page size (i.e., 2^16 memory pages), Xen will allocate 2^16
>> contiguous memory pages to this guest, since the maximum number of
>> contiguous memory pages Xen allocates to a PV guest at once is 1024*1024.
>
> Provided this is (a) not a debug build and (b) there is a large enough
> chunk of memory available.
>
>> Although the 2^16 memory pages are contiguous, Xen still needs to fill
>> this guest's p2m table in a page-by-page fashion, which means each
>> element in the guest's p2m table is the page frame number of one 4KB
>> page. Right?
>
> Sure, but again only for PV.
>
 But can we allocate one memory page at a time to guests until the
 guests have enough pages?
>>>
>>> We can, but that's inefficient for TLB usage and page table lookup.
>>
>> IMHO, that's true for any case where we use a smaller page size. In my
>> understanding, Xen manages guests' memory, e.g., the p2m table or m2p
>> table, at the granularity of a 4KB page. In other words, the page size
>> in Xen is still 4KB. (Please correct me if I'm wrong.)
>
> Once again - correct for PV (without the [unsupported?] superpages
> flag set), but not for PVH/HVM.
>
>> So if the number of pages a guest requests does not change (which
>> means the page size is 4KB), the TLB usage should be the same.
>> If the page size in Xen is larger than 4KB, the TLB usage will
>> certainly increase if we force Xen to use a 4KB page size.
>
> And that's what is the case for PVH/HVM.
>
>> OK. Suppose TLB usage and page table lookups become inefficient
>> because of the page coloring mechanism. I totally agree that
>> non-contiguous memory may hurt the performance of a guest when the
>> guest runs alone. However, shared-cache partitioning can make the
>> performance of a guest more stable and less easily influenced by
>> other guests. Briefly speaking, I'm trying to make the running time of
>> the workload in a guest more deterministic and robust against other
>> guests' interference.
>>
>> For those applications, like the control program in an automobile,
>> that must produce results within a deadline, a deterministic execution
>> time is more important than an execution time that is smaller in most
>> cases but may be very large in the worst case.
>
> Understood, but in that case I suppose this may need to be a
> default-off optional feature.
>

Yes. Actually, right now I just want to evaluate whether the idea of
shared-cache partitioning works on Xen for some specific applications.
People have done similar things in Linux (in research projects), and
according to the wiki (http://en.wikipedia.org/wiki/Cache_coloring),
some OSes, e.g., FreeBSD, support it. So I'm curious about the
performance of the page coloring mechanism on Xen and its strengths and
weaknesses. (Although we know the strengths and weaknesses in theory, I
want/hope to see them in a practical implementation. That's why I'm
trying to do this.)

Thank you very much for your explanation and confirmation of my
understanding! :-)

Best,

Meng




-- 


---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Question about partitioning shared cache in Xen

2015-01-15 Thread Jan Beulich
>>> On 14.01.15 at 22:19,  wrote:
> So when Xen allocates memory to a PV guest with 256MB of memory and a
> 4KB page size (i.e., 2^16 memory pages), Xen will allocate 2^16
> contiguous memory pages to this guest, since the maximum number of
> contiguous memory pages Xen allocates to a PV guest at once is 1024*1024.

Provided this is (a) not a debug build and (b) there is a large enough
chunk of memory available.

> Although the 2^16 memory pages are contiguous, Xen still needs to fill
> this guest's p2m table in a page-by-page fashion, which means each
> element in the guest's p2m table is the page frame number of one 4KB
> page. Right?

Sure, but again only for PV.

>>> But can we allocate one memory page at a time to guests until the
>>> guests have enough pages?
>>
>> We can, but that's inefficient for TLB usage and page table lookup.
> 
> IMHO, that's true for any case where we use a smaller page size. In my
> understanding, Xen manages guests' memory, e.g., the p2m table or m2p
> table, at the granularity of a 4KB page. In other words, the page size
> in Xen is still 4KB. (Please correct me if I'm wrong.)

Once again - correct for PV (without the [unsupported?] superpages
flag set), but not for PVH/HVM.

> So if the number of pages a guest requests does not change (which
> means the page size is 4KB), the TLB usage should be the same.
> If the page size in Xen is larger than 4KB, the TLB usage will
> certainly increase if we force Xen to use a 4KB page size.

And that's what is the case for PVH/HVM.

> OK. Suppose TLB usage and page table lookups become inefficient
> because of the page coloring mechanism. I totally agree that
> non-contiguous memory may hurt the performance of a guest when the
> guest runs alone. However, shared-cache partitioning can make the
> performance of a guest more stable and less easily influenced by
> other guests. Briefly speaking, I'm trying to make the running time of
> the workload in a guest more deterministic and robust against other
> guests' interference.
> 
> For those applications, like the control program in an automobile,
> that must produce results within a deadline, a deterministic execution
> time is more important than an execution time that is smaller in most
> cases but may be very large in the worst case.

Understood, but in that case I suppose this may need to be a
default-off optional feature.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Question about partitioning shared cache in Xen

2015-01-14 Thread Meng Xu
2015-01-14 11:29 GMT-05:00 Jan Beulich :
 On 14.01.15 at 16:27,  wrote:
>> 2015-01-14 10:02 GMT-05:00 Jan Beulich :
>> On 14.01.15 at 15:45,  wrote:
 Yes. I try to use bits [A16, A12] to isolate different colors in a
 shared cache. A 2MB 16-way set-associative shared cache uses bits
 [A16, A6] to index the cache sets. Because the page size is 4KB, the
 page frame number's bits [A16, A12] overlap with the bits used to
 index the shared cache's sets, so we can control those [A16, A12] bits
 to control where the page is placed in the cache. (The wiki page about
 page coloring is here: http://en.wikipedia.org/wiki/Cache_coloring)
>>>
>>> But the majority of allocations done for guests would be as 2M or
>>> 1G pages,
>>
>> First, I want to confirm that my understanding is correct: when Xen
>> allocates memory pages to guests, it currently allocates a batch of
>> memory pages at a time. That's why you said the majority of
>> allocations would be 2MB or 1GB. But the size of one memory page used
>> by guests is still 4KB. Am I correct?
>
> Yes.

So when Xen allocates memory to a PV guest with 256MB of memory and a
4KB page size (i.e., 2^16 memory pages), Xen will allocate 2^16
contiguous memory pages to this guest, since the maximum number of
contiguous memory pages Xen allocates to a PV guest at once is
1024*1024.
Although the 2^16 memory pages are contiguous, Xen still needs to fill
this guest's p2m table in a page-by-page fashion, which means each
element in the guest's p2m table is the page frame number of one 4KB
page. Right?
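
To make sure I'm reading this correctly, here is a tiny illustration of
what I mean by "page-by-page" (my own sketch, not the actual libxc
code): even if the chunk Xen hands back happens to be
machine-contiguous, the PV p2m is still an array with one machine frame
number per 4KB guest pfn.

/* Sketch only, not the real libxc code: one p2m entry per 4KB page,
 * even when the underlying machine frames are contiguous. */
#include <stdint.h>

typedef uint64_t xen_pfn_t;

static void fill_p2m_contiguous_chunk(xen_pfn_t *p2m_host,
                                      unsigned long first_pfn,
                                      xen_pfn_t first_mfn,
                                      unsigned long nr_pages)
{
    unsigned long i;

    for ( i = 0; i < nr_pages; i++ )
        p2m_host[first_pfn + i] = first_mfn + i; /* MFN of one 4KB page */
}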

>
>> But can we allocate one memory page at a time to guests until the
>> guests have enough pages?
>
> We can, but that's inefficient for TLB usage and page table lookup.

IMHO, that's true for any case where we use a smaller page size. In my
understanding, Xen manages guests' memory, e.g., the p2m table or m2p
table, at the granularity of a 4KB page. In other words, the page size
in Xen is still 4KB. (Please correct me if I'm wrong.)
So if the number of pages a guest requests does not change (which means
the page size is 4KB), the TLB usage should be the same.
If the page size in Xen is larger than 4KB, the TLB usage will
certainly increase if we force Xen to use a 4KB page size.
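
As a back-of-the-envelope check of this argument, here is my own
arithmetic for a hypothetical 256MB guest (illustration only):

#include <stdio.h>

/* Rough arithmetic only: number of distinct translations needed to map
 * a 256MB guest with 4KB pages versus 2MB superpages. */
int main(void)
{
    unsigned long guest_bytes = 256UL * 1024 * 1024;

    printf("4KB mappings needed: %lu\n", guest_bytes / (4UL * 1024));        /* 65536 */
    printf("2MB mappings needed: %lu\n", guest_bytes / (2UL * 1024 * 1024)); /* 128 */
    return 0;
}

So forcing 4KB mappings costs roughly 512x more translations only where
Xen could otherwise have used 2MB mappings; for PV, where 4KB is
already the mapping granularity, the count stays the same.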

OK. Suppose TLB usage and page table lookups become inefficient
because of the page coloring mechanism. I totally agree that
non-contiguous memory may hurt the performance of a guest when the
guest runs alone. However, shared-cache partitioning can make the
performance of a guest more stable and less easily influenced by
other guests. Briefly speaking, I'm trying to make the running time of
the workload in a guest more deterministic and robust against other
guests' interference.

For those applications, like the control program in an automobile,
that must produce results within a deadline, a deterministic execution
time is more important than an execution time that is smaller in most
cases but may be very large in the worst case.

>
>> I find that the arch_setup_meminit() function in
>> tools/libxc/xc_dom_x86.c allocates memory pages differently depending
>> on whether dom->superpages is true.
>> Can we add an if-else to allocate one page at a time to the guest
>> instead of allocating many pages at once?
>
> That's for PV guests, which (by default) can't use 2M (not to speak of
> 1G) pages anyway.

Right now, I'm only looking at PV guests and trying to get some
measurements of the cache partitioning mechanism on PV guests. I want
to first show the benefits and costs of the page coloring mechanism in
Xen, and then I may explore the other types of guests.

However, even for PV guests, I'm struggling with the error I
mentioned above. :-(

Thank you very much for your time and help!
I hope you can give me some advice on where I should investigate to fix
the issue.

Best,

Meng

-- 


---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Question about partitioning shared cache in Xen

2015-01-14 Thread Jan Beulich
>>> On 14.01.15 at 16:27,  wrote:
> 2015-01-14 10:02 GMT-05:00 Jan Beulich :
> On 14.01.15 at 15:45,  wrote:
>>> Yes. I try to use bits [A16, A12] to isolate different colors in a
>>> shared cache. A 2MB 16-way set-associative shared cache uses bits
>>> [A16, A6] to index the cache sets. Because the page size is 4KB, the
>>> page frame number's bits [A16, A12] overlap with the bits used to
>>> index the shared cache's sets, so we can control those [A16, A12]
>>> bits to control where the page is placed in the cache. (The wiki page
>>> about page coloring is here:
>>> http://en.wikipedia.org/wiki/Cache_coloring)
>>
>> But the majority of allocations done for guests would be as 2M or
>> 1G pages,
> 
> First, I want to confirm that my understanding is correct: when Xen
> allocates memory pages to guests, it currently allocates a batch of
> memory pages at a time. That's why you said the majority of
> allocations would be 2MB or 1GB. But the size of one memory page used
> by guests is still 4KB. Am I correct?

Yes.

> But can we allocate one memory page at a time to guests until the
> guests have enough pages?

We can, but that's inefficient for TLB usage and page table lookup.

> I find that the arch_setup_meminit() function in
> tools/libxc/xc_dom_x86.c allocates memory pages differently depending
> on whether dom->superpages is true.
> Can we add an if-else to allocate one page at a time to the guest
> instead of allocating many pages at once?

That's for PV guests, which (by default) can't use 2M (not to speak of
1G) pages anyway.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Question about partitioning shared cache in Xen

2015-01-14 Thread Meng Xu
Hi Jan,

2015-01-14 10:02 GMT-05:00 Jan Beulich :
 On 14.01.15 at 15:45,  wrote:
>> Yes. I try to use bits [A16, A12] to isolate different colors in a
>> shared cache. A 2MB 16-way set-associative shared cache uses bits
>> [A16, A6] to index the cache sets. Because the page size is 4KB, the
>> page frame number's bits [A16, A12] overlap with the bits used to
>> index the shared cache's sets, so we can control those [A16, A12] bits
>> to control where the page is placed in the cache. (The wiki page about
>> page coloring is here: http://en.wikipedia.org/wiki/Cache_coloring)
>
> But the majority of allocations done for guests would be as 2M or
> 1G pages,

First, I want to confirm that my understanding is correct: when Xen
allocates memory pages to guests, it currently allocates a batch of
memory pages at a time. That's why you said the majority of allocations
would be 2MB or 1GB. But the size of one memory page used by guests is
still 4KB. Am I correct?

If I'm correct, I will make the following argument. If I'm incorrect,
would you mind letting me know which function or file I should look at
to correct my understanding? (Thank you very much!)

===If I'm wrong about the size of one memory page, the following
argument won't hold and I need to first correct my understanding
above===
But can we allocate one memory page at a time to guests until the
guests have enough pages?

I find that the arch_setup_meminit() function in
tools/libxc/xc_dom_x86.c allocates memory pages differently depending
on whether dom->superpages is true.
Can we add an if-else to allocate one page at a time to the guest
instead of allocating many pages at once?

For example, when superpages are not used and cache colors are
specified, we call xc_domain_populate_physmap_exact_ca() to allocate
one cache-colored page at a time:

if ( num_cache_colors == 0 )
{
    for ( i = rc = allocsz = 0;
          (i < dom->total_pages) && !rc;
          i += allocsz )
    {
        allocsz = dom->total_pages - i;
        if ( allocsz > 1024*1024 )
            allocsz = 1024*1024;
        rc = xc_domain_populate_physmap_exact(
            dom->xch, dom->guest_domid, allocsz,
            0, 0, &dom->p2m_host[i]);
    }
}
else
{
    for ( i = rc = allocsz = 0;
          (i < dom->total_pages) && !rc;
          i += allocsz )
    {
        allocsz = 1; /* TODO: allocate multiple pages at a time once
                      * the memory pool is in place */
        rc = xc_domain_populate_physmap_exact_ca(
            dom->xch, dom->guest_domid, allocsz,
            0, 0, &dom->p2m_host[i], cache_colors);
    }
}
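
For clarity, the signature I have in mind for the new call simply
mirrors xc_domain_populate_physmap_exact() with one extra argument
(this is only my proposal, nothing like it exists in the tree yet, and
the exact type of the color parameter is still open):

/* Proposed, not existing: same as xc_domain_populate_physmap_exact(),
 * plus a description of the cache colors the domain may use. */
int xc_domain_populate_physmap_exact_ca(xc_interface *xch,
                                        uint32_t domid,
                                        unsigned long nr_extents,
                                        unsigned int extent_order,
                                        unsigned int mem_flags,
                                        xen_pfn_t *extent_start,
                                        uint32_t cache_colors);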



Thank you very much!

Best,

Meng



-- 


---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Question about partitioning shared cache in Xen

2015-01-14 Thread Jan Beulich
>>> On 14.01.15 at 15:45,  wrote:
> Yes. I try to use bits [A16, A12] to isolate different colors in a
> shared cache. A 2MB 16-way set-associative shared cache uses bits
> [A16, A6] to index the cache sets. Because the page size is 4KB, the
> page frame number's bits [A16, A12] overlap with the bits used to
> index the shared cache's sets, so we can control those [A16, A12] bits
> to control where the page is placed in the cache. (The wiki page about
> page coloring is here: http://en.wikipedia.org/wiki/Cache_coloring)

But the majority of allocations done for guests would be as 2M or
1G pages, so picking address bits 12..16 for coloring seems rather
undesirable. (And surely no LLC would be large enough any time
soon to allow coloring of 1G pages anyway.)

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Question about partitioning shared cache in Xen

2015-01-14 Thread Meng Xu
Hi Andrew,

Thank you very much for your quick reply!

2015-01-14 7:20 GMT-05:00 Andrew Cooper :
> On 14/01/15 00:41, Meng Xu wrote:
>> Hi,
>>
>> [Goal]
>> I want to investigate the impact of the shared cache on the
>> performance of workloads in guest domains.
>> I also want to partition the shared cache via a page coloring
>> mechanism so that guest domains can use different cache colors of the
>> shared cache and will not interfere with each other in the shared cache.
>>
>> [Motivation: Why do I want to partition the shared cache?]
>> Because the shared cache is shared among all guest domains (I assume
>> the machine has multiple cores sharing the same LLC; for example, the
>> Intel(R) Xeon(R) CPU E5-1650 v2 has 6 physical cores sharing a 12MB L3
>> cache), the workload in one domU can interfere with another domU's
>> memory-intensive workload on the same machine via the shared cache.
>> This shared-cache interference makes the execution time of the
>> workload in a domU non-deterministic and much longer. (If we assume
>> the worst case, the worst-case execution time of the workload will be
>> too pessimistic.) A stable execution time is very important in
>> real-time computation, where a real-time program, like the control
>> program in an automobile, has to generate its result within a deadline.
>>
>> I did some quick measurements to show how the shared cache can be used
>> by one domain to interfere with the execution time of another domain's
>> workload. I pin the VCPUs of two domains to different physical cores
>> and use one domain to pollute the shared cache. The result shows that
>> the shared-cache interference can slow down the execution time of
>> another domain's workload by 4x. The whole experiment result can be
>> found at
>> https://github.com/PennPanda/cis601/blob/master/project/data/boxplot_cache_v2.pdf
>> . (The workload of the figure is a program reading a large array. I
>> run the program 100 times and draw the latency of accessing the array
>> in a box plot. The first column, named "alone−d1v1", is the boxplot of
>> the latency when the program in dom1 runs alone. The fourth column,
>> "d1v1d2v1−pindiffcore", is the boxplot of the latency when the program
>> in dom1 runs along with another program in dom2, and these two domains
>> use different cores. dom1 and dom2 each have 1 vcpu with budget equal
>> to period. The scheduler is the credit scheduler.)
>>
>> [Idea of how to partition the shared cache]
>> When a PV guest domain is created, it will call xc_dom_boot_mem_init()
>> to allocate memory for the domain, which finally calls
>> xc_domain_populate_physmap_exact() to allocate memory pages from the
>> domheap in Xen.
>> The idea of partitioning the shared cache is as follows:
>> 1) xl tool change: Add an option in the domain's configuration file
>> which specifies which cache colors this domain should use. (I have
>> done this, and when I use xl create --dry-run, I can see the
>> parameters are parsed into the build information.)
>> 2) hypervisor change: Add another hypercall,
>> xc_domain_populate_physmap_exact_ca(), which has one more parameter,
>> i.e., the cache colors this domain should use. I also need to reserve
>> a memory pool which sorts the reserved memory pages based on their
>> cache colors.
>>
>> When a PV domain is created, I can specify the cache colors it uses.
>> Then the xl tool will call xc_domain_populate_physmap_exact_ca() to
>> allocate only memory pages with the specified cache colors to this
>> domain.
>>
>> [Quick implementation]
>> I attached my quick implementation patch at the end of this email.
>>
>> [Issues and Questions]
>> After I applied the patch to Xen's commit point
>> 36174af3fbeb1b662c0eadbfa193e77f68cc955b and ran it on my machine,
>> dom0 cannot boot up. :-(
>> The error message from dom0 is:
>> [0.00] Kernel panic - not syncing: Failed to get contiguous
>> memory for DMA from Xen!
>>
>> [0.00] You either: don't have the permissions, do not have
>> enough free memory under 4GB, or the hypervisor memory is too
>> fragmented! (rc:-12)
>>
>> I tried to print every message in the functions I touched in order to
>> figure out where it goes wrong, but failed. :-(
>> The thing I cannot understand is this: my implementation doesn't
>> reserve any memory pages in the cache-aware memory pool before the
>> system boots up. Basically, none of the functions I modified are even
>> called before the system boots up. But the system still crashes. :-(
>> (The system can boot up and work perfectly before applying my patch.)
>>
>> I would really appreciate it if any of you could point out the part I
>> missed or misunderstood. :-)
>
> The error message is quite clear.  I presume that your cache
> partitioning algorithm has prevented dom0 from getting any
> machine-contiguous pages for DMA.  This prevents dom0 from using any
> hardware, such as its disks or the network.

Actually, I didn't partition the shared cache for dom0. dom0 should
have contiguous memory as before.

I didn't modify any function in the existi

Re: [Xen-devel] Question about partitioning shared cache in Xen

2015-01-14 Thread Andrew Cooper
On 14/01/15 00:41, Meng Xu wrote:
> Hi,
>
> [Goal]
> I want to investigate the impact of the shared cache on the
> performance of workloads in guest domains.
> I also want to partition the shared cache via a page coloring
> mechanism so that guest domains can use different cache colors of the
> shared cache and will not interfere with each other in the shared cache.
>
> [Motivation: Why do I want to partition the shared cache?]
> Because the shared cache is shared among all guest domains (I assume
> the machine has multiple cores sharing the same LLC; for example, the
> Intel(R) Xeon(R) CPU E5-1650 v2 has 6 physical cores sharing a 12MB L3
> cache), the workload in one domU can interfere with another domU's
> memory-intensive workload on the same machine via the shared cache.
> This shared-cache interference makes the execution time of the
> workload in a domU non-deterministic and much longer. (If we assume
> the worst case, the worst-case execution time of the workload will be
> too pessimistic.) A stable execution time is very important in
> real-time computation, where a real-time program, like the control
> program in an automobile, has to generate its result within a deadline.
>
> I did some quick measurements to show how the shared cache can be used
> by one domain to interfere with the execution time of another domain's
> workload. I pin the VCPUs of two domains to different physical cores
> and use one domain to pollute the shared cache. The result shows that
> the shared-cache interference can slow down the execution time of
> another domain's workload by 4x. The whole experiment result can be
> found at
> https://github.com/PennPanda/cis601/blob/master/project/data/boxplot_cache_v2.pdf
> . (The workload of the figure is a program reading a large array. I
> run the program 100 times and draw the latency of accessing the array
> in a box plot. The first column, named "alone−d1v1", is the boxplot of
> the latency when the program in dom1 runs alone. The fourth column,
> "d1v1d2v1−pindiffcore", is the boxplot of the latency when the program
> in dom1 runs along with another program in dom2, and these two domains
> use different cores. dom1 and dom2 each have 1 vcpu with budget equal
> to period. The scheduler is the credit scheduler.)
>
> [Idea of how to partition the shared cache]
> When a PV guest domain is created, it will call xc_dom_boot_mem_init()
> to allocate memory for the domain, which finally calls
> xc_domain_populate_physmap_exact() to allocate memory pages from the
> domheap in Xen.
> The idea of partitioning the shared cache is as follows:
> 1) xl tool change: Add an option in the domain's configuration file
> which specifies which cache colors this domain should use. (I have
> done this, and when I use xl create --dry-run, I can see the
> parameters are parsed into the build information.)
> 2) hypervisor change: Add another hypercall,
> xc_domain_populate_physmap_exact_ca(), which has one more parameter,
> i.e., the cache colors this domain should use. I also need to reserve
> a memory pool which sorts the reserved memory pages based on their
> cache colors.
>
> When a PV domain is created, I can specify the cache colors it uses.
> Then the xl tool will call xc_domain_populate_physmap_exact_ca() to
> allocate only memory pages with the specified cache colors to this
> domain.
>
> [Quick implementation]
> I attached my quick implementation patch at the end of this email.
>
> [Issues and Questions]
> After I applied the patch to Xen's commit point
> 36174af3fbeb1b662c0eadbfa193e77f68cc955b and ran it on my machine,
> dom0 cannot boot up. :-(
> The error message from dom0 is:
> [0.00] Kernel panic - not syncing: Failed to get contiguous
> memory for DMA from Xen!
>
> [0.00] You either: don't have the permissions, do not have
> enough free memory under 4GB, or the hypervisor memory is too
> fragmented! (rc:-12)
>
> I tried to print every message in the functions I touched in order to
> figure out where it goes wrong, but failed. :-(
> The thing I cannot understand is this: my implementation doesn't
> reserve any memory pages in the cache-aware memory pool before the
> system boots up. Basically, none of the functions I modified are even
> called before the system boots up. But the system still crashes. :-(
> (The system can boot up and work perfectly before applying my patch.)
>
> I would really appreciate it if any of you could point out the part I
> missed or misunderstood. :-)

The error message is quite clear.  I presume that your cache
partitioning algorithm has prevented dom0 from getting any
machine-contiguous pages for DMA.  This prevents dom0 from using any
hardware, such as its disks or the network.

What I don't see is how you plan to isolate different colours in a
shared cache.  I am guessing (seeing as the patch is full of debugging
and hard to follow) that you are using the low order bits in the
physical address to identify the colour, which will indeed prevent any
contiguous allocations from happening.  Is this what you are attempting?

[Xen-devel] Question about partitioning shared cache in Xen

2015-01-13 Thread Meng Xu
Hi,

[Goal]
I want to investigate the impact of the shared cache on the
performance of workloads in guest domains.
I also want to partition the shared cache via a page coloring mechanism
so that guest domains can use different cache colors of the shared
cache and will not interfere with each other in the shared cache.

[Motivation: Why do I want to partition the shared cache?]
Because the shared cache is shared among all guest domains (I assume
the machine has multiple cores sharing the same LLC; for example, the
Intel(R) Xeon(R) CPU E5-1650 v2 has 6 physical cores sharing a 12MB L3
cache), the workload in one domU can interfere with another domU's
memory-intensive workload on the same machine via the shared cache.
This shared-cache interference makes the execution time of the
workload in a domU non-deterministic and much longer. (If we assume
the worst case, the worst-case execution time of the workload will be
too pessimistic.) A stable execution time is very important in
real-time computation, where a real-time program, like the control
program in an automobile, has to generate its result within a deadline.

I did some quick measurements to show how the shared cache can be used
by one domain to interfere with the execution time of another domain's
workload. I pin the VCPUs of two domains to different physical cores
and use one domain to pollute the shared cache. The result shows that
the shared-cache interference can slow down the execution time of
another domain's workload by 4x. The whole experiment result can be
found at
https://github.com/PennPanda/cis601/blob/master/project/data/boxplot_cache_v2.pdf
. (The workload of the figure is a program reading a large array. I
run the program 100 times and draw the latency of accessing the array
in a box plot. The first column, named "alone−d1v1", is the boxplot of
the latency when the program in dom1 runs alone. The fourth column,
"d1v1d2v1−pindiffcore", is the boxplot of the latency when the program
in dom1 runs along with another program in dom2, and these two domains
use different cores. dom1 and dom2 each have 1 vcpu with budget equal
to period. The scheduler is the credit scheduler.)

[Idea of how to partition the shared cache]
When a PV guest domain is created, it will call xc_dom_boot_mem_init()
to allocate memory for the domain, which finally calls
xc_domain_populate_physmap_exact() to allocate memory pages from the
domheap in Xen.
The idea of partitioning the shared cache is as follows:
1) xl tool change: Add an option in the domain's configuration file
which specifies which cache colors this domain should use. (I have done
this, and when I use xl create --dry-run, I can see the parameters are
parsed into the build information.)
2) hypervisor change: Add another hypercall,
xc_domain_populate_physmap_exact_ca(), which has one more parameter,
i.e., the cache colors this domain should use. I also need to reserve a
memory pool which sorts the reserved memory pages based on their cache
colors.

When a PV domain is created, I can specify the cache colors it uses.
Then the xl tool will call xc_domain_populate_physmap_exact_ca() to
allocate only memory pages with the specified cache colors to this
domain.
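
To make the color bookkeeping concrete, here is a minimal sketch of the
helper and pool structure I have in mind (illustration only, with
placeholder names; it assumes, as an example, a 2MB 16-way LLC with
64-byte lines, i.e., 2048 sets indexed by address bits A6-A16, so that
with 4KB pages the color is given by bits A12-A16 of the machine
address, giving 32 colors):

/* Illustration only; names are placeholders, not real Xen code. */
#define CACHE_COLOR_BITS 5
#define NR_CACHE_COLORS  (1U << CACHE_COLOR_BITS)   /* 32 colors */

/* Bits 12..16 of the machine address are bits 0..4 of the MFN. */
static inline unsigned int mfn_to_color(unsigned long mfn)
{
    return mfn & (NR_CACHE_COLORS - 1);
}

/* The "cache-aware memory pool": reserved pages kept on one free list
 * per color, so the colored allocation path only hands out pages whose
 * colors are in the domain's configured set. */
struct colored_page {
    unsigned long mfn;
    struct colored_page *next;
};

struct cache_color_pool {
    struct colored_page *free_list[NR_CACHE_COLORS];
};

A domain's allowed colors could then simply be a bitmask, with
xc_domain_populate_physmap_exact_ca() pulling pages only from the
corresponding free lists.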

[Quick implementation]
I attached my quick implementation patch at the end of this email.

[Issues and Questions]
After I applied the patch to Xen's commit point
36174af3fbeb1b662c0eadbfa193e77f68cc955b and ran it on my machine,
dom0 cannot boot up. :-(
The error message from dom0 is:
[0.00] Kernel panic - not syncing: Failed to get contiguous
memory for DMA from Xen!

[0.00] You either: don't have the permissions, do not have
enough free memory under 4GB, or the hypervisor memory is too
fragmented! (rc:-12)

I tried to print every message in the functions I touched in order to
figure out where it goes wrong, but failed. :-(
The thing I cannot understand is this: my implementation doesn't
reserve any memory pages in the cache-aware memory pool before the
system boots up. Basically, none of the functions I modified are even
called before the system boots up. But the system still crashes. :-(
(The system can boot up and work perfectly before applying my patch.)

I would really appreciate it if any of you could point out the part I
missed or misunderstood. :-)

Thank you very very much!

Best,

Meng


The full crash message is as follows:

Xen 4.5.0-rc

(XEN) Xen version 4.5.0-rc (root@) (gcc (Ubuntu/Linaro 4.6.3-1ubuntu5)
4.6.3) debug=y Sun Jan 11 11:39:23 EST 2015

(XEN) Latest ChangeSet: Sun Jan 4 22:19:40 2015 -0500 git:962a13f-dirty

(XEN) Bootloader: GRUB 1.99-21ubuntu3.14

(XEN) Command line: placeholder dom0_memory=512M sched=credit
console=tty0 com1=115200n8 console=com1

(XEN) Video information:

(XEN)  VGA is text mode 80x25, font 8x16

(XEN) Disc information:

(XEN)  Found 1 MBR signatures

(XEN)  Found 1 EDD information structures

(XEN) Xen-e820 RAM map:

(XEN)   - 0009fc00 (usable)

(XEN)  0009fc00 - 000a (reserved)

(XEN)  000f - 00