Re: [Xen-devel] Question about partitioning shared cache in Xen
2015-01-15 3:23 GMT-05:00 Jan Beulich :
>>> On 14.01.15 at 22:19, wrote:
>> So when Xen allocates memory to a PV guest with 256MB of memory and 4KB
>> page size (i.e., 2^16 memory pages), Xen will allocate 2^16 contiguous
>> memory pages to this guest, since the maximum number of contiguous
>> memory pages Xen allocates to a PV guest at once is 1024*1024.
>
> Provided this is (a) not a debug build and (b) there is a large enough
> chunk of memory available.
>
>> Although the 2^16 memory pages are contiguous, Xen still needs to fill
>> this guest's p2m table in a page-by-page fashion, which means each
>> element in the guest's p2m table is the page frame number of one 4KB
>> page. Right?
>
> Sure, but again only for PV.
>
>>>> But can we allocate one memory page to guests until the guests have
>>>> enough pages?
>>>
>>> We can, but that's inefficient for TLB usage and page table lookup.
>>
>> IMHO, that's true for any case where we have a smaller page size. In my
>> understanding, Xen manages guests' memory, say the p2m table or m2p
>> table, at the granularity of a 4KB page. In other words, the page size
>> in Xen is still 4KB. (Please correct me if I'm wrong.)
>
> Once again - correct for PV (without the [unsupported?] superpages
> flag set), but not for PVH/HVM.
>
>> So if the number of pages a guest requests does not change (which
>> means the page size is 4KB), the TLB usage should be the same.
>> If the page size in Xen is larger than 4KB, the TLB usage will
>> increase for sure if we force Xen to use a 4KB page size.
>
> And that's what is the case for PVH/HVM.
>
>> OK. Suppose TLB usage and page table lookup become inefficient
>> because of the page coloring mechanism. I totally agree that
>> non-contiguous memory may hurt the performance of a guest when the
>> guest runs alone. However, the shared-cache partition can make the
>> performance of a guest more stable and less easily influenced by
>> other guests. Briefly speaking, I'm trying to make the running time of
>> the workload in a guest more deterministic and robust to other guests'
>> interference.
>>
>> For those applications, like the control program in an automobile, that
>> must produce results within a deadline, a deterministic execution time
>> is more important than an execution time that is smaller in most cases
>> but may be very large in the worst case.
>
> Understood, but in that case I suppose this may need to be a
> default-off optional feature.

Yes. Actually, right now I just want to evaluate whether the idea of shared cache partitioning works on Xen for some specific applications. People have done similar things in Linux (in research projects), and according to the wiki (http://en.wikipedia.org/wiki/Cache_coloring), some OSes, e.g., FreeBSD, support it. So I'm curious about the performance of the page coloring mechanism on Xen and its strengths and weaknesses. (Although we know the strengths and weaknesses in theory, I want to see them in a practical implementation. That's why I'm trying to do this.)

Thank you very much for your explanation and confirmation of my understanding! :-)

Best,
Meng

--
---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] Question about partitioning shared cache in Xen
>>> On 14.01.15 at 22:19, wrote:
> So when Xen allocates memory to a PV guest with 256MB of memory and 4KB
> page size (i.e., 2^16 memory pages), Xen will allocate 2^16 contiguous
> memory pages to this guest, since the maximum number of contiguous
> memory pages Xen allocates to a PV guest at once is 1024*1024.

Provided this is (a) not a debug build and (b) there is a large enough
chunk of memory available.

> Although the 2^16 memory pages are contiguous, Xen still needs to fill
> this guest's p2m table in a page-by-page fashion, which means each
> element in the guest's p2m table is the page frame number of one 4KB
> page. Right?

Sure, but again only for PV.

>>> But can we allocate one memory page to guests until the guests have
>>> enough pages?
>>
>> We can, but that's inefficient for TLB usage and page table lookup.
>
> IMHO, that's true for any case where we have a smaller page size. In my
> understanding, Xen manages guests' memory, say the p2m table or m2p
> table, at the granularity of a 4KB page. In other words, the page size
> in Xen is still 4KB. (Please correct me if I'm wrong.)

Once again - correct for PV (without the [unsupported?] superpages
flag set), but not for PVH/HVM.

> So if the number of pages a guest requests does not change (which
> means the page size is 4KB), the TLB usage should be the same.
> If the page size in Xen is larger than 4KB, the TLB usage will
> increase for sure if we force Xen to use a 4KB page size.

And that's what is the case for PVH/HVM.

> OK. Suppose TLB usage and page table lookup become inefficient
> because of the page coloring mechanism. I totally agree that
> non-contiguous memory may hurt the performance of a guest when the
> guest runs alone. However, the shared-cache partition can make the
> performance of a guest more stable and less easily influenced by
> other guests. Briefly speaking, I'm trying to make the running time of
> the workload in a guest more deterministic and robust to other guests'
> interference.
>
> For those applications, like the control program in an automobile, that
> must produce results within a deadline, a deterministic execution time
> is more important than an execution time that is smaller in most cases
> but may be very large in the worst case.

Understood, but in that case I suppose this may need to be a
default-off optional feature.

Jan
Re: [Xen-devel] Question about partitioning shared cache in Xen
2015-01-14 11:29 GMT-05:00 Jan Beulich :
>>> On 14.01.15 at 16:27, wrote:
>> 2015-01-14 10:02 GMT-05:00 Jan Beulich :
>>> On 14.01.15 at 15:45, wrote:
>>>> Yes. I try to use the bits [A16, A12] to isolate different colors in
>>>> a shared cache. A 2MB 16-way associative shared cache uses [A16, A6]
>>>> to index the cache set. Because the page size is 4KB, the page frame
>>>> number's bits [A16, A12] overlap with the bits used to index a
>>>> shared cache's cache set. So we can control those [A16, A12] bits to
>>>> control where the page will be placed. (The wiki page about page
>>>> coloring is here: http://en.wikipedia.org/wiki/Cache_coloring)
>>>
>>> But the majority of allocations done for guests would be as 2M or
>>> 1G pages,
>>
>> First, I want to confirm my understanding is not incorrect: when Xen
>> allocates memory pages to guests, it currently allocates a bunch of
>> memory pages at one time. That's why you said the majority of
>> allocations would be 2MB or 1GB. But the size of one memory page used
>> by guests is still 4KB. Am I correct?
>
> Yes.

So when Xen allocates memory to a PV guest with 256MB of memory and 4KB page size (i.e., 2^16 memory pages), Xen will allocate 2^16 contiguous memory pages to this guest, since the maximum number of contiguous memory pages Xen allocates to a PV guest at once is 1024*1024. Although the 2^16 memory pages are contiguous, Xen still needs to fill this guest's p2m table in a page-by-page fashion, which means each element in the guest's p2m table is the page frame number of one 4KB page. Right?

>> But can we allocate one memory page to guests until the guests have
>> enough pages?
>
> We can, but that's inefficient for TLB usage and page table lookup.

IMHO, that's true for any case where we have a smaller page size. In my understanding, Xen manages guests' memory, say the p2m table or m2p table, at the granularity of a 4KB page. In other words, the page size in Xen is still 4KB. (Please correct me if I'm wrong.)

So if the number of pages a guest requests does not change (which means the page size is 4KB), the TLB usage should be the same. If the page size in Xen is larger than 4KB, the TLB usage will increase for sure if we force Xen to use a 4KB page size.

OK. Suppose TLB usage and page table lookup become inefficient because of the page coloring mechanism. I totally agree that non-contiguous memory may hurt the performance of a guest when the guest runs alone. However, the shared-cache partition can make the performance of a guest more stable and less easily influenced by other guests. Briefly speaking, I'm trying to make the running time of the workload in a guest more deterministic and robust to other guests' interference.

For those applications, like the control program in an automobile, that must produce results within a deadline, a deterministic execution time is more important than an execution time that is smaller in most cases but may be very large in the worst case.

>> I find the arch_setup_meminit() function in tools/libxc/xc_dom_x86.c
>> allocates memory pages depending on whether dom->superpages is true.
>> Can we add an if-else to allocate one page at a time to the guest
>> instead of allocating many pages at once?
>
> That's for PV guests, which (by default) can't use 2M (not to speak of
> 1G) pages anyway.

Right now, I'm only looking at PV guests and trying to get some measurements of the cache partitioning mechanism on them. I want to first show the benefits and cost of the page coloring mechanism in Xen, and then maybe explore the other types of guests. However, even for PV guests, I'm struggling with the error I mentioned above. :-(

Thank you very much for your time and help! I hope you could give me some advice on where I should investigate to fix the issue.

Best,
Meng
Re: [Xen-devel] Question about partitioning shared cache in Xen
>>> On 14.01.15 at 16:27, wrote:
> 2015-01-14 10:02 GMT-05:00 Jan Beulich :
>> On 14.01.15 at 15:45, wrote:
>>> Yes. I try to use the bits [A16, A12] to isolate different colors in
>>> a shared cache. A 2MB 16-way associative shared cache uses [A16, A6]
>>> to index the cache set. Because the page size is 4KB, the page frame
>>> number's bits [A16, A12] overlap with the bits used to index a
>>> shared cache's cache set. So we can control those [A16, A12] bits to
>>> control where the page will be placed. (The wiki page about page
>>> coloring is here: http://en.wikipedia.org/wiki/Cache_coloring)
>>
>> But the majority of allocations done for guests would be as 2M or
>> 1G pages,
>
> First, I want to confirm my understanding is not incorrect: when Xen
> allocates memory pages to guests, it currently allocates a bunch of
> memory pages at one time. That's why you said the majority of
> allocations would be 2MB or 1GB. But the size of one memory page used
> by guests is still 4KB. Am I correct?

Yes.

> But can we allocate one memory page to guests until the guests have
> enough pages?

We can, but that's inefficient for TLB usage and page table lookup.

> I find the arch_setup_meminit() function in tools/libxc/xc_dom_x86.c
> allocates memory pages depending on whether dom->superpages is true.
> Can we add an if-else to allocate one page at a time to the guest
> instead of allocating many pages at once?

That's for PV guests, which (by default) can't use 2M (not to speak of
1G) pages anyway.

Jan
Re: [Xen-devel] Question about partitioning shared cache in Xen
Hi Jan,

2015-01-14 10:02 GMT-05:00 Jan Beulich :
>>> On 14.01.15 at 15:45, wrote:
>> Yes. I try to use the bits [A16, A12] to isolate different colors in
>> a shared cache. A 2MB 16-way associative shared cache uses [A16, A6]
>> to index the cache set. Because the page size is 4KB, the page frame
>> number's bits [A16, A12] overlap with the bits used to index a
>> shared cache's cache set. So we can control those [A16, A12] bits to
>> control where the page will be placed. (The wiki page about page
>> coloring is here: http://en.wikipedia.org/wiki/Cache_coloring)
>
> But the majority of allocations done for guests would be as 2M or
> 1G pages,

First, I want to confirm my understanding is not incorrect: when Xen allocates memory pages to guests, it currently allocates a bunch of memory pages at one time. That's why you said the majority of allocations would be 2MB or 1GB. But the size of one memory page used by guests is still 4KB. Am I correct?

If I'm correct, I will have the following argument. If I'm incorrect, would you mind letting me know which function or file I should look at to correct my understanding? (Thank you very much!) If I'm wrong about the size of one memory page, the following argument won't hold and I need to first correct my above understanding.

But can we allocate one memory page to guests until the guests have enough pages?

I find the arch_setup_meminit() function in tools/libxc/xc_dom_x86.c allocates memory pages depending on whether dom->superpages is true. Can we add an if-else to allocate one page at a time to the guest instead of allocating many pages at once? For example, if it's not superpages, we call xc_domain_populate_physmap_exact_ca() to allocate one cache-colored page each time:

    if ( num_cache_colors == 0 )
    {
        for ( i = rc = allocsz = 0;
              (i < dom->total_pages) && !rc;
              i += allocsz )
        {
            allocsz = dom->total_pages - i;
            if ( allocsz > 1024*1024 )
                allocsz = 1024*1024;
            rc = xc_domain_populate_physmap_exact(
                dom->xch, dom->guest_domid, allocsz, 0, 0,
                &dom->p2m_host[i]);
        }
    }
    else
    {
        for ( i = rc = allocsz = 0;
              (i < dom->total_pages) && !rc;
              i += allocsz )
        {
            /* TODO: change to allocate multiple pages
             * when we have the memory pool */
            allocsz = 1;
            rc = xc_domain_populate_physmap_exact_ca(
                dom->xch, dom->guest_domid, allocsz, 0, 0,
                &dom->p2m_host[i], cache_colors);
        }
    }

Thank you very much!

Best,
Meng
Re: [Xen-devel] Question about partitioning shared cache in Xen
>>> On 14.01.15 at 15:45, wrote:
> Yes. I try to use the bits [A16, A12] to isolate different colors in
> a shared cache. A 2MB 16-way associative shared cache uses [A16, A6]
> to index the cache set. Because the page size is 4KB, the page frame
> number's bits [A16, A12] overlap with the bits used to index a
> shared cache's cache set. So we can control those [A16, A12] bits to
> control where the page will be placed. (The wiki page about page
> coloring is here: http://en.wikipedia.org/wiki/Cache_coloring)

But the majority of allocations done for guests would be as 2M or
1G pages, so picking address bits 12..16 for coloring seems rather
undesirable. (And surely no LLC would be large enough any time soon
to allow coloring of 1G pages anyway.)

Jan
Re: [Xen-devel] Question about partitioning shared cache in Xen
Hi Andrew,

Thank you very much for your quick reply!

2015-01-14 7:20 GMT-05:00 Andrew Cooper :
> On 14/01/15 00:41, Meng Xu wrote:
>> Hi,
>>
>> [Goal]
>> I want to investigate the impact of the shared cache on the
>> performance of workloads in a guest domain. I also want to partition
>> the shared cache via a page coloring mechanism so that guest domains
>> can use different cache colors of the shared cache and will not
>> interfere with each other in the shared cache.
>>
>> [Motivation: Why do I want to partition the shared cache?]
>> Because the shared cache is shared among all guest domains (I assume
>> the machine has multiple cores sharing the same LLC. For example, the
>> Intel(R) Xeon(R) CPU E5-1650 v2 has 6 physical cores sharing a 12MB
>> L3 cache.), the workload in one domU can interfere with another
>> domU's memory-intensive workload on the same machine via the shared
>> cache. This shared-cache interference makes the execution time of the
>> workload in a domU non-deterministic and increases it a lot. (If we
>> assume the worst case, the worst-case execution time of the workload
>> will be too pessimistic.) A stable execution time is very important
>> in real-time computation, when a real-time program, like the control
>> program in an automobile, has to generate its result within a
>> deadline.
>>
>> I did some quick measurements to show how the shared cache can be
>> used by a hostile domain to interfere with the execution time of
>> another domain's workload. I pinned the VCPUs of two domains to
>> different physical cores and used one domain to pollute the shared
>> cache. The result shows that shared-cache interference can slow down
>> the execution time of another domain's workload by 4x. The whole
>> experiment result can be found at
>> https://github.com/PennPanda/cis601/blob/master/project/data/boxplot_cache_v2.pdf
>> . (The workload in the figure is a program reading a large array. I
>> run the program 100 times and draw the latency of accessing the
>> array in a box plot. The first column, named "alone−d1v1", is the
>> boxplot latency when the program in dom1 runs alone. The fourth
>> column, "d1v1d2v1−pindiffcore", is the boxplot latency when the
>> program in dom1 runs along with another program in dom2, and these
>> two domains use different cores. dom1 and dom2 each have 1 vcpu with
>> budget equal to period. The scheduler is the credit scheduler.)
>>
>> [Idea of how to partition the shared cache]
>> When a PV guest domain is created, it will call
>> xc_dom_boot_mem_init() to allocate memory for the domain, which
>> finally calls xc_domain_populate_physmap_exact() to allocate memory
>> pages from the domheap in Xen. The idea of partitioning the shared
>> cache is as follows:
>> 1) xl tool change: Add an option in the domain's configuration file
>> which specifies which cache colors this domain should use. (I have
>> done this, and when I use xl create --dry-run, I can see the
>> parameters are passed to the build information.)
>> 2) hypervisor change: Add another hypercall,
>> xc_domain_populate_physmap_exact_ca(), which has one more parameter,
>> i.e., the cache colors this domain should use. I also need to
>> reserve a memory pool which sorts the reserved memory pages based on
>> their cache color.
>>
>> When a PV domain is created, I can specify the cache colors it uses.
>> Then the xl tool will call xc_domain_populate_physmap_exact_ca() to
>> allocate only the memory pages with the specified cache colors to
>> this domain.
>>
>> [Quick implementation]
>> I attached my quick implementation patch at the end of this email.
>>
>> [Issues and Questions]
>> After I applied the patch to Xen's commit point
>> 36174af3fbeb1b662c0eadbfa193e77f68cc955b and ran it on my machine,
>> dom0 cannot boot up. :-(
>> The error message from dom0 is:
>> [0.00] Kernel panic - not syncing: Failed to get contiguous
>> memory for DMA from Xen!
>> [0.00] You either: don't have the permissions, do not have
>> enough free memory under 4GB, or the hypervisor memory is too
>> fragmented! (rc:-12)
>>
>> I tried to print every message in the functions I touched in order
>> to figure out where it goes wrong, but failed. :-(
>> The thing I cannot understand is: my implementation hasn't reserved
>> any memory pages in the cache-aware memory pool before the system
>> boots up. Basically, every function I modified hasn't been called
>> before the system boots up. But the system crashes. :-( (The system
>> can boot up and work perfectly before applying my patch.)
>>
>> I really appreciate it if any of you could point out the part I
>> missed or misunderstood. :-)
>
> The error message is quite clear. I presume that your cache
> partitioning algorithm has prevented dom0 from getting any
> machine-contiguous pages for DMA. This prevents dom0 from using any
> hardware, such as its disks or the network.

Actually, I didn't partition the shared cache for dom0. dom0 should have contiguous memory as before. I didn't modify any function in the existing
Re: [Xen-devel] Question about partitioning shared cache in Xen
On 14/01/15 00:41, Meng Xu wrote:
> Hi,
>
> [Goal]
> I want to investigate the impact of the shared cache on the
> performance of workloads in a guest domain. I also want to partition
> the shared cache via a page coloring mechanism so that guest domains
> can use different cache colors of the shared cache and will not
> interfere with each other in the shared cache.
>
> [Motivation: Why do I want to partition the shared cache?]
> Because the shared cache is shared among all guest domains (I assume
> the machine has multiple cores sharing the same LLC. For example, the
> Intel(R) Xeon(R) CPU E5-1650 v2 has 6 physical cores sharing a 12MB
> L3 cache.), the workload in one domU can interfere with another
> domU's memory-intensive workload on the same machine via the shared
> cache. This shared-cache interference makes the execution time of the
> workload in a domU non-deterministic and increases it a lot. (If we
> assume the worst case, the worst-case execution time of the workload
> will be too pessimistic.) A stable execution time is very important
> in real-time computation, when a real-time program, like the control
> program in an automobile, has to generate its result within a
> deadline.
>
> I did some quick measurements to show how the shared cache can be
> used by a hostile domain to interfere with the execution time of
> another domain's workload. I pinned the VCPUs of two domains to
> different physical cores and used one domain to pollute the shared
> cache. The result shows that shared-cache interference can slow down
> the execution time of another domain's workload by 4x. The whole
> experiment result can be found at
> https://github.com/PennPanda/cis601/blob/master/project/data/boxplot_cache_v2.pdf
> . (The workload in the figure is a program reading a large array. I
> run the program 100 times and draw the latency of accessing the
> array in a box plot. The first column, named "alone−d1v1", is the
> boxplot latency when the program in dom1 runs alone. The fourth
> column, "d1v1d2v1−pindiffcore", is the boxplot latency when the
> program in dom1 runs along with another program in dom2, and these
> two domains use different cores. dom1 and dom2 each have 1 vcpu with
> budget equal to period. The scheduler is the credit scheduler.)
>
> [Idea of how to partition the shared cache]
> When a PV guest domain is created, it will call
> xc_dom_boot_mem_init() to allocate memory for the domain, which
> finally calls xc_domain_populate_physmap_exact() to allocate memory
> pages from the domheap in Xen. The idea of partitioning the shared
> cache is as follows:
> 1) xl tool change: Add an option in the domain's configuration file
> which specifies which cache colors this domain should use. (I have
> done this, and when I use xl create --dry-run, I can see the
> parameters are passed to the build information.)
> 2) hypervisor change: Add another hypercall,
> xc_domain_populate_physmap_exact_ca(), which has one more parameter,
> i.e., the cache colors this domain should use. I also need to
> reserve a memory pool which sorts the reserved memory pages based on
> their cache color.
>
> When a PV domain is created, I can specify the cache colors it uses.
> Then the xl tool will call xc_domain_populate_physmap_exact_ca() to
> allocate only the memory pages with the specified cache colors to
> this domain.
>
> [Quick implementation]
> I attached my quick implementation patch at the end of this email.
>
> [Issues and Questions]
> After I applied the patch to Xen's commit point
> 36174af3fbeb1b662c0eadbfa193e77f68cc955b and ran it on my machine,
> dom0 cannot boot up. :-(
> The error message from dom0 is:
> [0.00] Kernel panic - not syncing: Failed to get contiguous
> memory for DMA from Xen!
> [0.00] You either: don't have the permissions, do not have
> enough free memory under 4GB, or the hypervisor memory is too
> fragmented! (rc:-12)
>
> I tried to print every message in the functions I touched in order
> to figure out where it goes wrong, but failed. :-(
> The thing I cannot understand is: my implementation hasn't reserved
> any memory pages in the cache-aware memory pool before the system
> boots up. Basically, every function I modified hasn't been called
> before the system boots up. But the system crashes. :-( (The system
> can boot up and work perfectly before applying my patch.)
>
> I really appreciate it if any of you could point out the part I
> missed or misunderstood. :-)

The error message is quite clear. I presume that your cache
partitioning algorithm has prevented dom0 from getting any
machine-contiguous pages for DMA. This prevents dom0 from using any
hardware, such as its disks or the network.

What I don't see is how you plan to isolate different colours in a
shared cache. I am guessing (seeing as the patch is full of debugging
and hard to follow) that you are using the low order bits in the
physical address to identify the colour, which will indeed prevent
any contiguous allocations from happening. Is this what you are
attempting?
[Xen-devel] Question about partitioning shared cache in Xen
Hi,

[Goal]
I want to investigate the impact of the shared cache on the performance of workloads in a guest domain. I also want to partition the shared cache via a page coloring mechanism so that guest domains can use different cache colors of the shared cache and will not interfere with each other in the shared cache.

[Motivation: Why do I want to partition the shared cache?]
Because the shared cache is shared among all guest domains (I assume the machine has multiple cores sharing the same LLC. For example, the Intel(R) Xeon(R) CPU E5-1650 v2 has 6 physical cores sharing a 12MB L3 cache.), the workload in one domU can interfere with another domU's memory-intensive workload on the same machine via the shared cache. This shared-cache interference makes the execution time of the workload in a domU non-deterministic and increases it a lot. (If we assume the worst case, the worst-case execution time of the workload will be too pessimistic.) A stable execution time is very important in real-time computation, when a real-time program, like the control program in an automobile, has to generate its result within a deadline.

I did some quick measurements to show how the shared cache can be used by a hostile domain to interfere with the execution time of another domain's workload. I pinned the VCPUs of two domains to different physical cores and used one domain to pollute the shared cache. The result shows that shared-cache interference can slow down the execution time of another domain's workload by 4x. The whole experiment result can be found at https://github.com/PennPanda/cis601/blob/master/project/data/boxplot_cache_v2.pdf . (The workload in the figure is a program reading a large array. I run the program 100 times and draw the latency of accessing the array in a box plot. The first column, named "alone−d1v1", is the boxplot latency when the program in dom1 runs alone. The fourth column, "d1v1d2v1−pindiffcore", is the boxplot latency when the program in dom1 runs along with another program in dom2, and these two domains use different cores. dom1 and dom2 each have 1 vcpu with budget equal to period. The scheduler is the credit scheduler.)

[Idea of how to partition the shared cache]
When a PV guest domain is created, it will call xc_dom_boot_mem_init() to allocate memory for the domain, which finally calls xc_domain_populate_physmap_exact() to allocate memory pages from the domheap in Xen. The idea of partitioning the shared cache is as follows:
1) xl tool change: Add an option in the domain's configuration file which specifies which cache colors this domain should use. (I have done this, and when I use xl create --dry-run, I can see the parameters are passed to the build information.)
2) hypervisor change: Add another hypercall, xc_domain_populate_physmap_exact_ca(), which has one more parameter, i.e., the cache colors this domain should use. I also need to reserve a memory pool which sorts the reserved memory pages based on their cache color.

When a PV domain is created, I can specify the cache colors it uses. Then the xl tool will call xc_domain_populate_physmap_exact_ca() to allocate only the memory pages with the specified cache colors to this domain.

[Quick implementation]
I attached my quick implementation patch at the end of this email.

[Issues and Questions]
After I applied the patch to Xen's commit point 36174af3fbeb1b662c0eadbfa193e77f68cc955b and ran it on my machine, dom0 cannot boot up. :-(
The error message from dom0 is:
[0.00] Kernel panic - not syncing: Failed to get contiguous memory for DMA from Xen!
[0.00] You either: don't have the permissions, do not have enough free memory under 4GB, or the hypervisor memory is too fragmented! (rc:-12)

I tried to print every message in the functions I touched in order to figure out where it goes wrong, but failed. :-(
The thing I cannot understand is: my implementation hasn't reserved any memory pages in the cache-aware memory pool before the system boots up. Basically, every function I modified hasn't been called before the system boots up. But the system crashes. :-( (The system can boot up and work perfectly before applying my patch.)

I really appreciate it if any of you could point out the part I missed or misunderstood. :-)

Thank you very very much!

Best,
Meng

The full crash message is as follows:

Xen 4.5.0-rc
(XEN) Xen version 4.5.0-rc (root@) (gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3) debug=y Sun Jan 11 11:39:23 EST 2015
(XEN) Latest ChangeSet: Sun Jan 4 22:19:40 2015 -0500 git:962a13f-dirty
(XEN) Bootloader: GRUB 1.99-21ubuntu3.14
(XEN) Command line: placeholder dom0_memory=512M sched=credit console=tty0 com1=115200n8 console=com1
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  - 0009fc00 (usable)
(XEN)  0009fc00 - 000a (reserved)
(XEN)  000f - 00