Re: [Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
On Mon, 2015-04-20 at 11:38 +0100, Stefano Stabellini wrote:
> On Mon, 20 Apr 2015, Ian Campbell wrote:
> > On Mon, 2015-04-20 at 10:58 +0100, Stefano Stabellini wrote:
[...]
> > > Please send out the Linux patch using __GFP_DMA and I'll queue it up.
> >
> > What happens with __GFP_DMA if no suitable memory is available (i.e.
> > all of RAM is above 4GB)?
>
> __get_free_pages would fail and xen_swiotlb_init will try again with a
> smaller size and print a warning. If no RAM under 4G is available,

This is always going to be the case on e.g. X-Gene, where all RAM is above
4G (starts at 128GB IIRC). IOW just doing it like this is going to break on
some arm64 platforms.

> xen_swiotlb_init will fail with an error. However it is probably better
> to fail explicitly with an error message than failing with a stack trace
> at some point down the line when DMA is actually done.

Ian.
Re: [Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
On Sat, 18 Apr 2015, Chen Baozi wrote:
> On Fri, Apr 17, 2015 at 05:13:16PM +0100, Stefano Stabellini wrote:
[...]
> > If we could introduce a mechanism to get a lower than 4G buffer in
> > dom0, but matching the 1:1, I think it would make the maintenance much
> > easier on the Linux side.
>
> +1
>
> Actually, we already have the mechanism on arm32 to populate at least one
> bank of memory below 4G. Thus, the only thing we have to do on the
> hypervisor side is to make arm32 and arm64 share the same code path in
> allocate_memory_11(), removing the 'lowmem = is_32bit_domain(d)' related
> conditions. If this is acceptable, the only thing we need to do in the
> Linux kernel is to add the __GFP_DMA flag when allocating pages for
> xen_io_tlb_start in xen_swiotlb_init.

Please send out the Linux patch using __GFP_DMA and I'll queue it up.
Re: [Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
On Sat, 2015-04-18 at 17:23 +0800, Chen Baozi wrote:
> On Sat, Apr 18, 2015 at 05:08:58PM +0800, Chen Baozi wrote:
[...]
> > Actually, we already have the mechanism on arm32 to populate at least
> > one bank of memory below 4G. Thus, the only thing we have to do on the
> > hypervisor side is to make arm32 and arm64 share the same code path in
> > allocate_memory_11(), removing the 'lowmem = is_32bit_domain(d)'
> > related conditions. If this is acceptable,

Ah yes, I'd forgotten we already handled this for 32-bit, so enabling it
for 64-bit does indeed make sense.

However, what you do below is unfortunately not quite correct, since on a
64-bit system which has no RAM at all under 4GB it will now fail to
populate the first bank and error out. I can think of two solutions off
hand:

Either: initialise lowmem to (is_32bit_domain(d) || ram_exists_below_4g())
(which in practice is equivalent to ram_exists_below_4g(), since that is
always going to be true for a 32-bit system).

Or: allow the initial bank0 filling to also fall back to non-lowmem (on
64-bit only), as the following code for subsequent banks does.

The first option is tricky because I'm not sure what ram_exists_below_4g()
would look like inside, i.e. whether it can be queried from the page
allocator or whether we would need to make a note of this fact when
parsing/setting up memory early on. The second one is tricky because it
opens up the possibility of ploughing on if lowmem can't be allocated, even
if it is strictly required on the platform.

I think on balance the first would be better.

> > the only thing we need to do in the Linux kernel is to add the
> > __GFP_DMA flag when allocating pages for xen_io_tlb_start in
> > xen_swiotlb_init.
>
> This is the hack I'm using:
>
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 40a5303..d83a2b1 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -264,7 +264,7 @@ static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
>      unsigned int order = get_11_allocation_size(kinfo->unassigned_mem);
>      int i;
>
> -    bool_t lowmem = is_32bit_domain(d);
> +    bool_t lowmem = true;
>      unsigned int bits;
>
>      printk("Allocating 1:1 mappings totalling %ldMB for dom0:\n",
> @@ -279,7 +279,7 @@ static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
>       */
>      while ( order >= min_low_order )
>      {
> -        for ( bits = order ; bits <= (lowmem ? 32 : PADDR_BITS); bits++ )
> +        for ( bits = order ; bits <= 32 ; bits++ )

I think leave this as is, even if lowmem ends up being const.

Ian.
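For concreteness, a rough sketch of what the first option might look like,
assuming a hypothetical ram_exists_below_4g() helper built on Xen's
boot-time bank list (bootinfo.mem); whether that is the right place to
query is exactly the open question raised above, so treat this as a sketch,
not a proposed patch:

/* Hypothetical helper (not existing Xen code): scan the host RAM banks
 * recorded at boot for one that starts below the 4GB boundary. */
static bool_t ram_exists_below_4g(void)
{
    int i;

    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
        if ( bootinfo.mem.bank[i].start < ((paddr_t)1 << 32) )
            return 1;

    return 0;
}

/* ... and in allocate_memory_11(): */
bool_t lowmem = is_32bit_domain(d) || ram_exists_below_4g();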
Re: [Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
On Mon, 20 Apr 2015, Ian Campbell wrote:
> On Mon, 2015-04-20 at 10:58 +0100, Stefano Stabellini wrote:
> > On Sat, 18 Apr 2015, Chen Baozi wrote:
[...]
> > > If this is acceptable, the only thing we need to do in the Linux
> > > kernel is to add the __GFP_DMA flag when allocating pages for
> > > xen_io_tlb_start in xen_swiotlb_init.
> >
> > Please send out the Linux patch using __GFP_DMA and I'll queue it up.
>
> What happens with __GFP_DMA if no suitable memory is available (i.e. all
> of RAM is above 4GB)?

__get_free_pages would fail and xen_swiotlb_init will try again with a
smaller size and print a warning. If no RAM under 4G is available,
xen_swiotlb_init will fail with an error. However it is probably better to
fail explicitly with an error message than failing with a stack trace at
some point down the line when DMA is actually done.
Re: [Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
On Sat, 2015-04-18 at 00:41 +0800, Chen Baozi wrote:
> On Fri, Apr 17, 2015 at 02:21:45PM +0100, Ian Campbell wrote:
> > On Fri, 2015-04-17 at 19:24 +0800, Chen Baozi wrote:
[...]
> > > For arm64 servers with large memory, it is possible to set
> > > dom0_mem > 4G (e.g. I have one set with 16G). [...] However, most DMA
> > > engines support only 32-bit physical addresses, and thus aren't able
> > > to operate directly on that memory.
> >
> > Even on arm64 systems with RAM above 4GB? That seems short-sighted. Oh
> > well, I suppose we have to live with it.
>
> I understand that for most ARM SoCs the DMA engines come from third-party
> IP companies and are arm32/arm64 independent. Thus, 32-bit address DMA
> engines should be common even on arm64 systems.

Yes, I suppose that will be true, but I stand by my shortsighted comment
;-). (What's the point of a DMA engine which can only access 1/4 of the
system's RAM and therefore requires bounce buffering before it can be
used...)

> The preferred way is to use/enable the SMMU (IOMMU). However, we are
> focusing on the 1:1 mapping right now...
>
> > > Sadly, it seems that xen_swiotlb_map_page in my dom0 kernel allocates
> > > (start_dma_addr = 0x94480) the buffers for DMA above 4G, which fails
> > > the dma_capable() check and was then unable to return from
> > > xen_swiotlb_map_page() successfully.
> >
> > The swiotlb bounce buffer has been allocated below 4GB?
>
> I have no idea (about the exact behaviour of the bounce buffer). But I
> don't think it has been allocated below 4GB on my board, for in that case
> it wouldn't fail dma_capable() at the end of xen_swiotlb_map_page().
>
> > I suspect that xen_swiotlb_init is buggy for ARM -- it allocates some
> > random pages and then swizzles the backing pages for ones < 4G, but
> > that won't work on an ARM dom0 with a 1:1 mapping, I don't think. Do
> > you see error messages along those lines?
> >
> > Essentially I think either xen_swiotlb_fixup is unable to work on ARM,
> > or the following:
> >
> >     start_dma_addr = xen_virt_to_bus(xen_io_tlb_start);
> >
> > is returning 1:1 and not reflecting the fixup.
>
> Yes. It seems very likely that is what happened in my system.

Stefano suggested that xen_swiotlb_fixup is a NOP on arm (which doesn't
surprise me; that made sense until this issue was identified), which will
be the root cause here.

> > > If I set dom0_mem to a small value (e.g. 512M), which makes all
> > > physical memory of dom0 below 4G, everything goes fine.
> >
> > So you are getting allocated memory below 4G?
>
> If all the banks of memory that Xen populates to dom0 are below 4G, yes.
> However, if some banks of memory for dom0 are above 4G, usually not.
>
> > Your message on IRC suggested you weren't, did you hack around this?
>
> Yes. I did some hacks to help understand my situation earlier. What I
> have done and observed is as below:
>
> 1. At the very beginning, I used the default dom0_mem value to boot the
>    system, which is 128M. And I didn't realize the DMA buffer problem.
>
> 2. I started to try more dom0_mem (16G). Then the ethernet driver
>    reported that it could not initialise its rx buffers (DMA buffers).
>    And I found out that allocate_memory_11 didn't populate any banks of
>    memory below 4G for dom0. At that time, I guessed the failure might
>    have been introduced because no memory bank below 4G was populated.
>    (There is only a 2GB address space below 4G for physical memory on my
>    platform, and there is a hole for the PCI memory address space above
>    4G before the memory address space continues.)
>
> 3. So I did some hacks to set lowmem=true manually in allocate_memory_11,
>    which made Xen on arm64 act similarly to arm32, populating at least
>    one bank of memory below 4G to dom0. (This is the point when I sent
>    you the message on IRC.) I thought that could solve the problem, but
>    it doesn't.
>
> 4. Then I found out that once Xen populated any banks of memory above 4G,
>    the ethernet driver would have a chance (very likely, almost every
>    time if dom0_mem=16G) to use buffers above 4G, regardless of whether
>    dom0 has banks of memory below 4G.
>
> > I think we have two options, either xen_swiotlb_init allocates pages
> > below 4GB (e.g. __GFP_DMA) or we do something to allow
> > xen_swiotlb_fixup to actually work even on a 1:1 dom0. Although the
> > first option seems preferable at first glance it has the shortcoming
> > that it requires dom0 to have some memory below 4GB, which might not
> > necessarily be the case.
Re: [Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
On Mon, 2015-04-20 at 10:58 +0100, Stefano Stabellini wrote:
> On Sat, 18 Apr 2015, Chen Baozi wrote:
[...]
> > Actually, we already have the mechanism on arm32 to populate at least
> > one bank of memory below 4G. [...] If this is acceptable, the only
> > thing we need to do in the Linux kernel is to add the __GFP_DMA flag
> > when allocating pages for xen_io_tlb_start in xen_swiotlb_init.
>
> Please send out the Linux patch using __GFP_DMA and I'll queue it up.

What happens with __GFP_DMA if no suitable memory is available (i.e. all of
RAM is above 4GB)?

Ian.
Re: [Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
On Mon, Apr 20, 2015 at 10:58:52AM +0100, Stefano Stabellini wrote:
> On Sat, 18 Apr 2015, Chen Baozi wrote:
[...]
> > If this is acceptable, the only thing we need to do in the Linux kernel
> > is to add the __GFP_DMA flag when allocating pages for xen_io_tlb_start
> > in xen_swiotlb_init.
>
> Please send out the Linux patch using __GFP_DMA and I'll queue it up.

I have sent it out, in case Ian has no disagreement after the coming
discussion (hopefully) ;-)

Baozi.
Re: [Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
On Mon, 20 Apr 2015, Ian Campbell wrote:
> On Mon, 2015-04-20 at 11:38 +0100, Stefano Stabellini wrote:
> > On Mon, 20 Apr 2015, Ian Campbell wrote:
[...]
> > > What happens with __GFP_DMA if no suitable memory is available (i.e.
> > > all of RAM is above 4GB)?
> >
> > __get_free_pages would fail and xen_swiotlb_init will try again with a
> > smaller size and print a warning. If no RAM under 4G is available,
>
> This is always going to be the case on e.g. X-Gene, where all RAM is
> above 4G (starts at 128GB IIRC). IOW just doing it like this is going to
> break on some arm64 platforms.

OK. Basically we need to find a way to retry without GFP_DMA if no RAM
under 4G is present at all.

> > xen_swiotlb_init will fail with an error. However it is probably better
> > to fail explicitly with an error message than failing with a stack
> > trace at some point down the line when DMA is actually done.
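To illustrate the shape of such a retry, a minimal, untested sketch against
the xen_swiotlb_init allocation loop quoted earlier in the thread;
restricting __GFP_DMA to ARM builds and the exact fallback flow are
assumptions of this sketch, not something agreed in the thread:

	/* Sketch only: prefer memory under 4G for the bounce buffer, but
	 * if the host has no RAM below 4G at all (e.g. X-Gene), retry
	 * without the zone restriction instead of failing
	 * xen_swiotlb_init. */
	gfp_t gfp = __GFP_NOWARN;
	unsigned long start_order = order;

	if (IS_ENABLED(CONFIG_ARM) || IS_ENABLED(CONFIG_ARM64))
		gfp |= __GFP_DMA;
again:
	order = start_order;
	while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
		xen_io_tlb_start = (void *)__get_free_pages(gfp, order);
		if (xen_io_tlb_start)
			break;
		order--;
	}
	if (!xen_io_tlb_start && (gfp & __GFP_DMA)) {
		/* No RAM under 4G at all: fall back to any memory and let
		 * devices which can address it use it directly. */
		gfp &= ~__GFP_DMA;
		goto again;
	}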
Re: [Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
On Fri, Apr 17, 2015 at 03:32:20PM +0100, Stefano Stabellini wrote:
> On Fri, 17 Apr 2015, Chen Baozi wrote:
[...]
> I think that the problem is that xen_swiotlb_init doesn't necessarily
> allocate memory under 4G on arm/arm64. xen_swiotlb_init calls
> __get_free_pages to allocate memory, so the pages could easily be above
> 4G. Subsequently xen_swiotlb_fixup is called on the allocated memory
> range, calling xen_create_contiguous_region and passing an address_bits
> mask. However xen_create_contiguous_region doesn't actually do anything
> at all on ARM.
>
> I think that given that dom0 is mapped 1:1 on ARM, the easiest and best
> fix would be to simply allocate memory under 4G to begin with. Something
> like (maybe with an ifdef ARM around it):
>
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 810ad41..22ac33a 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -235,7 +235,7 @@ retry:
>  #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT))
>  #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
>  	while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
> -		xen_io_tlb_start = (void *)__get_free_pages(__GFP_NOWARN, order);
> +		xen_io_tlb_start = (void *)__get_free_pages(__GFP_NOWARN|__GFP_DMA32, order);

                                                                        ^^ __GFP_DMA works on arm64

Cheers,

Baozi.
Re: [Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
Hi Stefano,

On Fri, Apr 17, 2015 at 03:32:20PM +0100, Stefano Stabellini wrote:
> On Fri, 17 Apr 2015, Chen Baozi wrote:
[...]
> I think that the problem is that xen_swiotlb_init doesn't necessarily
> allocate memory under 4G on arm/arm64. [...] However
> xen_create_contiguous_region doesn't actually do anything at all on ARM.
>
> I think that given that dom0 is mapped 1:1 on ARM, the easiest and best
> fix would be to simply allocate memory under 4G to begin with. Something
> like (maybe with an ifdef ARM around it):
>
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 810ad41..22ac33a 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -235,7 +235,7 @@ retry:
>  #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT))
>  #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
>  	while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
> -		xen_io_tlb_start = (void *)__get_free_pages(__GFP_NOWARN, order);
> +		xen_io_tlb_start = (void *)__get_free_pages(__GFP_NOWARN|__GFP_DMA32, order);
>  		if (xen_io_tlb_start)
>  			break;
>  		order--;

I have no idea whether there is something wrong with __GFP_DMA32 on arm64.
But it looks like it doesn't help...

Here is the memory info about what Xen has populated to dom0 (I did some
hacks to allocate_memory_11 to make it map some low memory banks to dom0):

(XEN) Allocating 1:1 mappings totalling 16384MB for dom0:
(XEN) BANK[0] 0x008800-0x009800 (256MB)
(XEN) BANK[1] 0x00a000-0x00f800 (1408MB)
(XEN) BANK[2] 0x04-0x06 (8192MB)
(XEN) BANK[3] 0x068000-0x07 (2048MB)
(XEN) BANK[4] 0x08-0x09 (4096MB)
(XEN) BANK[5] 0x094000-0x095800 (384MB)

And here is the printk info I got when trying to map a dma page:

enter xen_swiotlb_map_page.
phys = 0x9444e4042, dev_addr = 0x9444e4042, size = 0x600
start_dma_addr = 0x94480
virt_to_phys(xen_io_tlb_start) = 0x94480
Oh well, have to allocate and map a bounce buffer.
map = 0x94480
dev_addr = 0x94480
*dev->dma_mask = 0x
!dma_capable(0xffc8bd384810, 0x94480, 0x600)

And the patch I used for dom0 hacking:

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 810ad41..96465cf 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -235,7 +235,7 @@ retry:
 #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT))
 #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
 	while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
-		xen_io_tlb_start = (void *)__get_free_pages(__GFP_NOWARN, order);
+		xen_io_tlb_start = (void *)__get_free_pages(__GFP_NOWARN|__GFP_DMA32, order);
 		if (xen_io_tlb_start)
 			break;
 		order--;
@@ -391,6 +391,13 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
 	dma_addr_t dev_addr = xen_phys_to_bus(phys);
 
 	BUG_ON(dir == DMA_NONE);
+	printk("enter xen_swiotlb_map_page.\n");
+	printk("phys = 0x%lx, dev_addr = 0x%lx, size = 0x%lx\n",
+	       phys, dev_addr, size);
+	printk("start_dma_addr = 0x%lx\n", start_dma_addr);
+	printk("virt_to_phys(xen_io_tlb_start) = 0x%lx\n",
+	       virt_to_phys(xen_io_tlb_start));
+
 	/*
 	 * If
Re: [Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
On Fri, Apr 17, 2015 at 05:13:16PM +0100, Stefano Stabellini wrote:
> On Fri, 17 Apr 2015, Ian Campbell wrote:
> > On Fri, 2015-04-17 at 15:34 +0100, Stefano Stabellini wrote:
[...]
> > > I don't think that making xen_swiotlb_fixup work on ARM is a good
> > > idea: it would break the 1:1.
> >
> > This would actually work though, I think, because this is the swiotlb,
> > so we definitely have the opportunity to return the actual DMA address
> > whenever we use this buffer, and the device will use it in the right
> > places for sure.
>
> The code is pretty complex as it is -- I would rather avoid adding more
> complexity to it. For example we would need to bring back a mechanism to
> track dma address -> pseudo-physical address mappings on arm, even though
> it would be far simpler of course. Also I think it makes sense to use the
> swiotlb buffer for its original purpose.
>
> If we could introduce a mechanism to get a lower than 4G buffer in dom0,
> but matching the 1:1, I think it would make the maintenance much easier
> on the Linux side.

+1

Actually, we already have the mechanism on arm32 to populate at least one
bank of memory below 4G. Thus, the only thing we have to do on the
hypervisor side is to make arm32 and arm64 share the same code path in
allocate_memory_11(), removing the 'lowmem = is_32bit_domain(d)' related
conditions. If this is acceptable, the only thing we need to do in the
Linux kernel is to add the __GFP_DMA flag when allocating pages for
xen_io_tlb_start in xen_swiotlb_init.

Baozi.
Re: [Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
On Sat, Apr 18, 2015 at 05:08:58PM +0800, Chen Baozi wrote:
[...]
> Actually, we already have the mechanism on arm32 to populate at least one
> bank of memory below 4G. Thus, the only thing we have to do on the
> hypervisor side is to make arm32 and arm64 share the same code path in
> allocate_memory_11(), removing the 'lowmem = is_32bit_domain(d)' related
> conditions. If this is acceptable, the only thing we need to do in the
> Linux kernel is to add the __GFP_DMA flag when allocating pages for
> xen_io_tlb_start in xen_swiotlb_init.

This is the hack I'm using:

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 40a5303..d83a2b1 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -264,7 +264,7 @@ static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
     unsigned int order = get_11_allocation_size(kinfo->unassigned_mem);
     int i;
 
-    bool_t lowmem = is_32bit_domain(d);
+    bool_t lowmem = true;
     unsigned int bits;
 
     printk("Allocating 1:1 mappings totalling %ldMB for dom0:\n",
@@ -279,7 +279,7 @@ static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
      */
     while ( order >= min_low_order )
     {
-        for ( bits = order ; bits <= (lowmem ? 32 : PADDR_BITS); bits++ )
+        for ( bits = order ; bits <= 32 ; bits++ )
         {
             pg = alloc_domheap_pages(d, order, MEMF_bits(bits));
             if ( pg != NULL )
Re: [Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
On Fri, 17 Apr 2015, Ian Campbell wrote:
> On Fri, 2015-04-17 at 19:24 +0800, Chen Baozi wrote:
[...]
> > Sadly, it seems that xen_swiotlb_map_page in my dom0 kernel allocates
> > (start_dma_addr = 0x94480) the buffers for DMA above 4G, which fails
> > the dma_capable() check and was then unable to return from
> > xen_swiotlb_map_page() successfully.
>
> The swiotlb bounce buffer has been allocated below 4GB?
>
> I suspect that xen_swiotlb_init is buggy for ARM -- it allocates some
> random pages and then swizzles the backing pages for ones < 4G, but that
> won't work on an ARM dom0 with a 1:1 mapping, I don't think. Do you see
> error messages along those lines?
>
> Essentially I think either xen_swiotlb_fixup is unable to work on ARM, or
> the following:
>
>     start_dma_addr = xen_virt_to_bus(xen_io_tlb_start);
>
> is returning 1:1 and not reflecting the fixup.

The swiotlb on arm doesn't necessarily get memory under 4G, see my other
reply.

> > If I set dom0_mem to a small value (e.g. 512M), which makes all
> > physical memory of dom0 below 4G, everything goes fine.
>
> So you are getting allocated memory below 4G? Your message on IRC
> suggested you weren't, did you hack around this?
>
> I think we have two options, either xen_swiotlb_init allocates pages
> below 4GB (e.g. __GFP_DMA) or we do something to allow xen_swiotlb_fixup
> to actually work even on a 1:1 dom0.

I don't think that making xen_swiotlb_fixup work on ARM is a good idea: it
would break the 1:1.

> Although the first option seems preferable at first glance it has the
> shortcoming that it requires dom0 to have some memory below 4GB, which
> might not necessarily be the case.

I think we should arrange dom0 to get some memory under 4G to begin with,
not necessarily all of it.

> The second option seems like it might be uglier but doesn't suffer from
> this issue.
>
> Can you please look and find out if the IPA at 0x94480 is actually backed
> by 1:1 RAM or if xen_swiotlb_fixup has done its job and updated things
> such that the associated PAs are below 4GB?
Re: [Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
On Fri, 2015-04-17 at 15:32 +0100, Stefano Stabellini wrote:
> I think that given that dom0 is mapped 1:1 on ARM, the easiest and best
> fix would be to simply allocate memory under 4G to begin with.

Not necessarily best, see my reply (hint: dom0 might not have RAM under 4GB
even if the host does).
Re: [Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
On Fri, 2015-04-17 at 15:34 +0100, Stefano Stabellini wrote:
[...]
> > I think we have two options, either xen_swiotlb_init allocates pages
> > below 4GB (e.g. __GFP_DMA) or we do something to allow
> > xen_swiotlb_fixup to actually work even on a 1:1 dom0.
>
> I don't think that making xen_swiotlb_fixup work on ARM is a good idea:
> it would break the 1:1.

This would actually work though, I think, because this is the swiotlb, so
we definitely have the opportunity to return the actual DMA address
whenever we use this buffer, and the device will use it in the right places
for sure. The swiotlb buffer can't ever get reused for anything else, so we
don't even need to worry about undoing the damage later.

> > Although the first option seems preferable at first glance it has the
> > shortcoming that it requires dom0 to have some memory below 4GB, which
> > might not necessarily be the case.
>
> I think we should arrange dom0 to get some memory under 4G to begin with,
> not necessarily all of it.

It's another option for sure; the question is how to decide how much, how
to make it configurable, etc.

Ian.
Re: [Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
On Fri, 2015-04-17 at 19:24 +0800, Chen Baozi wrote:
> Hi all,
>
> According to my recent experience, there might be some problems with the
> swiotlb DMA map on a 1:1 mapping arm64 dom0 with large memory. The issue
> is as below:
>
> For arm64 servers with large memory, it is possible to set dom0_mem > 4G
> (e.g. I have one set with 16G). In this case, according to my
> understanding, there is a chance that the dom0 kernel needs to map some
> buffers above 4G to do

^below?

> DMA operations (e.g. in the snps,dwmac ethernet driver). However, most
> DMA engines support only 32-bit physical addresses, and thus aren't able
> to operate directly on that memory.

Even on arm64 systems with RAM above 4GB? That seems short-sighted. Oh
well, I suppose we have to live with it.

> IIUC, swiotlb is implemented to solve this (using a bounce buffer) if
> there is no IOMMU, or the IOMMU is not enabled, on the system.
>
> Sadly, it seems that xen_swiotlb_map_page in my dom0 kernel allocates
> (start_dma_addr = 0x94480) the buffers for DMA above 4G, which fails the
> dma_capable() check and was then unable to return from
> xen_swiotlb_map_page() successfully.

The swiotlb bounce buffer has been allocated below 4GB?

I suspect that xen_swiotlb_init is buggy for ARM -- it allocates some
random pages and then swizzles the backing pages for ones < 4G, but that
won't work on an ARM dom0 with a 1:1 mapping, I don't think. Do you see
error messages along those lines?

Essentially I think either xen_swiotlb_fixup is unable to work on ARM, or
the following:

    start_dma_addr = xen_virt_to_bus(xen_io_tlb_start);

is returning 1:1 and not reflecting the fixup.

> If I set dom0_mem to a small value (e.g. 512M), which makes all physical
> memory of dom0 below 4G, everything goes fine.

So you are getting allocated memory below 4G? Your message on IRC suggested
you weren't, did you hack around this?

I think we have two options, either xen_swiotlb_init allocates pages below
4GB (e.g. __GFP_DMA) or we do something to allow xen_swiotlb_fixup to
actually work even on a 1:1 dom0.

Although the first option seems preferable at first glance it has the
shortcoming that it requires dom0 to have some memory below 4GB, which
might not necessarily be the case. The second option seems like it might be
uglier but doesn't suffer from this issue.

Can you please look and find out if the IPA at 0x94480 is actually backed
by 1:1 RAM or if xen_swiotlb_fixup has done its job and updated things such
that the associated PAs are below 4GB?

Ian.

> I am not familiar with swiotlb-xen, so there may be misunderstandings
> about the current situation. Correct me if I did/understood anything
> wrong.
>
> Any ideas?
>
> Cheers,
>
> Chen Baozi
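For context, the ARM implementation of xen_create_contiguous_region in
kernels of this era is (roughly, paraphrased from memory) the following
no-op, which is why the fixup cannot lower the buffer on a 1:1 dom0:

/* arch/arm/xen/mm.c, paraphrased: on ARM the machine address is assumed
 * equal to the pseudo-physical address, so nothing is exchanged and the
 * handle is simply echoed back. */
int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
				 unsigned int address_bits,
				 dma_addr_t *dma_handle)
{
	if (!xen_initial_domain())
		return -EINVAL;

	/* we assume that dom0 is mapped 1:1 for now */
	*dma_handle = pstart;
	return 0;
}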
[Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
Hi all,

According to my recent experience, there might be some problems with the
swiotlb DMA map on a 1:1 mapping arm64 dom0 with large memory. The issue is
as below:

For arm64 servers with large memory, it is possible to set dom0_mem > 4G
(e.g. I have one set with 16G). In this case, according to my
understanding, there is a chance that the dom0 kernel needs to map some
buffers above 4G to do DMA operations (e.g. in the snps,dwmac ethernet
driver). However, most DMA engines support only 32-bit physical addresses,
and thus aren't able to operate directly on that memory. IIUC, swiotlb is
implemented to solve this (using a bounce buffer) if there is no IOMMU, or
the IOMMU is not enabled, on the system.

Sadly, it seems that xen_swiotlb_map_page in my dom0 kernel allocates
(start_dma_addr = 0x94480) the buffers for DMA above 4G, which fails the
dma_capable() check and was then unable to return from
xen_swiotlb_map_page() successfully.

If I set dom0_mem to a small value (e.g. 512M), which makes all physical
memory of dom0 below 4G, everything goes fine.

I am not familiar with swiotlb-xen, so there may be misunderstandings about
the current situation. Correct me if I did/understood anything wrong.

Any ideas?

Cheers,

Chen Baozi
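For reference, the dma_capable() test that fails here is essentially the
following (paraphrased; the exact definition is per-arch in kernels of this
era): the whole buffer must sit under the device's DMA mask, 0xffffffff for
a 32-bit engine, so a bounce buffer that itself lives above 4G can never
pass:

static inline bool dma_capable(struct device *dev, dma_addr_t addr,
			       size_t size)
{
	/* Usable for DMA only if [addr, addr + size) fits under the mask. */
	if (!dev->dma_mask)
		return false;

	return addr + size - 1 <= *dev->dma_mask;
}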
Re: [Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
On Fri, 17 Apr 2015, Chen Baozi wrote:
> Hi all,
>
> According to my recent experience, there might be some problems with the
> swiotlb DMA map on a 1:1 mapping arm64 dom0 with large memory. The issue
> is as below:
>
> For arm64 servers with large memory, it is possible to set dom0_mem > 4G
> (e.g. I have one set with 16G). In this case, according to my
> understanding, there is a chance that the dom0 kernel needs to map some
> buffers above 4G to do DMA operations (e.g. in the snps,dwmac ethernet
> driver). However, most DMA engines support only 32-bit physical
> addresses, and thus aren't able to operate directly on that memory. IIUC,
> swiotlb is implemented to solve this (using a bounce buffer) if there is
> no IOMMU, or the IOMMU is not enabled, on the system.
>
> Sadly, it seems that xen_swiotlb_map_page in my dom0 kernel allocates
> (start_dma_addr = 0x94480) the buffers for DMA above 4G, which fails the
> dma_capable() check and was then unable to return from
> xen_swiotlb_map_page() successfully.
>
> If I set dom0_mem to a small value (e.g. 512M), which makes all physical
> memory of dom0 below 4G, everything goes fine.
>
> I am not familiar with swiotlb-xen, so there may be misunderstandings
> about the current situation. Correct me if I did/understood anything
> wrong.
>
> Any ideas?

I think that the problem is that xen_swiotlb_init doesn't necessarily
allocate memory under 4G on arm/arm64. xen_swiotlb_init calls
__get_free_pages to allocate memory, so the pages could easily be above 4G.
Subsequently xen_swiotlb_fixup is called on the allocated memory range,
calling xen_create_contiguous_region and passing an address_bits mask.
However xen_create_contiguous_region doesn't actually do anything at all on
ARM.

I think that given that dom0 is mapped 1:1 on ARM, the easiest and best fix
would be to simply allocate memory under 4G to begin with. Something like
(maybe with an ifdef ARM around it):

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 810ad41..22ac33a 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -235,7 +235,7 @@ retry:
 #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT))
 #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
 	while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
-		xen_io_tlb_start = (void *)__get_free_pages(__GFP_NOWARN, order);
+		xen_io_tlb_start = (void *)__get_free_pages(__GFP_NOWARN|__GFP_DMA32, order);
 		if (xen_io_tlb_start)
 			break;
 		order--;
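For illustration, the "ifdef ARM" variant might look like the following
minimal sketch; it uses __GFP_DMA rather than __GFP_DMA32 (as later
messages in the thread note, __GFP_DMA is what works on arm64), and the
exact flag choice is an assumption here, not settled code:

	/* Sketch only: on an ARM 1:1 dom0 the machine address equals the
	 * pseudo-physical one, so no later fixup can lower an allocation
	 * that starts out above 4G -- allocate low to begin with. */
#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
	gfp_t gfp = __GFP_NOWARN | __GFP_DMA;
#else
	gfp_t gfp = __GFP_NOWARN;
#endif

	while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
		xen_io_tlb_start = (void *)__get_free_pages(gfp, order);
		if (xen_io_tlb_start)
			break;
		order--;
	}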
Re: [Xen-devel] Question about DMA on 1:1 mapping dom0 of arm64
On Fri, Apr 17, 2015 at 02:21:45PM +0100, Ian Campbell wrote:
> On Fri, 2015-04-17 at 19:24 +0800, Chen Baozi wrote:
> > Hi all,
> >
> > According to my recent experience, there might be some problems with
> > the swiotlb DMA map on a 1:1 mapping arm64 dom0 with large memory. The
> > issue is as below:
> >
> > For arm64 servers with large memory, it is possible to set
> > dom0_mem > 4G (e.g. I have one set with 16G). In this case, according
> > to my understanding, there is a chance that the dom0 kernel needs to
> > map some buffers above 4G to do
>
> ^below?
>
> > DMA operations (e.g. in the snps,dwmac ethernet driver). However, most
> > DMA engines support only 32-bit physical addresses, and thus aren't
> > able to operate directly on that memory.
>
> Even on arm64 systems with RAM above 4GB? That seems short-sighted. Oh
> well, I suppose we have to live with it.

I understand that for most ARM SoCs the DMA engines come from third-party
IP companies and are arm32/arm64 independent. Thus, 32-bit address DMA
engines should be common even on arm64 systems.

The preferred way is to use/enable the SMMU (IOMMU). However, we are
focusing on the 1:1 mapping right now...

> > IIUC, swiotlb is implemented to solve this (using a bounce buffer) if
> > there is no IOMMU, or the IOMMU is not enabled, on the system.
> >
> > Sadly, it seems that xen_swiotlb_map_page in my dom0 kernel allocates
> > (start_dma_addr = 0x94480) the buffers for DMA above 4G, which fails
> > the dma_capable() check and was then unable to return from
> > xen_swiotlb_map_page() successfully.
>
> The swiotlb bounce buffer has been allocated below 4GB?

I have no idea (about the exact behaviour of the bounce buffer). But I
don't think it has been allocated below 4GB on my board, for in that case
it wouldn't fail dma_capable() at the end of xen_swiotlb_map_page().

> I suspect that xen_swiotlb_init is buggy for ARM -- it allocates some
> random pages and then swizzles the backing pages for ones < 4G, but that
> won't work on an ARM dom0 with a 1:1 mapping, I don't think. Do you see
> error messages along those lines?
>
> Essentially I think either xen_swiotlb_fixup is unable to work on ARM, or
> the following:
>
>     start_dma_addr = xen_virt_to_bus(xen_io_tlb_start);
>
> is returning 1:1 and not reflecting the fixup.

Yes. It seems very likely that is what happened in my system.

> > If I set dom0_mem to a small value (e.g. 512M), which makes all
> > physical memory of dom0 below 4G, everything goes fine.
>
> So you are getting allocated memory below 4G?

If all the banks of memory that Xen populates to dom0 are below 4G, yes.
However, if some banks of memory for dom0 are above 4G, usually not.

> Your message on IRC suggested you weren't, did you hack around this?

Yes. I did some hacks to help understand my situation earlier. What I have
done and observed is as below:

1. At the very beginning, I used the default dom0_mem value to boot the
   system, which is 128M. And I didn't realize the DMA buffer problem.

2. I started to try more dom0_mem (16G). Then the ethernet driver reported
   that it could not initialise its rx buffers (DMA buffers). And I found
   out that allocate_memory_11 didn't populate any banks of memory below 4G
   for dom0. At that time, I guessed the failure might have been introduced
   because no memory bank below 4G was populated. (There is only a 2GB
   address space below 4G for physical memory on my platform, and there is
   a hole for the PCI memory address space above 4G before the memory
   address space continues.)

3. So I did some hacks to set lowmem=true manually in allocate_memory_11,
   which made Xen on arm64 act similarly to arm32, populating at least one
   bank of memory below 4G to dom0. (This is the point when I sent you the
   message on IRC.) I thought that could solve the problem, but it doesn't.

4. Then I found out that once Xen populated any banks of memory above 4G,
   the ethernet driver would have a chance (very likely, almost every time
   if dom0_mem=16G) to use buffers above 4G, regardless of whether dom0 has
   banks of memory below 4G.

> I think we have two options, either xen_swiotlb_init allocates pages
> below 4GB (e.g. __GFP_DMA) or we do something to allow xen_swiotlb_fixup
> to actually work even on a 1:1 dom0. Although the first option seems
> preferable at first glance it has the shortcoming that it requires dom0
> to have some memory below 4GB, which might not necessarily be the case.
> The second option seems like it might be uglier but doesn't suffer from
> this issue.
>
> Can you please look and find out if the IPA at 0x94480 is actually backed
> by 1:1 RAM or if xen_swiotlb_fixup has done its job and updated things
> such that the associated PAs are below 4GB?

I am at home now and will check it out tomorrow. But I guess it should be
the first situation you mentioned.

Cheers,

Baozi.