This is a quite small and simple patch, but it has taken me almost two months of researching and understanding the problem and finding the right solution. It has involved reading the ARMv8 programmer's guide, posting questions to ARM forums, and trying to debug the problem in a mostly trial-and-error fashion, as somewhat documented in issue #1100. Special credit goes to Claudio Fontana, who helped me tremendously by explaining and suggesting many valuable ideas.
As issue #1100 explains, OSv would crash, occasionally or quite repeatedly depending on the application, due to an unexpected "Unknown reason" class synchronous exception (EC=0). This would never happen in emulated mode (QEMU with TCG), but it happened quite frequently on real ARM hardware like the RPI 4, under QEMU with KVM or under Firecracker. Per the ARM documentation - https://developer.arm.com/docs/ddi0595/h/aarch64-system-registers/esr_el1#ISS_exceptionswithanunknownreason - there are many potential causes of an EC=0 exception, including "attempted execution of an instruction bit pattern that has no allocated instruction", which means trying to execute garbage. None of these potential causes, which I quite meticulously researched, examined, and in some cases discussed with Claudio, seemed to apply or to make much sense in the OSv context.

Until one of them did, when I stumbled across this article - https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/caches-and-self-modifying-code - about "self-modifying code". Initially the article seemed to apply only to JIT-type scenarios, but then I noticed this small-font annotation: "A more common (though less obvious) example is that of an operating system kernel: from the point of view of the processor, some code in the system is modifying some other code in the system every time a process is swapped in or out." This made me think that the OSv dynamic linker is a somewhat similar case. Then I eventually found this paragraph in the ARMv8 programmer's guide, chapter 11.5 "Cache maintenance": "It is sometimes necessary for software to clean or invalidate a cache. This might be required when the contents of external memory have been changed and it is necessary to remove stale data from the cache. It can also be required after MMU-related activity such as changing access permissions, cache policies, or virtual to Physical Address mappings, or when I and D-caches must be synchronized for dynamically generated code such as JIT-compilers and dynamic library loaders."

In essence, the aarch64 architecture (a Modified Harvard architecture) defines separate instruction and data caches - the I-cache and the D-cache. It is therefore sometimes necessary to synchronize the two caches with each other, by cleaning the D-cache and invalidating the I-cache, after loading code into memory - which is exactly what the article about self-modifying code explains.

How does this apply to OSv? The OSv dynamic linker, being part of the kernel (code A), loads application code (B) into memory. That by itself does not mean OSv modifies its own kernel code, but it does dynamically load other code and execute it in the same memory space. To make a long story short, this patch modifies a critical part of the OSv memory management code - populate_vma() - which gets called any time a portion (page) of a vma is filled, whether on a page fault or eagerly. It changes populate_vma() to synchronize the data and instruction caches with each other whenever the vma is executable per its permissions - in essence, any time any code is loaded into memory. To achieve this, it delegates to a somewhat obscure built-in - __clear_cache(). This logic is a no-op in the x86-64 port, as that architecture has strong coherency between the instruction and data caches, so nothing special needs to be done there.
Fixes #1100

Signed-off-by: Waldemar Kozaczuk <jwkozac...@gmail.com>
---
 arch/aarch64/mmu.cc | 34 ++++++++++++++++++++++++++++++++++
 arch/x64/mmu.cc     |  4 ++++
 core/mmu.cc         |  7 +++++++
 include/osv/mmu.hh  |  2 ++
 4 files changed, 47 insertions(+)

diff --git a/arch/aarch64/mmu.cc b/arch/aarch64/mmu.cc
index dd8ef850..8fd71b51 100644
--- a/arch/aarch64/mmu.cc
+++ b/arch/aarch64/mmu.cc
@@ -97,4 +97,38 @@ bool is_page_fault_write_exclusive(unsigned int esr) {
 bool fast_sigsegv_check(uintptr_t addr, exception_frame* ef) {
     return false;
 }
+
+void synchronize_cpu_caches(void *v, size_t size) {
+    // The aarch64 qualifies as a Modified Harvard architecture and defines separate
+    // cpu instruction and data caches - I-cache and D-cache. Therefore it is necessary
+    // to synchronize both caches by cleaning the data cache and invalidating the
+    // instruction cache after loading code into memory before letting it be executed.
+    // For more details of why and when this is necessary please read this excellent article -
+    // https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/caches-and-self-modifying-code
+    // or this paper - https://hal.inria.fr/hal-02509910/document.
+    //
+    // So when the OSv dynamic linker, being part of the kernel code, loads pages
+    // of executable sections of ELF segments into memory, we need to clean the D-cache
+    // in order to push the code (as data) into the next cache level (L2) and invalidate
+    // the I-cache right before it gets executed.
+    //
+    // In order to achieve the above we delegate to the __clear_cache builtin.
+    // The __clear_cache builtin does the following in terms of ARM64 assembly:
+    //
+    // For each D-cache line in the range (v, v + size):
+    //   DC CVAU, Xn ; Clean data cache by virtual address (VA) to PoU
+    // DSB ISH       ; Ensure visibility of the data cleaned from cache
+    // For each I-cache line in the range (v, v + size):
+    //   IC IVAU, Xn ; Invalidate instruction cache by VA to PoU
+    // DSB ISH       ; Ensure completion of the invalidations
+    // ISB           ; Synchronize the fetched instruction stream
+    //
+    // Please note that both DC CVAU and IC IVAU are broadcast to all cores in the
+    // same Inner Shareability domain (which all OSv memory is mapped as) so that
+    // caches in all cores should eventually see and execute the same code.
+    //
+    // For more details about what this built-in does, please read the gcc documentation -
+    // https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
+    __builtin___clear_cache((char*)v, (char*)(v + size));
+}
 }
diff --git a/arch/x64/mmu.cc b/arch/x64/mmu.cc
index 24da5caa..1af268c0 100644
--- a/arch/x64/mmu.cc
+++ b/arch/x64/mmu.cc
@@ -191,4 +191,8 @@ bool fast_sigsegv_check(uintptr_t addr, exception_frame* ef)
 {
     return false;
 }
+
+// The x86_64 is considered to conform to the von Neumann architecture with unified
+// data and instruction caches. Therefore we do not need to do anything, as they are always in sync.
+void synchronize_cpu_caches(void *v, size_t size) {}
 }
diff --git a/core/mmu.cc b/core/mmu.cc
index ff3fab47..37a1c60b 100644
--- a/core/mmu.cc
+++ b/core/mmu.cc
@@ -1206,6 +1206,13 @@ ulong populate_vma(vma *vma, void *v, size_t size, bool write = false)
         vma->operate_range(populate_small<Account>(map, vma->perm(), write,
             vma->map_dirty()), v, size) :
         vma->operate_range(populate<Account>(map, vma->perm(), write,
             vma->map_dirty()), v, size);
+    // On some architectures, the cpu data and instruction caches are separate (non-unified)
+    // and therefore it might be necessary to synchronize the data cache with the
+    // instruction cache after populating a vma with executable code.
+    if (vma->perm() & perm_exec) {
+        synchronize_cpu_caches(v, size);
+    }
+
     return total;
 }
diff --git a/include/osv/mmu.hh b/include/osv/mmu.hh
index 1830048c..12fcb8a4 100644
--- a/include/osv/mmu.hh
+++ b/include/osv/mmu.hh
@@ -319,6 +319,8 @@ std::string procfs_maps();
 
 unsigned long all_vmas_size();
 
+// Synchronize cpu data and instruction caches for the specified area of virtual memory
+void synchronize_cpu_caches(void *v, size_t size);
 }
 
 #endif /* MMU_HH */
-- 
2.29.2

To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/20201228065122.63815-1-jwkozaczuk%40gmail.com.