On OSv emailing list David Smith reported sporadic crashes
when running OSv on firecracker in nested virtualization setup
looking like this:

"OSv v0.55.0-168-g31a04239
1 CPUs detected
Firmware vendor: Unknown
bsd: initializingAssertion failed: sched::preemptable() (arch/x64/mmu.cc: 
page_fault: 37)
[backtrace]"

or

"OSv v0.55.0-237-g7cd33926
1 CPUs detected
Firmware vendor: Unknown
trying to execute or access missing symbol
[backtrace]"

After long back-and-forth, I was able to re-create this issue
on firecracker and qemu microvm where I was able to connect
with gdb. It turned out that OSv would trigger page fault in
at core/mempool.cc:1198:

 l1_pool_stats[cpu_id]._nr = nr;

The l1_pool_stats is a vector that gets re-sized to the number of cpus
every time the cpu notifier is called. Unfortunately, with this
approach we may occasionally experience a race condition when this
vector gets resized in parallel at the same time on multiple CPUs.
In another race scenario, the smp_allocator_cnt counter might be incremented
in l2::fill_thread() and smp_allocator set to true before l1_pool_stats
get resized in cpu::notifier in which case the accesses to l1_pool_stats
might result in page faults.

To fix this we need to guarantee that the l1_pool_stats vector is
resized once and before smp_allocator flag is set.

To achieve this, this patch adds another atomic counter,
l1_initialized_cnt, that tracks number of initialized l1 pools.
Only when that counter is equal to the number of cpus in the system
we resize the l1_pool_stats vector. We also do it before incrementing
the smp_allocator_cnt.

Signed-off-by: Waldemar Kozaczuk <jwkozac...@gmail.com>
---
 core/mempool.cc | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/core/mempool.cc b/core/mempool.cc
index bd8e2fcf..c85bc2be 100644
--- a/core/mempool.cc
+++ b/core/mempool.cc
@@ -1323,15 +1323,18 @@ private:
     std::unique_ptr<sched::thread> _fill_thread;
 };
 
+std::atomic<unsigned int> l1_initialized_cnt{};
 PERCPU(l1*, percpu_l1);
 static sched::cpu::notifier _notifier([] () {
     *percpu_l1 = new l1(sched::cpu::current());
+    if (++l1_initialized_cnt == sched::cpus.size()) {
+        l1_pool_stats.resize(sched::cpus.size());
+    }
     // N per-cpu threads for L1 page pool, 1 thread for L2 page pool
     // Switch to smp_allocator only when all the N + 1 threads are ready
     if (smp_allocator_cnt++ == sched::cpus.size()) {
         smp_allocator = true;
     }
-    l1_pool_stats.resize(sched::cpus.size());
 });
 static inline l1& get_l1()
 {
-- 
2.30.2

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/20210503204838.845838-1-jwkozaczuk%40gmail.com.

Reply via email to