2022-07-27 14:59 (UTC+0300), MOD:
> Hi All,
>
> My team and I have encountered a problem where allocation of a mempool
> larger than 1GB (== 1 Hugepage) fails.
> We are in a multi-process environment, and the `rte_mempool_create`
> happens in the secondary process.
>
> Sometimes the allocation succeeds but after some successes (for me
> specifically, two) the following occurs:
> the secondary process segfaults on `malloc_elem_can_hold`, inside a stack
> starting from `rte_mempool_create`.
>
> Restarting the secondary process does not work as it is stuck on
> `EAL: Probing VFIO support`, and restarting the main process is the only option.
>
> Has anyone had this problem, or knows any possible solution?
> Thanks!
Please tell us which DPDK version you are using and attach the stack trace. If possible, try rebuilding DPDK with RTE_MALLOC_DEBUG defined and, if your DPDK version supports it, with AddressSanitizer enabled. A segfault in a function that traverses the malloc element list suggests that the heap may be corrupted, but that is only a guess. Restarting the secondary process after a segfault is hardly a viable option, because by that point the shared memory may already be corrupted, and some lock may have been taken and never released (which, by the way, is a possible reason why it gets stuck).
