Hi,

I modified eal_legacy_hugepage_init() and the problem is solved. Should this fix be submitted to upstream DPDK?
diff --git a/package/dpdk/dpdk-20.11/lib/librte_eal/linux/eal_memory.c b/package/dpdk/dpdk-20.11/lib/librte_eal/linux/eal_memory.c
index 03a4f2dd..89a13e91 100644
--- a/package/dpdk/dpdk-20.11/lib/librte_eal/linux/eal_memory.c
+++ b/package/dpdk/dpdk-20.11/lib/librte_eal/linux/eal_memory.c
@@ -1458,6 +1458,7 @@ eal_legacy_hugepage_init(void)
 		mem_sz = msl->len;
 		munmap(msl->base_va, mem_sz);
 		msl->base_va = NULL;
+		msl->len = 0;
 		msl->heap = 0;
 
 		/* destroy backing fbarray */

Best regards
Yan Xiaoping

From: Yan, Xiaoping (NSB - CN/Hangzhou)
Sent: August 4, 2021 15:14
To: 'users@dpdk.org' <users@dpdk.org>
Subject: dpdk-pdump prints "EAL: Error: Invalid memory"

Hi,

After updating DPDK from 19.11 to 20.11, dpdk-pdump prints this error:

EAL: Error: Invalid memory
Port 7 MAC: 02 70 63 61 70 03
 core (2), capture for (1) tuples
 - port 0 device ((null)) queue 65535
^C

--legacy-mem is used for both the primary process and dpdk-pdump.

After some debugging, I found that mlx5_mem_is_rte() incorrectly considers an address that came from OS memory (addr=0x4482b80) to be an rte address, so mlx5_free() calls rte_free() on it, which causes the error.
This seems to be because len of some unused memseg lists is not set to 0, so rte_mem_virt2memseg_list(0x4482b80) returns a memseg list instead of NULL.
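To make the mechanism concrete, here is a minimal, self-contained model of the address-range lookup. It is an illustration, not DPDK code: it assumes a 64-bit host and that DPDK's internal virt2memseg_list() accepts an address when base_va <= addr < base_va + len. The names struct model_msl, model_virt2memseg_list() and lookup_in_snapshot() are hypothetical. The base_va/len values are the ones gdb reports for lists 0 and 1: with base_va = 0 and a stale len of 34359738368 (32 GiB), the heap address 0x4482b80 falls inside list 1, and clearing len makes the same lookup fail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for struct rte_memseg_list:
 * only the two fields the address-range check looks at. */
struct model_msl {
	uintptr_t base_va; /* start of the reserved VA range, 0 if unmapped */
	uint64_t len;      /* size of the reserved VA range in bytes */
};

/* Same idea as the range check in DPDK's virt2memseg_list(): return the
 * index of the first list whose [base_va, base_va + len) contains addr,
 * or -1 if none does. Integers are used to avoid arithmetic on NULL. */
static int
model_virt2memseg_list(const struct model_msl *lists, size_t n, uintptr_t addr)
{
	assert(lists != NULL);
	for (size_t i = 0; i < n; i++) {
		uintptr_t start = lists[i].base_va;
		uintptr_t end = start + lists[i].len;
		if (addr >= start && addr < end)
			return (int)i;
	}
	return -1;
}

/* Rebuilds the first two lists from the gdb dump: list 0 is in use,
 * list 1 was unmapped (base_va = 0) but, unless clear_len is set,
 * still reports len = 32 GiB, as after the unpatched cleanup loop. */
static int
lookup_in_snapshot(int clear_len, uintptr_t addr)
{
	const uint64_t len_32g = 34359738368ULL;
	struct model_msl lists[2] = {
		{ 0x2aac0000000ULL, len_32g },
		{ 0, clear_len ? 0 : len_32g },
	};
	return model_virt2memseg_list(lists, 2, addr);
}
```

For example, lookup_in_snapshot(0, 0x4482b80) returns 1 (the stale, unmapped list), while lookup_in_snapshot(1, 0x4482b80) returns -1, so an mlx5_mem_is_rte()-style check would correctly treat the pointer as ordinary malloc memory.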
Here are the memsegs:

(gdb) p mcfg->memsegs
$3 = {{{base_va = 0x2aac0000000, addr_64 = 2932388921344}, page_sz = 1073741824, socket_id = 0, version = 0, len = 34359738368, external = 0, heap = 1,
    memseg_arr = {name = "memseg-1048576k-0-0", '\000' <repeats 44 times>, count = 5, len = 32, elt_sz = 48, data = 0x2aaa302e000, rwlock = {cnt = 0}}},
  {{base_va = 0x0, addr_64 = 0}, page_sz = 1073741824, socket_id = 0, version = 0, len = 34359738368, external = 0, heap = 0,
    memseg_arr = {name = '\000' <repeats 63 times>, count = 0, len = 0, elt_sz = 0, data = 0x0, rwlock = {cnt = 0}}},
  {{base_va = 0x0, addr_64 = 0}, page_sz = 1073741824, socket_id = 1, version = 0, len = 34359738368, external = 0, heap = 0,
    memseg_arr = {name = '\000' <repeats 63 times>, count = 0, len = 0, elt_sz = 0, data = 0x0, rwlock = {cnt = 0}}},
  {{base_va = 0x0, addr_64 = 0}, page_sz = 1073741824, socket_id = 1, version = 0, len = 34359738368, external = 0, heap = 0,
    memseg_arr = {name = '\000' <repeats 63 times>, count = 0, len = 0, elt_sz = 0, data = 0x0, rwlock = {cnt = 0}}},
  {{base_va = 0x0, addr_64 = 0}, page_sz = 0, socket_id = 0, version = 0, len = 0, external = 0, heap = 0,
    memseg_arr = {name = '\000' <repeats 63 times>, count = 0, len = 0, elt_sz = 0, data = 0x0, rwlock = {cnt = 0}}} <repeats 124 times>}

Lists 1 to 3 are unused (base_va = 0x0, heap = 0), yet they still report len = 34359738368 (32 GiB), so the range [0x0, 0x800000000) appears to belong to them.

Here is the stack trace:

(gdb) bt
#0  mlx5_free (addr=0x4482b80) at ../dpdk-20.11/drivers/common/mlx5/mlx5_malloc.c:260
#1  0x0000000000706f5c in mlx5_mp_req_verbs_cmd_fd (mp_id=mp_id@entry=0x7ffcdb6d9e50) at ../dpdk-20.11/drivers/common/mlx5/mlx5_common_mp.c:140
#2  0x000000000050496f in mlx5_dev_spawn (config=0x7ffcdb6d9d70, spawn=0x2ab753799c0, dpdk_dev=0x4491400) at ../dpdk-20.11/drivers/net/mlx5/linux/mlx5_os.c:774
#3  mlx5_os_pci_probe (pci_drv=<optimized out>, pci_dev=<optimized out>) at ../dpdk-20.11/drivers/net/mlx5/linux/mlx5_os.c:2154
#4  0x0000000000708b5a in drivers_probe (user_classes=1, pci_dev=0x44913f0, pci_drv=0xe01800 <mlx5_pci_driver>, dev=0x2ab75379a80) at ../dpdk-20.11/drivers/common/mlx5/mlx5_common_pci.c:246
#5  mlx5_common_pci_probe (pci_drv=0xe01800 <mlx5_pci_driver>, pci_dev=0x44913f0) at ../dpdk-20.11/drivers/common/mlx5/mlx5_common_pci.c:308
#6  0x00000000004268f9 in rte_pci_probe_one_driver (dev=0x44913f0, dr=0xe01800 <mlx5_pci_driver>) at ../dpdk-20.11/drivers/bus/pci/pci_common.c:243
#7  pci_probe_all_drivers (dev=0x44913f0) at ../dpdk-20.11/drivers/bus/pci/pci_common.c:318
#8  pci_probe () at ../dpdk-20.11/drivers/bus/pci/pci_common.c:345
#9  0x00000000006bc4d3 in rte_bus_probe () at ../dpdk-20.11/lib/librte_eal/common/eal_common_bus.c:72
#10 0x0000000000422304 in rte_eal_init (argc=argc@entry=16, argv=argv@entry=0x7ffcdb6da530) at ../dpdk-20.11/lib/librte_eal/linux/eal.c:1210
#11 0x000000000056fee9 in main (argc=16, argv=0x7ffcdb6da748) at ../dpdk-20.11/app/pdump/main.c:1118
(gdb) c
Continuing.

Thread 1 "dpdk-pdump" hit Breakpoint 5, rte_free (addr=0x4482b80)

It seems to me that the following code in eal_legacy_hugepage_init() should also set len to 0:

	/* we're not going to allocate more pages, so release VA space for
	 * unused memseg lists
	 */
	for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) {
		struct rte_memseg_list *msl = &mcfg->memsegs[i];
		size_t mem_sz;

		/* skip inactive lists */
		if (msl->base_va == NULL)
			continue;
		/* skip lists where there is at least one page allocated */
		if (msl->memseg_arr.count > 0)
			continue;
		/* this is an unused list, deallocate it */
		mem_sz = msl->len;
		munmap(msl->base_va, mem_sz);
		msl->base_va = NULL;
		// here, should we also add msl->len = 0; ?
		msl->heap = 0;

		/* destroy backing fbarray */
		rte_fbarray_destroy(&msl->memseg_arr);
	}

Any comments? Thank you.

Best regards
Yan Xiaoping
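The cleanup loop with the proposed extra assignment can be modeled in the same way. This is a hypothetical sketch, not the DPDK implementation: struct model_list, model_release_unused() and model_inactive_lists_are_empty() are illustrative names, the count field stands in for memseg_arr.count, and munmap()/rte_fbarray_destroy() are deliberately left out. It only checks the invariant the patch restores: once a list is released, both base_va and len are zero, so it no longer covers any address range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the struct rte_memseg_list fields that the
 * cleanup loop touches; not the real DPDK layout. */
struct model_list {
	uintptr_t base_va;   /* 0 means "no VA reserved" */
	uint64_t len;        /* reserved VA size in bytes */
	int heap;            /* 1 if the list backs the malloc heap */
	unsigned int count;  /* pages allocated, like memseg_arr.count */
};

/* Models the "release VA space for unused memseg lists" loop from
 * eal_legacy_hugepage_init() with the proposed extra `len = 0`.
 * munmap() and rte_fbarray_destroy() are omitted in this model.
 * Returns how many lists were released. */
static int
model_release_unused(struct model_list *lists, size_t n)
{
	int released = 0;

	assert(lists != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct model_list *msl = &lists[i];

		if (msl->base_va == 0)   /* skip inactive lists */
			continue;
		if (msl->count > 0)      /* skip lists with pages in use */
			continue;
		/* real code: munmap(msl->base_va, msl->len); */
		msl->base_va = 0;
		msl->len = 0;            /* the proposed fix */
		msl->heap = 0;
		released++;
	}
	return released;
}

/* Invariant the fix restores: an inactive list covers no addresses. */
static int
model_inactive_lists_are_empty(const struct model_list *lists, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (lists[i].base_va == 0 && lists[i].len != 0)
			return 0;
	return 1;
}
```

Clearing len next to base_va keeps the two fields consistent, which is what any [base_va, base_va + len) range check implicitly assumes; without that line, model_inactive_lists_are_empty() would return 0 for every released list.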