Hello,
In this example I've got a 4 vCPU Azure VM with 16 GB of RAM, 2 GB of which is
given to 1024 2 MB huge pages:
$ cat /proc/meminfo | grep -i huge
AnonHugePages: 71680 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 1024
HugePages_Free: 1
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 2097152 kB
$
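As a sanity check, the numbers above can be cross-checked with a small script (a sketch, assuming the standard /proc/meminfo field names; the sample text below is the output from this VM):

```python
# Sketch: parse the hugepage counters from /proc/meminfo-style text and
# verify that HugePages_Total * Hugepagesize matches the Hugetlb total.

def parse_hugepages(meminfo_text):
    """Return a dict of hugepage-related fields (values in kB or pages)."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if "huge" in key.lower():
            fields[key.strip()] = int(rest.split()[0])
    return fields

sample = """\
HugePages_Total:    1024
HugePages_Free:        1
Hugepagesize:       2048 kB
Hugetlb:         2097152 kB
"""

hp = parse_hugepages(sample)
total_kb = hp["HugePages_Total"] * hp["Hugepagesize"]
print(total_kb)                   # 2097152, i.e. 1024 pages * 2048 kB = 2 GB
print(total_kb == hp["Hugetlb"])  # True
```

Note HugePages_Free is 1, i.e. VPP has consumed essentially the whole pool.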
There are two interfaces, both VPP-owned and both using the netvsc PMD:
$ sudo vppctl sh hard
              Name                Idx   Link  Hardware
GigabitEthernet1                   1     up   GigabitEthernet1
  Link speed: 50 Gbps
  RX Queues:
    queue thread         mode
    0     vpp_wk_0 (1)   polling
    1     vpp_wk_1 (2)   polling
  Ethernet address 60:45:bd:85:22:97
  Microsoft Hyper-V Netvsc
    carrier up full duplex max-frame-size 0
    flags: tx-offload rx-ip4-cksum
    Devargs:
    rx: queues 2 (max 64), desc 1024 (min 0 max 65535 align 1)
    tx: queues 2 (max 64), desc 1024 (min 1 max 4096 align 1)
    max rx packet len: 65536
    promiscuous: unicast off all-multicast off
    vlan offload: strip off filter off qinq off
    rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum rss-hash
    rx offload active: ipv4-cksum
    tx offload avail:  vlan-insert ipv4-cksum udp-cksum tcp-cksum tcp-tso
                       multi-segs
    tx offload active: ipv4-cksum udp-cksum tcp-cksum multi-segs
    rss avail:         ipv4-tcp ipv4-udp ipv4 ipv6-tcp ipv6
    rss active:        ipv4-tcp ipv4 ipv6-tcp ipv6
    tx burst function: (not available)
    rx burst function: (not available)
GigabitEthernet2                   2     up   GigabitEthernet2
  Link speed: 50 Gbps
  RX Queues:
    queue thread         mode
    0     vpp_wk_2 (3)   polling
    1     vpp_wk_0 (1)   polling
  Ethernet address 60:45:bd:85:23:94
  Microsoft Hyper-V Netvsc
    carrier up full duplex max-frame-size 0
    flags: tx-offload rx-ip4-cksum
    Devargs:
    rx: queues 2 (max 64), desc 1024 (min 0 max 65535 align 1)
    tx: queues 2 (max 64), desc 1024 (min 1 max 4096 align 1)
    max rx packet len: 65536
    promiscuous: unicast off all-multicast off
    vlan offload: strip off filter off qinq off
    rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum rss-hash
    rx offload active: ipv4-cksum
    tx offload avail:  vlan-insert ipv4-cksum udp-cksum tcp-cksum tcp-tso
                       multi-segs
    tx offload active: ipv4-cksum udp-cksum tcp-cksum multi-segs
    rss avail:         ipv4-tcp ipv4-udp ipv4 ipv6-tcp ipv6
    rss active:        ipv4-tcp ipv4 ipv6-tcp ipv6
    tx burst function: (not available)
    rx burst function: (not available)
local0                             0    down  local0
  Link speed: unknown
  local
$
The config file looks like this:
unix {
  nodaemon
  log /var/log/vpp/vpp.log
  full-coredump
  cli-listen /run/vpp/cli.sock
  gid vpp
}
api-trace {
  on
}
api-segment {
  gid vpp
}
socksvr {
  socket-name /run/vpp/api.sock
}
plugins {
  # Common plugins.
  plugin default { disable }
  plugin dpdk_plugin.so { enable }
  plugin linux_cp_plugin.so { enable }
  plugin crypto_native_plugin.so { enable }
  < -- snip lots of plugins -- >
}
dpdk {
  # VMBUS UUID.
  dev 6045bd85-2297-6045-bd85-22976045bd85 {
    num-rx-queues 4
    num-tx-queues 4
    name GigabitEthernet1
  }
  # VMBUS UUID.
  dev 6045bd85-2394-6045-bd85-23946045bd85 {
    num-rx-queues 4
    num-tx-queues 4
    name GigabitEthernet2
  }
}
cpu {
  skip-cores 0
  main-core 0
  corelist-workers 1-3
}
buffers {
  # Max buffers based on data size & huge page configuration.
  buffers-per-numa 853440
  default data-size 2048
  page-size default-hugepage
}
statseg {
  size 128M
}
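As a rough cross-check of the buffers-per-numa value, a back-of-the-envelope calculation (a sketch only: the ~512 B of per-buffer metadata/headroom overhead is an assumption, and the exact per-buffer footprint depends on the VPP build):

```python
# Rough sizing sketch: how many buffers fit in the hugepage pool?
# OVERHEAD is an assumed figure for vlib metadata + headroom per buffer;
# the real footprint varies with the VPP build and configuration.

HUGEPAGE_BYTES = 1024 * 2048 * 1024   # 1024 x 2 MB pages = 2 GiB
DATA_SIZE = 2048                      # "default data-size 2048" above
OVERHEAD = 512                        # assumed per-buffer metadata + headroom

max_buffers = HUGEPAGE_BYTES // (DATA_SIZE + OVERHEAD)
print(max_buffers)  # 838860 under these assumptions
```

That lands in the same ballpark as the 853440 configured above.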
My issue is that I start to see errors from the mlx5 driver when using a large
number of buffers:
2022/06/29 12:44:11:427 notice dpdk common_mlx5: Unable to find
virtually contiguous chunk for address (0x1000000000). rte_memseg_contig_walk()
failed.
2022/06/29 12:44:11:427 notice dpdk common_mlx5: Unable to find
virtually contiguous chunk for address (0x103fe00000). rte_memseg_contig_walk()
failed.
2022/06/29 12:44:11:427 notice dpdk common_mlx5: Unable to find
virtually contiguous chunk for address (0x1040000000). rte_memseg_contig_walk()
failed.
2022/06/29 12:44:11:427 notice dpdk common_mlx5: Unable to find
virtually contiguous chunk for address (0x1040200000). rte_memseg_contig_walk()
failed.
The spew continues.
With a smaller number of buffers I don't see this problem, and there are no
issues on the packet forwarding side of things. I'm not sure what the buffer
limit is before things go bad.
I read the excellent description of how buffer sizes are calculated here:
https://lists.fd.io/g/vpp-dev/topic/buffer_occupancy_calculation/76605334?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,20,76605334
and as a result thought it would be a good idea to allocate buffers based on
buffer size and the available memory in huge pages. However, when the number
of buffers is large "enough", the common_mlx5 errors start to spew. I don't
see this issue on other platforms, where I am able to max out buffers based
on huge page allocation.
I was pointed towards
https://doc.dpdk.org/guides/platform/mlx5.html#mlx5-common-driver-options and
mr_ext_memseg_en, which would suppress this notice. However, I can only pass
DPDK EAL options to the netvsc PMD and not to mlx5, so this does not seem to
be an option.
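For illustration only: if per-device devargs were honoured by the mlx5 side, the stanza might look like the sketch below (hypothetical; as noted above, devargs here only reach the netvsc PMD, so this does not actually help in my setup):

```
dpdk {
  dev 6045bd85-2297-6045-bd85-22976045bd85 {
    name GigabitEthernet1
    # Hypothetical: would need to reach the mlx5 PMD to take effect.
    devargs mr_ext_memseg_en=0
  }
}
```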
Ideally I would like to max out the number of buffers based on available huge
page memory. Alternatively, if on some setups there is a cap on the number of
mappable buffers allowed for this specific device (mlx5), then I could cap at
that number instead of using the maximum based on huge page availability.
Thanks,
Peter.