Re: [vpp-dev] PSA: host stack active opens from first worker
Florin, Most excellent -- very nice work on improving the VPP Hoststack! :D

Thanks,
-daw-

On 12/1/22 18:17, Florin Coras wrote:

Hi folks,

It's been many months and patches at this point, but once [1] is merged, the session layer will accept connects from both main, with worker barrier, and the first worker. Preference is now for the latter, especially if CPS performance is critical.

There should be no need to change existing apps. In particular, VCL applications will transparently leverage this improvement, while builtin applications should still work even if they rely on the main thread for connects.

The reason why this is a PSA is that, in spite of all the testing, there's still a chance some corner cases are not supported, some transports might be buggy now, or just plain old bugs might've slipped in. Should you hit any issues, or have any additional comments, feel free to reach out via this thread or directly.

Benefits of this improvement are:
- no more main thread polling under heavy connect load. Under certain circumstances, main can still be used to perform the connects, but this should be the exception, not the norm.
- higher CPS performance. To be precise, on my Skylake testbed using 40 Gbps NICs:

o) prior to [1] (note that several other changes that might've affected performance had already gone in):
   - 1 worker vpp: pre-warmup 80k, post-warmup 105k
   - 4 worker vpp: pre-warmup 135k, post-warmup 230k
o) after the change:
   - 1 worker vpp: pre-warmup 110k, post-warmup 165k
   - 4 worker vpp: pre-warmup 150k, post-warmup 360k

You can try to reproduce these results using [2].

Regards,
Florin

[1] https://gerrit.fd.io/r/c/vpp/+/35713/
[2] https://wiki.fd.io/view/VPP/HostStack/EchoClientServer#TCP_CPS_measurement
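To illustrate the "no need to change existing apps" point: a plain POSIX client like the minimal sketch below, run unchanged under VCL's LD_PRELOAD shim (libvcl_ldpreload.so), should now have its connect() serviced by the first worker rather than main. This is not code from the patch; the address, port and file name are made up for illustration.

/* plain_connect.c -- an ordinary POSIX TCP client, no VCL-specific code.
 * Run natively it uses the kernel stack; run under VCL's LD_PRELOAD shim
 * it is redirected to the VPP host stack instead. */
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int
main (void)
{
  int fd = socket (AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    {
      perror ("socket");
      return 1;
    }

  struct sockaddr_in sa;
  memset (&sa, 0, sizeof (sa));
  sa.sin_family = AF_INET;
  sa.sin_port = htons (9000);                       /* illustrative port */
  inet_pton (AF_INET, "10.10.10.1", &sa.sin_addr);  /* illustrative server */

  if (connect (fd, (struct sockaddr *) &sa, sizeof (sa)) < 0)
    {
      perror ("connect");
      close (fd);
      return 1;
    }

  const char msg[] = "hello";
  send (fd, msg, sizeof (msg) - 1, 0);
  close (fd);
  return 0;
}

Run it unchanged under VCL with something like LD_PRELOAD=libvcl_ldpreload.so VCL_CONFIG=/etc/vpp/vcl.conf ./plain_connect (paths are illustrative).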
Re: [vpp-dev] Gerrit review for memif DMA acceleration
Damjan,
The external memory VFIO mapping can be moved to the main thread through an RPC call. For host stack usage, pre-allocation is not enough for session segments, as the mapped size may vary and these segments are dynamically allocated and freed as sessions are created and destroyed.

Regards,
Marvin

> -----Original Message-----
> From: Damjan Marion
> Sent: Friday, December 2, 2022 9:22 PM
> To: Liu, Yong
> Cc: vpp-dev@lists.fd.io
> Subject: Re: Gerrit review for memif DMA acceleration
>
> Please pre-allocate segments, don't do it at runtime from a worker thread…
>
> —
> Damjan
>
>> On 01.12.2022., at 10:09, Liu, Yong wrote:
>>
>> Sure, I think it is possible, as only a few batches are needed for a typical workload.
>> For dynamically mapping extended memory, I think this function is still needed, as a new segment will be allocated from system memory when a new stream is established. This happens in a worker thread.
>>
>>> -----Original Message-----
>>> From: Damjan Marion
>>> Sent: Wednesday, November 30, 2022 9:23 PM
>>> To: Liu, Yong
>>> Cc: vpp-dev@lists.fd.io
>>> Subject: Re: Gerrit review for memif DMA acceleration
>>>
>>> Thanks,
>>>
>>> Dynamically allocating physical memory from a worker thread is not something we do today, and I don't think it is the right way to do it. Even for buffer pools we don't do that. Can you simply pre-allocate a reasonable amount of physical memory on startup instead?
>>>
>>> —
>>> Damjan
>>>
>>>> On 30.11.2022., at 10:20, Liu, Yong wrote:
>>>>
>>>> Hi Damjan,
>>>> The VFIO map function can now be called from a worker thread in some cases, e.g. allocating physical memory for a DMA batch and then mapping it when no device is attached, or when a new session segment is attached to the VPP process. So I used event-logger for VFIO logging.
>>>> These numbers are collected from my sample server; we are modifying the CSIT case for DMA usage. More official numbers will follow later.
>>>>
>>>> 1C memif l2patch No-DMA
>>>> frame size (B):  64        128       256       512        1024       1518
>>>> throughput:      8.00Mpps  6.49Mpps  4.69Mpps  3.23Mpps   2.37Mpps   1.96Mpps
>>>>                  4.09Gbps  6.65Gbps  9.62Gbps  13.24Gbps  19.43Gbps  23.86Gbps
>>>>
>>>> 1C memif l2patch DMA
>>>> frame size (B):  64        128       256       512        1024       1518
>>>> throughput:      8.65Mpps  8.60Mpps  8.54Mpps  8.22Mpps   8.36Mpps   7.61Mpps
>>>>                  4.43Gbps  8.81Gbps  8.54Mpps  33.67Gbps  68.51Gbps  92.39Gbps
>>>>
>>>> Regards,
>>>> Marvin
>>>>
>>>>> -----Original Message-----
>>>>> From: Damjan Marion
>>>>> Sent: Tuesday, November 29, 2022 10:45 PM
>>>>> To: Liu, Yong
>>>>> Cc: vpp-dev@lists.fd.io
>>>>> Subject: Re: Gerrit review for memif DMA acceleration
>>>>>
>>>>> Hi Marvin,
>>>>>
>>>>> For a start, can you use standard vlib logging instead of elog, as all that logging stuff is not perf critical.
>>>>>
>>>>> Also, can you share some perf comparison between the standard CPU path and DSA-accelerated memif?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Damjan
>>>>>
>>>>>> On 29.11.2022., at 09:05, Liu, Yong wrote:
>>>>>>
>>>>>> Hi Damjan and community,
>>>>>> To extend the usage of the recently introduced DMA infrastructure, I uploaded several patches for review. Before your review, let me briefly introduce them.
>>>>>> In review 37572, add a new VFIO map function for extended memory. This kind of memory may come from another process, like memif regions, or be dynamically allocated, like hoststack shared segments.
>>>>>> In review 37573, support a VFIO-based DSA device for the scenario that needs full control of the DSA resources. Compared to assigning work queues from a single idxd instance to multiple processes, this way can guarantee resources.
>>>>>> In review 37574, support CBDMA devices, which only support the PCI device model. The usage of CBDMA in hoststack and memif depends on 37572.
>>>>>> In review 37572, add new datapath functions in the memif input and tx nodes. These functions follow the async model and will be chosen if the option "use_dma" is added when creating a memif interface.
>>>>>> Gerrit links:
>>>>>> https://gerrit.fd.io/r/c/vpp/+/37572
>>>>>> https://gerrit.fd.io/r/c/vpp/+/37573
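For reference, the "move the VFIO mapping to the main thread through an RPC call" idea mentioned at the top of this thread could look roughly like the sketch below. This is only an illustration, not code from the patches: dma_map_rpc_args_t, vfio_dma_map_region() and request_dma_map_from_worker() are made-up names, and vl_api_rpc_call_main_thread() is assumed to be the RPC primitive used to hand work from a worker to main.

/* Sketch only: a worker defers the VFIO DMA map of a newly attached
 * segment to the main thread via RPC instead of mapping it in place. */
#include <vlib/vlib.h>
#include <vlibmemory/api.h>

typedef struct
{
  void *addr;	/* virtual address of the new segment */
  uword size;	/* segment size in bytes */
} dma_map_rpc_args_t;

static void
vfio_dma_map_region (void *addr, uword size)
{
  /* Stand-in for the actual VFIO mapping code under review;
   * present only to keep the sketch self-contained. */
  (void) addr;
  (void) size;
}

static void
dma_map_segment_rpc (void *arg)
{
  dma_map_rpc_args_t *a = arg;
  /* This runs on the main thread, so it is safe to touch VFIO state. */
  vfio_dma_map_region (a->addr, a->size);
}

/* Called from a worker when a new segment shows up and needs mapping. */
static void
request_dma_map_from_worker (void *addr, uword size)
{
  dma_map_rpc_args_t args = { .addr = addr, .size = size };
  /* The argument blob is copied, so passing a stack variable is fine. */
  vl_api_rpc_call_main_thread (dma_map_segment_rpc, (u8 *) &args,
			       sizeof (args));
}

The trade-off Damjan raises still applies: even with the mapping serialized on main, pre-allocating segments at startup avoids the RPC and the mapping cost entirely on the hot path.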
Re: [E] COMMERCIAL BULK: Re: [vpp-dev] Issues with failsafe dpdk pmd in Azure
Hi All,
After I ran the two commands below manually, ping traffic recovered, but why does tc mirred stop working after some time?

tc qdisc add dev eth1 handle : ingress
tc filter add dev eth1 parent : u32 match u32 0 0 action mirred egress redirect dev dtap0

Best Regards,
Kevin

From: vpp-dev@lists.fd.io On Behalf Of Kevin Yan via lists.fd.io
Sent: Friday, December 2, 2022 4:39 PM
To: Peter Morrow ; Stephen Hemminger
Cc: vpp-dev@lists.fd.io; Long Li
Subject: COMMERCIAL BULK: Re: [E] COMMERCIAL BULK: Re: [vpp-dev] Issues with failsafe dpdk pmd in Azure

Hi Peter, Stephen and Long,
I am facing some issues when running VPP on an Azure VM; can you please help and give some suggestions if possible?
I'm running CentOS 7.9 with kernel version 3.10.0 on the Azure VM, VPP version is 20.09 and DPDK version is 20.11. Below is the snapshot of the vpp startup.conf related to the netvsc devices:

dpdk {
  socket-mem 0
  no-multi-seg
  vdev net_vdev_netvsc0,iface=eth1
  vdev net_vdev_netvsc1,iface=eth2
  netvsc_dev eth1 {
    vpp_interface_name fpeth1
    num-rx-queues 1
    num-tx-queues 1
    num-rx-desc 1024
    num-tx-desc 1024
  }
  netvsc_dev eth2 {
    vpp_interface_name fpeth2
    num-rx-queues 1
    num-tx-queues 1
    num-rx-desc 1024
    num-tx-desc 1024
  }
}

Please ignore the netvsc_dev sections; I added them myself in order to change the failsafe interface names, otherwise they always use the default names: FailsafeEthernet1, FailsafeEthernet2, etc.
Btw, for my kernel version (3.10.0), VPP/DPDK can only run the failsafe PMD, right?
Basically, we are using two netvsc interfaces and vpp comes up without any issue; show hard / show log output looks good and is listed below:

vpp# sh hard
              Name                Idx   Link  Hardware
fpeth1                             1     up   fpeth1
  Link speed: 50 Gbps
  Ethernet address 00:0d:3a:57:cc:aa
  FailsafeEthernet
    carrier up full duplex mtu 1504
    flags: admin-up pmd rx-ip4-cksum
    Devargs: fd(30),dev(net_tap_vsc0,remote=eth1)
    rx: queues 1 (max 16), desc 1024 (min 0 max 65535 align 1)
    tx: queues 1 (max 16), desc 1024 (min 0 max 65535 align 1)
    max rx packet len: 1522
    promiscuous: unicast off all-multicast on
    vlan offload: strip off filter off qinq off
    rx offload avail:  ipv4-cksum udp-cksum tcp-cksum scatter
    rx offload active: ipv4-cksum
    tx offload avail:  ipv4-cksum udp-cksum tcp-cksum tcp-tso multi-segs
    tx offload active: none
    rss avail:         ipv4-frag ipv4-tcp ipv4-udp ipv4-other ipv4 ipv6-tcp-ex
                       ipv6-udp-ex ipv6-frag ipv6-tcp ipv6-udp ipv6-other ipv6-ex ipv6
    rss active:        none
    tx burst function: failsafe_tx_burst_fast
    rx burst function: failsafe_rx_burst_fast

    tx frames ok                                    1507
    tx bytes ok                                    95370
    rx frames ok                                     322
    rx bytes ok                                    33127
    extended stats:
      rx_good_packets                                322
      tx_good_packets                               1507
      rx_good_bytes                                33127
      tx_good_bytes                                95370
      rx_q0_packets                                  322
      rx_q0_bytes                                  33127
      tx_q0_packets                                 1507
      tx_q0_bytes                                  95370
      tx_sub0_good_packets                          1507
      tx_sub0_good_bytes                           95370
      tx_sub0_q0_packets                            1507
      tx_sub0_q0_bytes                             95370
      tx_sub0_unicast_packets                        322
      tx_sub0_unicast_bytes                        30910
      tx_sub0_multicast_packets                       29
      tx_sub0_multicast_bytes                       3066
      tx_sub0_broadcast_packets                     1209
      tx_sub0_broadcast_bytes                      80718
      rx_sub1_good_packets                           322
      rx_sub1_good_bytes                           33127
      rx_sub1_q0_packets                             322
      rx_sub1_q0_bytes                             33127
fpeth2                             2     up   fpeth2
  Link speed: 50 Gbps
  Ethernet address 00:0d:3a:57:cf:f0
  FailsafeEthernet
    carrier up full duplex mtu 1504
    flags: admin-up pmd rx-ip4-cksum
    Devargs: fd(45),dev(net_tap_vsc1,remote=eth2)
    rx: queues 1 (max 16), desc 1024 (min 0 max 65535 align 1)
    tx: queues 1 (max 16), desc 1024 (min 0