Hi team,

I am using an XL710 (i40e) NIC with DPDK 19.11. I found that the NIC occasionally loses packets when the TSO (not GSO) feature is enabled.

For example, I passed the following mbuf to rte_eth_tx_burst:

m is 0x2a166cf40, pkt_len=32822, ol_flags=d4000000000000, nb_segs=23, port=10007
dump mbuf at 0x2a166cf40, iova=2e166d008, buf_len=2176
  pkt_len=32822, ol_flags=d4000000000000, nb_segs=23, in_port=65535
  segment at 0x2a166cf40, data=0x2a166d088, data_len=1514
  Dump data at [0x2a166d088], len=1514
  00000000: 3C FD FE 9E 99 29 3C FD FE 9E 98 59 08 00 45 00 | <....)<....Y..E.
  00000010: 80 28 03 C8 00 00 FF 06 00 00 42 42 42 0D 42 42 | .(........BBB.BB
  00000020: 42 0C 27 17 79 89 00 07 79 6E 11 B5 93 DC 50 18 | B.'.y...yn....P.
  00000030: FF FF 08 A4 00 00 23 2A 2A 2A 2A 2A 2A 2A 2A 2A | ......#*********
  00000040: 2A 2A 2A 2A 2A 2A 2A 00 33 30 30 32 3A 36 36 2E | *******.3002:66.
  segment at 0x2a166c580, data=0x2a166c6fe, data_len=1460
  Dump data at [0x2a166c6fe], len=1460
  segment at 0x2a166bbc0, data=0x2a166bd3e, data_len=1460
  Dump data at [0x2a166bd3e], len=1460
  segment at 0x2a166b200, data=0x2a166b37e, data_len=1460
  Dump data at [0x2a166b37e], len=1460
  segment at 0x2a166a840, data=0x2a166a9be, data_len=1460
  Dump data at [0x2a166a9be], len=1460
  segment at 0x2a1669e80, data=0x2a1669ffe, data_len=1460
  Dump data at [0x2a1669ffe], len=1460
  segment at 0x2a16694c0, data=0x2a166963e, data_len=1460
  Dump data at [0x2a166963e], len=1460
  segment at 0x2a1668b00, data=0x2a1668c7e, data_len=1460
  Dump data at [0x2a1668c7e], len=1460
  segment at 0x2a1668140, data=0x2a16682be, data_len=1460
  Dump data at [0x2a16682be], len=1460
  segment at 0x2a1667780, data=0x2a16678fe, data_len=1460
  Dump data at [0x2a16678fe], len=1460
  segment at 0x2a1666dc0, data=0x2a1666f3e, data_len=1460
  Dump data at [0x2a1666f3e], len=1460
  segment at 0x2a1666400, data=0x2a166657e, data_len=1460
  Dump data at [0x2a166657e], len=1460
  segment at 0x2a14b9400, data=0x2a14b957e, data_len=1460
  Dump data at [0x2a14b957e], len=1460
  segment at 0x2a14b9dc0, data=0x2a14b9f3e, data_len=1460
  Dump data at [0x2a14b9f3e], len=1460
  segment at 0x2a14ba780, data=0x2a14ba8fe, data_len=1460
  Dump data at [0x2a14ba8fe], len=1460
  segment at 0x2a14bb140, data=0x2a14bb2be, data_len=1460
  Dump data at [0x2a14bb2be], len=1460
  segment at 0x2a14bbb00, data=0x2a14bbc7e, data_len=1460
  Dump data at [0x2a14bbc7e], len=1460
  segment at 0x2a14bc4c0, data=0x2a14bc63e, data_len=1460
  Dump data at [0x2a14bc63e], len=1460
  segment at 0x2a14bce80, data=0x2a14bcffe, data_len=1460
  Dump data at [0x2a14bcffe], len=1460
  segment at 0x2a14bd840, data=0x2a14bd9be, data_len=1460
  Dump data at [0x2a14bd9be], len=1460
  segment at 0x2a14be200, data=0x2a14be37e, data_len=1460
  Dump data at [0x2a14be37e], len=1460
  segment at 0x2a14bebc0, data=0x2a14bed3e, data_len=1460
  Dump data at [0x2a14bed3e], len=1460
  segment at 0x2a14bf580, data=0x2a14bf6fe, data_len=648
  Dump data at [0x2a14bf6fe], len=648

rte_eth_tx_burst returned 1, indicating that the send succeeded.

I used tcpdump to capture the packets at the peer. The captured length is 29200 bytes, but the actual payload length is 32768 bytes, so 3568 bytes were lost.

The number of iterations of the while loop in the driver's transmit function equals the mbuf's nb_segs, so everything seems fine there:
https://github.com/DPDK/dpdk/blob/v19.11/drivers/net/i40e/i40e_rxtx.c#L1180

I am not familiar with the i40e driver, and when I add too many debug messages the issue no longer reproduces. Is there a better way to debug this? Please give me some ideas. Thanks a lot!

TSO config:
  mbufs->ol_flags = d4000000000000
  mbufs->tso_segsz = 1460
  mbufs->l2_len = 14
  mbufs->l3_len = 20
  mbufs->l4_len = 20
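
As a sanity check on the numbers above, here is a minimal sketch in plain C (no DPDK headers) that sums the per-segment data_len values and derives how many frames TSO should produce for a given header length and tso_segsz. The struct tso_summary type and the tso_summarize() function are hypothetical helpers written for this post, not part of DPDK or the i40e driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one TSO request; not a DPDK structure. */
struct tso_summary {
    uint32_t pkt_len;      /* sum of data_len over all mbuf segments */
    uint32_t payload_len;  /* pkt_len minus the L2 + L3 + L4 headers */
    uint32_t full_frames;  /* frames carrying exactly tso_segsz payload bytes */
    uint32_t tail_len;     /* payload bytes in the last short frame, 0 if none */
    uint32_t total_frames; /* full_frames plus one if there is a tail */
};

/*
 * Compute what the wire should look like for one TSO packet.
 * seg_len[] holds the data_len of each mbuf segment, hdr_len is
 * l2_len + l3_len + l4_len, and tso_segsz is the configured MSS.
 */
static struct tso_summary
tso_summarize(const uint16_t *seg_len, size_t nb_segs,
              uint32_t hdr_len, uint32_t tso_segsz)
{
    struct tso_summary s = {0};
    size_t i;

    assert(seg_len != NULL && tso_segsz > 0);

    for (i = 0; i < nb_segs; i++)
        s.pkt_len += seg_len[i];

    /* The headers are part of pkt_len, so they must fit inside it. */
    assert(s.pkt_len >= hdr_len);

    s.payload_len = s.pkt_len - hdr_len;
    s.full_frames = s.payload_len / tso_segsz;
    s.tail_len = s.payload_len % tso_segsz;
    s.total_frames = s.full_frames + (s.tail_len != 0 ? 1 : 0);
    return s;
}
```

Feeding it the chain from the dump (one segment of 1514, then 21 of 1460, then one of 648) with hdr_len = 14 + 20 + 20 and tso_segsz = 1460 gives pkt_len 32822 and payload_len 32768, that is 22 full frames plus a 648-byte tail. The 29200 bytes seen by tcpdump correspond to 20 full frames, so the missing 3568 bytes equal two full frames plus the tail (2 * 1460 + 648). This only confirms that the mbuf chain is self-consistent; it does not show which frames were dropped or why.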
