Hello again,

At the bottom of this email you will find my rte_eth_conf configuration, which includes RSS. For my NIC, the documentation says that RSS hashing must also take the transport layer into account [1]. For a given client/server pair, all packets carrying the same src/dst port pair are therefore received by the same core. So, to ensure that all the fragments of a logical packet are received by the same core, I keep the src/dst ports fixed across fragments.
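As a quick offline sanity check of that assumption, I use a small sketch built on DPDK's software Toeplitz helper (rte_thash.h). The key, the queue count, and the plain-modulo queue mapping below are all made up for illustration; a real NIC uses its own RSS key and a redirection table (RETA):

#include <stdio.h>
#include <stdint.h>
#include <rte_thash.h>

#define NB_RX_QUEUES 8        /* assumed queue count, illustration only */

static uint8_t rss_key[40];   /* dummy 40-byte key, filled in main() */

static uint32_t
queue_for(uint32_t sip, uint32_t dip, uint16_t sport, uint16_t dport)
{
        union rte_thash_tuple t;
        uint32_t hash;

        t.v4.src_addr = sip;
        t.v4.dst_addr = dip;
        t.v4.sport = sport;
        t.v4.dport = dport;
        hash = rte_softrss((uint32_t *)&t, RTE_THASH_V4_L4_LEN, rss_key);
        /* The NIC maps hash -> queue through its RETA; a plain modulo
         * is a stand-in good enough for this check. */
        return hash % NB_RX_QUEUES;
}

int
main(void)
{
        int i;

        for (i = 0; i < 40; i++)   /* arbitrary non-zero key bytes */
                rss_key[i] = (uint8_t)(i * 37 + 1);

        /* Two fragments of one logical packet: same 4-tuple, same queue. */
        printf("frag 1 -> queue %u\n",
               queue_for(0x0a000001, 0x0a000002, 9000, 9001));
        printf("frag 2 -> queue %u\n",
               queue_for(0x0a000001, 0x0a000002, 9000, 9001));
        /* A different source port usually lands on another queue, which
         * is why I keep the ports fixed across fragments. */
        printf("other sport -> queue %u\n",
               queue_for(0x0a000001, 0x0a000002, 9002, 9001));
        return 0;
}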
Indeed, this works just fine with smaller payloads (even multi-frame), and the clients always get their multi-frame replies correctly, because an individual logical reply has all its segments delivered to the same client thread.

Thank you again for your feedback.

Regards,
Harold

=========
[1] http://dpdk.org/doc/guides/nics/mlx4.html

static struct rte_eth_conf port_conf = {
    .rxmode = {
        .mq_mode = ETH_MQ_RX_RSS,
        .split_hdr_size = 0,
        .header_split = 0,   /**< Header Split disabled */
        .hw_ip_checksum = 0, /**< IP checksum offload disabled */
        .hw_vlan_filter = 0, /**< VLAN filtering disabled */
        .jumbo_frame = 0,    /**< Jumbo Frame Support disabled */
        .hw_strip_crc = 0,   /**< CRC stripping by hardware disabled */
        .max_rx_pkt_len = ETHER_MAX_LEN,
        .enable_scatter = 1  /**< Scattered RX (multi-segment) enabled */
    },
    .rx_adv_conf = {
        .rss_conf = {
            .rss_key = NULL,
            .rss_hf = ETH_RSS_IP | ETH_RSS_UDP,
        },
    },
    .txmode = {
        .mq_mode = ETH_MQ_TX_NONE,
    },
};

(Two further sketches appear at the very end of this email: an L3-only variant of this rss_conf, along the lines Shyam suggested, and the rte_malloc conversion of the temp_mbuf arrays mentioned in the quoted thread below.)

2017-07-18 12:07 GMT+02:00 Shyam Shrivastav <shrivastav.sh...@gmail.com>:

> Hi Harold
>
> I meant optimal performance w.r.t. packets per second. If there is no
> loss without app fragmentation at the target pps with, say, 8 RX queues,
> and the same setup shows missing packets with app fragmentation, then the
> issue might be somewhere else. What is your RSS configuration? You should
> not take transport headers into account; ETH_RSS_IPV4 is safe, otherwise
> different app fragments of the same packet can go to different RX queues.
>
> On Tue, Jul 18, 2017 at 3:06 PM, Harold Demure <harold.demur...@gmail.com>
> wrote:
>
>> Hello Shyam,
>> Thank you for your suggestion. I will try what you say. However, this
>> problem arises only with specific workloads. For example, if the clients
>> only send requests of 1 frame, everything runs smoothly even with 16
>> active queues. My problem arises only with bigger payloads and multiple
>> queues. Shouldn't this suggest that the problem is not "simply" that my
>> NIC drops packets with > X active queues?
>>
>> Regards,
>> Harold
>>
>> 2017-07-18 7:50 GMT+02:00 Shyam Shrivastav <shrivastav.sh...@gmail.com>:
>>
>>> As I understand it, the problem disappears with 1 RX queue on the
>>> server. You can reduce the number of queues on the server from 8 and
>>> arrive at an optimal value without packet loss. For the Intel 82599
>>> NIC, packet loss was experienced with more than 4 RX queues; this was
>>> reported on the dpdk dev or users mailing list, which I read in the
>>> archives some time back while looking for similar information on the
>>> 82599.
>>>
>>> On Tue, Jul 18, 2017 at 4:54 AM, Harold Demure <
>>> harold.demur...@gmail.com> wrote:
>>>
>>>> Hello again,
>>>> I tried to convert my statically defined buffers into buffers
>>>> allocated through rte_malloc (as discussed in the previous email, see
>>>> quoted text). Unfortunately, the problem is still there :(
>>>>
>>>> Regards,
>>>> Harold
>>>>
>>>> >
>>>> > 2. How do you know you have the packet loss?
>>>> >
>>>> > *I know it because some fragmented packets never get fully
>>>> > reassembled. If I print the packets seen by the server, I see
>>>> > something like "PCKT_ID 10 FRAG 250, PCKT_ID 10 FRAG 252", and
>>>> > FRAG 251 is never printed.*
>>>> >
>>>> > *Actually, something strange that happens sometimes is that a core
>>>> > receives fragments of two packets interleaved: say, frag 1 of
>>>> > packet X, frag 2 of packet Y, frag 3 of packet X, frag 4 of
>>>> > packet Y.*
>>>> > *Or that, after "losing" a fragment of packet X, I only see printed
>>>> > fragments with an EVEN frag_id for that packet X.
>>>> > At least for a while.*
>>>> >
>>>> > *This also led me to consider a bug in my implementation (I don't
>>>> > experience this problem if I run with a SINGLE client thread).
>>>> > However, with smaller payloads, even fragmented, everything runs
>>>> > smoothly.*
>>>> > *If you have any suggestions for tests to run to spot a possible
>>>> > bug in my implementation, they'd be more than welcome!*
>>>> >
>>>> > *MORE ON THIS: the buffers in which I store the packets taken from
>>>> > RX are statically defined arrays, like struct rte_mbuf *temp_mbuf[SIZE].
>>>> > SIZE can be pretty high (say, 10K entries), and there are 3 of those
>>>> > arrays per core. Can it be that, somehow, they mess up the memory
>>>> > layout (e.g., that they intersect)?*
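P.S. As promised above, here is a minimal sketch of the L3-only RSS variant Shyam suggested. It hashes on IPv4 headers only, so even real IP fragments (which carry no L4 header) would stay on one queue. This is untested on my side, and note that per [1] the mlx4 PMD may not accept a hash without L4 fields, which is exactly why I pin the ports instead:

/* Variant of port_conf above with rss_hf restricted to L3 fields.
 * Requires <rte_ethdev.h>; other rxmode fields left at their defaults. */
static struct rte_eth_conf port_conf_l3only = {
    .rxmode = {
        .mq_mode = ETH_MQ_RX_RSS,
        .max_rx_pkt_len = ETHER_MAX_LEN,
        .enable_scatter = 1
    },
    .rx_adv_conf = {
        .rss_conf = {
            .rss_key = NULL,
            .rss_hf = ETH_RSS_IPV4 | ETH_RSS_FRAG_IPV4, /* no L4 fields */
        },
    },
    .txmode = {
        .mq_mode = ETH_MQ_TX_NONE,
    },
};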
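P.P.S. The rte_malloc conversion mentioned at the top of the quoted thread looked roughly like the sketch below; the names and the ~10K size are placeholders matching the description, not my actual code. rte_zmalloc_socket() gives each core distinct, cache-line-aligned, zeroed allocations on its own NUMA node, so the three arrays cannot overlap the way misdeclared static buffers could:

#include <rte_malloc.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define TEMP_MBUF_SZ    10240   /* ~10K entries, as described above */
#define ARRAYS_PER_CORE 3

/* Per-lcore pointer arrays, replacing the static temp_mbuf[] buffers. */
static struct rte_mbuf **temp_mbuf[RTE_MAX_LCORE][ARRAYS_PER_CORE];

static int
alloc_temp_bufs(unsigned lcore_id)
{
        int i;

        for (i = 0; i < ARRAYS_PER_CORE; i++) {
                /* Distinct heap allocations on the lcore's NUMA socket. */
                temp_mbuf[lcore_id][i] = rte_zmalloc_socket("temp_mbuf",
                                TEMP_MBUF_SZ * sizeof(struct rte_mbuf *),
                                RTE_CACHE_LINE_SIZE,
                                rte_lcore_to_socket_id(lcore_id));
                if (temp_mbuf[lcore_id][i] == NULL)
                        return -1;
        }
        return 0;
}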