Hi all, I have developed a variation of the pipeline example. I am using the ROUTING pipeline together with a new pipeline that I developed specifically for ARP handling.
- The ROUTING pipeline creates one tx and one rx queue on each interface, where it captures everything (promiscuous mode), performs an LPM table (table 0) classification, and writes the next hop into the packet headroom. It then applies the ARP table (table 1), where the destination and source MAC addresses are set on the outgoing packet header prior to sending the packet to the corresponding tx queue.
- The ARP pipeline also creates a separate tx and rx queue on the same interfaces, with the particularity that a filter is set on its rx queues (only ARP frames are captured on that queue). It processes the ARP requests and provides the IP-to-MAC address translations to the ROUTING pipeline via the MASTER pipeline (sending IP-MAC translation events). It is as simple as that.
- There is also a MASTER pipeline (executed on logical core 0) which sends messages to and receives events (a new feature we developed) from each pipeline.

The thing is that, since the pipeline example allows it, I can assign the core on which each pipeline executes. When the ROUTING pipeline runs on core 1 and the ARP pipeline on core 2, everything works perfectly: maximum throughput is reached in my environment (1 Gbps for a single TCP flow), with no packet loss at all. However, when I assign both the ROUTING and ARP pipelines to the same core (logical core 1), the behaviour degrades considerably. Packet loss and reordering are detected (and confirmed by Wireshark captures), so the same TCP flow cannot sustain a constant rate due to retransmissions and loss detection.

Both the ROUTING and ARP pipelines use the same mempool when capturing packets or ARP frames. One thing to take into account is that the Ethernet interface statistics reveal no missed packets in either case. So the thread executing both pipelines on the same logical core is not so slow at packet processing that it provokes rx drops that could explain this TCP degradation.
My question is: if the code is exactly the same, why is the behaviour so different when I execute both pipelines on the same core? It seems to me that the issue comes from the fact that I am receiving and transmitting packets from the same thread to more than one queue on each interface. All the examples I have seen on the DPDK web page map each thread directly to one hw queue per interface, and that is the point where my setup differs from the case I have described. What do you think? Is it not possible to receive packets from several hw queues of the same interface in the same thread? Could that be the cause of this packet loss/reordering, due to slow memory access?

Thanks for your attention,
-- Victor