An Rx descriptor is 16 or 32 bytes in size. When the DD bit is set, the
rest of the descriptor words hold valid values. Hence, the word
containing the DD bit must be read before the rest of the descriptor
words.

Since the entire descriptor is not read atomically, on systems with
relaxed memory ordering such as AArch64, the read of the word containing
the DD field could be reordered after the reads of the other words.

Insert a read barrier between the read of the word with the DD field
and the reads of the other words. The barrier ensures that the other
words are loaded only after the DD bit has been observed as set, so the
fetched data is valid.
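
For illustration only, the ordering requirement can be sketched outside
the driver with C11 atomics. This is a minimal sketch, not the driver
code: the descriptor layout, the DD bit position, and the names
demo_rx_desc, DEMO_DD_BIT and demo_read_desc are hypothetical, and the
standard atomic_thread_fence() stands in for rte_atomic_thread_fence().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 16-byte Rx descriptor; the real i40e layout differs. */
struct demo_rx_desc {
	uint64_t qword0;         /* data written by the producer */
	_Atomic uint64_t qword1; /* status word, DD assumed to be bit 0 */
};

#define DEMO_DD_BIT UINT64_C(1)

/*
 * Copy the descriptor words out only if DD is set.
 * Returns false when the descriptor is not ready yet.
 */
static bool
demo_read_desc(struct demo_rx_desc *desc, uint64_t *q0, uint64_t *q1)
{
	/* Read the word that carries DD first. */
	uint64_t status = atomic_load_explicit(&desc->qword1,
					       memory_order_relaxed);

	if (!(status & DEMO_DD_BIT))
		return false;

	/*
	 * Acquire fence: the loads below cannot be reordered before the
	 * load of qword1 above.
	 */
	atomic_thread_fence(memory_order_acquire);

	*q0 = desc->qword0;
	*q1 = status;
	return true;
}
```

Without the fence, a weakly ordered CPU may load qword0 before qword1
and return a stale qword0 together with a status word that already has
DD set.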

A testpmd single-core test showed no performance drop on x86 or N1SDP.
On ThunderX2, a 22% performance regression was observed.

Fixes: 7b0cf70135d1 ("net/i40e: support ARM platform")
Cc: sta...@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.w...@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
---
 drivers/net/i40e/i40e_rxtx.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 8329cbdd4e..c4cd6b6b60 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -746,6 +746,12 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
                        break;
                }
 
+               /**
+                * Use acquire fence to ensure that qword1 which includes DD
+                * bit is loaded before loading of other descriptor words.
+                */
+               rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
                rxd = *rxdp;
                nb_hold++;
                rxe = &sw_ring[rx_id];
@@ -862,6 +868,12 @@ i40e_recv_scattered_pkts(void *rx_queue,
                        break;
                }
 
+               /**
+                * Use acquire fence to ensure that qword1 which includes DD
+                * bit is loaded before loading of other descriptor words.
+                */
+               rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
                rxd = *rxdp;
                nb_hold++;
                rxe = &sw_ring[rx_id];
-- 
2.25.1
