[dpdk-dev] [PATCH v2 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms

2015-01-30 Thread Ananyev, Konstantin
Hey Zhihong,

> -----Original Message-----
> From: Wang, Zhihong
> Sent: Friday, January 30, 2015 5:57 AM
> To: Ananyev, Konstantin; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v2 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms
>
> Hey

2015-01-30 Thread Wang, Zhihong
Hey Konstantin,

This method does reduce code size, but it leads to a significant performance drop. I think we need to keep the original code.

Thanks
Zhihong (John)

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, January 29, 2015 11:18 PM
> To: Wang, Zhihong; dev at

2015-01-29 Thread Ananyev, Konstantin
Hi Zhihong,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zhihong Wang
> Sent: Thursday, January 29, 2015 2:39 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v2 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX

2015-01-29 Thread Zhihong Wang
Main code changes:

1. Differentiate architectural features based on CPU flags
   a. Implement separate move functions for SSE/AVX/AVX2 to make full use of cache bandwidth
   b. Implement separate copy flows specifically optimized for the target architecture
2. Rewrite the memcpy
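
The CPU-flag differentiation described in item 1 could be sketched roughly as follows. This is only an illustration and not the code from the patch: the names sketch_mov32, sketch_memcpy and SKETCH_MOVE_IMPL are hypothetical, and the compiler-predefined macros __AVX__ and __SSE2__ stand in for whatever CPU-flag defines the build system actually provides. The point is that one fixed-size move primitive gets a different body per instruction set, while the copy flow built on top of it stays the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-byte move primitive, selected at compile time. */
#if defined(__AVX__)
#include <immintrin.h>
#define SKETCH_MOVE_IMPL "avx"
/* AVX: one unaligned 256-bit load/store pair moves 32 bytes. */
static inline void sketch_mov32(uint8_t *dst, const uint8_t *src)
{
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    _mm256_storeu_si256((__m256i *)dst, v);
}
#elif defined(__SSE2__)
#include <emmintrin.h>
#define SKETCH_MOVE_IMPL "sse2"
/* SSE2: two unaligned 128-bit load/store pairs move 32 bytes. */
static inline void sketch_mov32(uint8_t *dst, const uint8_t *src)
{
    __m128i lo = _mm_loadu_si128((const __m128i *)src);
    __m128i hi = _mm_loadu_si128((const __m128i *)(src + 16));
    _mm_storeu_si128((__m128i *)dst, lo);
    _mm_storeu_si128((__m128i *)(dst + 16), hi);
}
#else
#define SKETCH_MOVE_IMPL "generic"
/* No SIMD flags available: fall back to the C library. */
static inline void sketch_mov32(uint8_t *dst, const uint8_t *src)
{
    memcpy(dst, src, 32);
}
#endif

/* Hypothetical copy flow: 32-byte blocks first, then the remaining tail. */
static void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

    /* Debug-only precondition check; compiled out when NDEBUG is defined. */
    assert(dst != NULL && src != NULL);

    while (n >= 32) {
        sketch_mov32(d, s);
        d += 32;
        s += 32;
        n -= 32;
    }
    if (n > 0)
        memcpy(d, s, n); /* tail of fewer than 32 bytes */
    return dst;
}
```

Building the same file with, for example, -mavx or without it in GCC or Clang selects a different sketch_mov32 body, which is the kind of per-architecture split the change list describes. A real implementation would also tune the copy flow itself per architecture (item 1b), which this sketch does not attempt.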