On Tue, Apr 14, 2015 at 7:53 PM, Stephen Hemminger <stephen at networkplumber.org> wrote:
> On Tue, 14 Apr 2015 14:31:53 -0700
> Ravi Kerur wrote:
>
> > +
> > +	for (i = 0; i < 2; i++)
> > +		rte_mov32(dst + i * 32, src + i * 32);
> > }
> Unless you force the compiler to unroll the loop, it will be slower.

On Tue, Apr 14, 2015 at 11:32 PM, Pawel Wodkowski <pawelx.wodkowski at intel.com> wrote:
> On 2015-04-14 23:31, Ravi Kerur wrote:
>
>> +
>> +	for (i = 0; i < 8; i++) {
>> +		ymm = _mm256_loadu_si256((const __m256i *)(src + i * 32));
>> +		_mm256_storeu_si256((__m256i *)(dst + i * 32), ymm);
>> +	}

On 2015-04-14 23:31, Ravi Kerur wrote:
> +
> +	for (i = 0; i < 8; i++) {
> +		ymm = _mm256_loadu_si256((const __m256i *)(src + i * 32));
> +		_mm256_storeu_si256((__m256i *)(dst + i * 32), ymm);
> +	}
> +
> n -= 256;
> -
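
For readers skimming the thread: the quoted hunk copies 256 bytes as eight unaligned 32-byte AVX2 load/store pairs driven by a counted loop. A self-contained sketch of that pattern is below; the name copy256_avx2_sketch and the bare uint8_t pointer parameters are illustrative stand-ins, not necessarily DPDK's actual rte_mov256() signature.

#include <immintrin.h>
#include <stdint.h>

/* Copy 256 bytes as eight unaligned 32-byte AVX2 load/store pairs,
 * mirroring the quoted loop. Build with AVX2 enabled (e.g. gcc -mavx2). */
static inline void
copy256_avx2_sketch(uint8_t *dst, const uint8_t *src)
{
	__m256i ymm;
	int i;

	for (i = 0; i < 8; i++) {
		ymm = _mm256_loadu_si256((const __m256i *)(src + i * 32));
		_mm256_storeu_si256((__m256i *)(dst + i * 32), ymm);
	}
}

Whether the compiler keeps this as a loop or flattens it into eight straight-line load/store pairs is exactly the point raised in the reply below.
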
On Tue, 14 Apr 2015 14:31:53 -0700
Ravi Kerur wrote:
> +
> +	for (i = 0; i < 2; i++)
> +		rte_mov32(dst + i * 32, src + i * 32);
> }

Unless you force the compiler to unroll the loop, it will be slower.
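
The objection is that the rolled-up loop only matches the original straight-line code if the compiler actually unrolls it; otherwise every 32-byte chunk pays for the induction variable and the branch. A rough sketch of the two shapes being compared is below; the function names, the memcpy-based stand-in for rte_mov32(), and the unroll pragma (honored by GCC 8+ and recent Clang, not necessarily the 2015 toolchains) are assumptions for illustration, not part of the patch under review.

#include <stdint.h>
#include <string.h>

/* Stand-in for DPDK's rte_mov32(): copy exactly 32 bytes. */
static inline void
mov32_sketch(uint8_t *dst, const uint8_t *src)
{
	memcpy(dst, src, 32);
}

/* Original shape: two explicit calls, no loop bookkeeping at all. */
static inline void
mov64_straightline(uint8_t *dst, const uint8_t *src)
{
	mov32_sketch(dst, src);
	mov32_sketch(dst + 32, src + 32);
}

/* Patched shape: equivalent only if the loop gets unrolled. The pragma
 * forces that on newer compilers; without it, unrolling depends on the
 * optimization level and the compiler's heuristics. */
static inline void
mov64_rolled(uint8_t *dst, const uint8_t *src)
{
	int i;

#pragma GCC unroll 2
	for (i = 0; i < 2; i++)
		mov32_sketch(dst + i * 32, src + i * 32);
}
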
Remove unnecessary type casting in functions.
Use a loop to adjust the offset during copy instead of separate invocations.
Signed-off-by: Ravi Kerur
---
.../common/include/arch/x86/rte_memcpy.h | 317 ++---
1 file changed, 151 insertions(+), 166 deletions(-)
diff --git a/lib
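
Since the preview cuts off before the diff body, here is a hedged illustration of the two cleanups the log message describes, not the actual hunks: dropping casts on void pointers (which convert implicitly in C) in favor of typed locals, and folding repeated fixed-offset calls into a single loop. The function and variable names below are invented for the example.

#include <stdint.h>

void rte_mov32(uint8_t *dst, const uint8_t *src); /* defined elsewhere */

/* Hypothetical "before": a cast per argument and one call per block. */
static inline void
copy128_before(void *dst, const void *src)
{
	rte_mov32((uint8_t *)dst, (const uint8_t *)src);
	rte_mov32((uint8_t *)dst + 32, (const uint8_t *)src + 32);
	rte_mov32((uint8_t *)dst + 64, (const uint8_t *)src + 64);
	rte_mov32((uint8_t *)dst + 96, (const uint8_t *)src + 96);
}

/* Hypothetical "after": typed locals instead of repeated casts, and a
 * loop that advances the offset, as the commit message describes. */
static inline void
copy128_after(void *dst, const void *src)
{
	uint8_t *d = dst;
	const uint8_t *s = src;
	int i;

	for (i = 0; i < 4; i++)
		rte_mov32(d + i * 32, s + i * 32);
}
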