Re: Optimizing for SIMD: best practices?(i.e. what features are allowed?)

2021-03-07 Thread tsbockman via Digitalmars-d-learn
On Sunday, 7 March 2021 at 22:54:32 UTC, tsbockman wrote: ... result = diffSq[0]; static foreach(i; 0 .. 3) result += diffSq[i]; ... Oops, that's supposed to say `i; 1 .. 3`. Fixed: import std.meta : Repeat; void euclideanDistanceFixedSizeArray(V)(ref Repeat!(3, const(V)) a, r

Re: Optimizing for SIMD: best practices?(i.e. what features are allowed?)

2021-03-07 Thread tsbockman via Digitalmars-d-learn
On Sunday, 7 March 2021 at 22:54:32 UTC, tsbockman wrote: import std.meta : Repeat; void euclideanDistanceFixedSizeArray(V)(ref Repeat!(3, const(V)) a, ref Repeat!(3, const(V)) b, out V result) if(is(V : __vector(float[length]), size_t length)) ... Resulting asm with is(V == __vector(float

Re: Optimizing for SIMD: best practices?(i.e. what features are allowed?)

2021-03-07 Thread tsbockman via Digitalmars-d-learn
On Sunday, 7 March 2021 at 18:00:57 UTC, z wrote: On Friday, 26 February 2021 at 03:57:12 UTC, tsbockman wrote:
static foreach(size_t i; 0 .. 3/+typeof(a).length+/){ distance += a[i].abs; //abs required by the caller
(a * a) above is always non-negative for real numbers. You don't need the

Re: Optimizing for SIMD: best practices?(i.e. what features are allowed?)

2021-03-07 Thread tsbockman via Digitalmars-d-learn
On Sunday, 7 March 2021 at 13:26:37 UTC, z wrote: On Thursday, 25 February 2021 at 11:28:14 UTC, z wrote: However, AVX512 support seems limited to being able to use the 16 other YMM registers, rather than using the same code template but changed to use ZMM registers and double the offsets to t

Re: Optimizing for SIMD: best practices?(i.e. what features are allowed?)

2021-03-07 Thread Bruce Carneal via Digitalmars-d-learn
On Sunday, 7 March 2021 at 14:15:58 UTC, z wrote: On Thursday, 25 February 2021 at 14:28:40 UTC, Guillaume Piolat wrote: On Thursday, 25 February 2021 at 11:28:14 UTC, z wrote: How does one optimize code to make full use of the CPU's SIMD capabilities? Is there any way to guarantee that "packed

Re: Optimizing for SIMD: best practices?(i.e. what features are allowed?)

2021-03-07 Thread z via Digitalmars-d-learn
On Friday, 26 February 2021 at 03:57:12 UTC, tsbockman wrote:
static foreach(size_t i; 0 .. 3/+typeof(a).length+/){ distance += a[i].abs; //abs required by the caller
(a * a) above is always non-negative for real numbers. You don't need the call to abs unless you're trying to guarantee that

Re: Optimizing for SIMD: best practices?(i.e. what features are allowed?)

2021-03-07 Thread z via Digitalmars-d-learn
On Thursday, 25 February 2021 at 14:28:40 UTC, Guillaume Piolat wrote: On Thursday, 25 February 2021 at 11:28:14 UTC, z wrote: How does one optimize code to make full use of the CPU's SIMD capabilities? Is there any way to guarantee that "packed" versions of SIMD instructions will be used? (e.g.

Re: Optimizing for SIMD: best practices?(i.e. what features are allowed?)

2021-03-07 Thread z via Digitalmars-d-learn
On Thursday, 25 February 2021 at 11:28:14 UTC, z wrote: ... It seems that using static foreach with pointer parameters exclusively is the best way to "guide" LDC into optimizing code. (Using arr1[] += arr2[] syntax resulted in worse performance for me.) However, AVX512 support seems limited to

Re: Optimizing for SIMD: best practices?(i.e. what features are allowed?)

2021-02-26 Thread Guillaume Piolat via Digitalmars-d-learn
On Thursday, 25 February 2021 at 14:28:40 UTC, Guillaume Piolat wrote: On Thursday, 25 February 2021 at 11:28:14 UTC, z wrote: How does one optimize code to make full use of the CPU's SIMD capabilities? Is there any way to guarantee that "packed" versions of SIMD instructions will be used? (e.g.

Re: Optimizing for SIMD: best practices?(i.e. what features are allowed?)

2021-02-25 Thread Bruce Carneal via Digitalmars-d-learn
On Thursday, 25 February 2021 at 11:28:14 UTC, z wrote: How does one optimize code to make full use of the CPU's SIMD capabilities? Is there any way to guarantee that "packed" versions of SIMD instructions will be used? (e.g. vmulps, vsqrtps, etc...) To give some context, this is a sample of one

Re: Optimizing for SIMD: best practices?(i.e. what features are allowed?)

2021-02-25 Thread tsbockman via Digitalmars-d-learn
On Thursday, 25 February 2021 at 11:28:14 UTC, z wrote:
float euclideanDistanceFixedSizeArray(float[3] a, float[3] b) {
Use __vector(float[4]), not float[3].
float distance;
The default value for float is float.nan. You need to explicitly initialize it to 0.0f or something if you want th

Re: Optimizing for SIMD: best practices?(i.e. what features are allowed?)

2021-02-25 Thread tsbockman via Digitalmars-d-learn
On Thursday, 25 February 2021 at 11:28:14 UTC, z wrote: Is there any way to guarantee that "packed" versions of SIMD instructions will be used? (e.g. vmulps, vsqrtps, etc...) To give some context, this is a sample of one of the functions that could benefit from better SIMD usage: float euclidea

Re: Optimizing for SIMD: best practices?(i.e. what features are allowed?)

2021-02-25 Thread Guillaume Piolat via Digitalmars-d-learn
On Thursday, 25 February 2021 at 11:28:14 UTC, z wrote: How does one optimize code to make full use of the CPU's SIMD capabilities? Is there any way to guarantee that "packed" versions of SIMD instructions will be used? (e.g. vmulps, vsqrtps, etc...) https://code.dlang.org/packages/intel-intrin

Optimizing for SIMD: best practices?(i.e. what features are allowed?)

2021-02-25 Thread z via Digitalmars-d-learn
How does one optimize code to make full use of the CPU's SIMD capabilities? Is there any way to guarantee that "packed" versions of SIMD instructions will be used? (e.g. vmulps, vsqrtps, etc...) To give some context, this is a sample of one of the functions that could benefit from better SIMD usa