On 25 August 2013 01:01, Ilya Yaroshenko wrote:
> On Sunday, 18 August 2013 at 05:26:00 UTC, Manu wrote:
>
>> movups is not good. It'll be a lot faster (and portable) if you use
>> movaps.
>>
>> Process looks something like:
>> * do the first few from a[0] until a's alignment interval as scalar
On Sunday, 18 August 2013 at 05:26:00 UTC, Manu wrote:
movups is not good. It'll be a lot faster (and portable) if you use movaps.
Process looks something like:
* do the first few from a[0] until a's alignment interval as scalar
* load the left of b's aligned pair
* loop for each aligne
On 8/17/13 11:50 AM, Ilya Yaroshenko wrote:
http://spiceandmath.blogspot.ru/2013/08/simd-implementation-of-dot-product_17.html
http://www.reddit.com/r/programming/comments/1ktue0/benchmarking_a_simd_implementation_of_dot_product/
Andrei
On 8/18/13 10:24 AM, Ilya Yaroshenko wrote:
On Sunday, 18 August 2013 at 16:32:33 UTC, Andrei Alexandrescu wrote:
On 8/17/13 11:50 AM, Ilya Yaroshenko wrote:
http://spiceandmath.blogspot.ru/2013/08/simd-implementation-of-dot-product_17.html
Ilya
The images never load for me, all I see is s
On 17 August 2013 19:50, Ilya Yaroshenko wrote:
> http://spiceandmath.blogspot.ru/2013/08/simd-implementation-of-dot-product_17.html
>
> Ilya
>
>
Having a quick flick through the simd.d source, I see LDC's and GDC's
implementation couldn't be any more wildly different... (LDC's doesn't
even look
On Sunday, 18 August 2013 at 16:32:33 UTC, Andrei Alexandrescu
wrote:
On 8/17/13 11:50 AM, Ilya Yaroshenko wrote:
http://spiceandmath.blogspot.ru/2013/08/simd-implementation-of-dot-product_17.html
Ilya
The images never load for me, all I see is some "Request timed
out" stripes after the tex
On 8/17/13 11:50 AM, Ilya Yaroshenko wrote:
http://spiceandmath.blogspot.ru/2013/08/simd-implementation-of-dot-product_17.html
Ilya
The images never load for me, all I see is some "Request timed out"
stripes after the text.
Typo: Ununtu
Andrei
movups is not good. It'll be a lot faster (and portable) if you use movaps.
Process looks something like:
* do the first few from a[0] until a's alignment interval as scalar
* load the left of b's aligned pair
* loop for each aligned vector in a
- load a[n..n+4] aligned
- load the ri
On Sunday, 18 August 2013 at 05:07:12 UTC, Manu wrote:
On 18 August 2013 14:39, Ilya Yaroshenko
wrote:
On Sunday, 18 August 2013 at 01:53:53 UTC, Manu wrote:
It doesn't look like you account for alignment.
This is basically not-portable (I doubt unaligned loads in this context are
faster
On 18 August 2013 14:39, Ilya Yaroshenko wrote:
> On Sunday, 18 August 2013 at 01:53:53 UTC, Manu wrote:
>
>> It doesn't look like you account for alignment.
>> This is basically not-portable (I doubt unaligned loads in this context are
>> faster than performing scalar operations), and possibl
On Saturday, 17 August 2013 at 19:38:52 UTC, John Colvin wrote:
On Saturday, 17 August 2013 at 19:24:52 UTC, Ilya Yaroshenko
wrote:
BTW: -march=native automatically implies -mtune=native
Thanks, I'll remove mtune)
It would be really interesting if you could try writing the
same code in C, b
On Sunday, 18 August 2013 at 01:53:53 UTC, Manu wrote:
It doesn't look like you account for alignment.
This is basically not-portable (I doubt unaligned loads in this context are
faster than performing scalar operations), and possibly inefficient on x86
too.
dotProduct uses unaligned loads (
It doesn't look like you account for alignment.
This is basically not-portable (I doubt unaligned loads in this context are
faster than performing scalar operations), and possibly inefficient on x86
too.
To make it account for potentially random alignment will be awkward, but it
might be possible t
Ilya Yaroshenko:
http://spiceandmath.blogspot.ru/2013/08/simd-implementation-of-dot-product_17.html
From the blog post:
Compile fast_math code from other program separately and then
link it. This is easy solution. However this is a step back to
C.<
To introduce a @fast_math attribute. This i
On Saturday, 17 August 2013 at 19:24:52 UTC, Ilya Yaroshenko
wrote:
BTW: -march=native automatically implies -mtune=native
Thanks, I'll remove mtune)
It would be really interesting if you could try writing the same
code in C, both a scalar version and a version using gcc's vector
intrinsic
BTW: -march=native automatically implies -mtune=native
Thanks, I'll remove mtune)
On Saturday, 17 August 2013 at 18:50:15 UTC, Ilya Yaroshenko
wrote:
http://spiceandmath.blogspot.ru/2013/08/simd-implementation-of-dot-product_17.html
Ilya
Nice, that's a good speedup.
BTW: -march=native automatically implies -mtune=native
http://spiceandmath.blogspot.ru/2013/08/simd-implementation-of-dot-product_17.html
Ilya