Does that work for you? I have to write
A .= (*).(A,B)
On Wednesday, November 2, 2016 at 3:51:54 AM UTC+1, Chris Rackauckas wrote:
Hmm, that's surprising. Looks like we're using generic broadcasting
machinery for that operation (check out what @which P.*P returns). Might be
good to add .* to this line:
https://github.com/JuliaLang/julia/blob/b7f1aa7554c71d3759702b9c2e14904ebdc94199/base/arraymath.jl#L69.
Want to make a pull request?
OK, good to know. I think putting the function in a package is overkill.
> On 2 Nov. 2016, at 6:35 pm, Chris Rackauckas wrote:
Yes, this most likely won't help for GPU arrays because you likely don't
want to be looping through elements serially: you want to call a vectorized
GPU function which will do the computation in parallel on the GPU.
ArrayFire's mathematical operations are already overloaded to do this, but
I do
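For concreteness, a rough sketch of the ArrayFire route mentioned above (this assumes the ArrayFire.jl package with a working backend; the constructor and overloads are taken from its basic usage and may differ between versions):

using ArrayFire                     # assumed installed, with an ArrayFire backend available

A = AFArray(rand(Float32, 10^6))    # copy the data to the device
B = AFArray(rand(Float32, 10^6))

C = A .* B                          # element-wise multiply dispatches to one vectorized
                                    # ArrayFire kernel, not a serial Julia loop over elements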
Ah thanks!
Though I guess if I want the same code to work also on a GPU array then this
won't help?
Sent from my iPhone
> On 2 Nov. 2016, at 13:51, Chris Rackauckas wrote:
It's the other way around. .* won't fuse because it's still an operator. .=
will. If you want .* to fuse, you can instead do:
A .= *.(A,B)
since this invokes the broadcast on *, instead of invoking .*. But that's
just a temporary thing.
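A minimal sketch of the distinction being made here (illustrative variable names, not from the thread):

A = rand(1000); B = rand(1000)

A .= (*).(A, B)   # broadcasts the function *, and the fused .= writes into A in place
A .= A .* B       # equivalent once .* itself participates in fusion (Julia 0.6 and later)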
On Tuesday, November 1, 2016 at 7:27:40 PM UTC-7, Tom Bre
As I understand it, the .* will fuse, but the .= will not (until 0.6?), so
A will be rebound to a newly allocated array. If my understanding is wrong
I'd love to know. There have been many times in the last few days that I
would have used it...
On Tue, Nov 1, 2016 at 10:06 PM, Sheehan Olver wrote:
Ah, good point. Though I guess that won't work til 0.6 since .* won't
auto-fuse yet?
Sent from my iPhone
> On 2 Nov. 2016, at 12:55, Chris Rackauckas wrote:
This is pretty much obsolete by the . fusing changes:
A .= A.*B
should be an in-place update of A scaled by B (Tomas' solution).
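One quick way to check the in-place claim, on a Julia version where .* participates in fusion (update! is just an illustrative name):

A = rand(10^6); B = rand(10^6)
update!(A, B) = (A .= A .* B; nothing)
update!(A, B)             # warm up, so compilation isn't counted
@allocated update!(A, B)  # expect 0 bytes: no temporary array is created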
On Tuesday, November 1, 2016 at 4:39:15 PM UTC-7, Sheehan Olver wrote:
Should this be added to a package? I imagine if the arrays are on the GPU
(AFArrays) then the operation could be much faster, and having a consistent
name would be helpful.
On Wednesday, October 7, 2015 at 1:28:29 AM UTC+11, Lionel du Peloux wrote:
>
> Dear all,
>
> I'm looking for the fastest
Thanks for the confirmation! Yes, I need more tests to see what the best
practice is for my particular problem.
On Monday, June 20, 2016 at 3:05:31 PM UTC+1, Chris Rackauckas wrote:
Most likely. I would also time it with and without @simd at your problem
size. For some reason I've had some simple loops do better without @simd.
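A sketch of that timing comparison (BenchmarkTools is my addition here; plain @time in a loop works as well):

using BenchmarkTools

function mul_simd!(A, B)
    @fastmath @inbounds @simd for i in eachindex(A, B)
        A[i] *= B[i]
    end
    return A
end

function mul_nosimd!(A, B)
    @fastmath @inbounds for i in eachindex(A, B)
        A[i] *= B[i]
    end
    return A
end

A = rand(10_000); B = rand(10_000)
@btime mul_simd!($A, $B);
@btime mul_nosimd!($A, $B);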
On Monday, June 20, 2016 at 2:50:22 PM UTC+1, chobb...@gmail.com wrote:
Thanks! I'm still using v0.4.5. In this case, is the code I highlighted
above still the best choice for doing the job?
On Monday, June 20, 2016 at 1:57:25 PM UTC+1, Chris Rackauckas wrote:
I think that for medium size (but not large) arrays in v0.5 you may want to
use @threads from the threading branch, and then for really large arrays
you may want to use @parallel. But you'd have to test some timings.
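For reference, a sketch of the threaded variant in current syntax (the thread refers to the experimental v0.5 threading branch; Julia must be started with multiple threads, e.g. JULIA_NUM_THREADS=4):

using Base.Threads

function mul_threads!(A, B)
    @threads for i in eachindex(A, B)
        @inbounds A[i] *= B[i]
    end
    return A
end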
On Monday, June 20, 2016 at 11:38:15 AM UTC+1, chobb...@gmail.com wrote:
I have the same question regarding how to calculate the entry-wise vector
product and found this thread. As a novice, I wonder if the following code
snippet is still the standard for entry-wise vector multiplication that one
should stick to in practice? Thanks!
@fastmath @inbounds @simd for i=1:n
    A[i] *= B[i]
end
Thank you for all of your suggestions.
The @simd macro effectively gives a (very) slight performance improvement
(5%).
Note that the BLAS dot product probably uses all sorts of tricks to squeeze
the last cycle of SIMD performance out of the CPU. For example, here is the
OpenBLAS ddot function for SandyBridge, which is hand-coded in assembly:
https://github.com/xianyi/OpenBLAS/blob/develop/kernel/x86_64/ddot_microk_sand
On Tuesday, October 6, 2015 at 2:23:33 PM UTC-4, Patrick Kofod Mogensen
wrote:
>
> That was supposed to be "A * B only allocates..." right?
>
Yes.
That was supposed to be "A * B only allocates..." right?
On Tuesday, October 6, 2015 at 1:52:18 PM UTC-4, Steven G. Johnson wrote:
On Tuesday, October 6, 2015 at 12:29:04 PM UTC-4, Christoph Ortner wrote:
>
> a *= b is equivalent to a = a * b, which allocates a temporary variable I
> think?
>
A * A only allocates memory on the heap if A is an array or some other
heap-allocated datatype. For A[i] *= B[i] where A[i]
a *= b is equivalent to a = a * b, which allocates a temporary variable I
think?
Try
@fastmath @inbounds @simd for i=1:n
A[i] *= B[i]
end
or, possibly A[i] = A[i] * B[i]
(I'm not sure whether @simd automatically translates *= to what it needs)
On Tuesday, 6 October 2015 17:29:04 UTC+1, Christoph Ortner wrote:
Well, I guess your table pretty much shows it, right? It seems as if it
allocates a lot of temporary memory to carry out the calculations.
On Tuesday, October 6, 2015 at 10:28:29 AM UTC-4, Lionel du Peloux wrote:
>
> Dear all,
>
> I'm looking for the fastest way to do element-wise vector multiplication
I made some simple changes to your `xpy!`, and managed to get it to
allocate nothing at all, while performing very close to the speed of `dot`.
I don't know anything about e.g. `@simd` instructions, but I imagine they
could help speed this up even further.
The most significant change was swi