On Thursday, 21 May 2020 at 07:38:45 UTC, data pulverizer wrote:
Started uploading the code and writing the article for this.
The code for each language can be run, see the script.x files
in each folder for details and timings.
https://github.com/dataPulverizer/KernelMatrixBenchmark
Thanks
On Wednesday, 6 May 2020 at 17:31:39 UTC, Jacob Carlborg wrote:
On 2020-05-06 12:23, data pulverizer wrote:
Yes, I'll do a blog or something on GitHub and link it.
It would be nice if you could get it published on the Dlang
blog [1]. One usually gets paid for that. Contact Mike Parker.
[1] https://blog.dlang.org
On Wednesday, 13 May 2020 at 15:13:50 UTC, wjoe wrote:
On Friday, 8 May 2020 at 13:43:40 UTC, data pulverizer wrote:
[...] I also chose kernel matrix calculations; you can't
always call a library, sometimes you just need to write
performant code.
Aren't kernel function calls suffering a context switch though?
On 2020-05-07 02:17, data pulverizer wrote:
What is the difference between -O2 and -O3 ldc2 compiler optimizations?
`--help` says -O2 is "Good optimizations" and -O3 "Aggressive
optimizations". Not very specific.
--
/Jacob Carlborg
On Friday, 8 May 2020 at 13:36:22 UTC, data pulverizer wrote:
...I've disallowed calling BLAS because I'm looking at the
performance of the programming language implementations rather
than their ability to call other libraries.
Also, BLAS is of limited use for most of the kernel functions.
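For readers unfamiliar with what is being benchmarked, here is a minimal, hypothetical sketch of a kernel matrix calculation in D, assuming a Gaussian (RBF) kernel. The function names, signatures, and the choice of kernel are illustrative only, not the benchmark repository's actual API:

```d
import std.math : exp;

// Gaussian (RBF) kernel between two points x and y; gamma is the
// bandwidth parameter. Names here are illustrative assumptions.
double gaussianKernel(const double[] x, const double[] y, double gamma)
{
    double dist2 = 0;
    foreach (i; 0 .. x.length)
    {
        immutable diff = x[i] - y[i];
        dist2 += diff * diff;
    }
    return exp(-gamma * dist2);
}

// n x n kernel matrix over the rows of `data`: only the upper
// triangle is computed and then mirrored, since K is symmetric.
double[][] kernelMatrix(const double[][] data, double gamma)
{
    immutable n = data.length;
    auto K = new double[][](n, n);
    foreach (i; 0 .. n)
        foreach (j; i .. n)
        {
            immutable k = gaussianKernel(data[i], data[j], gamma);
            K[i][j] = k;
            K[j][i] = k;
        }
    return K;
}
```

This is the kind of O(n^2) pairwise loop that cannot be delegated to BLAS for arbitrary kernels, which is the point being made above.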
On Thursday, 7 May 2020 at 14:49:43 UTC, data pulverizer wrote:
After running the Julia code by the Julia community they made
some changes (using views rather than passing copies of the
array) and their time has come down to ~ 2.5 seconds. The plot
thickens.
I've run the Chapel code past the Chapel community…
On Thursday, 7 May 2020 at 15:41:12 UTC, drug wrote:
On 07.05.2020 17:49, data pulverizer wrote:
On Thursday, 7 May 2020 at 02:06:32 UTC, data pulverizer wrote:
On Wednesday, 6 May 2020 at 10:23:17 UTC, data pulverizer wrote:
D: ~ 1.5 seconds
After running the Julia code by the Julia community they made
some changes…
On 07.05.2020 17:49, data pulverizer wrote:
On Thursday, 7 May 2020 at 02:06:32 UTC, data pulverizer wrote:
On Wednesday, 6 May 2020 at 10:23:17 UTC, data pulverizer wrote:
D: ~ 1.5 seconds
This is going to sound absurd but can we do even better? If none of
the optimizations we have so far is using simd maybe we can get
even better performance by using it…
On Thursday, 7 May 2020 at 02:06:32 UTC, data pulverizer wrote:
On Wednesday, 6 May 2020 at 10:23:17 UTC, data pulverizer wrote:
D: ~ 1.5 seconds
This is going to sound absurd but can we do even better? If
none of the optimizations we have so far is using simd maybe we
can get even better performance by using it…
On Wednesday, 6 May 2020 at 10:23:17 UTC, data pulverizer wrote:
D: ~ 1.5 seconds
This is going to sound absurd but can we do even better? If none
of the optimizations we have so far is using simd maybe we can
get even better performance by using it. I think I need to go and
read a simd…
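As a starting point for that reading, here is a tiny `core.simd` sketch. Note this is an assumption-laden illustration, not what the benchmark does: fixed-width vector types like `double2` are only available on targets that support them (e.g. x86_64 with SSE2):

```d
import core.simd;

void main()
{
    // double2 packs two doubles into one SSE register on x86_64.
    double2 a = [1.0, 2.0];
    double2 b = [3.0, 4.0];
    double2 c = a + b; // one vector add instead of two scalar adds
    assert(c.array == [4.0, 6.0]);
}
```

In practice ldc's `-O3` with `-mcpu=native` will often auto-vectorize simple loops, so explicit SIMD is worth measuring before committing to it.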
On Wednesday, 6 May 2020 at 23:10:05 UTC, data pulverizer wrote:
The -O3 -O5 optimization on the ldc compiler is instrumental in
bringing the times down; going with -O2 based optimization even
with the other flags gives us ~ 13 seconds for the 10,000
dataset rather than the very nice 1.5 seconds…
On 5/6/20 2:29 PM, drug wrote:
On 06.05.2020 16:57, Steven Schveighoffer wrote:
```
foreach(i; 0..n) // instead of for(long i = 0; i < n;)
```
I guess that `proc` delegate can't capture `i` var of `foreach` loop
so the range violation doesn't happen.
foreach over a range of integers is lowered to an equivalent for
loop…
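A quick sketch of the equivalence Steven is describing, i.e. foreach over an integer range lowering to an ordinary for loop:

```d
void main()
{
    long n = 4;
    long sumA = 0, sumB = 0;

    foreach (i; 0 .. n)      // the compiler lowers this to roughly
        sumA += i;           // the for loop below

    for (long i = 0; i < n; ++i)
        sumB += i;

    assert(sumA == sumB);    // both loops do the same work
}
```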
On 06.05.2020 13:23, data pulverizer wrote:
On Wednesday, 6 May 2020 at 08:28:41 UTC, drug wrote:
What is current D time? ...
Current Times:
D: ~ 1.5 seconds
Chapel: ~ 9 seconds
Julia: ~ 35 seconds
Oh, I'm impressed. I thought that the D time had decreased by 1.5
seconds but it is 1.5 seconds in total…
On 06.05.2020 16:57, Steven Schveighoffer wrote:
```
foreach(i; 0..n) // instead of for(long i = 0; i < n;)
```
I guess that `proc` delegate can't capture `i` var of `foreach` loop so
the range violation doesn't happen.
foreach over a range of integers is lowered to an equivalent for loop,
so that…
On 2020-05-06 12:23, data pulverizer wrote:
Yes, I'll do a blog or something on GitHub and link it.
It would be nice if you could get it published on the Dlang blog [1].
One usually gets paid for that. Contact Mike Parker.
[1] https://blog.dlang.org
--
/Jacob Carlborg
On 5/6/20 2:49 AM, drug wrote:
On 06.05.2020 09:24, data pulverizer wrote:
On Wednesday, 6 May 2020 at 05:44:47 UTC, drug wrote:
proc is already a delegate, so &proc is a pointer to the delegate,
just pass `proc` itself
Thanks, done that but getting a range violation on z which was not
there before…
On Wednesday, 6 May 2020 at 08:28:41 UTC, drug wrote:
What is current D time? ...
Current Times:
D: ~ 1.5 seconds
Chapel: ~ 9 seconds
Julia: ~ 35 seconds
It would be really nice if you could write up a summary of your
research.
Yes, I'll do a blog or something on GitHub and link it.
Thanks
On 06.05.2020 11:18, data pulverizer wrote:
CPU usage now revs up and almost touches 100% before the process
is finished! Interestingly, using `--boundscheck=off` without
`--ffast-math` gives a timing of around 4 seconds, whereas using
`--ffast-math` without `--boundscheck=off` made no difference…
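Collecting the ldc2 flags mentioned across this thread into one hypothetical invocation (a command-line sketch, not run here: flag availability should be checked against your ldc2 version, and `app.d` is a placeholder file name):

```shell
# flags discussed in this thread; verify each against `ldc2 --help`
ldc2 -O3 --boundscheck=off --ffast-math -mcpu=native \
     -flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto app.d
```

As the posts above note, `--boundscheck=off` and `--ffast-math` each trade safety or strict IEEE semantics for speed, so they are worth enabling one at a time while timing.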
On Wednesday, 6 May 2020 at 07:57:46 UTC, WebFreak001 wrote:
On Wednesday, 6 May 2020 at 07:42:44 UTC, data pulverizer wrote:
On Wednesday, 6 May 2020 at 07:27:19 UTC, data pulverizer wrote:
Just tried removing the boundscheck and got 1.5 seconds in D!
Cool! But before getting too excited I w…
On Wednesday, 6 May 2020 at 07:47:59 UTC, drug wrote:
On 06.05.2020 10:42, data pulverizer wrote:
On Wednesday, 6 May 2020 at 07:27:19 UTC, data pulverizer wrote:
On Wednesday, 6 May 2020 at 06:54:07 UTC, drug wrote:
Things are really interesting. So there is room to improve
performance by 2.5 times :-)…
On Wednesday, 6 May 2020 at 07:42:44 UTC, data pulverizer wrote:
On Wednesday, 6 May 2020 at 07:27:19 UTC, data pulverizer wrote:
On Wednesday, 6 May 2020 at 06:54:07 UTC, drug wrote:
Things are really interesting. So there is room to improve
performance by 2.5 times :-)
Yes, `array` is smart enough and if you call it on another array
it is a no-op…
On 2020-05-06 06:04, Mathias LANG wrote:
In general, if you want to parallelize something, you should aim to have
as many threads as you have cores.
That should be _logical_ cores. If the CPU supports hyper threading it
can run two threads per core.
--
/Jacob Carlborg
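In D the logical-core count Jacob mentions is available directly from the standard library; a small sketch:

```d
import std.parallelism : totalCPUs;
import std.stdio : writeln;

void main()
{
    // totalCPUs reports logical cores, so a 4-core CPU with
    // hyper-threading typically reports 8.
    writeln("logical cores: ", totalCPUs);
}
```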
On 06.05.2020 10:42, data pulverizer wrote:
On Wednesday, 6 May 2020 at 07:27:19 UTC, data pulverizer wrote:
On Wednesday, 6 May 2020 at 06:54:07 UTC, drug wrote:
Things are really interesting. So there is room to improve
performance by 2.5 times :-)
Yes, `array` is smart enough and if you call it on another array
it is a no-op…
On 2020-05-06 08:54, drug wrote:
Did you try `--ffast-math` in ldc? I don't know if -O5 uses this flag.
Try the following flags as well:
`-mcpu=native -flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto`
--
/Jacob Carlborg
On Wednesday, 6 May 2020 at 07:27:19 UTC, data pulverizer wrote:
On Wednesday, 6 May 2020 at 06:54:07 UTC, drug wrote:
Things are really interesting. So there is room to improve
performance by 2.5 times :-)
Yes, `array` is smart enough and if you call it on another
array it is a no-op.
What does `--fast` mean…
On 2020-05-06 05:25, data pulverizer wrote:
I have been using std.parallelism and that has worked quite nicely but
it is not fully utilising all the cpu resources in my computation
If you happen to be using macOS, I know that when std.parallelism checks
how many cores the computer has, it checks…
On Wednesday, 6 May 2020 at 06:49:13 UTC, drug wrote:
... Then you can pass the arguments in the ctor of the derived
class like:
```
foreach(long i; 0..n)
    new DerivedThread(cast(double)(i), cast(double)(i + 1), i, z).start();
thread_joinAll();
```
not tested example of a derived thread
```
class Der…
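A hedged reconstruction of what the truncated `DerivedThread` class might look like. The class body below is guessed from the constructor call in the snippet; the computation in `run` is a placeholder:

```d
import core.thread;

// Hypothetical reconstruction: a Thread subclass whose constructor
// stores the work-item arguments, and whose run() does the work.
class DerivedThread : Thread
{
    double x, y;
    long i;
    double[] z; // shared result buffer; each thread writes its own slot

    this(double x, double y, long i, double[] z)
    {
        this.x = x;
        this.y = y;
        this.i = i;
        this.z = z;
        super(&run); // the thread will execute run()
    }

    private void run()
    {
        z[i] = x + y; // placeholder for the real computation
    }
}

void main()
{
    enum n = 4;
    auto z = new double[n];
    foreach (long i; 0 .. n)
        new DerivedThread(cast(double) i, cast(double)(i + 1), i, z).start();
    thread_joinAll(); // wait for every spawned thread to finish
    assert(z == [1.0, 3.0, 5.0, 7.0]); // z[i] = i + (i + 1)
}
```

Each thread writes a distinct slot of `z`, so no synchronization beyond the final `thread_joinAll()` is needed in this sketch.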
On Wednesday, 6 May 2020 at 06:54:07 UTC, drug wrote:
Things are really interesting. So there is room to improve
performance by 2.5 times :-)
Yes, `array` is smart enough and if you call it on another
array it is a no-op.
What does `--fast` mean in Chapel? Did you try `--ffast-math` in ldc?
I don't know if -O5 uses this flag…
On 06.05.2020 09:43, data pulverizer wrote:
On Wednesday, 6 May 2020 at 05:50:23 UTC, drug wrote:
General advice - try to avoid using `array` and `new` in hot code.
Memory allocation is slow in general, except if you use carefully
crafted custom memory allocators. And that can easily be the reason…
On 06.05.2020 09:24, data pulverizer wrote:
On Wednesday, 6 May 2020 at 05:44:47 UTC, drug wrote:
proc is already a delegate, so &proc is a pointer to the delegate,
just pass `proc` itself
Thanks, done that but getting a range violation on z which was not there
before.
```
core.exception.RangeError@onlineapp.d(3): Range violation
```
On Wednesday, 6 May 2020 at 05:50:23 UTC, drug wrote:
General advice - try to avoid using `array` and `new` in hot
code. Memory allocation is slow in general, except if you use
carefully crafted custom memory allocators. And that can easily
be the reason for 40% cpu usage because the cores are waiting…
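A sketch of the pattern drug is recommending, with hypothetical names: hoist the allocation out of the hot loop and reuse the buffer:

```d
void main()
{
    enum n = 100;
    enum m = 64;

    // Slow pattern: `new` inside the hot loop triggers one GC
    // allocation per iteration, stalling the cores.
    // foreach (i; 0 .. n) { auto tmp = new double[m]; /* ... */ }

    // Faster: allocate once up front and reuse.
    auto tmp = new double[m];
    double acc = 0;
    foreach (i; 0 .. n)
    {
        tmp[] = cast(double) i; // refill the reused buffer, no allocation
        acc += tmp[0];
    }
    assert(acc == n * (n - 1) / 2.0); // sum of 0 .. n-1
}
```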
On Wednesday, 6 May 2020 at 05:44:47 UTC, drug wrote:
proc is already a delegate, so &proc is a pointer to the
delegate, just pass `proc` itself
Thanks, done that but getting a range violation on z which was not
there before.
```
core.exception.RangeError@onlineapp.d(3): Range violation
```
On 06.05.2020 07:52, data pulverizer wrote:
On Wednesday, 6 May 2020 at 04:04:14 UTC, Mathias LANG wrote:
On Wednesday, 6 May 2020 at 03:41:11 UTC, data pulverizer wrote:
Yes, that's exactly what I want; the actual computation I'm running is
much more expensive and much larger. It shouldn't matter
if I have like 100_000_000 threads…
On Wednesday, 6 May 2020 at 04:52:30 UTC, data pulverizer wrote:
myData is referencing elements [5..10] of data and not creating
a new array with elements data[5..10] copied?
Just checked this and can confirm that the data is not being
copied so that is not the source of cpu idling:
https://d
On 06.05.2020 07:25, data pulverizer wrote:
On Wednesday, 6 May 2020 at 03:56:04 UTC, Ali Çehreli wrote:
On 5/5/20 8:41 PM, data pulverizer wrote:
> On Wednesday, 6 May 2020 at 03:33:12 UTC, Mathias LANG wrote:
>> On Wednesday, 6 May 2020 at 03:25:41 UTC, data pulverizer wrote:
> Is there something…
On Wednesday, 6 May 2020 at 04:04:14 UTC, Mathias LANG wrote:
On Wednesday, 6 May 2020 at 03:41:11 UTC, data pulverizer wrote:
Yes, that's exactly what I want; the actual computation I'm
running is much more expensive and much larger. It shouldn't
matter if I have like 100_000_000 threads should…
On Wednesday, 6 May 2020 at 03:56:04 UTC, Ali Çehreli wrote:
On 5/5/20 8:41 PM, data pulverizer wrote:
> On Wednesday, 6 May 2020 at 03:33:12 UTC, Mathias LANG wrote:
>> On Wednesday, 6 May 2020 at 03:25:41 UTC, data pulverizer wrote:
> Is there something I need to do to wait for each thread to…
On Wednesday, 6 May 2020 at 03:41:11 UTC, data pulverizer wrote:
Is there something I need to do to wait for each thread to
finish computation?
Yeah, you need to synchronize so that your main thread waits on
all the other threads to finish.
Look up `Thread.join`.
Yes, that's exactly what I want; the actual computation I'm
running is much more expensive…
On 5/5/20 8:41 PM, data pulverizer wrote:
> On Wednesday, 6 May 2020 at 03:33:12 UTC, Mathias LANG wrote:
>> On Wednesday, 6 May 2020 at 03:25:41 UTC, data pulverizer wrote:
> Is there something I need to do to wait for each thread to finish
> computation?
thread_joinAll(). I have an example here…
On Wednesday, 6 May 2020 at 03:33:12 UTC, Mathias LANG wrote:
On Wednesday, 6 May 2020 at 03:25:41 UTC, data pulverizer wrote:
[...]
The problem here is that `process` is a delegate, not a
function. The compiler *should* know it's a function, but for
some reason it does not. Making the function static, or moving it
outside of the scope of main, will fix it…
On 06.05.2020 06:25, data pulverizer wrote:
```
onlineapp.d(14): Error: template std.concurrency.spawn cannot deduce
function from argument types !()(void delegate(double x, double y, long
i, shared(double[]) z) pure nothrow @nogc @safe, double, double, long,
shared(double[])), candidates are:
/…
```
On Wednesday, 6 May 2020 at 03:25:41 UTC, data pulverizer wrote:
[...]
The problem here is that `process` is a delegate, not a function.
The compiler *should* know it's a function, but for some reason
it does not. Making the function static, or moving it outside of
the scope of main, will fix it.
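A minimal sketch of the fix Mathias describes: the worker lives at module scope (equivalently, a nested function marked `static`), so `&process` is a plain function pointer that `std.concurrency.spawn` accepts. The `__gshared` result buffer is a shortcut for this illustration, not necessarily what the original code did:

```d
import std.concurrency : spawn;
import core.thread : thread_joinAll;

// Illustrative shared result buffer; __gshared sidesteps `shared`
// type friction for this sketch. Each task writes its own slot,
// so there is no data race here.
__gshared double[4] results;

// Module-scope function (not a delegate nested inside main), so
// spawn can deduce and accept &process.
void process(double x, double y, long i)
{
    results[i] = x + y;
}

void main()
{
    foreach (long i; 0 .. 4)
        spawn(&process, cast(double) i, cast(double)(i + 1), i);
    thread_joinAll(); // wait for all spawned threads before reading
    assert(results == [1.0, 3.0, 5.0, 7.0]); // results[i] = i + (i + 1)
}
```

Note that `spawn` only accepts arguments without unshared mutable aliasing (value types, `shared`, or `immutable`), which is why the snippet passes plain `double`/`long` values rather than a mutable slice.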
I have been using std.parallelism and that has worked quite
nicely but it is not fully utilising all the cpu resources in my
computation, so I thought it could be good to run it concurrently
to see if I can get better performance. However I am very new to
std.concurrency and the baby version of t…