Thanks. Yes, you are right. I will change my program.
Assuming double.sizeof == 8 on your machine, you're requesting
1024*1024*1024*8*8 bytes = 68 GB. Do you have that much RAM
available?
You are completely correct. However, in C one could do:

const long DIM = 1024L * 1024L * 1024L * 8L;
int main() {
    double signal[DIM];
}

which runs fine. So I am running

enum long DIM = 1024L * 1024L * 1024L * 8L;
void main() {
    auto signal = new double[DIM];
}
and am getting a core.exception.OutOfMemoryError. One option is
to use short/int instead, but I need double. Also, when using
large arrays, the computer becomes slow.
Is there no workaround?
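One possible workaround, if the data does not all need to live in RAM at once, is to back the array with a memory-mapped file via std.mmfile, so the OS pages data in and out on demand. This is only a sketch under that assumption; the file name "signal.dat" and the (deliberately small) size here are illustrative, and at the original 64 GiB size you would need that much disk space instead:

import std.mmfile;
import std.stdio;

void main()
{
    // Illustrative size; the original DIM (1024L*1024L*1024L*8L
    // doubles = 64 GiB) would require that much disk space.
    enum long DIM = 1024L * 1024L;  // 1M doubles = 8 MiB here

    // Create (or overwrite) a file of the required size and map it.
    auto mm = new MmFile("signal.dat", MmFile.Mode.readWriteNew,
                         DIM * double.sizeof, null);

    // View the mapped bytes as an array of doubles.
    auto signal = cast(double[]) mm[];
    signal[0] = 3.14;
    writeln(signal[0]);
}

Access patterns matter here: sequential scans map well to paging, while random access over the whole range will thrash.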
Thanks a lot for your reply.
Here is the code:

import std.stdio, std.datetime, std.random, std.range,
       std.parallelism;

enum long numberOfSlaves = 2;

void myFunc(ref long countvar)
{
    countvar = 500;
    writeln(" value of countvar is ", countvar);
}

void main()
{
    long count1 = 0, count2 = 0;
    alias typeof(task!(myFunc)(
I realized that access to "temp" causes a bottleneck. On defining
it inside the for loop, it becomes local and then there is a
speedup. Defining it outside makes it shared, which slows the
program down.
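The pattern described above can be sketched as follows; the names (temp, result, N) are hypothetical stand-ins for the poster's variables. Declaring temp inside the parallel loop body gives each iteration (and thus each worker thread) its own copy, so there is no contention:

import std.parallelism, std.range;

void main()
{
    enum N = 1000;
    auto result = new double[N];

    foreach (i; parallel(iota(0, N)))
    {
        // temp is declared inside the loop body, so every
        // iteration gets a private copy: no sharing, no contention.
        double temp = i * 0.5;
        result[i] = temp + 1.0;
    }
}

Had temp been declared before the foreach, all workers would touch the same memory location, serializing them (and risking a data race).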
Removing the immutable keyword solves the problem. Thanks.
On Friday, 1 March 2013 at 20:28:19 UTC, FG wrote:
I suppose this:

immutable long DIM = 1024L * 1024L * 128L;
immutable(double)[] signal = new double[DIM+1];

static this() {
    for (long i = 0L; i < DIM+1; i++) {
        signal[i] = (i+DIM)%7 + (i+DIM+1)%5;
    }
}

void main()
{ ... }
Thanks. This
foreach (immutable i; 0 .. DIM + 1) {
Thanks. However, rdmd gives an error on this line:
temp1.d(12): Error: no identifier for declarator immutable(i)
Array is really big!
import std.stdio;
import std.datetime;
import std.parallelism;
import std.range;
//int numberOfWorkers = 2; //for parallel;
double my_abs(double n) { return n > 0 ? n : -n; }
immutable long DIM = 1024L*1024L *128L;
void main()
{
double[] signal = new double[DIM+1];
d
I am making a program which accesses a 1D array using a for loop,
and then I am parallelizing this with foreach, TaskPool and
parallel. The array does not need to change once initialized.
However, the parallel version takes more time than the serial
version, which I think may be because the compiler is tr
Thanks a lot for your reply. It was very helpful.
On Thursday, 14 February 2013 at 15:51:45 UTC, Joseph Rushton
Wakeling wrote:
On 02/14/2013 04:44 PM, Sparsh Mittal wrote:
Can you please tell why it is taking DIM as zero? If I reduce
DIM, it works fine. It is strange.
1024 is an int value. Write 1024L instead to ensure that the
Here is the program:

import std.stdio;

const long DIM = 1024*1024*1024*1024*4;

void main()
{
    writeln(" DIM is ", DIM);
    writeln(" Value ", 1024*1024*1024*1024*4);
    writeln(" Max ", long.max);
}
I compiled it:
gdc -frelease -O3 temp.d -o t1 ; ./t1
DIM is 0
Value 0
Max 9223372036854775807
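The zero is a consequence of 32-bit int arithmetic: all the literals are int, so the product 1024*1024*1024*1024*4 = 2^42 is computed modulo 2^32, and 2^42 mod 2^32 == 0. Suffixing the literals with L forces 64-bit long arithmetic. A minimal demonstration:

import std.stdio;

void main()
{
    // All operands are int, so the product wraps modulo 2^32.
    // 1024^4 * 4 = 2^42, and 2^42 mod 2^32 == 0.
    const long wrong = 1024 * 1024 * 1024 * 1024 * 4;

    // With the L suffix the whole expression is evaluated as long.
    const long right = 1024L * 1024L * 1024L * 1024L * 4L;

    writeln(wrong);  // 0
    writeln(right);  // 4398046511104
}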
C
Thanks a lot for your reply.
LOL. For a while you thought that C++ could be that much faster
than D? :D
I was stunned and shared it with others, who could not find the
cause either. It was like a scientist discovering a phenomenon
that goes against established laws. Good that I was wrong and the
right person pointed it out.
I had a look, but first had to make juliaValue global, because
g++ had optimized all the calculations away.
Brilliant! Yes, that is why the time was coming out to be zero,
regardless of what value of DIM I put. Thank you very very much.
Thanks for your insights. It was very helpful.
OK. I found it.
Pardon me, can you please point me to a suitable reference, or
just tell me the command here? Searching on Google, I could not
find anything yet. Performance is my main concern.
I am finding that the C++ code is much faster than the D code.
I am writing a Julia sets program in C++ and D, in exactly the
same way as much as possible. On executing them I find a large
difference in their execution times. Can you comment on what I am
doing wrong, or is this expected?
//===C++ code, compiled with -O3 ==
#include
#include
using name
Think again if you need that. Things start getting pretty
ugly. :)
Yes, it is not at all intuitive.
Indeed... Sparsh, any reason you need the calculation to be done
on 2D blocks instead of independent slots?
For my problem, the original answer was fine, since parallel
calculations are not at
for (int i = 1; i < N; i++) <==> foreach (i; iota(1, N))
so you can use: foreach (i; parallel(iota(1, N))) { ... }
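A minimal compilable version of that pattern, with hypothetical work (squaring the index) filled in just for illustration:

import std.parallelism, std.range, std.stdio;

void main()
{
    enum N = 16;
    auto squares = new long[N];

    // Each index is handed to a worker thread from the
    // default TaskPool; iterations must be independent.
    foreach (i; parallel(iota(1, N)))
    {
        squares[i] = cast(long) i * i;
    }

    writeln(squares[1 .. N]);
}

Writes to distinct elements of squares are safe here because no two iterations touch the same index.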
Thanks a lot. This one divides the x-by-y region by rows.
Suppose the dimension is 8*12 and there are 4 parallel threads;
then the current method assigns a 2*12 block to each of the 4
threads
It's not a big deal, but indexing *might* be a little slower
with this scheme.
Thanks a lot for your reply. It was extremely useful, since I am
optimizing for performance.
Thanks for your prompt reply. It was very helpful.
I am allocating a 2D array as:

double[gridSize][gridSize] gridInfo;

which works for small dimensions, but for large dimensions the
16 MB static array limit kicks in.
Would you please tell me how to allocate a large 2D array (which
has to be done as a dynamic array)? It is a square grid and the
dimensions are already k
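The usual way around the 16 MB static-array limit is to allocate the grid on the heap as a dynamic array of arrays. A sketch, with a hypothetical gridSize:

import std.stdio;

void main()
{
    // Hypothetical size; substitute the real grid dimension.
    immutable size_t gridSize = 4096;

    // Heap-allocated gridSize x gridSize array of doubles;
    // the static-array size limit does not apply here.
    auto gridInfo = new double[][](gridSize, gridSize);

    gridInfo[0][0] = 1.0;
    writeln(gridInfo.length, " x ", gridInfo[0].length);
}

Note that new double[][](r, c) allocates one pointer array plus r row arrays; if you need a single contiguous block you can instead allocate new double[r * c] and index it as grid[i * c + j].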
Thanks. Yes, you are right. I have increased the dimension.
Excellent. Thank you so much for your suggestion and code. It now
produces near linear speedup.
Here is the code:

#!/usr/bin/env rdmd
import std.stdio;
import std.concurrency;
import core.thread;
import std.datetime;
import std.conv;
import core.sync.barrier;

immutable int gridSize = 256;
immutable int MAXSTEPS = 5; /* Maximum number of iterations */
immutable d
Can't tell much without the whole source, or at least a
compilable standalone piece.
Give me a moment. I will post.
It got posted before I completed it! Sorry.
I am parallelizing a program which follows this structure:

immutable int numberOfThreads = 2;

for iter = 1 to MAX_ITERATION
{
    myLocalBarrier = new Barrier(numberOfThreads + 1);
    for i = 1 to numberOfThreads
    {
        spawn(&myFunc, args)
I am parallelizing a program which follows this structure:

for iter = 1 to MAX_ITERATION
{
    myLocalBarrier = new Barrier(numberOfThreads + 1);
    for i = 1 to numberOfThreads
    {
        spawn(&myFunc, args)
    }
}
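The pseudocode above can be sketched as real D using core.thread and core.sync.barrier; the per-iteration work is a hypothetical placeholder. Each outer iteration creates a fresh barrier sized for the workers plus the main thread, so main cannot start the next iteration until every worker has finished the current one:

import core.thread;
import core.sync.barrier;
import std.stdio;

enum int numberOfThreads = 2;
enum int MAX_ITERATION = 3;

void main()
{
    foreach (iter; 0 .. MAX_ITERATION)
    {
        // numberOfThreads workers + the main thread all meet here.
        auto myLocalBarrier = new Barrier(numberOfThreads + 1);

        foreach (i; 0 .. numberOfThreads)
        {
            new Thread({
                // ... per-iteration work for this thread ...
                myLocalBarrier.wait();  // signal this iteration done
            }).start();
        }

        myLocalBarrier.wait();  // main waits for all workers
    }
    writeln("all iterations finished");
}

Spawning fresh threads every iteration is simple but costly; a longer-lived pool of threads that loops internally and hits the barrier each round avoids the thread-creation overhead.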
Thanks a lot. VERY VERY helpful.
I wrote this code. My purpose is to see how shared works in D. I
create a global variable (globalVar) and access it in two
different threads, and it prints fine although it is not shared.
So can you please tell what difference it makes to use or not use
shared (ref.
http://www.informit.com/ar
Thank you very much for the code. It works fine and is extremely
useful.
I suggest looking at std.parallelism since it's designed for
this kind of thing. That aside, all traditional
synchronization methods are in core.sync. The equivalent of
"sync" in Cilk would be core.sync.barrier.
Thanks. I wrote this:
#!/usr/bin/env rdmd
import std.stdio;
import std.con
Background:
I am implementing an iterative algorithm in a parallel manner.
The algorithm iteratively updates a matrix (a 2D grid) of data.
So I will "divide" the grid among different threads, which will
each work on it for a single iteration. After each iteration, all
threads should wait, since the next iterat
Thanks a lot. Your code is very valuable to explain the whole
concept. I have changed my code based on it.
Thanks a lot. Actually, I am using std.concurrency, following
your tutorial:
http://ddili.org/ders/d.en/concurrency.html. Thanks for that
tutorial.
My requirement is to sort a portion of an array in each thread,
such that there is no overlap between portions and all portions
together make up the who
Thanks for your reply and the link (which I will try to follow).
However, I am trying to write a parallel program where I have a
big array. Multiple (e.g. 2, 4, 8) threads work on parts of this
array. Afterwards, they sort their portions of the array and
return the answer to main.
So, I have mad
Purpose: I am trying to sort only a range of values in an array
of structs (the struct has two fields and I want to sort on one
of its fields using the myComp function below). However, I am
getting this error:

../src/phobos/std/algorithm.d(7731): Error: cannot implicitly
convert expression (assume
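Without seeing myComp, the usual shape of this task is to pass a "less than" predicate returning bool (not an int, as a C-style comparator would) to std.algorithm.sort applied to a slice. A sketch with a hypothetical struct and slice bounds:

import std.algorithm : sort;
import std.stdio;

struct Item
{
    long key;     // the field to sort on
    double value; // carried along untouched
}

void main()
{
    auto arr = [Item(3, 0.1), Item(1, 0.2), Item(5, 0.3), Item(2, 0.4)];

    // Sort only the slice [0 .. 3] on the key field. The predicate
    // must be a boolean "less than", not a -1/0/1 comparator.
    sort!((a, b) => a.key < b.key)(arr[0 .. 3]);

    writeln(arr);
}

A comparator with the wrong return type or parameter types is a common cause of "cannot implicitly convert" errors from inside std.algorithm.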
Thanks a lot, it was very helpful.