Hi,
We used the following code to test the performance of foreach.
add1() is sequential code.
In add2(), we use foreach and let X10 partition the workload.
In add3(), we partition the workload ourselves.
We use the C++ backend and run the code as
# env X10_NTHREADS=2 runx10 ./Test_foreach
The performance:
time of add1() = 32.3 ms
time of add2() = 3277.98 ms
time of add3() = 18.33 ms
It is surprising that add2() is about 100 times slower than add1(). Does anyone
know the reason? Thanks.
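// Test_foreach.x10

// Sequential version: a single loop over all elements.
def add1()
{
    for ((i) in 0..size-1)
    {
        data(i) += 5;
    }
}

// Parallel version: foreach over every index, partitioning left to X10.
def add2()
{
    finish foreach ((i) in 0..size-1)
    {
        data(i) += 5;
    }
}

// Parallel version: the index range is split by hand into numThreads chunks.
// Elements past numThreads*mySize are not touched if size is not a multiple
// of numThreads.
def add3()
{
    var numThreads: Int = 2;
    val mySize = size/numThreads;
    finish foreach ((p) in 0..numThreads-1)
    {
        for ((i) in p*mySize..(p+1)*mySize-1)
        {
            data(i) += 5;
        }
    }
}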
_______________________________________________
X10-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/x10-users