Hi George,
I'm a member of the Fujitsu MPI development team.
Thank you for picking up the issue.
We checked the changesets and unfortunately found that they are incomplete.
Our testing method is as follows:
- Using LLVM clang to compile the trunk with -ftrapv (integer overflow detection)
because GCC's -ftr
On Mar 5 2012, George Bosilca wrote:
> I gave it a try (r26103). It was messy, and I hope I got it right. Let's
> soak it for a few days with our nightly testing to see how it behaves.
That'll at least check that it's not totally broken. The killer about
such wording is that you cannot guarantee ex
I gave it a try (r26103). It was messy, and I hope I got it right. Let's soak
it for a few days with our nightly testing to see how it behaves.
george.
On Mar 5, 2012, at 16:37 , N.M. Maclaren wrote:
> On Mar 5 2012, George Bosilca wrote:
>>
>> I was afraid about all those little intermediary s
On Mar 5 2012, George Bosilca wrote:
> I was worried about all those little intermediate steps. I asked a
> compiler guy and apparently reversing the order (i.e., starting with the
> ptrdiff_t variable) will not solve anything. The only portable way to
> solve this is to cast every single member, to pre
I was worried about all those little intermediate steps. I asked a compiler guy
and apparently reversing the order (i.e., starting with the ptrdiff_t variable)
will not solve anything. The only portable way to solve this is to cast every
single member, to prevent __any__ compiler from hurting us.
George,
I think Yuki's interpretation is correct.
The following is one of the suspicious parts.
(There is a lot of similar code in ompi/coll/tuned/*.c.)
--- in ompi/coll/tuned/coll_tuned_allgather.c (V1.4.X's trunk) ---
398: tmprecv = (char*) rbuf + rank * rcount * rext;
---
Yuki,
I pushed a fix for this issue in the trunk (r26097). However, I disagree with
you on some of the topics below.
On Mar 5, 2012, at 04:02 , Y.MATSUMOTO wrote:
> Dear All,
>
> The next piece of feedback is about "collective communications".
>
> Collective communication may abort abnormally when it uses a buffer over 2Gi
Dear All,
The next piece of feedback is about "collective communications".
Collective communication may abort abnormally when it uses a buffer over 2 GiB.
This problem occurs under the following condition:
-- communicator_size * count (scount/rcount) >= 2 GiB
It occurs even on a small PC cluster.
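As a worked example with hypothetical numbers, assuming a 32-bit int: four processes, each passing a count of 2^29 elements, already give a product that a signed int cannot hold:

```latex
4 \times 2^{29} = 2^{31} = 2\,147\,483\,648 > 2^{31} - 1 = 2\,147\,483\,647 = \mathtt{INT\_MAX}
```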
The following is one of the suspiciou