Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-16 Thread Tomoya Adachi
Hi George, I'm a member of the Fujitsu MPI development team. Thank you for picking up the issue. We checked the changesets and unfortunately found they are incomplete. Our testing method is as follows: - Using LLVM clang to compile trunk with -ftrapv (integer overflow detection) because GCC's -ftr

Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread N.M. Maclaren
On Mar 5 2012, George Bosilca wrote: I gave it a try (r26103). It was messy, and I hope I got it right. Let's soak it for a few days with our nightly testing to see how it behaves. That'll at least check that it's not totally broken. The killer about such wording is that you cannot guarantee ex

Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread George Bosilca
I gave it a try (r26103). It was messy, and I hope I got it right. Let's soak it for a few days with our nightly testing to see how it behaves. george. On Mar 5, 2012, at 16:37 , N.M. Maclaren wrote: > On Mar 5 2012, George Bosilca wrote: >> >> I was afraid about all those little intermediary s

Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread N.M. Maclaren
On Mar 5 2012, George Bosilca wrote: I was afraid about all those little intermediary steps. I asked a compiler guy and apparently reversing the order (aka starting with the ptrdiff_t variable) will not solve anything. The only portable way to solve this is to cast every single member, to pre

Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread George Bosilca
I was afraid about all those little intermediary steps. I asked a compiler guy and apparently reversing the order (aka starting with the ptrdiff_t variable) will not solve anything. The only portable way to solve this is to cast every single member, to prevent __any__ compiler from hurting us.
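A minimal sketch of the "cast every single member" approach, applied to the tmprecv offset expression quoted elsewhere in this thread. The helper name block_offset is hypothetical and is not Open MPI code; the sketch also assumes that rank and rcount are int, that rext is a ptrdiff_t, and that ptrdiff_t is 64 bits wide on the target platform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Open MPI): byte offset of the block
 * that belongs to `rank` inside a receive buffer.
 *
 * Each operand is cast to ptrdiff_t on its own, so no intermediate
 * product (for example rank * rcount) is ever evaluated in int. */
ptrdiff_t block_offset(int rank, int rcount, ptrdiff_t rext)
{
    assert(rank >= 0 && rcount >= 0 && rext >= 0);
    return (ptrdiff_t)rank * (ptrdiff_t)rcount * (ptrdiff_t)rext;
}
```

With such a helper, the original statement could be written as tmprecv = (char *) rbuf + block_offset(rank, rcount, rext);. Casting only the first operand would already widen a left-to-right product in standard C, but casting every operand keeps the intent explicit and survives later reordering of the expression.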

Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread Larry Baker
George, I think Yuki's interpretation is correct. The following is one of the suspicious parts. (There is much similar code in ompi/coll/tuned/*.c) --- in ompi/coll/tuned/coll_tuned_allgather.c (V1.4.X's trunk) --- 398: tmprecv = (char*) rbuf + rank * rcount * rext; ---

Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread George Bosilca
Yuki, I pushed a fix for this issue in the trunk (r26097). However, I disagree with you on some of the topics below. On Mar 5, 2012, at 04:02 , Y.MATSUMOTO wrote: > Dear All, > > Next feedback is about "collective communications". > > Collective communication may be abend when it use over 2Gi

[OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread Y.MATSUMOTO
Dear All, Next feedback is about "collective communications". Collective communication may abend when it uses a buffer over 2GiB. This problem occurs under the following condition: -- communicator_size * count(scount/rcount) >= 2GiB It occurs even on a small PC cluster. The following is one of the suspiciou
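To make the reported condition concrete, here is a small sketch that tests whether communicator_size * count reaches 2GiB, i.e. no longer fits in a 32-bit int. The function name product_exceeds_int is hypothetical, and the sketch assumes int is 32 bits wide, as on the common platforms discussed in this thread.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical diagnostic (not part of Open MPI): returns 1 when
 * comm_size * count does not fit in an int, 0 otherwise.
 * The product is computed in 64-bit arithmetic so the check itself
 * cannot overflow. */
int product_exceeds_int(int comm_size, int count)
{
    assert(comm_size > 0 && count >= 0);
    int64_t product = (int64_t)comm_size * (int64_t)count;
    return product > (int64_t)INT_MAX;
}
```

For example, 4 processes with a count of 536870912 (2^29) already reach 2^31, which shows why the problem can appear on a small cluster.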