Re: [OMPI devel] poor btl sm latency

2012-03-05 Thread Matthias Jurenz
Here are the SM BTL parameters:
$ ompi_info --param btl sm
MCA btl: parameter "btl_base_verbose" (current value: <0>, data source: default value)
Verbosity level of the BTL framework
MCA btl: parameter "btl" (current value: , data source: file [/sw/atlas/libraries/openmpi/1.5.5rc3/x86_64/etc/openmpi

[OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread Y.MATSUMOTO
Dear All, The next feedback is about "collective communications". A collective communication may abend when it uses a buffer larger than 2 GiB. The problem occurs under the following condition: -- communicator_size * count(scount/rcount) >= 2GiB It occurs even on a small PC cluster. The following is one of the suspiciou

Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread George Bosilca
Yuki, I pushed a fix for this issue in the trunk (r26097). However, I disagree with you on some of the topics below.
On Mar 5, 2012, at 04:02, Y.MATSUMOTO wrote:
> Dear All,
>
> Next feedback is about "collective communications".
>
> Collective communication may be abend when it use over 2Gi

Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread Larry Baker
George, I think Yuki's interpretation is correct.
> The following is one of the suspicious parts. (Many similar code in ompi/coll/tuned/*.c)
> --- in ompi/coll/tuned/coll_tuned_allgather.c (V1.4.X's trunk), line 398 ---
> tmprecv = (char*) rbuf + rank * rcount * rext;
> ---

Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread George Bosilca
I was afraid about all those little intermediary steps. I asked a compiler guy and apparently reversing the order (aka starting with the ptrdiff_t variable) will not solve anything. The only portable way to solve this is to cast every single member, to prevent __any__ compiler from hurting us.
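The "cast every single member" approach can be sketched as follows. This is a minimal illustration, not the change committed to the trunk: `block_offset` and `block_ptr` are hypothetical helper names, and the sketch assumes `rank` and `rcount` are `int` and `rext` is `ptrdiff_t`, which the snippets here imply but do not show. In the quoted expression `rank * rcount * rext`, C groups the multiplication left to right, so `rank * rcount` is computed in `int` and can overflow before `rext` is involved at all.

```c
#include <assert.h>
#include <stddef.h>   /* ptrdiff_t */

/* Hypothetical helper (not from the Open MPI source): byte offset of the
 * block belonging to `rank` in the receive buffer. Every operand is cast
 * to ptrdiff_t before multiplying, so no intermediate product is ever
 * computed in int. */
static ptrdiff_t block_offset(int rank, int rcount, ptrdiff_t rext)
{
    assert(rank >= 0 && rcount >= 0);
    return (ptrdiff_t)rank * (ptrdiff_t)rcount * (ptrdiff_t)rext;
}

/* The same idea applied to the expression quoted in the thread. */
static char *block_ptr(void *rbuf, int rank, int rcount, ptrdiff_t rext)
{
    return (char *)rbuf + block_offset(rank, rcount, rext);
}
```

On a platform with a 64-bit `ptrdiff_t`, `block_offset(4, 1 << 29, 1)` yields 2^31, a value that does not fit in a 32-bit `int`. On a platform with a 32-bit `ptrdiff_t` the casts alone would not help.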

Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread N.M. Maclaren
On Mar 5 2012, George Bosilca wrote:
> I was afraid about all those little intermediary steps. I asked a compiler guy and apparently reversing the order (aka starting with the ptrdiff_t variable) will not solve anything. The only portable way to solve this is to cast every single member, to pre

Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread George Bosilca
I gave it a try (r26103). It was messy, and I hope I got it right. Let's soak it for a few days with our nightly testing to see how it behaves.
george.
On Mar 5, 2012, at 16:37, N.M. Maclaren wrote:
> On Mar 5 2012, George Bosilca wrote:
>>
>> I was afraid about all those little intermediary s

Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread N.M. Maclaren
On Mar 5 2012, George Bosilca wrote:
> I gave it a try (r26103). It was messy, and I hope I got it right. Let's soak it for a few days with our nightly testing to see how it behaves.
That'll at least check that it's not totally broken. The killer about such wording is that you cannot guarantee ex