Squyres, I thought RDMA read and write are implemented as one-sided communication using get and put, respectively. Is that not so?
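For reference, the TCP rerun suggested in the quoted thread below could look roughly like this. It is only a sketch, not something verified on this setup: the hosts, prefix, and benchmark path are copied from the original report, and the only change is replacing openib with tcp in the btl list.

```shell
# Rerun one of the hanging one-sided tests over TCP instead of the openib BTL.
# Hosts, prefix, and path are taken from the original report; adjust as needed.
mpirun --prefix /usr/local/ -np 2 --mca btl tcp,self,sm \
    -H 192.168.0.175,192.168.0.174 \
    /root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bw
```

If the test also hangs over TCP, the problem is more likely in Open MPI's one-sided code than in the custom OFED stack.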
On Wed, Feb 29, 2012 at 10:49 AM, Jeffrey Squyres <jsquy...@cisco.com> wrote:

> FWIW, if Brian says that our one-sided stuff is a bit buggy, I believe him
> (because he wrote it). :-)
>
> The fact is that the MPI-2 one-sided stuff is extremely complicated and
> somewhat open to interpretation. In practice, I haven't seen the MPI-2
> one-sided stuff used much in the wild. The MPI-3 working group just
> revamped the one-sided support and generally made it much mo'betta. Brian
> is re-implementing that stuff, and I believe it'll also be much mo'betta.
>
> My point: I wouldn't worry if not all one-sided benchmarks run with OMPI.
> No one uses them (yet) anyway.
>
> On Feb 29, 2012, at 1:42 PM, Jingcha Joba wrote:
>
> > When I ran my osu tests , I was able to get the numbers out of all the
> > tests except latency_mt (which was obvious, as I didnt compile open-mpi
> > with multi threaded support).
> > A good way to know if the problem is with openmpi or with your custom
> > OFED stack would be to use some other device like tcp instead of ib and
> > rerun these one sided comm tests.
> >
> > On Wed, Feb 29, 2012 at 10:04 AM, Barrett, Brian W <bwba...@sandia.gov>
> > wrote:
> > I'm pretty sure that they are correct. Our one-sided implementation is
> > buggier than I'd like (indeed, I'm in the process of rewriting most of it
> > as part of Open MPI's support for MPI-3's revised RDMA), so it's likely
> > that the bugs are in Open MPI's onesided support. Can you try a more
> > recent release (something from the 1.5 tree) and see if the problem
> > persists?
> >
> > Thanks,
> >
> > Brian
> >
> > On 2/29/12 10:56 AM, "Jeffrey Squyres" <jsquy...@cisco.com> wrote:
> >
> > >FWIW, I'm immediately suspicious of *any* MPI application that uses the
> > >MPI one-sided operations (i.e., MPI_PUT and MPI_GET). It looks like
> > >these two OSU benchmarks are using those operations.
> > >
> > >Is it known that these two benchmarks are correct?
> > >
> > >On Feb 29, 2012, at 11:33 AM, Venkateswara Rao Dokku wrote:
> > >
> > >> Sorry, i forgot to introduce the system.. Ours is the customized OFED
> > >> stack implemented to work on the specific hardware.. We tested the stack
> > >> with the q-perf and Intel Benchmarks(IMB-3.2.2).. they went fine.. We
> > >> want to execute the osu_benchamark3.1.1 suite on our OFED..
> > >>
> > >> On Wed, Feb 29, 2012 at 9:57 PM, Venkateswara Rao Dokku
> > >> <dvrao....@gmail.com> wrote:
> > >> Hiii,
> > >> I tried executing osu_benchamarks-3.1.1 suite with the openmpi-1.4.3...
> > >> I could run 10 bench-mark tests (except
> > >> osu_put_bibw,osu_put_bw,osu_get_bw,osu_latency_mt) out of 14 tests in
> > >> the bench-mark suite... and the remaining tests are hanging at some
> > >> message size.. the output is shown below
> > >>
> > >> [root@test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl openib,self,sm -H 192.168.0.175,192.168.0.174 --mca orte_base_help_aggregate 0 /root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bibw
> > >> failed to create doorbell file /dev/plx2_char_dev
> > >>
> > >> --------------------------------------------------------------------------
> > >> WARNING: No preset parameters were found for the device that Open MPI
> > >> detected:
> > >>
> > >> Local host: test1
> > >> Device name: plx2_0
> > >> Device vendor ID: 0x10b5
> > >> Device vendor part ID: 4277
> > >>
> > >> Default device parameters will be used, which may result in lower
> > >> performance. You can edit any of the files specified by the
> > >> btl_openib_device_param_files MCA parameter to set values for your
> > >> device.
> > >>
> > >> NOTE: You can turn off this warning by setting the MCA parameter
> > >> btl_openib_warn_no_device_params_found to 0.
> > >>
> > >> --------------------------------------------------------------------------
> > >> failed to create doorbell file /dev/plx2_char_dev
> > >>
> > >> --------------------------------------------------------------------------
> > >> WARNING: No preset parameters were found for the device that Open MPI
> > >> detected:
> > >>
> > >> Local host: test2
> > >> Device name: plx2_0
> > >> Device vendor ID: 0x10b5
> > >> Device vendor part ID: 4277
> > >>
> > >> Default device parameters will be used, which may result in lower
> > >> performance. You can edit any of the files specified by the
> > >> btl_openib_device_param_files MCA parameter to set values for your
> > >> device.
> > >>
> > >> NOTE: You can turn off this warning by setting the MCA parameter
> > >> btl_openib_warn_no_device_params_found to 0.
> > >>
> > >> --------------------------------------------------------------------------
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> # OSU One Sided MPI_Put Bi-directional Bandwidth Test v3.1.1
> > >> # Size      Bi-Bandwidth (MB/s)
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> 1           0.00
> > >> 2           0.00
> > >> 4           0.01
> > >> 8           0.03
> > >> 16          0.07
> > >> 32          0.15
> > >> 64          0.11
> > >> 128         0.21
> > >> 256         0.43
> > >> 512         0.88
> > >> 1024        2.10
> > >> 2048        4.21
> > >> 4096        8.10
> > >> 8192        16.19
> > >> 16384       8.46
> > >> 32768       20.34
> > >> 65536       39.85
> > >> 131072      84.22
> > >> 262144      142.23
> > >> 524288      234.83
> > >> mpirun: killing job...
> > >>
> > >> --------------------------------------------------------------------------
> > >> mpirun noticed that process rank 0 with PID 7305 on node test2 exited
> > >> on signal 0 (Unknown signal 0).
> > >>
> > >> --------------------------------------------------------------------------
> > >> 2 total processes killed (some possibly by mpirun during cleanup)
> > >> mpirun: clean termination accomplished
> > >>
> > >> [root@test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl openib,self,sm -H 192.168.0.175,192.168.0.174 --mca orte_base_help_aggregate 0 /root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bw
> > >> failed to create doorbell file /dev/plx2_char_dev
> > >>
> > >> --------------------------------------------------------------------------
> > >> WARNING: No preset parameters were found for the device that Open MPI
> > >> detected:
> > >>
> > >> Local host: test1
> > >> Device name: plx2_0
> > >> Device vendor ID: 0x10b5
> > >> Device vendor part ID: 4277
> > >>
> > >> Default device parameters will be used, which may result in lower
> > >> performance. You can edit any of the files specified by the
> > >> btl_openib_device_param_files MCA parameter to set values for your
> > >> device.
> > >>
> > >> NOTE: You can turn off this warning by setting the MCA parameter
> > >> btl_openib_warn_no_device_params_found to 0.
> > >>
> > >> --------------------------------------------------------------------------
> > >> failed to create doorbell file /dev/plx2_char_dev
> > >>
> > >> --------------------------------------------------------------------------
> > >> WARNING: No preset parameters were found for the device that Open MPI
> > >> detected:
> > >>
> > >> Local host: test2
> > >> Device name: plx2_0
> > >> Device vendor ID: 0x10b5
> > >> Device vendor part ID: 4277
> > >>
> > >> Default device parameters will be used, which may result in lower
> > >> performance. You can edit any of the files specified by the
> > >> btl_openib_device_param_files MCA parameter to set values for your
> > >> device.
> > >>
> > >> NOTE: You can turn off this warning by setting the MCA parameter
> > >> btl_openib_warn_no_device_params_found to 0.
> > >>
> > >> --------------------------------------------------------------------------
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> # OSU One Sided MPI_Put Bandwidth Test v3.1.1
> > >> # Size      Bandwidth (MB/s)
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> 1           0.02
> > >> 2           0.05
> > >> 4           0.10
> > >> 8           0.19
> > >> 16          0.39
> > >> 32          0.77
> > >> 64          1.53
> > >> 128         2.57
> > >> 256         4.16
> > >> 512         8.30
> > >> 1024        16.62
> > >> 2048        33.22
> > >> 4096        66.51
> > >> 8192        42.45
> > >> 16384       11.99
> > >> 32768       18.20
> > >> 65536       76.04
> > >> 131072      98.64
> > >> 262144      407.66
> > >> 524288      489.84
> > >> mpirun: killing job...
> > >>
> > >> --------------------------------------------------------------------------
> > >> mpirun noticed that process rank 0 with PID 7314 on node test2 exited
> > >> on signal 0 (Unknown signal 0).
> > >>
> > >> --------------------------------------------------------------------------
> > >> 2 total processes killed (some possibly by mpirun during cleanup)
> > >> mpirun: clean termination accomplished
> > >>
> > >> I even checked the logs but i couldn't see any errors...
> > >> Could you suggest a way to overcome/debug this issue..
> > >>
> > >> Thanks for the kind reply..
> > >>
> > >> --
> > >> Thanks & Regards,
> > >> D.Venkateswara Rao,
> > >> Software Engineer,One Convergence Devices Pvt Ltd.,
> > >> Jubille Hills,Hyderabad.
> > >>
> > >> --
> > >> Thanks & Regards,
> > >> D.Venkateswara Rao,
> > >> Software Engineer,One Convergence Devices Pvt Ltd.,
> > >> Jubille Hills,Hyderabad.
> > >>
> > >> _______________________________________________
> > >> users mailing list
> > >> us...@open-mpi.org
> > >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> > >
> > >--
> > >Jeff Squyres
> > >jsquy...@cisco.com
> > >For corporate legal information go to:
> > >http://www.cisco.com/web/about/doing_business/legal/cri/
> > >
> > >_______________________________________________
> > >users mailing list
> > >us...@open-mpi.org
> > >http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> > --
> > Brian W. Barrett
> > Dept. 1423: Scalable System Software
> > Sandia National Laboratories
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>