Re: [OMPI devel] Hostfile info argument with MPI_COMM_SPAWN in a Torque environment

2013-03-23 Thread Sebastian Rinke
I found the bug, it was me. After all, I somehow failed to actually provide the MPI_Info argument to the spawn call. Instead I provided MPI_INFO_NULL. My apologies for this mistake. Thank you for your efforts. Sebastian On Mar 22, 2013, at 1:10 PM, Sebastian Rinke wrote: > Thanks for

Re: [OMPI devel] Hostfile info argument with MPI_COMM_SPAWN in a Torque environment

2013-03-22 Thread Sebastian Rinke
Thanks for the quick response. >> I'm using OMPI 1.6.4 in a Torque-like environment. >> However, since there are modifications in Torque that prevent OMPI from >> spawning processes the way it does with MPI_COMM_SPAWN, > > That hasn't been true in the past - did you folks locally modify Torque

[OMPI devel] Hostfile info argument with MPI_COMM_SPAWN in a Torque environment

2013-03-21 Thread Sebastian Rinke
Dear all, I'm using OMPI 1.6.4 in a Torque-like environment. However, since there are modifications in Torque that prevent OMPI from spawning processes the way it does with MPI_COMM_SPAWN, I want to circumvent Torque and use plain ssh only. So, I configured with --without-tm and can successfully run

Re: [OMPI devel] GPUDirect v1 issues

2012-01-23 Thread Sebastian Rinke
n the last half of 2010, I can say it is a pain for > the 9 - 14% gain in efficiency we saw. > > Ken > > On Fri, 2012-01-20 at 18:20 +0100, Sebastian Rinke wrote: >> >> With >> >> >> * MLNX OFED stack tailored for GPUDirect >> * RHEL + kern

Re: [OMPI devel] GPUDirect v1 issues

2012-01-20 Thread Sebastian Rinke
> set CUDA_NIC_INTEROP=1 > > > From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On > Behalf Of Sebastian Rinke > Sent: Wednesday, January 18, 2012 8:15 AM > To: Open MPI Developers > Subject: Re: [OMPI devel] GPUDirect v1 issues > > Settin

Re: [OMPI devel] GPUDirect v1 issues

2012-01-18 Thread Sebastian Rinke
o longer necessary. > > Rolf > > > From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On > Behalf Of Sebastian Rinke > Sent: Tuesday, January 17, 2012 5:22 PM > To: Open MPI Developers > Subject: Re: [OMPI devel] GPUDirect v1 issues > > Th

Re: [OMPI devel] GPUDirect v1 issues

2012-01-17 Thread Sebastian Rinke
pen-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Sebastian Rinke Sent: Tuesday, January 17, 2012 12:08 PM To: Open MPI Developers Subject: Re: [OMPI devel] GPUDirect v1 issues I use CUDA 4.0 with MVAPICH2 1.5.1p1 and Open MPI 1.4.2. Attached you find a little test case which is based on

Re: [OMPI devel] GPUDirect v1 issues

2012-01-17 Thread Sebastian Rinke
of any issues. Can you send me a test program and I can try > it out? > Which version of CUDA are you using? > > Rolf > >> -Original Message- >> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] >> On Behalf Of Sebastian Rinke >>

[OMPI devel] GPUDirect v1 issues

2012-01-17 Thread Sebastian Rinke
Dear all, I'm using GPUDirect v1 with Open MPI 1.4.3 and see blocking MPI_SEND/RECV calls hang forever. For two subsequent MPI_RECV calls, it hangs if the recv buffer pointer of the second recv points somewhere inside the recv buffer, i.e. not at its beginning (previously allocated with cudaMal

[OMPI devel] RDMA with non-contiguous payload

2012-01-04 Thread Sebastian Rinke
Dear all, Playing around with GPUDirect v1 and InfiniBand, I noticed that once the payload is non-contiguous, no RDMA is used at all. Can anybody confirm this? I'm using Open MPI 1.4.3. If the above is true, has this behavior changed with later versions of Open MPI? Thanks a lot. Best, Sebasti

Re: [OMPI devel] openib error for message size 1.5 GB

2011-06-07 Thread Sebastian Rinke
Worked. Thanks a lot! On Jun 7, 2011, at 6:43 AM, Mike Dubman wrote: > > Please try with "--mca mpi_leave_pinned 0" > > On Mon, Jun 6, 2011 at 4:16 PM, Sebastian Rinke wrote: > Dear all, > > While trying to send a message of size 1610612736 B (1.5 GB

[OMPI devel] openib error for message size 1.5 GB

2011-06-06 Thread Sebastian Rinke
Dear all, While trying to send a message of size 1610612736 B (1.5 GB), I get the following error: [[52363,1],1][../../../../../../ompi/mca/btl/openib/btl_openib_component.c:2951:handle_wc] from grsacc20 to: grsacc19 error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id

[OMPI devel] PML csum: checksum for RDMA transfers?

2010-01-19 Thread Sebastian Rinke
Hi, I'm using the csum PML to detect errors in data transfers. Regarding RDMA transfers (in the pipeline protocol, for instance), is error checking enabled there as well? TIA Sebastian

[OMPI devel] Data correctness checks in PML

2010-01-07 Thread Sebastian Rinke
Dear all, I'm looking for a way to make Open MPI check the correctness of data in message transfers. That is, transmission errors in the received data should be detected and reported. Is there a way to activate this checking? Thanks a lot. Sebastian
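
For illustration only: one application-level way to detect corrupted data, independent of anything Open MPI itself provides, is to append a checksum to each payload before sending and verify it after receiving. The sketch below uses Python's standard `zlib.crc32`. The helper names `frame` and `unframe` are made up for this example, no MPI calls are involved, and it says nothing about how any Open MPI component implements checksumming.

```python
import struct
import zlib

# Hypothetical helpers for illustration; not part of Open MPI or any MPI binding.


def frame(payload: bytes) -> bytes:
    """Return the payload followed by its CRC32 as a 4-byte big-endian trailer."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + struct.pack(">I", crc)


def unframe(message: bytes) -> bytes:
    """Check the CRC32 trailer and return the payload, or raise ValueError."""
    if len(message) < 4:
        raise ValueError("message too short to contain a checksum")
    payload, trailer = message[:-4], message[-4:]
    (expected,) = struct.unpack(">I", trailer)
    if zlib.crc32(payload) & 0xFFFFFFFF != expected:
        raise ValueError("checksum mismatch: data corrupted in transfer")
    return payload
```

In such a scheme the sender would transmit `frame(buf)` and the receiver would call `unframe()` on whatever arrives. A CRC32 catches all single-bit errors and most burst errors, but it is not a defense against deliberate tampering.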

[OMPI devel] ob1 question

2009-07-24 Thread Sebastian Rinke
Hello, Testing a new BTL component I get SIGSEGV in mca_pml_ob1_recv_request_progress_frag(). I found that recvreq points to an unmapped memory location. As far as I understand recvreq is taken directly from the PML header of the message received? To better understand the message flow, could yo

Re: [OMPI devel] BTL receive callback

2009-07-23 Thread Sebastian Rinke
I am curious if you are indeed using a new interconnect (new hardware and protocol) or if it is requirements of the 3D-torus network that are not addressed by the openib btl that are driving the need for a new btl? It is the first one. Sebastian. On 07/21/09 11:55, Sebastian Rinke

Re: [OMPI devel] BTL receive callback

2009-07-21 Thread Sebastian Rinke
ompi_convertor_pack and prepare_src please? george. On Jul 21, 2009, at 11:55 , Sebastian Rinke wrote: Hello, I am developing a new BTL component (Open MPI v1.3.2) for a new 3D-torus interconnect. During a simple message transfer of 16362 B between two nodes with MPI_Send(), MPI_Recv() I e

[OMPI devel] BTL receive callback

2009-07-21 Thread Sebastian Rinke
e diving into the ob1 code, could you tell me under which conditions this cb calls free() instead of send() so that I can get an idea of where to look for errors in my BTL component. Thank you very much in advance. Sebastian Rinke