Re: [OMPI devel] [bug/fix] correction of a small bug in conf file parsing

2013-06-21 Piotr Lesnicki
Hi, 1) You are right: an EOF in the middle of a section must be handled explicitly; otherwise parsing ends with an unrelated error later, just as it currently does. By the way, here it ends with [no-options-error], which has no corresponding message in 'help-opal-wrapper.txt'. I joined t
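Point 1) can be illustrated with a minimal sketch of handling EOF explicitly. It is written as plain C rather than as the actual lex rules, and it assumes a hypothetical format in which a section opens with a `[begin]` line and closes with an `[end]` line. `check_sections` and the `PARSE_*` codes are illustrative names, not part of Open MPI. In a flex scanner the equivalent hook would be an `<<EOF>>` rule attached to the section's start condition.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result codes for illustration only; not Open MPI's. */
enum parse_result { PARSE_OK = 0, PARSE_EOF_IN_SECTION = 1 };

/* Hypothetical format: "[begin]" opens a section and "[end]" closes it.
 * Scans the text line by line and, on reaching EOF, reports explicitly
 * whether a section was still open, instead of letting a later stage
 * fail with an unrelated error. */
static enum parse_result check_sections(const char *text)
{
    int in_section = 0;
    const char *line = text;

    assert(text != NULL);
    while (*line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);

        if (len == 7 && strncmp(line, "[begin]", 7) == 0) {
            in_section = 1;
        } else if (len == 5 && strncmp(line, "[end]", 5) == 0) {
            in_section = 0;
        }
        line += len;          /* move to the '\n' or the terminating '\0' */
        if (*line == '\n') {
            line++;           /* skip the newline itself */
        }
    }
    /* EOF reached: an open section is a distinct, reportable error. */
    return in_section ? PARSE_EOF_IN_SECTION : PARSE_OK;
}
```

Returning a dedicated code at EOF lets the caller print a specific help message rather than falling through to something like [no-options-error].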

Re: [OMPI devel] [bug/fix] correction of a small bug in conf file parsing

2013-06-21 Jeff Squyres (jsquyres)
On Jun 21, 2013, at 7:40 AM, Piotr Lesnicki wrote: > 1) you are right, an eof in the middle of a section must be > handled explicitly, otherwise it ends by an unrelated error later, > just as it currently does. By the way, here it ends with > [no-options-error] which has no corresponding m

Re: [OMPI devel] [bug/fix] correction of a small bug in conf file parsing

2013-06-21 Piotr Lesnicki
On 21/06/2013 15:03, Jeff Squyres (jsquyres) wrote: Can you submit a combined patch when ready? OK, I'll make a combined lex patch.

Re: [OMPI devel] Problem when using struct types at specific offsets

2013-06-21 Thomas Jahns
Hello, On 04/08/2013 04:08 PM, Thomas Jahns wrote: > a colleague of mine has investigated a difficult problem we traced to OpenMPI > giving incorrectly delivered data on some struct datatypes which use specific > offsets (on the stack in our case but the problem can be reproduced when using > spec

[OMPI devel] RGET issue when send is less than receive

2013-06-21 Rolf vandeVaart
I ran into a hang in a test in which the sender sends less data than the receiver is expecting. For example, the following shows the receiver expecting twice what the sender is sending. Rank 0: MPI_Send(buf, BUFSIZE, MPI_INT, 1, 99, MPI_COMM_WORLD) Rank 1: MPI_Recv(buf, BUFSIZE*2, MPI_INT, 0,
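Under MPI semantics, Rolf's test is legal: a receive may post a larger buffer than the matching message, and it completes once the sender's BUFSIZE ints have arrived (a message larger than the posted buffer would instead be a truncation error). The sketch below is a hypothetical model, not Open MPI code, of one way a hang like this could arise: keying completion to the size the receiver posted rather than the size the sender sent. `struct recv_model` and both check functions are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a receive request; field names are illustrative
 * and do not correspond to Open MPI's actual structures. */
struct recv_model {
    size_t posted_bytes;   /* capacity the receiver posted (e.g. BUFSIZE*2 ints) */
    size_t message_bytes;  /* bytes the sender actually sent (e.g. BUFSIZE ints) */
    size_t received_bytes; /* bytes transferred so far */
};

/* Problematic check: waits for the posted size, so a shorter message
 * never satisfies it and the receive hangs. */
static int complete_if_posted_size(const struct recv_model *r)
{
    return r->received_bytes >= r->posted_bytes;
}

/* Check consistent with MPI semantics: the message may be shorter than
 * the buffer, so completion depends on how many bytes were sent. */
static int complete_if_message_size(const struct recv_model *r)
{
    /* A message longer than the buffer would be MPI_ERR_TRUNCATE instead. */
    assert(r->message_bytes <= r->posted_bytes);
    return r->received_bytes >= r->message_bytes;
}
```

With the numbers from the test (sender BUFSIZE ints, receiver BUFSIZE*2 ints), all sent bytes arrive, yet only the second check reports completion.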

Re: [OMPI devel] Problem when using struct types at specific offsets

2013-06-21 George Bosilca
Thomas, I'm not aware of any other issue with the datatypes. There might be an easy way to see what the issue with your application is. If you can debug your application and know exactly which datatype has problems, then attach with gdb and call ompi_datatype_dump(type), where type is the data

Re: [OMPI devel] RGET issue when send is less than receive

2013-06-21 Nathan Hjelm
I thought I fixed this problem a while back (though looking at the code, it's possible I never committed the fix). I will have to look through my local repository and see what happened to that fix. Your fix might not work correctly, since an RGET can be broken up into multiple get operations. It may

Re: [OMPI devel] RGET issue when send is less than receive

2013-06-21 George Bosilca
The number of bytes received is atomically updated in the completion callback, and the completion test is clearly spelled out in the recv_request_pml_complete_check function (minus the lock part, of course). Rolf, I think your patch is correct. That being said, req_bytes_expected is a special val
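George's point about the atomic update, together with Nathan's concern that an RGET may be split into several get operations, can be sketched as a hypothetical C11 model. This is not the ob1 code: `struct rget_request`, `rget_request_init`, and `on_get_complete` are illustrative names, and `bytes_expected` only loosely mirrors req_bytes_expected. Each fragment's completion callback adds its byte count atomically, and the callback that brings the total to the expected count completes the request, so the same logic covers one get or many.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical receive request for illustration; not Open MPI's type. */
struct rget_request {
    size_t bytes_expected;        /* bytes the sender actually sent */
    atomic_size_t bytes_received; /* updated by every get-completion callback */
    atomic_int completed;         /* set once all expected bytes have arrived */
};

static void rget_request_init(struct rget_request *req, size_t expected)
{
    req->bytes_expected = expected;
    atomic_init(&req->bytes_received, 0);
    atomic_init(&req->completed, 0);
}

/* Called once per completed get fragment, possibly from different threads.
 * atomic_fetch_add returns the previous total; since the total only grows
 * and must land exactly on bytes_expected, only one callback observes the
 * transition and marks the request complete. */
static void on_get_complete(struct rget_request *req, size_t frag_bytes)
{
    size_t before = atomic_fetch_add(&req->bytes_received, frag_bytes);
    size_t after = before + frag_bytes;

    assert(after <= req->bytes_expected); /* fragments must not overshoot */
    if (before < req->bytes_expected && after == req->bytes_expected) {
        atomic_store(&req->completed, 1);
    }
}
```

No lock is needed to decide which callback completes the request, but the check is only correct if bytes_expected holds the size actually sent, which is exactly what matters when the sender sends less than the receiver posted.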

Re: [OMPI devel] RGET issue when send is less than receive

2013-06-21 Nathan Hjelm
Found my original fix (still don't know why I never pushed it), and I think George is correct. This should work in both the single and multiple get cases. -Nathan

Re: [OMPI devel] RGET issue when send is less than receive

2013-06-21 Jeff Squyres (jsquyres)
Does this need to go to v1.6?

Re: [OMPI devel] RGET issue when send is less than receive

2013-06-21 Nathan Hjelm
I don't think so. The Mellanox change that caused this issue should not be in 1.6. -Nathan