I am not saying that remaining_len is correct or that mpich is bugless :-) I am only saying that decrementing remaining_len as proposed breaks the pmi2 protocol as since the client sends a wrong length to the server. We actually applied the very same fix in the past and then had to undo it commit: 1e25eb10df0f.

Funny enough Ralph was one of those that reported that broken protocol bug :-) "mpi/pmi2: no value for key %s in req".

We can either use sprintf() as there is no need to use snprintf() here or do snprintf(c, remaining_len - PMII_COMMANDLEN_SIZE, "thrid=%p;", resp);

On 10/03/2013 09:50 AM, Ralph Castain wrote:

I'm afraid that isn't quite correct, David. If you walk thru the code, you'll 
see that the snprintf is being given an incorrect length for the buffer. When 
using some libc versions, snprintf automatically fills the given buffer with 
NULLs to ensure that the string is NULL-terminated. When it does so, the code 
as currently written causes a segfault.

We have tested the provided change and it fixes the problem. If your code review 
indicates that remaining_len should not be changed where we did it, then you need to at 
least change the snprintf command to reflect the reduced size of the "c" buffer.

On Oct 3, 2013, at 9:44 AM, David Bigagli <da...@schedmd.com> wrote:



Hi,
   I don't know the details of the segfault but the code in question is 
correct. If you decrease the length then the file cmdlen:

cmdlen = PMII_MAX_COMMAND_LEN - remaining_len;

will not be correct and wrong length will be sent to the pmi2 server.

This code is taken verbatim from mpich2-1.5, precisely from: 
mpich2-1.5/src/pmi/pmi2/simple2pmi.c

On 10/03/2013 08:47 AM, Ralph Castain wrote:

Hi folks

We have uncovered a segfault-causing bug in contribs/pmi2/pmi2_api.c. Looking 
at the code in the current trunk, you need to add the following at line 1499:

     remaining_len -= PMII_COMMANDLEN_SIZE;

as you have changed the location of the pointer to the buffer, but failed to 
account for that change in the size param being later passed to snprintf.

Could you please fix this and include it in 2.6.3?

Thanks
Ralph


--

Thanks,
      /David

--

Thanks,
      /David

Reply via email to