I am not saying that remaining_len is correct or that mpich is bugless
:-) I am only saying that decrementing remaining_len as proposed breaks
the pmi2 protocol as since the client sends a wrong length to the
server. We actually applied the very same fix in the past and then had
to undo it commit: 1e25eb10df0f.
Funny enough Ralph was one of those that reported that broken protocol
bug :-) "mpi/pmi2: no value for key %s in req".
We can either use sprintf() as there is no need to use snprintf() here
or do snprintf(c, remaining_len - PMII_COMMANDLEN_SIZE, "thrid=%p;", resp);
On 10/03/2013 09:50 AM, Ralph Castain wrote:
I'm afraid that isn't quite correct, David. If you walk thru the code, you'll
see that the snprintf is being given an incorrect length for the buffer. When
using some libc versions, snprintf automatically fills the given buffer with
NULLs to ensure that the string is NULL-terminated. When it does so, the code
as currently written causes a segfault.
We have tested the provided change and it fixes the problem. If your code review
indicates that remaining_len should not be changed where we did it, then you need to at
least change the snprintf command to reflect the reduced size of the "c" buffer.
On Oct 3, 2013, at 9:44 AM, David Bigagli <da...@schedmd.com> wrote:
Hi,
I don't know the details of the segfault but the code in question is
correct. If you decrease the length then the file cmdlen:
cmdlen = PMII_MAX_COMMAND_LEN - remaining_len;
will not be correct and wrong length will be sent to the pmi2 server.
This code is taken verbatim from mpich2-1.5, precisely from:
mpich2-1.5/src/pmi/pmi2/simple2pmi.c
On 10/03/2013 08:47 AM, Ralph Castain wrote:
Hi folks
We have uncovered a segfault-causing bug in contribs/pmi2/pmi2_api.c. Looking
at the code in the current trunk, you need to add the following at line 1499:
remaining_len -= PMII_COMMANDLEN_SIZE;
as you have changed the location of the pointer to the buffer, but failed to
account for that change in the size param being later passed to snprintf.
Could you please fix this and include it in 2.6.3?
Thanks
Ralph
--
Thanks,
/David
--
Thanks,
/David