Eugene Loh wrote:
Ralph Castain wrote:
Hi Bryan
I have seen similar issues on LANL clusters when message sizes were
fairly large. How big are your buffers when you call Allreduce? Can
you send us your Allreduce call params (e.g., the reduce operation,
datatype, num elements)?
If you do
Ralph Castain wrote:
Hi Bryan
I have seen similar issues on LANL clusters when message sizes were
fairly large. How big are your buffers when you call Allreduce? Can
you send us your Allreduce call params (e.g., the reduce operation,
datatype, num elements)?
If you don't want to send th
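For context, the parameters being asked about are just the arguments to MPI_Allreduce itself. Below is a minimal sketch; the count, datatype, and reduce operation are invented purely for illustration and are exactly the values Bryan is being asked to report.

    /* Illustrative only: the buffer size, datatype and operation are made up. */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int count = 1048576;                       /* num elements */
        double *sendbuf = malloc(count * sizeof(double));
        double *recvbuf = malloc(count * sizeof(double));
        for (int i = 0; i < count; i++) {
            sendbuf[i] = 1.0;
        }

        /* datatype = MPI_DOUBLE, reduce operation = MPI_SUM */
        MPI_Allreduce(sendbuf, recvbuf, count, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        free(sendbuf);
        free(recvbuf);
        MPI_Finalize();
        return 0;
    }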
Dear Paul & all,
On Monday 18 May 2009 03:19:48 pm Paul H. Hargrove wrote:
> IMHO there are two distinct issues being entangled here.
> 1) Flagging deprecated functionality
> 2) Informing the user about a change of compiler (possibly as an
> #error or #warning)
>
> I understand why solving #1 r
IMHO there are two distinct issues being entangled here.
1) Flagging deprecated functionality
2) Informing the user about a change of compiler (possibly as an
#error or #warning)
I understand why solving #1 requires detecting the compiler change to
avoid a "bad attribute" (see BACKGROUND, be
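To make issue #1 concrete, the construct under discussion is roughly a deprecation marker in mpi.h that is enabled only when the compiler including the header is known to accept the attribute. The sketch below is illustrative, not the actual Open MPI implementation; the OMPI_EXAMPLE_DEPRECATED macro name and the GCC version test are assumptions.

    /* Sketch only: enable __attribute__((deprecated)) when the including
     * compiler claims GCC >= 3.1 support; otherwise expand to nothing so a
     * different compiler never sees a "bad attribute". */
    #if defined(__GNUC__) && (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1))
    #  define OMPI_EXAMPLE_DEPRECATED __attribute__((__deprecated__))
    #else
    #  define OMPI_EXAMPLE_DEPRECATED
    #endif

    /* MPI_Type_hvector was deprecated in favour of MPI_Type_create_hvector,
     * so a prototype like this (as it would appear in mpi.h) warns at the
     * user's compile time. */
    OMPI_EXAMPLE_DEPRECATED
    int MPI_Type_hvector(int count, int blocklength, MPI_Aint stride,
                         MPI_Datatype oldtype, MPI_Datatype *newtype);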
It's certainly helped, and it now runs for me; however, if I run mpirun
under valgrind and then ompi-ps in another window, Valgrind reports errors
and ompi-ps doesn't list the job, so there is clearly something still amiss.
I'm trying to do some more diagnostics now.
==32362== Syscall param writev(vector
Aha! Thanks for spotting the problem - I had to move that var init to
cover all cases, but it should be working now with r21249
On May 18, 2009, at 8:08 AM, Ashley Pittman wrote:
Ralph,
This patch fixed it; num_nodes was being used uninitialised and hence the
client was getting a bogus v
Bizarre - it works perfectly for me. Is it possible you have stale
libraries around? Or are you attempting to connect to older versions of
mpirun?
You might also try cleaning out any old session dirs just to be safe -
my best guess is that you are connecting to an older version of mpirun
and
Ralph,
This patch fixed it; num_nodes was being used uninitialised and hence the
client was getting a bogus value for the number of nodes.
Ashley,
On Mon, 2009-05-18 at 10:09 +0100, Ashley Pittman wrote:
> No joy, I'm afraid; now I get errors when I run it. This is a single
> node job run with t
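For anyone following along, here is a minimal sketch (not Open MPI code; the struct and field names are invented) of the class of bug described above: a field is never initialised, gets written down a descriptor, and valgrind flags the writev call while the reader sees a bogus value.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/uio.h>
    #include <unistd.h>

    struct job_info {
        int num_nodes;              /* never set -- the bug */
        int num_procs;
    };

    int main(void)
    {
        struct job_info *info = malloc(sizeof(*info));
        info->num_procs = 3;        /* num_nodes is left uninitialised */

        struct iovec iov;
        iov.iov_base = info;
        iov.iov_len  = sizeof(*info);

        /* valgrind reports "Syscall param writev(vector[0]) points to
         * uninitialised byte(s)"; the receiver then reads garbage. */
        if (writev(STDOUT_FILENO, &iov, 1) < 0) {
            perror("writev");
        }

        free(info);
        return 0;
    }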
Agreed. Being able to handle such scenarios properly is one of the
reasons that Rainer and I are iterating on this in a mercurial branch.
On May 18, 2009, at 7:39 AM, Brian Barrett wrote:
I think care must be taken on this front. While I know we don't like
to admit it, there is no reason t
I think care must be taken on this front. While I know we don't like
to admit it, there is no reason the C compilers have to match, and
indeed good reasons they might not. For example, at LANL, we
frequently compiled OMPI with GCC, then fixed up the wrapper compilers
to use icc or whatever,
No joy, I'm afraid; now I get errors when I run it. This is a single
node job run with the command line "mpirun -n 3 ./a.out". I've attached
the strace output and gzipped /tmp files from the machine. Valgrind on
the ompi-ps process doesn't show anything interesting.
[alpha:29942] [[35044,0],0]
What: Warn user about deprecated MPI functionality and "wrong" compiler usage
Why: Because deprecated MPI functions are ... deprecated
Where: On trunk
When: Apply on trunk before branching for v1.5 (it is user-visible)
Timeout: 1 week - May 26, 2009, after the teleconf.
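A rough sketch of the second half of the proposal (the "wrong" compiler warning). The OMPI_EXAMPLE_BUILT_WITH_GNUC macro is hypothetical; the idea is simply that configure records the build compiler and mpi.h compares it against the compiler currently including the header.

    /* Hypothetical macro: configure would record the build compiler here. */
    #define OMPI_EXAMPLE_BUILT_WITH_GNUC 1

    #if OMPI_EXAMPLE_BUILT_WITH_GNUC && !defined(__GNUC__)
    /* #warning is itself a compiler extension, so a real implementation
     * would need a portable fallback; this only sketches the intent.  Note
     * it is a warning, not an #error: as discussed above, a mismatched
     * compiler may still work perfectly well. */
    #  warning "mpi.h: Open MPI was built with a GNU-compatible C compiler, but this compiler is not GNU-compatible; attribute-based deprecation warnings are disabled."
    #endif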