To that point, where exactly in the openib BTL init / query sequence is it
returning an error for you, Sylvain? Is it just a matter of tidying something
up properly before returning the error?
On May 28, 2010, at 2:21 PM, George Bosilca wrote:
> On May 28, 2010, at 10:03 , Sylvain Jeaugey wro
On May 28, 2010, at 10:03 , Sylvain Jeaugey wrote:
> On Fri, 28 May 2010, Jeff Squyres wrote:
>
>> On May 28, 2010, at 9:32 AM, Jeff Squyres wrote:
>>
>>> Understood, and I agreed that the bug should be fixed. Patches would be
>>> welcome. :-)
> I sent a patch on the bml layer in my first e-m
On Fri, 28 May 2010, Jeff Squyres wrote:
On May 28, 2010, at 9:32 AM, Jeff Squyres wrote:
Understood, and I agreed that the bug should be fixed. Patches would
be welcome. :-)
I sent a patch on the bml layer in my first e-mail. We will apply it on
our tree, but as always we're trying to send
On May 28, 2010, at 9:32 AM, Jeff Squyres wrote:
>> So please, fix the bug first, then if you want that "automatic failover to
>> TCP" feature, develop it. Put a parameter for an eventual notification, or
>> abort, or whatever you want. But it doesn't exist today. It just doesn't
>> work, with any
On May 28, 2010, at 7:19 AM, Sylvain Jeaugey wrote:
> So please, fix the bug first, then if you want that "automatic failover to
> TCP" feature, develop it. Put a parameter for an eventual notification, or
> abort, or whatever you want. But it doesn't exist today. It just doesn't
> work, with any
On Fri, 28 May 2010, Jeff Squyres wrote:
Herein lies the quandary: we don't/can't know the user or sysadmin
intent. They may not care if the IB is borked -- they might just want
the job to fall over to TCP and continue. But they may care a lot if IB
is borked -- they might want the job to ab
On May 28, 2010, at 6:04 AM, Sylvain Jeaugey wrote:
> Having errors on add_procs stop the application seems a good thing in all
> cases, so why not do it? That would solve my original problem which led
> to this discussion.
>
> Yes, the openib BTL may be suboptimal (stopping the application ins
On Thu, 27 May 2010, Jeff Squyres wrote:
On May 27, 2010, at 10:32 AM, Sylvain Jeaugey wrote:
That's pretty much my first proposal: abort when an error arises,
because if we don't, we'll crash soon afterwards. That's my original
concern and this should really be fixed.
Now, if you want to
The 13th and 14th questions are as follows:
(13) Some messages are not shown even though the --mca snapc_base_verbose
parameter is used.
Framework : snapc
Component : full
The source file : orte/mca/snapc/base/snapc_base_open.c
The function name : orte_snapc_base_open
I think that the fo
Hi, Josh
>https://svn.open-mpi.org/trac/ompi/ticket/2397
Thank you very much for filing my questions in the ticket system.
Now I have 3 new questions and I will post them.
Regards,
Takayuki Seki
The 12th question is as follows:
(12) Checkpointing of an MPI job which uses two (or more?) openib btl modu