should be fixed.
thanks
On Tue, May 13, 2014 at 2:53 AM, Joshua Ladd wrote:
> Yes. Will look into it.
>
> Josh
>
>
> On Mon, May 12, 2014 at 6:01 PM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
>
>> Ah; I guess the tags aren't getting pulled over.
>>
>> Mellanox -- can you check into
Hmm. The last tag I see on GitHub is still 1.7.2.
On May 13, 2014, at 2:11 AM, Mike Dubman wrote:
> should be fixed.
> thanks
>
>
> On Tue, May 13, 2014 at 2:53 AM, Joshua Ladd wrote:
> Yes. Will look into it.
>
> Josh
>
>
> On Mon, May 12, 2014 at 6:01 PM, Jeff Squyres (jsquyres)
> wr
I see a v1.8.1, but no v1.8.0. Is that correct?
Andrew
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff
> Squyres (jsquyres)
> Sent: Tuesday, May 13, 2014 3:15 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] OMPI v1.8.x git tags?
>
> Hmm.
I think Mellanox is still working on it.
On May 13, 2014, at 10:57 AM, "Friedley, Andrew"
wrote:
> I see a v1.8.1, but no v1.8.0. Is that correct?
>
> Andrew
>
>> -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff
>> Squyres (jsquyres)
>> Sent: Tu
Open MPI 1.6:
- Release was waiting on
https://svn.open-mpi.org/trac/ompi/ticket/3079, but during the meeting we decided it
was not necessary. Therefore, Jeff will go ahead and roll Open MPI 1.6.6 RC1.
Open MPI 1.8:
- Several tickets have been applied. Some discussion about other
I notice that BTLs are not checking the return value from ompi_modex_recv() for
OPAL_ERR_DATA_VALUE_NOT_FOUND (indicating that the peer process didn't put that
modex key). In the BTL context, NOT_FOUND means that that peer process doesn't
have this BTL, so this local peer process should probabl
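As a minimal sketch of the check described above: the helper name and return-code values below are hypothetical stand-ins (the real ompi_modex_recv() signature and the OPAL_ERR_DATA_VALUE_NOT_FOUND value come from the Open MPI headers and are not reproduced here). The message is cut off before it says what the local process should do, so treating NOT_FOUND as "skip this peer" is an assumption.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real OPAL/OMPI return codes. */
enum {
    SKETCH_SUCCESS = 0,
    SKETCH_ERR_DATA_VALUE_NOT_FOUND = -1,
    SKETCH_ERR_OTHER = -2
};

/* Hypothetical stub standing in for ompi_modex_recv(): in this demo,
 * peers with an odd rank "did not publish" the modex key. */
static int sketch_modex_recv(int peer_rank, const void **buf, size_t *size)
{
    static const char blob[] = "endpoint-address";
    if (peer_rank % 2 != 0) {
        return SKETCH_ERR_DATA_VALUE_NOT_FOUND;
    }
    *buf = blob;
    *size = sizeof(blob);
    return SKETCH_SUCCESS;
}

/* Returns 1 if this BTL can reach the peer, 0 if the peer simply does not
 * have this BTL (assumed: skip it), or a negative code for real failures. */
static int sketch_btl_add_peer(int peer_rank)
{
    const void *buf = NULL;
    size_t size = 0;
    int rc = sketch_modex_recv(peer_rank, &buf, &size);

    if (SKETCH_ERR_DATA_VALUE_NOT_FOUND == rc) {
        /* The peer never put this key: it is not running this BTL. */
        return 0;
    }
    if (SKETCH_SUCCESS != rc) {
        return rc;  /* a genuine modex failure: propagate it */
    }
    /* ... a real BTL would decode buf/size into endpoint info here ... */
    return 1;
}
```

The point of the separate branch is that NOT_FOUND is an expected, non-fatal answer in the BTL context, while any other non-success code is still reported as an error.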
Now that the 1.8 series is out, we're going to do one final release in the
1.6.x series, just so that the few bug fixes that came in after 1.6.5 can get
out into the world (for those who are unable to upgrade to the v1.8 series).
1.6.6rc1 has been posted:
http://www.open-mpi.org/software/om
While tracking down memory leaks in components I ran into an interesting
issue. osc/rdma uses an opal_free_list_t (not an ompi_free_list_t) for
buffer fragments. The fragment class allocates a buffer in the
constructor and frees the buffer in the destructor. The problem is that
the item con
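The message is truncated before it states the actual problem, so the following is only a generic C sketch of the pattern it describes, not the opal_free_list_t code: when items are carved out of one allocation and each item's constructor mallocs its own buffer, the per-item destructors have to run before the block is freed, or every buffer leaks. All names here are hypothetical.

```c
#include <stdlib.h>

#define SKETCH_FRAG_BUF_SIZE 256  /* hypothetical per-fragment buffer size */

/* Hypothetical fragment: the constructor allocates a buffer that only the
 * destructor releases. */
typedef struct {
    void *buffer;
} sketch_frag_t;

static int live_buffers = 0;  /* buffers allocated but not yet freed */

static void sketch_frag_construct(sketch_frag_t *frag)
{
    frag->buffer = malloc(SKETCH_FRAG_BUF_SIZE);
    if (frag->buffer != NULL) {
        live_buffers++;
    }
}

static void sketch_frag_destruct(sketch_frag_t *frag)
{
    if (frag->buffer != NULL) {
        free(frag->buffer);
        frag->buffer = NULL;
        live_buffers--;
    }
}

/* Grow the list: carve n items out of one contiguous block and construct each. */
static sketch_frag_t *sketch_list_grow(size_t n)
{
    sketch_frag_t *block = malloc(n * sizeof(*block));
    if (block == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        sketch_frag_construct(&block[i]);
    }
    return block;
}

/* Release the block. Skipping the per-item destructors frees the items'
 * storage but leaks every buffer their constructors allocated. */
static void sketch_list_release(sketch_frag_t *block, size_t n, int run_destructors)
{
    if (run_destructors) {
        for (size_t i = 0; i < n; i++) {
            sketch_frag_destruct(&block[i]);
        }
    }
    free(block);
}
```

Calling sketch_list_release() with run_destructors set to 0 leaves live_buffers above zero, which is the kind of leak a memory checker would then report.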
Folks,
I would like to comment on r31738:
> There is no reason to cancel the listening thread. It should die
> automatically when the file descriptor is closed.
I could not agree more.
> It is sufficient to just wait for the thread to exit with pthread join.
Unfortunately, at least in my test envi
It could be a bug in the software stack, though I wouldn't count on it.
Unfortunately, pthread_cancel is known to have bad side effects, and so we
avoid its use.
The key here is that the thread must detect that the file descriptor has closed
and exit, or use some other method for detecting that
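One commonly used "other method" is a self-pipe: the listener polls both its descriptor and the read end of a private pipe, and shutdown is a one-byte write followed by pthread_join(). The sketch below uses plain POSIX poll() rather than scif_poll(), whose behavior is not covered here, and its names are hypothetical. It also avoids relying on close() to wake the thread; the Linux close(2) man page notes that a system call already blocked on a descriptor can keep it open until that call completes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical listener state: the descriptor being served plus a private
 * "self-pipe" whose read end wakes the thread when shutdown is requested. */
typedef struct {
    int data_fd;        /* descriptor the listener actually serves */
    int wake_pipe[2];   /* [0] is polled by the thread, [1] is written to stop it */
    int events_handled; /* number of reads serviced on data_fd */
} sketch_listener_t;

static void *sketch_listener_main(void *arg)
{
    sketch_listener_t *l = arg;
    struct pollfd fds[2] = {
        { .fd = l->data_fd,      .events = POLLIN },
        { .fd = l->wake_pipe[0], .events = POLLIN },
    };

    for (;;) {
        /* An infinite timeout is fine: shutdown arrives as an event. */
        int n = poll(fds, 2, -1);
        if (n < 0) {
            if (errno == EINTR) {
                continue;               /* interrupted by a signal: retry */
            }
            break;                      /* unexpected error: give up */
        }
        if (fds[1].revents != 0) {
            break;                      /* shutdown requested */
        }
        if (fds[0].revents != 0) {
            char buf[64];
            if (read(l->data_fd, buf, sizeof(buf)) <= 0) {
                break;                  /* EOF or error on data_fd */
            }
            l->events_handled++;
        }
    }
    return NULL;
}

/* Ask the listener to exit, then wait for it; pthread_cancel() is not used. */
static int sketch_listener_stop(sketch_listener_t *l, pthread_t tid)
{
    char c = 'x';
    if (write(l->wake_pipe[1], &c, 1) != 1) {
        return -1;
    }
    return pthread_join(tid, NULL);
}
```

The design choice is that the thread only ever leaves from the top of its loop, so nothing is torn down mid-call the way it can be with pthread_cancel().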
I heard multiple references to pthread_cancel being known to have bad
side effects. Can somebody educate me on this topic, please?
Thanks,
George.
On Tue, May 13, 2014 at 10:25 PM, Ralph Castain wrote:
> It could be a bug in the software stack, though I wouldn't count on it.
> Unfortunat
Ralph,
scif_poll(...) is called with an infinite timeout.
A quick fix would be to use a finite timeout (1s? 10s? more?).
The obvious drawback is that the thread has to wake up every xxx seconds, and that would be for
nothing 99.9% of the time.
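The finite-timeout alternative can be sketched like this, with POSIX poll() standing in for scif_poll() and a hypothetical 100 ms period: the thread re-checks an atomic stop flag each time the timeout expires, so the owner sets the flag and joins. The idle_wakeups counter makes the idle cost mentioned above visible.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical wake-up period: shorter means faster shutdown but more idle
 * wake-ups. The message suggests 1 s or more; 100 ms keeps the demo quick. */
#define SKETCH_POLL_TIMEOUT_MS 100

typedef struct {
    int fd;            /* descriptor being watched */
    atomic_bool stop;  /* set by the owning thread to request shutdown */
    int idle_wakeups;  /* timeouts that found nothing to do */
} sketch_poller_t;

static void *sketch_poller_main(void *arg)
{
    sketch_poller_t *p = arg;
    struct pollfd pfd = { .fd = p->fd, .events = POLLIN };

    while (!atomic_load(&p->stop)) {
        int n = poll(&pfd, 1, SKETCH_POLL_TIMEOUT_MS);
        if (n == 0) {
            p->idle_wakeups++;  /* timed out: loop back and re-check stop */
            continue;
        }
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            break;
        }
        char buf[64];
        if (read(p->fd, buf, sizeof(buf)) <= 0) {
            break;              /* EOF or error */
        }
        /* ... a real listener would process buf here ... */
    }
    return NULL;
}
```

Shutdown latency is bounded by one timeout period, which is the trade-off against the self-pipe approach, where shutdown is immediate but needs an extra descriptor.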
My analysis (see #4615) is that the crash occurs when the btl
George,
Just my USD0.02:
With pthreads, many system calls (mostly those that might block) become
"cancellation points", where the implementation checks whether the calling thread
has been cancelled.
This means that a thread making any of those calls may simply never return
(calling pthread_exit() intern
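A small demonstration of the point above, using only standard pthreads calls: read() on an empty pipe is a cancellation point, so after pthread_cancel() the thread never returns from read(); only handlers registered with pthread_cleanup_push() run before it exits. The variable and function names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <unistd.h>

static int cleanup_ran = 0;         /* set by the cleanup handler */
static int returned_from_read = 0;  /* stays 0 if cancelled inside read() */

/* Cleanup handler: the only code of ours that runs during cancellation. */
static void sketch_cleanup(void *arg)
{
    (void)arg;
    cleanup_ran = 1;
}

static void *sketch_blocked_reader(void *arg)
{
    int fd = *(int *)arg;
    char c;

    pthread_cleanup_push(sketch_cleanup, NULL);
    /* read() is a cancellation point: if the thread is cancelled while
     * blocked here, read() never returns and the next line never runs. */
    ssize_t n = read(fd, &c, 1);
    (void)n;
    returned_from_read = 1;
    pthread_cleanup_pop(0);
    return NULL;
}
```

After cancelling this thread, pthread_join() reports PTHREAD_CANCELED, cleanup_ran is 1, and returned_from_read is still 0: any state the thread meant to restore after read() is simply never restored.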
As I said, this isn't the only thread that faces this issue, and we have
resolved it elsewhere - surely we can resolve it here as well in an acceptable
manner.
Nathan?
On May 13, 2014, at 7:33 PM, Gilles Gouaillardet
wrote:
> Ralph,
>
> scif_poll(...) is called with an infinite timeout.
>
+1 - seen it before, and you'll find warnings across many software sites about
this problem. Easy to have the main program segfault by touching the wrong
thing after a cancel unless all the stars are properly aligned in the various
libraries.
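As a sketch of what "all the stars aligned" requires: POSIX specifies that a thread cancelled inside pthread_cond_wait() re-acquires the mutex before its cleanup handlers run, so unless a handler unlocks it, the mutex stays locked and the next thread that tries to lock it blocks forever. The example below registers that handler; the names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sketch_cond = PTHREAD_COND_INITIALIZER;
static int sketch_ready = 0;  /* predicate; never set, so the waiter blocks */

/* Cleanup handler: releases the mutex the cancelled thread still holds. */
static void sketch_unlock(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *sketch_waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&sketch_lock);
    pthread_cleanup_push(sketch_unlock, &sketch_lock);
    while (!sketch_ready) {
        /* A cancellation point: if cancelled here, the mutex is re-acquired
         * before cleanup handlers run, so sketch_unlock() must release it. */
        pthread_cond_wait(&sketch_cond, &sketch_lock);
    }
    pthread_cleanup_pop(1);  /* normal exit path: run the same unlock */
    return NULL;
}
```

Every library the cancelled thread passes through needs this kind of handler for every lock it can hold at a cancellation point, which is why cancellation is fragile in practice.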
On May 13, 2014, at 7:56 PM, Paul Hargrove wrote: