Paul,
the latest master nightly snapshot does include the fix, and I made PRs
for v2.x and v1.10
Cheers,
Gilles
On 9/28/2015 6:29 PM, Gilles Gouaillardet wrote:
Thanks Brice,
I will do the PR for the various ompi branches from tomorrow
Cheers,
Gilles
Brice Goglin wrote:
Sorry, I didn't see this report before the pull request.
I applied Gilles' "simple but arguable" fix to master and stable
branches up to v1.9. It could be too imperfect if somebody ever changes
the permissions of /devices/pci* but I guess that's not going to happen
in practice. Finding the right de
Paul and Brice,
the error message is displayed by libpciaccess when hwloc invokes
pci_system_init
on Solaris :
crw--- 1 root sys 182, 253 Sep 28 10:55 /devices/pci@0,0:reg
from libpciaccess
snprintf(nexus_path, sizeof(nexus_path), "/devices%s", nexus_name);
if ((fd = op
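To make the failure mode concrete: the message comes from opening a device node that only root can access. One way a caller could avoid triggering it is to probe the node before initializing PCI discovery. The sketch below only illustrates that idea; nexus_is_accessible and its prefix parameter are hypothetical names, and this is not necessarily what the fix applied to hwloc does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, not hwloc or libpciaccess code.
 * Builds "<prefix><nexus_name>" the same way the snippet above builds
 * "/devices%s" and reports whether the calling user may open it
 * for reading and writing. */
static bool nexus_is_accessible(const char *prefix, const char *nexus_name)
{
    char nexus_path[1024];
    int n;

    assert(prefix != NULL && nexus_name != NULL);

    n = snprintf(nexus_path, sizeof(nexus_path), "%s%s", prefix, nexus_name);
    if (n < 0 || (size_t)n >= sizeof(nexus_path)) {
        return false;           /* path did not fit, treat as not accessible */
    }

    /* access() checks against the real user ID, which is what matters
     * for a non-root user running mpirun. */
    return access(nexus_path, R_OK | W_OK) == 0;
}
```

A caller would skip PCI initialization when nexus_is_accessible("/devices", "/pci@0,0:reg") returns false, which avoids the "Permission denied" line at the cost of not collecting PCI information.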
FYI:
Things look fine today with last night's master tarball.
I hope Brice has a way to eliminate the hwloc warning, since I am sure I am
not the only one with scripts that will notice "Error" in the output.
-Paul
On Wed, Sep 23, 2015 at 6:08 PM, Ralph Castain wrote:
Aha! Thanks - just what the doctor ordered!
> On Sep 23, 2015, at 5:45 PM, Gilles Gouaillardet wrote:
Ralph,
the root cause is
getsockopt(..., SOL_SOCKET, SO_RCVTIMEO,...)
fails with errno ENOPROTOOPT on solaris 11.2
the attached patch is a proof of concept and works for me :
/* if ENOPROTOOPT, do not try to set and restore SO_RCVTIMEO */
Cheers,
Gilles
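For readers without the attachment, the idea in the comment above can be sketched as follows. This is an illustration only, not the attached patch: rcvtimeo_state_t, rcvtimeo_push and rcvtimeo_pop are hypothetical names, and the only behaviour taken from the thread is that ENOPROTOOPT from getsockopt(..., SOL_SOCKET, SO_RCVTIMEO, ...) means the timeout should be neither set nor restored.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical state holder: the previous timeout, if it could be read. */
typedef struct {
    struct timeval old;
    bool supported;     /* false when SO_RCVTIMEO was skipped */
} rcvtimeo_state_t;

/* Install a temporary receive timeout.
 * Returns 0 on success or when the option is unsupported, -1 otherwise. */
static int rcvtimeo_push(int sd, long seconds, rcvtimeo_state_t *st)
{
    struct timeval tv;
    socklen_t len;

    assert(st != NULL);
    memset(st, 0, sizeof(*st));
    len = sizeof(st->old);

    if (getsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, &st->old, &len) != 0) {
        /* if ENOPROTOOPT, do not try to set and restore SO_RCVTIMEO */
        return (errno == ENOPROTOOPT) ? 0 : -1;
    }

    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    if (setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) != 0) {
        return -1;
    }
    st->supported = true;
    return 0;
}

/* Restore the saved timeout; a no-op when the option was skipped. */
static int rcvtimeo_pop(int sd, const rcvtimeo_state_t *st)
{
    if (!st->supported) {
        return 0;
    }
    return setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, &st->old, sizeof(st->old));
}
```

The caller brackets its blocking receive with rcvtimeo_push and rcvtimeo_pop; on a platform that rejects the option, both calls succeed without touching the socket.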
On 9/21/2015 2:16 PM, Paul Hargrove wro
Ralph,
Just as you say:
The first 64s pause was before the hwloc error message appeared.
The second was after the second server_setup_fork appears, and before
whatever line came after that.
I don't know if stdio buffering may be "distorting" the placement of the
pause relative to the lines of outp
?? Just so this old fossilized brain gets this right: you are saying there was
a 64s pause before the hwloc error appeared, and then another 64s pause after
the second server_setup_fork message appeared?
If that’s true, then I’m chasing the wrong problem - it sounds like something
is messed up
Ralph,
Still failing with that patch, but with the addition of a fairly long pause
(64s) before the first error message appears, and again after the second
"server setup_fork" (64s again)
New output is attached.
-Paul
On Sun, Sep 20, 2015 at 2:15 PM, Ralph Castain wrote:
Argh - found a typo in the output line. Could you please try the attached patch
and do it again? This might fix it, but if not it will provide me with some idea
of the returned error.
Thanks
Ralph
paul.diff
Description: Binary data
On Sep 20, 2015, at 12:40 PM, Paul Hargrove wro
Yes, it is definitely at 10.
Another attempt is attached.
-Paul
On Sun, Sep 20, 2015 at 8:19 AM, Ralph Castain wrote:
Paul - can you please confirm that you gave mpirun a level of 10 for the
pmix_base_verbose param? This output isn’t what I would have expected from that
level - it looks more like the verbosity was set to 5, and so the error number
isn’t printed.
Thanks
Ralph
> On Sep 20, 2015, at 3:42 AM, Gi
Paul,
I do not remember it like that ...
at that time, the issue in ompi was that the global errno was used instead
of the per-thread errno.
though the man page says -mt should be used for multithreaded apps, you
tried -D_REENTRANT on all your platforms, and it was enough to get the
expected re
Gilles,
Yes every $CC invocation in opal/mca/pmix/pmix1xx includes "-D_REENTRANT".
However, they don't include "-mt".
I believe we concluded (when we had problems previously) that "-mt" was the
proper flag (at compile and link) for multi-threaded with the Studio
compilers.
-Paul
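The global versus per-thread errno question can be checked directly on a given build. The sketch below is a small probe, assuming POSIX threads are available; errno_is_per_thread and report_errno_address are hypothetical names and are not part of ompi or pmix. Building it once with only -D_REENTRANT and once with -mt, the two flags discussed here, would show whether each gives a thread-private errno.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Worker: report where errno lives in this thread. */
static void *report_errno_address(void *arg)
{
    *(int **)arg = &errno;
    return NULL;
}

/* Returns 1 if a second thread sees a different errno location than the
 * calling thread, 0 if both share one global errno, and -1 if the probe
 * thread could not be run. */
static int errno_is_per_thread(void)
{
    pthread_t tid;
    int *worker_addr = NULL;

    if (pthread_create(&tid, NULL, report_errno_address, &worker_addr) != 0) {
        return -1;
    }
    if (pthread_join(tid, NULL) != 0) {
        return -1;
    }
    assert(worker_addr != NULL);

    /* With a thread-private errno the two addresses differ. */
    return worker_addr != &errno;
}
```

Comparing addresses rather than values keeps the probe independent of whatever library calls happen to modify errno along the way.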
On Sat, Sep 19,
Paul,
Can you please double check pmix1xx is compiled with -D_REENTRANT ?
We ran into similar issues in the past, and they only occurred with Solaris
Cheers,
Gilles
On Sunday, September 20, 2015, Paul Hargrove wrote:
Ralph,
The output from the requested run is attached.
-Paul
On Sat, Sep 19, 2015 at 9:46 PM, Ralph Castain wrote:
Ah, okay - that makes more sense. I’ll have to let Brice see if he can figure
out how to silence the hwloc error message as I can’t find where it came from.
The other errors are real and are the reason why the job was terminated.
The problem is that we are trying to establish a communication bet
Ralph,
No it did not run.
The complete output (which I really should have included in the first
place) is below.
-Paul
$ mpirun -mca btl sm,self -np 2 examples/ring_c
Error opening /devices/pci@0,0:reg: Permission denied
[pcp-d-3:26054] PMIX ERROR: ERROR in file
/export/home/phargrov/OMPI/openm
Paul, can you clarify something for me? The error in this case indicates that
the client wasn’t able to reach the daemon - this should have resulted in
termination of the job. Did the job actually run?
> On Sep 18, 2015, at 2:50 AM, Ralph Castain wrote:
I'm on travel right now, but it should be an easy fix when I return. Sorry
for the annoyance
On Thu, Sep 17, 2015 at 11:13 PM, Paul Hargrove wrote:
Any suggestion how I (as a non-root user) can avoid seeing this hwloc error
message on every run?
-Paul
On Thu, Sep 17, 2015 at 11:00 PM, Gilles Gouaillardet
wrote:
Paul,
IIRC, the "Permission denied" is coming from hwloc that cannot collect
all the info it would like.
Cheers,
Gilles
On 9/18/2015 2:34 PM, Paul Hargrove wrote:
Tried tonight's master tarball on Solaris 11.2 on x86-64 with the Studio
Compilers (default ILP32 output) and saw the following result
$ mpirun -mca btl sm,self -np 2 examples/ring_c
Error opening /devices/pci@0,0:reg: Permission denied
[pcp-d-4:00492] PMIX ERROR: ERROR in file
/export/home/phar