Re: [hwloc-users] Travis CI unit tests failing with HW "operating system" error

2018-09-14 Thread Jeff Hammond
With HWLOC_COMPONENTS=no_os, MPICH is now working fine, but all tests now
fail with Open MPI (see below).  I know how to resolve this, but am noting
it for the benefit of others.

--
All nodes which are allocated for this job are already filled.
--

Jeff

On Thu, Sep 13, 2018 at 10:36 PM, Brice Goglin wrote:

> If lstopo fails there, run "hwloc-gather-topology foo" and send foo.tar.bz2.
>
> As a workaround for ARMCI, you may try setting HWLOC_COMPONENTS=no_os,stop
> in the environment so that hwloc behaves as if the operating system had no
> topology support.
>
> Brice
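For a CI driver that launches many test programs, one way to apply this workaround is to set the variable in the environment handed to every child process. Below is a minimal Python sketch, assuming the tests are launched from a Python script. The helper names (hwloc_no_os_env, run_with_workaround, child_sees_workaround) are hypothetical and not part of hwloc, MPICH, or ARMCI-MPI; only the variable name and its value come from the message above.

```python
import os
import subprocess
import sys

# Value suggested in the thread; per Brice, it makes hwloc behave as if the
# operating system had no topology support.
HWLOC_WORKAROUND = "no_os,stop"


def hwloc_no_os_env(base=None):
    """Return a copy of `base` (default: os.environ) with the workaround set."""
    env = dict(os.environ if base is None else base)
    env["HWLOC_COMPONENTS"] = HWLOC_WORKAROUND
    return env


def run_with_workaround(cmd):
    """Run one test command under the modified environment; return its exit code."""
    return subprocess.run(cmd, env=hwloc_no_os_env(), check=False).returncode


def child_sees_workaround():
    """Sanity check: does a child process actually inherit HWLOC_COMPONENTS?"""
    probe = [sys.executable, "-c",
             "import os; print(os.environ.get('HWLOC_COMPONENTS', ''))"]
    out = subprocess.run(probe, env=hwloc_no_os_env(), capture_output=True,
                         text=True, check=True).stdout
    return out.strip() == HWLOC_WORKAROUND
```

For example, run_with_workaround(["mpiexec", "-n", "2", "./test_program"]) would run a (hypothetical) test binary with the variable set only for that child, leaving the driver's own environment untouched.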

-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/
___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users

Re: [hwloc-users] Travis CI unit tests failing with HW "operating system" error

2018-09-14 Thread Madhu, Kavitha Tiptur
We will upgrade the hwloc submodule used in MPICH ASAP. IIRC, we have suppressed
hwloc warnings as well. I will double-check this.

Kavitha

On Sep 14, 2018, at 12:36 AM, Brice Goglin <brice.gog...@inria.fr> wrote:

> If lstopo fails there, run "hwloc-gather-topology foo" and send foo.tar.bz2.
>
> As a workaround for ARMCI, you may try setting HWLOC_COMPONENTS=no_os,stop
> in the environment so that hwloc behaves as if the operating system had no
> topology support.
>
> Brice


Re: [hwloc-users] Travis CI unit tests failing with HW "operating system" error

2018-09-13 Thread Jeff Hammond
All of the job failures have this warning, so I am inclined to think they
are related.  I don't know what I should expect from lstopo inside
AWS, but I guess I'll try it.

I was using the hwloc shipped with mpich-3.3b1.  Talk to the MPICH team
if you want them to upgrade :-)

Jeff

On Thu, Sep 13, 2018 at 8:42 AM, Brice Goglin wrote:

> This is actually just a warning. Usually it causes the topology to be
> wrong (like a missing object), but it shouldn't prevent the program from
> working. Are you sure your programs are failing because of hwloc? Do you
> have a way to run lstopo on that node?
>
> By the way, you shouldn't use hwloc 2.0.0rc2, at least because it's old,
> it has a broken ABI, and it's an RC :)
>
> Brice


Re: [hwloc-users] Travis CI unit tests failing with HW "operating system" error

2018-09-13 Thread Brice Goglin
This is actually just a warning. Usually it causes the topology to be
wrong (like a missing object), but it shouldn't prevent the program from
working. Are you sure your programs are failing because of hwloc? Do you
have a way to run lstopo on that node?

By the way, you shouldn't use hwloc 2.0.0rc2, at least because it's old,
it has a broken ABI, and it's an RC :)

Brice
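The error in the original report says hwloc found a Group object and an L3 cache whose CPU sets overlap without one containing the other. hwloc arranges CPU-side objects in a tree by cpuset inclusion, so any two such objects must be either nested or disjoint; an object that breaks this rule cannot be inserted as-is, which is consistent with the "missing object" symptom mentioned above. A minimal Python sketch of that check follows. The parser assumes the textual form hwloc uses for cpusets (comma-separated 32-bit hexadecimal words, most significant first); the function names and example values are hypothetical, since the cpusets quoted in the report appear truncated in this archive.

```python
def parse_cpuset(text):
    """Parse a cpuset string such as "0x00000001,0x000000ff" into an int bitmask.

    Assumes comma-separated 32-bit hex words, most significant word first.
    """
    mask = 0
    for word in text.split(","):
        # int(..., 16) accepts an optional "0x" prefix.
        mask = (mask << 32) | int(word, 16)
    return mask


def intersects_without_inclusion(a, b):
    """True when cpusets a and b share some PUs but neither contains the other.

    This is the condition the hwloc error message reports; nested or disjoint
    cpusets are both acceptable when building a tree.
    """
    common = a & b
    return common != 0 and common != a and common != b


# Hypothetical example: a Group covering PUs 0-3 and an L3 covering PUs 2-5.
group = parse_cpuset("0x0000000f")
l3 = parse_cpuset("0x0000003c")
print(intersects_without_inclusion(group, l3))  # True
```

Nested pairs (0x3 inside 0xf) and disjoint pairs (0x3 and 0xc) both return False, so only partially overlapping objects are flagged.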



On 13/09/2018 at 16:12, Jeff Hammond wrote:
> I am running ARMCI-MPI over MPICH in a Travis CI Linux instance and
> topology is causing it to fail.  I do not care about topology in a
> virtualized environment.  How do I fix this?
>
> 
> * hwloc 2.0.0rc2-git has encountered what looks like an error from the operating system.
> *
> * Group0 (cpuset 0x,0x) intersects with L3 (cpuset 0x1000,0x0212) without inclusion!
> * Error occurred in topology.c line 1384
> *
> * The following FAQ entry in the hwloc documentation may help:
> *   What should I do when hwloc reports "operating system" warnings?
> * Otherwise please report this error message to the hwloc user's mailing list
> * along with the files generated by the hwloc-gather-topology script.
> 
>
> https://travis-ci.org/jeffhammond/armci-mpi/jobs/425342479 has all of
> the details.
>
> Jeff

[hwloc-users] Travis CI unit tests failing with HW "operating system" error

2018-09-13 Thread Jeff Hammond
I am running ARMCI-MPI over MPICH in a Travis CI Linux instance and
topology is causing it to fail.  I do not care about topology in a
virtualized environment.  How do I fix this?


* hwloc 2.0.0rc2-git has encountered what looks like an error from the operating system.
*
* Group0 (cpuset 0x,0x) intersects with L3 (cpuset 0x1000,0x0212) without inclusion!
* Error occurred in topology.c line 1384
*
* The following FAQ entry in the hwloc documentation may help:
*   What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list
* along with the files generated by the hwloc-gather-topology script.


https://travis-ci.org/jeffhammond/armci-mpi/jobs/425342479 has all of the
details.

Jeff

