This was brought to my attention:
https://github.com/open-mpi/ompi/issues/3003
Yes, this is exactly the issue!
So just in case anybody runs into the same issue again ...
On Tue, Mar 7, 2017 at 12:59 PM, Yong Qin wrote:
> OK, did some testing with MVAPICH and everything is normal so this is
> clearly with OMPI. Is there anything that I should try?
OK, did some testing with MVAPICH and everything is normal so this is
clearly with OMPI. Is there anything that I should try?
Thanks,
Yong Qin
On Mon, Mar 6, 2017 at 11:46 AM, Yong Qin wrote:
> Hi,
>
> I'm wondering if anybody who has done perf testing on Mellanox EDR with
> [...]
[...] ~10.6 GB/s, latency ~1.0 us.
So I'm wondering if I'm missing anything in the OMPI setup that causes such
a huge delta? The OMPI command was simply: mpirun -np 2 -H host1,host2 -mca
btl openib,sm,self osu_bw
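For reference, this is roughly the side-by-side check I have in mind. The
host names are placeholders, and I am assuming the perftest ib_write_bw tool
for the raw-verbs number, so treat it as a sketch rather than the exact
procedure used here:

  # raw verbs bandwidth between the two hosts (perftest suite)
  host2$ ib_write_bw              # server side
  host1$ ib_write_bw host2        # client side

  # OMPI over the openib BTL, one rank on each host
  $ mpirun -np 2 -H host1,host2 -mca btl openib,sm,self ./osu_bw

  # dump the openib BTL tunables actually in effect (--level 9 needs a
  # reasonably recent ompi_info)
  $ ompi_info --param btl openib --level 9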
Thanks,
Yong Qin
> if that is the case, then I have no idea why we are doing this ...
>
> Cheers,
>
> Gilles
>
> On Thursday, August 27, 2015, Yong Qin wrote:
>
>> > regardless of number of nodes
>>
>> No, this is not true. I was referring to this specific test, which was [...]
> [...] as long as the DN was
> the same, regardless of number of nodes. It only failed when the DN’s of
> the nodes differed.
>
>
> On Aug 25, 2015, at 3:31 PM, Yong Qin wrote:
>
> Of course! I blame the two-node test for distracting me from checking all
> the FQDN-relevant parameters. :)
> [...] 1 on your mpirun line, or
> set the equivalent MCA param
>
>
> > On Aug 25, 2015, at 2:24 PM, Yong Qin wrote:
> >
> > Hi,
> >
> > This has been bothering me for a while but I never got a chance to
> > identify the root cause. I know this issue could be [...]
[yqin@n0009.scs00 ~]$ mpirun -V
mpirun (Open MPI) 1.10.0
[yqin@n0009.scs00 ~]$ mpirun -np 2 -H n0189.mako0,n0233.mako0 hostname
n0189.mako0
n0233.mako0
The issue only exposes itself when more than 2 nodes are involved. Any
thoughts?
Thanks,
Yong Qin
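Putting the pieces of this thread together, the failing case appears to be
the same command with a node from a different domain added, e.g. (node names
reused from above; this is my reading of the thread, not a confirmed
reproducer):

  [yqin@n0009.scs00 ~]$ mpirun -np 3 -H n0189.mako0,n0233.mako0,n0009.scs00 hostname

If the FQDN-related setting referred to earlier in the thread is
orte_keep_fqdn_hostnames (an assumption on my part), it can be set either on
the mpirun line with "-mca orte_keep_fqdn_hostnames 1" or as the equivalent
MCA parameter in the environment or config file.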
[...] code.
>
> I'd run your code through valgrind or some other memory-checking debugger and
> see if that can shed any light on what's going on.
>
>
> On Sep 6, 2012, at 12:06 AM, Yong Qin wrote:
>
>> Hi,
>>
>> While debugging a mysterious crash of a code, [...]
[...] in M_SIESTA_FORCES::siesta_forces (istep=@0xf9a4d070) at siesta_forces.F:90
#8 0x0070e475 in siesta () at siesta.F:23
#9 0x0045e47c in main ()
Can anybody shed some light here on what could be wrong?
Thanks,
Yong Qin
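Following the valgrind suggestion quoted above, a minimal sketch of such a
run (the install prefix and the siesta invocation are placeholders, not the
exact setup used here):

  mpirun -np 4 valgrind --track-origins=yes \
      --suppressions=$OMPI_PREFIX/share/openmpi/openmpi-valgrind.supp \
      ./siesta < input.fdf > siesta.out

Open MPI ships the openmpi-valgrind.supp file to hide known-benign reports
that originate inside the MPI library itself, so whatever remains is more
likely to point at the application code.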
Yes, so far this has only been observed in VASP and a specific dataset.
Thanks,
On Wed, Sep 5, 2012 at 4:52 AM, Yevgeny Kliteynik
wrote:
> On 9/4/2012 7:21 PM, Yong Qin wrote:
>> On Tue, Sep 4, 2012 at 5:42 AM, Yevgeny Kliteynik
>> wrote:
>>> On 8/30/2012 10:28 PM, Yong Qin wrote: [...]
On Tue, Sep 4, 2012 at 5:42 AM, Yevgeny Kliteynik
wrote:
> On 8/30/2012 10:28 PM, Yong Qin wrote:
>> On Thu, Aug 30, 2012 at 5:12 AM, Jeff Squyres wrote:
>>> On Aug 29, 2012, at 2:25 PM, Yong Qin wrote:
>>>
>>>> This issue has been observed on OMPI 1.6 and 1.6.1 with openib btl [...]
On Thu, Aug 30, 2012 at 5:12 AM, Jeff Squyres wrote:
> On Aug 29, 2012, at 2:25 PM, Yong Qin wrote:
>
>> This issue has been observed on OMPI 1.6 and 1.6.1 with openib btl but
>> not on 1.4.5 (tcp btl is always fine). The application is VASP and
>> only one specific dataset [...]
[...] and any advice is appreciated.
Yong Qin
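For anyone hitting the same thing, a sketch of the single-variable
experiments that would help narrow it down (the parameter choices are just
data points to collect, not a known fix, and "vasp" stands in for the actual
binary and rank count):

  # confirm the crash really is tied to the openib BTL
  mpirun -np 16 --mca btl tcp,self ./vasp
  # back on openib, toggle eager RDMA and leave-pinned one at a time
  mpirun -np 16 --mca btl openib,sm,self --mca btl_openib_use_eager_rdma 0 ./vasp
  mpirun -np 16 --mca btl openib,sm,self --mca mpi_leave_pinned 0 ./vasp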
I'm confused now :).
I thought that's what I did, removing "mca_btl_mx.la" and
"mca_btl_mx.so". This is the MX BTL plugin, right?
On Fri, Jun 29, 2012 at 11:16 AM, Jeff Squyres wrote:
> On Jun 29, 2012, at 2:14 PM, Yong Qin wrote:
>
>> Thanks Jeff for the doc. [...]
Thanks Jeff for the doc. However, I'm not sure I understand your
following comment correctly. If I remove the MX BTL plugins, i.e.,
mca_btl_mx.la and mca_btl_mx.so, I now get errors that these
components cannot be found.
[n0026.hbar:09467] mca: base: component_find: unable to open
.../mca_btl_mx
[...] do the selection automatically? Also that error message about
"mx_finalize" doesn't look right either.
Thanks,
Yong
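Rather than deleting the plugin files, the components can also be kept out
of the picture at run time; a sketch of the equivalent MCA-level exclusion
("./a.out" is a placeholder, and the component names follow from this
thread):

  # exclude the MX BTL and the MX MTL without touching the install
  mpirun -np 2 --mca btl ^mx --mca mtl ^mx ./a.out
  # or force the ob1 PML so that no MTL-based path is selected at all
  mpirun -np 2 --mca pml ob1 --mca btl openib,sm,self ./a.out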
On Fri, Jun 15, 2012 at 6:41 AM, Jeff Squyres wrote:
> On Jun 11, 2012, at 7:48 PM, Yong Qin wrote:
>
>> ah, I guess my original understanding of PML was wrong.
[...] different than the original gm BTL.
Thanks,
Yong Qin
On Mon, Jun 11, 2012 at 3:59 PM, Aurélien Bouteiller
wrote:
>
> Le 11 juin 2012 à 18:57, Aurélien Bouteiller a écrit :
>
>> Hi,
>>
>> If some mx devices are found, the logic is not only to use the mx BTL but
>> also to use the mx MTL. [...]
[...] going to affect the mx BTL.
Thanks,
Yong Qin
On Mon, Jun 11, 2012 at 3:57 PM, Aurélien Bouteiller
wrote:
> Hi,
>
> If some mx devices are found, the logic is not only to use the mx BTL but
> also to use the mx MTL. You can try to disable this with --mca mtl ob1.
>
> Aurélien
[...] exited on signal 11 (Segmentation fault).
Can anybody shed some light here? It looks like ompi is trying to open
the MX device no matter what. This is on a fresh build of Open MPI 1.6
with "--with-mx --with-openib" options. We didn't have such an issue
with the old GM BTL.
Thanks,
Yong Qin
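If MX support is simply unwanted on this cluster (an assumption on my part),
the cleaner option would be to leave it out at configure time rather than
fight the component selection afterwards, e.g. (paths are placeholders):

  ./configure --prefix=$PREFIX --with-openib --without-mx
  make -j8 && make install

A run-time exclusion such as "--mca btl ^mx --mca mtl ^mx" (see the sketch
above) avoids a rebuild but leaves the MX code in place.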
[...] Intel. But in the meantime, a workaround (revision 6839) has
been submitted to the trunk. The workaround is actually fairly simple:
you just need to switch the order of "use parser_m" and "use mpi_m" in
states.F90.
Thanks,
Yong Qin