Re: [OMPI users] Mellanox EDR performance

2017-03-15 Thread Yong Qin
is to my attention, https://github.com/open-mpi/ompi/issues/3003 Yes, this is exactly the issue! So just in case anybody runs into the same issue again ... On Tue, Mar 7, 2017 at 12:59 PM, Yong Qin wrote: > OK, did some testing with MVAPICH and everything is normal so this is > clearly wi

Re: [OMPI users] Mellanox EDR performance

2017-03-07 Thread Yong Qin
OK, did some testing with MVAPICH and everything is normal so this is clearly with OMPI. Is there anything that I should try? Thanks, Yong Qin On Mon, Mar 6, 2017 at 11:46 AM, Yong Qin wrote: > Hi, > > I'm wondering if anybody who has done perf testing on Mellanox EDR with >

[OMPI users] Mellanox EDR performance

2017-03-06 Thread Yong Qin
~10.6 GB/s, latency ~ 1.0 us. So I'm wondering if I'm missing anything in the OMPI setup that causes such a huge delta? The OMPI command was simply: mpirun -np 2 -H host1,host2 -mca btl openib,sm,self osu_bw Thanks, Yong Qin

Re: [OMPI users] ssh: Could not resolve hostname xxxx: Name or service not known (v1.8+)

2015-08-26 Thread Yong Qin
ko0 echo ok > if that is the case, then I have no idea why we are doing this ... > > Cheers, > > Gilles > > On Thursday, August 27, 2015, Yong Qin wrote: > >> > regardless of number of nodes >> >> No, this is not true. I was referring to this specific test, which wa

Re: [OMPI users] ssh: Could not resolve hostname xxxx: Name or service not known (v1.8+)

2015-08-26 Thread Yong Qin
ng as the DN was > the same, regardless of number of nodes. It only failed when the DN’s of > the nodes differed. > > > On Aug 25, 2015, at 3:31 PM, Yong Qin wrote: > > Of course! I blame that two-node test for distracting me from checking all the > FQDN relevant parameters. :) >

Re: [OMPI users] ssh: Could not resolve hostname xxxx: Name or service not known (v1.8+)

2015-08-25 Thread Yong Qin
mes 1 on your mpirun line, or > set the equivalent MCA param > > > > On Aug 25, 2015, at 2:24 PM, Yong Qin wrote: > > > > Hi, > > > > This has been bothering me for a while but I never got a chance to > identify the root cause. I know this issue could be

[OMPI users] ssh: Could not resolve hostname xxxx: Name or service not known (v1.8+)

2015-08-25 Thread Yong Qin
00 ~]$ mpirun -V mpirun (Open MPI) 1.10.0 [yqin@n0009.scs00 ~]$ mpirun -np 2 -H n0189.mako0,n0233.mako0 hostname n0189.mako0 n0233.mako0 The issue only exposes itself when more than 2 nodes are involved. Any thoughts? Thanks, Yong Qin

Re: [OMPI users] SIGSEGV in OMPI 1.6.x

2012-09-06 Thread Yong Qin
code. > > I'd run your code through valgrind or some other memory-checking debugger and > see if that can shed any light on what's going on. > > > On Sep 6, 2012, at 12:06 AM, Yong Qin wrote: > >> Hi, >> >> While debugging a mysterious crash of a code,

[OMPI users] SIGSEGV in OMPI 1.6.x

2012-09-06 Thread Yong Qin
in M_SIESTA_FORCES::siesta_forces (istep=@0xf9a4d070) at siesta_forces.F:90 #8 0x0070e475 in siesta () at siesta.F:23 #9 0x0045e47c in main () Can anybody shed some light here on what could be wrong? Thanks, Yong Qin

Re: [OMPI users] OMPI 1.6.x Hang on khugepaged 100% CPU time

2012-09-05 Thread Yong Qin
Yes, so far this has only been observed in VASP and a specific dataset. Thanks, On Wed, Sep 5, 2012 at 4:52 AM, Yevgeny Kliteynik wrote: > On 9/4/2012 7:21 PM, Yong Qin wrote: >> On Tue, Sep 4, 2012 at 5:42 AM, Yevgeny Kliteynik >> wrote: >>> On 8/30/2012 10:28 PM, Yon

Re: [OMPI users] OMPI 1.6.x Hang on khugepaged 100% CPU time

2012-09-04 Thread Yong Qin
On Tue, Sep 4, 2012 at 5:42 AM, Yevgeny Kliteynik wrote: > On 8/30/2012 10:28 PM, Yong Qin wrote: >> On Thu, Aug 30, 2012 at 5:12 AM, Jeff Squyres wrote: >>> On Aug 29, 2012, at 2:25 PM, Yong Qin wrote: >>> >>>> This issue has been observed on OMPI 1.6 an

Re: [OMPI users] OMPI 1.6.x Hang on khugepaged 100% CPU time

2012-08-30 Thread Yong Qin
On Thu, Aug 30, 2012 at 5:12 AM, Jeff Squyres wrote: > On Aug 29, 2012, at 2:25 PM, Yong Qin wrote: > >> This issue has been observed on OMPI 1.6 and 1.6.1 with openib btl but >> not on 1.4.5 (tcp btl is always fine). The application is VASP and >> only one specific data

[OMPI users] OMPI 1.6.x Hang on khugepaged 100% CPU time

2012-08-29 Thread Yong Qin
s and any advice is appreciated. Yong Qin

Re: [OMPI users] Myricom MX2G Segmentation fault on OMPI 1.6

2012-06-29 Thread Yong Qin
I'm confused now :). I thought that's what I did, removing "mca_btl_mx.la" and "mca_btl_mx.so". This is the MX BTL plugin, right? On Fri, Jun 29, 2012 at 11:16 AM, Jeff Squyres wrote: > On Jun 29, 2012, at 2:14 PM, Yong Qin wrote: > >> Thanks J

Re: [OMPI users] Myricom MX2G Segmentation fault on OMPI 1.6

2012-06-29 Thread Yong Qin
Thanks Jeff for the doc. However, I'm not sure I understand your following comment correctly. If I remove the MX BTL plugins, i.e., mca_btl_mx.la and mca_btl_mx.so, I now get errors about these components not being found. [n0026.hbar:09467] mca: base: component_find: unable to open .../mca_btl_mx

Re: [OMPI users] Myricom MX2G Segmentation fault on OMPI 1.6

2012-06-28 Thread Yong Qin
l do the select automatically? Also that error message "mx_finalize" doesn't look right either. Thanks, Yong On Fri, Jun 15, 2012 at 6:41 AM, Jeff Squyres wrote: > On Jun 11, 2012, at 7:48 PM, Yong Qin wrote: > >> ah, I guess my original understanding of PML was wrong.

Re: [OMPI users] Myricom MX2G Segmentation fault on OMPI 1.6

2012-06-11 Thread Yong Qin
ferent than the original gm BTL. Thanks, Yong Qin On Mon, Jun 11, 2012 at 3:59 PM, Aurélien Bouteiller wrote: > > On June 11, 2012, at 18:57, Aurélien Bouteiller wrote: > >> Hi, >> >> If some mx devices are found, the logic is not only to use the mx BTL but >> also to

Re: [OMPI users] Myricom MX2G Segmentation fault on OMPI 1.6

2012-06-11 Thread Yong Qin
's going to affect the mx BTL. Thanks, Yong Qin On Mon, Jun 11, 2012 at 3:57 PM, Aurélien Bouteiller wrote: > Hi, > > If some mx devices are found, the logic is not only to use the mx BTL but > also to use the mx MTL. You can try to disable this with --mca mtl ob1. > > Aure

[OMPI users] Myricom MX2G Segmentation fault on OMPI 1.6

2012-06-11 Thread Yong Qin
exited on signal 11 (Segmentation fault). -- <> Can anybody shed some light here? It looks like ompi is trying to open the MX device no matter what. This is on a fresh build of Open MPI 1.6 with "--with-mx --with-openib" options. We didn't have such an issue with the old GM BTL. Thanks, Yong Qin

Re: [OMPI users] Does OpenMPI 1.4.1 support the MPI_IN_PLACE designation ...

2010-08-17 Thread Yong Qin
Intel. But in the meantime, a workaround (revision 6839) has been submitted to the trunk. The workaround is actually fairly simple: you just need to switch the order of "use parser_m" and "use mpi_m" in states.F90. Thanks, Yong Qin > Message: 4 > Date: Mon, 16 Aug