Hi again, I think the problem is solved. Thanks to Gus, I tried running the program with mpirun -mca mpi_paffinity_alone 1. From a quick search, I understand this binds each process to a specific core (correct me if I'm wrong). I've run over 20 tests, and now it works fine.
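In case it's useful to anyone who hits the same thing, here is a rough sketch of what the run looks like now. The hostfile and program names are just placeholders, and the 3-slots-per-node layout matches the 6-node, 3-cores-per-node setup I describe below; adjust to your own cluster:

  # my_hostfile: 6 nodes, 3 slots each, 18 ranks in total
  os221 slots=3
  os222 slots=3
  os223 slots=3
  os224 slots=3
  os228 slots=3
  os229 slots=3

  # mpi_paffinity_alone=1 asks Open MPI to bind each process to its own core
  mpirun -np 18 -hostfile ./my_hostfile -mca mpi_paffinity_alone 1 ./my_program

To double-check the binding on a node you can run "taskset -cp <pid>" for each MPI process; it prints the list of CPUs the process is allowed to run on.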
Thanks a lot,
Saygin

On Thu, Aug 12, 2010 at 11:39 AM, Saygin Arkan <saygen...@gmail.com> wrote:
> Hi Gus,
>
> 1 - First of all, turning off hyper-threading is not an option. And it
> gives pretty good results if I can find a way to arrange the cores.
>
> 2 - Actually Eugene (in one of her messages in this thread) had suggested
> arranging the slots. I did and posted the results; the cores are still
> assigned randomly, nothing changed. But I haven't checked the -loadbalance
> option. -byslot or -bynode is not going to help.
>
> 3 - Could you give me a bit more detail on how affinity works, or what it
> actually does?
>
> Thanks a lot for your suggestions
>
> Saygin
>
>
> On Wed, Aug 11, 2010 at 6:18 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>
>> Hi Saygin
>>
>> You could:
>>
>> 1) turn off hyperthreading (in the BIOS), or
>>
>> 2) use the mpirun options (you didn't send your mpirun command)
>> to distribute the processes across the nodes, cores, etc.
>> "man mpirun" is a good resource; see the explanations of
>> the -byslot, -bynode, and -loadbalance options.
>>
>> 3) In addition, you can use an MCA parameter to set processor affinity
>> on the mpirun command line: "mpirun -mca mpi_paffinity_alone 1 ..."
>> I don't know how this will play on a hyperthreaded machine,
>> but it works fine on our dual-processor quad-core computers
>> (not hyperthreaded).
>>
>> Depending on your code, hyperthreading may not help performance anyway.
>>
>> I hope this helps,
>> Gus Correa
>>
>> Saygin Arkan wrote:
>>
>>> Hello,
>>>
>>> I'm running MPI jobs on a non-homogeneous cluster. 4 of my machines
>>> (os221, os222, os223, os224) have the following properties:
>>>
>>> vendor_id       : GenuineIntel
>>> cpu family      : 6
>>> model           : 23
>>> model name      : Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz
>>> stepping        : 7
>>> cache size      : 3072 KB
>>> physical id     : 0
>>> siblings        : 4
>>> core id         : 3
>>> cpu cores       : 4
>>> fpu             : yes
>>> fpu_exception   : yes
>>> cpuid level     : 10
>>> wp              : yes
>>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
>>> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall
>>> nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx smx
>>> est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
>>> bogomips        : 4999.40
>>> clflush size    : 64
>>> cache_alignment : 64
>>> address sizes   : 36 bits physical, 48 bits virtual
>>>
>>> and the problematic, hyper-threaded 2 machines (os228 and os229) are as
>>> follows:
>>>
>>> vendor_id       : GenuineIntel
>>> cpu family      : 6
>>> model           : 26
>>> model name      : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
>>> stepping        : 5
>>> cache size      : 8192 KB
>>> physical id     : 0
>>> siblings        : 8
>>> core id         : 3
>>> cpu cores       : 4
>>> fpu             : yes
>>> fpu_exception   : yes
>>> cpuid level     : 11
>>> wp              : yes
>>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
>>> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall
>>> nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl
>>> vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm ida
>>> bogomips        : 5396.88
>>> clflush size    : 64
>>> cache_alignment : 64
>>> address sizes   : 36 bits physical, 48 bits virtual
>>>
>>>
>>> The problem is that those 2 machines appear to have 8 cores (virtually;
>>> the actual core count is 4).
>>> When I submit an MPI job, I measure the comparison times across the
>>> cluster, and I got strange results.
>>>
>>> I'm running the job on 6 nodes, 3 cores per node, and sometimes (I'd say
>>> in 1/3 of the tests) os228 or os229 returns strange results.
>>> 2 of its cores are slow (slower than the first 4 nodes) but the 3rd core
>>> is extremely fast.
>>>
>>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - RANK(0) Printing Times...
>>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os221 RANK(1)  :38 sec
>>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os222 RANK(2)  :38 sec
>>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os224 RANK(3)  :38 sec
>>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os228 RANK(4)  :37 sec
>>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os229 RANK(5)  :34 sec
>>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os223 RANK(6)  :38 sec
>>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os221 RANK(7)  :39 sec
>>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os222 RANK(8)  :37 sec
>>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os224 RANK(9)  :38 sec
>>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os228 RANK(10) :*48 sec*
>>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os229 RANK(11) :35 sec
>>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os223 RANK(12) :38 sec
>>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os221 RANK(13) :37 sec
>>> 2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os222 RANK(14) :37 sec
>>> 2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os224 RANK(15) :38 sec
>>> 2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os228 RANK(16) :*43 sec*
>>> 2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os229 RANK(17) :35 sec
>>> TOTAL CORRELATION TIME: 48 sec
>>>
>>>
>>> or another test:
>>>
>>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - RANK(0) Printing Times...
>>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os221 RANK(1)  :170 sec
>>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os222 RANK(2)  :161 sec
>>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os224 RANK(3)  :158 sec
>>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os228 RANK(4)  :142 sec
>>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os229 RANK(5)  :*256 sec*
>>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os223 RANK(6)  :156 sec
>>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os221 RANK(7)  :162 sec
>>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os222 RANK(8)  :159 sec
>>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os224 RANK(9)  :168 sec
>>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os228 RANK(10) :141 sec
>>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os229 RANK(11) :136 sec
>>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os223 RANK(12) :173 sec
>>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os221 RANK(13) :164 sec
>>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os222 RANK(14) :171 sec
>>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os224 RANK(15) :156 sec
>>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os228 RANK(16) :136 sec
>>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os229 RANK(17) :*250 sec*
>>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - TOTAL CORRELATION TIME: 256 sec
>>>
>>>
>>> Do you have any idea why this is happening?
>>> I assume that it gives 2 jobs to 2 cores on os229, but those 2 are
>>> actually the same physical core.
>>> Do you have any idea how I can fix it? The slowest rank dominates the
>>> overall time: a 100 sec delay is too much for a 250 sec comparison, which
>>> could otherwise have finished in around 160 sec.
>>>
>>>
>>> --
>>> Saygin
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
> --
> Saygin
>

--
Saygin