Hi All,
I'm running some tests with the NAS benchmarks on a multi-core machine
(8 cores per node), and I'm seeing something that looks strange.
I'm using only one NIC and processor affinity (paffinity):
./bin/mpirun \
    -n 8 \
    --hostfile ./hostfile \
    --mca mpi_paffinity_alone 1 \
    --mca btl_tcp_if_include eth1 \
    --loadbalance \
    ./codes/nas/NPB3.3/NPB3.3-MPI/bin/lu.C.8
I have enough memory to run this application on a single node, but:
1) If I use one node (8 cores), the "user" CPU time is around 100% per
core. The execution time is around 430 seconds.
2) If I use 2 nodes (4 cores in each node), the "user" CPU time is around
95% per core and the "sys" time is 5%. The execution time is around 220
seconds.
3) If I use 4 nodes (1 core in each node), the "user" CPU time is around
85% per core and the "sys" time is 15%. The execution time is around 200
seconds.
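
To put the three timings side by side, here is a small sketch that computes
the speedup of each case relative to the single-node run. It uses only the
approximate wall-clock times quoted above; the labels and the speedup()
helper are names made up for this illustration.

```python
# Approximate wall-clock times reported above, in seconds.
times = {
    "1 node": 430.0,
    "2 nodes": 220.0,
    "4 nodes": 200.0,
}

baseline = times["1 node"]


def speedup(t_baseline, t_case):
    """Ratio of baseline time to case time; > 1 means the case is faster."""
    return t_baseline / t_case


# Speedup of every case relative to the single-node run.
speedups = {label: speedup(baseline, t) for label, t in times.items()}

for label, s in speedups.items():
    print(f"{label}: {times[label]:.0f} s, {s:.2f}x vs. single node")
```

So spreading the same 8 processes over more nodes roughly halves the run
time, which is the behaviour I did not expect.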
So, my questions are:
A) Shouldn't the execution time in case 1 be smaller than in cases 2 and
3, since it uses only shared-memory (sm) communication? Could this be a
cache problem?
B) Why is there "sys" time when the communication goes between nodes? Is
it the NIC driver? And why does this time increase when I balance the
load across more nodes?
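
For anyone who wants to reproduce the per-core "user" and "sys" figures,
here is a minimal sketch of one way to sample them on Linux. It assumes the
standard /proc/stat layout (user, nice, system, idle, iowait, irq, softirq,
steal) and is not necessarily the tool that produced the numbers above; the
function names are made up for this illustration.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: [counters]} for the per-core lines (cpu0, cpu1, ...)."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines (intr, ctxt, ...).
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            # Keep up to 8 counters: user nice system idle iowait irq softirq steal.
            counters[fields[0]] = [int(v) for v in fields[1:9]]
    return counters


def user_sys_percent(before, after):
    """Return {cpu_name: (user %, sys %)} between two parsed snapshots."""
    result = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None:
            continue
        delta = [n - o for n, o in zip(new, old)]
        total = sum(delta)
        if total <= 0:
            continue
        # Index 0 is "user", index 2 is "system" in /proc/stat.
        result[cpu] = (100.0 * delta[0] / total, 100.0 * delta[2] / total)
    return result


def sample(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return per-core shares."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return user_sys_percent(before, after)
```

Calling sample(1.0) on a compute node while the benchmark is running returns
one (user %, sys %) pair per core. Note that this sketch counts only the
"system" column as sys; time spent in irq and softirq is part of the total
but is not added to the sys share.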
Thanks,
--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edificio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478