I have a very dim recollection of some kernel TCP issues in older kernel versions -- such issues affected all TCP communications, not just MPI. Can you try a newer kernel, perchance?
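One way to narrow this down would be to check whether plain TCP between two of the nodes shows the same jump between 2.6.23 and 2.6.24, completely outside of Open MPI (a benchmark such as NetPIPE would also work). Below is a minimal, illustrative sketch of such a probe -- the port number, buffer size, and transfer size are arbitrary example values, not anything taken from your setup:

/*
 * tcp_check.c -- minimal raw-TCP throughput probe (illustrative sketch,
 * not part of Open MPI).  Port, buffer size, and transfer size are
 * arbitrary example values.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#define PORT    15000                /* arbitrary example port   */
#define BUFSIZE (64 * 1024)          /* per-read/write buffer    */
#define TOTAL   (512L * 1024 * 1024) /* bytes the client sends   */

static double now(void)              /* wall-clock time in seconds */
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(int argc, char **argv)
{
    static char buf[BUFSIZE];
    struct sockaddr_in a;
    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_port   = htons(PORT);

    if (argc >= 2 && strcmp(argv[1], "server") == 0) {
        /* Receive everything the client sends and report MB/s. */
        int one = 1, ls = socket(AF_INET, SOCK_STREAM, 0);
        setsockopt(ls, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
        a.sin_addr.s_addr = INADDR_ANY;
        if (bind(ls, (struct sockaddr *) &a, sizeof(a)) != 0) {
            perror("bind");
            return 1;
        }
        listen(ls, 1);
        int s = accept(ls, NULL, NULL);
        long total = 0;
        ssize_t n;
        double t0 = now();
        while ((n = read(s, buf, sizeof(buf))) > 0)
            total += n;
        double dt = now() - t0;
        printf("received %ld bytes in %.2f s (%.1f MB/s)\n",
               total, dt, total / dt / 1e6);
        close(s);
        close(ls);
    } else if (argc >= 3 && strcmp(argv[1], "client") == 0) {
        /* Stream TOTAL bytes to the server and report MB/s. */
        int s = socket(AF_INET, SOCK_STREAM, 0);
        inet_pton(AF_INET, argv[2], &a.sin_addr);
        if (connect(s, (struct sockaddr *) &a, sizeof(a)) != 0) {
            perror("connect");
            return 1;
        }
        long sent = 0;
        double t0 = now();
        while (sent < TOTAL) {
            ssize_t n = write(s, buf, sizeof(buf));
            if (n <= 0)
                break;
            sent += n;
        }
        double dt = now() - t0;
        printf("sent %ld bytes in %.2f s (%.1f MB/s)\n",
               sent, dt, sent / dt / 1e6);
        close(s);
    } else {
        fprintf(stderr, "usage: %s server | client <server-ip>\n", argv[0]);
        return 1;
    }
    return 0;
}

Compile with something like "gcc -O2 -o tcp_check tcp_check.c", run "./tcp_check server" on one node and "./tcp_check client <server-ip>" on another, and compare the numbers under both kernels. If raw TCP shows the same regression, the problem is below Open MPI and trying newer kernels is the right direction; if it doesn't, it would make more sense to look at the TCP BTL settings next.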
On Mar 30, 2010, at 1:26 PM, <open...@docawk.org> wrote:

> Hello List,
>
> I hope you can help us out on this one, as we have been trying to figure
> it out for weeks.
>
> The situation: We have a program capable of splitting into several
> processes that are shared among nodes within a cluster network using
> Open MPI. We were running that system on "older" cluster hardware (Intel
> Core2 Duo based, 2 GB RAM) using an "older" kernel (2.6.18.6). All nodes
> boot diskless over the network. Recently we upgraded the hardware (Intel
> i5, 8 GB RAM), which also required an upgrade to a recent kernel version
> (2.6.26+).
>
> Here is the problem: We experience an overall performance loss on the new
> hardware and think we can break it down to a communication issue between
> the processes.
>
> We also found out that the issue arises in the transition from kernel
> 2.6.23 to 2.6.24 (tested on the Core2 Duo system).
>
> Here is an output from our program:
>
> 2.6.23.17 (64 bit), MPI 1.2.7
> 5 iterations (Core2 Duo), 6 CPUs:
> 93.33 seconds per iteration.
> Node 0 communication/computation time:   6.83 / 647.64 seconds.
> Node 1 communication/computation time:  10.09 / 644.36 seconds.
> Node 2 communication/computation time:   7.27 / 645.03 seconds.
> Node 3 communication/computation time: 165.02 / 485.52 seconds.
> Node 4 communication/computation time:   6.50 / 643.82 seconds.
> Node 5 communication/computation time:   7.80 / 627.63 seconds.
> Computation time: 897.00 seconds.
>
> 2.6.24.7 (64 bit), re-evaluated, MPI 1.2.7
> 5 iterations (Core2 Duo), 6 CPUs:
> 131.33 seconds per iteration.
> Node 0 communication/computation time: 364.15 / 645.24 seconds.
> Node 1 communication/computation time: 362.83 / 645.26 seconds.
> Node 2 communication/computation time: 349.39 / 645.07 seconds.
> Node 3 communication/computation time: 508.34 / 485.53 seconds.
> Node 4 communication/computation time: 349.94 / 643.81 seconds.
> Node 5 communication/computation time: 349.07 / 627.47 seconds.
> Computation time: 1251.00 seconds.
>
> The program is 32-bit software, but it doesn't make any difference
> whether the kernel is 64-bit or 32-bit. Open MPI version 1.4.1 was also
> tested; it cut communication times in half (which is still too high), but
> the improvement decreased with increasing kernel version number.
>
> The communication time is meant to be the time the master process spends
> distributing the data portions for calculation and collecting the results
> from the slave processes. The value also contains the time a slave has to
> wait to communicate with the master while the master is occupied. This
> explains the extended communication time of node #3, as its calculation
> time is reduced (based on the nature of the data).
>
> The command to start the calculation:
> mpirun -np 2 -host cluster-17 invert-master -b -s -p inv_grav.inp : -np 4 -host cluster-18,cluster-19
>
> Using top (with 'f' and 'j' to show the P column) we could track which
> process runs on which core. We found that processes stayed on their
> initial core with kernel 2.6.23, but started to flip around with 2.6.24.
> Using the --bind-to-core option in Open MPI 1.4.1 kept the processes on
> their cores again, but that didn't influence the overall outcome and
> didn't fix the issue.
>
> We found top showing ~25% CPU wait time, and processes showing state 'D',
> also on slave-only nodes. According to our programmer, communications are
> only between the master process and its slaves, not among slaves. On
> kernel 2.6.23 and lower, CPU usage is 100% user, with no wait or system
> percentage.
> Example from top:
>
> Cpu(s): 75.3%us, 0.6%sy, 0.0%ni, 0.0%id, 23.1%wa, 0.7%hi, 0.3%si, 0.0%st
> Mem:   8181236k total,  131224k used, 8050012k free,     0k buffers
> Swap:        0k total,       0k used,       0k free, 49868k cached
>
>  PID USER PR NI  VIRT RES  SHR S %CPU %MEM    TIME+ P COMMAND
> 3386 oli  20  0 90512 20m 3988 R   74  0.3 12:31.80 0 invert-
> 3387 oli  20  0 85072 15m 3780 D   67  0.2 11:59.30 1 invert-
> 3388 oli  20  0 85064 14m 3588 D   77  0.2 12:56.90 2 invert-
> 3389 oli  20  0 84936 14m 3436 R   85  0.2 13:28.30 3 invert-
>
> Some system information that might be helpful:
>
> Node hardware:
> 1. "older": Intel Core2 Duo, (2x1) GB RAM
> 2. "newer": Intel(R) Core(TM) i5 CPU, mainboard ASUS RS100-E6, (4x2) GB RAM
>
> Debian stable (lenny) distribution with:
> ii libc6          2.7-18lenny2
> ii libopenmpi1    1.2.7~rc2-2
> ii openmpi-bin    1.2.7~rc2-2
> ii openmpi-common 1.2.7~rc2-2
>
> Nodes boot diskless with an NFS root and a kernel with all needed drivers
> compiled in.
>
> Information on the program using Open MPI and the tools used to compile it:
>
> mpirun --version:
> mpirun (Open MPI) 1.2.7rc2
>
> libopenmpi-dev 1.2.7~rc2-2 depends on:
> libc6 (2.7-18lenny2)
> libopenmpi1 (1.2.7~rc2-2)
> openmpi-common (1.2.7~rc2-2)
>
> Compilation command:
> mpif90
>
> FORTRAN compiler (FC):
> gfortran --version:
> GNU Fortran (Debian 4.3.2-1.1) 4.3.2
>
> Called Open MPI functions (Fortran bindings):
> mpi_comm_rank
> mpi_comm_size
>
> mpi_bcast
> mpi_reduce
>
> mpi_isend
> mpi_wait
>
> mpi_send
> mpi_probe
> mpi_recv
>
> MPI_Wtime
>
> Additionally linked ncurses library:
> libncurses5-dev (5.7+20081213-1)
> On remote nodes no calls are ever made to this library. On local nodes
> such calls (coded in C) are only optional, and usually they are skipped
> too (i.e. not even initscr() is called).
>
> A signal handler is integrated (coded in C) that reacts specifically to
> SIGTERM and SIGUSR1 signals.
>
> If you need more information (e.g. kernel config etc.) please ask.
> I hope you can provide some ideas to test and resolve the issue.
> Thanks anyway.
>
> Oli

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/