Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-20 Thread Zhang, Junchao via petsc-users
Those messages were used to build the MatMult communication pattern for the matrix. They were not the matrix-entry passing you imagined, but they do indeed happen in MatAssemblyEnd. If you want to make sure processors do not set remote entries, you can use MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE).
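A minimal sketch of how that option might be used, assuming an MPIAIJ matrix A that each rank fills only with rows it owns (the function name and the diagonal fill are illustrative, not from the original code):

  /* Sketch: assert that no rank will set entries owned by another rank,
     so MatAssemblyBegin/End can skip the off-process communication setup. */
  #include <petscmat.h>

  PetscErrorCode AssembleLocalOnly(Mat A)
  {
    PetscErrorCode ierr;
    PetscInt       rstart, rend, i;

    PetscFunctionBeginUser;
    /* Promise PETSc that this rank only sets rows it owns */
    ierr = MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE);CHKERRQ(ierr);
    ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
    for (i = rstart; i < rend; i++) {
      ierr = MatSetValue(A, i, i, 1.0, INSERT_VALUES);CHKERRQ(ierr);  /* local entries only */
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }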

Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-20 Thread Smith, Barry F. via petsc-users
Note that this is a one-time cost if the nonzero structure of the matrix stays the same. It will not happen in future MatAssemblies.
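A sketch of the usage pattern this implies (illustrative only, assuming A has already been preallocated and assembled once with its final nonzero structure): later assemblies that only overwrite existing nonzeros reuse the communication pattern built the first time.

  /* Sketch: only the first assembly pays the pattern-building cost;
     reassemblies with the same nonzero structure reuse it. */
  #include <petscmat.h>

  PetscErrorCode ReassembleSameStructure(Mat A, PetscInt nsteps)
  {
    PetscErrorCode ierr;
    PetscInt       step, rstart, rend, i;

    PetscFunctionBeginUser;
    ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
    for (step = 0; step < nsteps; step++) {
      for (i = rstart; i < rend; i++) {
        /* overwrite values at already-existing nonzero locations */
        ierr = MatSetValue(A, i, i, (PetscScalar)(step + 1), INSERT_VALUES);CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);  /* cheap after the first assembly */
    }
    PetscFunctionReturn(0);
  }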

Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-21 Thread Zhang, Junchao via petsc-users
On Fri, Jun 21, 2019 at 8:07 AM Ale Foggia <amfog...@gmail.com> wrote: Thanks both of you for your answers. On Thu., Jun 20, 2019 at 22:20, Smith, Barry F. (<bsm...@mcs.anl.gov>) wrote: Note that this is a one-time cost if the nonzero structure of the matrix stays the same...

Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-21 Thread Smith, Barry F. via petsc-users
The load balance is definitely out of whack.
BuildTwoSidedF       1 1.0 1.6722e-02 41.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0  0       0
MatMult            138 1.0 2.6604e+02  7.4 3.19e+10 2.1 8.2e+07 7.8e+06 0.0e+00  2  4 13 13  0  15 25 100 100  0 2935476
MatAss...

Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-21 Thread Jed Brown via petsc-users
What is the partition like? Suppose you randomly assigned nodes to processes; then in the typical case, all neighbors would be on different processors. Then the "diagonal block" would be nearly diagonal and the off-diagonal block would be huge, requiring communication with many other processes.
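One way to check this on a running code (a sketch, not part of the original discussion): for an MPIAIJ matrix, compare the nonzero counts of the local "diagonal" block against the off-diagonal block. A huge off-diagonal block means MatMult must communicate with many other processes. The function name and printing are illustrative; MatMPIAIJGetSeqAIJ and MatGetInfo are standard PETSc calls.

  #include <petscmat.h>

  /* Sketch: report stored nonzeros in the local diagonal block (no
     communication needed) vs. the off-diagonal block (needs communication
     during MatMult). */
  PetscErrorCode ReportBlockNonzeros(Mat A)
  {
    PetscErrorCode ierr;
    Mat            Ad, Ao;        /* diagonal and off-diagonal blocks */
    MatInfo        infod, infoo;
    PetscMPIInt    rank;
    MPI_Comm       comm = PetscObjectComm((PetscObject)A);

    PetscFunctionBeginUser;
    ierr = MPI_Comm_rank(comm, &rank);CHKERRQ(ierr);
    ierr = MatMPIAIJGetSeqAIJ(A, &Ad, &Ao, NULL);CHKERRQ(ierr);   /* A must be MATMPIAIJ */
    ierr = MatGetInfo(Ad, MAT_LOCAL, &infod);CHKERRQ(ierr);
    ierr = MatGetInfo(Ao, MAT_LOCAL, &infoo);CHKERRQ(ierr);
    ierr = PetscSynchronizedPrintf(comm, "[%d] diag nz %.0f  offdiag nz %.0f\n",
                                   rank, infod.nz_used, infoo.nz_used);CHKERRQ(ierr);
    ierr = PetscSynchronizedFlush(comm, PETSC_STDOUT);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }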

Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-21 Thread Zhang, Junchao via petsc-users
MatAssembly was called once (in stage 5) and cost 2.5% of the total time. Look at stage 5: it says MatAssemblyBegin calls BuildTwoSidedF, which does a global synchronization. The high max/min ratio means load imbalance. What I do not understand is MatAssemblyEnd. Its ratio is 1.0, which means the processes...
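For readers wondering where the per-stage numbers come from: the application registers its own logging stages so that -log_view reports phases such as assembly separately. A minimal sketch of that pattern (the stage name and function are illustrative, not from Ale's code):

  #include <petscmat.h>

  /* Sketch: wrap the assembly phase in its own -log_view stage so that
     MatAssemblyBegin/End (and BuildTwoSidedF) show up in a separate stage. */
  PetscErrorCode AssembleWithStage(Mat A)
  {
    PetscErrorCode ierr;
    PetscLogStage  stage;

    PetscFunctionBeginUser;
    ierr = PetscLogStageRegister("Matrix assembly", &stage);CHKERRQ(ierr);
    ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = PetscLogStagePop();CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }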

Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-21 Thread Smith, Barry F. via petsc-users
You could access the VecScatter inside the matrix-multiply and call VecScatterView() with an ASCII viewer using the format PETSC_VIEWER_ASCII_INFO (make sure you use this format); it provides information about how much communication is being done and how many neighbors are being communicated with.
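A sketch of what this could look like in code. Note the hedge: there is no public accessor for the MatMult scatter of an MPIAIJ matrix, so this peeks at the Mat_MPIAIJ internals (the Mvctx member) through a private header, which may change between PETSc versions; treat it as a debugging hack rather than a stable API.

  #include <petscmat.h>
  #include <../src/mat/impls/aij/mpi/mpiaij.h>   /* exposes Mat_MPIAIJ (private) */

  /* Sketch: view the MatMult scatter with PETSC_VIEWER_ASCII_INFO to see
     how many neighbors each rank talks to and how much data moves. */
  PetscErrorCode ViewMultScatter(Mat A)
  {
    PetscErrorCode ierr;
    Mat_MPIAIJ     *aij    = (Mat_MPIAIJ*)A->data;   /* assumes A is MATMPIAIJ */
    PetscViewer    viewer  = PETSC_VIEWER_STDOUT_WORLD;

    PetscFunctionBeginUser;
    ierr = PetscViewerPushFormat(viewer, PETSC_VIEWER_ASCII_INFO);CHKERRQ(ierr);
    ierr = VecScatterView(aij->Mvctx, viewer);CHKERRQ(ierr);
    ierr = PetscViewerPopFormat(viewer);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }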

Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-21 Thread Zhang, Junchao via petsc-users
Ale, did you use Intel KNL nodes? Mr. Hong (cc'ed) did experiments on KNL nodes one year ago. He used 32768 processors, called MatAssemblyEnd 118 times, and it took only 1.5 seconds in total. So I guess something was wrong with your test. If you can share your code, I can run a test on...

Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-24 Thread Mills, Richard Tran via petsc-users
Hi Ale, I don't know if this has anything to do with the strange performance you are seeing, but I notice that some of your Intel MPI settings are inconsistent and I'm not sure what you are intending. You have specified a value for I_MPI_PIN_DOMAIN and also a value for I_MPI_PIN_PROCESSOR_LIST.

Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-25 Thread Zhang, Junchao via petsc-users
Ale, I successfully built your code and submitted a job to the NERSC Cori machine requesting 32768 KNL cores and one and a half hours. It is estimated to run in 3 days. If you also observed the same problem with fewer cores, what are your input arguments? Currently, I use what is in your log file, ...

Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-26 Thread Zhang, Junchao via petsc-users
Ale, the job got a chance to run but failed with out-of-memory: "Some of your processes may have been killed by the cgroup out-of-memory handler." I also tried with 128 cores with ./main.x 2 ... and got a weird error message: "The size of the basis has to be at least equal to the number...

Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-28 Thread Ale Foggia via petsc-users
Junchao, I'm sorry for the late response. On Wed., Jun 26, 2019 at 16:39, Zhang, Junchao wrote: > Ale, > The job got a chance to run but failed with out-of-memory, "Some of your > processes may have been killed by the cgroup out-of-memory handler." I mentioned that I used 1024 nodes and...

Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-28 Thread Zhang, Junchao via petsc-users
Ran with 64 nodes and 32 ranks/node, met SLEPc errors and did not know how to proceed :(
[363]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[363]PETSC ERROR: Error in external library
[363]PETSC ERROR: Error in LAPACK subroutine ...

Re: [petsc-users] Communication during MatAssemblyEnd

2019-07-01 Thread Jose E. Roman via petsc-users
You can try the following:
- Try a different DS method: -ds_method 1 or -ds_method 2 (see DSSetMethod)
- Run with -ds_parallel synchronized (see DSSetParallel)
If it does not help, send a reproducible code to slepc-maint. Jose
> On Jul 1, 2019, at 11:10, Ale Foggia via petsc-users ...
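The same settings can also be made programmatically through the DS object attached to the EPS solver. A minimal sketch, assuming an already-created EPS solver named eps (the wrapper function is illustrative):

  #include <slepceps.h>

  /* Sketch: programmatic equivalents of -ds_method 2 and
     -ds_parallel synchronized for an existing EPS solver. */
  PetscErrorCode SetDSOptions(EPS eps)
  {
    PetscErrorCode ierr;
    DS             ds;

    PetscFunctionBeginUser;
    ierr = EPSGetDS(eps, &ds);CHKERRQ(ierr);
    ierr = DSSetMethod(ds, 2);CHKERRQ(ierr);                           /* -ds_method 2 */
    ierr = DSSetParallel(ds, DS_PARALLEL_SYNCHRONIZED);CHKERRQ(ierr);  /* -ds_parallel synchronized */
    PetscFunctionReturn(0);
  }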

Re: [petsc-users] Communication during MatAssemblyEnd

2019-07-01 Thread Zhang, Junchao via petsc-users
Jose & Ale, -ds_method 2 fixed the problem. I used PETSc master (f1480a5c) and SLEPc master (675b89d7) through --download-slepc. I used MKL from /opt/intel/compilers_and_libraries_2018.1.163/linux/mkl/. I got the following results with 2048 processors. MatAssemblyEnd looks expensive to me. I a...

Re: [petsc-users] Communication during MatAssemblyEnd

2019-07-03 Thread Ale Foggia via petsc-users
Thank you Richard for your explanation. I first changed the way I was running the code. On the machine there's SLURM, and I was using "srun -n ./my_program.x". I've seen more than a 60% improvement in execution time just by running with "srun -n --ntasks-per-core=1 --cpu-bind=cores ./my_program.x".