Those messages were used to build the MatMult communication pattern for the
matrix. They were not part of the matrix entry-passing you imagined, but they
did indeed happen in MatAssemblyEnd. If you want to make sure processes do not
set remote entries, you can use MatSetOption(A,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE).
Note that this is a one-time cost if the nonzero structure of the matrix
stays the same. It will not happen in future MatAssemblies.
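That advice can be sketched as follows. This is a minimal, hedged example (it assumes a standard PETSc 3.x build; the matrix size, preallocation, and values are arbitrary), in which each rank inserts only into rows it owns, so the option is safe to set:

```c
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  PetscInt       i, rstart, rend;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE,
                      128, 128, 3, NULL, 2, NULL, &A); CHKERRQ(ierr);
  /* Promise that no rank sets entries in rows it does not own, so
     MatAssemblyBegin/End can skip the off-process reduction entirely. */
  ierr = MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE); CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &rstart, &rend); CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    /* only locally owned (diagonal) entries are inserted here */
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES); CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatDestroy(&A); CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}
```

If any rank does set a remote entry after this option is enabled, the result is undefined, so it is only correct when the application genuinely computes owner-local entries.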
> On Jun 20, 2019, at 3:16 PM, Zhang, Junchao via petsc-users
> wrote:
>
> Those messages were used to build MatMult communication pattern for the
> matrix.
On Fri, Jun 21, 2019 at 8:07 AM Ale Foggia
<amfog...@gmail.com> wrote:
Thanks both of you for your answers,
On Thu, Jun 20, 2019 at 22:20, Smith, Barry F.
(<bsm...@mcs.anl.gov>) wrote:
Note that this is a one-time cost if the nonzero structure of the matrix
stays the same.
The load balance is definitely out of whack.
BuildTwoSidedF 1 1.0 1.6722e-02 41.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 138 1.0 2.6604e+02 7.4 3.19e+10 2.1 8.2e+07 7.8e+06
0.0e+00 2 4 13 13 0 15 25 100 100 0 2935476
MatAss
What is the partition like? Suppose you randomly assigned nodes to
processes; then in the typical case, all neighbors would be on different
processors. Then the "diagonal block" would be nearly diagonal and the
off-diagonal block would be huge, requiring communication with many
other processes.
MatAssembly was called once (in stage 5) and cost 2.5% of the total time. Look
at stage 5. It says MatAssemblyBegin calls BuildTwoSidedF, which does global
synchronization. The high max/min ratio means load imbalance. What I do not
understand is MatAssemblyEnd: its ratio is 1.0, which means the processes were
well balanced there.
You could access the VecScatter inside the matrix-multiply and call
VecScatterView() with an ASCII viewer with the format PETSC_VIEWER_ASCII_INFO
(make sure you use this format) and it provides information about how much
communication is being done and how many neighbors are being communicated with.
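As a hedged sketch of the viewing mechanics (it assumes a standard PETSc 3.x build; the scatter here is built by hand, since the one inside a parallel Mat is not reached through the public API, but the viewer usage is the same):

```c
#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec            x, y;
  IS             ix;
  VecScatter     ctx;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, 64, &x); CHKERRQ(ierr);
  ierr = VecCreateSeq(PETSC_COMM_SELF, 64, &y); CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_SELF, 64, 0, 1, &ix); CHKERRQ(ierr);
  /* gather the whole parallel vector onto every process */
  ierr = VecScatterCreate(x, ix, y, ix, &ctx); CHKERRQ(ierr);
  /* ASCII_INFO prints a communication summary (message counts and
     lengths) instead of dumping the full index mapping */
  ierr = PetscViewerPushFormat(PETSC_VIEWER_STDOUT_WORLD,
                               PETSC_VIEWER_ASCII_INFO); CHKERRQ(ierr);
  ierr = VecScatterView(ctx, PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr);
  ierr = PetscViewerPopFormat(PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr);
  ierr = VecScatterDestroy(&ctx); CHKERRQ(ierr);
  ierr = ISDestroy(&ix); CHKERRQ(ierr);
  ierr = VecDestroy(&x); CHKERRQ(ierr);
  ierr = VecDestroy(&y); CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}
```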
Ale,
Did you use Intel KNL nodes? Mr. Hong (cc'ed) did experiments on KNL nodes
one year ago. He used 32768 processors and called MatAssemblyEnd 118 times, and
it used only 1.5 seconds in total. So I guess something was wrong with your
test. If you can share your code, I can have a test on our machine.
Hi Ale,
I don't know if this has anything to do with the strange performance you are
seeing, but I notice that some of your Intel MPI settings are inconsistent and
I'm not sure what you are intending. You have specified a value for
I_MPI_PIN_DOMAIN and also a value for I_MPI_PIN_PROCESSOR_LIST.
Ale,
I successfully built your code and submitted a job to the NERSC Cori machine
requesting 32768 KNL cores and one and a half hours. It is estimated to run in 3
days. If you also observed the same problem with fewer cores, what are your input
arguments? Currently, I use the ones found in your log file,
Ale,
The job got a chance to run but failed with out-of-memory, "Some of your
processes may have been killed by the cgroup out-of-memory handler."
I also tried with 128 cores with ./main.x 2 ... and got a weird error message
"The size of the basis has to be at least equal to the number
Junchao,
I'm sorry for the late response.
On Wed, Jun 26, 2019 at 16:39, Zhang, Junchao ()
wrote:
> Ale,
> The job got a chance to run but failed with out-of-memory, "Some of your
> processes may have been killed by the cgroup out-of-memory handler."
>
I mentioned that I used 1024 nodes a
Ran with 64 nodes and 32 ranks/node, met SLEPc errors and did not know how to
proceed :(
[363]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[363]PETSC ERROR: Error in external library
[363]PETSC ERROR: Error in LAPACK subroutine
You can try the following:
- Try with a different DS method: -ds_method 1 or -ds_method 2 (see
DSSetMethod)
- Run with -ds_parallel synchronized (see DSSetParallel)
If it does not help, send a reproducible code to slepc-maint
Jose
> On Jul 1, 2019, at 11:10, Ale Foggia via petsc-users
> wrote:
Jose & Ale,
-ds_method 2 fixed the problem. I used PETSc master (f1480a5c) and SLEPc
master (675b89d7) through --download-slepc. I used MKL
/opt/intel/compilers_and_libraries_2018.1.163/linux/mkl/
I got the following results with 2048 processors. MatAssemblyEnd looks
expensive to me. I a
Thank you Richard for your explanation.
I first changed the way I was running the code. In the machine there's
SLURM and I was using "srun -n ./my_program.x". I've
seen more than 60% improvement in execution time by just running with "srun
-n --ntasks-per-core=1 --cpu-bind=cores
./my_program.x"