Re: [petsc-users] SuperLU_dist bug with parallel symbolic factorisation

2018-05-22 Thread Keith Lindsay
Hi, I use SuperLU_dist, outside of PETSc, and use the parallel symbolic factorization functionality. In my experience it is significantly faster than the serial symbolic factorization. I don't have clean numbers on hand, but my recollection is that going from serial to parallel reduced time spent

Re: [petsc-users] SuperLU_dist bug with parallel symbolic factorisation

2018-05-22 Thread Xiaoye S. Li
Is it possible to download this particular matrix, so I can do standalone investigation? Sherry On Tue, May 22, 2018 at 12:22 PM, Eric Chamberland < eric.chamberl...@giref.ulaval.ca> wrote: > Hi Fande, > > I don't know, I am working and validating with a DEBUG version of PETSc, > and this "mwe"

Re: [petsc-users] SuperLU_dist bug with parallel symbolic factorisation

2018-05-22 Thread Eric Chamberland
Hi Fande, I don't know, I am working and validating with a DEBUG version of PETSc, and this "mwe" is a 30x30 matrix... But I "hope" the parallel version is faster for large problems... if it is not maybe it should be somewhat reviewed... Eric On 22/05/18 02:22 PM, Fande Kong wrote: Hi Er

Re: [petsc-users] SuperLU_dist bug with parallel symbolic factorisation

2018-05-22 Thread Fande Kong
Hi Eric, I am curious whether the parallel symbolic factorization is faster than the sequential version. Do you have timings? Fande, On Tue, May 22, 2018 at 12:18 PM, Eric Chamberland < eric.chamberl...@giref.ulaval.ca> wrote: > > > On 22/05/18 02:03 PM, Smith, Barry F. wrote: > >> >> Hmm, why wo

Re: [petsc-users] SuperLU_dist bug with parallel symbolic factorisation

2018-05-22 Thread Eric Chamberland
On 22/05/18 02:03 PM, Smith, Barry F. wrote: Hmm, why would the resolution with *sequential* symbolic factorisation give an error around 1e-6 instead of 1e-16 for the parallel one (when it works)? One would think that doing a "sequential" symbolic factorization won't affect the answ

Re: [petsc-users] SuperLU_dist bug with parallel symbolic factorisation

2018-05-22 Thread Smith, Barry F.
Hmm, why would the resolution with *sequential* symbolic factorisation give an error around 1e-6 instead of 1e-16 for the parallel one (when it works)? One would think that doing a "sequential" symbolic factorization wouldn't affect the answer by this huge amount. Perhaps this is the pro

Re: [petsc-users] SuperLU_dist bug with parallel symbolic factorisation

2018-05-22 Thread Eric Chamberland
On 22/05/18 12:11 PM, Xiaoye S. Li wrote: > Default setting is to use sequential symbolic factorization, precisely > due to the ParMETIS bugs. Ok, and I saw you reported the bug "a few years ago" and still have not received a fix... I would like to "live with the patch" (ie working in sequent

Re: [petsc-users] SuperLU_dist bug with parallel symbolic factorisation

2018-05-22 Thread Lawrence Mitchell
On 22/05/18 17:11, Xiaoye S. Li wrote: > Numerical factorization is always parallel (based on number of MPI > tasks and OMP_NUM_THREADS you set), the issue here is only related to > symbolic factorization (figuring out the nonzero pattern in the LU > factors). Default setting is to use sequential

Re: [petsc-users] SuperLU_dist bug with parallel symbolic factorisation

2018-05-22 Thread Xiaoye S. Li
Numerical factorization is always parallel (based on number of MPI tasks and OMP_NUM_THREADS you set), the issue here is only related to symbolic factorization (figuring out the nonzero pattern in the LU factors). Default setting is to use sequential symbolic factorization, precisely due to the Par

Re: [petsc-users] SuperLU_dist bug with parallel symbolic factorisation

2018-05-22 Thread Eric Chamberland
And I will add a question: shouldn't there be an automatic switch to parallel factorisation when the number of processes is greater than 1? Eric On 22/05/18 11:55 AM, Eric Chamberland wrote: Exactly: this bug shows up when I activate the parallel symbolic factorisation, otherwise I do not have it.

Re: [petsc-users] SuperLU_dist bug with parallel symbolic factorisation

2018-05-22 Thread Eric Chamberland
On 22/05/18 11:45 AM, Xiaoye S. Li wrote: This bug seems to show up when the graph is relatively dense. Can you try to use serial symbolic factorization and Metis? Exactly: this bug shows up when I activate the parallel symbolic factorisation, otherwise I do not have it. Eric

Re: [petsc-users] SuperLU_dist bug with parallel symbolic factorisation

2018-05-22 Thread Xiaoye S. Li
Indeed, I am pretty sure the bug is in ParMETIS. A few years ago, I sent a sample matrix and debug trace to George Karypis, he was going to look at it, but never did. This bug seems to show up when the graph is relatively dense. Can you try to use serial symbolic factorization and Metis? Sherry
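The two configurations discussed in this thread can be written out side by side. What follows is only a minimal Python sketch that assembles command-line option strings. The options `-mat_superlu_dist_colperm PARMETIS` and `-mat_superlu_dist_parsymbfact 1` are the ones quoted in the original report. The value `METIS_AT_PLUS_A` for the serial case and the helper name `superlu_dist_options` are assumptions made for illustration, so check them against your PETSc version.

```python
def superlu_dist_options(parallel_symbolic):
    """Return command-line options selecting the symbolic factorization mode.

    Hypothetical helper: it only builds a list of strings and does not
    call PETSc or SuperLU_DIST.
    """
    if parallel_symbolic:
        # Options quoted in the original report (the configuration that
        # was reported as failing in this thread).
        return ["-mat_superlu_dist_colperm", "PARMETIS",
                "-mat_superlu_dist_parsymbfact", "1"]
    # Suggested workaround: serial symbolic factorization with Metis.
    # METIS_AT_PLUS_A is assumed to be the matching colperm value.
    return ["-mat_superlu_dist_colperm", "METIS_AT_PLUS_A",
            "-mat_superlu_dist_parsymbfact", "0"]


if __name__ == "__main__":
    print(" ".join(superlu_dist_options(parallel_symbolic=False)))
```

The resulting strings would be appended to whatever command launches the solver; numerical factorization stays parallel in both cases, as noted elsewhere in the thread.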

Re: [petsc-users] SuperLU_dist bug with parallel symbolic factorisation

2018-05-22 Thread Smith, Barry F.
0x7f96a2148e52 in libmetis__FM_2WayCutRefine (ctrl=0x2784d20, graph=0x2784940, ntpwgts=0x7ffdfa323060, niter=4) at /home/mefpp_ericc/petsc-3.9.2-debug/arch-linux2-c-debug/externalpackages/git.metis/libmetis/fm.c:60 It appears the crash is in metis, not SuperLU_Dist. So either a bug in Me

Re: [petsc-users] SuperLU_dist bug with parallel symbolic factorisation

2018-05-22 Thread Hong
Eric: You are likely encountering a zero pivot. Running your code with '-ksp_error_if_not_converged' would show it. Adding the option '-mat_superlu_dist_replacetinypivot' might help. Hong Hi, > > The given matrix+vector is bogus with SuperLU_Dist on some of our nightly > validation tests since I activated the p
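As a rough illustration of Hong's suggestion, the two diagnostic options can be appended to an existing argument list. This is a sketch under the assumption that both are boolean flags that may be given without a value; the helper name `add_pivot_diagnostics` is hypothetical and not part of PETSc.

```python
def add_pivot_diagnostics(args):
    """Return a copy of args with the two suggested options appended.

    Options already present in args are not added a second time.
    """
    extra = ["-ksp_error_if_not_converged",
             "-mat_superlu_dist_replacetinypivot"]
    return list(args) + [opt for opt in extra if opt not in args]


if __name__ == "__main__":
    base = ["-mat_superlu_dist_parsymbfact", "1"]
    print(" ".join(add_pivot_diagnostics(base)))
```

The input list is left unmodified, so the same base arguments can be reused for a run without the diagnostics.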

[petsc-users] SuperLU_dist bug with parallel symbolic factorisation

2018-05-22 Thread Eric Chamberland
Hi, The given matrix+vector is bogus with SuperLU_Dist on some of our nightly validation tests since I activated the parallel symbolic factorisation. (with -mat_superlu_dist_colperm PARMETIS -mat_superlu_dist_parsymbfact 1 ) I extracted an example system and reproduced the bug with src/ksp/