Re: [petsc-users] Tough to reproduce petsctablefind error

2020-11-03 Thread Barry Smith
Everyone, Previously we checked the bounds range only in the debug version of the code, not in the optimized version. Based on Mark's experience, I felt that the tiny performance hit from checking was worth paying all the time, and our intention is now to always check these bounds. Barry
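The difference Barry describes can be sketched in plain C. This is not PETSc's `PetscTableFind()`: the `KeyTable` type and both functions are hypothetical, and `assert()` stands in for any check that is compiled only into debug builds (it disappears when `NDEBUG` is defined).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical table whose valid keys are 1..maxkey (one-based);
   vals[key] holds the data for key and vals[0] is unused. */
typedef struct {
  int maxkey;
  const int *vals;
} KeyTable;

/* Debug-only check: assert() compiles to nothing when NDEBUG is defined,
   so in such a build a bad key silently reads past the end of vals. */
int table_find_debug_only(const KeyTable *t, int key, int *data) {
  assert(key > 0 && key <= t->maxkey);
  *data = t->vals[key];
  return 0;
}

/* Always-on check: two comparisons in every build, and the caller gets
   a nonzero error code instead of undefined behavior. */
int table_find_checked(const KeyTable *t, int key, int *data) {
  if (key <= 0 || key > t->maxkey) {
    fprintf(stderr, "key %d is out of range 1..%d\n", key, t->maxkey);
    return 1;
  }
  *data = t->vals[key];
  return 0;
}
```

With `maxkey` set to 5693, a lookup of key 7556 (the values from the first report in this thread) returns 1 from `table_find_checked()`, whereas the assert-only version would read out of bounds in a build with `NDEBUG` defined.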

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-11-03 Thread Matthew Knepley
On Tue, Nov 3, 2020 at 8:23 AM Mark McClure wrote: > Hi, all. > > I am emailing to close the loop on this. > > There were two things combining to cause our issue. > > 1. At some point, several years ago, I had set > PetscPushErrorHandler(PetscAbortErrorHandler, NULL), and then forgotten > about i

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-11-03 Thread Mark McClure
Sorry, meant to say " are *not* within the range from 0 to size of matrix" On Tue, Nov 3, 2020 at 8:22 AM Mark McClure wrote: > Hi, all. > > I am emailing to close the loop on this. > > There were two things combining to cause our issue. > > 1. At some point, several years ago, I had set > Petsc

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-11-03 Thread Mark McClure
Hi, all. I am emailing to close the loop on this. There were two things combining to cause our issue. 1. At some point, several years ago, I had set PetscPushErrorHandler(PetscAbortErrorHandler, NULL), and then forgotten about it. This caused the program to terminate when an error was encountere
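The second cause, indices that are not within the range from 0 to the size of the matrix, can be caught on the application side before the values reach PETSc. A minimal sketch, assuming a hypothetical helper `first_out_of_range()` and plain `long` indices rather than `PetscInt`. Negative entries are skipped because the `MatSetValues()` documentation states that negative row and column indices are ignored.

```c
#include <stddef.h>

/* Hypothetical guard run on a row or column index array just before the
   matrix insertion call. Negative entries are skipped (MatSetValues()
   ignores them); any index >= n is reported. Returns the position of the
   first bad index, or -1 if every index is acceptable. */
long first_out_of_range(const long *idx, size_t count, long n) {
  for (size_t k = 0; k < count; k++) {
    if (idx[k] >= n) return (long)k;
  }
  return -1;
}
```

Reporting the offending position at this point can turn a later failure deep inside assembly into an error at the place where the bad index was produced.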

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-26 Thread Mark McClure
Ok, I think we've made some progress. We were already calling the function like this: ierr = PetscCall(); if (ierr != 0) {do something to handle error}. We actually are doing that on every single call made to Petsc, just to be careful. This is what was confusing to me. Why was the program terminat
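Why the `if (ierr != 0)` branch never ran can be shown with a toy error-handler stack. This is a hypothetical model, not PETSc's implementation: `library_call()` stands in for any PETSc routine, and `abort_handler()` plays the role that `PetscAbortErrorHandler` played in this thread, ending the process before control returns to the caller.

```c
#include <stdlib.h>

#define ERR_OUT_OF_RANGE 1 /* hypothetical error code */

typedef int (*ErrorHandler)(int code);

/* Hands the error code back so the caller's ierr check can run. */
int return_handler(int code) { return code; }

/* Terminates the process; the caller never sees the error code. */
int abort_handler(int code) {
  (void)code;
  abort();
}

static ErrorHandler handler_stack[8] = {return_handler};
static int handler_depth = 0;

void push_error_handler(ErrorHandler h) {
  if (handler_depth < 7) handler_stack[++handler_depth] = h;
}

void pop_error_handler(void) {
  if (handler_depth > 0) handler_depth--;
}

/* A library routine that reports an out-of-range key through whatever
   handler is on top of the stack. */
int library_call(int key, int maxkey) {
  if (key > maxkey) return handler_stack[handler_depth](ERR_OUT_OF_RANGE);
  return 0;
}
```

Once `push_error_handler(abort_handler)` has been called, `int ierr = library_call(7556, 5693); if (ierr != 0) { ... }` never reaches the `if`, because the process ends inside the handler. Popping that handler, or never pushing it, restores the return-and-check behavior.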

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-26 Thread Barry Smith
> On Sep 26, 2020, at 5:58 PM, Junchao Zhang wrote: > > > > On Sat, Sep 26, 2020 at 5:44 PM Mark Adams > wrote: > > > On Sat, Sep 26, 2020 at 1:07 PM Matthew Knepley > wrote: > On Sat, Sep 26, 2020 at 11:17 AM Mark McClure

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-26 Thread Junchao Zhang
On Sat, Sep 26, 2020 at 5:44 PM Mark Adams wrote: > > > On Sat, Sep 26, 2020 at 1:07 PM Matthew Knepley wrote: > >> On Sat, Sep 26, 2020 at 11:17 AM Mark McClure wrote: >> >>> Thank you, all for the explanations. >>> >>> Following Matt's suggestion, we'll use -g (and not use >>> -with-debugging

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-26 Thread Mark Adams
On Sat, Sep 26, 2020 at 1:07 PM Matthew Knepley wrote: > On Sat, Sep 26, 2020 at 11:17 AM Mark McClure wrote: > >> Thank you, all for the explanations. >> >> Following Matt's suggestion, we'll use -g (and not use -with-debugging=0) >> all future compiles to all users, so in future, we can provid

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-26 Thread Matthew Knepley
On Sat, Sep 26, 2020 at 11:17 AM Mark McClure wrote: > Thank you, all for the explanations. > > Following Matt's suggestion, we'll use -g (and not use -with-debugging=0) > all future compiles to all users, so in future, we can provide better > information. > > Second, Chris is going to boil our f

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-26 Thread Mark McClure
Thank you all for the explanations. Following Matt's suggestion, we'll use -g (and not use -with-debugging=0) in all future builds we ship to users, so in future we can provide better information. Second, Chris is going to boil our function down to a minimal stub and share it in case there is some subtle

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-25 Thread Barry Smith
In May 2019 Lisandro changed the versions of Metis and ParMetis that PETSc uses so that they use a portable, machine-independent random number generator, so if you are having PETSc install Metis then its random number generator should generate the exact same random numbers on repeated identical runs on t
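A portable generator gives bit-identical streams on every platform because it uses only fixed-width integer arithmetic, whereas the algorithm behind C's `rand()` is implementation-defined. The sketch below is a 64-bit linear congruential generator with Knuth's MMIX constants, chosen only for brevity; it is not the generator Metis or ParMetis actually uses, and `PortableRng`, `rng_seed()` and `rng_next()` are hypothetical names.

```c
#include <stdint.h>

typedef struct {
  uint64_t state;
} PortableRng;

void rng_seed(PortableRng *r, uint64_t seed) { r->state = seed; }

/* One LCG step modulo 2^64 (unsigned overflow wraps, which C defines);
   returns the high 32 bits, which are better distributed than the low bits. */
uint32_t rng_next(PortableRng *r) {
  r->state = r->state * 6364136223846793005ULL + 1442695040888963407ULL;
  return (uint32_t)(r->state >> 32);
}
```

Two generators seeded with the same value produce the same sequence on any machine, which is the property Barry describes for repeated identical runs.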

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-25 Thread Matthew Knepley
On Fri, Sep 25, 2020 at 1:29 PM Mark McClure wrote: > Hello, > > I work with Chris, and have a few comments, hopefully helpful. Thank you > all, for your help. > > Our program is unfortunately behaving a little bit nondeterministically. I > am not sure why because for the OpenMP parts, I test it

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-25 Thread Mark McClure
Hello, I work with Chris and have a few comments, hopefully helpful. Thank you all for your help. Our program is unfortunately behaving a little bit nondeterministically. I am not sure why, because I test the OpenMP parts for race conditions using Intel Inspector and see none. We are usi

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-24 Thread Mark Adams
You might add code here like: if (ierr) { for (i=0; i<aij->B->rmap->n; i++) { for (j=0; j<B->ilen[i]; j++) { PetscInt data,gid1 = aj[B->i[i] + j] + 1; // don't use ierr print rank, gid1; } } } CHKERRQ(ierr); I am guessing that somehow you have a table that is bad and too small. It failed on
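Mark's diagnostic can be written as a standalone function for testing outside PETSc. In this sketch (hypothetical names, plain `int` instead of `PetscInt`), `rowstart[i]`, `rowlen[i]` and `cols` play the roles of `B->i[i]`, `B->ilen[i]` and `aj`, keys are one-based as in the quoted loop, and each key above `maxkey` is printed together with the MPI rank passed in.

```c
#include <stdio.h>

/* Hypothetical standalone version of the diagnostic loop. Keys are
   one-based (gid1 = column + 1); every key larger than maxkey is printed
   with the rank and row, and the number of bad keys is returned. */
int report_bad_keys(int rank, int nrows, const int *rowstart,
                    const int *rowlen, const int *cols, int maxkey) {
  int nbad = 0;
  for (int i = 0; i < nrows; i++) {
    for (int j = 0; j < rowlen[i]; j++) {
      int gid1 = cols[rowstart[i] + j] + 1;
      if (gid1 > maxkey) {
        printf("[%d] row %d: key %d > largest key allowed %d\n",
               rank, i, gid1, maxkey);
        nbad++;
      }
    }
  }
  return nbad;
}
```

Calling it only when `ierr` is nonzero, as Mark suggests, keeps the normal path free of the extra loop.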

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-24 Thread Barry Smith
Oh, sorry, my mistake. > On Sep 24, 2020, at 3:31 PM, Junchao Zhang wrote: > > That error stack was from Fande Kong. We should wait for Chris's. > > --Junchao Zhang > > > On Thu, Sep 24, 2020 at 2:42 PM Barry Smith > wrote: > > The stack is listed below. It cr

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-24 Thread Junchao Zhang
That error stack was from Fande Kong. We should wait for Chris's. --Junchao Zhang On Thu, Sep 24, 2020 at 2:42 PM Barry Smith wrote: > > The stack is listed below. It crashes inside MatPtAP(). > > It is possible there is some subtle bug in the rather complex PETSc code > for MatPtAP() but

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-24 Thread Barry Smith
> On Sep 24, 2020, at 2:47 PM, Matthew Knepley wrote: > > On Thu, Sep 24, 2020 at 3:42 PM Barry Smith > wrote: > > The stack is listed below. It crashes inside MatPtAP(). > > What about just checking that the column indices that PtAP receives are > valid? Are we

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-24 Thread Matthew Knepley
On Thu, Sep 24, 2020 at 3:42 PM Barry Smith wrote: > > The stack is listed below. It crashes inside MatPtAP(). > What about just checking that the column indices that PtAP receives are valid? Are we not doing that? Matt > It is possible there is some subtle bug in the rather complex PE

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-24 Thread Barry Smith
The stack is listed below. It crashes inside MatPtAP(). It is possible there is some subtle bug in the rather complex PETSc code for MatPtAP() but I am inclined to blame MPI again. I think we should add some simple, low-overhead, always-on communication error detecting code to PetscSF wh
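Barry's suggestion is cut off above, so the exact mechanism is unknown. One low-overhead approach, shown here only as an illustration, is for the sender to compute a checksum over each message buffer and for the receiver to recompute and compare it. The sketch uses 32-bit FNV-1a as the checksum; `fnv1a32()` and `verify_message()` are hypothetical helpers, and no MPI calls are shown.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over the raw bytes of a message buffer. */
uint32_t fnv1a32(const void *buf, size_t nbytes) {
  const unsigned char *p = buf;
  uint32_t h = 2166136261u; /* FNV offset basis */
  for (size_t k = 0; k < nbytes; k++) {
    h ^= p[k];
    h *= 16777619u; /* FNV prime */
  }
  return h;
}

/* Receiver side: returns 0 if the payload matches the checksum the sender
   computed before sending, 1 if the bytes were altered in transit. */
int verify_message(const void *payload, size_t nbytes, uint32_t sent_sum) {
  return fnv1a32(payload, nbytes) == sent_sum ? 0 : 1;
}
```

In a real exchange the checksum would travel with the payload, for example as an extra word in the message; the cost is one pass over the bytes on each side.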

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-24 Thread Matthew Knepley
On Thu, Sep 24, 2020 at 1:22 PM Chris Hewson wrote: > Hi Guys, > > Thanks for all of the prompt responses, very helpful and appreciated. > > By "when debugging", did you mean when configure petsc --with-debugging=1 > COPTFLAGS=-O0 -g etc or when you attached a debugger? > - Both, I have run with

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-24 Thread Chris Hewson
Hi Guys, Thanks for all of the prompt responses, very helpful and appreciated. By "when debugging", did you mean when configure petsc --with-debugging=1 COPTFLAGS=-O0 -g etc or when you attached a debugger? - Both, I have run with a debugger attached and detached, all compiled with the following

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-24 Thread Barry Smith
Chris, We realize how frustrating this type of problem is to deal with. Here is the code: ierr = PetscTableCreate(aij->B->rmap->n,mat->cmap->N+1,&gid1_lid1);CHKERRQ(ierr); for (i=0; i<aij->B->rmap->n; i++) { for (j=0; j<B->ilen[i]; j++) { PetscInt data,gid1 = aj[B->i[i] +
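The loop quoted above assigns each distinct global column of the off-diagonal block a local id, and the table it fills is created with a largest allowed key of `mat->cmap->N+1`. A simplified sketch under stated assumptions: a dense array of length `maxkey+1` replaces the hash table, ids are handed out in order of first appearance, the names are hypothetical, and a stored value of 0 means "not seen yet", which is why both keys and ids are one-based here. It returns -1 where PETSc would raise the "greater than largest key allowed" error.

```c
/* Builds a one-based global-to-local column map. rowstart[i], rowlen[i]
   and cols mirror B->i[i], B->ilen[i] and aj; lid_of_key must have
   maxkey+1 zeroed entries. Returns the number of distinct columns, or -1
   as soon as a key falls outside 1..maxkey. */
int build_colmap(int nrows, const int *rowstart, const int *rowlen,
                 const int *cols, int maxkey, int *lid_of_key) {
  int ec = 0;
  for (int i = 0; i < nrows; i++) {
    for (int j = 0; j < rowlen[i]; j++) {
      int gid1 = cols[rowstart[i] + j] + 1; /* one-based key */
      if (gid1 < 1 || gid1 > maxkey) return -1;
      if (!lid_of_key[gid1]) lid_of_key[gid1] = ++ec; /* first sighting */
    }
  }
  return ec;
}
```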

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-24 Thread Matthew Knepley
On Thu, Sep 24, 2020 at 9:47 AM Chris Hewson wrote: > After about a month of not having this issue pop up, it has come up again > > We have been struggling with a similar PETSc Error for a while now, the > error message is as follows: > > [7]PETSC ERROR: PetscTableFind() line 132 in > /home/chewso

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-24 Thread Junchao Zhang
On Thu, Sep 24, 2020 at 8:47 AM Chris Hewson wrote: > After about a month of not having this issue pop up, it has come up again > > We have been struggling with a similar PETSc Error for a while now, the > error message is as follows: > > [7]PETSC ERROR: PetscTableFind() line 132 in > /home/chewso

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-09-24 Thread Chris Hewson
After about a month of not having this issue pop up, it has come up again. We have been struggling with a similar PETSc error for a while now; the error message is as follows: [7]PETSC ERROR: PetscTableFind() line 132 in /home/chewson/petsc-3.13.3/include/petscctable.h key 24443 is greater than lar

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-08-13 Thread Junchao Zhang
That is a great idea. I'll figure it out. --Junchao Zhang On Thu, Aug 13, 2020 at 4:31 PM Barry Smith wrote: > > Junchao, > > Any way in the PETSc configure to warn that MPICH version is "bad" or > "untrustworthy" or even the vague "we have suspicions about this version > and recommend av

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-08-13 Thread Barry Smith
Junchao, Any way in the PETSc configure to warn that an MPICH version is "bad" or "untrustworthy" or even the vague "we have suspicions about this version and recommend avoiding it"? A lot of time could be saved if others don't have to deal with the same problem. Maybe add arrays of suspect_ve
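The check Barry proposes amounts to looking up the detected MPI version in a list of versions known to be problematic. PETSc's configure is written in Python, so this C sketch only illustrates the idea; the list and function name are hypothetical, and its single entry, MPICH 3.3a2, is the alpha release identified elsewhere in this thread.

```c
#include <string.h>

/* Hypothetical list of MPICH versions to warn about. */
static const char *const suspect_versions[] = {"3.3a2"};

/* Returns 1 if version exactly matches an entry in the suspect list. */
int is_suspect_mpich_version(const char *version) {
  size_t n = sizeof suspect_versions / sizeof suspect_versions[0];
  for (size_t k = 0; k < n; k++) {
    if (strcmp(version, suspect_versions[k]) == 0) return 1;
  }
  return 0;
}
```

At run time the library version string can be obtained with `MPI_Get_library_version()`; extracting the MPICH version number from that string is left out here.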

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-08-13 Thread Junchao Zhang
Thanks for the update. Let's assume it is a bug in MPI :) --Junchao Zhang On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson wrote: > Just as an update to this, I can confirm that using the mpich version > (3.3.2) downloaded with the petsc download solved this issue on my end. > > *Chris Hewson* > S

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-08-13 Thread Chris Hewson
Just as an update to this, I can confirm that using the mpich version (3.3.2) downloaded with the petsc download solved this issue on my end. *Chris Hewson* Senior Reservoir Simulation Engineer ResFrac +1.587.575.9792 On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang wrote: > On Mon, Jul 20, 2020

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-07-23 Thread Junchao Zhang
On Mon, Jul 20, 2020 at 7:05 AM Barry Smith wrote: > > Is there a comprehensive MPI test suite (perhaps from MPICH)? Is > there any way to run this full test suite under the problematic MPI and see > if it detects any problems? > > If so, could someone add it to the FAQ in the debugging

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-07-20 Thread Chris Hewson
Do not use mpich v3.3a2, which is an alpha version released in 2016. Use current stable release mpich-3.3.2 - Thanks Junchao, that makes sense also with Fande's observations. I will give this a try and see *Chris Hewson* Senior Reservoir Simulation Engineer ResFrac +1.587.575.9792 On Mon, Jul 2

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-07-20 Thread Fande Kong
On Mon, Jul 20, 2020 at 1:14 PM Mark Adams wrote: > This is indeed a nasty bug, but having two separate should be useful. > > Chris is using Haswell, what MPI are you using? I trust you are not using > Moose. > > Fande what machine/MPI are you using? > #define PETSC_MPICC_SHOW "/apps/local/spack

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-07-20 Thread Junchao Zhang
On Mon, Jul 20, 2020 at 2:26 PM Chris Hewson wrote: > Chris is using Haswell, what MPI are you using? I trust you are not using > Moose. > - yes, using haswell, mpi is mpich v3.3a2 on ubuntu 18.04. I am not using > MOOSE. > Do not use mpich v3.3a2, which is an alpha version released in 2016. Use

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-07-20 Thread Chris Hewson
Chris is using Haswell, what MPI are you using? I trust you are not using Moose. - yes, using haswell, mpi is mpich v3.3a2 on ubuntu 18.04. I am not using MOOSE. *Chris Hewson* Senior Reservoir Simulation Engineer ResFrac +1.587.575.9792 On Mon, Jul 20, 2020 at 1:14 PM Mark Adams wrote: > This

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-07-20 Thread Mark Adams
This is indeed a nasty bug, but having two separate should be useful. Chris is using Haswell, what MPI are you using? I trust you are not using Moose. Fande what machine/MPI are you using? On Mon, Jul 20, 2020 at 3:04 PM Chris Hewson wrote: > Hi Mark, > > Chris: It sounds like you just have on

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-07-20 Thread Chris Hewson
Hi Mark, Chris: It sounds like you just have one matrix that you give to MUMPS. You seem to be creating a matrix in the middle of your run. Are you doing dynamic adaptivity? - I have 2 separate matrices I give to mumps, but as this is happening in the production build of my code, I can't determine

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-07-20 Thread Mark Adams
On Mon, Jul 20, 2020 at 2:36 PM Fande Kong wrote: > Hi Mark, > > Just to be clear, I do not think it is related to GAMG or PtAP. It is a > communication issue: > Your stack trace was from PtAP, but Chris's problem is not. > > Reran the same code, and I just got : > > [252]PETSC ERROR:

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-07-20 Thread Fande Kong
The most frustrating part is that the issue is not reproducible. Fande, On Mon, Jul 20, 2020 at 12:36 PM Fande Kong wrote: > Hi Mark, > > Just to be clear, I do not think it is related to GAMG or PtAP. It is a > communication issue: > > Reran the same code, and I just got : > > [252]PETSC ERROR

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-07-20 Thread Fande Kong
Hi Mark, Just to be clear, I do not think it is related to GAMG or PtAP. It is a communication issue: Reran the same code, and I just got : [252]PETSC ERROR: - Error Message -- [252]PETSC ERROR: Petsc has generated i

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-07-20 Thread Mark Adams
OK, so this is happening in MatProductNumeric_PtAP. This must be in constructing the coarse grid. GAMG sort of wants to coarsen at a rate of 30:1, but that needs to be verified. With that, your index is about the size of the first coarse grid. I'm trying to figure out if the index is valid. But th
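Mark's estimate can be checked against numbers given elsewhere in this thread: Fande's original system has size 1,428,284,880, and the failing key was 45226154 against a largest allowed key of 740521. Taking the 30:1 coarsening rate at face value (Mark notes it still needs to be verified):

```latex
\[
n_1 \approx \frac{n_0}{30} = \frac{1\,428\,284\,880}{30} = 47\,609\,496,
\qquad
\frac{45\,226\,154}{47\,609\,496} \approx 0.95,
\qquad
\frac{45\,226\,154}{740\,521} \approx 61.
\]
```

So the bad key is close to the estimated size of the first coarse grid, yet about 61 times the largest key the table was built to accept.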

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-07-20 Thread Fande Kong
Hi Mark, Thanks for your reply. On Mon, Jul 20, 2020 at 7:13 AM Mark Adams wrote: > Fande, > do you know if your 45226154 was out of range in the real matrix? > I do not know since it was in building the AMG hierarchy. The size of the original system is 1,428,284,880 > What size integers d
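On Mark's question about integer size: PETSc's `PetscInt` is 32-bit unless PETSc is configured with `--with-64-bit-indices`. The reported system size of 1,428,284,880 is below the signed 32-bit limit of 2,147,483,647, leaving 719,198,767 of headroom, so the global size by itself fits in 32-bit indices (other counts, such as numbers of nonzeros, are not covered by this check). A trivial check, with a hypothetical helper name:

```c
#include <stdint.h>

/* Returns 1 if n fits in a signed 32-bit integer, 0 if a 64-bit index
   type would be needed. */
int fits_in_int32(int64_t n) { return n >= INT32_MIN && n <= INT32_MAX; }
```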

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-07-20 Thread Mark Adams
Fande, do you know if your 45226154 was out of range in the real matrix? What size integers do you use? Thanks, Mark On Mon, Jul 20, 2020 at 1:17 AM Fande Kong wrote: > Trace could look like this: > > [640]PETSC ERROR: - Error Message > --

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-07-20 Thread Barry Smith
Is there a comprehensive MPI test suite (perhaps from MPICH)? Is there any way to run this full test suite under the problematic MPI and see if it detects any problems? If so, could someone add it to the FAQ in the debugging section? Thanks Barry > On Jul 20, 2020, at 1

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-07-19 Thread Fande Kong
Trace could look like this: [640]PETSC ERROR: - Error Message -- [640]PETSC ERROR: Argument out of range [640]PETSC ERROR: key 45226154 is greater than largest key allowed 740521 [640]PETSC ERROR: See https://www.mc

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-07-19 Thread Fande Kong
I am not entirely sure what is happening, but we encountered similar issues recently. It was not reproducible. It might occur at different stages, and errors could be weird other than "ctable stuff." Our code was Valgrind clean since every PR in moose needs to go through rigorous Valgrind checks b

Re: [petsc-users] Tough to reproduce petsctablefind error

2020-07-19 Thread Jed Brown
It'll be hard to narrow down without a stack trace or reproducer. I'd recommend running a small test to ensure that your code is Valgrind clean before going any further. Then try to get a stack trace by running inside a debugger or setting your environment to dump core on errors. Chris Hewson w
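Having crashes dump core usually means raising the core-file size limit, e.g. `ulimit -c unlimited` in the shell that launches the run. The same can be done from inside the program with the POSIX `getrlimit()`/`setrlimit()` calls, as sketched below with a hypothetical helper name. Whether a core file is actually written also depends on system settings (on Linux, for example, `/proc/sys/kernel/core_pattern`).

```c
#include <sys/resource.h>

/* Raises this process's soft core-file size limit to its hard limit so an
   abort or crash leaves a core dump for post-mortem debugging. Returns 0
   on success, -1 on failure (errno is set by getrlimit/setrlimit). */
int enable_core_dumps(void) {
  struct rlimit rl;
  if (getrlimit(RLIMIT_CORE, &rl) != 0) return -1;
  rl.rlim_cur = rl.rlim_max;
  return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling it early in `main()` on every rank means that an abort, such as one triggered by an abort-style error handler, leaves a core file that a debugger can open afterwards.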

[petsc-users] Tough to reproduce petsctablefind error

2020-07-19 Thread Chris Hewson
Hi, I am having a bug that occurs in PETSc with the error string: [7]PETSC ERROR: PetscTableFind() line 132 in /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than largest key allowed 5693. This is using petsc-3.13.2, compiled and running using mpich with -O3 and debug