Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-28 Thread Mark Adams via petsc-dev
On Sat, Sep 28, 2019 at 12:55 AM Karl Rupp wrote: > Hi Mark, > > > OK, so now the problem has shifted somewhat in that it now manifests > > itself on small cases. It is somewhat random and anecdotal but it does happen on the smaller test problem now. When I try to narrow down when the problem m

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-28 Thread Mark Adams via petsc-dev
The logic is basically correct, because I simply zero out the yy vector (the output vector) and it runs great now. The numerics look fine without CPU pinning. AND, it worked with 1, 2, and 3 GPUs (one node, one socket), but failed with 4 GPUs, which uses the second socket. Strange. On Sat, Sep 28, 2019

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-28 Thread Stefano Zampini via petsc-dev
Mark, MatMultTransposeAdd_SeqAIJCUSPARSE checks if the matrix is in compressed row storage; MatMultTranspose_SeqAIJCUSPARSE does not. Is this perhaps the issue? The CUSPARSE classes are kind of messy. On Sat, Sep 28, 2019 at 07:55, Karl Rupp via petsc-dev <petsc-dev@mcs.anl.gov>
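
A toy sketch of why compressed row storage and zeroing the output vector can interact, as the two messages above suggest. This is not PETSc or cuSPARSE code; the function name and data layout are hypothetical, and it only assumes the general idea that a compressed-row format stores the non-empty rows alone, so a kernel that writes only those rows leaves the other entries of the output untouched.

```python
def compressed_row_matvec(row_ids, row_ptr, col_idx, vals, x, y):
    """Write y[row] = (A x)[row] for the stored rows only.

    row_ids lists the non-empty rows; rows missing from it are never
    written, so whatever was in y for those rows stays there.
    """
    for k, row in enumerate(row_ids):
        acc = 0.0
        for j in range(row_ptr[k], row_ptr[k + 1]):
            acc += vals[j] * x[col_idx[j]]
        y[row] = acc
    return y


# 3x3 matrix [[2, 0, 0], [0, 0, 0], [0, 0, 3]] with an empty middle row.
row_ids = [0, 2]
row_ptr = [0, 1, 2]
col_idx = [0, 2]
vals = [2.0, 3.0]
x = [1.0, 1.0, 1.0]

# Output vector holding leftover values from an earlier computation.
stale = compressed_row_matvec(row_ids, row_ptr, col_idx, vals, x, [9.0, 9.0, 9.0])
# Output vector zeroed first.
clean = compressed_row_matvec(row_ids, row_ptr, col_idx, vals, x, [0.0, 0.0, 0.0])
```

In this sketch the entry for the empty row keeps its old value unless the output is zeroed beforehand. Whether this is what actually happens in the CUSPARSE code path discussed here is not confirmed by the thread.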

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-27 Thread Karl Rupp via petsc-dev
Hi Mark, OK, so now the problem has shifted somewhat in that it now manifests itself on small cases. In earlier investigation I was drawn to MatTranspose but had a hard time pinning it down. The bug seems more stable now or you probably fixed what looks like all the other bugs. I added print

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-26 Thread Balay, Satish via petsc-dev
Mark, The branch karlrupp/fix-cuda-streams is already merged to master. [and the branch is now deleted] I guess - if you wish to compare the difference this feature makes - you can compare with the master snapshot before this merge. i.e., compare master (includes karlrupp/fix-cuda-streams feature) and

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-26 Thread Mark Adams via petsc-dev
Karl, I have it running but I am not seeing any difference from master. I wonder if I have the right version: Using Petsc Development GIT revision: v3.11.3-2207-ga8e311a I could not find karlrupp/fix-cuda-streams on the gitlab page to check your last commit SHA1 (???), and now I get: 08:37 karlr

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Mark Adams via petsc-dev
> > If jsrun is not functional from configure, alternatives are > --with-mpiexec=/bin/true or --with-batch=1 > > --with-mpiexec=/bin/true seems to be working. Thanks, Mark > Satish >

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Balay, Satish via petsc-dev
On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote: > On Wed, Sep 25, 2019 at 8:40 PM Balay, Satish wrote: > > > > Unable to run jsrun -g 1 with option "-n 1" > > > Error: It is only possible to use js commands within a job allocation > > > unless CSM is running > > > > > > Nope this is a diff

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Mark Adams via petsc-dev
On Wed, Sep 25, 2019 at 8:40 PM Balay, Satish wrote: > > Unable to run jsrun -g 1 with option "-n 1" > > Error: It is only possible to use js commands within a job allocation > > unless CSM is running > > > Nope this is a different error message. > > The message suggests - you can't run 'jsrun -

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Balay, Satish via petsc-dev
This log is from the wrong build. It says: Defined "VERSION_GIT" to ""v3.11.3-2242-gb5e99a5"" i.e., it's not with commit cb53a04 Satish On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote: > Here is the log. > > On Wed, Sep 25, 2019 at 8:34 PM Mark Adams wrote: > > > > > > > On Wed

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Balay, Satish via petsc-dev
> Unable to run jsrun -g 1 with option "-n 1" > Error: It is only possible to use js commands within a job allocation > unless CSM is running Nope this is a different error message. The message suggests - you can't run 'jsrun -g 1 -n 1 binary' Can you try this manually and see what you get? j

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Mark Adams via petsc-dev
On Wed, Sep 25, 2019 at 6:23 PM Balay, Satish wrote: > > 18:16 (cb53a04...) ~/petsc-karl$ > > So this is the commit I recommended you test against - and that's what > you have got now. Please go ahead and test. > > I sent the log for this. This is the output: 18:16 (cb53a04...) ~/petsc-karl$ ../

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Balay, Satish via petsc-dev
> 18:16 (cb53a04...) ~/petsc-karl$ So this is the commit I recommended you test against - and that's what you have got now. Please go ahead and test. [note: the branch is rebased - so 'git pull' won't work -(as you can see from the "(forced update)" message - and '<>' status from git prompt on ba

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Mark Adams via petsc-dev
I will test this now but 17:52 balay/fix-mpiexec-shell-escape= ~/petsc-karl$ git fetch remote: Enumerating objects: 119, done. remote: Counting objects: 100% (119/119), done. remote: Compressing objects: 100% (91/91), done. remote: Total 119 (delta 49), reused 74 (delta 28) Receiving objects:

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Balay, Satish via petsc-dev
Defined "VERSION_GIT" to ""v3.11.3-2242-gb5e99a5"" This is not the latest state - It should be: commit cb53a042369fb946804f53931a88b58e10588da1 (HEAD -> balay/fix-mpiexec-shell-escape, origin/balay/fix-mpiexec-shell-escape) Try: git fetch git checkout origin/balay/fix-mpiexec-shell

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Balay, Satish via petsc-dev
On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote: > I did test this and sent the log (error). Mark, I made more changes - can you retry again - and resend log. Satish

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Mark Adams via petsc-dev
> Yes, it's supported, but it's a little different than what "-n" usually > does in mpiexec, where it means the number of processes. For 'jsrun', it > means the number of resource sets, which is multiplied by the "tasks per > resource set" specified by "-a" to get the MPI process count. I think if
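
A minimal sketch of the arithmetic described in the message above: with jsrun, "-n" counts resource sets and "-a" gives the tasks per resource set, so the MPI process count is their product. The function name is hypothetical and only illustrates the multiplication; it does not invoke jsrun.

```python
def jsrun_mpi_process_count(resource_sets, tasks_per_resource_set):
    """Return the MPI process count implied by jsrun's -n and -a values.

    resource_sets corresponds to "-n" (number of resource sets) and
    tasks_per_resource_set to "-a", as described in the quoted message.
    """
    if resource_sets < 1 or tasks_per_resource_set < 1:
        raise ValueError("both counts must be positive integers")
    return resource_sets * tasks_per_resource_set


# "-n 1 -a 1" gives a single process, matching what mpiexec "-n 1" means.
single = jsrun_mpi_process_count(1, 1)
# "-n 6 -a 2" would give twelve processes, unlike mpiexec "-n 6".
twelve = jsrun_mpi_process_count(6, 2)
```

So "-n 1" matches the usual mpiexec meaning only when "-a" is also 1.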

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Mark Adams via petsc-dev
I did test this and sent the log (error). On Wed, Sep 25, 2019 at 2:58 PM Balay, Satish wrote: > I made changes and asked to retest with the latest changes. > > Satish > > On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote: > > > Oh, and I tested the branch and it didn't work. file was attached

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Balay, Satish via petsc-dev
I made changes and asked to retest with the latest changes. Satish On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote: > Oh, and I tested the branch and it didn't work. file was attached. > > On Wed, Sep 25, 2019 at 2:38 PM Mark Adams wrote: > > > > > > > On Wed, Sep 25, 2019 at 2:23 PM Bala

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Mark Adams via petsc-dev
Oh, and I tested the branch and it didn't work. file was attached. On Wed, Sep 25, 2019 at 2:38 PM Mark Adams wrote: > > > On Wed, Sep 25, 2019 at 2:23 PM Balay, Satish wrote: > >> On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote: >> >> > On Wed, Sep 25, 2019 at 12:44 PM Balay, Satish >> wr

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Mills, Richard Tran via petsc-dev
On 9/25/19 11:38 AM, Mark Adams via petsc-dev wrote: [...] > jsrun does take -n. It just has other args. I am trying to check if it > requires other args. I thought it did but let me check. https://www.olcf.ornl.gov/for-users/system-user-guides/summitdev-quickstart-guide/ -n --nrs Number o

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Balay, Satish via petsc-dev
On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote: > On Wed, Sep 25, 2019 at 12:44 PM Balay, Satish wrote: > > > Can you retry with updated balay/fix-mpiexec-shell-escape branch? > > > > > > current mpiexec interface/code in petsc is messy. > > > > It's primarily needed for the test suite. But

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Mark Adams via petsc-dev
On Wed, Sep 25, 2019 at 12:44 PM Balay, Satish wrote: > Can you retry with updated balay/fix-mpiexec-shell-escape branch? > > > current mpiexec interface/code in petsc is messy. > > It's primarily needed for the test suite. But then - you can't easily > run the test suite on machines like summit.

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Balay, Satish via petsc-dev
Can you retry with the updated balay/fix-mpiexec-shell-escape branch? The current mpiexec interface/code in petsc is messy. It's primarily needed for the test suite. But then - you can't easily run the test suite on machines like summit. Also - it assumes the mpiexec provided supports '-n 1'. However if one

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Mark Adams via petsc-dev
Let me know if you still want me to test this fix. On Wed, Sep 25, 2019 at 10:01 AM Balay, Satish wrote: > Mark, > > Can you try the fix in branch balay/fix-mpiexec-shell-escape and see if it > works? > > Satish > > On Wed, 25 Sep 2019, Balay, Satish via petsc-dev wrote: > > > Mark, > > > > Can

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Mark Adams via petsc-dev
On Wed, Sep 25, 2019 at 8:51 AM Karl Rupp wrote: > > > I double checked that a clean build of your (master) branch has this > > error but my branch (mark/fix-cuda-with-gamg-pintocpu), which may include > > stuff from Barry that is not yet in master, works. > > so did master work recently (i.e. rig

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Balay, Satish via petsc-dev
Mark, Can you try the fix in branch balay/fix-mpiexec-shell-escape and see if it works? Satish On Wed, 25 Sep 2019, Balay, Satish via petsc-dev wrote: > Mark, > > Can you send configure.log from mark/fix-cuda-with-gamg-pintocpu branch? > > Satish > > On Wed, 25 Sep 2019, Mark Adams via pets

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Balay, Satish via petsc-dev
Mark, Can you send configure.log from mark/fix-cuda-with-gamg-pintocpu branch? Satish On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote: > I double checked that a clean build of your (master) branch has this error > but my branch (mark/fix-cuda-with-gamg-pintocpu), which may include stuff > fr

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Karl Rupp via petsc-dev
I double checked that a clean build of your (master) branch has this error but my branch (mark/fix-cuda-with-gamg-pintocpu), which may include stuff from Barry that is not yet in master, works. So did master work recently (i.e. right before my branch got merged)? Best regards, Karli On

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Mark Adams via petsc-dev
I double checked that a clean build of your (master) branch has this error but my branch (mark/fix-cuda-with-gamg-pintocpu), which may include stuff from Barry that is not yet in master, works. On Wed, Sep 25, 2019 at 5:26 AM Karl Rupp via petsc-dev < petsc-dev@mcs.anl.gov> wrote: > > > On 9/25/19

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Karl Rupp via petsc-dev
On 9/25/19 11:12 AM, Mark Adams via petsc-dev wrote: I am using karlrupp/fix-cuda-streams, merged with master, and I get this error: Could not execute "['jsrun -g\\ 1 -c\\ 1 -a\\ 1 --oversubscribe -n 1 printenv']": Error, invalid argument:  1 My branch mark/fix-cuda-with-gamg-pintocpu see
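
A small sketch of how the backslash-escaped spaces in the command shown above could lead to an "invalid argument" complaint. It uses Python's shlex.split as a stand-in for POSIX shell word splitting; that jsrun saw its arguments this way is an assumption, not something the thread confirms. The point is only that "-g\ 1" splits into one word containing a space, whereas "-g 1" splits into a flag and a separate value.

```python
import shlex

# Command as it would look with escaped spaces (the log prints each
# backslash doubled because it shows a Python list repr).
escaped = r"jsrun -g\ 1 -c\ 1 -a\ 1 --oversubscribe -n 1 printenv"
# The same command without the escapes.
plain = "jsrun -g 1 -c 1 -a 1 --oversubscribe -n 1 printenv"

# POSIX-style splitting: a backslash keeps the following space inside the word.
escaped_argv = shlex.split(escaped)
plain_argv = shlex.split(plain)

print(escaped_argv)
print(plain_argv)
```

With the escapes, the launcher would receive "-g 1" as a single argument instead of the flag "-g" followed by the value "1".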