On Sat, Sep 28, 2019 at 12:55 AM Karl Rupp wrote:
> Hi Mark,
>
> > OK, so now the problem has shifted somewhat in that it now manifests
> > itself on small cases.
It is somewhat random and anecdotal but it does happen on the smaller test
problem now. When I try to narrow down when the problem m
The logic is basically correct because I simply zero out the yy vector (the
output vector) and it runs great now. The numerics look fine without CPU
pinning.
AND, it worked with 1, 2, and 3 GPUs (one node, one socket), but failed with
4 GPUs, which uses the second socket. Strange.
On Sat, Sep 28, 2019
Mark,
MatMultTransposeAdd_SeqAIJCUSPARSE checks if the matrix is in compressed
row storage; MatMultTranspose_SeqAIJCUSPARSE does not. Could this be the
issue? The CUSPARSE classes are kind of messy
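
For illustration only, here is a rough sketch in plain Python (not PETSc or cuSPARSE code; the function name, arguments, and data are all hypothetical) of why a compressed-row check can matter for the output vector. If only the non-empty rows are stored, a product writes only those entries, so any stale value in the output survives unless the output is zeroed first. Whether this is what actually happens in MatMultTranspose_SeqAIJCUSPARSE is an assumption, not something confirmed in this thread.

```python
def compressed_row_matvec(ridx, rows, x, y, zero_first=True):
    """Compute y = A @ x when only the non-empty rows of A are stored.

    ridx[k] is the global row index of stored row k (hypothetical layout),
    rows[k] is a list of (column, value) pairs for that row.
    """
    if zero_first:
        # Rows that were dropped from storage must still produce 0 in y.
        for i in range(len(y)):
            y[i] = 0.0
    for k, i in enumerate(ridx):
        y[i] = sum(v * x[j] for j, v in rows[k])
    return y


# A 3x2 matrix whose middle row is empty, so only rows 0 and 2 are stored.
ridx = [0, 2]
rows = [[(0, 1.0), (1, 2.0)], [(1, 3.0)]]
x = [2.0, 4.0]

zeroed = compressed_row_matvec(ridx, rows, x, [9.0, 9.0, 9.0], zero_first=True)
stale = compressed_row_matvec(ridx, rows, x, [9.0, 9.0, 9.0], zero_first=False)
print(zeroed)
print(stale)
```

Without the zeroing step, the entry for the dropped row keeps whatever was in the output vector before the call, which would be consistent with zeroing yy making the problem go away.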
On Sat, Sep 28, 2019 at 07:55, Karl Rupp via petsc-dev <
petsc-dev@mcs.anl.gov> wrote:
Hi Mark,
OK, so now the problem has shifted somewhat in that it now manifests
itself on small cases. In earlier investigation I was drawn to
MatTranspose but had a hard time pinning it down. The bug seems more
stable now or you probably fixed what looks like all the other bugs.
I added print
Mark,
The branch karlrupp/fix-cuda-streams is already merged to master. [and
the branch is now deleted]
I guess - if you wish to compare the difference this feature makes - you
can compare with master snapshot before this merge.
i.e., compare master (includes karlrupp/fix-cuda-streams feature) and
Karl, I have it running but I am not seeing any difference from master. I
wonder if I have the right version:
Using Petsc Development GIT revision: v3.11.3-2207-ga8e311a
I could not find karlrupp/fix-cuda-streams on the gitlab page to check your
last commit SHA1 (???), and now I get:
08:37 karlr
>
> If jsrun is not functional from configure, alternatives are
> --with-mpiexec=/bin/true or --with-batch=1
>
>
--with-mpiexec=/bin/true seems to be working.
Thanks,
Mark
> Satish
>
On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote:
> On Wed, Sep 25, 2019 at 8:40 PM Balay, Satish wrote:
>
> > > Unable to run jsrun -g 1 with option "-n 1"
> > > Error: It is only possible to use js commands within a job allocation
> > > unless CSM is running
> >
> >
> > Nope this is a diff
On Wed, Sep 25, 2019 at 8:40 PM Balay, Satish wrote:
> > Unable to run jsrun -g 1 with option "-n 1"
> > Error: It is only possible to use js commands within a job allocation
> > unless CSM is running
>
>
> Nope this is a different error message.
>
> The message suggests - you can't run 'jsrun -
This log is from the wrong build. It says:
Defined "VERSION_GIT" to ""v3.11.3-2242-gb5e99a5""
i.e., it's not with commit cb53a04
Satish
On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote:
> Here is the log.
>
> On Wed, Sep 25, 2019 at 8:34 PM Mark Adams wrote:
>
> >
> >
> > On Wed
> Unable to run jsrun -g 1 with option "-n 1"
> Error: It is only possible to use js commands within a job allocation
> unless CSM is running
Nope this is a different error message.
The message suggests - you can't run 'jsrun -g 1 -n 1 binary'. Can you try this
manually and see what you get?
j
On Wed, Sep 25, 2019 at 6:23 PM Balay, Satish wrote:
> > 18:16 (cb53a04...) ~/petsc-karl$
>
> So this is the commit I recommended you test against - and that's what
> you have got now. Please go ahead and test.
>
>
I sent the log for this. This is the output:
18:16 (cb53a04...) ~/petsc-karl$ ../
> 18:16 (cb53a04...) ~/petsc-karl$
So this is the commit I recommended you test against - and that's what
you have got now. Please go ahead and test.
[note: the branch is rebased - so 'git pull' won't work -(as you can
see from the "(forced update)" message - and '<>' status from git
prompt on ba
I will test this now but
17:52 balay/fix-mpiexec-shell-escape= ~/petsc-karl$ git fetch
remote: Enumerating objects: 119, done.
remote: Counting objects: 100% (119/119), done.
remote: Compressing objects: 100% (91/91), done.
remote: Total 119 (delta 49), reused 74 (delta 28)
Receiving objects:
Defined "VERSION_GIT" to ""v3.11.3-2242-gb5e99a5""
This is not the latest state - It should be:
commit cb53a042369fb946804f53931a88b58e10588da1 (HEAD ->
balay/fix-mpiexec-shell-escape, origin/balay/fix-mpiexec-shell-escape)
Try:
git fetch
git checkout origin/balay/fix-mpiexec-shell
On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote:
> I did test this and sent the log (error).
Mark,
I made more changes - can you retry - and resend the log?
Satish
> Yes, it's supported, but it's a little different than what "-n" usually
> does in mpiexec, where it means the number of processes. For 'jsrun', it
> means the number of resource sets, which is multiplied by the "tasks per
> resource set" specified by "-a" to get the MPI process count. I think if
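
A small sketch of the arithmetic described above, assuming only what the quoted text says: "-n" is the number of resource sets and "-a" is the tasks per resource set. The helper names are hypothetical and nothing here invokes jsrun.

```python
def jsrun_mpi_processes(num_resource_sets, tasks_per_resource_set):
    """MPI process count implied by 'jsrun -n <nrs> -a <tasks per rs>'."""
    return num_resource_sets * tasks_per_resource_set


def jsrun_args(num_resource_sets, tasks_per_resource_set, binary):
    """Build an argv list; each flag and its value are separate elements."""
    return ["jsrun",
            "-n", str(num_resource_sets),
            "-a", str(tasks_per_resource_set),
            binary]


# mpiexec -n 6 would mean 6 processes; with jsrun the same count could be
# written as 6 resource sets of 1 task, or 2 resource sets of 3 tasks.
print(jsrun_mpi_processes(6, 1))
print(jsrun_mpi_processes(2, 3))
print(jsrun_args(2, 3, "./binary"))
```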
I did test this and sent the log (error).
On Wed, Sep 25, 2019 at 2:58 PM Balay, Satish wrote:
> I made changes and asked to retest with the latest changes.
>
> Satish
>
> On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote:
>
> > Oh, and I tested the branch and it didn't work. file was attached
I made changes and asked to retest with the latest changes.
Satish
On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote:
> Oh, and I tested the branch and it didn't work. The file was attached.
>
> On Wed, Sep 25, 2019 at 2:38 PM Mark Adams wrote:
>
> >
> >
> > On Wed, Sep 25, 2019 at 2:23 PM Bala
Oh, and I tested the branch and it didn't work. The file was attached.
On Wed, Sep 25, 2019 at 2:38 PM Mark Adams wrote:
>
>
> On Wed, Sep 25, 2019 at 2:23 PM Balay, Satish wrote:
>
>> On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote:
>>
>> > On Wed, Sep 25, 2019 at 12:44 PM Balay, Satish
>> wr
On 9/25/19 11:38 AM, Mark Adams via petsc-dev wrote:
[...]
> jsrun does take -n. It just has other args. I am trying to check if it
> requires other args. I thought it did but let me check.
https://www.olcf.ornl.gov/for-users/system-user-guides/summitdev-quickstart-guide/
-n --nrs Number o
On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote:
> On Wed, Sep 25, 2019 at 12:44 PM Balay, Satish wrote:
>
> > Can you retry with updated balay/fix-mpiexec-shell-escape branch?
> >
> >
> > The current mpiexec interface/code in petsc is messy.
> >
> > It's primarily needed for the test suite. But
On Wed, Sep 25, 2019 at 12:44 PM Balay, Satish wrote:
> Can you retry with updated balay/fix-mpiexec-shell-escape branch?
>
>
> The current mpiexec interface/code in petsc is messy.
>
> It's primarily needed for the test suite. But then - you can't easily
> run the test suite on machines like summit.
Can you retry with updated balay/fix-mpiexec-shell-escape branch?
The current mpiexec interface/code in petsc is messy.
It's primarily needed for the test suite. But then - you can't easily
run the test suite on machines like summit.
Also - it assumes mpiexec provided supports '-n 1'. However if one
Let me know if you still want me to test this fix.
On Wed, Sep 25, 2019 at 10:01 AM Balay, Satish wrote:
> Mark,
>
> Can you try the fix in branch balay/fix-mpiexec-shell-escape and see if it
> works?
>
> Satish
>
> On Wed, 25 Sep 2019, Balay, Satish via petsc-dev wrote:
>
> > Mark,
> >
> > Can
On Wed, Sep 25, 2019 at 8:51 AM Karl Rupp wrote:
>
> > I double checked that a clean build of your (master) branch has this
> > error, but my branch (mark/fix-cuda-with-gamg-pintocpu), which may include
> > stuff from Barry that is not yet in master, works.
>
> so did master work recently (i.e. rig
Mark,
Can you try the fix in branch balay/fix-mpiexec-shell-escape and see if it
works?
Satish
On Wed, 25 Sep 2019, Balay, Satish via petsc-dev wrote:
> Mark,
>
> Can you send configure.log from mark/fix-cuda-with-gamg-pintocpu branch?
>
> Satish
>
> On Wed, 25 Sep 2019, Mark Adams via pets
Mark,
Can you send configure.log from mark/fix-cuda-with-gamg-pintocpu branch?
Satish
On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote:
> I double checked that a clean build of your (master) branch has this error,
> but my branch (mark/fix-cuda-with-gamg-pintocpu), which may include stuff
> fr
I double checked that a clean build of your (master) branch has this
error, but my branch (mark/fix-cuda-with-gamg-pintocpu), which may include
stuff from Barry that is not yet in master, works.
so did master work recently (i.e. right before my branch got merged)?
Best regards,
Karli
On
I double checked that a clean build of your (master) branch has this error,
but my branch (mark/fix-cuda-with-gamg-pintocpu), which may include stuff
from Barry that is not yet in master, works.
On Wed, Sep 25, 2019 at 5:26 AM Karl Rupp via petsc-dev <
petsc-dev@mcs.anl.gov> wrote:
>
>
> On 9/25/19
On 9/25/19 11:12 AM, Mark Adams via petsc-dev wrote:
I am using karlrupp/fix-cuda-streams, merged with master, and I get this
error:
Could not execute "['jsrun -g\\ 1 -c\\ 1 -a\\ 1 --oversubscribe -n 1
printenv']":
Error, invalid argument: 1
My branch mark/fix-cuda-with-gamg-pintocpu see
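
A short Python sketch of how the escaped spaces in the command above are tokenized by POSIX shell rules, using the standard-library shlex module. It only shows the word splitting; that jsrun rejects the merged "-g 1" argument for this reason is an assumption suggested by the error text, not something verified here.

```python
import shlex

# The command from the error message, with the Python-repr "\\" read as a
# single backslash, next to the same command without escaped spaces.
escaped = r"jsrun -g\ 1 -c\ 1 -a\ 1 --oversubscribe -n 1 printenv"
plain = "jsrun -g 1 -c 1 -a 1 --oversubscribe -n 1 printenv"

escaped_argv = shlex.split(escaped)
plain_argv = shlex.split(plain)

# A backslash-escaped space joins the flag and its value into one argument.
print(escaped_argv)
print(plain_argv)
```

With the escapes, the flag and its value arrive as the single argument "-g 1" rather than as "-g" followed by "1", which may be what jsrun reports as an invalid argument.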