[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-04 Thread Barry Smith
Dang, I had two copies of thrust and cusp and was using the "wrong" one, hence everything was working for me. Ok, I'll try to get txpetscgpu updated. Barry On Oct 3, 2011, at 10:18 PM, Satish Balay wrote: > /home/wdn/Projects/Petsc/src/branches/master/petsc-dev/LINUX_GNU_OPTIMI

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-03 Thread Dave Nystrom
So, for the moment, I have backed off using --download-txpetscgpu and can now get a successful build that seems to work and recover my April performance results on the gpu. I am very eager to try the --download-txpetscgpu option though. Thanks, Dave Dave Nystrom writes: > Hi Barry, > > Just

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-03 Thread Satish Balay
>>> /home/wdn/Projects/Petsc/src/branches/master/petsc-dev/LINUX_GNU_OPTIMIZE_SERIAL_CUDA_40_LITE/include/txpetscgpu/include/csr_spmv_part_vector_gpu.h:23:44: error: thrust/detail/device/cuda/arch.h: No such file or directory /usr/bin/ar: aijcusp.o: No such file or directory <<< This file is at .

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-03 Thread Barry Smith
Dave, I have found the cause of the problem you were seeing and have fixed it. It was caused by bad code when --download-txpetscgpu was used. To eliminate the problem: 1) upgrade to latest cusp and thrust via mercurial 2) rm -rf externpackages/txpetscgpu* 3) hg pull; hg update 4

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-03 Thread Matthew Knepley
On Sun, Oct 2, 2011 at 4:43 PM, Dave Nystrom wrote: > Dave Nystrom writes: > > In case it might be useful, I have attached two log files of runs with > the > > ex2f petsc example from src/ksp/ksp/examples/tutorials. One was run > back in > > April with petsc-dev linked to Cuda 3.2. It shows e

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-03 Thread Matthew Knepley
On Sun, Oct 2, 2011 at 10:50 PM, Dave Nystrom wrote: > Hi Barry, > > Barry Smith writes: > > Dave, > > > > I cannot explain why it does not use the MatMult_SeqAIJCusp() - it does > for me. > > Do you get good performance running a problem like ex2? > Okay, now the problem is clear. This does

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-03 Thread Dave Nystrom
Matthew Knepley writes: > On Sun, Oct 2, 2011 at 10:50 PM, Dave Nystrom tachyonlogic.com> wrote: > > > Hi Barry, > > > > Barry Smith writes: > > > Dave, > > > > > > I cannot explain why it does not use the MatMult_SeqAIJCusp() - it does > > for me. > > > > Do you get good performan

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-02 Thread Dave Nystrom
Hi Barry, Barry Smith writes: > Dave, > > I cannot explain why it does not use the MatMult_SeqAIJCusp() - it does for > me. Do you get good performance running a problem like ex2? > Have you updated to the latest cusp/thrust? From the mercurial repositories? I did try the latest version o

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-02 Thread Barry Smith
Dave, I cannot explain why it does not use the MatMult_SeqAIJCusp() - it does for me. Have you updated to the latest cusp/thrust? From the mercurial repositories? There is a difference: in your new 4.0 build you added --download-txpetscgpu=yes. BTW: that doesn't work for me with the la

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-02 Thread Barry Smith
On Oct 2, 2011, at 6:39 PM, Dave Nystrom wrote: > Thanks for the update. I don't believe I have gotten a run with good > performance yet, either from C or Fortran. I wish there was an easy way for > me to force use of only one of my gpus. I don't want to have to pull one of > the gpus in order

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-02 Thread Dave Nystrom
Barry Smith writes: > On Oct 2, 2011, at 6:39 PM, Dave Nystrom wrote: > >> Thanks for the update. I don't believe I have gotten a run with good >> performance yet, either from C or Fortran. I wish there was an easy way for >> me to force use of only one of my gpus. I don't want to have to

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-02 Thread Barry Smith
It is not doing the MatMult operation on the GPU and hence needs to move the vectors back and forth for each operation (since MatMult is done on the CPU with the vector while vector operations are done on the GPU), hence the terrible performance. Not sure why yet. It is copying the Mat do
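
Barry's diagnosis is that the slowdown comes from MatMult staying on the CPU while the vector work stays on the GPU, so every iteration pays for host/device copies. A minimal single-process sketch of what keeps both on the GPU in the petsc-dev of that period, assuming the MATSEQAIJCUSP/VECSEQCUSP type names from that branch (illustrative only, not taken from this thread):

    /* Sketch: put both the matrix and the vectors in CUSP types so MatMult and
       the vector operations run on the same device and no per-iteration copies
       are needed. Type names are assumed from the petsc-dev branch of this era. */
    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      Mat A;
      Vec x, b;

      PetscInitialize(&argc, &argv, NULL, NULL);

      MatCreate(PETSC_COMM_WORLD, &A);
      MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 100, 100);
      MatSetType(A, MATSEQAIJCUSP);   /* matrix lives on the GPU, so MatMult_SeqAIJCusp() should be selected */

      VecCreate(PETSC_COMM_WORLD, &x);
      VecSetSizes(x, PETSC_DECIDE, 100);
      VecSetType(x, VECSEQCUSP);      /* vectors live on the GPU as well */
      VecDuplicate(x, &b);

      /* ... assemble A and b, then create a KSP and call KSPSolve() as usual ... */

      VecDestroy(&x);
      VecDestroy(&b);
      MatDestroy(&A);
      PetscFinalize();
      return 0;
    }

The same effect is usually obtained from the command line of an existing example such as ex2, with options along the lines of -mat_type aijcusp -vec_type cusp, if those options are present in the branch being tested.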

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-02 Thread Dave Nystrom
Thanks for the update. I don't believe I have gotten a run with good performance yet, either from C or Fortran. I wish there was an easy way for me to force use of only one of my gpus. I don't want to have to pull one of the gpus in order to see if that is complicating things with Cuda 4.0. I'l
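
One way to get the single-GPU behavior Dave is after, without physically removing a card, is to bind the process to one device before any other CUDA work happens. A small stand-alone sketch using the CUDA 4.0 runtime API (illustrative only; the device index 0 is an assumption):

    /* Sketch: pin this process to a single GPU so a second device is never touched. */
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
      int count = 0;
      cudaGetDeviceCount(&count);
      if (count < 1) { fprintf(stderr, "no CUDA devices found\n"); return 1; }

      /* Issue cudaSetDevice() before any allocation or kernel launch; everything
         that follows then stays on this one GPU. */
      cudaSetDevice(0);

      struct cudaDeviceProp prop;
      cudaGetDeviceProperties(&prop, 0);
      printf("using device 0: %s\n", prop.name);
      return 0;
    }

Restricting visibility from the environment (for example via the CUDA_VISIBLE_DEVICES variable, if the installed CUDA 4.0 stack honors it) would achieve the same thing without touching any code.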

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-02 Thread Dave Nystrom
Dave Nystrom writes: > In case it might be useful, I have attached two log files of runs with the > ex2f petsc example from src/ksp/ksp/examples/tutorials. One was run back in > April with petsc-dev linked to Cuda 3.2. It shows excellent runtime > performance. The other was run today with pe

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-02 Thread Dave Nystrom
In case it might be useful, I have attached two log files of runs with the ex2f petsc example from src/ksp/ksp/examples/tutorials. One was run back in April with petsc-dev linked to Cuda 3.2. It shows excellent runtime performance. The other was run today with petsc-dev checked out of the mercur

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-02 Thread Dave Nystrom
Matthew Knepley writes: > On Sat, Oct 1, 2011 at 11:26 PM, Dave Nystrom tachyonlogic.com> wrote: > > Barry Smith writes: > > > On Oct 1, 2011, at 9:22 PM, Dave Nystrom wrote: > > > > Hi Barry, > > > > > > > > I've sent a couple more emails on this topic. What I am trying to do > > at

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-02 Thread Matthew Knepley
On Sat, Oct 1, 2011 at 11:26 PM, Dave Nystrom wrote: > Barry Smith writes: > > > > On Oct 1, 2011, at 9:22 PM, Dave Nystrom wrote: > > > > > Hi Barry, > > > > > > I've sent a couple more emails on this topic. What I am trying to do > at the > > > moment is to figure out how to have a prob

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-01 Thread Satish Balay
our testbox has 2 gpus balay at bb30:~> lspci |grep -i nvidia 0b:00.0 3D controller: nVidia Corporation GT200 [Tesla C1060] (rev a1) 0c:00.0 3D controller: nVidia Corporation GT200 [Tesla C1060] (rev a1) balay at bb30:~> Is there some test I can run on this? [it has cuda 4.0] satish On Sat, 1 Oc

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-01 Thread Matthew Knepley
This diagnosis is total crap (I think), as I tried to explain. We would never get the same result (or the right result), and partitioning makes no sense. Something else is going on. Can't we run on a 2 GPU system at ANL? Matt On Sat, Oct 1, 2011 at 9:30 PM, Barry Smith wrote: > > On Oct 1, 2

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-01 Thread Barry Smith
On Oct 1, 2011, at 9:22 PM, Dave Nystrom wrote: > Hi Barry, > > I've sent a couple more emails on this topic. What I am trying to do at the > moment is to figure out how to have a problem run on only one gpu if it will > fit in the memory of that gpu. Back in April when I had built petsc-dev w

[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

2011-10-01 Thread Barry Smith
Dave, We have no mechanism in the PETSc code for a PETSc single CPU process to use two GPUs at the same time. However you could have two MPI processes each using their own GPU. The one tricky part is you need to make sure each MPI process uses a different GPU. We currently do not
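
A minimal sketch of the per-rank binding Barry describes: each MPI process picks its own device before PETSc or CUDA does any work. This is illustrative and not taken from the thread; it assumes all ranks share one node (on multiple nodes a node-local rank would be used in place of the world rank):

    /* Sketch: one MPI process per GPU, each rank bound to a distinct device. */
    #include <stdio.h>
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
      int rank = 0, ndev = 0;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      cudaGetDeviceCount(&ndev);
      if (ndev > 0) {
        cudaSetDevice(rank % ndev);   /* rank 0 -> GPU 0, rank 1 -> GPU 1, ... */
        printf("rank %d bound to GPU %d of %d\n", rank, rank % ndev, ndev);
      }

      /* ... PetscInitialize() and the rest of the solver would follow here ... */

      MPI_Finalize();
      return 0;
    }

Launched with two processes (for example mpiexec -n 2) on a box like the two-C1060 testbox above, rank 0 would use one Tesla and rank 1 the other.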