Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-13 Thread Smith, Barry F. via petsc-dev
> On Feb 13, 2020, at 5:39 PM, Zhang, Hong wrote: > > > >> On Feb 13, 2020, at 7:39 AM, Smith, Barry F. wrote: >> >> >> How are the two being compiled and linked? The same way, one with the PETSc >> library in the path and the other without? Or does the PETSc one have lots >> of flags an

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-13 Thread Zhang, Hong via petsc-dev
> On Feb 13, 2020, at 7:39 AM, Smith, Barry F. wrote: > > > How are the two being compiled and linked? The same way, one with the PETSc > library in the path and the other without? Or does the PETSc one have lots of > flags and stuff while the non-PETSc one is just simple by hand? PETSc was

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-13 Thread Karl Rupp
Hi Hong, have you tried running the code through gprof and looking at the output (e.g. with kcachegrind)? (apologies if this has been suggested already) Best regards, Karli On 2/12/20 7:29 PM, Zhang, Hong via petsc-dev wrote: On Feb 12, 2020, at 5:11 PM, Smith, Barry F. wrote: ldd -o

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-13 Thread Smith, Barry F. via petsc-dev
How are the two being compiled and linked? The same way, one with the PETSc library in the path and the other without? Or does the PETSc one have lots of flags and stuff while the non-PETSc one is just simple by hand? Barry > On Feb 12, 2020, at 7:29 PM, Zhang, Hong wrote: > > > >> O

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-12 Thread Zhang, Hong via petsc-dev
> On Feb 12, 2020, at 5:11 PM, Smith, Barry F. wrote: > > > ldd -o on the petsc program (static) and the non petsc program (static), > what are the differences? There is no difference in the outputs. > > nm -o both executables | grep cudaFree() Non petsc program: [hongzh@login3.summit

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-12 Thread Smith, Barry F. via petsc-dev
ldd -o on the petsc program (static) and the non petsc program (static), what are the differences? nm -o both executables | grep cudaFree() > On Feb 12, 2020, at 1:51 PM, Munson, Todd via petsc-dev > wrote: > > > There are some side effects when loading shared libraries, such as >

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-12 Thread Munson, Todd via petsc-dev
There are some side effects when loading shared libraries, such as initializations of static variables, etc. Is something like that happening? Another place is the initial runtime library that gets linked (libcrt0 maybe?). I think some MPI compilers insert their own version. Todd. > On Feb

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-12 Thread Matthew Knepley
On Wed, Feb 12, 2020 at 11:38 AM Zhang, Hong wrote: > > > On Feb 12, 2020, at 11:09 AM, Matthew Knepley wrote: > > On Wed, Feb 12, 2020 at 11:06 AM Zhang, Hong via petsc-dev < > petsc-dev@mcs.anl.gov> wrote: > >> Sorry for the long post. Here are replies I have got from OLCF so far. We >> still

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-12 Thread Zhang, Hong via petsc-dev
On Feb 12, 2020, at 11:09 AM, Matthew Knepley wrote: On Wed, Feb 12, 2020 at 11:06 AM Zhang, Hong via petsc-dev wrote: Sorry for the long post. Here are replies I have got from OLCF so far. We still don’t know how to solve the problem.

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-12 Thread Matthew Knepley
On Wed, Feb 12, 2020 at 11:06 AM Zhang, Hong via petsc-dev < petsc-dev@mcs.anl.gov> wrote: > Sorry for the long post. Here are replies I have got from OLCF so far. We > still don’t know how to solve the problem. > > One interesting thing that Tom noticed is PetscInitialize() may have > called cuda

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-12 Thread Zhang, Hong via petsc-dev
Sorry for the long post. Here are replies I have got from OLCF so far. We still don’t know how to solve the problem. One interesting thing that Tom noticed is PetscInitialize() may have called cudaFree(0) 32 times as NVPROF shows, and they all run very fast. These calls may be triggered by some

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-10 Thread Smith, Barry F. via petsc-dev
gprof or some similar tool? > On Feb 10, 2020, at 11:18 AM, Zhang, Hong via petsc-dev > wrote: > > -cuda_initialize 0 does not make any difference. Actually this issue has > nothing to do with PetscInitialize(). I tried to call cudaFree(0) before > PetscInitialize(), and it still took 7.

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-10 Thread Zhang, Hong via petsc-dev
-cuda_initialize 0 does not make any difference. Actually this issue has nothing to do with PetscInitialize(). I tried to call cudaFree(0) before PetscInitialize(), and it still took 7.5 seconds. Hong On Feb 10, 2020, at 10:44 AM, Zhang, Junchao wrote: As I mentio

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-10 Thread Junchao Zhang via petsc-dev
As I mentioned, have you tried -cuda_initialize 0? Also, PetscCUDAInitialize contains ierr = PetscCUBLASInitializeHandle();CHKERRQ(ierr); ierr = PetscCUSOLVERDnInitializeHandle();CHKERRQ(ierr); Have you tried commenting them out and testing again? --Junchao Zhang On Sat, Feb 8, 2020 at 5:22 PM Zha

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-08 Thread Zhang, Hong via petsc-dev
On Feb 8, 2020, at 5:03 PM, Matthew Knepley wrote: On Sat, Feb 8, 2020 at 4:34 PM Zhang, Hong via petsc-dev wrote: I did some further investigation. The overhead persists for both the PETSc shared library and the static library. In the

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-08 Thread Matthew Knepley
On Sat, Feb 8, 2020 at 4:34 PM Zhang, Hong via petsc-dev < petsc-dev@mcs.anl.gov> wrote: > I did some further investigation. The overhead persists for both the PETSc > shared library and the static library. In the previous example, it does not > call any PETSc function, the first CUDA function bec

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-08 Thread Zhang, Hong via petsc-dev
I did some further investigation. The overhead persists for both the PETSc shared library and the static library. In the previous example, it does not call any PETSc function, the first CUDA function becomes very slow when it is linked to the petsc so. This indicates that the slowdown occurs if

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-08 Thread Jeff Hammond
Given that OLCF filesystems are the issue, have you engaged their support personnel regarding this issue? Jeff On Fri, Feb 7, 2020 at 6:37 PM Junchao Zhang via petsc-dev < petsc-dev@mcs.anl.gov> wrote: > Have you tried passing -cuda_initialize 0 to petsc? > > --Junchao Zhang > > > On Fri, Feb 7,

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-07 Thread Junchao Zhang via petsc-dev
Have you tried passing -cuda_initialize 0 to petsc? --Junchao Zhang On Fri, Feb 7, 2020 at 5:16 PM Zhang, Hong via petsc-dev < petsc-dev@mcs.anl.gov> wrote: > I tried to install PETSc shared library in /gpfs/alpine/scratch, which > should be faster than the home directory. But the same overhead

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-07 Thread Zhang, Hong via petsc-dev
I tried to install PETSc shared library in /gpfs/alpine/scratch, which should be faster than the home directory. But the same overhead still persists. Hong > On Feb 7, 2020, at 4:32 PM, Smith, Barry F. wrote: > > > Perhaps the intent is that you build or install (--prefix) your libraries >

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-07 Thread Smith, Barry F. via petsc-dev
Perhaps the intent is that you build or install (--prefix) your libraries in a different place than /autofs/nccs-svm1_home1 > On Feb 7, 2020, at 3:09 PM, Zhang, Hong wrote: > > Note that the overhead was triggered by the first call to a CUDA function. So > it seems that the first CUDA

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-07 Thread Zhang, Hong via petsc-dev
Note that the overhead was triggered by the first call to a CUDA function. So it seems that the first CUDA function triggered loading petsc so (if petsc so is linked), which is slow on the summit file system. Hong On Feb 7, 2020, at 2:54 PM, Zhang, Hong via petsc-dev mailto:petsc-dev@mcs.anl.g

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-07 Thread Zhang, Hong via petsc-dev
Linking any other shared library does not slow down the execution. The PETSc shared library is the only one causing trouble. Here are the ldd outputs for two different versions. For the first version, I removed -lpetsc and it ran very fast. The second (slow) version was linked to petsc so. bash

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-07 Thread Smith, Barry F. via petsc-dev
ldd -o on the executable of both linkings of your code. My guess is that without PETSc it is linking the static version of the needed libraries and with PETSc the shared. And, in typical fashion, the shared libraries are off on some super slow file system so take a long time to be loaded
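
One way to act on the suggestion above is to diff the dependency lists of the fast and slow executables mechanically rather than by eye. The following is only a minimal sketch, assuming the two `ldd` outputs have been saved as text; the function names `parse_ldd` and `diff_ldd` and all library paths in the example are hypothetical and do not come from the thread.

```python
import re

# Trailing load address that ldd prints, e.g. "(0x0000200000040000)".
_ADDR = re.compile(r"\s*\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd(text):
    """Map each library name in `ldd` output to its resolved path (or None)."""
    libs = {}
    for raw in text.splitlines():
        line = _ADDR.sub("", raw.strip())
        if not line:
            continue
        if "=>" in line:
            name, _, path = line.partition("=>")
            libs[name.strip()] = path.strip() or None
        else:
            # Entries such as the vDSO or the dynamic loader have no "=>".
            libs[line] = None
    return libs


def diff_ldd(fast_text, slow_text):
    """Return (only_in_fast, only_in_slow, resolved_differently) as sorted lists."""
    fast, slow = parse_ldd(fast_text), parse_ldd(slow_text)
    only_fast = sorted(set(fast) - set(slow))
    only_slow = sorted(set(slow) - set(fast))
    moved = sorted(n for n in set(fast) & set(slow) if fast[n] != slow[n])
    return only_fast, only_slow, moved


if __name__ == "__main__":
    # Hypothetical sample outputs; real ones would come from running ldd
    # on each executable and saving the text.
    fast = (
        "\tlibcudart.so.10.1 => /opt/example/cuda/lib64/libcudart.so.10.1 (0x0000200000040000)\n"
        "\tlibc.so.6 => /lib64/libc.so.6 (0x0000200000200000)\n"
    )
    slow = fast + "\tlibpetsc.so.3.012 => /home/example/petsc/lib/libpetsc.so.3.012 (0x0000200000400000)\n"
    print(diff_ldd(fast, slow))
```

Libraries that appear only in the slow build, or that resolve to a different path (for example a home directory instead of a system location), are the first candidates to inspect.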

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-07 Thread Zhang, Hong via petsc-dev
Statically linked executable works fine. The dynamic linker is probably broken. Hong On Feb 7, 2020, at 12:53 PM, Matthew Knepley wrote: On Fri, Feb 7, 2020 at 1:23 PM Zhang, Hong via petsc-dev wrote: Hi all, Previously I have noticed t

Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-07 Thread Matthew Knepley
On Fri, Feb 7, 2020 at 1:23 PM Zhang, Hong via petsc-dev < petsc-dev@mcs.anl.gov> wrote: > Hi all, > > Previously I have noticed that the first call to a CUDA function such as > cudaMalloc and cudaFree in PETSc takes a long time (7.5 seconds) on summit. > Then I prepared a simple example as attach

[petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-07 Thread Zhang, Hong via petsc-dev
Hi all, Previously I have noticed that the first call to a CUDA function such as cudaMalloc and cudaFree in PETSc takes a long time (7.5 seconds) on summit. Then I prepared a simple example as attached to help OLCF reproduce the problem. It turned out that the problem was caused by PETSc. The
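
The report separates the cost of the first CUDA call from that of later calls. The attached example is not reproduced in this archive, so the following is only a generic sketch of that measurement pattern: it times the first invocation of a callable separately from repeats. `time_first_and_repeat` and `LazyResource` are hypothetical names, and `LazyResource` is a stand-in with an artificial one-time delay, not a CUDA call. In the real case the callable would wrap something like `cudaFree(0)`.

```python
import time


def time_first_and_repeat(fn, repeats=3):
    """Time the first call to fn separately from `repeats` later calls.

    Returns (first_seconds, [later_seconds, ...]).
    """
    start = time.perf_counter()
    fn()
    first = time.perf_counter() - start

    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        later.append(time.perf_counter() - start)
    return first, later


class LazyResource:
    """Hypothetical stand-in for a runtime that initializes on first use."""

    def __init__(self, init_cost=0.05):
        self.init_cost = init_cost  # artificial one-time delay in seconds
        self.ready = False

    def call(self):
        if not self.ready:
            time.sleep(self.init_cost)  # simulated one-time setup
            self.ready = True


if __name__ == "__main__":
    first, later = time_first_and_repeat(LazyResource().call)
    print(f"first call: {first:.4f} s, later calls: {max(later):.6f} s at most")
```

A large gap between the first and later timings points at one-time setup (library loading, runtime initialization) rather than at the call itself.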