Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Mark Adams
> Mark, > > Can you run both with GPU aware MPI? > > Perlmutter fails with GPU aware MPI. I think there are known problems with this that are being worked on. And here is Crusher with GPU aware MPI. DM Object: box 8 MPI processes type: plex box in 3 dimensions: Number of 0-cells per rank:

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Barry Smith
Not clear how to interpret, the "gpu" FLOP rate for dot and norm are a good amount higher (exact details of where the log functions are located can affect this) but the overall flop rates of them are not much better. Scatter is better without GPU MPI. How much of this is noise, need to see statisti

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Mark Adams
> > Mark, can we compare with Spock? > Looks much better. This puts two processes/GPU because there are only 4. DM Object: box 8 MPI processes type: plex box in 3 dimensions: Number of 0-cells per rank: 274625 274625 274625 274625 274625 274625 274625 274625 Number of 1-cells per rank: 811

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Barry Smith
Here except for VecNorm the GPU is used effectively in that most of the time is spent doing real work on the GPU VecNorm 402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 20 9 1 0 0 33 30230 225393 0 0.00e+000 0.00e+00 100 Even the

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Mark Adams
On Mon, Jan 24, 2022 at 12:44 PM Barry Smith wrote: > > Here except for VecNorm the GPU is used effectively in that most of the > time is spent doing real work on the GPU > > VecNorm 402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 > 4.0e+02 0 1 0 0 20 9 1 0 0 3

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Junchao Zhang
Mark, I think you can benchmark individual vector operations, and once we get reasonable profiling results, we can move to solvers etc. --Junchao Zhang On Mon, Jan 24, 2022 at 12:09 PM Mark Adams wrote: > > > On Mon, Jan 24, 2022 at 12:44 PM Barry Smith wrote: > >> >> Here except for VecNor

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Mark Adams
On Mon, Jan 24, 2022 at 1:38 PM Junchao Zhang wrote: > Mark, I think you can benchmark individual vector operations, and once we > get reasonable profiling results, we can move to solvers etc. > Can you suggest a code to run or are you suggesting making a vector benchmark code? > > --Junchao Z

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Junchao Zhang
On Mon, Jan 24, 2022 at 12:55 PM Mark Adams wrote: > > > On Mon, Jan 24, 2022 at 1:38 PM Junchao Zhang > wrote: > >> Mark, I think you can benchmark individual vector operations, and once we >> get reasonable profiling results, we can move to solvers etc. >> > > Can you suggest a code to run or

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Matthew Knepley
On Mon, Jan 24, 2022 at 2:11 PM Junchao Zhang wrote: > > > On Mon, Jan 24, 2022 at 12:55 PM Mark Adams wrote: > >> >> >> On Mon, Jan 24, 2022 at 1:38 PM Junchao Zhang >> wrote: >> >>> Mark, I think you can benchmark individual vector operations, and once >>> we get reasonable profiling results,

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Barry Smith
I think you should contact the Crusher ECP technical support team and tell them you are getting dismal performance and ask if you should expect better. Don't waste time flogging a dead horse. > On Jan 24, 2022, at 2:16 PM, Matthew Knepley wrote: > > On Mon, Jan 24, 2022 at 2:11 PM Junchao

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Mark Adams
Yea, CG/Jacobi is as close to a benchmark code as we could want. I could run this on one processor to get cleaner numbers. Is there a designated ECP technical support contact? On Mon, Jan 24, 2022 at 2:18 PM Barry Smith wrote: > > I think you should contact the crusher ECP technical support

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Barry Smith
> On Jan 24, 2022, at 2:46 PM, Mark Adams wrote: > > Yea, CG/Jacobi is as close to a benchmark code as we could want. I could run > this on one processor to get cleaner numbers. > > Is there a designated ECP technical support contact? Mark, you've forgotten you work for DOE. There isn't a

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Justin Chang
My name has been called. Mark, if you're having issues with Crusher, please contact Veronica Vergara (vergar...@ornl.gov). You can cc me (justin.ch...@amd.com) in those emails On Mon, Jan 24, 2022 at 1:49 PM Barry Smith wrote: > > > On Jan 24, 2022, at 2:46 PM, Mark Adams wrote: > > Yea, CG/Ja

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Justin Chang
Also, do you guys have an OLCF liaison? That's actually your better bet if you do. Performance issues with ROCm/Kokkos are pretty common in apps besides just PETSc. We have several teams actively working on rectifying this. However, I think performance issues can be quicker to identify if we had a

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Mark Adams
On Mon, Jan 24, 2022 at 2:57 PM Justin Chang wrote: > My name has been called. > > Mark, if you're having issues with Crusher, please contact Veronica > Vergara (vergar...@ornl.gov). You can cc me (justin.ch...@amd.com) in > those emails > I have worked with Veronica before. I'll ask Todd if we

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Barry Smith
For this, to start, someone can run src/vec/vec/tutorials/performance.c and compare the performance to that in the technical report Evaluation of PETSc on a Heterogeneous Architecture \\ the OLCF Summit System \\ Part I: Vector Node Performance. Google to find. One does not have to and sho

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Munson, Todd via petsc-dev
I want to note that crusher is early access hardware, so we should expect performance to not be great right now. Doing what we can to help identify the performance issues and keeping OLCF informed would be the best. Note that we cannot make any of the preliminary results publicly available wit

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Barry Smith
Sure, this is definitely not for the public, it is just numbers one can give to OLCF, AMD, and Kokkos to ensure things are going as they should. > On Jan 24, 2022, at 3:30 PM, Munson, Todd wrote: > > I want to note that Crusher is early access hardware, so we should expect > performa