Re: [petsc-dev] MATMPIAIJ

2013-09-24 Thread Barry Smith
You are likely prematurely optimizing. Are you already solving systems using the new solver on "real" problems? Does profiling indicate that the duplicate memory usage or time to copy are limiting the problems you can run? I highly recommend finishing and using your solver interface well

Re: [petsc-dev] Supporting OpenCL matrix assembly

2013-09-24 Thread Jed Brown
Karl Rupp writes:
> Hi,
>
>>> If the context and queue are not attached to objects, then they would
>>> essentially represent global state, which is something I want to avoid.
>>
>> I was thinking that the context returned would be specific to the Mat
>> and the device it was about to run on.

Re: [petsc-dev] MATMPIAIJ

2013-09-24 Thread Matthew Knepley
On Tue, Sep 24, 2013 at 9:07 AM, Jose David Bermeol wrote:
> Hi, I have some questions:
>
> 1) What is the main reason to split the matrix on each MPI process into a
> diagonal matrix and an off-diagonal matrix?
>

To overlap communication with computation.

> 2) Is this just for MATMPIAIJ matrices?

[petsc-dev] MATMPIAIJ

2013-09-24 Thread Jose David Bermeol
Hi, I have some questions:
1) What is the main reason to split the matrix on each MPI process into a diagonal matrix and an off-diagonal matrix?
2) Is this just for MATMPIAIJ matrices?
3) Right now we are interested in saving as much memory as possible, so I guess the right path would be to implement a

Re: [petsc-dev] Error in PetscMallocValidate

2013-09-24 Thread John Mousel
Matt,

The problem occurs with 1-4 cores. I have only observed the error when using GAMG. I do not get any error with:

-ksp_type bcgsl -lspaint_pc_type hypre -lspaint_pc_hypre_type boomeramg

or

-ksp_type bcgsl

I do get an error with

-ksp_type bcgsl -pc_type gamg -pc_gamg_threshold 0.01 -pc_ga

Re: [petsc-dev] Supporting OpenCL matrix assembly

2013-09-24 Thread Karl Rupp
Hi, fair enough. Is a brute-force implementation for P1 elements sufficient as a baseline for discussion? src/ksp/ksp/examples/tutorials/ex4.c OK, thanks, that's the COO part of the comparison then. I'll need to provide the CSR-like case. :-) Best regards, Karli

Re: [petsc-dev] Supporting OpenCL matrix assembly

2013-09-24 Thread Karl Rupp
Hi Matt, Here I believe strongly that we need tests. Nathan assured me that nothing is faster on the GPU than sort+reduce-by-key since they are highly optimized. I think they will be hard to beat, and the initial timings I had say that this is the case. I am willing to be wrong, but I am not wil

Re: [petsc-dev] Supporting OpenCL matrix assembly

2013-09-24 Thread Karl Rupp
Hi Dominic, I think you were referring to the 'Mat' on the device, while I was referring to the plain PETSc Mat. The difficulty for a 'Mat' on the device is a limitation of OpenCL in defining opaque types: It is not possible to have something like typedef struct OpenCLMat { __global int row_

Re: [petsc-dev] Supporting OpenCL matrix assembly

2013-09-24 Thread Matthew Knepley
On Tue, Sep 24, 2013 at 8:11 AM, Karl Rupp wrote:
> Hi Matt,
>
>
> Here I believe strongly that we need tests. Nathan assured me that
>> nothing is faster on the GPU than sort+reduce-by-key since
>> they are highly optimized. I think they will be hard to beat, and the
>> initial timings I had sa

Re: [petsc-dev] Supporting OpenCL matrix assembly

2013-09-24 Thread Karl Rupp
Hi, If the context and queue are not attached to objects, then they would essentially represent global state, which is something I want to avoid. I was thinking that the context returned would be specific to the Mat and the device it was about to run on. Users who want to do the assembly rig

Re: [petsc-dev] Supporting OpenCL matrix assembly

2013-09-24 Thread Dominic Meiser
Hi, I think you were referring to the 'Mat' on the device, while I was referring to the plain PETSc Mat. The difficulty for a 'Mat' on the device is a limitation of OpenCL in defining opaque types: It is not possible to have something like typedef struct OpenCLMat { __global int row_indic

Re: [petsc-dev] Error in PetscMallocValidate

2013-09-24 Thread Jed Brown
John Mousel writes:
> I cloned a fresh repo and built from scratch. I'm seeing the same error I
> previously reported with both intel and gnu compilers.

Can you reproduce with a PETSc example or create a test case so that we can reproduce?

> Also, I get warnings in MatSOR_SeqAIJ_Inode during t

Re: [petsc-dev] Error in PetscMallocValidate

2013-09-24 Thread Matthew Knepley
On Tue, Sep 24, 2013 at 7:49 AM, John Mousel wrote:
> I cloned a fresh repo and built from scratch. I'm seeing the same error I
> previously reported with both intel and gnu compilers. Also, I get warnings
> in MatSOR_SeqAIJ_Inode during the build.
>

Do you get the error in serial? Do you get er

Re: [petsc-dev] Error in PetscMallocValidate

2013-09-24 Thread John Mousel
I cloned a fresh repo and built from scratch. I'm seeing the same error I previously reported with both intel and gnu compilers. Also, I get warnings in MatSOR_SeqAIJ_Inode during the build.

src/mat/impls/aij/seq/inode.c: In function ‘MatSOR_SeqAIJ_Inode’:
src/mat/impls/aij/seq/inode.c:2758: warni

Re: [petsc-dev] Supporting OpenCL matrix assembly

2013-09-24 Thread Jed Brown
Karl Rupp writes:
>> Okay, but why do they need to provide their own "Mat" data?
>
> If the context and queue are not attached to objects, then they would
> essentially represent global state, which is something I want to avoid.

I was thinking that the context returned would be specific to the

Re: [petsc-dev] Supporting OpenCL matrix assembly

2013-09-24 Thread Matthew Knepley
On Tue, Sep 24, 2013 at 7:07 AM, Karl Rupp wrote:
> Hey,
>
>
> On 09/24/2013 03:53 PM, Jed Brown wrote:
>
>> Karl Rupp writes:
>>
>>> I'm not talking about CSR vs. COO from the SpMV point of view, but
>>> rather on how to store the actual data in global memory without
>>> expensive subsequent so

Re: [petsc-dev] Supporting OpenCL matrix assembly

2013-09-24 Thread Karl Rupp
Hey, On 09/24/2013 03:53 PM, Jed Brown wrote: Karl Rupp writes: I'm not talking about CSR vs. COO from the SpMV point of view, but rather on how to store the actual data in global memory without expensive subsequent sorts. Sure, but this seems like such a minor detail. With PetscScalar=doub

Re: [petsc-dev] Supporting OpenCL matrix assembly

2013-09-24 Thread Karl Rupp
Hi, Perhaps that *GetSource method should also return an opaque device "Mat" pointer that the user is responsible for shepherding into the kernel from which they call the device MatSetValues? This is easy if the OpenCL management is within PETSc (i.e. context, buffers and command queues mana

Re: [petsc-dev] Supporting OpenCL matrix assembly

2013-09-24 Thread Jed Brown
Karl Rupp writes:
> I'm not talking about CSR vs. COO from the SpMV point of view, but
> rather on how to store the actual data in global memory without
> expensive subsequent sorts.

Sure, but this seems like such a minor detail. With PetscScalar=double and PetscInt=int, we have 16 bytes/entry

Re: [petsc-dev] Supporting OpenCL matrix assembly

2013-09-24 Thread Karl Rupp
Hey, >> My primary metric for GPU kernels is memory transfers from global memory ('flops are free'), hence what I suggest for the assembly stage is to go with something CSR-like rather than COO. Pure CSR may be too expensive in terms of element lookup if there are several fields involved (partic

Re: [petsc-dev] Supporting OpenCL matrix assembly

2013-09-24 Thread Jed Brown
Karl Rupp writes:
> Hey,
>
>> Perhaps that *GetSource method should also return an opaque device "Mat"
>> pointer that the user is responsible for shepherding into the kernel
>> from which they call the device MatSetValues?
>
> This is easy if the OpenCL management is within PETSc (i.e. context,

Re: [petsc-dev] Supporting OpenCL matrix assembly

2013-09-24 Thread Matthew Knepley
On Tue, Sep 24, 2013 at 2:45 AM, Jed Brown wrote:
> Karl Rupp writes:
>
> >>> This can obviously be done incrementally, so storing a batch of
> >>> element matrices to global memory is not a problem.
> >>
> >> If you store element matrices to global memory, you're using a ton of
> >> bandwidth (

Re: [petsc-dev] Supporting OpenCL matrix assembly

2013-09-24 Thread Jed Brown
Karl Rupp writes:
>>> This can obviously be done incrementally, so storing a batch of
>>> element matrices to global memory is not a problem.
>>
>> If you store element matrices to global memory, you're using a ton of
>> bandwidth (about 20x the size of the matrix if using P1 tets).
>>
>> What if

Re: [petsc-dev] Supporting OpenCL matrix assembly

2013-09-24 Thread Karl Rupp
Hey, Perhaps that *GetSource method should also return an opaque device "Mat" pointer that the user is responsible for shepherding into the kernel from which they call the device MatSetValues? This is easy if the OpenCL management is within PETSc (i.e. context, buffers and command queues man

Re: [petsc-dev] Supporting OpenCL matrix assembly

2013-09-24 Thread Karl Rupp
This can obviously be done incrementally, so storing a batch of element matrices to global memory is not a problem. If you store element matrices to global memory, you're using a ton of bandwidth (about 20x the size of the matrix if using P1 tets). What if you do the sort/reduce thing within