15 PM Junchao Zhang
wrote:
>
> I haven't used CUDA graph with PETSc. Do you happen to have a working
> example so we can debug?
>
> --Junchao Zhang
>
>
> On Tue, May 14, 2024 at 6:08 PM Sreeram R Venkat
> wrote:
>
>> I have a MatShell object that I want
I have a MatShell object that I want to convert to a MATDENSECUDA.
Normally, I use MatComputeOperator for this. However, I would now also like
to use a CUDA Graph so that all the calls to MatMult are captured. I can
wrap a code like
for (int i = 0; i < N; i++)
  MatMult(A, x, y);
in a CUDA
I am trying to check my program for GPU memory leaks with the
compute-sanitizer tool. If I run my application with:
mpiexec -n 1 compute-sanitizer --tool memcheck --leak-check full ./a.out
args
I get the message:
Error: No attachable process found. compute-sanitizer timed-out.
Adding
I also just ran with cupy.linalg.eigvalsh (which wraps cuSOLVER), and it
only took 3.1 seconds. I will probably use this, but it is good to know
about the SLEPc cases if I don't need the full spectrum or have sparse
matrices, etc.
Thanks,
Sreeram
On Mon, May 13, 2024 at 2:13 PM Sreeram R Venkat
it to 500, say. The
> paper
> https://doi.org/10.1016/j.cpc.2010.09.007
> includes a discussion on
> the ncv and mpd parameters, mostly in terms of
I have a MatShell object that computes matrix-vector products of a dense
symmetric matrix of size NxN. The MatShell does not actually form the dense
matrix, so it is never in memory/storage. For my application, N ranges from
1e4 to 1e5.
I want to compute the full spectrum of this matrix. For an
have to replace all MPI device
> communications (what if they are from a third-party library?) with NCCL.
>
> --Junchao Zhang
>
>
> On Wed, Apr 17, 2024 at 8:27 AM Sreeram R Venkat
> wrote:
>
>> Yes, I saw this paper
>> https://urldefense.us/v3/__https://www.s
this year about
the need for stream-aware MPI, so I was wondering if NCCL would be used in
PETSc to do GPU-GPU communication.
On Wed, Apr 17, 2024, 7:58 AM Junchao Zhang wrote:
>
>
>
>
> On Wed, Apr 17, 2024 at 7:51 AM Sreeram R Venkat
> wrote:
>
>> Do you know if there
Do you know if there are plans for NCCL support in PETSc?
On Tue, Apr 16, 2024, 10:41 PM Junchao Zhang
wrote:
> Glad to hear you found a way. Did you use Frontera at TACC? If yes, I
> could have a try.
>
> --Junchao Zhang
>
>
> On Tue, Apr 16, 2024 at 8:35 PM Sree
about not having a
GPU-aware MPI.
On Fri, Dec 8, 2023 at 5:30 PM Mark Adams wrote:
> You may need to set some env variables. This can be system specific so you
> might want to look at docs or ask TACC how to run with GPU-aware MPI.
>
> Mark
>
> On Fri, Dec 8, 2023 at 5:17 P
2023 at 5:54 AM Matthew Knepley wrote:
>
>> On Thu, Dec 21, 2023 at 6:46 AM Sreeram R Venkat
>> wrote:
>>
>>> Ok, I think the error I'm getting has something to do with how the
>>> multiple solves are being done in succession. I'll try to see if there's
Would using the CHOLMOD Cholesky factorization (
https://petsc.org/release/manualpages/Mat/MATSOLVERCHOLMOD/) let us do the
factorization on device as well?
On Wed, Dec 20, 2023 at 1:21 PM Pierre Jolivet wrote:
>
>
> On 20 Dec 2023, at 8:42 AM, Sreeram R Venkat wrote:
>
> Probably, the fastest approach would indeed be -pc_type lu -ksp_type
> preonly -ksp_matsolve_batch_size 100 or something, depending on the memory
> available on your host.
>
> Thanks,
> Pierre
>
> On 15 Dec 2023, at 9:52 PM, Sreeram R Venkat wrote:
>
> Here are the ksp
.
Thanks,
Sreeram
On Thu, Dec 14, 2023, 1:12 PM Pierre Jolivet wrote:
>
>
> On 14 Dec 2023, at 8:02 PM, Sreeram R Venkat wrote:
>
> Hello Pierre,
>
> Thank you for your reply. I tried out the HPDDM CG as you said, and it
> seems to be doing the batched solves, but the K
.
Is there anything else I need to do?
Thanks,
Sreeram
On Fri, Dec 8, 2023 at 3:29 PM Sreeram R Venkat wrote:
> Thank you, changing to CUDA 11.4 fixed the issue. The mvapich2-gdr module
> didn't require CUDA 11.4 as a dependency, so I was using 12.0
>
> On Fri, Dec 8, 2023 at 1:15 PM Satish
use cuda-11.4 - with this install of mvapich..
>
> Satish
>
> On Fri, 8 Dec 2023, Matthew Knepley wrote:
>
> > On Fri, Dec 8, 2023 at 1:54 PM Sreeram R Venkat
> wrote:
> >
> > > I am trying to build PETSc with CUDA using the CUDA-Aware MVAPICH2-GDR.
> >
Oh, in that case I will try out BoomerAMG. Getting AMGX to build correctly
was also tricky so hopefully the HYPRE build will be easier.
Thanks,
Sreeram
On Thu, Dec 7, 2023, 3:03 PM Pierre Jolivet wrote:
>
>
> On 7 Dec 2023, at 9:37 PM, Sreeram R Venkat wrote:
>
> Thank you Barr
rformance of the code.
>
> Thanks,
> Pierre
>
> On 7 Dec 2023, at 8:34 PM, Barry Smith wrote:
>
>
>
> On Dec 7, 2023, at 1:17 PM, Sreeram R Venkat wrote:
>
> I have 2 sequential matrices M and R (both MATSEQAIJCUSPARSE of size n x
> n) and a vector v of size n*m.
I have 2 sequential matrices M and R (both MATSEQAIJCUSPARSE of size n x n)
and a vector v of size n*m. v = [v_1 , v_2 ,... , v_m] where v_i has size
n. The data for v can be stored either in column-major or row-major order.
Now, I want to do 2 types of operations:
1. Matvecs of the form M*v_i =
19.
> 20.
> 21.
> 22.
> 23.
> 24.
> 25.
> 26.
> Process [3]
> 27.
> 28.
> 29.
> 30.
> 31.
> 32.
> 33.
> 34.
> 35.
> Process [4]
> Process [5]
> Process [6]
> Process [7]
> Process [8]
> Process [9]
> Process [10]
> Proc
3 at 9:30 PM Junchao Zhang
wrote:
> I think your approach is correct. Do you have an example code?
>
> --Junchao Zhang
>
>
> On Tue, Dec 5, 2023 at 5:15 PM Sreeram R Venkat
> wrote:
>
>> Hi, I have a follow-up question on this.
>>
>> Now, I'm trying to
ULL, ).
However, when I try to do the scatter, I get some illegal memory access
errors.
Is there something wrong with how I define the index sets?
Thanks,
Sreeram
On Thu, Oct 5, 2023 at 12:57 PM Sreeram R Venkat
wrote:
> Thank you. This works for me.
>
> Sreeram
>
> On Wed
ion on the problem.
>
> I will post a fix shortly.
>
> Barry
>
>
> On Nov 16, 2023, at 6:19 PM, Sreeram R Venkat wrote:
>
> I have a program which reads a vector from file into an array, and then
> uses that array to create a PETSc Vec object. The Vec is defin
3: CMakeFiles/test.dir/all] Error 2
> make: *** [Makefile:91: all] Error 2
> (base) 06:31 2 login10 master= perlmutter:~/petsc-test$
>
>
> On Thu, Nov 16, 2023 at 9:42 PM Sreeram R Venkat
> wrote:
>
>> Actually, here's a short test case I just made.
>> I have it o
Venkat
wrote:
> Ok, will do. It may take me a few days to get a minimal reproducible
> example though since the rest of the program has gotten quite large.
>
> Thanks,
> Sreeram
>
> On Thu, Nov 16, 2023 at 8:27 PM Matthew Knepley wrote:
>
>> On Thu, Nov 16, 20
Ok, will do. It may take me a few days to get a minimal reproducible
example though since the rest of the program has gotten quite large.
Thanks,
Sreeram
On Thu, Nov 16, 2023 at 8:27 PM Matthew Knepley wrote:
> On Thu, Nov 16, 2023 at 6:19 PM Sreeram R Venkat
> wrote:
>
>> I
I have a program which reads a vector from file into an array, and then
uses that array to create a PETSc Vec object. The Vec is defined on the
global communicator, but not all processes actually contain entries of it.
For example, suppose we have 4 processors, and the vector is of size 10.
Rank 0
scObject)x), rend - rstart, rstart, 1,
> );
> VecScatterCreate(x, ix, y, ix, );
>
> --Junchao Zhang
>
>
> On Wed, Oct 4, 2023 at 6:03 PM Sreeram R Venkat
> wrote:
>
>> Suppose I am running on 12 processors, and I have a vector "v" of size 36
>> partit
Suppose I am running on 12 processors, and I have a vector "v" of size 36
partitioned over the first 4. v still uses the PETSC_COMM_WORLD, so it has
a layout of (9, 9, 9, 9, 0, 0, ..., 0). Now, I would like to repartition it
over all 12 processors, so that the layout becomes (3, 3, 3, ..., 3).
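The index arithmetic behind this repartitioning can be sketched in plain Python. This is not PETSc code and does not call any PETSc API; the helper names (uniform_layout, ownership_ranges, owner_of) are hypothetical and only illustrate which old rank owns the entries that each new rank would receive. The even split used here is assumed to match what PETSc picks by default when local sizes are left undetermined.

```python
def uniform_layout(global_size, nranks):
    """Split global_size entries as evenly as possible over nranks.

    Assumption: lower ranks absorb the remainder, one extra entry each.
    """
    base, extra = divmod(global_size, nranks)
    return [base + (1 if r < extra else 0) for r in range(nranks)]


def ownership_ranges(layout):
    """Turn per-rank local sizes into (start, end) global ranges, end exclusive."""
    ranges, start = [], 0
    for local_size in layout:
        ranges.append((start, start + local_size))
        start += local_size
    return ranges


def owner_of(index, ranges):
    """Return the rank whose ownership range contains the global index."""
    for rank, (lo, hi) in enumerate(ranges):
        if lo <= index < hi:
            return rank
    raise IndexError(index)


# Layouts from the message: 36 entries on the first 4 of 12 ranks,
# then spread over all 12 ranks.
old_layout = [9, 9, 9, 9] + [0] * 8
new_layout = uniform_layout(36, 12)

old_ranges = ownership_ranges(old_layout)
new_ranges = ownership_ranges(new_layout)

# For each destination rank, the set of source ranks it must receive from.
sources = {
    rank: sorted({owner_of(i, old_ranges) for i in range(lo, hi)})
    for rank, (lo, hi) in enumerate(new_ranges)
}

for rank, (lo, hi) in enumerate(new_ranges):
    print(f"new rank {rank}: global indices [{lo}, {hi}) from old rank(s) {sources[rank]}")
```

In this sketch each new rank keeps the same global index range it would own in the (3, 3, ..., 3) layout, so the scatter is an identity map on global indices and only the owning process changes.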
tCreateShell().
>
>
> On Sep 19, 2023, at 8:44 PM, Sreeram R Venkat wrote:
>
> Thank you for your reply.
>
> Let's call this matrix *M*:
> (A B C D)
> (E F G H)
> (I J K L)
>
> Now, instead of doing KSP with just *M*, what if I want *M^TM*? In this
>
around.
>
> I don't think it makes sense to use PETSc with such vector decompositions as
> you would like.
>
> Barry
>
>
>
> On Sep 19, 2023, at 7:44 PM, Sreeram R Venkat wrote:
>
> With the example you have given, here is what I would like to do:
>
>-
> Do you want one matrix A .. Z on each rank?
>
> Do you want the (a,b,c) vector spread over all ranks? What about the (w,x,y,z)
> vector?
>
> Barry
>
>
>
> On Sep 19, 2023, at 4:42 PM, Sreeram R Venkat wrote:
>
> I have a custom implementation o
I have a custom implementation of a matrix-vector product that inherently
relies on a 2D processor partitioning of the matrix. That is, if the matrix
looks like:
A B C D
E F G H
I J K L
in block form, we use 12 processors, each having one block. The input
vector is partitioned across each row,