Re: [petsc-users] PetscSF Object on Distributed DMPlex for Halo Data Exchange

2022-05-20 Thread Mike Michell
Thanks for the reply. > "What I want to do is to exchange data (probably just MPI_Reduce)" which confuses me, because halo exchange is a point-to-point exchange and not a reduction. Can you clarify? PetscSFReduceBegin/End seem to be the functions that do the reduction for a PetscSF object. What I
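A minimal sketch of the two directions of communication the DMPlex point SF supports, assuming a distributed DM and illustrative array names (not taken from this thread): PetscSFBcastBegin/End copies owner (root) values to ghost (leaf) points, i.e. the usual halo exchange, while PetscSFReduceBegin/End accumulates ghost contributions back onto the owners.

  /* Hedged sketch: halo exchange and owner-side accumulation through the
     point SF of a distributed DMPlex. dm, rootdata, leafdata are assumed. */
  #include <petscdmplex.h>

  static PetscErrorCode HaloUpdate(DM dm, PetscScalar *rootdata, PetscScalar *leafdata)
  {
    PetscSF sf;

    PetscFunctionBeginUser;
    PetscCall(DMGetPointSF(dm, &sf));
    /* owner (root) -> ghost (leaf) copy: the point-to-point halo exchange */
    PetscCall(PetscSFBcastBegin(sf, MPIU_SCALAR, rootdata, leafdata, MPI_REPLACE));
    PetscCall(PetscSFBcastEnd(sf, MPIU_SCALAR, rootdata, leafdata, MPI_REPLACE));
    /* ghost (leaf) -> owner (root) accumulation: the reduction direction */
    PetscCall(PetscSFReduceBegin(sf, MPIU_SCALAR, leafdata, rootdata, MPI_SUM));
    PetscCall(PetscSFReduceEnd(sf, MPIU_SCALAR, leafdata, rootdata, MPI_SUM));
    PetscFunctionReturn(0);
  }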

Re: [petsc-users] PetscSF Object on Distributed DMPlex for Halo Data Exchange

2022-05-20 Thread Toby Isaac
The PetscSF that is created automatically is the "point sf" ( https://petsc.org/main/docs/manualpages/DM/DMGetPointSF/): it says which mesh points (cells, faces, edges and vertices) are duplicates of others. In a finite volume application we typically want to assign degrees of freedom just to
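A short sketch of how the point SF can be inspected to see which local mesh points are ghost duplicates of remotely owned points; the function name and the printing are illustrative only.

  /* Hedged sketch: list the local points that are leaves of the point SF,
     i.e. duplicates of points owned by another rank. */
  #include <petscdmplex.h>

  static PetscErrorCode ListGhostPoints(DM dm)
  {
    PetscSF            sf;
    PetscInt           nroots, nleaves, i;
    const PetscInt    *ilocal;
    const PetscSFNode *iremote;

    PetscFunctionBeginUser;
    PetscCall(DMGetPointSF(dm, &sf));
    PetscCall(PetscSFGetGraph(sf, &nroots, &nleaves, &ilocal, &iremote));
    for (i = 0; i < nleaves; i++) {
      PetscInt point = ilocal ? ilocal[i] : i; /* local ghost point */
      PetscCall(PetscPrintf(PETSC_COMM_SELF, "point %d is owned by rank %d (remote index %d)\n",
                            (int)point, (int)iremote[i].rank, (int)iremote[i].index));
    }
    PetscFunctionReturn(0);
  }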

[petsc-users] Problem with MUMPS package when running ex1 from src/tao/constrained/tutorials

2022-05-20 Thread jgaray
Hello PETSc users, I have been trying to run example ex1 from src/tao/constrained/tutorials, which requires the use of the MUMPS package. I have downloaded MUMPS and configured PETSc with it using ./configure --download-make --download-cmake --download-mumps --download-metis
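For completeness, a hedged sketch of selecting MUMPS programmatically for a direct solve, assuming PETSc was configured with --download-mumps and that ksp is an already-created KSP; the same effect is usually obtained on the command line with -pc_type lu -pc_factor_mat_solver_type mumps.

  /* Hedged sketch: route the factorization through MUMPS via the API. */
  #include <petscksp.h>

  static PetscErrorCode UseMumps(KSP ksp)
  {
    PC pc;

    PetscFunctionBeginUser;
    PetscCall(KSPSetType(ksp, KSPPREONLY));
    PetscCall(KSPGetPC(ksp, &pc));
    PetscCall(PCSetType(pc, PCLU));
    PetscCall(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS));
    PetscFunctionReturn(0);
  }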

Re: [petsc-users] API call to set mg_levels_pc_type

2022-05-20 Thread Barry Smith
> Yes, this is what I did. But I have a function that sets up the preconditioner by taking a KSP as an argument. Depending on the problem being solved, it can be a single KSP or a sub-KSP coming from a field split (which, if there are Lagrange multipliers, can be further split again). So
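One possible prefix-based approach, sketched under the assumption that the function receives an arbitrary (possibly sub-) KSP and that KSPSetFromOptions() is called on it afterwards; the function name is illustrative.

  /* Hedged sketch: set the level smoother type through the options database,
     using whatever prefix the incoming KSP carries (e.g. fieldsplit_0_). */
  #include <petscksp.h>

  static PetscErrorCode SetLevelSmoother(KSP ksp)
  {
    const char *prefix = NULL;
    char        opt[PETSC_MAX_PATH_LEN];

    PetscFunctionBeginUser;
    PetscCall(KSPGetOptionsPrefix(ksp, &prefix));
    PetscCall(PetscSNPrintf(opt, sizeof(opt), "-%smg_levels_pc_type", prefix ? prefix : ""));
    PetscCall(PetscOptionsSetValue(NULL, opt, "sor"));
    PetscFunctionReturn(0);
  }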

Re: [petsc-users] [Ext] Re: Very slow VecDot operations

2022-05-20 Thread Junchao Zhang
You can also use -log_view -log_sync to sync before timing so that you can clearly see which operations are really imbalanced. --Junchao Zhang On Fri, May 20, 2022 at 12:37 PM Ernesto Prudencio via petsc-users <petsc-users@mcs.anl.gov> wrote: > Thank you, Barry. I will dig more on the issue

Re: [petsc-users] API call to set mg_levels_pc_type

2022-05-20 Thread Jed Brown
Barry Smith writes: >> 1. Is there a way to set the mg_levels_pc_type via an API call? >> 2. Are there any changes in efficiency expected with this new PC? > This was changed mainly because PCSOR is not effective (currently in PETSc) for GPUs. For CPUs it will definitely be problem

Re: [petsc-users] API call to set mg_levels_pc_type

2022-05-20 Thread Barry Smith
> On May 20, 2022, at 2:39 PM, Jeremy Theler wrote: > The default smoothing PC changed from sor to jacobi in 3.17. Note that this is only for GAMG, it is not for geometric multigrid (using PCMG directly). > The Changelog says the old behavior can be recovered by using

[petsc-users] API call to set mg_levels_pc_type

2022-05-20 Thread Jeremy Theler
The default smoothing PC changed from sor to jacobi in 3.17. The Changelog says the old behavior can be recovered by using -mg_levels_pc_type sor. 1. Is there a way to set the mg_levels_pc_type via an API call? 2. Are there any changes in efficiency expected with this new PC? Regards -- jeremy
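One possible API route, sketched below under the assumption that PCSetUp() has already been called so the multigrid levels exist (for GAMG the levels are created during setup); names are illustrative.

  /* Hedged sketch: set SOR on every level smoother directly through the API. */
  #include <petscksp.h>

  static PetscErrorCode SetSorOnLevels(PC pc)
  {
    PetscInt nlevels, l;

    PetscFunctionBeginUser;
    PetscCall(PCMGGetLevels(pc, &nlevels));
    for (l = 1; l < nlevels; l++) { /* level 0 is the coarse solve */
      KSP smoother;
      PC  spc;
      PetscCall(PCMGGetSmoother(pc, l, &smoother));
      PetscCall(KSPGetPC(smoother, &spc));
      PetscCall(PCSetType(spc, PCSOR));
    }
    PetscFunctionReturn(0);
  }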

Re: [petsc-users] [Ext] Re: Very slow VecDot operations

2022-05-20 Thread Ernesto Prudencio via petsc-users
Thank you, Barry. I will dig more on the issue with your suggestions. From: Barry Smith Sent: Friday, May 20, 2022 12:33 PM To: Ernesto Prudencio Cc: PETSc users list Subject: [Ext] Re: [petsc-users] Very slow VecDot operations Ernesto, If you ran (or can run)

Re: [petsc-users] Very slow VecDot operations

2022-05-20 Thread Barry Smith
Ernesto, If you ran (or can run) with -log_view you could see the time "ratio" in the output that tells how much time the "fastest" rank spent on the dot product versus the "slowest". Based on the different counts per rank you report that ratio might be around 3. But based on the times

[petsc-users] Very slow VecDot operations

2022-05-20 Thread Ernesto Prudencio via petsc-users
I am using LSQR to minimize || L x - b ||_2, where L is a sparse rectangular matrix with 145,253,395 rows, 209,423,775 columns, and around 54 billion nonzeros. The numbers reported below are for a run with 27 compute nodes, each compute node with 4 MPI ranks, so a total of 108 ranks.
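For context, a minimal sketch of the kind of LSQR setup involved, assuming the rectangular matrix L and the vectors b, x are already assembled with compatible layouts; names and flow are illustrative, not the poster's actual code.

  /* Hedged sketch: solve min ||L x - b||_2 with KSPLSQR. */
  #include <petscksp.h>

  static PetscErrorCode SolveLeastSquares(Mat L, Vec b, Vec x)
  {
    KSP ksp;
    PC  pc;

    PetscFunctionBeginUser;
    PetscCall(KSPCreate(PetscObjectComm((PetscObject)L), &ksp));
    PetscCall(KSPSetType(ksp, KSPLSQR));
    PetscCall(KSPSetOperators(ksp, L, L));
    PetscCall(KSPGetPC(ksp, &pc));
    PetscCall(PCSetType(pc, PCNONE));  /* no preconditioning on the normal equations */
    PetscCall(KSPSetFromOptions(ksp)); /* picks up tolerances, monitors, etc. */
    PetscCall(KSPSolve(ksp, b, x));
    PetscCall(KSPDestroy(&ksp));
    PetscFunctionReturn(0);
  }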