Re: [petsc-users] Smaller assemble time with increasing processors

2023-07-03 Thread Runfeng Jin
Thank you very much! Runfeng

Barry Smith wrote on Mon, Jul 3, 2023 at 22:52:
> On Jul 3, 2023, at 10:11 AM, Runfeng Jin wrote:
>> Hi,
>>> We use a hash table to store the nonzeros on the fly, and then convert to packed storage on assembly.
> There is "extra memory" since the matrix entries …

Re: [petsc-users] Smaller assemble time with increasing processors

2023-07-03 Thread Barry Smith
> On Jul 3, 2023, at 10:11 AM, Runfeng Jin wrote:
> Hi,
>> We use a hash table to store the nonzeros on the fly, and then convert to packed storage on assembly.
There is "extra memory" since the matrix entries are first stored in a hash and then converted into the regular CSR format …
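
For the curious, here is a minimal conceptual sketch in plain C of the two-phase scheme described above. It is not PETSc's actual implementation, just an illustration of why the hash-then-CSR approach needs extra memory during conversion; the dimensions and values are made up:

#include <stdio.h>
#include <stdlib.h>

#define N   4    /* small matrix dimension, for illustration */
#define CAP 64   /* fixed hash capacity; a real code would grow it */

typedef struct { int used, row, col; double val; } Entry;

/* Insert or accumulate a value, ADD_VALUES-style, with linear probing. */
static void hash_add(Entry *t, int r, int c, double v)
{
  size_t h = ((size_t)r * 31u + (size_t)c) % CAP;
  while (t[h].used && (t[h].row != r || t[h].col != c)) h = (h + 1) % CAP;
  if (t[h].used) t[h].val += v;
  else { t[h].used = 1; t[h].row = r; t[h].col = c; t[h].val = v; }
}

int main(void)
{
  Entry *tab = calloc(CAP, sizeof(Entry));

  /* Phase 1: entries arrive in arbitrary order; no preallocation needed. */
  hash_add(tab, 0, 0, 2.0);
  hash_add(tab, 2, 1, -1.0);
  hash_add(tab, 0, 3, 5.0);
  hash_add(tab, 0, 0, 1.0); /* accumulates onto (0,0) -> 3.0 */

  /* Phase 2 ("assembly"): count nonzeros per row, prefix-sum into row
     pointers, and scatter into packed CSR arrays. The hash table and the
     CSR arrays coexist here -- that is the extra memory. */
  int rowptr[N + 1] = {0}, nnz = 0;
  for (size_t i = 0; i < CAP; i++)
    if (tab[i].used) { rowptr[tab[i].row + 1]++; nnz++; }
  for (int r = 0; r < N; r++) rowptr[r + 1] += rowptr[r];

  int    *colind = malloc((size_t)nnz * sizeof(int));
  double *values = malloc((size_t)nnz * sizeof(double));
  int next[N];
  for (int r = 0; r < N; r++) next[r] = rowptr[r];
  for (size_t i = 0; i < CAP; i++)
    if (tab[i].used) {
      int k = next[tab[i].row]++;
      colind[k] = tab[i].col;
      values[k] = tab[i].val;
    }
  free(tab); /* only after conversion can the hash memory be released */

  /* A real implementation would also sort column indices within each row. */
  for (int r = 0; r < N; r++)
    for (int k = rowptr[r]; k < rowptr[r + 1]; k++)
      printf("A(%d,%d) = %g\n", r, colind[k], values[k]);
  free(colind);
  free(values);
  return 0;
}

Note the hash table can only be freed after the packed arrays are filled, so both structures live side by side at the peak.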

Re: [petsc-users] Smaller assemble time with increasing processors

2023-07-03 Thread Runfeng Jin
Hi,
> We use a hash table to store the nonzeros on the fly, and then convert to packed storage on assembly.
Could you tell me which file implements this function? Runfeng

Runfeng Jin wrote on Mon, Jul 3, 2023 at 22:05:
> Thank you for all your help!
> Runfeng
> Matthew Knepley wrote on Mon, Jul 3, 2023 at 2…

Re: [petsc-users] Smaller assemble time with increasing processors

2023-07-03 Thread Runfeng Jin
Thank you for all your help! Runfeng

Matthew Knepley wrote on Mon, Jul 3, 2023 at 22:03:
> On Mon, Jul 3, 2023 at 9:56 AM Runfeng Jin wrote:
>> Hi, impressive performance!
>> I use the newest version of PETSc (release branch), and it almost eliminates all assembly and stash time at large processor counts (assembly …

Re: [petsc-users] Smaller assemble time with increasing processors

2023-07-03 Thread Matthew Knepley
On Mon, Jul 3, 2023 at 9:56 AM Runfeng Jin wrote:
> Hi, impressive performance!
> I use the newest version of PETSc (release branch), and it almost eliminates all assembly and stash time at large processor counts (assembly time: 64 procs, 4 s; 128 procs, 2 s; 256 procs, 0.2 s; stash time all below 2 s). For the zero programming …

Re: [petsc-users] Smaller assemble time with increasing processors

2023-07-03 Thread Runfeng Jin
Hi, impressive performance! I use the newest version of PETSc (release branch), and it almost eliminates all assembly and stash time at large processor counts (assembly time: 64 procs, 4 s; 128 procs, 2 s; 256 procs, 0.2 s; stash time all below 2 s). For zero programming cost, it is really incredible. The older code has a regular …

Re: [petsc-users] Smaller assemble time with increasing processors

2023-07-02 Thread Barry Smith
The main branch of PETSc now supports filling sparse matrices without providing any preallocation information. You can give it a try. Use your current fastest code but just remove ALL the preallocation calls. I would be interested in what kind of performance you get compared to your best …
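
A minimal sketch of what this suggestion looks like in code, assuming a recent PETSc where the no-preallocation assembly path is available; the global size and the diagonal insertion loop are placeholders for the real application code:

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat      A;
  PetscInt n = 1000, rstart, rend, i; /* global size is a placeholder */

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetFromOptions(A));
  PetscCall(MatSetUp(A)); /* note: NO preallocation calls anywhere */

  PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
  for (i = rstart; i < rend; i++) { /* stand-in for the real insertion loop */
    PetscScalar v = 2.0;
    PetscCall(MatSetValues(A, 1, &i, 1, &i, &v, ADD_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}

Running with -log_view lets you compare the assembly times against the preallocated version.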

Re: [petsc-users] Smaller assemble time with increasing processors

2023-07-02 Thread Runfeng Jin
Hi! Good advice! I set values with the MatSetValues() API, which sets one part of a row at a time (I use a kind of tiling technique, so I cannot get all values of a row at once). I tested the number of mallocs in these three cases. The number of mallocs decreases with the increase of processors …
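
For reference, a sketch of this insertion pattern and one way to check the malloc count programmatically, via MatGetInfo(); the helper names are made up and the PETSC_SUCCESS return style assumes a recent PETSc:

#include <petscmat.h>

/* Insert one tile of row `row` (columns cols[0..ncols-1]). */
static PetscErrorCode InsertRowTile(Mat A, PetscInt row, PetscInt ncols,
                                    const PetscInt cols[], const PetscScalar vals[])
{
  PetscFunctionBeginUser;
  PetscCall(MatSetValues(A, 1, &row, ncols, cols, vals, INSERT_VALUES));
  PetscFunctionReturn(PETSC_SUCCESS);
}

/* After MatAssemblyEnd(), report how many mallocs MatSetValues needed locally. */
static PetscErrorCode ReportMallocs(Mat A)
{
  MatInfo info;

  PetscFunctionBeginUser;
  PetscCall(MatGetInfo(A, MAT_LOCAL, &info));
  PetscCall(PetscPrintf(PETSC_COMM_SELF, "mallocs during MatSetValues: %g\n", info.mallocs));
  PetscFunctionReturn(PETSC_SUCCESS);
}

Running with -info also makes PETSc print the number of mallocs used during MatSetValues calls at each MatAssemblyEnd().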

Re: [petsc-users] Smaller assemble time with increasing processors

2023-07-01 Thread Barry Smith
I see no reason not to trust the times below; they seem reasonable. You get more than 2 times speedup from 64 to 128 processes and then about 1.38 times from 128 to 256. The total amount of data moved (number of messages times average message length) goes from 7.0e+03 * 2.8e+05 = 1.9600e+09 to 2.1060e+09 to …

Re: [petsc-users] Smaller assemble time with increasing processors

2023-06-30 Thread Barry Smith
You cannot look just at the VecAssemblyEnd() time; that will very likely give the wrong impression of the total time it takes to put the values in. You need to register a new event and put a PetscLogEventBegin() just before you start generating the vector entries and calling VecSetValues(), and …
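
A minimal sketch of this suggestion, assuming a recent PETSc; the event name "VecFill" and the entry-generation loop are placeholders:

#include <petscvec.h>

PetscErrorCode TimedVectorFill(Vec v)
{
  PetscLogEvent fill_event;
  PetscInt      rstart, rend, i;

  PetscFunctionBeginUser;
  /* In real code, register the event once at startup. */
  PetscCall(PetscLogEventRegister("VecFill", VEC_CLASSID, &fill_event));
  PetscCall(PetscLogEventBegin(fill_event, 0, 0, 0, 0));
  PetscCall(VecGetOwnershipRange(v, &rstart, &rend));
  for (i = rstart; i < rend; i++) { /* placeholder for real entry generation */
    PetscScalar val = (PetscScalar)i;
    PetscCall(VecSetValues(v, 1, &i, &val, INSERT_VALUES));
  }
  PetscCall(VecAssemblyBegin(v));
  PetscCall(VecAssemblyEnd(v));
  PetscCall(PetscLogEventEnd(fill_event, 0, 0, 0, 0));
  PetscFunctionReturn(PETSC_SUCCESS);
}

With -log_view the "VecFill" line then reports the full generation + insertion + assembly time, not just VecAssemblyEnd().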

[petsc-users] Smaller assemble time with increasing processors

2023-06-30 Thread Runfeng Jin
Hello! When I use PETSc to build an SBAIJ matrix, I find a strange thing: when I increase the number of processors, the assembly time becomes smaller. All runs assemble exactly the same matrix. The assembly time mainly arises from message passing, and because I use a dynamic workload it is random which …
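
For context, a minimal sketch of creating such a parallel SBAIJ matrix; the global size and block size are placeholders, and SBAIJ stores only the upper triangular part, so values are set there only:

#include <petscmat.h>

PetscErrorCode CreateExampleSBAIJ(MPI_Comm comm, PetscInt n, Mat *A)
{
  PetscFunctionBeginUser;
  PetscCall(MatCreate(comm, A));
  PetscCall(MatSetSizes(*A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetType(*A, MATSBAIJ)); /* symmetric block AIJ format */
  PetscCall(MatSetBlockSize(*A, 1));   /* block size 1, a placeholder */
  PetscCall(MatSetUp(*A));
  /* ... MatSetValues() on the upper triangle only,
     then MatAssemblyBegin/MatAssemblyEnd ... */
  PetscFunctionReturn(PETSC_SUCCESS);
}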