Xiangdong writes:
> For these functions, the flop ratios are all 1.1, while the time ratios are
> 1.5-2.2, so the amount of work is roughly balanced across processes.
> Runs on both Stampede and my group cluster showed similar behavior. Given
> that I only use 256 cores, do you think it is likely
On Mon, Feb 8, 2016 at 6:45 PM, Jed Brown wrote:
> Xiangdong writes:
>
> > iii) since the time ratios of VecDot (2.5) and MatMult (1.5) are still
> > high, I reran the program with the IPM module. The IPM summary is here:
> > https://drive.google.com/file/d/0BxEfb1tasJxhYXI0VkV0cjlLWUU/view?usp=sharing.
> > From these IPM results, MPI_Allreduce takes 74% of MPI
The following routines are all embarrassingly parallel.
VecAXPY  1001160 1.0 2.0483e+01 2.7 1.85e+10 1.1 0.0e+00 0.0e+00 0.0e+00  3  4  0  0  0   3  4  0  0  0 219358
VecAYPX   600696 1.0 6.6270e+00 2.0 1.11e+10 1.1 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0  40616
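A note on reading these rows: the ratio columns are max over min across all processes, so a flop ratio near 1.1 with a time ratio of 2.0-2.7 means the work is balanced but the execution time is not (which points at memory-bandwidth contention or system noise rather than load imbalance). A small sketch with hypothetical per-rank numbers, not taken from the actual run:

```python
# Hypothetical per-rank flop counts and times for one event such as
# VecAXPY (illustrative numbers only, not from the log above).
flops = [1.85e10, 1.80e10, 1.78e10, 1.83e10]  # nearly equal work per rank
times = [7.6, 12.1, 20.5, 9.8]                # very unequal time per rank

# -log_summary reports each ratio as max over min across processes.
flop_ratio = max(flops) / min(flops)
time_ratio = max(times) / min(times)

print(f"flop ratio: {flop_ratio:.2f}")  # close to 1: balanced work
print(f"time ratio: {time_ratio:.2f}")  # ~2.7: imbalanced execution
```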
Based on what you suggested, I have done the following:
i) reran the same problem without output. The ratios are still roughly the
same, so the problem is not I/O.
ii) reran the program on a supercomputer (Stampede) instead of the group
cluster. The MPI_Barrier time got better:
Average time to
In this case you need to provide two pieces of information to
PCFIELDSPLIT. The first is what we call the "block size", or bs: the number of
"basic fields" in the problem. For example, if at each grid point you have
x-velocity, y-velocity, and pressure, the block size is 3. The second is you ne
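The block size and field grouping described above can also be supplied from the command line; a minimal sketch for the 3-dof velocity-pressure example (the option names are standard PETSc, the 0/1 grouping into velocity and pressure splits is illustrative):

```
-pc_type fieldsplit
-pc_fieldsplit_block_size 3   # bs = 3: x-velocity, y-velocity, pressure
-pc_fieldsplit_0_fields 0,1   # split 0: the two velocity components
-pc_fieldsplit_1_fields 2     # split 1: pressure
```

The same grouping can be set in code with PCFieldSplitSetBlockSize() and PCFieldSplitSetFields().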
Hi Matt, Hi all,
I am trying to get a feel for PCFIELDSPLIT by testing it on a very small
matrix (attached below). I have 4 degrees of freedom and I use 4
processors. I want to split the 4 dofs into two fields, each having two
dofs. I don't know whether this may be a problem for PETSc. When I use
the command
On 8 February 2016 at 12:31, Jacek Miloszewski wrote:
> Dear PETSc users,
>
> I use PETSc to assemble a square matrix (in the attached example it is n =
> 4356) which has around 12% non-zero entries. I timed my code using
> various numbers of processes (data in table). Now I have 2 questions:
On Mon, Feb 8, 2016 at 5:37 AM, Dave May wrote:
>
> On 8 February 2016 at 12:31, Jacek Miloszewski <
> jacek.miloszew...@gmail.com> wrote:
>
>> Dear PETSc users,
>>
>> I use PETSc to assemble a square matrix (in the attached example it is n
>> = 4356) which has around 12% non-zero entries. I
Dear PETSc users,
I use PETSc to assemble a square matrix (in the attached example it is n =
4356) which has around 12% non-zero entries. I timed my code using
various numbers of processes (data in table). Now I have 2 questions:
1. Why, with doubling the number of processes, is the speed-up 4x? I would
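One way to frame question 1 is through parallel efficiency: speedup divided by process count, where anything above 1.0 is superlinear (commonly a cache effect, since each process's partition of the matrix fits better in cache). A sketch with made-up timings, since the actual numbers are in the table attached to the original mail:

```python
# Made-up assembly times (seconds) per process count; the real data is
# in the attachment, not reproduced here.
timings = {1: 100.0, 2: 24.0, 4: 6.5}

base = timings[1]
for p in sorted(timings):
    speedup = base / timings[p]
    # efficiency > 1.0 means superlinear speed-up
    print(f"np={p}: speedup {speedup:.1f}x, efficiency {speedup / p:.2f}")
```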
On Mon, Feb 8, 2016 at 2:41 AM, Nicola Creati wrote:
> Hello,
> I'm trying to convert some PETSc examples from C to Python. I noticed
> strange behavior in the SNES solver. It seems that the same example in
> Python and C is characterized by a different number of function
> evaluations. I converted the ex25.c example in the
> "snes/examples/tutorials" folder.
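One simple way to compare function-evaluation counts between the C and Python versions is to count calls inside the Python residual callback yourself. The wrapper below is a plain-Python sketch of that pattern; the residual shown is a stand-in, not the one from ex25:

```python
def counted(fn):
    """Wrap a callback and record how many times it is invoked."""
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@counted
def residual(x):
    # Stand-in for the SNES residual evaluation in the petsc4py example.
    return x * x - 2.0

# Pretend the solver evaluated the residual five times.
for _ in range(5):
    residual(1.0)
print(residual.calls)  # 5
```

Applying the same wrapper to the petsc4py residual callback makes it easy to compare against the count reported by -snes_monitor or SNESGetNumberFunctionEvals() on the C side.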