On 29 Jul 2015, at 18:15, Erik Schnetter <[email protected]> wrote:

> On Fri, Jul 24, 2015 at 10:57 AM, Ian Hinder <[email protected]> wrote:
> 
> On 8 Jul 2015, at 16:53, Ian Hinder <[email protected]> wrote:
> 
>> 
>> On 8 Jul 2015, at 15:14, Erik Schnetter <[email protected]> wrote:
>> 
>>> I added a second benchmark, using a Thornburg04 patch system, 8th order 
>>> finite differencing, and 4th order patch interpolation. The results are
>>> 
>>> original: 8.53935e-06 sec
>>> rewrite:  8.55188e-06 sec
>>> 
>>> this time with 1 thread per MPI process, since that was most efficient in 
>>> both cases. Most of the time is spent in inter-patch interpolation, which 
>>> is much more expensive than in a "regular" case since this benchmark is run 
>>> on a single node and hence with very small grids.
>>> 
>>> With these numbers under our belt, can we merge the rewrite branch?
>> 
>> The "jacobian" benchmark that I gave you was still a pure kernel benchmark, 
>> involving no interpatch interpolation.  It just measured the speed of the 
>> RHSs when Jacobians were included.  I would also not use a single-threaded 
>> benchmark with very small grid sizes; this might have been fastest in this 
>> artificial case, but in practice I don't think we would use that 
>> configuration.  The benchmark you have now run seems to be more of a 
>> "complete system" benchmark, which is useful, but different.
>> 
>> I think it is important that the kernel itself has not gotten slower, even 
>> if the kernel is not currently a major contributor to runtime.  We 
>> specifically split out the advection derivatives because they made the code 
>> with 8th order and Jacobians a fair bit slower.  I would just like to see 
>> that this is not still the case with the new version, which has changed the 
>> way this is handled.
> 
> I have now run my benchmarks on both the original and the rewritten 
> McLachlan.  I seem to find that the ML_BSSN_* functions in
> Evolve/CallEvol/CCTK_EVOL/CallFunction/thorns, excluding the constraint 
> calculations, are between 11% and 15% slower with the rewrite branch, 
> depending on the details of the evolution.  See attached plot.  This is on 
> Datura with quite old CPUs (Intel Xeon CPU X5650 2.67GHz).
> 
> I just realized that you probably used the wrong rhs_evaluation method for 
> McLachlan. While improving performance, I implemented 3 different ways to 
> evaluate the RHS: (1) all in one routine, (2) split manually, and (3) split 
> semi-automatically by Kranc. (2) and (3) are identical for practical 
> purposes, and thus (2) should not be used. In my benchmarks, I always 
> explicitly specified (3). However, the default in McLachlan is still at (1), 
> and thus likely not as efficient as it should be.
> 
> The parameter setting ML_BSSN::rhs_evaluation = "splitBy" chooses (3).
> 
> I will soon push McLachlan changes to make (3) the default and to remove (2).

Hi Erik,

Regarding the dissipation discussion: would it be possible to select at run 
time, via a parameter, whether McLachlan evaluates the dissipation terms, 
i.e. by changing

        Dissipation[var_] := IfDiss[epsdiss[ux] PDdiss[var, lx], 0];

to

        Dissipation[var_] := IfDiss[IfThen[useDissipation,
                                           epsdiss[ux] PDdiss[var, lx], 0], 0];

We can choose a better name for useDissipation.  The parameter would be 
constant across the loop, so the compiler should still be able to vectorise.
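
To illustrate what I mean by "constant across the loop", here is a minimal C 
sketch.  It is not Kranc output: rhs_kernel, use_dissipation and the 1D 
second-difference stencil are hypothetical stand-ins for the generated kernel 
and for PDdiss.  The point is only that the condition does not depend on the 
loop index, so a compiler is free to hoist the test out of the loop or turn it 
into a select; whether a given compiler actually does so will depend on the 
compiler and its flags.

```c
#include <stddef.h>

/* Hypothetical stand-in for a generated RHS kernel (1D, interior points only).
   use_dissipation plays the role of the proposed runtime parameter. */
void rhs_kernel(size_t n, const double *u, double *rhs,
                double epsdiss, int use_dissipation)
{
    for (size_t i = 1; i + 1 < n; ++i) {
        /* Toy "physics" term, standing in for the real RHS. */
        const double base = -u[i];

        /* The condition is loop-invariant: it does not depend on i. */
        const double diss = use_dissipation
            ? epsdiss * (u[i - 1] - 2.0 * u[i] + u[i + 1])
            : 0.0;

        rhs[i] = base + diss;
    }
}
```

With use_dissipation = 0 the stencil result is never used, which is the 
behaviour I would like to be able to select from the parameter file.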

-- 
Ian Hinder
http://members.aei.mpg.de/ianhin

_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users
