Thanks Ed - I will experiment with -O3.

Indeed it would be good to make the basic ops as optimised as possible.

Karl


> On 27 Oct 2021, at 3:08 am, Ed . <ej...@hotmail.com> wrote:
> 
> If you’re using a typical consumer computer, you’ll get limitations of memory 
> bandwidth, which it seems will limit simple calculations on large amounts of 
> data. It would probably be worth ensuring one’s installation of PDL is 
> compiled with -O3 just in case; -O2 (the usual default) enables vectorisation 
> on clang, but not on GCC which only does so on -O3.
>  
> I just did a bit more experimenting with very latest PDL on a MacBook with 6 
> cores/12 hyperthreads (which apparently defaults to -O3). For comparison, 
> normal Perl takes about 28ms for 1000 iterations, so C will be about 1ms. 
> Best performance was with PDL_AUTOPTHREAD_SIZE=0 PDL_AUTOPTHREAD_TARG=10 (11 
> was about 1.5x as long), where 1000 iterations took about 0.31ms, or a bit 
> over 3x quicker than C, and comparable with the JavaScript (which I suspect 
> benefits from using GPU or maybe just multicore).
>  
> This 2019 presentation
> (https://indico.cern.ch/event/814979/contributions/3401203/attachments/1831468/3115808/VectorParallelismMultiCoreProc.pdf)
> discusses the various issues in making parallel processing go Really Fast. For
> me, a key takeaway is that the problem is generally quite hard, and it’s wise
> to use e.g. BLAS, where all the possible optimisations have been wrung out.
> PDL could benefit from that by parsing the “Code” etc. sections and inserting
> BLAS calls. Similarly, we should probably start using LAPACK in core, as
> GNU Octave etc. do. An interesting possibility would be to use the “Matriplex”
> library for vectorising operations on many smallish matrices (it even
> generates code using Perl).
>  
> It also mentions Amdahl’s Law, which sets limits on the speedup obtainable
> from parallelism (fundamentally, the non-parallelisable parts, including
> main-memory access, impose those limits).
>  
> From: Karl Glazebrook <karlglazebr...@mac.com>
> Sent: 26 October 2021 08:57
> To: Ed . <ej...@hotmail.com>
> Cc: Luis Mochan <moc...@icf.unam.mx>; pdl-de...@lists.sourceforge.net;
> perldl <pdl-general@lists.sourceforge.net>
> Subject: Re: [Pdl-devel] benchmarks
>  
> This thread is interesting.
>  
> I was wondering if anyone has ever seen speedups of 2x or better with 
> PDL_AUTOPTHREAD_TARG > 2? I find it tends to max out at around 1.5-1.7x 
> whatever I set.
>  
> I know about overhead etc., but I kind of feel that for some of the basic
> stuff (e.g. A=B*C for large arrays with big chunks) I should see 4x for
> PDL_AUTOPTHREAD_TARG=4, and I never do.
>  
> The various numbers in the tests reported by Ed show <2x.
>  
> Nice getting faster than C!
>  
> Karl
>  
> 
> 
> On 4 Oct 2021, at 1:05 am, Ed . <ej...@hotmail.com> wrote:
>  
> Thank you for the independent measurement!
>  
> From: Luis Mochan <moc...@icf.unam.mx>
> Sent: 03 October 2021 15:03
> To: pdl-de...@lists.sourceforge.net; perldl <pdl-general@lists.sourceforge.net>
> Subject: Re: [Pdl-devel] benchmarks
>  
> 
> Now I have run the C benchmark and Ed's. My results are:
> 
>    | Program      | # iterations | time (s) | speed (K iter/s) | time/iter vs C |
>    |--------------+--------------+----------+------------------+----------------|
>    | ansi c       |        150e6 |      133 |        1127.8195 |            1.0 |
>    | perl         |        1.5e6 |       56 |        26.785714 |           42.1 |
>    | my pdl       |         15e6 |       67 |        223.88060 |            5.0 |
>    | Ed's pdl     |         15e6 |       16 |            937.5 |            1.2 |
>    | Ed's 4 cores |         15e6 |       11 |        1363.6364 |            0.8 |
> 
> So, as Ed wrote, just by setting an environment variable,
> Perl+PDL+pp_def can be made faster than C.
> 
> 
> 
> 
> On Sat, Oct 02, 2021 at 07:03:50PM -0500, Luis Mochan wrote:
> > I made my own version of the ray-tracing program (as I tried to
> > understand it). I didn't use pp_def, only Perl and ordinary PDL. I used
> > ...
> 
> --
> 
>                                                                   o
> W. Luis Mochán,                      | tel:(52)(777)329-1734     /<(*)
> Instituto de Ciencias Físicas, UNAM  | fax:(52)(777)317-5388     `>/   /\
> Av. Universidad s/n CP 62210         |                           (*)/\/  \
> Cuernavaca, Morelos, México          | moc...@fis.unam.mx   /\_/\__/
> GPG: 791EB9EB, C949 3F81 6D9B 1191 9A16  C2DF 5F0A C52B 791E B9EB
> 
> 
> _______________________________________________
> pdl-devel mailing list
> pdl-de...@lists.sourceforge.net <mailto:pdl-de...@lists.sourceforge.net>
> https://lists.sourceforge.net/lists/listinfo/pdl-devel 
> <https://lists.sourceforge.net/lists/listinfo/pdl-devel>
>  

_______________________________________________
pdl-general mailing list
pdl-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pdl-general
