On 07/02/2015 04:43, Michael Black wrote:

Hi Mike & All,
> And to add two threads to this testing....still no improvement so sticking
> with 1 thread on the OMP seems to be the thing to do.
>
> And I've been running the jt9_omp now since yesterday with no problems
> noted.  What's the plan for this?  An option to enable?
As far as I am concerned, the current situation is that we need to 
decide whether to go ahead with using OpenMP to get parallelism in 
WSJT-X. If we decide "yes" then the CMake script will build jt9 with 
OpenMP on platforms that have a tool chain that supports OpenMP. This 
is a trivial change and jt9_omp will just disappear. Having an option 
is not really viable as it is not something you can turn on and off 
effectively: the OpenMP code is declarative and stays in place even if 
the available threads are limited to one. There is a small trade-off in 
that the potentially large benefits gained on a multi-CPU system come 
at the cost of a small overhead on a single-CPU machine compared to not 
using parallelism at all. IMHO single-core processors are history and 
the physics of electronics means that will not change until switching 
flows of electrons is abandoned for computing. Note my other post on 
the latest Raspberry Pi, for example, with 4 cores on a $35 computer 
using a processor architecture that most of us have in our mobile 
phones.
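
To illustrate what "declarative" means here, the following is a minimal 
sketch in C. It is not the jt9 source: the two decoder stubs and all 
function names are hypothetical. The same file compiles with or without 
OpenMP enabled (for example with or without gcc's -fopenmp); without it 
the pragmas are ignored and the two sections simply run one after the 
other.

```c
#ifdef _OPENMP
#include <omp.h>   /* only available when the compiler has OpenMP enabled */
#endif

/* Hypothetical stand-ins for the two decoders; each returns a decode count. */
static int decode_jt65_stub(void) { return 3; }
static int decode_jt9_stub(void)  { return 5; }

/* Number of threads OpenMP may use; 1 when built without OpenMP. */
int available_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Run both stub decoders. With OpenMP the two sections may run on
   different threads; without it they run sequentially. The result is
   the same either way. */
int decode_both(int *n65, int *n9)
{
    int a = 0, b = 0;   /* shared between the sections, one writer each */
#pragma omp parallel sections
    {
#pragma omp section
        a = decode_jt65_stub();
#pragma omp section
        b = decode_jt9_stub();
    }
    *n65 = a;
    *n9 = b;
    return a + b;
}
```

Calling decode_both() gives the same counts whether one thread or 
several are available, which is why limiting the threads to one does 
not remove the OpenMP code, it only serializes it.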

We have a problem on Mac, which John G4KLA and I are working on: it 
appears there is no tool chain available that supports OpenMP AND Qt in 
the same application. This may change in the future but all attempts to 
build a usable tool chain have failed so far. The bottom line is that 
Apple do not support OpenMP in their compilers and Qt do not support 
building with anything other than Apple compilers on Mac. This may 
change if Apple include OpenMP support AND a Fortran compiler, or if 
the Qt project start supporting building Qt with mainstream gcc on Mac, 
but I'm not holding my breath for either.

There is a project developing a fork of clang, on which the Apple 
compilers are based, with OpenMP support; it is actively developed and 
is proposed to be merged back into the clang trunk. Unfortunately this 
would only solve part of the problem because clang does not include a 
Fortran compiler. LLVM, the compiler back end that clang uses, does have 
the required support for Fortran, and there are proposals for FLANG 
along with the DragonEgg project porting gcc to use the LLVM back end. 
Neither of these converges with the Apple supplied tool chain as far as 
I know.

So I believe that on Mac we will not have the benefit of OpenMP :( This 
does not break anything, since the code that uses OpenMP is written so 
that it also works without OpenMP.

Looking further ahead, I have plans to merge jt9 and wsjtx into a single 
executable. At that point it is definitely worth considering using the 
Qt thread facilities to achieve the same sort of parallelism we have 
just implemented in jt9, rather than using OpenMP. The OpenMP facilities 
we have used have direct parallels with the Qt threading facility (and 
the current developments for parallelism in the C++ Standard), so that 
would not necessarily be a large change. OTOH if we were to start using 
the more compiler-integrated facilities of OpenMP, like the distribution 
of loops across threads (see the workshare1 example that Mike and I 
posted for a trivial example), then there is no easy switch to the Qt 
parallelism facilities without major rewriting. Moving to Qt threads 
would also benefit Mac users if Apple or Qt don't give us a route to 
OpenMP, since the Qt parallelism works just fine on Mac.
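
For contrast, here is a small C sketch of the compiler-integrated style, 
where OpenMP itself divides a loop across threads. It is not the 
workshare1 example referred to above; the function names are 
hypothetical and chosen only for illustration. The point is that the 
split into per-thread chunks and the combining of partial sums are done 
by the compiler and runtime, so there is no whole task that could simply 
be handed to a thread object instead.

```c
/* c[i] = a[i] + b[i]; with OpenMP the iterations are shared out
   across the available threads by the compiler and runtime. */
void vector_add(const double *a, const double *b, double *c, int n)
{
    int i;
#pragma omp parallel for
    for (i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Sum of x[i]^2; the reduction clause gives each thread a private
   partial sum and adds the partial sums together at the end. */
double sum_of_squares(const double *x, int n)
{
    double sum = 0.0;
    int i;
#pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += x[i] * x[i];
    }
    return sum;
}
```

Built without OpenMP both functions are ordinary serial loops. Moving 
code like this to another threading library would mean writing the 
chunking and the combining of partial results by hand.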
>
> M1,M2
> 1.50,1.03     Jt9 -w 2
> 1.44,1.01     Jt9 -w 3
> 1.08,0.79     Jt9_omp -w 2
> 1.06,0.73     Jt9_omp -w 3
> 1.09,0.75     jt9_omp -w 3 -m 2
Mike, one point you may not have considered in these tests is the 
population of JT65 vs. JT9 messages. The parallel decoding benefit 
should be maximized (in terms of messages decoded in a given time) when 
there is equal work in each mode. It would be valuable to measure the 
performance of jt9 vs. jt9_omp when decoding in single mode (this needs 
a modification to jt9 to allow single mode decoding from the command 
line) and also to test them in dual mode with carefully selected .WAV 
files that have equal work in each mode. These factors also interact 
with threaded FFTW3, since if only one thread is being used for decoding 
then the parallel FFT gain may well be more apparent.
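
The "equal work" point can be put into numbers with a deliberately 
idealized model, sketched here in C. The assumptions are mine and are 
not measured: the two decoders run concurrently on separate cores, 
there is no threading overhead and no shared serial front end, so the 
wall time is just the longer of the two decode times. The function 
names are hypothetical.

```c
/* Idealized two-mode timing model; all times are in seconds. */

/* One decoder after the other. */
double serial_time(double t_jt65, double t_jt9)
{
    return t_jt65 + t_jt9;
}

/* Both decoders at once: limited by the slower one. */
double parallel_time(double t_jt65, double t_jt9)
{
    return t_jt65 > t_jt9 ? t_jt65 : t_jt9;
}

/* Speedup of the parallel arrangement over the serial one. */
double two_mode_speedup(double t_jt65, double t_jt9)
{
    double par = parallel_time(t_jt65, t_jt9);
    if (par <= 0.0) {
        return 1.0;   /* no work at all, nothing to speed up */
    }
    return serial_time(t_jt65, t_jt9) / par;
}
```

With illustrative inputs of 0.5 s in each mode the model gives a speedup 
of 2, the best it can ever give; with 0.9 s and 0.1 s it gives only 
about 1.11. A recording dominated by one mode will therefore understate 
what the parallel decoder can do.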
>
> Mike W9MDB
73
Bill
G4WJS.
>
> -----Original Message-----
> From: Michael Black [mailto:[email protected]]
> Sent: Friday, February 06, 2015 5:04 PM
> To: 'WSJT software development'
> Subject: OMP/wisdom testing
>
> I thought -w 3 might be worth revisiting since the speedups seem to have
> stabilized some...I only tested this with "-m 1" since prior testing showed
> no improvement for more threads.  I am going to test -m 2 -w 3 and will show
> those results later.
>
> On two machines -- one Windows 7 desktop several years old with Dual
> x5450@3Ghz which we'll call M1.  Newer HP Envy laptop with I7-4702MQ @2.2Ghz
> which we'll call M2.
> Using r4940 and testing wisdom 2 & 3 (modified the code and recompiled for
> that).
> Numbers here reported after a few dozen iterations so is a pretty accurate
> average.
>
> Mike W9MDB
>
> M1, M2
> 1.50,1.03     Jt9 -w 2
> 1.44,1.01     Jt9 -w 3
> 1.08,0.79     Jt9_omp -w 2
> 1.06,0.73     Jt9_omp -w 3
>
> So -w 3 gives a 3.8% improvement on the old machine for jt9 and a 1.9%
> improvement on the new laptop.
> And -w 3 gives a 1.9% improvements on the old machine for jt9_omp and a 7.6%
> improvement on the new laptop.
>
> Mike W9MDB
>
> #include <stdio.h>
> #include <stdlib.h>   /* atoi, atof, system */
>
> int main(int argc,char *argv[])
> {
>       char cmdbuf[4096];
>       double total=0;
>       int n=0;
>       int nthreads=1;
>       char buf[4096];
>       const char *cmd="jt9";
>
>       /* optional argument: thread count; any value > 0 selects jt9_omp */
>       if (argc > 1) {
>               nthreads = atoi(argv[1]);
>       }
>       if (nthreads > 0) {
>               cmd = "jt9_omp";
>       }
>       printf("Testing %s with %d thread%c\n",cmd,nthreads,nthreads==1?' ':'s');
>       /* time one decode; keep only the elapsed seconds in doit.txt */
>       snprintf(cmdbuf,sizeof(cmdbuf),
>               "TimeMem-1.0.exe %s -p 1 -d 3 -w 3 -m %d 130610_2343.wav"
>               " | grep Elapsed | cut -f2 -d: >doit.txt",cmd,nthreads);
>       while(1) {
>               system(cmdbuf);
>               FILE *fp=fopen("doit.txt","r");
>               if (fp == NULL) {
>                       perror("doit.txt");
>                       return 1;
>               }
>               if (fgets(buf,sizeof(buf),fp) == NULL) {
>                       buf[0] = '\0';  /* empty file: atof gives 0 */
>               }
>               fclose(fp);
>               double sec = atof(buf);
>               ++n;
>               total+=sec;
>               double avg = total/n;   /* running average over all runs */
>               if (sec > avg*1.5) {
>                       printf("\nlong run %.2f avg=%.2f\n",sec,avg);
>               }
>               printf("%d sec=%.2f avg=%.2f\r",n,sec,avg);
>               fflush(stdout);
>       }
> }
>
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming. The Go Parallel Website,
> sponsored by Intel and developed in partnership with Slashdot Media, is your
> hub for all things parallel software development, from weekly thought
> leadership blogs to news, videos, case studies, tutorials and more. Take a
> look and join the conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> wsjt-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/wsjt-devel

