On 04/02/2015 20:30, Joe Taylor wrote:
> Hi Bill,
>
>> Before completely discarding MT FFTW3 I would like to try something. Can
>> you give a brief rough summary of all the FFT sizes used by jt9?
> The file wisdom1.bat in .../wsjtx/lib shows the following FFT plans
> being used:
>
> rif672000 cif77175 cib77175 rif16384 rif884736 cib2048 rif8192 rif512
> rib512 cib512
>
> Several days ago the big FFT in downsam9 was changed from length 884736
> to 604800.  I changed the two big ones to "out of place" transforms.  So
> I think the new lineup is
>
> rof672000 cif77175 cib77175 rif16384 rof604800 cib2048 rif8192 rif512
> rib512 cib512
Thanks for the quick response on that.
>
>> If there are many smaller FFT being run then I think their plans should
>> be limited to 1 thread and only unleash 2 or more threads for the big FFTS.
> I think the only ones for which MT will help are rof672000 and
> rof604800.  For these, three threads (on a 4-core machine) helps
> significantly:
OK, I have amended filbig and downsam9 to use the '-m #' argument for
those two big FFTs, all the rest use 1 thread.
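
As a rough illustration of that policy (not the actual filbig/downsam9
code, which is Fortran), here is a minimal sketch of the decision. The
function name and the length threshold are my own assumptions, picked
only so that 672000 and 604800 count as "big" and everything else in
the plan list does not. In a real program the result would be passed
to FFTW's fftwf_plan_with_nthreads() before the plan is created.

```cpp
#include <cassert>

// Hypothetical threshold: FFTs at least this long are treated as "big".
// The value is an assumption; it only needs to separate 604800 and
// 672000 from 77175 and the smaller plans.
constexpr int kBigFftLength = 500000;

// Number of FFTW threads to request when planning an FFT of length n.
// `requested` stands for the value given with the '-m' argument.
int threadsForPlan(int n, int requested)
{
    assert(n > 0 && requested >= 1);
    // Small FFTs always get 1 thread; only the big ones go multi-threaded.
    return n >= kBigFftLength ? requested : 1;
}
```

With '-m 3' this gives 3 threads for the 672000 and 604800 transforms
and 1 thread for the rest of the plans listed above.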
>
> (JTSDK-QT) C:\JTSDK\src\wsjtx\lib>timefft 1 4 or672000
>
> Problem  Threads Plan    Time    Gflops     RMS   iters
> --------------------------------------------------------
> or672000    1   0.005  0.004878  13.34  0.0000002  100
> or672000    2   1.427  0.004469  14.55  0.0000002  100
> or672000    3   1.828  0.003406  19.10  0.0000002  100
> or672000    4   2.037  0.003459  18.81  0.0000002  100
>
> (JTSDK-QT) C:\JTSDK\src\wsjtx\lib>timefft 1 4 or604800
>
> Problem  Threads Plan    Time    Gflops     RMS   iters
> --------------------------------------------------------
> or604800    1   0.858  0.005361  10.83  0.0000002   94
> or604800    2   1.901  0.003405  17.06  0.0000002  100
> or604800    3   2.509  0.002658  21.85  0.0000002  100
> or604800    4   2.544  0.002618  22.19  0.0000002  100
>
> However, these long FFTs make up only about 10% of the total running
> time.  Speeding them up by a factor of 2 will shave about 5% off the
> running time, at best.  And probably not that much, when we're already
> running the two decoders in parallel.
>
> It's not that MT FFTs won't help at all; they just won't help much.
Agreed, but we should remember that at least one of the decoding threads
is stalled while the FFT executes. So, apart from the context switching
overhead, which should be relatively small, there are several CPU threads
waiting for work on all but the lowest end dual core non-hyperthreaded
processors.
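
For reference, Joe's "about 5%" figure above is just Amdahl's law
applied to his numbers: a part that takes 10% of the run time, made
twice as fast, saves 0.10 * (1 - 1/2) of the total. A minimal sketch
of that arithmetic (the function name is only for illustration):

```cpp
#include <cassert>

// Fraction of the total run time saved when a part that accounts for
// `fraction` of the run time is made `speedup` times faster.
double fractionSaved(double fraction, double speedup)
{
    assert(fraction >= 0.0 && fraction <= 1.0 && speedup > 0.0);
    return fraction * (1.0 - 1.0 / speedup);
}
```

fractionSaved(0.10, 2.0) comes out at 0.05, i.e. the 5% best case. It
ignores any overlap with the two decoders already running in parallel,
which is the caveat Joe raises.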

I am using jt9_omp launch parameters as follows at the moment:

       , "-m", QString::number (qMin (qMax (QThread::idealThreadCount () - 1, 1), 3)) // FFTW threads

which will use 3 thread big FFTs if the processor has at least 4 CPU
threads, 2 thread big FFTs if it has 3 CPU threads, and only 1 thread
big FFTs on processors with lesser capability.
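
To make the clamping explicit, here is the same expression as a small
stand-alone function, with std::min/std::max standing in for
qMin/qMax and the thread count passed in as a parameter instead of
coming from QThread::idealThreadCount (). The function name is just
for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Leave one CPU thread free, but always request at least 1 FFTW thread
// and never more than 3.
int fftwThreadsFor(int idealThreadCount)
{
    return std::min(std::max(idealThreadCount - 1, 1), 3);
}
```

So 1 or 2 CPU threads map to 1 FFTW thread, 3 map to 2, and 4 or more
map to 3.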
>
>       -- Joe
73
Bill
G4WJS.

_______________________________________________
wsjt-devel mailing list
wsjt-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/wsjt-devel
