On 04/02/2015 20:30, Joe Taylor wrote:
> Hi Bill,
>
>> Before completely discarding MT FFTW3 I would like to try something. Can
>> you give a brief rough summary of all the FFT sizes used by jt9?
> The file wisdom1.bat in .../wsjtx/lib shows the following FFT plans
> being used:
>
> rif672000 cif77175 cib77175 rif16384 rif884736 cib2048 rif8192 rif512
> rib512 cib512
>
> Several days ago the big FFT in downsam9 was changed from length 884736
> to 604800. I changed the two big ones to "out of place" transforms. So
> I think the new lineup is
>
> rof672000 cif77175 cib77175 rif16384 rof604800 cib2048 rif8192 rif512
> rib512 cib512

Thanks for the quick response on that.

>
>> If there are many smaller FFT being run then I think their plans should
>> be limited to 1 thread and only unleash 2 or more threads for the big FFTS.
> I think the only ones for which MT will help are rof672000 and
> rof604800. For these, three threads (on a 4-core machine) helps
> significantly:

OK, I have amended filbig and downsam9 to use the '-m #' argument for
those two big FFTs; all the rest use 1 thread.

>
> (JTSDK-QT) C:\JTSDK\src\wsjtx\lib)timefft 1 4 or672000
>
> Problem   Threads  Plan   Time      Gflops  RMS        iters
> --------------------------------------------------------
> or672000     1     0.005  0.004878  13.34   0.0000002  100
> or672000     2     1.427  0.004469  14.55   0.0000002  100
> or672000     3     1.828  0.003406  19.10   0.0000002  100
> or672000     4     2.037  0.003459  18.81   0.0000002  100
>
> (JTSDK-QT) C:\JTSDK\src\wsjtx\lib)timefft 1 4 or604800
>
> Problem   Threads  Plan   Time      Gflops  RMS        iters
> --------------------------------------------------------
> or604800     1     0.858  0.005361  10.83   0.0000002   94
> or604800     2     1.901  0.003405  17.06   0.0000002  100
> or604800     3     2.509  0.002658  21.85   0.0000002  100
> or604800     4     2.544  0.002618  22.19   0.0000002  100
>
> However, these long FFTs make up only about 10% of the total running
> time. Speeding them up by a factor of 2 will shave about 5% off the
> running time, at best.
> And probably not that much, when we're already
> running the two decoders in parallel.
>
> It's not that MT FFTs won't help at all; they just won't help much.

Agreed, but we should remember that at least one of the decoding
threads is stalled while the FFT executes. So, apart from the
context-switching overhead, which should be relatively small, there are
several CPU threads waiting for work on all but the lowest-end
dual-core, non-hyperthreaded processors.
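
As a rough check on the "about 5%" figure quoted above, here is a
minimal sketch of the arithmetic. The function names are made up for
illustration, and it assumes the big FFTs really are 10% of the run
time and that nothing else changes:

```cpp
// Hypothetical helpers for illustration only; not part of WSJT-X.

// Speedup of a multi-threaded run relative to the 1-thread run,
// given the two "Time" values from a timefft table.
constexpr double speedup(double time_1_thread, double time_n_threads)
{
    return time_1_thread / time_n_threads;
}

// Fraction of the original running time that remains when a part
// taking `fft_fraction` of the total is made `s` times faster and
// everything else is left unchanged.
constexpr double remaining_fraction(double fft_fraction, double s)
{
    return (1.0 - fft_fraction) + fft_fraction / s;
}

// Joe's estimate: FFTs are ~10% of the run, sped up by a factor of 2.
// remaining_fraction(0.10, 2.0) is 0.95, i.e. about 5% saved.
//
// From the tables: or604800 goes from 0.005361 (1 thread) to
// 0.002658 (3 threads), a speedup of roughly 2.0; or672000 goes
// from 0.004878 to 0.003406, a speedup of roughly 1.4.
```

So with the measured 3-thread speedups the saving would sit at or a
little below 5%, which is consistent with "at best".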

I am using jt9_omp launch parameters as follows at the moment:

    , "-m", QString::number (qMin (qMax (QThread::idealThreadCount () - 1, 1), 3)) //FFTW threads

which will use 3-thread big FFTs if the processor has at least 4 CPU
threads but only use 1-thread big FFTs on processors with lesser
capability.

>
> -- Joe

73
Bill
G4WJS.

_______________________________________________
wsjt-devel mailing list
wsjt-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/wsjt-devel
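
For reference, the clamping in the "-m" argument above can be sketched
without Qt. This is only an illustration: the helper names are
hypothetical, and std::thread::hardware_concurrency() is used as a
standard-library stand-in for QThread::idealThreadCount().

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper, not from the WSJT-X sources. Mirrors
// qMin(qMax(ideal - 1, 1), 3): leave one CPU thread free,
// never go below 1 thread, and never use more than 3.
// (constexpr std::min/std::max need C++14 or later.)
constexpr int fftw_threads_for(int ideal_thread_count)
{
    return std::min(std::max(ideal_thread_count - 1, 1), 3);
}

// Same calculation for the machine the program is running on.
// hardware_concurrency() may return 0 when the count is unknown,
// in which case the result falls back to 1 thread.
inline int fftw_threads_for_this_machine()
{
    return fftw_threads_for(
        static_cast<int>(std::thread::hardware_concurrency()));
}
```

Working through the expression: 1 or 2 CPU threads give 1 FFTW thread,
4 or more give 3, and a processor reporting exactly 3 CPU threads
would get 2.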