[wsjt-devel] WSJT-X Decoder Performance

Joe Taylor Mon, 02 Feb 2015 11:46:54 -0800

Hi all,

I have made further improvements to the speed of the decoders in WSJT-X, 
independently of any recourse to concurrent processing in machines with 
multiple CPUs.  The changes involve


1. Making better choices for NFFT1 and NFFT2 (the lengths of forward and 
inverse FFTs in the JT9 downsampler).

2. Adjusting values of "limit" (the Fano timeout parameter) and "ccflim" 
(JT9 synchronizing threshold) under specified conditions.

3. Using "-O3" for the gfortran optimizer level.
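
As an illustration of item 1: FFT libraries generally run fastest when the 
transform length factors into small primes.  The message does not state 
which lengths were chosen or by what criterion, so the following is only a 
sketch under that assumption.  The helper names (is_smooth, next_smooth) 
are hypothetical and are not taken from the WSJT-X sources.

```python
def is_smooth(n, primes=(2, 3, 5)):
    """Return True if n has no prime factors other than those in `primes`."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def next_smooth(n, primes=(2, 3, 5)):
    """Return the smallest integer >= n whose prime factors all lie in `primes`.

    Assumption: a length of this kind is a reasonable FFT-size candidate.
    The actual NFFT1/NFFT2 selection in WSJT-X may use other criteria.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    while not is_smooth(n, primes):
        n += 1
    return n
```

A candidate length found this way would still need to be timed in the real 
downsampler, since the best choice also depends on the FFT library and on 
the required downsampling ratio.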

The following table presents measurements of decoding speed for a number 
of tests using WSJT-X versions 1.3, 1.4.0-rc2, 1.5r4925, and 1.5r4926. 
"Time" gives the time in seconds to decode the sample file 
130610_2343.wav, which has 8 decodable JT9 signals and 9 decodable JT65 
signals.  "Decode" is the setting on the WSJT-X *Decode* menu.  The 
column labeled "#" gives the number of decoded signals.  (Note that 
selecting "Deepest" is required in order to decode one of the JT9 
signals.)

These measurements were made on a Windows 7 machine with a 4-core i5-2500 CPU.

Program Version        Time (s)  Decode   #
---------------------------------------------
v1.3, r3673            2.48      Deepest  17
v1.4.0-rc2, r4400      2.28      Deepest  17
v1.5, r4925            1.01      Deepest  17
v1.5, r4926            0.83      Deepest  17
v1.5, r4926 -w 2 -m 2  0.80      Deepest  17
v1.5, r4926            0.75      Normal   16
v1.5, r4926            0.69      Fast     16

The bottom line: At this stage, much has been gained by some careful 
algorithmic tuning.  The decoder in r4926 is 3 times faster than the one 
in r3673, and 2.7 times faster than the one in r4400.  In r4926 a small 
further improvement (about 4%) is obtained by using patience level "-w 
2" and two threads ("-m 2") for the FFTs.
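
For anyone wishing to check the arithmetic, the ratios quoted above follow 
directly from the "Deepest" rows of the table.  The short sketch below 
reproduces them; the dictionary keys and function names are just labels 
chosen for this example.

```python
# Decode times in seconds, copied from the "Deepest" rows of the table.
times = {
    "r3673": 2.48,
    "r4400": 2.28,
    "r4925": 1.01,
    "r4926": 0.83,
    "r4926 -w 2 -m 2": 0.80,
}


def speedup(old, new):
    """Ratio of decode times: how many times faster `new` is than `old`."""
    return times[old] / times[new]


def percent_saved(old, new):
    """Percentage reduction in decode time going from `old` to `new`."""
    return 100.0 * (1.0 - times[new] / times[old])


print(f"r3673 -> r4926: {speedup('r3673', 'r4926'):.2f}x")
print(f"r4400 -> r4926: {speedup('r4400', 'r4926'):.2f}x")
print(f"FFT threads:    {percent_saved('r4926', 'r4926 -w 2 -m 2'):.1f}% less time")
```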

Similar speed improvements were measured on a linux machine (Core 2 Duo, 
E6750 CPU).

A further speed improvement around 10% should be obtainable by computing 
the JT65 symbol spectra (subroutine symspec65) on the fly, during the Rx 
minute, rather than as part of the end-of-minute *Decode* procedure. 
(This is already done for the JT9 symbol spectra.)  My current view is 
that beyond that step, further speed improvement on single-core machines 
(or single-core processing on multi-core machines, as in all of the 
tabulated tests except one) will be difficult.

Further improvements can probably be made by using more than one core 
concurrently, e.g., by using OpenMP.  As I mentioned before, the biggest 
(or at least easiest) gain may come from running the JT9 and JT65 
decoders concurrently.  It's hard to know whether the gains will be 
worthwhile, without trying.  The programming effort may not be trivial.
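
To make the idea of running the two decoders concurrently concrete, here 
is a minimal sketch of the fork/join structure.  It is written in Python 
only for brevity; the real decoders are Fortran, and OpenMP would be the 
mechanism there.  The functions decode_jt9 and decode_jt65 are 
hypothetical stand-ins that do no real decoding, so this shows only the 
shape of the approach and says nothing about the gain to expect.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_jt9(samples):
    """Hypothetical stand-in for the JT9 decoder (no real decoding done)."""
    return [("JT9", len(samples))]


def decode_jt65(samples):
    """Hypothetical stand-in for the JT65 decoder (no real decoding done)."""
    return [("JT65", len(samples))]


def decode_both(samples):
    """Start both decoders on the same audio samples, then join the results.

    Both tasks only read `samples`, so they share no mutable state.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        jt9_future = pool.submit(decode_jt9, samples)
        jt65_future = pool.submit(decode_jt65, samples)
        # result() blocks until the corresponding task has finished.
        return jt9_future.result() + jt65_future.result()
```

With this structure the elapsed time is bounded below by the slower of the 
two decoders, which is one reason the achievable gain is hard to predict 
without trying it.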

        -- Joe, K1JT

