Hi all,
I have made further improvements to the speed of the WSJT-X decoders,
without resorting to concurrent processing on machines with multiple
CPUs. The changes involve:
1. Making better choices for NFFT1 and NFFT2 (the lengths of the forward
and inverse FFTs in the JT9 downsampler).
2. Adjusting values of "limit" (the Fano timeout parameter) and "ccflim"
(JT9 synchronizing threshold) under specified conditions.
3. Using "-O3" for the gfortran optimizer level.
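The post does not say how the new NFFT1 and NFFT2 values were chosen. One common criterion, offered here only as an assumption, is that FFT libraries such as FFTW run fastest when the transform length factors entirely into small primes. In an FFT-based downsampler the two lengths are also tied together: keeping NFFT2 bins from a forward FFT of length NFFT1 and inverting them reduces the sample rate by the factor NFFT1/NFFT2. The hypothetical helper below finds the smallest length at or above a target whose only prime factors are 2, 3, 5, and 7:

```python
def is_smooth(n, primes=(2, 3, 5, 7)):
    """Return True if every prime factor of n is in `primes`."""
    if n < 1:
        return False
    for p in primes:
        # Divide out each allowed prime as many times as it occurs.
        while n % p == 0:
            n //= p
    # Anything left over is a prime factor outside `primes`.
    return n == 1


def next_fast_len(n, primes=(2, 3, 5, 7)):
    """Hypothetical helper: smallest m >= n that factors into `primes` only."""
    m = max(n, 1)
    while not is_smooth(m, primes):
        m += 1
    return m
```

For example, `next_fast_len(1009)` returns 1024 and `next_fast_len(97)` returns 98. A real choice would also have to preserve the required NFFT1/NFFT2 decimation ratio, which this sketch ignores.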
The following table presents decoding-speed measurements for a number
of tests using WSJT-X versions 1.3, 1.4.0-rc2, 1.5 r4925, and 1.5 r4926.
"Time" gives the time in seconds to decode the sample file
130610_2343.wav, which contains 8 decodable JT9 signals and 9 decodable
JT65 signals. "Decode" is the setting on the WSJT-X *Decode* menu, and
the column labeled "#" gives the number of decoded signals. (Note that
"Deepest" must be selected in order to decode one of the JT9 signals.)
These measurements were made on a Windows 7 machine with a 4-core
i5-2500 CPU.
Version     Revision  Options      Time (s)  Decode   #
--------------------------------------------------------
v1.3        r3673                  2.48      Deepest  17
v1.4.0-rc2  r4400                  2.28      Deepest  17
v1.5        r4925                  1.01      Deepest  17
v1.5        r4926                  0.83      Deepest  17
v1.5        r4926     -w 2 -m 2    0.80      Deepest  17
v1.5        r4926                  0.75      Normal   16
v1.5        r4926                  0.69      Fast     16
The bottom line: at this stage, much has been gained by careful
algorithmic tuning. The decoder in r4926 is 3 times faster than the one
in r3673, and 2.7 times faster than the one in r4400. In r4926 a small
further improvement (about 4%) is obtained by using patience level
"-w 2" and two threads ("-m 2") for the FFTs.
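The speedup figures quoted above can be checked directly against the "Deepest" rows of the table. This short sketch simply recomputes them from those measured times:

```python
# Decode times in seconds for 130610_2343.wav with Decode = "Deepest",
# copied from the table of measurements.
times = {
    "r3673": 2.48,
    "r4400": 2.28,
    "r4925": 1.01,
    "r4926": 0.83,
    "r4926 -w 2 -m 2": 0.80,
}


def speedup(old, new):
    """How many times faster `new` is than `old`."""
    return times[old] / times[new]


def percent_gain(old, new):
    """Reduction in decode time, as a percentage of the `old` time."""
    return 100.0 * (times[old] - times[new]) / times[old]


for old in ("r3673", "r4400"):
    print(f"r4926 vs {old}: {speedup(old, 'r4926'):.2f}x")
print(f"-w 2 -m 2 gain: {percent_gain('r4926', 'r4926 -w 2 -m 2'):.1f}%")
```

It prints 2.99x, 2.75x, and 3.6%, consistent with "3 times", "2.7 times", and "about 4%" in the text.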
Similar speed improvements were measured on a Linux machine (Core 2 Duo,
E6750 CPU).
A further speed improvement of around 10% should be obtainable by
computing the JT65 symbol spectra (subroutine symspec65) on the fly
during the Rx minute, rather than as part of the end-of-minute *Decode*
procedure.
(This is already done for the JT9 symbol spectra.) My current view is
that beyond that step, further speed improvement on single-core machines
(or single-core processing on multi-core machines, as in all of the
tabulated tests except one) will be difficult.
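The idea of computing symbol spectra on the fly can be illustrated with a small accumulator that turns each complete symbol's worth of samples into a power spectrum as audio arrives, leaving only the decoding itself for the end of the minute. This is a sketch under simplifying assumptions: the class name `SymbolSpectra`, the `nsps` (samples per symbol) parameter, the non-overlapping symbol blocks, and the naive DFT are all illustrative, and the real Fortran subroutine symspec65 is not modeled here. The total work is unchanged; only its timing moves.

```python
import cmath
import math


def power_spectrum(block):
    """Naive DFT power spectrum of one symbol-length block (illustration only)."""
    n = len(block)
    return [abs(sum(block[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


class SymbolSpectra:
    """Hypothetical accumulator: one spectrum per symbol, computed as samples arrive."""

    def __init__(self, nsps):
        self.nsps = nsps     # samples per symbol (assumed parameter)
        self.pending = []    # samples not yet forming a complete symbol
        self.spectra = []    # one power spectrum per completed symbol

    def feed(self, samples):
        """Call with each new chunk of audio during the Rx period."""
        self.pending.extend(samples)
        while len(self.pending) >= self.nsps:
            block = self.pending[:self.nsps]
            del self.pending[:self.nsps]
            self.spectra.append(power_spectrum(block))


def batch_spectra(samples, nsps):
    """End-of-period alternative: compute every complete symbol's spectrum at once."""
    return [power_spectrum(samples[i:i + nsps])
            for i in range(0, len(samples) - nsps + 1, nsps)]
```

Feeding the samples in arbitrary chunks yields exactly the same spectra as the batch computation; the saving in end-of-minute latency comes from having done that work while the minute was still in progress.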
Further improvements can probably be made by using more than one core
concurrently, e.g., by using OpenMP. As I mentioned before, the biggest
(or at least easiest) gain may come from running the JT9 and JT65
decoders concurrently. It's hard to know whether the gains will be
worthwhile without trying, and the programming effort may not be trivial.
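The overall shape of running the two decoders concurrently can be sketched as follows. In the Fortran code this would more likely take the form of OpenMP (for example, parallel sections); Python's `ThreadPoolExecutor` is swapped in here purely to show the structure, and `decode_jt9` and `decode_jt65` are hypothetical placeholders, not WSJT-X routines.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def decode_jt9(samples):
    """Hypothetical placeholder for the JT9 decoder."""
    time.sleep(0.05)  # stand-in for real decoding work
    return ["JT9 decode"]


def decode_jt65(samples):
    """Hypothetical placeholder for the JT65 decoder."""
    time.sleep(0.05)  # stand-in for real decoding work
    return ["JT65 decode"]


def decode_both(samples):
    """Start both decoders at once, wait for both, and merge their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        f9 = pool.submit(decode_jt9, samples)
        f65 = pool.submit(decode_jt65, samples)
        return f9.result() + f65.result()
```

Python threads give real parallelism only when the work releases the interpreter lock (`time.sleep` does; pure-Python number crunching does not). In any language, the two decoders must not write to shared state without synchronization, which is one plausible reason the programming effort may not be trivial.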
-- Joe, K1JT
_______________________________________________
wsjt-devel mailing list
https://lists.sourceforge.net/lists/listinfo/wsjt-devel