Hi all,

I have made further improvements to the speed of the decoders in WSJT-X, independently of any recourse to concurrent processing on machines with multiple CPUs. The changes involve:
1. Making better choices for NFFT1 and NFFT2, the lengths of the forward and inverse FFTs in the JT9 downsampler.
2. Adjusting the values of "limit" (the Fano timeout parameter) and "ccflim" (the JT9 synchronizing threshold) under specified conditions.
3. Using "-O3" as the gfortran optimization level.

The following table presents measurements of decoding speed for a number of tests using WSJT-X versions 1.3, 1.4.0-rc2, 1.5 r4925, and 1.5 r4926. "Time" gives the time in seconds to decode the sample file 130610_2343.wav, which contains 8 decodable JT9 signals and 9 decodable JT65 signals. "Decode" is the setting on the WSJT-X *Decode* menu. The column labeled "#" gives the number of decoded signals. (Note that selecting "Deepest" is required to decode one of the JT9 signals.) These measurements were made on a Windows 7 machine with a 4-core i5-2500 CPU.

Program Version            Time (s)   Decode    #
--------------------------------------------------
v1.3, r3673                  2.48     Deepest   17
v1.4.0-rc2, r4400            2.28     Deepest   17
v1.5, r4925                  1.01     Deepest   17
v1.5, r4926                  0.83     Deepest   17
v1.5, r4926 -w 2 -m 2        0.80     Deepest   17
v1.5, r4926                  0.75     Normal    16
v1.5, r4926                  0.69     Fast      16

The bottom line: at this stage, much has been gained by some careful algorithmic tuning. The decoder in r4926 is 3 times faster than the one in r3673, and 2.7 times faster than the one in r4400. In r4926 a small further improvement (about 4%) is obtained by using patience level "-w 2" and two threads ("-m 2") for the FFTs. Similar speed improvements were measured on a Linux machine (Core 2 Duo E6750 CPU).

A further speed improvement of around 10% should be obtainable by computing the JT65 symbol spectra (subroutine symspec65) on the fly during the Rx minute, rather than as part of the end-of-minute *Decode* procedure. (This is already done for the JT9 symbol spectra.)
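As a quick check on the bottom-line figures, the speed ratios can be recomputed directly from the Time column of the table. The short Python sketch here does only that arithmetic; the dictionary keys are labels chosen for the table rows and are not names used anywhere in WSJT-X.

```python
# Decode times in seconds for 130610_2343.wav, copied from the table.
# The keys are illustrative labels for the table rows, not WSJT-X identifiers.
TIMES_S = {
    "r3673": 2.48,
    "r4400": 2.28,
    "r4925": 1.01,
    "r4926": 0.83,
    "r4926 -w 2 -m 2": 0.80,
}


def speedup(old_s: float, new_s: float) -> float:
    """How many times faster the new decode is: old time / new time."""
    return old_s / new_s


def percent_saving(old_s: float, new_s: float) -> float:
    """Reduction in decode time, as a percentage of the old time."""
    return 100.0 * (old_s - new_s) / old_s


if __name__ == "__main__":
    print(f"r4926 vs r3673: {speedup(TIMES_S['r3673'], TIMES_S['r4926']):.2f}x")
    print(f"r4926 vs r4400: {speedup(TIMES_S['r4400'], TIMES_S['r4926']):.2f}x")
    saving = percent_saving(TIMES_S["r4926"], TIMES_S["r4926 -w 2 -m 2"])
    print(f"-w 2 -m 2 saving: {saving:.1f}%")
```

Run as a script, it prints speedups of about 2.99 and 2.75 and a saving of about 3.6%, consistent with the "3 times", "2.7 times", and "about 4%" figures quoted in the text.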
My current view is that beyond that step, further speed improvement on single-core machines (or with single-core processing on multi-core machines, as in all of the tabulated tests except one) will be difficult. Further improvements can probably be made by using more than one core concurrently, e.g., with OpenMP. As I mentioned before, the biggest (or at least easiest) gain may come from running the JT9 and JT65 decoders concurrently. It's hard to know whether the gains will be worthwhile without trying, and the programming effort may not be trivial.

-- Joe, K1JT

_______________________________________________
wsjt-devel mailing list
wsjt-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/wsjt-devel