[wsjt-devel] Crazy test
Hi All,

Last night I read some documentation on a library used by wsjtx for FFT. I noticed that on some processors the library is optimized using different instruction sets.

I was crazy enough to do the following:

- From the example file 130610_2343.wav, create 10 copies with incremental numbers in a directory (00.wav, 01.wav, etc.).
- Open the directory with wsjtx and adjust the level and modes to obtain what is described in wsjtx-main.html, section 6.4, "Sample file 2".
- Create a copy of wsjtx and modify the Fortran and C compile parameters, adding '-mtune=native'.

DANGER: this parameter means the generated code is not portable between processors, but it can be an idea for users who compile code for use on their own PC.

Alternating between the two programs, using Open and then Shift+F6 ('Decode remaining in directory'), I watched the decoding times, that is, how long the 'Decode' button stays blue.

My idea was to see whether there would be a visible difference. Instead, the decode times are half or less, and the graph seems to fly.

The test was on an i5 processor under Windows 7. This week I am too busy, so I intend to repeat the same test on other processors and on Linux next week.

I am not able to enter into the mathematical discussion, but it is obvious that there are many thousands of basic operations (*, **, /, etc.) on floats repeated every cycle. GCC can handle these operations in many ways, from 'emulation' to 'hardware'.

NOTE: GCC refers to the "GNU Compiler Collection" suite, and a lot of parameters are common to several of its compilers.

The gnu.org site has links for the compilers used in the superb JTSDK. In the index at https://gcc.gnu.org/onlinedocs/gcc/, sections 3.17.17 and 3.17.18 refer to PC processors. There is also a section 3.17.4, "ARM Options", that may be of interest to the ARMv7 users mentioned in some recent discussions.

What do you think of this crazy message?

73
Sandro
IW3RAB

___
wsjt-devel mailing list
wsjt-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/wsjt-devel
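For anyone who wants to reproduce the set of test files described in the message above, here is a minimal sketch in Python. The function name make_test_copies and the destination directory are assumptions made for illustration; only the source file name and the 00.wav, 01.wav, ... naming come from the message.

```python
import shutil
from pathlib import Path


def make_test_copies(source, dest_dir, count=10):
    """Copy `source` into `dest_dir` as 00.wav, 01.wav, ... and return the new paths."""
    source = Path(source)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copies = []
    for index in range(count):
        # Two-digit, zero-padded names keep the files in a predictable order.
        target = dest_dir / f"{index:02d}{source.suffix}"
        shutil.copyfile(source, target)
        copies.append(target)
    return copies
```

Calling make_test_copies("130610_2343.wav", "test_wavs") would fill a (hypothetical) test_wavs directory with ten identical files that can then be opened in WSJT-X.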
Re: [wsjt-devel] Crazy test
Hi Sandro,

Another resource for compiler / app tuning is Gentoo. They are constantly looking for the best "safe" settings and playing with advanced compiler flags:

[1] http://wiki.gentoo.org/wiki/GCC_optimization
[2] https://wiki.gentoo.org/wiki/Handbook:Main_Page

I already had the GNU 3.17 section bookmarked; there is a lot of great info there for sure.

One command I find interesting (from the Gentoo pages):

$ gcc -c -Q -march=native --help=target | grep "enabled"

I've not used many flags with the WSJT apps, but may do so just to see how things work, or not :-)

73's
Greg, KI7MT

On 12/15/2014 23:12, Alessandro Gorobey wrote:
> Hi All,
>
> last night I read some documentation on a library used by wsjtx for fft.
> I notice that on some processors the library is optimized using
> different instruction set.
> [...]
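As a side note on the flags discussed here: as far as the GCC manual describes them, -mtune=native only tunes the generated code for the local CPU and keeps the available instruction set unchanged, while -march=native additionally lets the compiler use the local CPU's instruction-set extensions, which is what can make a binary non-portable. The sketch below, in Python, wraps the gcc command quoted above so that the lists of enabled target options can be compared for the two flags. The function names are hypothetical, and the parser assumes the usual "-mflag [enabled]" layout of the gcc -Q --help=target output.

```python
import subprocess


def enabled_flags(help_target_text):
    """Return the option names that a `gcc -Q --help=target` listing marks as [enabled]."""
    flags = []
    for line in help_target_text.splitlines():
        parts = line.split()
        # Lines of interest look like: "  -msse2        [enabled]"
        if len(parts) >= 2 and parts[0].startswith("-") and parts[-1] == "[enabled]":
            flags.append(parts[0])
    return flags


def query_gcc(arch_flag="-march=native", gcc="gcc"):
    """Run the gcc command from the message above and return its standard output."""
    result = subprocess.run(
        [gcc, "-c", "-Q", arch_flag, "--help=target"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With gcc installed, set(enabled_flags(query_gcc("-march=native"))) - set(enabled_flags(query_gcc("-mtune=native"))) would list the extensions that only -march=native switches on for the local machine.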
Re: [wsjt-devel] Crazy test
Hi Alessandro and all,

Compiler optimizations can be helpful when tuning code for good performance, but I am surprised that you see anything like a 2x improvement in decoding speed.

The following table shows the results of a series of tests I made today for the decoder (the executable program jt9) running on files like the ones you created (01.wav, 02.wav, ... 10.wav, all copies of the example file 130610_2343.wav). The first column lists the Fortran compiler flags used; the numerical column gives the total execution time (wall clock) for processing the ten files.

FFLAGS                                    Time (s)

-O0 -fbounds-check                        42.4
-O1 -fbounds-check                        22.8
-Os -fbounds-check                        22.8
-O2 -fbounds-check -funroll-all-loops     20.4 *
-O2 -fbounds-check                        20.2
-O3 -fbounds-check                        19.8
-Ofast -fbounds-check                     18.9
-O2                                       18.4
-O2 -mtune=native                         18.4
-O2 -funroll-all-loops                    18.2
-O3                                       18.0
-Ofast                                    17.8

* Used in the release builds of WSJT-X

As you can see, "-mtune=native" made essentially no difference. The biggest improvement in execution performance (over the default Release build) is gained by turning off bounds-checking. A slight additional improvement is obtained by using -O3 or -Ofast rather than -O2. However, the total available improvement is less than 15%.

Obviously, such tests will give different results on different machines. Those described above were done on a machine with a Core2 Duo E6750 CPU, 2.66 GHz. Here is a similar set of results for a Windows machine (Core i5-2500, 3.3 GHz):

FFLAGS                                    Time (s)

-O0 -fbounds-check                        28.5
-O1 -fbounds-check                        18.2
-O2 -fbounds-check -funroll-all-loops     16.6 *
-O2 -fbounds-check                        16.2
-O3 -fbounds-check                        16.2
-Ofast                                    15.7
-O3 -m32 -msse -funroll-all-loops         15.4
-O3 -mtune=core2                          15.1
-O3 -m32 -msse                            15.0
-O3 -mtune=native                         15.0

* Used in the release builds of WSJT-X for Windows

The flags we're currently using for Windows Release builds give results within about 10% of the best one listed.

One way to look at all of this is that the most important optimizations are those that have already been done, by the programmer. These include making the best possible choices of data structures, algorithms, loop ordering, etc., etc.

-- 73, Joe, K1JT
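The "less than 15%" and "within about 10%" figures can be checked directly from the two tables. The short Python sketch below does the arithmetic; the times are copied from the tables, while the helper name percent_saved and the dictionary keys are made up for this illustration.

```python
def percent_saved(baseline, candidate):
    """Percentage of run time saved when going from `baseline` to `candidate`."""
    return 100.0 * (baseline - candidate) / baseline


# Wall-clock times copied from the two tables (starred row = release build flags).
core2 = {"release": 20.4, "best": 17.8, "O2": 18.4, "O2_mtune_native": 18.4}
i5 = {"release": 16.6, "best": 15.0}

core2_gain = percent_saved(core2["release"], core2["best"])
i5_gain = percent_saved(i5["release"], i5["best"])
mtune_gain = percent_saved(core2["O2"], core2["O2_mtune_native"])

print(f"Core2 Duo: release flags -> -Ofast saves {core2_gain:.1f}%")
print(f"Core i5:   release flags -> best listed saves {i5_gain:.1f}%")
print(f"Core2 Duo: adding -mtune=native to -O2 saves {mtune_gain:.1f}%")
```

This gives roughly 12.7% for the Core2 Duo machine, 9.6% for the Core i5 machine, and 0.0% for the -mtune=native comparison, in line with the statements in the message.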
Re: [wsjt-devel] Crazy test
Hi Joe and all,

I cannot in any way dispute the data provided, but on my machine the difference is more than notable. It is a Pavilion g6 Notebook PC with an i5-2430M CPU @ 2.40 GHz and 4 GB of RAM, running Windows 7 SP1 Home.

This is the diff of the two files:

  C:\JTSDK\src\wsjtx-1.4>diff CMakeLists.txt CMakeListsMY.txt
  429c429
  < set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra -std=c++11 -fexceptions -frtti")
  ---
  > set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra -std=c++11 -fexceptions -frtti -mtune=native")
  476c476
  < set (General_FFLAGS "-fbounds-check -Wall -Wno-conversion -fno-second-underscore")
  ---
  > set (General_FFLAGS "-fbounds-check -Wall -Wno-conversion -fno-second-underscore -mtune=native")

The program runs in JT65+JT9 mode. The version is v1.4.0-rc3 r4783[-dirty]. I removed the files in the build and install directories to be sure that everything would be rebuilt. I increased the number of files to analyze and stopped services and the anti-virus, but the difference continues to be high. Next week I will try on other machines or other OSes.

I am starting to think that it is not only the decoders that influence the results. Please note that I am referring to the time to complete several loops. With Shift+F6 on a directory, the program loops over:

- read file
- display graph
- decode JT65
- show results
- decode JT9
- show results
- write ALL.TXT and other files

I put -mtune=native in the CMake file, so all the Fortran code, and also the C code, will be generated differently. It may be that I have to measure the execution time between the decodings to understand what happens.

Many thanks for the detailed information. In the next days I'll investigate the strange time difference.

73, Merry Christmas and Happy New Year to you, your family and all the group

Sandro IW3RAB

On 16/12/2014 21:16, Joe Taylor wrote:
> Hi Alessandro and all,
>
> Compiler optimizations can be helpful when tuning code for good
> performance, but I am surprised that you see anything like a 2x
> improvement in decoding speed.
> [...]
Re: [wsjt-devel] Crazy test
Hi guys,

WSPR 4.0 SVN 4795, recompiled with FFLAGS += -Ofast and the rest...

It has run from 9:50 this morning until now, 18:10, without crashing:

  Memory allocation error
  Cannot start rx thread 11

System: Banana Pi, Fedora 21,
http://mirror.as24220.net/pub/fedora/linux/releases/21/Images/armhfp/Fedora-LXDE-armhfp-21-5-sda.raw.xz
copied to a SATA disk. Booted with kernel 3.4.105+ because Fedora has not ported the Mali video driver yet.

ONE MAJOR MOD: Replace the Python Imaging Library (Pillow) with 2.5.3. Use the "pip3" tool, Python's installer. Version 2.6.1 crashes when a waterfall is displayed.

For development, a SATA disk is essential. Forget SD cards, full stop.

Alan VK2ZIW

On Wed, 17 Dec 2014 01:36:44 +0100, Alessandro Gorobey wrote
> Hi Joe and all,
> I can not in any way discuss the data provided but on my machine the
> difference is more that notable.
> [...]
Re: [wsjt-devel] Crazy test
Hi Alessandro,

I replicated your tests as exactly as possible, modifying CMAKE_CXX_FLAGS and General_FFLAGS by the addition of "-mtune=native". Using WSJT-X and the "Shift+F6" command, the sequence of ten files (01.wav, 02.wav, ... 10.wav) was processed in 21 seconds with or without the addition of "-mtune=native" before building the program from scratch. I could find no measurable difference in execution speed for the two cases. Certainly they were the same to within 1 second.

I note also that the total execution time is very nearly the same as what I reported yesterday for the execution of jt9[.exe] from the command line. Almost all of the CPU-intensive "number crunching" in WSJT-X occurs in the Fortran code in jt9. Other tasks such as display of graphical information and decoded text, writing output files, etc., make comparatively trivial demands on CPU resources.

It remains a mystery to me why you have seen large differences in execution speed after adding the compiler flag "-mtune=native".

-- 73, Joe, K1JT
Re: [wsjt-devel] Crazy test
Have you got a batch file or such so perhaps I can try to replicate this too?

It seems the current fftwf build from JTSDK uses different flags:

CFLAGS = -O3 -fomit-frame-pointer -mtune=native -malign-double -fstrict-aliasing -fno-schedule-insns -ffast-math

Mike W9MDB

-----Original Message-----
From: Joe Taylor [mailto:j...@princeton.edu]
Sent: Wednesday, December 17, 2014 9:13 AM
To: WSJT software development
Subject: Re: [wsjt-devel] Crazy test

Hi Alessandro,

I replicated your tests as exactly as possible, modifying CMAKE_CXX_FLAGS and General_FFLAGS by the addition of "-mtune=native".
[...]
Re: [wsjt-devel] Crazy test
On 12/17/2014 10:22 AM, Michael Black wrote:
> Have you got a batch file or such so perhaps I can try and replicate this
> too?

Replicating Alessandro's test requires a trivial change to the file CMakeLists.txt for WSJT-X. See his email for details.

> It seems the current fftwf build from JTSDK uses different flags.
> CFLAGS = -O3 -fomit-frame-pointer -mtune=native -malign-double
> -fstrict-aliasing -fno-schedule-insns -ffast-math

Alessandro's test has nothing to do with the way the FFTW library was built.

-- Joe, K1JT
Re: [wsjt-devel] Crazy test
On 17/12/2014 15:22, Michael Black wrote:

Hi All,

> Have you got a batch file or such so perhaps I can try and replicate this
> too?

Sandro listed the options he changed; they are all set in CMakeLists.txt.

> It seems the current fftwf build from JTSDK uses different flags.
> CFLAGS = -O3 -fomit-frame-pointer -mtune=native -malign-double
> -fstrict-aliasing -fno-schedule-insns -ffast-math

Let's not get confused here about the various components of WSJT-X. We have:

- The Fortran code, which implements the decoding algorithms and filters.
- The FFTW library, which does discrete Fourier transforms and inverse Fourier transforms. Most users, possibly all, use a pre-built version of this library, i.e. for a given architecture we are all using the same machine code.
- The C++/C code that implements the UI and operating system interfaces.

The Fortran code is basically running at CPU speed apart from when it delegates work to FFTW. This is not continuous, but for significant periods it is unthrottled while the decodes are processed. The FFTW library itself is also CPU intensive while processing a DFT/IDFT task. The C++/C code is largely event driven and spends most of its time waiting for operating system services like audio streaming, user actions, or timed events.

So the Fortran code and FFTW are the CPU-bound areas that would be sensitive to processor speed and feature usage. Therefore these are potentially tunable by taking advantage of special features of a particular CPU. On Intel processors these include the MMX and SSE series of special machine instructions. AMD and ARM have their own variants. In general these special instruction sets are known as SIMD extensions. These extra instructions are basically a vector floating point engine that allows a small number of floating point calculations to be executed in the time the conventional instruction set can do only one such calculation.

The Fortran and C++ compilers can be told to emit machine code tailored to the exact machine architecture the compiler is running on. This generates potentially faster code which is less portable.

The FFTW library takes a runtime approach to machine-level optimization: it examines the CPU and also does trial calculations using the various available features to choose the best available strategy for the DFT and IDFT algorithms it uses.

So this thread is really discussing how the various architecture-specific Fortran compiler options impact decoding performance. It is quite possible that the CPU Sandro is testing on performs poorly when the specialized SIMD instructions are not used, whereas Joe's machine performs relatively better without the enhanced instructions.

> Mike W9MDB

73
Bill
G4WJS.

> -----Original Message-----
> From: Joe Taylor [mailto:j...@princeton.edu]
> Sent: Wednesday, December 17, 2014 9:13 AM
> To: WSJT software development
> Subject: Re: [wsjt-devel] Crazy test
>
> Hi Alessandro,
>
> I replicated your tests as exactly as possible, modifying CMAKE_CXX_FLAGS
> and General_FFLAGS by the addition of "-mtune=native".
> [...]
Re: [wsjt-devel] Crazy test
I didn't quite realize that jt9 didn't use the dll.

What's the command line to run jt9? Not sure what args to give since neither of you mentioned what you're passing to it.

Mike W9MDB

-----Original Message-----
From: Joe Taylor [mailto:j...@princeton.edu]
Sent: Wednesday, December 17, 2014 9:36 AM
To: WSJT software development
Subject: Re: [wsjt-devel] Crazy test

On 12/17/2014 10:22 AM, Michael Black wrote:
> Have you got a batch file or such so perhaps I can try and replicate
> this too?

Replicating Alessandro's test requires a trivial change to the file CMakeLists.txt for WSJT-X. See his email for details.
[...]
Re: [wsjt-devel] Crazy test
Mike --

> I didn't quite realize that jt9 didn't use the dll.

What dll are you referring to? If you mean the FFTW library, jt9[.exe] definitely *does* use it.

> What's the command line to run jt9?

Type jt9 by itself at the command prompt, to get a brief "usage" message. For example:

C:\JTSDK-QT\wsjtx\install\Release\bin>jt9
Usage: jt9 -p TRperiod [-d ndepth] [-f rxfreq] {-w patience] -e exe_dir file1 [ file2 ...]
       Reads data from *.wav files.

       jt9 -s [-w patience] -e exe_dir -a data_dir -t temp_dir
       Gets data from shared memory region with key==

> Not sure what args to give since neither of you mentioned what you're
> passing to it.

My tests used the command

jt9 -p 1 -d 3 /tmp5/0?.wav

Directory /tmp5 contained the files 00.wav, 01.wav, ... 09.wav -- ten files in all, each a copy of the example file 130610_2343.wav.

-- Joe, K1JT
Re: [wsjt-devel] Crazy test
Hi Joe,

I will try other tests in the next days. Today I ran the test on a desktop with an i7-2600 CPU @ 3.4 GHz running Windows 7 Pro 64-bit. The test used 100 files named test00.wav to test99.wav.

In display.cpp, at line 40, I modified the routine void DisplayText::insertLineSpacer() to:

  void DisplayText::insertLineSpacer()
  {
      //QString tt="";
      QTime time = QTime::currentTime();
      QString tt = time.toString();
      QString bg="#d3d3d3";
      _insertText(tt,bg);
  }

So I have time markers.

Putting '-mtune=native' ONLY in the Fortran flags reduces the time from 5 to 4 minutes over the 98 cycles (the first and last are excluded). This confirms your table of decoder times.

I think some other factor influences the program (other hardware? other libs? other video drivers?). I will try to analyze what happens between 'decodings' to understand the strange difference on some machines.

I will keep you informed, as I am very curious.

Best regards
73
Sandro IW3RAB

On 17/12/2014 16:13, Joe Taylor wrote:
> Hi Alessandro,
>
> I replicated your tests as exactly as possible, modifying
> CMAKE_CXX_FLAGS and General_FFLAGS by the addition of "-mtune=native".
> [...]
Re: [wsjt-devel] Crazy test
Hello All,

I added a Windows script (compliments of StackOverflow) to the JTSDK v2.0.0 \scripts directory that allows timing of commands.

You need to run the update first to get the new file, then:

[1] Open JTSDK-QT
[2] cd /d < WSJTX JT9.exe location >
[3] Run command < using Joe's example below >:

I copied 10 of the example wave files to wsjtx\Release\bin\tmp5\

timecmd jt9 -p 1 -d 3 .\tmp5\0?.wav
..
..
..
Command Took: 0:0:29.86 (29.86s total)

It is similar to using the time command in Linux ( time ).
Just thought it may be useful for what you're doing here.

73's
Greg, KI7MT

On 12/17/2014 16:04, Joe Taylor wrote:
> Mike --
>
>> I didn't quite realize that jt9 didn't use the dll.
>
> What dll are you referring to? If you mean the FFTW library, jt9[.exe]
> definitely *does* use it.
>
>> What's the command line to run jt9?
>
> Type jt9 by itself at the command prompt, to get a brief "usage"
> message. For example:
>
> C:\JTSDK-QT\wsjtx\install\Release\bin>jt9
> Usage: jt9 -p TRperiod [-d ndepth] [-f rxfreq] {-w patience] -e exe_dir file1 [file2 ...]
>        Reads data from *.wav files.
>
>        jt9 -s [-w patience] -e exe_dir -a data_dir -t temp_dir
>        Gets data from shared memory region with key==
>
>> Not sure what args to give since neither of you mentioned what you're
>> passing to it.
>
> My tests used the command
>
> jt9 -p 1 -d 3 /tmp5/0?.wav
>
> Directory /tmp5 contained the files 00.wav, 01.wav, ... 09.wav -- ten
> files in all, each a copy of the example file 130610_2343.wav.
>
> -- Joe, K1JT
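The timecmd script itself is not shown in this thread. Purely as an illustration of the same idea (wall-clock timing of an external command), a minimal sketch in standard C++ might look like the following. The names time_command and format_elapsed are hypothetical, and the output format only imitates the "Command Took" line in Greg's message; it is not taken from the JTSDK script.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <string>

// Run a shell command and return the elapsed wall-clock time in seconds.
// Hypothetical helper: this is not the JTSDK timecmd script.
double time_command(const std::string& command, int* exit_status = nullptr) {
    const auto start = std::chrono::steady_clock::now();
    const int status = std::system(command.c_str());
    const auto stop = std::chrono::steady_clock::now();
    if (exit_status != nullptr) {
        *exit_status = status;
    }
    return std::chrono::duration<double>(stop - start).count();
}

// Format seconds as "h:m:s.cc (s.ccs total)", imitating the line quoted above.
std::string format_elapsed(double seconds) {
    assert(seconds >= 0.0);
    // Work in whole centiseconds so rounding cannot produce "59.100".
    const long long cs = std::llround(seconds * 100.0);
    const long long hours = cs / 360000;
    const long long minutes = (cs / 6000) % 60;
    const long long rest = cs % 6000;  // centiseconds within the current minute
    char buf[96];
    std::snprintf(buf, sizeof buf, "%lld:%lld:%lld.%02lld (%.2fs total)",
                  hours, minutes, rest / 100, rest % 100, seconds);
    return std::string(buf);
}
```

A caller would pass the full command line as a string, for example the jt9 invocation from Joe's message, and print "Command Took: " followed by format_elapsed() of the returned value. steady_clock is used rather than system_clock so that clock adjustments during the run do not distort the measurement.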
Re: [wsjt-devel] Crazy test
Great idea; this reminds me of 1985 and the timex command.
Tomorrow night I will update JTSDK.

By the way, the difference in execution time seems to be significant only on non-desktop computers (Intel Turbo mode? video drivers/hardware? DirectX?).

Merry Christmas to all

73 de Sandro IW3RAB

On 18/12/2014 19:24, KI7MT wrote:
> Hello All,
>
> I added a Windows script (compliments of StackOverflow) to JTSDK v2.0.0
> \scripts directory that allows timing of commands
> [...]
Re: [wsjt-devel] Crazy test understood
Hi All,

I now understand what happens on my Core i5-2430M.

See also:
http://en.wikipedia.org/wiki/List_of_Intel_Core_i5_microprocessors
http://en.wikipedia.org/wiki/Intel_Turbo_Boost (outdated but still valid)

The solution is very simple: do not try any benchmark.
Do not run any repetitive benchmark that heats up the CPU.
I found a program that shows some hidden parameters, such as power, clock and temperature.

The processor frequency ranges from 600 MHz (idle) to 2.4 GHz (max), and up to nearly 3 GHz with Turbo Boost enabled, but it depends on external parameters.
If the temperature gets too high or the power draw goes over some limit, the clock goes down.

It seems that some instructions need less power, hence the strange results.

As I wrote a few days ago, on desktop computers and processors the difference is small, as in the tables mailed by Joe.

The GPU also does some energy saving (battery operation) to keep the PC temperature low or within certain limits (the infamous maximum junction temperature, as for the CPU).

Pressurizing the bottom of the PC like a tube with a very powerful fan (drawing twice the power of the PC and as noisy as a jet) minimizes the problem.

I apologize for the message.

Merry Christmas and Happy New Year to All

73
Sandro
IW3RAB

On 17/12/2014 22:36, Alessandro Gorobey wrote:
> Hi Joe,
> I will run more tests in the next few days.
> [...]
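One way to check whether a repetitive benchmark is being distorted by the clock throttling Sandro describes is to time the same workload several times back to back and compare the late runs with the early ones. The sketch below is only an illustration in standard C++ under that assumption; the names time_runs and slowdown_ratio are hypothetical and nothing here comes from WSJT-X or JTSDK.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Time `runs` back-to-back executions of the same workload (hypothetical helper).
// Returns one wall-clock duration in seconds per run.
template <typename Workload>
std::vector<double> time_runs(Workload&& workload, std::size_t runs) {
    std::vector<double> seconds;
    seconds.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        workload();
        const auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return seconds;
}

// Ratio of the total time of the last `window` runs to that of the first
// `window` runs. A value well above 1.0 suggests the machine slowed down
// during the test (for example because the CPU clock dropped as it heated up).
double slowdown_ratio(const std::vector<double>& seconds, std::size_t window) {
    assert(window > 0 && seconds.size() >= 2 * window);
    const auto w = static_cast<std::ptrdiff_t>(window);
    const double first = std::accumulate(seconds.begin(), seconds.begin() + w, 0.0);
    const double last = std::accumulate(seconds.end() - w, seconds.end(), 0.0);
    assert(first > 0.0);
    return last / first;
}
```

If the ratio drifts well away from 1.0 on a laptop but stays close to 1.0 on a desktop, a comparison of two builds made with long sequential runs on that laptop is probably measuring thermal state more than compiler flags. Alternating the two builds file by file, rather than running one complete batch after the other, would be one way to reduce that bias.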
Re: [wsjt-devel] Crazy test understood
Hi Sandro,

Thanks for your latest report. Clearly the mystery has been solved!

There is certainly no need to apologize for starting this thread. Your reports are always clear and informative. I learned something from the follow-ups that several of us did, and perhaps others did too.

Best wishes of the season to all!

-- 73, Joe, K1JT

On 12/22/2014 3:07 PM, Alessandro Gorobey wrote:
> Hi All,
>
> I now understand what happens on my Core i5-2430M.
> [...]