[wsjt-devel] Crazy test

2014-12-15 Thread Alessandro Gorobey

Hi All,

Last night I read some documentation on a library that wsjtx uses for
FFTs. I noticed that on some processors the library is optimized to use
different instruction sets.


I was crazy enough to do the following:

- From the example file 130610_2343.wav, created 10 copies with
incremental numbers in a directory (00.wav, 01.wav, etc.)
- Opened the directory with wsjtx and adjusted the level and modes to
obtain what is described in wsjtx-main.html, section 6.4, Sample file 2
- Created a copy of wsjtx and modified the Fortran and C compiler
parameters, adding '-mtune=native'

DANGER: this parameter means the generated code is not portable
between processors, but it may be an option for users who compile the
code for use on their own PC.
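
The file preparation in the first step can be scripted. What follows is a minimal sketch in Python, assuming the example file sits in the current directory; the function name make_copies and the target directory name are made up for illustration and are not part of wsjtx or JTSDK.

```python
import shutil
from pathlib import Path


def make_copies(source, target_dir, count=10):
    """Copy `source` into `target_dir` as 00.wav, 01.wav, ... and return the new paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copies = []
    for index in range(count):
        destination = target / f"{index:02d}.wav"
        shutil.copyfile(source, destination)  # byte-for-byte copy of the sample
        copies.append(destination)
    return copies


# Example call (both paths are assumptions, adjust them to your setup):
# make_copies("130610_2343.wav", "crazy_test")
```

Because every copy is identical, any difference in total decode time between two builds comes from the builds and not from the input data.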


Alternating between the two programs, and using Open followed by
Shift+F6 ('Decode remaining in directory'), I watched the decoding
times, i.e. how long the 'Decode' button stays blue.


My idea was to see whether there would be a visible difference.

Instead, the decode time is half or less, and the graph seems to fly.

The test was on an i5 processor under Windows 7.
I am too busy this week, so I intend to repeat the same test on other
processors and on Linux next week.


I am not able to enter into the mathematical discussion, but obviously
there are many thousands of basic floating-point operations
(*, **, /, etc.) repeated every cycle.


GCC can handle these operations in many ways, from 'emulation' to
'hardware'.
NOTE: GCC refers to the "GNU Compiler Collection" suite, and many
parameters are common to several of its compilers.


On the gnu.org site there are some links for the compilers used in the
superb JTSDK.


In the index at https://gcc.gnu.org/onlinedocs/gcc/, sections 3.17.17
and 3.17.18 refer to PC processors.


There is also section 3.17.4, ARM Options, which may be of interest to
the ARMv7 users mentioned in some recent discussions.

What do you think of this crazy message?

73
Sandro
IW3RAB






--
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
___
wsjt-devel mailing list
wsjt-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/wsjt-devel


Re: [wsjt-devel] Crazy test

2014-12-16 Thread KI7MT
Hi Sandro,

Another resource for compiler / app tuning is Gentoo. They are
constantly looking for the best "safe" settings and playing with
advanced compiler flags:

[1] http://wiki.gentoo.org/wiki/GCC_optimization
[2] https://wiki.gentoo.org/wiki/Handbook:Main_Page

I already had the GNU 3.17 section bookmarked; there is a lot of great
info there for sure.

One command I find interesting (from Gentoo pages):
$ gcc -c -Q -march=native --help=target |grep "enabled"
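
For comparing machines it can help to capture that list programmatically. The sketch below is an illustration in Python, not part of JTSDK: it runs the same gcc query and keeps the options reported as enabled. The helper names are made up, and the parser assumes the usual -Q --help=target layout of one option per line followed by [enabled] or [disabled]. Per the GCC manual, -march=native selects the instruction set of the build machine, whereas -mtune=native only tunes the generated code and leaves the set of available instructions unchanged.

```python
import subprocess


def parse_enabled(help_text):
    """Return the option names that a `gcc -Q --help=target` listing marks as enabled."""
    enabled = []
    for line in help_text.splitlines():
        fields = line.split()
        # Keep lines such as "  -msse2    [enabled]"; skip headings and value options.
        if len(fields) >= 2 and fields[0].startswith("-") and fields[-1] == "[enabled]":
            enabled.append(fields[0])
    return enabled


def enabled_target_options(march="native", compiler="gcc"):
    """Run the query shown above and parse its output (requires gcc on the PATH)."""
    result = subprocess.run(
        [compiler, "-c", "-Q", f"-march={march}", "--help=target"],
        capture_output=True, text=True, check=True,
    )
    return parse_enabled(result.stdout)
```

Saving the result of enabled_target_options() on two machines and comparing the lists shows which instruction-set extensions one CPU offers and the other does not.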

I've not used many flags with WSJT apps, but may do just to see how
things work, or not :-)

73's
Greg, KI7MT



Re: [wsjt-devel] Crazy test

2014-12-16 Thread Joe Taylor
Hi Alessandro and all,

Compiler optimizations can be helpful when tuning code for good 
performance, but I am surprised that you see anything like a 2x 
improvement in decoding speed.

The following table shows the results of a series of tests I made today 
for the decoder (the executable program jt9) running on files like the 
ones you created (01.wav, 02.wav, ... 10.wav, all copies of the example 
file 130610_2343.wav).  The first column lists the Fortran compiler 
flags used; the numerical column gives the total execution time (wall 
clock) for processing the ten files.

FFLAGS                                   Time (s)

-O0 -fbounds-check                         42.4
-O1 -fbounds-check                         22.8
-Os -fbounds-check                         22.8
-O2 -fbounds-check -funroll-all-loops      20.4 *
-O2 -fbounds-check                         20.2
-O3 -fbounds-check                         19.8
-Ofast -fbounds-check                      18.9
-O2                                        18.4
-O2 -mtune=native                          18.4
-O2 -funroll-all-loops                     18.2
-O3                                        18.0
-Ofast                                     17.8

* Used in the release builds of WSJT-X

As you can see, "-mtune=native" made essentially no difference.  The 
biggest improvement in execution performance (over the default Release 
build) is gained by turning off bounds-checking.  A slight additional 
improvement is obtained by using -O3 or -Ofast rather than -O2. 
However, the total available improvement is less than 15%.

Obviously, such tests will give different results on different machines.
Those described above were done on a machine with a Core2 Duo E6750
CPU, 2.66 GHz.  Here is a similar set of results for a Windows machine
(Core i5-2500, 3.3 GHz):

FFLAGS                                   Time (s)

-O0 -fbounds-check                         28.5
-O1 -fbounds-check                         18.2
-O2 -fbounds-check -funroll-all-loops      16.6 *
-O2 -fbounds-check                         16.2
-O3 -fbounds-check                         16.2
-Ofast                                     15.7
-O3 -m32 -msse -funroll-all-loops          15.4
-O3 -mtune=core2                           15.1
-O3 -m32 -msse                             15.0
-O3 -mtune=native                          15.0

* Used in the release builds of WSJT-X for Windows

The flags we're currently using for Windows Release builds give results 
within about 10% of the best one listed.
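
The "less than 15%" and "about 10%" figures can be checked directly against the two tables. The short Python sketch below does the arithmetic; the constant and function names are made up for illustration, and the numbers are the wall-clock totals copied from the starred release rows and the fastest rows.

```python
# Wall-clock totals copied from the two tables above.
CORE2_RELEASE = 20.4   # -O2 -fbounds-check -funroll-all-loops (release flags)
CORE2_BEST = 17.8      # -Ofast
I5_RELEASE = 16.6      # -O2 -fbounds-check -funroll-all-loops (release flags)
I5_BEST = 15.0         # -O3 -mtune=native


def percent_saved(baseline, candidate):
    """Share of the baseline run time that the candidate build saves, in percent."""
    return 100.0 * (baseline - candidate) / baseline


for label, release, best in (("Core2 Duo E6750", CORE2_RELEASE, CORE2_BEST),
                             ("Core i5-2500", I5_RELEASE, I5_BEST)):
    print(f"{label}: best flags save {percent_saved(release, best):.1f}% "
          "of the release-build time")
```

This gives about 12.7% for the Core2 Duo and about 9.6% for the Core i5, consistent with the statements above.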

One way to look at all of this is that the most important optimizations 
are those that have already been done, by the programmer.  These include 
making the best possible choices of data structures, algorithms, loop 
ordering, etc., etc.

-- 73, Joe, K1JT



Re: [wsjt-devel] Crazy test

2014-12-16 Thread Alessandro Gorobey
Hi Joe and all,
I cannot dispute the data you provided in any way, but on my machine
the difference is more than notable.
It is a Pavilion g6 notebook PC with an i5-2430M CPU @ 2.40 GHz and
4 GB of RAM, running Windows 7 SP1 Home.

This is the diff of the two files:

C:\JTSDK\src\wsjtx-1.4>diff CMakeLists.txt CMakeListsMY.txt
429c429
< set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra -std=c++11 -fexceptions -frtti")
---
> set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra -std=c++11 -fexceptions -frtti -mtune=native")
476c476
< set (General_FFLAGS "-fbounds-check -Wall -Wno-conversion -fno-second-underscore")
---
> set (General_FFLAGS "-fbounds-check -Wall -Wno-conversion -fno-second-underscore -mtune=native")

The program was run in JT65+JT9 mode. The version is v1.4.0-rc3
r4783[-dirty].
I removed the files in the build and install directories to be sure
that everything would be rebuilt.
I increased the number of files to analyze and stopped services and the
anti-virus, but the difference continues to be large.
Next week I will try other machines and other operating systems.
I am starting to think that it is not only the decoders that influence
the results.
Please note that I am referring to the time to complete several loops.
With Shift+F6 on a directory, the program loops over:
- read file
- display graph
- decode jt65
- show results
- decode jt9
- show results
- write ALL.TXT and other files

I put -mtune=native in the CMake file, so not only the Fortran code but
also the C code is generated differently.
It may be that I have to measure the execution time between decodes to
understand what is happening.
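
One way to make such a measurement is to accumulate wall-clock time per stage of the loop. The following Python sketch only illustrates the idea; it is not WSJT-X code, the class name StageTimer is made up, and the work inside each stage is a stand-in for the real processing.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage across many loop iterations."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add the elapsed time even if the stage raised an exception.
            self.totals[name] += time.perf_counter() - start

    def report(self):
        """Return (stage, seconds) pairs, slowest stage first."""
        return sorted(self.totals.items(), key=lambda item: item[1], reverse=True)


# Usage with stand-in work; the stage names mirror two steps of the loop listed above.
timer = StageTimer()
for _ in range(3):
    with timer.stage("decode jt65"):
        sum(i * i for i in range(10_000))   # placeholder for the real decoder
    with timer.stage("show results"):
        time.sleep(0.001)                   # placeholder for UI work
for name, seconds in timer.report():
    print(f"{name}: {seconds:.4f} s")
```

Comparing the per-stage totals of two builds would show whether a difference comes from the decoders or from another step such as display or file output.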

Many thanks for the detailed information.

In the next few days I'll investigate the strange time difference.

73, Merry Christmas and Happy New Year to you, your family, and all the group

Sandro IW3RAB



Re: [wsjt-devel] Crazy test

2014-12-16 Thread Alan VK2ZIW
Hi guys,
WSPR 4.0, SVN 4795, recompiled with FFLAGS += -Ofast and the rest.

It has run from 9:50 this morning until now, 18:10, without crashing:
Memory allocation error
Cannot start rx thread  11

System: Banana Pi, Fedora 21,
http://mirror.as24220.net/pub/fedora/linux/releases/21/Images/armhfp/Fedora-LXDE-armhfp-21-5-sda.raw.xz
copied to a SATA disk.
Booted with kernel 3.4.105+ because Fedora has not ported the Mali video
driver yet.

ONE MAJOR MOD: Replace the Python Imaging Library (Pillow) with version
2.5.3, using the "pip3" tool, Python's installer.
Version 2.6.1 crashes when a waterfall is displayed.

For development, a SATA disk is essential. Forget SD cards, full stop.

Alan VK2ZIW


Re: [wsjt-devel] Crazy test

2014-12-17 Thread Joe Taylor
Hi Alessandro,

I replicated your tests as exactly as possible, modifying 
CMAKE_CXX_FLAGS and General_FFLAGS by the addition of "-mtune=native".
Using WSJT-X and the "Shift+F6" command, the sequence of ten files 
(01.wav, 02.wav, ... 10.wav) was processed in 21 seconds with or without 
the addition of "-mtune=native" before building the program from 
scratch.  I could find no measurable difference in execution speed for 
the two cases.  Certainly they were the same to within 1 second.

I note also that the total execution time is very nearly the same as 
what I reported yesterday for the execution of jt9[.exe] from the 
command line.  Almost all of the CPU-intensive "number crunching" in 
WSJT-X occurs in the Fortran code in jt9.  Other tasks such as display 
of graphical information and decoded text, writing output files, etc.,
make comparatively trivial demands on CPU resources.

It remains a mystery to me why you have seen large differences in 
execution speed after adding the compiler flag "-mtune=native".

-- 73, Joe, K1JT



Re: [wsjt-devel] Crazy test

2014-12-17 Thread Michael Black
Have you got a batch file or such so perhaps I can try and replicate this
too?

It seems the current fftwf build from JTSDK uses different flags.
CFLAGS = -O3 -fomit-frame-pointer -mtune=native -malign-double
-fstrict-aliasing -fno-schedule-insns -ffast-math

Mike W9MDB




Re: [wsjt-devel] Crazy test

2014-12-17 Thread Joe Taylor
On 12/17/2014 10:22 AM, Michael Black wrote:
> Have you got a batch file or such so perhaps I can try and replicate this
> too?

Replicating Alessandro's test requires a trivial change to the file 
CMakeLists.txt for WSJT-X.  See his email for details.

> It seems the current fftwf build from JTSDK uses different flags.
> CFLAGS = -O3 -fomit-frame-pointer -mtune=native -malign-double
> -fstrict-aliasing -fno-schedule-insns -ffast-math

Alessandro's test has nothing to do with the way the FFTW library was built.

-- Joe, K1JT



Re: [wsjt-devel] Crazy test

2014-12-17 Thread Bill Somerville
On 17/12/2014 15:22, Michael Black wrote:
Hi All,
> Have you got a batch file or such so perhaps I can try and replicate this
> too?
Sandro listed the options he changed, they are all set in CMakeLists.txt.
>
> It seems the current fftwf build from JTSDK uses different flags.
> CFLAGS = -O3 -fomit-frame-pointer -mtune=native -malign-double
> -fstrict-aliasing -fno-schedule-insns -ffast-math
Let's not get confused here about the various components of WSJT-X.

We have the Fortran code, which implements the decoding algorithms and 
filters.

There is the FFTW library, which does discrete Fourier transforms and 
inverse Fourier transforms. Most users, possibly all, use a pre-built 
version of this library, i.e. for a given architecture we are all using 
the same machine code.

Also we have the C++/C code that implements the UI and operating system 
interfaces.

The Fortran code is basically running at CPU speed apart from when it 
delegates work to FFTW. This is not continuous but for significant 
periods it is unthrottled while the decodes are processed.

The FFTW library itself is also CPU intensive while processing a 
DFT/IDFT task.

The C++/C code is largely event driven and spends most of its time 
either waiting for operating system services like audio streaming, user 
actions, or timed events.

So the Fortran code and FFTW are the CPU-bound areas that would be 
sensitive to processor speed and feature usage. Therefore these are 
potentially tunable by taking advantage of special features of a 
particular CPU. On Intel processors these include the MMX and SSE 
series of special machine instructions. AMD and ARM have their own 
variants. In general these special instruction sets are known as SIMD 
extensions. These extra instructions are basically a vector 
floating-point engine that allows a small number of floating-point 
calculations to be executed in the time the conventional instruction 
set can do only one such calculation.

The Fortran and C++ compilers can be told to emit machine code tailored 
to the exact machine architecture the compiler is running on. This 
generates potentially faster code which is less portable.

The FFTW library takes a runtime approach to machine level optimization, 
it examines the CPU and also does trial calculations using the various 
available features to choose the best available strategy for the DFT and 
IDFT algorithms it uses.
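
The "try it and pick the fastest" idea can be shown in miniature. The Python sketch below is not FFTW's API or its actual algorithms; it only times two textbook DFT implementations on a sample input and selects the quicker one. All function names here are invented for the illustration.

```python
import cmath
import time


def naive_dft(x):
    """Direct O(n^2) evaluation of the discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def radix2_fft(x):
    """Recursive radix-2 FFT; the length must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = radix2_fft(x[0::2])
    odd = radix2_fft(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out


def plan(candidates, sample, repeats=3):
    """Time each candidate on `sample` and return the fastest one."""
    def best_time(func):
        times = []
        for _ in range(repeats):
            start = time.perf_counter()
            func(sample)
            times.append(time.perf_counter() - start)
        return min(times)
    return min(candidates, key=best_time)


signal = [complex(i % 7, 0) for i in range(256)]
chosen = plan([naive_dft, radix2_fft], signal)   # trial runs happen here
print("selected:", chosen.__name__)
spectrum = chosen(signal)                        # later transforms reuse the choice
```

The cost of the trial runs is paid once, after which every transform of that size uses the selected routine; this is why a pre-built library can still adapt to the CPU it finds at run time.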

So this thread is really discussing how the various architecture 
specific Fortran compiler options impact decoding performance.

It is quite possible that the CPU Sandro is testing on performs poorly 
when the specialized SIMD instructions are not used, whereas on Joe's 
machine the performance without the enhanced instructions is closer to 
the performance with them.
73
Bill
G4WJS.

Re: [wsjt-devel] Crazy test

2014-12-17 Thread Michael Black
I didn't quite realize that jt9 didn't use the dll.  

What's the command line to run jt9?
Not sure what args to give since neither of you mentioned what you're
passing to it.

Mike W9MDB



Re: [wsjt-devel] Crazy test

2014-12-17 Thread Joe Taylor
Mike --

> I didn't quite realize that jt9 didn't use the dll.

What dll are you referring to?  If you mean the FFTW library, jt9[.exe] 
definitely *does* use it.

> What's the command line to run jt9?

Type jt9 by itself at the command prompt, to get a brief "usage"
message.  For example:

C:\JTSDK-QT\wsjtx\install\Release\bin>jt9
  Usage: jt9 -p TRperiod [-d ndepth] [-f rxfreq] {-w patience] -e exe_dir file1 [file2 ...]
 Reads data from *.wav files.

 jt9 -s  [-w patience] -e exe_dir -a data_dir -t temp_dir
 Gets data from shared memory region with key==

> Not sure what args to give since neither of you mentioned what you're
> passing to it.

My tests used the command

  jt9 -p 1 -d 3 /tmp5/0?.wav

Directory /tmp5 contained the files 00.wav, 01.wav, ... 09.wav -- ten 
files in all, each a copy of the example file 130610_2343.wav.
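
A wall-clock measurement of such a command can be scripted so that two builds are timed the same way. This is a minimal sketch in Python, with made-up helper names; a harmless stand-in command is used so that it runs anywhere, and the jt9 invocation appears only in a comment, built from the arguments quoted above.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run `argv` once and return (elapsed wall-clock seconds, exit code)."""
    start = time.perf_counter()
    completed = subprocess.run(argv, stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL)
    return time.perf_counter() - start, completed.returncode


def best_of(argv, runs=3):
    """Repeat the run and keep the shortest time, the one least disturbed by other activity."""
    return min(time_command(argv)[0] for _ in range(runs))


# Stand-in command; for the test in this thread it would be something like
# ["jt9", "-p", "1", "-d", "3", *sorted(glob.glob("/tmp5/0?.wav"))]  (needs `import glob`).
elapsed, code = time_command([sys.executable, "-c", "pass"])
print(f"{elapsed:.3f} s, exit code {code}")
```

Taking the best of several runs reduces the influence of disk caching and background services on the comparison.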

-- Joe, K1JT



Re: [wsjt-devel] Crazy test

2014-12-17 Thread Alessandro Gorobey
Hi Joe,
I will run other tests in the next few days.
Today I ran the test on a desktop with an i7-2600 CPU @ 3.4 GHz running
Windows 7 Pro 64-bit.
The test used 100 files named test00.wav to test99.wav.
In display.cpp I modified the routine void
DisplayText::insertLineSpacer() at line 40
to
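void DisplayText::insertLineSpacer()
{
    // Original spacer text, replaced by a timestamp for this test:
    // QString tt = "";
    QTime time = QTime::currentTime();  // needs #include <QTime>
    QString tt = time.toString();       // current time as text
    QString bg = "#d3d3d3";             // background color of the spacer line
    _insertText(tt, bg);
}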
So I have time markers. Putting '-mtune=native' ONLY in the Fortran 
flags reduces the time from 5 to 4 minutes over the 98 cycles (the first 
and last are excluded). This confirms your table of decoder times.
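
Turning such markers into per-cycle times is simple arithmetic. The Python sketch below assumes the markers are hh:mm:ss strings that all fall on the same day; the function name and the sample markers are made up for illustration and are not measurements from this test.

```python
from datetime import datetime


def cycle_seconds(markers, skip_ends=True):
    """Seconds between consecutive hh:mm:ss markers, optionally dropping the first and last."""
    stamps = [datetime.strptime(text, "%H:%M:%S") for text in markers]
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]
    return gaps[1:-1] if skip_ends else gaps


# Made-up markers for illustration only.
example = ["21:00:00", "21:00:05", "21:00:08", "21:00:11", "21:00:20"]
inner = cycle_seconds(example)
print(inner, sum(inner) / len(inner))
```

Dropping the first and last intervals keeps start-up and shut-down effects out of the average.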
I think some other factor influences the program (other hardware? 
other libraries? other video drivers?).
I will try to analyze what happens between decodes to understand the 
strange difference on some machines.
I will keep you informed, as I am very curious.

Best regards

73 Sandro IW3RAB

Il 17/12/2014 16:13, Joe Taylor ha scritto:
> Hi Alessandro,
>
> I replicated your tests as exactly as possible, modifying
> CMAKE_CXX_FLAGS and General_FFLAGS by the addition of "-mtune=native".
> Using WSJT-X and the "Shift+F6" command, the sequence of ten files
> (01.wav, 02.wav, ... 10.wav) was processed in 21 seconds with or without
> the addition of "-mtune=native" before building the program from
> scratch.  I could find no measurable difference in execution speed for
> the two cases.  Certainly they were the same to within 1 second.
>
> I note also that the total execution time is very nearly the same as
> what I reported yesterday for the execution of jt9[.exe] from the
> command line.  Almost all of the CPU-intensive "number crunching" in
> WSJT-X occurs in the Fortran code in jt9.  Other tasks such as display
> of graphical information and decoded text, writing output files, etc.,
> make comparatively trivial demands on CPU resources.
>
> It remains a mystery to me why you have seen large differences in
> execution speed after adding the compiler flag "-mtune=native".
>
>   -- 73, Joe, K1JT


Re: [wsjt-devel] Crazy test

2014-12-18 Thread KI7MT
Hello All,

I added a Windows script (compliments of StackOverflow) to the JTSDK v2.0.0
\scripts directory that allows timing of commands.

You need to run the update first to get the new file, then:

[1] Open JTSDK-QT
[2] cd /d < WSJTX JT9.exe location >
[3] Run the command < using Joe's example below >:

I copied 10 of the example wave files to wsjtx\Release\bin\tmp5\

timecmd jt9 -p 1 -d 3 .\tmp5\0?.wav
..
..
..
Command Took: 0:0:29.86 (29.86s total)

It is similar to using the time command in Linux ( time  ).
Just thought it may be useful for what you're doing here.
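
On Linux, the equivalent measurement might look like the sketch below.
It assumes jt9 is on the PATH and that the sample file 130610_2343.wav is
in the current directory; make_copies is a made-up helper for this
illustration, not part of JTSDK.

```shell
#!/bin/sh
# make_copies SRC DIR [N]
# Copy SRC into DIR as 00.wav, 01.wav, ... (N copies, default 10).
# Hypothetical helper for illustration; not part of JTSDK or WSJT-X.
make_copies() {
    src=$1
    dir=$2
    n=${3:-10}
    mkdir -p "$dir" || return 1
    i=0
    while [ "$i" -lt "$n" ]; do
        cp "$src" "$dir/$(printf '%02d' "$i").wav" || return 1
        i=$((i + 1))
    done
}

# Usage (assumes jt9 is on the PATH):
#   make_copies 130610_2343.wav ./tmp5
#   time jt9 -p 1 -d 3 ./tmp5/0?.wav
```

With the default of ten copies the files are 00.wav to 09.wav, so the
pattern 0?.wav from Joe's command matches all of them.
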

73's
Greg, KI7MT
On 12/17/2014 16:04, Joe Taylor wrote:
> Mike --
> 
>> I didn't quite realize that jt9 didn't use the dll.
> 
> What dll are you referring to?  If you mean the FFTW library, jt9[.exe] 
> definitely *does* use it.
> 
>> What's the command line to run jt9?
> 
> Type jt9 by itself at the command prompt, to get a brief "usage"
> message.  For example:
> 
> C:\JTSDK-QT\wsjtx\install\Release\bin>jt9
>   Usage: jt9 -p TRperiod [-d ndepth] [-f rxfreq] {-w patience] -e exe_dir file1 [file2 ...]
>  Reads data from *.wav files.
> 
>  jt9 -s  [-w patience] -e exe_dir -a data_dir -t temp_dir
>  Gets data from shared memory region with key==
> 
>> Not sure what args to give since neither of you mentioned what you're
>> passing to it.
> 
> My tests used the command
> 
>   jt9 -p 1 -d 3 /tmp5/0?.wav
> 
> Directory /tmp5 contained the files 00.wav, 01.wav, ... 09.wav -- ten 
> files in all, each a copy of the example file 130610_2343.wav.
> 
>   -- Joe, K1JT

-- 
73's
Greg, KI7MT



Re: [wsjt-devel] Crazy test

2014-12-18 Thread Alessandro Gorobey
Great idea,
this reminds me of the timex command back in 1985.
Tomorrow night I will update JTSDK.

By the way, the execution time difference seems to be significant only on
non-desktop computers (Intel turbo mode? video drivers/hardware? DirectX?).

Merry Christmas to all

73 de Sandro IW3RAB




Re: [wsjt-devel] Crazy test understood

2014-12-22 Thread Alessandro Gorobey
Hi All,

I now understand what happens on my Core i5-2430M.

See also:
http://en.wikipedia.org/wiki/List_of_Intel_Core_i5_microprocessors
http://en.wikipedia.org/wiki/Intel_Turbo_Boost (outdated but still valid)

The solution is very simple: do not try any benchmark.
Or rather, do not try any repetitive benchmark that heats up the CPU.
I found a program that shows some hidden parameters, such as power, clock
and temperature.

The processor frequency ranges from 600 MHz (idle) to 2.4 GHz (max), and
up to nearly 3 GHz with Turbo Boost enabled, but it depends on external
parameters.
If the temperature goes too high or the power consumption exceeds certain
limits, the clock goes down.

It seems that some instructions need less power, hence the strange results.
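
One way to check whether throttling is distorting a benchmark is to time
each pass separately: if later passes are clearly slower than the first
ones, the clock has probably dropped. A rough sketch for a Unix-like
shell follows. The name time_passes is made up for this illustration, and
the resolution is whole seconds because it relies on date +%s, which is
widely available but may not exist on every system.

```shell
#!/bin/sh
# time_passes N CMD [ARGS...]
# Run CMD N times and print the wall-clock time of each pass in whole
# seconds. Output of CMD itself is discarded.
# Hypothetical helper for illustration only.
time_passes() {
    n=$1
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        t0=$(date +%s)
        "$@" >/dev/null 2>&1
        t1=$(date +%s)
        echo "pass $i: $((t1 - t0)) s"
        i=$((i + 1))
    done
}

# Usage (assumes jt9 is on the PATH and ./tmp5 holds the test files):
#   time_passes 5 jt9 -p 1 -d 3 ./tmp5/0?.wav
```

Steady per-pass times suggest the clock is stable; times that grow from
pass to pass point to the CPU heating up and slowing down.
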

As I wrote a few days ago, on desktop computers and processors the
difference is small, as in the tables mailed by Joe.

The GPU also does some energy saving (battery operation) to keep the PC
temperature low or within certain limits (the infamous maximum junction
temperature, as for the CPU).

Pressurizing the bottom of the PC like a tube with a very powerful fan
(power consumption double that of the PC, and noise like a jet) minimizes
the problem.

I apologize for the message

Merry Christmas and Happy New Year to All

73
Sandro
IW3RAB




--
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net
___
wsjt-devel mailing list
wsjt-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/wsjt-devel


Re: [wsjt-devel] Crazy test understood

2014-12-22 Thread Joe Taylor
Hi Sandro,

Thanks for your latest report.  Clearly the mystery has been solved!

There is certainly no need to apologize for starting this thread.  Your 
reports are always clear and informative.  I learned something from the 
follow-ups that several of us did, and perhaps others did too.

Best wishes of the season to all!

-- 73, Joe, K1JT

