Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-04 Thread Michael Black
I sent this in via HTML and it got blocked...so here it is plain text...
Mike W9MDB

I did a 20-pass run on the last two revisions of interest.  I have a dual
4-core CPU, so r4928 apparently had 8 threads available, cut to 2 threads
in r4930.  The result is an ever-so-slight improvement with 1 thread (-m 1).
The 2-thread runs (-m 2) got worse, though they were already slower than
the 1-thread runs to start with.
Col1 = TimeMem-1.0.exe jt9_omp -p 1 -d 3 -w 2 -m 1 130610_2343.wav | grep
Elapsed | cut -f2 -d:
Col2 = TimeMem-1.0.exe jt9_omp -p 1 -d 3 -w 2 -m 2 130610_2343.wav | grep
Elapsed | cut -f2 -d:

Rev      1 thread  2 threads  %Diff
4928     1.13      1.14       -0.88%
         1.08      1.13       -4.63%
         1.1       1.2        -9.09%
         1.1       1.13       -2.73%
         1.19      1.19       0.00%
         1.12      1.1        1.79%
         1.1       1.14       -3.64%
         1.08      1.13       -4.63%
         1.11      1.18       -6.31%
         1.1       1.12       -1.82%
         1.09      1.12       -2.75%
         1.1       1.13       -2.73%
         1.09      1.18       -8.26%
         1.1       1.12       -1.82%
         1.09      1.17       -7.34%
         1.08      1.2        -11.11%
         1.09      1.31       -20.18%
         1.11      1.23       -10.81%
         1.1       1.24       -12.73%
         1.09      1.19       -9.17%
Average  1.1025    1.1675     -5.94%

Rev      1 thread  2 threads  %Diff
4930     1.1       1.28       -16.36%
         1.08      1.21       -12.04%
         1.08      1.2        -11.11%
         1.1       1.22       -10.91%
         1.08      1.23       -13.89%
         1.08      1.22       -12.96%
         1.09      1.22       -11.93%
         1.07      1.23       -14.95%
         1.09      1.23       -12.84%
         1.13      1.22       -7.96%
         1.09      1.22       -11.93%
         1.08      1.22       -12.96%
         1.08      1.25       -15.74%
         1.08      1.22       -12.96%
         1.11      1.24       -11.71%
         1.09      1.22       -11.93%
         1.11      1.24       -11.71%
         1.1       1.22       -10.91%
         1.08      1.2        -11.11%
         1.09      1.2        -10.09%
Average  1.0905    1.2245     -12.30%
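
For anyone who wants to reproduce the %Diff column: the figures are consistent with (t1 - t2) / t1 * 100, where t1 and t2 are the 1-thread and 2-thread times, and the Average row's %Diff looks like the mean of the per-run values rather than the difference of the averaged times. That is inferred from the numbers, not stated by the author. A minimal C sketch under that assumption (the function names are made up for illustration):

```c
#include <stddef.h>

/* Percent difference of the 2-thread time relative to the 1-thread time.
   A negative value means the 2-thread run was slower. */
double percent_diff(double t1, double t2)
{
    return (t1 - t2) / t1 * 100.0;
}

/* Mean of the per-run percent differences over n paired runs.
   Returns 0 when n is 0. */
double mean_percent_diff(const double *t1, const double *t2, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += percent_diff(t1[i], t2[i]);
    return n ? sum / (double)n : 0.0;
}
```

With the first r4928 row (1.13 and 1.14) this gives about -0.88, matching the table.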

-----Original Message-----
From: Bill Somerville [mailto:g4...@classdesign.com] 
Sent: Wednesday, February 04, 2015 9:31 AM
To: wsjt-devel@lists.sourceforge.net
Subject: Re: [wsjt-devel] WSJT-X Decoder Performance

On 04/02/2015 15:27, Joe Taylor wrote:
> Hi Bill,
Hi Joe,

> OK, by all means go ahead.
>
> BTW: I notice that jt9_omp.exe r4929 always runs with 4 threads on my
> 4-core machine.  Since we have only two tasks running in parallel, I
> can see little reason to use more than 2 threads.  Should we specify
> two threads explicitly?
Yes, I have addressed that as well.

> -- Joe
73
Bill
G4WJS.

> On 2/4/2015 10:24 AM, Bill Somerville wrote:
>> On 04/02/2015 15:21, Joe Taylor wrote:
>>> Hi Bill and all,
>> Hi Joe,
>>
>> <snip>
>>> Note that decoder.f90 now decodes the two modes in parallel sections
>>> *ONLY* if txmode is JT9.  I will fix this.
>> Joe, I already have this in hand, I can check it in if you wish.
>>
>> <snip>
>>> -- Joe
>> 73
>> Bill
>> G4WJS.

 -
 - Dive into the World of Parallel Programming. The Go 
 Parallel Website, sponsored by Intel and developed in partnership 
 with Slashdot Media, is your hub for all things parallel software 
 development, from weekly thought leadership blogs to news, videos, 
 case studies, tutorials and more. Take a look and join the 
 conversation now. http://goparallel.sourceforge.net/
 ___
 wsjt-devel mailing list
 wsjt-devel@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/wsjt-devel

Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-04 Thread Michael Black
I did the same 20-pass run on my Windows 10 HP Envy with an i7-4702MQ @ 2.2 GHz
(compare to the dual X5450 4-core CPU at 3 GHz) -- you can see that GHz doesn't
tell the whole story...
With 2 threads on Windows 10 I see a long run once in a great while.

Rev      1 thread  2 threads  %Diff
4930     0.79      0.82       -3.80%
         0.78      0.79       -1.28%
         0.78      0.79       -1.28%
         0.75      0.78       -4.00%
         0.78      0.78       0.00%
         0.77      0.8        -3.90%
         0.78      0.78       0.00%
         0.77      0.79       -2.60%
         0.78      0.79       -1.28%
         0.79      0.82       -3.80%
         0.8       0.78       2.50%
         0.76      0.77       -1.32%
         0.77      0.8        -3.90%
         0.78      0.83       -6.41%
         0.8       0.82       -2.50%
         0.78      0.79       -1.28%
         0.78      0.78       0.00%
         0.78      0.78       0.00%
         0.77      0.78       -1.30%
         0.77      0.78       -1.30%
Average  0.778     0.7925     -1.87%





Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-04 Thread Michael Black
I removed all the flush(6) calls except the one in decoder.f90.  There was an
unprotected one in jt9c.f90, which may explain the long runtimes I see
once in a great while on my Windows 10 system.  The last long runtime was 7
seconds, using 2 threads, before I removed the flushes.
I am now running a loop test to see whether any long run times show up on
either of my computers.
Mike W9MDB
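#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    /* Time one 2-thread decode; the elapsed seconds land in doit.txt. */
    const char *cmd = "TimeMem-1.0.exe jt9_omp -p 1 -d 3 -w 2 -m 2 "
                      "130610_2343.wav | grep Elapsed | cut -f2 -d: > doit.txt";
    double total = 0;
    int n = 0;
    char buf[4096];

    (void)argc;
    (void)argv;
    while (1) {
        system(cmd);
        FILE *fp = fopen("doit.txt", "r");
        if (fp == NULL)
            continue;               /* no output this pass; try again */
        if (fgets(buf, sizeof(buf), fp) == NULL)
            buf[0] = '\0';          /* empty file: atof() yields 0 */
        fclose(fp);
        double sec = atof(buf);
        ++n;
        total += sec;
        double avg = total / n;
        /* Report any run more than 50% slower than the running average. */
        if (sec > avg * 1.5) {
            printf("long run %.2f avg=%.2f\n", sec, avg);
        }
        printf("%d\r", n);          /* progress counter on one line */
        fflush(stdout);
    }
}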

Looking at how the output comes out of jt9_omp, it appears to me that these
flushes are not necessary, since each line is being flushed anyway.
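
As background on what a "protected" flush could look like: the real code is Fortran (decoder.f90, jt9c.f90) and is not shown in this thread, so the following is only a C analogue under the assumption that "unprotected" means "not inside an OpenMP critical section". The function name and the section name decoder_output are hypothetical.

```c
#include <stdio.h>

/* Prints one line and flushes stdout inside a named critical section, so
   that with OpenMP enabled only one thread at a time runs the print+flush
   pair.  Without OpenMP the pragma is ignored and this is a plain print.
   Returns the value returned by printf. */
int emit_line(const char *line)
{
    int rc;
    #pragma omp critical(decoder_output)
    {
        rc = printf("%s\n", line);
        fflush(stdout);
    }
    return rc;
}
```

Whether such a guard is needed at all depends on how the runtime buffers output, which is the open question in this message.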

Not really any change in the timing
Mike W9MDB

         with flush                     !flush (flushes removed)
Rev      1 thread  2 threads  %Diff     1 thread  2 threads  %Diff
4930     1.1       1.28       -16.36%   1.11      1.21       -9.01%
         1.08      1.21       -12.04%   1.11      1.22       -9.91%
         1.08      1.2        -11.11%   1.09      1.22       -11.93%
         1.1       1.22       -10.91%   1.07      1.22       -14.02%
         1.08      1.23       -13.89%   1.07      1.23       -14.95%
         1.08      1.22       -12.96%   1.14      1.22       -7.02%
         1.09      1.22       -11.93%   1.08      1.23       -13.89%
         1.07      1.23       -14.95%   1.08      1.24       -14.81%
         1.09      1.23       -12.84%   1.09      1.25       -14.68%
         1.13      1.22       -7.96%    1.1       1.26       -14.55%
         1.09      1.22       -11.93%   1.09      1.26       -15.60%
         1.08      1.22       -12.96%   1.08      1.26       -16.67%
         1.08      1.25       -15.74%   1.09      1.21       -11.01%
         1.08      1.22       -12.96%   1.09      1.23       -12.84%
         1.11      1.24       -11.71%   1.06      1.26       -18.87%
         1.09      1.22       -11.93%   1.09      1.24       -13.76%
         1.11      1.24       -11.71%   1.07      1.23       -14.95%
         1.1       1.22       -10.91%   1.08      1.24       -14.81%
         1.08      1.2        -11.11%   1.07      1.22       -14.02%
         1.09      1.2        -10.09%   1.08      1.21       -12.04%
Avg      1.0905    1.2245     -12.30%   1.087     1.233      -13.47%





Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-04 Thread Michael Black
I have been testing the r4928 jt9_omp on my Windows 10 box from the command
line.  I'm getting periodic long runs of 40-100 seconds, as if it
were computing the FFT wisdom again or some such.

There are a lot more page faults when that happens, too.

I haven't seen this behavior on Windows 7 yet.

Mike W9MDB








Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-04 Thread Michael Black
I did a 20-pass run on the last two revisions of interest.  I have a dual
4-core CPU, so r4928 apparently had 8 threads available, cut to 2 threads
in r4930.  The result is an ever-so-slight improvement with 1 thread (-m 1).
The 2-thread runs (-m 2) got worse, though they were already slower than
the 1-thread runs to start with.

Col1 = TimeMem-1.0.exe jt9_omp -p 1 -d 3 -w 2 -m 1 130610_2343.wav | grep
Elapsed | cut -f2 -d:

Col2 = TimeMem-1.0.exe jt9_omp -p 1 -d 3 -w 2 -m 2 130610_2343.wav | grep
Elapsed | cut -f2 -d:

 

 


Rev      1 thread  2 threads  %Diff
4928     1.13      1.14       -0.88%
         1.08      1.13       -4.63%
         1.1       1.2        -9.09%
         1.1       1.13       -2.73%
         1.19      1.19       0.00%
         1.12      1.1        1.79%
         1.1       1.14       -3.64%
         1.08      1.13       -4.63%
         1.11      1.18       -6.31%
         1.1       1.12       -1.82%
         1.09      1.12       -2.75%
         1.1       1.13       -2.73%
         1.09      1.18       -8.26%
         1.1       1.12       -1.82%
         1.09      1.17       -7.34%
         1.08      1.2        -11.11%
         1.09      1.31       -20.18%
         1.11      1.23       -10.81%
         1.1       1.24       -12.73%
         1.09      1.19       -9.17%
Average  1.1025    1.1675     -5.94%

Rev      1 thread  2 threads  %Diff
4930     1.1       1.28       -16.36%
         1.08      1.21       -12.04%
         1.08      1.2        -11.11%
         1.1       1.22       -10.91%
         1.08      1.23       -13.89%
         1.08      1.22       -12.96%
         1.09      1.22       -11.93%
         1.07      1.23       -14.95%
         1.09      1.23       -12.84%
         1.13      1.22       -7.96%
         1.09      1.22       -11.93%
         1.08      1.22       -12.96%
         1.08      1.25       -15.74%
         1.08      1.22       -12.96%
         1.11      1.24       -11.71%
         1.09      1.22       -11.93%
         1.11      1.24       -11.71%
         1.1       1.22       -10.91%
         1.08      1.2        -11.11%
         1.09      1.2        -10.09%
Average  1.0905    1.2245     -12.30%

 



Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-04 Thread Claude Frantz
On 02/04/2015 08:42 AM, Claude Frantz wrote:

> Please see here the result I have got with SVNVERSION 4928.

I'm very sorry, I used the wrong executables. Here is the right output 
now:

$ time ./jt9 -p 1 -d 3 -w 2 -m 1 
/home/claude/.wsjtx/bin/save/samples/130610_2343.wav
2343  -9  0.3 3196 @ WB8QPG IZ0MIT -11
2343 -18  1.0 3372 @ KK4HEG KE0CO CN87
2343  14  0.1 3490 @ CQ AG4M EM75
2343 -20 -1.3 3567 @ CQ TA4A KM37
2343 -15  0.1 3627 @ CT1FBK IK5YZT R+02
2343 -23  0.3 3721 @ KF5SLN KB1SUA FN42
2343 -16  0.2 3774 @ CQ M0ABA JO01
2343  -2  0.2 3843 @ EI3HGB DD2EE JO31
2343 -20  0.3  718 # VE6WQ SQ2NIJ -14
2343  -7  0.3  815 # KK4DSD W7VP -16
2343 -10  0.5  975 # CQ DL7ACA JO40
2343  -9  0.8 1089 # N2SU W0JMW R-14
2343 -11  0.8 1259 # YV6BFE F6GUU R-08
2343  -9  1.7 1471 # VA3UG F1HMR 73
2343  -1  0.6 1718 # BG THX JOE 73
2343 -15  1.3 1951 # RA3Y VE3NLS 73
2343 -20  0.4 2065 # K2OI AJ4UU R-20
DecodeFinished   0   1

real    0m2.407s
user    0m2.324s
sys     0m0.073s



$ time ./jt9_omp -p 1 -d 3 -w 2 -m 1 
/home/claude/.wsjtx/bin/save/samples/130610_2343.wav
2343 -20  0.3  718 # VE6WQ SQ2NIJ -14
2343  -9  0.3 3196 @ WB8QPG IZ0MIT -11
2343  -7  0.3  815 # KK4DSD W7VP -16
2343 -18  1.0 3372 @ KK4HEG KE0CO CN87
2343 -10  0.5  975 # CQ DL7ACA JO40
2343  -9  0.8 1089 # N2SU W0JMW R-14
2343 -11  0.8 1259 # YV6BFE F6GUU R-08
2343  14  0.1 3490 @ CQ AG4M EM75
2343 -20 -1.3 3567 @ CQ TA4A KM37
2343  -9  1.7 1471 # VA3UG F1HMR 73
2343 -15  0.1 3627 @ CT1FBK IK5YZT R+02
2343 -23  0.3 3721 @ KF5SLN KB1SUA FN42
2343 -16  0.2 3774 @ CQ M0ABA JO01
2343  -2  0.2 3843 @ EI3HGB DD2EE JO31
2343  -1  0.6 1718 # BG THX JOE 73
2343 -15  1.3 1951 # RA3Y VE3NLS 73
2343 -20  0.4 2065 # K2OI AJ4UU R-20
DecodeFinished   0   1

real    0m1.663s
user    0m2.502s
sys     0m0.090s
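
One way to read these two `time` reports is to compare CPU time (user + sys) with wall-clock time (real). For jt9 the ratio is (2.324 + 0.073) / 2.407, about 1.00, while for jt9_omp it is (2.502 + 0.090) / 1.663, about 1.56, which suggests that part of the jt9_omp run used two cores at once. This reading is an interpretation of the figures, not something stated in the thread. A small C helper for the arithmetic (the name is illustrative):

```c
/* Ratio of CPU time (user + sys) to wall-clock time (real), in seconds.
   A value near 1 suggests one busy core; a value above 1 suggests that
   work ran on more than one core at the same time.
   Returns 0 when real_s is not positive. */
double cpu_to_wall_ratio(double real_s, double user_s, double sys_s)
{
    if (real_s <= 0.0)
        return 0.0;
    return (user_s + sys_s) / real_s;
}
```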





Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-04 Thread Joe Taylor
Hi Claude,

Thanks for your timing report.

Your first test may have used the correct executables, too.  To get a 
good test you must run a configuration at least twice.  In the first 
run, the program accumulates wisdom about the best way to configure 
the FFT calculations.  This wisdom is saved and used for subsequent 
runs.  If you change the -w # or -m # parameters, new wisdom will 
need to be accumulated.
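
The run-it-at-least-twice advice above can be built into a timing harness by doing one or more untimed warm-up runs before the timed ones. The following C sketch shows the idea only; the callback type, the function names, and the use of the C11 timespec_get clock are choices made for this illustration and are not part of WSJT-X.

```c
#include <stdlib.h>
#include <time.h>

/* One benchmark iteration; returns 0 on success (hypothetical type). */
typedef int (*bench_fn)(void *ctx);

/* Example callback: runs the shell command passed as ctx via system(). */
int run_command(void *ctx)
{
    return system((const char *)ctx) == 0 ? 0 : 1;
}

/* Stand-in callback for dry runs: only counts how often it was called. */
int count_call(void *ctx)
{
    ++*(int *)ctx;
    return 0;
}

/* Wall-clock seconds from the C11 timespec_get clock. */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Calls run() `warmups` times untimed, so that one-time setup such as
   wisdom accumulation is excluded, then `runs` times timed.
   Returns the mean seconds per timed run, or -1.0 on error. */
double mean_time_after_warmup(bench_fn run, void *ctx, int warmups, int runs)
{
    if (run == NULL || runs <= 0)
        return -1.0;
    for (int i = 0; i < warmups; ++i)
        if (run(ctx) != 0)
            return -1.0;
    double start = now_seconds();
    for (int i = 0; i < runs; ++i)
        if (run(ctx) != 0)
            return -1.0;
    return (now_seconds() - start) / runs;
}
```

A call such as mean_time_after_warmup(run_command, cmd, 1, 20), with cmd holding the jt9 command line, would discard the first run and average the next 20. A fresh warm-up is needed whenever the -w or -m setting changes.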

-- 73, Joe, K1JT




Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-04 Thread Bill Somerville
On 04/02/2015 13:37, Joe Taylor wrote:
> Hi Claude,
Hi Claude & Joe,

> Thanks for your timing report.
>
> Your first test may have used the correct executables, too.  To get a
> good test you must run a configuration at least twice.  In the first
> run, the program accumulates wisdom about the best way to configure
> the FFT calculations.  This wisdom is saved and used for subsequent
> runs.  If you change the -w # or -m # parameters, new wisdom will
> need to be accumulated.
That is also the case if the number of threads (the '-m #' option) is 
changed.

> -- 73, Joe, K1JT
73
Bill
G4WJS.





Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-04 Thread Joe Taylor
Hi Bill and all,

Tests here suggest that r4929 produces a Windows jt9_omp.exe that runs 
correctly.  At least, it runs to completion on my sequence of 25 test 
files -- which r4928 does not.

Timing results on a 4-core Win7 machine:

Params       jt9      jt9_omp
------------------------------
-w 2 -m 1    25.5 s   21.1 s
-w 2 -m 2    24.9     21.0

When using OpenMP to run JT9 and JT65 decoders in parallel, we gain 
almost nothing by using multi-threading for the FFTW plans.

Note that decoder.f90 now decodes the two modes in parallel sections 
*ONLY* if txmode is JT9.  I will fix this.

I may also look for additional places where concurrent processing could 
help performance... but I don't consider this a very high priority.

-- Joe



Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-04 Thread Bill Somerville
On 04/02/2015 15:21, Joe Taylor wrote:
> Hi Bill and all,
Hi Joe,

<snip>
> Note that decoder.f90 now decodes the two modes in parallel sections
> *ONLY* if txmode is JT9.  I will fix this.
Joe, I already have this in hand, I can check it in if you wish.

<snip>
> -- Joe
73
Bill
G4WJS.



Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-04 Thread Joe Taylor
Hi Bill,

OK, by all means go ahead.

BTW: I notice that jt9_omp.exe r4929 always runs with 4 threads on my 
4-core machine.  Since we have only two tasks running in parallel, I can 
see little reason to use more than 2 threads.  Should we specify two 
threads explicitly?
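
For readers who have not used OpenMP sections: capping the team at two threads for two independent tasks can be written with a num_threads clause. The actual decoder is Fortran (decoder.f90) and is not reproduced here, so this is only a C sketch of the general pattern; decode_jt9 and decode_jt65 are hypothetical stand-ins that do no real decoding.

```c
/* Hypothetical stand-ins for the two decoders; each only records that
   it ran. */
static void decode_jt9(int *done)  { *done = 1; }
static void decode_jt65(int *done) { *done = 1; }

/* Runs the two independent decode tasks on at most two OpenMP threads.
   Without OpenMP enabled the pragmas are ignored and the tasks simply
   run one after the other.  Returns the number of tasks completed. */
int decode_both(void)
{
    int jt9_done = 0, jt65_done = 0;

    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        decode_jt9(&jt9_done);

        #pragma omp section
        decode_jt65(&jt65_done);
    }
    /* The implicit barrier at the end of the region guarantees that
       both flags are set before they are read here. */
    return jt9_done + jt65_done;
}
```

Each section writes only its own flag, so no extra synchronization is needed inside the region.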

-- Joe

On 2/4/2015 10:24 AM, Bill Somerville wrote:
> On 04/02/2015 15:21, Joe Taylor wrote:
>> Hi Bill and all,
> Hi Joe,
>
> <snip>
>> Note that decoder.f90 now decodes the two modes in parallel sections
>> *ONLY* if txmode is JT9.  I will fix this.
> Joe, I already have this in hand, I can check it in if you wish.
>
> <snip>
>> -- Joe
> 73
> Bill
> G4WJS.



Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-04 Thread Bill Somerville
On 04/02/2015 15:27, Joe Taylor wrote:
> Hi Bill,
Hi Joe,

> OK, by all means go ahead.
>
> BTW: I notice that jt9_omp.exe r4929 always runs with 4 threads on my
> 4-core machine.  Since we have only two tasks running in parallel, I can
> see little reason to use more than 2 threads.  Should we specify two
> threads explicitly?
Yes, I have addressed that as well.

> -- Joe
73
Bill
G4WJS.

> On 2/4/2015 10:24 AM, Bill Somerville wrote:
>> On 04/02/2015 15:21, Joe Taylor wrote:
>>> Hi Bill and all,
>> Hi Joe,
>>
>> <snip>
>>> Note that decoder.f90 now decodes the two modes in parallel sections
>>> *ONLY* if txmode is JT9.  I will fix this.
>> Joe, I already have this in hand, I can check it in if you wish.
>>
>> <snip>
>>> -- Joe
>> 73
>> Bill
>> G4WJS.



Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-03 Thread Claude Frantz
Please see here the result I have got with SVNVERSION 4928.

I have suppressed the -m flag because the software rejects it.

Best 88 de Claude


  $ time ./jt9 -p 1 -d 3 -w 2 -e . 
/home/claude/.wsjtx/bin/save/samples/130610_2343.wav
2343  -7  0.3 3196 @ WB8QPG IZ0MIT -11
2343 -16  1.0 3372 @ KK4HEG KE0CO CN87
2343  16  0.1 3490 @ CQ AG4M EM75
2343 -18 -1.3 3567 @ CQ TA4A KM37
2343 -14  0.1 3627 @ CT1FBK IK5YZT R+02
2343 -22  0.3 3721 @ KF5SLN KB1SUA FN42
2343 -15  0.2 3774 @ CQ M0ABA JO01
2343  -1  0.2 3843 @ EI3HGB DD2EE JO31
2343 -20  0.3  718 # VE6WQ SQ2NIJ -14
2343  -7  0.3  815 # KK4DSD W7VP -16
2343 -10  0.5  975 # CQ DL7ACA JO40
2343  -9  0.8 1089 # N2SU W0JMW R-14
2343 -11  0.8 1259 # YV6BFE F6GUU R-08
2343  -9  1.7 1471 # VA3UG F1HMR 73
2343  -1  0.6 1718 # BG THX JOE 73
2343 -15  1.3 1951 # RA3Y VE3NLS 73
2343 -20  0.4 2065 # K2OI AJ4UU R-20
DecodeFinished   1   1

real    0m30.057s
user    0m27.397s
sys     0m1.847s



$ time /home/claude/ham/JoeTaylor/wsjtx/build/jt9_omp -p 1 -d 3 -w 2 -e 
.   /home/claude/.wsjtx/bin/save/samples/130610_2343.wav
2343  -9  0.3 3196 @ WB8QPG IZ0MIT -11
2343 -18  1.0 3372 @ KK4HEG KE0CO CN87
2343  14  0.1 3490 @ CQ AG4M EM75
2343 -20 -1.3 3567 @ CQ TA4A KM37
2343 -15  0.1 3627 @ CT1FBK IK5YZT R+02
2343 -23  0.3 3721 @ KF5SLN KB1SUA FN42
2343 -16  0.2 3774 @ CQ M0ABA JO01
2343  -2  0.2 3843 @ EI3HGB DD2EE JO31
2343 -20  0.3  718 # VE6WQ SQ2NIJ -14
2343  -7  0.3  815 # KK4DSD W7VP -16
2343 -10  0.5  975 # CQ DL7ACA JO40
2343  -9  0.8 1089 # N2SU W0JMW R-14
2343 -11  0.8 1259 # YV6BFE F6GUU R-08
2343  -9  1.7 1471 # VA3UG F1HMR 73
2343  -1  0.6 1718 # BG THX JOE 73
2343 -15  1.3 1951 # RA3Y VE3NLS 73
2343 -20  0.4 2065 # K2OI AJ4UU R-20
DecodeFinished   0   1

real    1m57.819s
user    1m52.374s
sys     0m4.787s



$ uname -a
Linux defi 3.18.3-201.fc21.i686+PAE #1 SMP Mon Jan 19 16:09:58 UTC 2015 
i686 i686 i386 GNU/Linux

# lshw

 description: Notebook
 product: P50IJ
 vendor: ASUSTeK Computer Inc.
 version: 1.0
 serial: 103144040038
 width: 32 bits
 capabilities: smbios-2.5 dmi-2.5 smp-1.4 smp
 configuration: chassis=notebook cpus=2

 *-cpu:0
   description: CPU
   product: Core 2 Duo (PPN12345678901234567)
   vendor: Intel Corp.
   physical id: 4
   bus info: cpu@0
   version: 6.7.10
   serial: 0001-067A----
   slot: Socket 478
   size: 2101MHz
   capacity: 2101MHz
   width: 64 bits
   clock: 200MHz
   capabilities: x86-64 boot fpu fpu_exception wp vme de pse tsc 
msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi 
mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon pebs bts 
aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm 
sse4_1 xsave lahf_lm ida dtherm tpr_shadow vnmi flexpriority cpufreq
   configuration: cores=2 enabledcores=2 id=0 threads=2
 *-cache:0
  description: L1 cache
  physical id: 5
  slot: L1-Cache
  size: 64KiB
  capacity: 64KiB
  capabilities: internal write-back data
 *-cache:1
  description: L2 cache
  physical id: 7
  slot: L2-Cache
  size: 2MiB
  capacity: 2MiB
  capabilities: internal write-back unified
 *-logicalcpu:0
  description: Logical CPU
  physical id: 0.1
  width: 64 bits
  capabilities: logical
 *-logicalcpu:1
  description: Logical CPU
  physical id: 0.2
  width: 64 bits
  capabilities: logical
  *-cache
   description: L1 cache
   physical id: 6
   slot: L1-Cache
   size: 64KiB
   capacity: 64KiB
   capabilities: internal write-back instruction
  *-memory
   description: System Memory
   physical id: 1d
   slot: System board or motherboard
   size: 2GiB
 *-bank:0
  description: SODIMM DDR2 Synchronous 667 MHz (1.5 ns)
  product: N/A
  vendor: N/A
  physical id: 0
  serial: N/A
  slot: SODIMM0
  size: 2GiB
  width: 64 bits
  clock: 667MHz (1.5ns)
 *-bank:1
  description: SODIMM [empty]
  product: N/A
  vendor: N/A
  physical id: 1
  serial: N/A
  slot: SODIMM1
  *-cpu:1
   physical id: 1
   bus info: cpu@1
   version: 6.7.10
   serial: 0001-067A----
   size: 2101MHz
   capacity: 2101MHz
   capabilities: vmx ht cpufreq
   configuration: id=1
 *-logicalcpu:0
  description: Logical CPU
  physical id: 1.1
  capabilities: logical
 

Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-03 Thread Joe Taylor
Dear Colleagues,

I have made some further performance tests of the decoders in WSJT-X.

I copied a collection of 25 *.wav files into a clean directory.  The 
files were recorded in *JT9+JT65* mode during a busy period of activity 
on 20 meters.  On average, around a dozen decodable signals are present 
in each file -- typically 7 or 8 JT65 signals and 4 or 5 JT9 signals.

My procedure was as follows:

1. Start the program.

2. Activate File | Erase ALL.TXT.

3. Activate File | Open and select first file in the test directory,
clicking the Open button exactly at the top of a UTC minute.

4. Allow decoding of the first file to finish, then hit Shift+F6 as
soon as the blue background has cleared from the *Decode* button.

5. The program then proceeds to decode the remaining 24 files.

6. Manually record the UTC when the last decode has finished, thereby
producing the total wall clock time to decode the 25 files.

7. Record the number of decoded lines in the file ALL.TXT.  (Don't
count the date line, at the top.)

8. Record the larger of the two bottom-line numbers from the file
timer.out.  This is the time that would be spent in the decoders at
the end of an Rx minute -- in this case, it is essentially the
wall-clock time minus the time spent reading files, producing the
waterfall, etc.
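
Step 7 above can be automated with a small helper that counts the lines in ALL.TXT and leaves out the date line. The C sketch below assumes that the date line is exactly the first line of the file, as it would be right after "Erase ALL.TXT"; the function name is made up for illustration.

```c
#include <stdio.h>

/* Counts the lines in a text file, excluding the first line (assumed to
   be the date line).  Returns -1 if the file cannot be opened. */
long count_decodes(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    long lines = 0;
    int c, last = '\n';
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            ++lines;
        last = c;
    }
    if (last != '\n')       /* final line without a trailing newline */
        ++lines;
    fclose(fp);

    return lines > 0 ? lines - 1 : 0;
}
```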

Here's a summary of my results:

Program Version      Wall Clock   Time      Decode    #
---------------------------------------------------------
v1.3 r3673           90 s         62.14 s   Deepest   290
v1.4.0-rc2, r4400    76           53.98     Deepest   302
v1.5, r4926          46           24.27     Deepest   309
v1.5, r4926          42           21.47     Normal    307
v1.5, r4926          40           20.14     Fast      305

Bottom line: The decoder in v1.5 r4926 is 2.2 to 3 times faster than the 
ones in v1.3 r3673 and v1.4.0-rc2, and it also decodes more signals.
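
The "2.2 to 3 times" range follows from the Time column as old time divided by new time: 53.98 / 24.27 is about 2.22 (v1.4.0-rc2 against v1.5 Deepest) and 62.14 / 20.14 is about 3.09 (v1.3 against v1.5 Fast). A one-line C helper for the same arithmetic, with an illustrative name:

```c
/* Speedup of a new timing relative to an old one; a value above 1 means
   the new version is faster.  Returns 0 when new_s is not positive. */
double speedup(double old_s, double new_s)
{
    return new_s > 0.0 ? old_s / new_s : 0.0;
}
```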

Note that in revision v1.5 r4926 we are not yet taking advantage of 
concurrent processing in the decoder, on computers with more than one 
CPU.  Further gains can probably be achieved, if we put the effort into it.

-- 73, Joe, K1JT

--
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/
___
wsjt-devel mailing list
wsjt-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/wsjt-devel


Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-03 Thread Claude Frantz
On 02/02/2015 08:45 PM, Joe Taylor wrote:

 The following table presents measurements of decoding speed for a number
 of tests using WSJT-X versions 1.3, 1.4.0-rc2, 1.5r4925, and 1.5r4926.
 Time gives the time in seconds to decode the sample file
 130610_2343.wav, which has 8 decodable JT9 signals and 9 decodable JT65
 signals.  Decode is the setting on the WSJT-X *Decode* menu.  The
 column labeled # gives the number of decoded signals.  (Note that
 selecting Deepest is required in order to decode one of the JT9
 signals.)

Hi Joe,

I think many of us would be interested in supplying results of the 
test from our own environments. Please give us a recommended procedure 
for running the test, so that the results are comparable. Perhaps 
such a test could be incorporated into the Makefile.

Best 88 de Claude



Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-03 Thread ki7mt
Hi Bill,

Thanks for the explanation. I was looking for a CMake module named OpenMP 
and didn't realize it was handled by a different find module.

It seems to be working here (JTSDK v2 and CMake 3.0.2):

snip
--
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp
--
snip

I'm not sure why I didn't see it initially, but, in any case, all seems 
to be working properly.

73's
Greg, KI7MT


On 2/3/2015 2:16 PM, Bill Somerville wrote:
 On 03/02/2015 21:04, Joe Taylor wrote:
 Hi Greg,
 Hi Greg & Joe,

 Yes, we need the -fopenmp flag to be set.  I think that's being done
 appropriately now in CMakeLists.txt, although I confess I'm not always
 confident that I've understood its syntax fully.  Bill should be able to
 give the definitive answer.
 Yes that is correct.

 The CMake script uses the package finder for OpenMP (part of the CMake
 distribution) to find the OpenMP capabilities of the compilers on the
 build platform. That sets the variable OPENMP_FOUND (or OPENMP-NOTFOUND
 if it's not available) along with the result variables OpenMP_C_FLAGS,
 OpenMP_CXX_FLAGS, and OpenMP_Fortran_FLAGS (actually this last one is
 only set by CMake v3.1 and later, so I have substituted the C compiler
 flag, as it is the same for the compilers we use at present). These flags
 are added to '*_omp' source compiles.

 The CMake script builds two versions of the internal static library
 target 'wsjt', the second target is 'wsjt_omp'. It also builds two
 versions of the 'jt9' target, the second being 'jt9_omp' which itself
 depends on the 'wsjt_omp' library.

 The 'wsjt' library is basically all the Fortran and C modules that are
 used by jt9, jt9code, jt9sim, jt65code and wsjt-x.

  -- Joe
 73
 Bill
 G4WJS.


Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-03 Thread Bill Somerville
On 03/02/2015 21:48, Michael Black wrote:

Hi Mike,
 I ran a test using a sample program.
 Compiled with
 gfortran -g -fopenmp -o omp1 omp1.f

 Every few runs it hangs after all threads have completed.
That program is not thread safe. All I/O statements need serializing to 
make it safe. Try adding:

!$omp critical(io)

immediately before each print or write and:

!$omp end critical(io)

immediately after each print or write.

It is also not a good example for another reason: it doesn't 
conditionally compile its OpenMP code, so it will not compile without 
OpenMP support. This is bad practice because one usually wants a program 
to give identical results (as far as possible) when run with and without 
multi-threading.
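
For comparison, here is a rough C analogue of both suggestions. It is only a sketch: add_vectors and thread_id are illustrative names, not taken from the sample program. Each output statement sits in a named critical section, and the OpenMP runtime calls are guarded by the standard _OPENMP macro so the same source also builds serially. The advice above concerns Fortran I/O, and C stdio may well behave differently, so treat this as showing the pattern only.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>          /* only present when compiled with OpenMP */
#endif

#define N 100

/* Return the calling thread's id, or 0 in a serial (non-OpenMP) build. */
static int thread_id(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Compute c = a + b element by element, printing each result.
   Returns the sum of all elements of c. */
double add_vectors(const float *a, const float *b, float *c, int n)
{
    double total = 0.0;
    int i;

    assert(a && b && c);

#pragma omp parallel for schedule(dynamic, 10) reduction(+:total)
    for (i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
        total += c[i];
#pragma omp critical(io)
        {
            /* only one thread at a time executes this output statement */
            printf("Thread %d: c[%d]= %f\n", thread_id(), i, c[i]);
        }
    }
    return total;
}
```

Built without -fopenmp the pragmas are ignored and the loop runs on one thread, computing the same values as the threaded build.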
 I haven't let them run to completion...I tried the jt9_omp and let it run
 and eventually it dies without an intelligent message after quite a few
 minutes.
jt9 is not yet suitable for multi-threaded decoding; I will be posting 
some amendments to deal with this when another unrelated issue has been 
tracked down.
 Running this program under gdb never hangs.
GDB will interfere with the thread scheduling. That is one of the 
gotchas of multi-threaded testing: programs often take a different 
path when debugged, or even when output statements are added to try to 
locate issues.

73
Bill
G4WJS.


C******************************************************************************
C FILE: omp_workshare1.f
C DESCRIPTION:
C   OpenMP Example - Loop Work-sharing - Fortran Version
C   In this example, the iterations of a loop are scheduled dynamically
C   across the team of threads.  A thread will perform CHUNK iterations
C   at a time before being scheduled for the next CHUNK of work.
C AUTHOR: Blaise Barney  5/99
C LAST REVISED: 01/09/04
C******************************************************************************

      PROGRAM WORKSHARE1

      INTEGER NTHREADS, TID, OMP_GET_NUM_THREADS,
     +  OMP_GET_THREAD_NUM, N, CHUNKSIZE, CHUNK, I
      PARAMETER (N=100)
      PARAMETER (CHUNKSIZE=10)
      REAL A(N), B(N), C(N)

!     Some initializations
      DO I = 1, N
        A(I) = I * 1.0
        B(I) = A(I)
      ENDDO
      CHUNK = CHUNKSIZE

!$OMP PARALLEL SHARED(A,B,C,NTHREADS,CHUNK) PRIVATE(I,TID)

      TID = OMP_GET_THREAD_NUM()
      IF (TID .EQ. 0) THEN
        NTHREADS = OMP_GET_NUM_THREADS()
        PRINT *, 'Number of threads =', NTHREADS
      END IF
      PRINT *, 'Thread',TID,' starting...'

!$OMP DO SCHEDULE(DYNAMIC,CHUNK)
      DO I = 1, N
        C(I) = A(I) + B(I)
        WRITE(*,100) TID,I,C(I)
 100    FORMAT(' Thread',I2,': C(',I3,')=',F8.2)
      ENDDO
!$OMP END DO NOWAIT

      PRINT *, 'Thread',TID,' done.'

!$OMP END PARALLEL

      END

 Mike W9MDB


Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-03 Thread Bill Somerville
On 03/02/2015 21:52, Michael Black wrote:
Hi Mike,
 Also -- the equivalent C program doesn't hang either...so it's a problem
It is not an equivalent program. First off, it doesn't print a 
termination message at the end of each thread run. Secondly, there is no 
certainty that C I/O will fail in the same way as Fortran I/O when it is 
used in a thread-unsafe way.
 with FORTRAN in 4.8.0 or so it would appear.
73
Bill
G4WJS.
 gcc -g -fopenmp -o omp1c omp1c.c
 /******************************************************************************
 * FILE: omp_workshare1.c
 * DESCRIPTION:
 *   OpenMP Example - Loop Work-sharing - C/C++ Version
 *   In this example, the iterations of a loop are scheduled dynamically
 *   across the team of threads.  A thread will perform CHUNK iterations
 *   at a time before being scheduled for the next CHUNK of work.
 * AUTHOR: Blaise Barney  5/99
 * LAST REVISED: 04/06/05
 ******************************************************************************/
 #include <omp.h>
 #include <stdio.h>
 #include <stdlib.h>
 #define CHUNKSIZE   10
 #define N           100

 int main (int argc, char *argv[])
 {
 int nthreads, tid, i, chunk;
 float a[N], b[N], c[N];

 /* Some initializations */
 for (i=0; i < N; i++)
   a[i] = b[i] = i * 1.0;
 chunk = CHUNKSIZE;

 #pragma omp parallel shared(a,b,c,nthreads,chunk) private(i,tid)
   {
   tid = omp_get_thread_num();
   if (tid == 0)
     {
     nthreads = omp_get_num_threads();
     printf("Number of threads = %d\n", nthreads);
     }
   printf("Thread %d starting...\n",tid);

   #pragma omp for schedule(dynamic,chunk)
   for (i=0; i<N; i++)
     {
     c[i] = a[i] + b[i];
     printf("Thread %d: c[%d]= %f\n",tid,i,c[i]);
     }

   }  /* end of parallel section */

 }






Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-03 Thread Bill Somerville
On 03/02/2015 22:15, Michael Black wrote:

Hi Mike,

Read my suggestion again, particularly the word "each"!

See below.

73
Bill
G4WJS.
 Nope...still hangs...
C******************************************************************************
C FILE: omp_workshare1.f
C DESCRIPTION:
C   OpenMP Example - Loop Work-sharing - Fortran Version
C   In this example, the iterations of a loop are scheduled dynamically
C   across the team of threads.  A thread will perform CHUNK iterations
C   at a time before being scheduled for the next CHUNK of work.
C AUTHOR: Blaise Barney  5/99
C LAST REVISED: 01/09/04
C******************************************************************************

      PROGRAM WORKSHARE1

      INTEGER NTHREADS, TID, OMP_GET_NUM_THREADS,
     +  OMP_GET_THREAD_NUM, N, CHUNKSIZE, CHUNK, I
      PARAMETER (N=100)
      PARAMETER (CHUNKSIZE=10)
      REAL A(N), B(N), C(N)

!     Some initializations
      DO I = 1, N
        A(I) = I * 1.0
        B(I) = A(I)
      ENDDO
      CHUNK = CHUNKSIZE

!$OMP PARALLEL SHARED(A,B,C,NTHREADS,CHUNK) PRIVATE(I,TID)

      TID = OMP_GET_THREAD_NUM()
      IF (TID .EQ. 0) THEN
        NTHREADS = OMP_GET_NUM_THREADS()
!$omp critical(io)
        PRINT *, 'Number of threads =', NTHREADS
!$omp end critical(io)
      END IF
!$omp critical(io)
      PRINT *, 'Thread',TID,' starting...'
!$omp end critical(io)

!$OMP DO SCHEDULE(DYNAMIC,CHUNK)
      DO I = 1, N
        C(I) = A(I) + B(I)
!$OMP CRITICAL(io)
        WRITE(*,100) TID,I,C(I)
!$OMP END CRITICAL(io)
 100    FORMAT(' Thread',I2,': C(',I3,')=',F8.2)
      ENDDO
!$OMP END DO NOWAIT
!$omp critical(io)
      PRINT *, 'Thread',TID,' done.'
!$omp end critical(io)

!$OMP END PARALLEL

      END


Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-03 Thread Bill Somerville
Sorry, I wrote free form so you will need spaces in front of those extra 
directives to make it fixed form compatible.

73
Bill
G4WJS.


Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-03 Thread Bill Somerville
On 02/02/2015 19:45, Joe Taylor wrote:
 Hi all,

 I have made further improvements to the speed of the decoders in WSJT-X,
 independently of any recourse to concurrent processing in machines with
 multiple CPUs.
snip

 These measurements were made on a Windows 7 machine with 4-core i5-2500 CPU.

 Program Version          Time      Decode    #
 ------------------------------------------------
 v1.3 r3673               2.48 s    Deepest   17
 v1.4.0-rc2, r4400        2.28      Deepest   17
 v1.5, r4925              1.01      Deepest   17
 v1.5, r4926              0.83      Deepest   17
 v1.5, r4926 -w 2 -m 2    0.80      Deepest   17
 v1.5, r4926              0.75      Normal    16
 v1.5, r4926              0.69      Fast      16

 The bottom line: At this stage, much has been gained by some careful
 algorithmic tuning.  The decoder in r4926 is 3 times faster than the one
 in r3673, and 2.7 times faster than the one in r4400.  In r4926 a small
 further improvement (about 4%) is obtained by using patience level -w
 2 and two threads (-m 2) for the FFTs.

 Similar speed improvements were measured on a linux machine (Core 2 Duo,
 E6750 CPU).
This is an excellent performance improvement, and in the context of the 
~10 second window where successful decodes are most desirable, it is a 
significant improvement to the operating experience.

snip
   -- Joe, K1JT
73
Bill
G4WJS.



Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-03 Thread Joe Taylor
Hi Bill and all,

Perhaps you already tried jt9_omp in Linux, but I had not.  I tried it 
today, and it seems to work OK, as is.

Here are some timing tests made on my rather elderly 2-core Linux 
machine.  This time all tests were made with the Deepest setting, 
ndepth=3, and all resulted in 17 good decodes of the sample file 
130610_2343.wav.  To get the times I measured real time to execute jt9 
or jt9_omp from the command-prompt.

Program   Version             Params       Time (s)
----------------------------------------------------
jt9       v1.3 r3673                       2.467
jt9       v1.4.0-rc2, r4400                2.658
jt9       v1.5 r4926          -w 1 -m 1    1.243
jt9       v1.5 r4926          -w 2 -m 1    1.202
jt9       v1.5 r4926          -w 2 -m 2    1.140
jt9_omp   v1.5 r4926          -w 2 -m 1    0.834
jt9_omp   v1.5 r4926          -w 2 -m 2    0.843

When jt9_omp is used it's better *not* to use the multi-threaded FFTW 
plans, at least on this 2-core machine.  The two cores are already being 
used effectively by running the two big FFTs concurrently.
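
As a rough illustration of running two independent pieces of work concurrently on two threads, here is a minimal C sketch using OpenMP sections. It is not the jt9_omp code: heavy_work is a hypothetical stand-in for one task's big computation, and run_two_tasks is an invented name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one task's heavy computation (for example
   a large FFT); here it just sums squares so the sketch is self-contained. */
static double heavy_work(const float *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)x[i] * x[i];
    return acc;
}

/* Run two independent workloads, concurrently when built with OpenMP.
   Without OpenMP the pragmas are ignored and the two calls run in turn. */
void run_two_tasks(const float *xa, size_t na,
                   const float *xb, size_t nb,
                   double *out_a, double *out_b)
{
    assert(out_a != NULL && out_b != NULL);

#pragma omp parallel sections num_threads(2)
    {
#pragma omp section
        *out_a = heavy_work(xa, na);   /* first task, on one thread     */
#pragma omp section
        *out_b = heavy_work(xb, nb);   /* second task, on another thread */
    }
}
```

With both cores already occupied this way, asking each task to use additional threads of its own mostly adds scheduling overhead, which would be consistent with the 0.834 s versus 0.843 s results above.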


For interest, here are the actual outputs of a pair of timing runs with 
jt9 and jt9_omp.  Note that the decoded lines are the same, but JT65 
lines are intermingled with JT9 lines.  (I like the original ordering 
better -- first the one at the decode frequency; then others in the same 
mode in order of increasing frequency; then those in the other mode, 
again in order of increasing frequency.  With effort, I guess we could 
have it both ways by letting the GUI insert decodes (after the first 
one) in the proper place in the sequence.)
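
The ordering described above (current mode first, then the other mode, each by increasing frequency) could be restored after the fact along these lines. This is only a sketch in C: the struct and function names are invented for illustration, the mode is taken to be the '@' or '#' flag character in the output, and the special case of listing the decode at the selected frequency first is left out.

```c
#include <assert.h>
#include <stddef.h>

/* One decoded line; field names are illustrative, not from WSJT-X. */
struct decode {
    int  freq_hz;   /* frequency column of the output, e.g. 3196 */
    char mode;      /* mode flag character from the output: '@' or '#' */
};

/* Nonzero if a should be listed before b, given the mode listed first. */
static int goes_before(const struct decode *a, const struct decode *b,
                       char first_mode)
{
    int a_other = (a->mode != first_mode);
    int b_other = (b->mode != first_mode);
    if (a_other != b_other)
        return a_other < b_other;      /* lines in first_mode come first */
    return a->freq_hz < b->freq_hz;    /* then increasing frequency */
}

/* Insertion sort: adequate for the few dozen lines of one decode pass. */
void order_decodes(struct decode *d, size_t n, char first_mode)
{
    assert(d != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        struct decode key = d[i];
        size_t j = i;
        while (j > 0 && goes_before(&key, &d[j - 1], first_mode)) {
            d[j] = d[j - 1];
            j--;
        }
        d[j] = key;
    }
}
```

Applied to the intermingled jt9_omp output with first_mode set to '@', this would give back the same sequence that jt9 prints.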

#
$ time jt9 -p 1 -d 3 -w 2 -m 1 130610_2343.wav  junk
2343  -9  0.3 3196 @ WB8QPG IZ0MIT -11
2343 -18  1.0 3372 @ KK4HEG KE0CO CN87
2343  14  0.1 3490 @ CQ AG4M EM75
2343 -20 -1.3 3567 @ CQ TA4A KM37
2343 -15  0.1 3627 @ CT1FBK IK5YZT R+02
2343 -23  0.3 3721 @ KF5SLN KB1SUA FN42
2343 -16  0.2 3774 @ CQ M0ABA JO01
2343  -2  0.2 3843 @ EI3HGB DD2EE JO31
2343 -20  0.3  718 # VE6WQ SQ2NIJ -14
2343  -7  0.3  815 # KK4DSD W7VP -16
2343 -10  0.5  975 # CQ DL7ACA JO40
2343  -9  0.8 1089 # N2SU W0JMW R-14
2343 -11  0.8 1259 # YV6BFE F6GUU R-08
2343  -9  1.7 1471 # VA3UG F1HMR 73
2343  -1  0.6 1718 # BG THX JOE 73
2343 -15  1.3 1951 # RA3Y VE3NLS 73
2343 -20  0.4 2065 # K2OI AJ4UU R-20
DecodeFinished   0   1

real    0m1.196s
user    0m1.157s
sys     0m0.037s


$ time jt9_omp -p 1 -d 3 -w 2 -m 1 130610_2343.wav  junk
2343 -20  0.3  718 # VE6WQ SQ2NIJ -14
2343  -9  0.3 3196 @ WB8QPG IZ0MIT -11
2343  -7  0.3  815 # KK4DSD W7VP -16
2343 -18  1.0 3372 @ KK4HEG KE0CO CN87
2343 -10  0.5  975 # CQ DL7ACA JO40
2343  -9  0.8 1089 # N2SU W0JMW R-14
2343 -11  0.8 1259 # YV6BFE F6GUU R-08
2343  -9  1.7 1471 # VA3UG F1HMR 73
2343  14  0.1 3490 @ CQ AG4M EM75
2343 -20 -1.3 3567 @ CQ TA4A KM37
2343 -15  0.1 3627 @ CT1FBK IK5YZT R+02
2343 -23  0.3 3721 @ KF5SLN KB1SUA FN42
2343 -16  0.2 3774 @ CQ M0ABA JO01
2343  -1  0.6 1718 # BG THX JOE 73
2343 -15  1.3 1951 # RA3Y VE3NLS 73
2343  -2  0.2 3843 @ EI3HGB DD2EE JO31
2343 -20  0.4 2065 # K2OI AJ4UU R-20
DecodeFinished   0   1

real    0m0.806s
user    0m1.260s
sys     0m0.055s

#

In its present state the jt9_omp code does not run in Windows.  I 
haven't yet determined why.

-- Joe, K1JT



Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-03 Thread Joe Taylor
Hi Greg,

Yes, we need the -fopenmp flag to be set.  I think that's being done 
appropriately now in CMakeLists.txt, although I confess I'm not always 
confident that I've understood its syntax fully.  Bill should be able to 
give the definitive answer.

-- Joe

On 2/3/2015 3:36 PM, ki...@yahoo.com wrote:
 Hi Joe,

 I'm not 100% certain on this, but for openmp on Windows, don't you have
 to enable that as a C flag with something like: -fopenmp

 I would assume that's to be done in the CMakeLists.txt file. I'm not
 sure about linking the libraries.

 The Qt5 Tool chain has winpthreads and gcc -v has --enable-libgomp so
 the tool chain looks to be OpenMP capable.

 73's
 Greg, KI7MT


 On 2/3/2015 1:06 PM, Joe Taylor wrote:
 Hi Bill and all,

 Perhaps you already tried jt9_omp in Linux, but I had not.  I tried it
 today, and it seems to work OK, as is.

 Here are some timing tests made on my rather elderly 2-core Linux
 machine.  This time all tests were made with the Deepest setting,
 ndepth=3, and all resulted in 17 good decodes of the sample file
 130610_2343.wav.  To get the times I measured real time to execute jt9
 or jt9_omp from the command-prompt.

 Program   Version             Params       Time (s)
 ----------------------------------------------------
 jt9       v1.3 r3673                       2.467
 jt9       v1.4.0-rc2, r4400                2.658
 jt9       v1.5 r4926          -w 1 -m 1    1.243
 jt9       v1.5 r4926          -w 2 -m 1    1.202
 jt9       v1.5 r4926          -w 2 -m 2    1.140
 jt9_omp   v1.5 r4926          -w 2 -m 1    0.834
 jt9_omp   v1.5 r4926          -w 2 -m 2    0.843

 When jt9_omp is used it's better *not* to use the multi-threaded FFTW
 plans, at least on this 2-core machine.  The two cores are already being
 used effectively by running the two big FFTs concurrently.


 For interest, here are the actual outputs of a pair of timing runs with
 jt9 and jt9_omp.  Note that the decoded lines are the same, but JT65
 lines are intermingled with JT9 lines.  (I like the original ordering
 better -- first the one at the decode frequency; then others in the same
 mode in order of increasing frequency; then those in the other mode,
 again in order of increasing frequency.  With effort, I guess we could
 have it both ways by letting the GUI insert decodes (after the first
 one) in the proper place in the sequence.)
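
The preferred ordering described above can be expressed as a sort key. What follows is only a sketch in Python, not code from WSJT-X: the function name `sort_decodes` and the tolerance `tol` are made up for illustration, and the field layout (UTC, SNR, DT, frequency, mode flag, message, with '@' marking JT9 and '#' marking JT65) is assumed from the sample output in this thread.

```python
# Hypothetical sketch: restore the single-threaded jt9 ordering of decodes.
# Assumed line layout: UTC  SNR  DT  FREQ  MODE  MESSAGE...
# MODE is '@' for JT9 and '#' for JT65 (as in the sample listings).

def sort_decodes(lines, qso_mode, qso_freq, tol=10):
    """Order: the decode at the QSO frequency first, then the same mode by
    increasing frequency, then the other mode by increasing frequency."""
    def key(line):
        fields = line.split()
        freq, mode = int(fields[3]), fields[4]
        at_qso = mode == qso_mode and abs(freq - qso_freq) <= tol
        # False sorts before True, so the QSO-frequency decode comes first,
        # then lines in qso_mode, then everything else.
        return (not at_qso, mode != qso_mode, freq)
    return sorted(lines, key=key)


if __name__ == "__main__":
    sample = [
        "2343 -20  0.3  718 # VE6WQ SQ2NIJ -14",
        "2343  -9  0.3 3196 @ WB8QPG IZ0MIT -11",
        "2343  -7  0.3  815 # KK4DSD W7VP -16",
        "2343 -18  1.0 3372 @ KK4HEG KE0CO CN87",
    ]
    for line in sort_decodes(sample, "@", 3372):
        print(line)
```

A GUI could apply the same key when inserting each new decode, so the lines may arrive from the two decoders in any order.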

 #
 $ time jt9 -p 1 -d 3 -w 2 -m 1 130610_2343.wav  junk
 2343  -9  0.3 3196 @ WB8QPG IZ0MIT -11
 2343 -18  1.0 3372 @ KK4HEG KE0CO CN87
 2343  14  0.1 3490 @ CQ AG4M EM75
 2343 -20 -1.3 3567 @ CQ TA4A KM37
 2343 -15  0.1 3627 @ CT1FBK IK5YZT R+02
 2343 -23  0.3 3721 @ KF5SLN KB1SUA FN42
 2343 -16  0.2 3774 @ CQ M0ABA JO01
 2343  -2  0.2 3843 @ EI3HGB DD2EE JO31
 2343 -20  0.3  718 # VE6WQ SQ2NIJ -14
 2343  -7  0.3  815 # KK4DSD W7VP -16
 2343 -10  0.5  975 # CQ DL7ACA JO40
 2343  -9  0.8 1089 # N2SU W0JMW R-14
 2343 -11  0.8 1259 # YV6BFE F6GUU R-08
 2343  -9  1.7 1471 # VA3UG F1HMR 73
 2343  -1  0.6 1718 # BG THX JOE 73
 2343 -15  1.3 1951 # RA3Y VE3NLS 73
 2343 -20  0.4 2065 # K2OI AJ4UU R-20
 <DecodeFinished>   0   1

 real    0m1.196s
 user    0m1.157s
 sys     0m0.037s


 $ time jt9_omp -p 1 -d 3 -w 2 -m 1 130610_2343.wav  junk
 2343 -20  0.3  718 # VE6WQ SQ2NIJ -14
 2343  -9  0.3 3196 @ WB8QPG IZ0MIT -11
 2343  -7  0.3  815 # KK4DSD W7VP -16
 2343 -18  1.0 3372 @ KK4HEG KE0CO CN87
 2343 -10  0.5  975 # CQ DL7ACA JO40
 2343  -9  0.8 1089 # N2SU W0JMW R-14
 2343 -11  0.8 1259 # YV6BFE F6GUU R-08
 2343  -9  1.7 1471 # VA3UG F1HMR 73
 2343  14  0.1 3490 @ CQ AG4M EM75
 2343 -20 -1.3 3567 @ CQ TA4A KM37
 2343 -15  0.1 3627 @ CT1FBK IK5YZT R+02
 2343 -23  0.3 3721 @ KF5SLN KB1SUA FN42
 2343 -16  0.2 3774 @ CQ M0ABA JO01
 2343  -1  0.6 1718 # BG THX JOE 73
 2343 -15  1.3 1951 # RA3Y VE3NLS 73
 2343  -2  0.2 3843 @ EI3HGB DD2EE JO31
 2343 -20  0.4 2065 # K2OI AJ4UU R-20
 <DecodeFinished>   0   1

 real    0m0.806s
 user    0m1.260s
 sys     0m0.055s

 #
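
As a rough cross-check of the two `time` listings above, the ratio (user + sys) / real estimates how many cores were kept busy during the run. A minimal Python sketch follows; `seconds` and `cores_busy` are illustrative helper names, not part of WSJT-X, and the stamps are the ones reported in the listings.

```python
import re

def seconds(stamp):
    """Convert a bash `time` stamp such as '0m1.196s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if m is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(m.group(1)) * 60 + float(m.group(2))

def cores_busy(real, user, sys_):
    """Average number of cores in use: CPU time divided by wall-clock time."""
    return (seconds(user) + seconds(sys_)) / seconds(real)


if __name__ == "__main__":
    # Values copied from the jt9 and jt9_omp runs quoted above.
    print("jt9    :", round(cores_busy("0m1.196s", "0m1.157s", "0m0.037s"), 2))
    print("jt9_omp:", round(cores_busy("0m0.806s", "0m1.260s", "0m0.055s"), 2))
```

A value near 1 means the run was effectively single-threaded. A value well above 1 means wall-clock time fell because CPU work was spread over more than one core, even though total CPU time went up slightly.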

 In its present state the jt9_omp code does not run in Windows.  I
 haven't yet determined why.

  -- Joe, K1JT


Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-03 Thread Bill Somerville
On 03/02/2015 20:06, Joe Taylor wrote:
 Hi Bill and all,
Hi Joe,

 Perhaps you already tried jt9_omp in Linux, but I had not.  I tried it
 today, and it seems to work OK, as is.
There are definitely thread safety issues but I think I have most of 
them dealt with. I will commit them soon after I have done some basic 
validation checks.

 Here are some timing tests made on my rather elderly 2-core Linux
 machine.  This time all tests were made with the Deepest setting,
 ndepth=3, and all resulted in 17 good decodes of the sample file
 130610_2343.wav.  To get the times I measured real time to execute jt9
 or jt9_omp from the command-prompt.

 Program  Version            Params      Time (s)
 -------------------------------------------------
 jt9      v1.3 r3673                     2.467
 jt9      v1.4.0-rc2, r4400              2.658
 jt9      v1.5 r4926         -w 1 -m 1   1.243
 jt9      v1.5 r4926         -w 2 -m 1   1.202
 jt9      v1.5 r4926         -w 2 -m 2   1.140
 jt9_omp  v1.5 r4926         -w 2 -m 1   0.834
 jt9_omp  v1.5 r4926         -w 2 -m 2   0.843

 When jt9_omp is used it's better *not* to use the multi-threaded FFTW
 plans, at least on this 2-core machine.  The two cores are already being
 used effectively by running the two big FFTs concurrently.
Yes that will need a basic algorithm to avoid using more threads than 
CPUs at any time.
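
One possible shape for such an algorithm, purely as a sketch: give each concurrently running decoder its own thread first, then divide whatever cores remain among the FFT threads inside each decoder. The Python below is illustrative only; `plan_threads` is a hypothetical name, and the assumption that there are two concurrent decoder tasks (JT9 and JT65) comes from this thread rather than from the source code.

```python
import os

def plan_threads(n_decoders=2, n_cpus=None):
    """Return (decoder_threads, fft_threads_per_decoder) such that their
    product never exceeds the number of CPUs."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1   # os.cpu_count() may return None
    decoder_threads = max(1, min(n_decoders, n_cpus))
    fft_threads = max(1, n_cpus // decoder_threads)
    return decoder_threads, fft_threads


if __name__ == "__main__":
    for cpus in (1, 2, 4, 8):
        print(cpus, "CPUs ->", plan_threads(2, cpus))
```

On a 2-core machine this gives two decoder threads with one FFT thread each, which matches the observation that jt9_omp does best with -m 1 there.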


 For interest, here are the actual outputs of a pair of timing runs with
 jt9 and jt9_omp.  Note that the decoded lines are the same, but JT65
 lines are intermingled with JT9 lines.  (I like the original ordering
 better -- first the one at the decode frequency; then others in the same
 mode in order of increasing frequency; then those in the other mode,
 again in order of increasing frequency.  With effort, I guess we could
 have it both ways by letting the GUI insert decodes (after the first
 one) in the proper place in the sequence.)

 #
 $ time jt9 -p 1 -d 3 -w 2 -m 1 130610_2343.wav  junk
 2343  -9  0.3 3196 @ WB8QPG IZ0MIT -11
 2343 -18  1.0 3372 @ KK4HEG KE0CO CN87
 2343  14  0.1 3490 @ CQ AG4M EM75
 2343 -20 -1.3 3567 @ CQ TA4A KM37
 2343 -15  0.1 3627 @ CT1FBK IK5YZT R+02
 2343 -23  0.3 3721 @ KF5SLN KB1SUA FN42
 2343 -16  0.2 3774 @ CQ M0ABA JO01
 2343  -2  0.2 3843 @ EI3HGB DD2EE JO31
 2343 -20  0.3  718 # VE6WQ SQ2NIJ -14
 2343  -7  0.3  815 # KK4DSD W7VP -16
 2343 -10  0.5  975 # CQ DL7ACA JO40
 2343  -9  0.8 1089 # N2SU W0JMW R-14
 2343 -11  0.8 1259 # YV6BFE F6GUU R-08
 2343  -9  1.7 1471 # VA3UG F1HMR 73
 2343  -1  0.6 1718 # BG THX JOE 73
 2343 -15  1.3 1951 # RA3Y VE3NLS 73
 2343 -20  0.4 2065 # K2OI AJ4UU R-20
 <DecodeFinished>   0   1

 real    0m1.196s
 user    0m1.157s
 sys     0m0.037s


 $ time jt9_omp -p 1 -d 3 -w 2 -m 1 130610_2343.wav  junk
 2343 -20  0.3  718 # VE6WQ SQ2NIJ -14
 2343  -9  0.3 3196 @ WB8QPG IZ0MIT -11
 2343  -7  0.3  815 # KK4DSD W7VP -16
 2343 -18  1.0 3372 @ KK4HEG KE0CO CN87
 2343 -10  0.5  975 # CQ DL7ACA JO40
 2343  -9  0.8 1089 # N2SU W0JMW R-14
 2343 -11  0.8 1259 # YV6BFE F6GUU R-08
 2343  -9  1.7 1471 # VA3UG F1HMR 73
 2343  14  0.1 3490 @ CQ AG4M EM75
 2343 -20 -1.3 3567 @ CQ TA4A KM37
 2343 -15  0.1 3627 @ CT1FBK IK5YZT R+02
 2343 -23  0.3 3721 @ KF5SLN KB1SUA FN42
 2343 -16  0.2 3774 @ CQ M0ABA JO01
 2343  -1  0.6 1718 # BG THX JOE 73
 2343 -15  1.3 1951 # RA3Y VE3NLS 73
 2343  -2  0.2 3843 @ EI3HGB DD2EE JO31
 2343 -20  0.4 2065 # K2OI AJ4UU R-20
 <DecodeFinished>   0   1

 real    0m0.806s
 user    0m1.260s
 sys     0m0.055s

 #

 In its present state the jt9_omp code does not run in Windows.  I
 haven't yet determined why.
I am seeing that too but not every run, I am looking for the cause(s).

   -- Joe, K1JT
73
Bill
G4WJS.



Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-03 Thread ki7mt
Hi Joe,

I'm not 100% certain on this, but for openmp on Windows, don't you have 
to enable that as a C flag with something like: -fopenmp

I would assume that's to be done in the CMakeLists.txt file. I'm not 
sure about linking the libraries.

The Qt5 Tool chain has winpthreads and gcc -v has --enable-libgomp so 
the tool chain looks to be OpenMP capable.

73's
Greg, KI7MT



Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-03 Thread ki7mt
Hi Joe,

Bill may be still working on this. I couldn't find several things, 
including the -fopenmp flag, so I'll just wait and see what shakes out.

73's
Greg, KI7MT


Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-03 Thread Michael Black
I ran a test using a sample program.  
Compiled with
gfortran -g -fopenmp -o omp1 omp1.f

Every few runs it hangs after all threads have completed.
I haven't let them run to completion...I tried the jt9_omp and let it run
and eventually it dies without an intelligent message after quite a few
minutes.
Running this program under gdb never hangs.
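
To put a number on "every few runs", the test program can be launched repeatedly with a timeout. This is only a sketch: `count_hangs` is a made-up helper, `./omp1` is the executable name from the gfortran command above, and the run count and the 30-second timeout are arbitrary choices.

```python
import subprocess

def count_hangs(cmd, runs=20, timeout=30.0):
    """Run cmd `runs` times; count the runs that do not finish in `timeout` s."""
    hangs = 0
    for _ in range(runs):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # if the process is still running after `timeout` seconds.
            subprocess.run(cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, timeout=timeout)
        except subprocess.TimeoutExpired:
            hangs += 1
    return hangs


if __name__ == "__main__":
    n = count_hangs(["./omp1"], runs=20, timeout=30.0)
    print(f"{n} of 20 runs hung")
```

The same loop could be pointed at jt9_omp with the sample .wav file to compare hang rates between builds.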


C******************************************************************************
C FILE: omp_workshare1.f
C DESCRIPTION:
C   OpenMP Example - Loop Work-sharing - Fortran Version
C   In this example, the iterations of a loop are scheduled dynamically
C   across the team of threads.  A thread will perform CHUNK iterations
C   at a time before being scheduled for the next CHUNK of work.
C AUTHOR: Blaise Barney  5/99
C LAST REVISED: 01/09/04
C******************************************************************************

      PROGRAM WORKSHARE1

      INTEGER NTHREADS, TID, OMP_GET_NUM_THREADS,
     +  OMP_GET_THREAD_NUM, N, CHUNKSIZE, CHUNK, I
      PARAMETER (N=100)
      PARAMETER (CHUNKSIZE=10)
      REAL A(N), B(N), C(N)

! Some initializations
      DO I = 1, N
        A(I) = I * 1.0
        B(I) = A(I)
      ENDDO
      CHUNK = CHUNKSIZE

!$OMP PARALLEL SHARED(A,B,C,NTHREADS,CHUNK) PRIVATE(I,TID)

      TID = OMP_GET_THREAD_NUM()
      IF (TID .EQ. 0) THEN
        NTHREADS = OMP_GET_NUM_THREADS()
        PRINT *, 'Number of threads =', NTHREADS
      END IF
      PRINT *, 'Thread',TID,' starting...'

!$OMP DO SCHEDULE(DYNAMIC,CHUNK)
      DO I = 1, N
        C(I) = A(I) + B(I)
        WRITE(*,100) TID,I,C(I)
 100    FORMAT(' Thread',I2,': C(',I3,')=',F8.2)
      ENDDO
!$OMP END DO NOWAIT

      PRINT *, 'Thread',TID,' done.'

!$OMP END PARALLEL

      END

Mike W9MDB

-Original Message-
From: Bill Somerville [mailto:g4...@classdesign.com] 
Sent: Tuesday, February 03, 2015 3:17 PM
To: wsjt-devel@lists.sourceforge.net
Subject: Re: [wsjt-devel] WSJT-X Decoder Performance

On 03/02/2015 21:04, Joe Taylor wrote:
 Hi Greg,
Hi Greg & Joe,

 Yes, we need the -fopenmp flag to be set.  I think that's being done 
 appropriately now in CMakeLists.txt, although I confess I'm not always 
 confident that I've understood its syntax fully.  Bill should be able 
 to give the definitive answer.
Yes that is correct.

The CMake script uses the package finder for OpenMP (part of the CMake
distribution) to find the OpenMP capabilities of the compilers on the build
platform. That sets the variable OPENMP_FOUND (or OPENMP-NOTFOUND if it's
not available) along with the result variables OpenMP_C_FLAGS,
OpenMP_CXX_FLAGS, and OpenMP_Fortran_FLAGS. (This last one is only set by
CMake v3.1 and later, so I have substituted the C compiler flag, as it is
the same for the compilers we use at present.) These flags are added to
'*_omp' source compiles.

The CMake script builds two versions of the internal static library target
'wsjt', the second target is 'wsjt_omp'. It also builds two versions of the
'jt9' target, the second being 'jt9_omp' which itself depends on the
'wsjt_omp' library.

The 'wsjt' library is basically all the Fortran and C modules that are used
by jt9, jt9code, jt9sim, jt65code and wsjt-x.

   -- Joe
73
Bill
G4WJS.


Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-02 Thread Jim Pennino
Can one download an executable with these changes?




On Mon, 2/2/15, Joe Taylor j...@princeton.edu wrote:

 Subject: [wsjt-devel] WSJT-X Decoder Performance
 To: wsjt-devel@lists.sourceforge.net
 Date: Monday, February 2, 2015, 11:45 AM
 
 Hi all,
 
 I have made further improvements to the speed of the
 decoders in WSJT-X, 
 independently of any recourse to concurrent processing in
 machines with 
 multiple CPUs.  The changes involve
 
 1. Making better choices for NFFT1 and NFFT2 (the lengths of forward and
 inverse FFTs in the JT9 downsampler).
 
 2. Adjusting values of limit (the Fano timeout parameter)
 and ccflim 
 (JT9 synchronizing threshold) under specified conditions.
 
 3. Using -O3 for the gfortran optimizer level.
 
 The following table presents measurements of decoding speed for a number
 of tests using WSJT-X versions 1.3, 1.4.0-rc2, 1.5r4925, and 1.5r4926.
 "Time" gives the time in seconds to decode the sample file
 130610_2343.wav, which has 8 decodable JT9 signals and 9 decodable JT65
 signals.  "Decode" is the setting on the WSJT-X *Decode* menu.  The
 column labeled "#" gives the number of decoded signals.  (Note that
 selecting "Deepest" is required in order to decode one of the JT9
 signals.)
 
 These measurements were made on a Windows 7 machine with
 4-core i5-2500 CPU.
 
 Program Version        Time     Decode   #
 --------------------------------------------
 v1.3 r3673             2.48 s   Deepest  17
 v1.4.0-rc2, r4400      2.28     Deepest  17
 v1.5, r4925            1.01     Deepest  17
 v1.5, r4926            0.83     Deepest  17
 v1.5, r4926 -w 2 -m 2  0.80     Deepest  17
 v1.5, r4926            0.75     Normal   16
 v1.5, r4926            0.69     Fast     16
 
 The bottom line: At this stage, much has been gained by some
 careful 
 algorithmic tuning.  The decoder in r4926 is 3 times
 faster than the one 
 in r3673, and 2.7 times faster than the one in r4400. 
 In r4926 a small 
 further improvement (about 4%) is obtained by using patience
 level -w 
 2 and two threads (-m 2) for the FFTs.
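
The ratios quoted in that summary follow directly from the "Deepest" rows of the table. A short Python check, using only the tabulated times (the dictionary keys are just labels chosen here):

```python
# Decode times in seconds for the "Deepest" setting, copied from the table.
times = {
    "r3673": 2.48,
    "r4400": 2.28,
    "r4925": 1.01,
    "r4926": 0.83,
    "r4926 -w 2 -m 2": 0.80,
}

base = times["r4926"]
speedup_vs_r3673 = times["r3673"] / base           # roughly 3.0
speedup_vs_r4400 = times["r4400"] / base           # roughly 2.7
fft_gain = (base - times["r4926 -w 2 -m 2"]) / base  # roughly 0.036, i.e. about 4%

if __name__ == "__main__":
    print(f"r4926 vs r3673: {speedup_vs_r3673:.2f}x")
    print(f"r4926 vs r4400: {speedup_vs_r4400:.2f}x")
    print(f"gain from -w 2 -m 2: {fft_gain:.1%}")
```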
 
 Similar speed improvements were measured on a linux machine
 (Core 2 Duo, 
 E6750 CPU).
 
 A further speed improvement around 10% should be obtainable
 by computing 
 the JT65 symbol spectra (subroutine symspec65) on the fly,
 during the Rx 
 minute, rather than as part of the end-of-minute *Decode*
 procedure. 
 (This is already done for the JT9 symbol spectra.)  My
 current view is 
 that beyond that step, further speed improvement on
 single-core machines 
 (or single-core processing on multi-core machines, as in all
 of the 
 tabulated tests except one) will be difficult.
 
 Further improvements can probably be made by using more than
 one core 
 concurrently, e.g., by using OpenMP.  As I mentioned
 before, the biggest 
 (or at least easiest) gain may come from running the JT9 and
 JT65 
 decoders concurrently.  It's hard to know whether the
 gains will be 
 worthwhile, without trying.  The programming effort may
 not be trivial.
 
     -- Joe, K1JT
 


Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-02 Thread Joe Taylor
I should have mentioned that if you're especially interested in snappy 
performance of the JT9 decoder, you may consider the penalty for using 
menu setting Decode | Fast to be unimportant.  It will cost you 
nothing at all at the QSO frequency: that first decoding attempt is 
always done at Deepest level.

-- Joe, K1JT



Re: [wsjt-devel] WSJT-X Decoder Performance

2015-02-02 Thread Joe Taylor
Hi Jim,


On 2/2/2015 5:11 PM, Jim Pennino wrote:
 Can one download an executable with these changes?

No, not yet.  Like everything else in the development (aka v1.5) branch, 
these changes are available only on a compile-it-yourself basis.

-- 73, Joe, K1JT

 