Re: [wsjt-devel] WSJT-X: Using the latest decoder improvements in WSJT-X
Hi Bill,

Has the patch been checked in, or will it be available only via the patch?

73's
Greg, KI7MT

On 2/4/2015 10:01 AM, Bill Somerville wrote:
> Hi All,
>
> it looks like the latest jt9, using OpenMP and multi-threaded FFTs, along with Joe's recent refactorings for performance, seems to be approaching stability. If anyone wants to try them out on air with WSJT-X, the attached patch will allow WSJT-X to be built with them enabled.
>
> Note that the patch enables a fairly lengthy FFT plan optimization, and the first decode cycle may take a few minutes to complete. Do not kill the program, as the accumulated FFT wisdom is written out at the end of a session. Once the FFTW wisdom is saved there will be no further delays.
>
> Testing here results in decodes that are so fast that it hardly seems worth looking for any more performance improvements until we start getting 100s of concurrent QSOs per band. Impressive stuff! Running on a quad-core Core i7 laptop here.
>
> 73
> Bill
> G4WJS.

--
Dive into the World of Parallel Programming. The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more. Take a look and join the conversation now.
http://goparallel.sourceforge.net/
___
wsjt-devel mailing list
wsjt-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/wsjt-devel
Re: [wsjt-devel] WSJT-X: Using the latest decoder improvements in WSJT-X
On 04/02/2015 17:09, ki...@yahoo.com wrote:
> Hi Bill,

Hi Greg,

> Has the patch been checked-in or will it be available only via the patch ?

As it will not work on Mac at this stage, I cannot check it in. For now I think the developers who build WSJT-X themselves should try it out for a while. Given that multi-threading is very hard to test empirically, there are bound to be a few outstanding problems to solve anyway.

> 73's
> Greg, KI7MT

73
Bill
G4WJS.
Re: [wsjt-devel] WSJT-X: Using the latest decoder improvements in WSJT-X
Hi Bill,

OK, I'll just wait then, as I can't process or act upon more than 10 or 12 decodes in 8 seconds anyway.

73's
Greg, KI7MT
Re: [wsjt-devel] WSJT-X: Using the latest decoder improvements in WSJT-X
On 04/02/2015 19:49, Joe Taylor wrote:

Hi Joe,

> On 2/4/2015 2:26 PM, Michael Black wrote:
>> Not sure we want more than 1 thread
>
> As I demonstrated and wrote here several hours ago: When using OpenMP to run JT9 and JT65 decoders in parallel, we gain almost nothing by using multi-threading for the FFTW plans. I think this will remain true.
>
> I recommend using -w 2 -m 1 to set up the FFTW plans, and using two threads (and only two) for the parallel sections initiated in decoder.f90 on multi-core machines.

Before completely discarding multi-threaded FFTW3, I would like to try something. Can you give a brief, rough summary of all the FFT sizes used by jt9? If many smaller FFTs are being run, then I think their plans should be limited to 1 thread, with 2 or more threads unleashed only for the big FFTs.

> -- Joe

73
Bill
G4WJS.
Re: [wsjt-devel] WSJT-X: Using the latest decoder improvements in WSJT-X
Can't this be decided by objective testing? Can anybody show any advantage to -m >1 with r4940? And we can always revisit this if it ever proves to be worthwhile.

On both my machines -m 1 is the best time by about 20% now.

I'm using this now -- the argument you pass is the # of threads. Easily adaptable to Unix.

Mike W9MDB

    #include <stdio.h>
    #include <stdlib.h>   /* atoi, atof, system */
    #include <math.h>

    int main(int argc, char *argv[])
    {
        char cmd[4096];
        char buf[4096];
        double total = 0;
        int n = 0;
        int nthreads = 1;

        if (argc > 1) {
            nthreads = atoi(argv[1]);
        }
        printf("Testing %d threads\n", nthreads);
        /* Time one jt9_omp run and keep only the elapsed-time field. */
        sprintf(cmd, "TimeMem-1.0.exe jt9_omp -p 1 -d 3 -w 2 -m %d 130610_2343.wav | grep Elapsed | cut -f2 -d: > doit.txt", nthreads);
        while (1) {
            system(cmd);
            FILE *fp = fopen("doit.txt", "r");
            if (fp == NULL) {
                perror("doit.txt");
                return 1;
            }
            if (fgets(buf, sizeof(buf), fp) == NULL) {
                buf[0] = '\0';   /* empty file: atof() below yields 0 */
            }
            fclose(fp);
            double sec = atof(buf);
            ++n;
            total += sec;
            double avg = total / n;
            /* Flag runs that take more than 1.5x the running average. */
            if (sec > avg * 1.5) {
                printf("\nlong run %.2f avg=%.2f\n", sec, avg);
            }
            printf("%d sec=%.2f avg=%.2f\r", n, sec, avg);
            fflush(stdout);
        }
    }
Re: [wsjt-devel] WSJT-X: Using the latest decoder improvements in WSJT-X
On 2/4/2015 2:26 PM, Michael Black wrote:
> Not sure we want more than 1 thread

As I demonstrated and wrote here several hours ago: When using OpenMP to run JT9 and JT65 decoders in parallel, we gain almost nothing by using multi-threading for the FFTW plans. I think this will remain true.

I recommend using -w 2 -m 1 to set up the FFTW plans, and using two threads (and only two) for the parallel sections initiated in decoder.f90 on multi-core machines.

-- Joe
Re: [wsjt-devel] WSJT-X: Using the latest decoder improvements in WSJT-X
On 04/02/2015 19:26, Michael Black wrote:

Hi Mike,

> Not sure we want more than 1 thread; my testing shows this on my 8-core box, since this patch would give me 6 threads.

Agreed, the patch uses a trivially crude algorithm for the FFT thread count, which will use far too many threads on a processor with more than 4 CPUs. In your case that will be 14 threads for FFTs :(

> I think the old improvement >1 thread showed was overtaken by multi-threading the top level.

You need to ensure that you run at least one decode in each mode before running any timing tests. Changing the number of FFT threads requires new FFT wisdom to be calculated.

You could try:

    , "-m", QString::number (qMin (qMax (QThread::idealThreadCount () - 2, 1), 2)) //FFTW threads

for line 379 of mainwindow.cpp for more controlled thread utilization. That sort of approach will also give sane results for those who run multiple instances of WSJT-X with multiple RX SDRs. We need to fine-tune the FFT plans with personalized thread counts, and perhaps have a user setting for the maximum number of compute-intensive threads.

73
Bill
G4WJS.
Re: [wsjt-devel] WSJT-X: Using the latest decoder improvements in WSJT-X
Not sure we want more than 1 thread; my testing shows this on my 8-core box, since this patch would give me 6 threads.

I think the old improvement >1 thread showed was overtaken by multi-threading the top level.

Thread     1     2     3     4     5     6
4930    1.10  1.27  1.24  1.23  1.29  1.31
        1.10  1.21  1.25  1.25  1.27  1.31
        1.09  1.25  1.23  1.22  1.29  1.32
        1.09  1.22  1.24  1.22  1.27  1.29
        1.11  1.23  1.22  1.23  1.27  1.33
        1.12  1.23  1.25  1.27  1.29  1.31
        1.08  1.22  1.24  1.23  1.26  1.30
        1.10  1.25  1.23  1.25  1.31  1.30
        1.14  1.23  1.24  1.26  1.27  1.32
        1.13  1.22  1.22  1.23  1.28  1.29
        1.11  1.23  1.25  1.25  1.27  1.32
        1.12  1.23  1.23  1.24  1.27  1.29
        1.12  1.25  1.24  1.23  1.27  1.29
        1.08  1.24  1.25  1.24  1.28  1.34
        1.08  1.23  1.23  1.21  1.30  1.29
        1.08  1.27  1.24  1.25  1.28  1.27
        1.15  1.25  1.25  1.29  1.28  1.30
        1.13  1.26  1.22  1.27  1.27  1.30
        1.10  1.25  1.23  1.24  1.27  1.30
Avg     1.09  1.25  1.24  1.24  1.28  1.30