[Bug libfortran/51119] MATMUL slow for large matrices

2017-05-28 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 Bug 51119 depends on bug 37131, which changed state. Bug 37131 Summary: inline matmul for small matrix sizes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37131 What|Removed |Added ---

[Bug libfortran/51119] MATMUL slow for large matrices

2017-05-08 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 Bug 51119 depends on bug 68600, which changed state. Bug 68600 Summary: Inlined MATMUL is too slow. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68600 What|Removed |Added

[Bug libfortran/51119] MATMUL slow for large matrices

2017-02-26 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #49 from Thomas Koenig --- Author: tkoenig Date: Sun Feb 26 13:22:43 2017 New Revision: 245745 URL: https://gcc.gnu.org/viewcvs?rev=245745&root=gcc&view=rev Log: 2017-02-26 Thomas Koenig PR fortran/51119 * options

[Bug libfortran/51119] MATMUL slow for large matrices

2016-12-22 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 Bug 51119 depends on bug 66189, which changed state. Bug 66189 Summary: Block loops for inline matmul https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66189 What|Removed |Added --

[Bug libfortran/51119] MATMUL slow for large matrices

2016-12-03 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 Jerry DeLisle changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug libfortran/51119] MATMUL slow for large matrices

2016-11-16 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #47 from Jerry DeLisle --- Author: jvdelisle Date: Wed Nov 16 21:54:25 2016 New Revision: 242518 URL: https://gcc.gnu.org/viewcvs?rev=242518&root=gcc&view=rev Log: 2016-11-16 Jerry DeLisle PR libgfortran/51119 * M

[Bug libfortran/51119] MATMUL slow for large matrices

2016-11-16 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #46 from Thomas Koenig --- (In reply to Jerry DeLisle from comment #44) > Yes I am aware of these. I was willing to live with them, but if it is a > problem, we can remove those options easy enough. I think it is no big deal, but on

[Bug libfortran/51119] MATMUL slow for large matrices

2016-11-16 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #45 from Dominique d'Humieres --- I have some tests coming from pr37131 which now fail due to overly stringent comparisons between REAL values. This is illustrated by the following test: program main implicit none integer, parameter :: factor=

[Bug libfortran/51119] MATMUL slow for large matrices

2016-11-16 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #44 from Jerry DeLisle --- (In reply to Janne Blomqvist from comment #43) > Compile warnings caused by this patch: > > cc1: warning: command line option ‘-fno-protect-parens’ is valid for Fortran > but not for C > cc1: warning: comma

[Bug libfortran/51119] MATMUL slow for large matrices

2016-11-16 Thread jb at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #43 from Janne Blomqvist --- Compile warnings caused by this patch: cc1: warning: command line option ‘-fno-protect-parens’ is valid for Fortran but not for C cc1: warning: command line option ‘-fstack-arrays’ is valid for Fortran bu

[Bug libfortran/51119] MATMUL slow for large matrices

2016-11-15 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #42 from Jerry DeLisle --- Author: jvdelisle Date: Tue Nov 15 23:03:00 2016 New Revision: 242462 URL: https://gcc.gnu.org/viewcvs?rev=242462&root=gcc&view=rev Log: 2016-11-15 Jerry DeLisle Thomas Koenig PR l

[Bug libfortran/51119] MATMUL slow for large matrices

2016-11-14 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 Jerry DeLisle changed: Assignee: removed jb at gcc dot gnu.org, added jvdelisle at gcc dot gnu.org ---

[Bug libfortran/51119] MATMUL slow for large matrices

2016-11-08 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #40 from Jerry DeLisle --- (In reply to Joost VandeVondele from comment #37) > (In reply to Joost VandeVondele from comment #36) > > #pragma GCC optimize ( "-Ofast -fvariable-expansion-in-unroller > > -funroll-loops" ) > Using: (I fou

[Bug libfortran/51119] MATMUL slow for large matrices

2016-11-08 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #39 from Jerry DeLisle --- (In reply to Thomas Koenig from comment #38) > > Jerry, what Netlib code were you basing your code on? http://www.netlib.org/blas/index.html#_level_3_blas_tuned_for_single_processors_with_caches Used the

[Bug libfortran/51119] MATMUL slow for large matrices

2016-11-08 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #38 from Thomas Koenig --- (In reply to Joost VandeVondele from comment #37) > (In reply to Joost VandeVondele from comment #36) > > #pragma GCC optimize ( "-Ofast -fvariable-expansion-in-unroller > > -funroll-loops" ) > > and really

[Bug libfortran/51119] MATMUL slow for large matrices

2016-11-08 Thread Joost.VandeVondele at mat dot ethz.ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #37 from Joost VandeVondele --- (In reply to Joost VandeVondele from comment #36) > #pragma GCC optimize ( "-Ofast -fvariable-expansion-in-unroller > -funroll-loops" ) and really beneficial for larger matrices would be -floop-nest

[Bug libfortran/51119] MATMUL slow for large matrices

2016-11-08 Thread Joost.VandeVondele at mat dot ethz.ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #36 from Joost VandeVondele --- (In reply to Jerry DeLisle from comment #34) > -Ofast does reorder execution.. > Opinions welcome. That is absolutely OK for a matmul, and all techniques to get near peak performance require that (e.

[Bug libfortran/51119] MATMUL slow for large matrices

2016-11-08 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #35 from Thomas Koenig --- (In reply to Jerry DeLisle from comment #34) > -Ofast does reorder execution.. So does a block algorithm. > Opinions welcome. I'd say go for -Ofast, or at least its subset that enables reordering of exp
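Editorial sketch for the reordering point raised in comments #34 and #35: floating-point addition is not associative, so any change in summation order (from -Ofast or from a block algorithm) can change the low-order bits of a result. The C fragment below is only an illustration and is not taken from the libgfortran sources. The names sum_forward, sum_backward and nearly_equal are hypothetical, and the tolerance scheme is one common choice, not the one used in the gfortran testsuite.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the elements left to right. */
double sum_forward(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum;
}

/* Sum the same elements right to left: same math, different association. */
double sum_backward(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = n; i > 0; i--)
        sum += x[i - 1];
    return sum;
}

/* Hypothetical helper: absolute value without pulling in libm. */
static double abs_double(double v) { return v < 0.0 ? -v : v; }

/* Hypothetical tolerance check (an assumption, not the testsuite's rule):
   accept a and b if they differ by at most abs_tol, or by at most
   rel_tol times the larger magnitude. */
int nearly_equal(double a, double b, double rel_tol, double abs_tol)
{
    assert(rel_tol >= 0.0 && abs_tol >= 0.0);
    double diff = abs_double(a - b);
    double mag_a = abs_double(a), mag_b = abs_double(b);
    double scale = mag_a > mag_b ? mag_a : mag_b;
    double bound = rel_tol * scale;
    if (abs_tol > bound)
        bound = abs_tol;
    return diff <= bound;
}
```

With IEEE double arithmetic and no reassociating flags, summing {1.0, 1e20, -1e20} forward gives 0 and backward gives 1, which is why a test that compares MATMUL results for exact equality can fail once the summation order changes. A check such as nearly_equal(x, y, 1e-12, 0.0) tolerates the last-bit differences that reordering typically introduces.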

[Bug libfortran/51119] MATMUL slow for large matrices

2016-11-07 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #34 from Jerry DeLisle --- Created attachment 39987 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39987&action=edit A test program Just ran some tests comparing reference results and results using -Ofast. -Ofast does reorder

[Bug libfortran/51119] MATMUL slow for large matrices

2016-11-07 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #33 from Jerry DeLisle --- With #pragma GCC optimize ( "-O3" ) $ gfc -static -O2 -finline-matmul-limit=0 compare.f90 $ ./a.out = MEASURED GIGAFLO

[Bug libfortran/51119] MATMUL slow for large matrices

2016-11-07 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #32 from Jerry DeLisle --- Created attachment 39985 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39985&action=edit Proposed patch to get testing going This patch works pretty well for me. My results are as follows: gfortran

[Bug libfortran/51119] MATMUL slow for large matrices

2015-11-28 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #31 from Dominique d'Humieres --- From comment 27 > > I agree that inline should be faster, if the compiler is reasonably smart, > > if the matrix dimensions are known at compile time (i.e. should be able to > > generate the same ker

[Bug libfortran/51119] MATMUL slow for large matrices

2015-11-24 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #30 from Jerry DeLisle --- (In reply to Joost VandeVondele from comment #29) > These slides show how to reach 90% of peak: > http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm/ > the code actually is not too ugly, and I think there is

[Bug libfortran/51119] MATMUL slow for large matrices

2015-11-24 Thread Joost.VandeVondele at mat dot ethz.ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #29 from Joost VandeVondele --- (In reply to Thomas Koenig from comment #27) > (In reply to Joost VandeVondele from comment #22) > If the compiler turns out not to be reasonably smart, file a bug report :-) what is needed for large

[Bug libfortran/51119] MATMUL slow for large matrices

2015-11-24 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #28 from Jerry DeLisle --- (In reply to Janne Blomqvist from comment #25) > > But, that is not particularly impressive, is it? I don't know about current > low end graphics adapters, but at least the high end GPU cards (Tesla) are >

[Bug libfortran/51119] MATMUL slow for large matrices

2015-11-24 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #27 from Thomas Koenig --- (In reply to Joost VandeVondele from comment #22) > I agree that inline should be faster, if the compiler is reasonably smart, > if the matrix dimensions are known at compile time (i.e. should be able to >

[Bug libfortran/51119] MATMUL slow for large matrices

2015-11-24 Thread jb at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #26 from Janne Blomqvist --- (In reply to Thomas Koenig from comment #15) > Another issue: What should we do if the user supplies an external > subroutine DGEMM which does something unrelated? > > I suppose we should then make DGEMM

[Bug libfortran/51119] MATMUL slow for large matrices

2015-11-24 Thread jb at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #25 from Janne Blomqvist --- (In reply to Jerry DeLisle from comment #24) > (In reply to Jerry DeLisle from comment #16) > > For what it's worth: > > > > $ gfc pr51119.f90 -lblas -fno-external-blas -Ofast -march=native > > $ ./a.out

[Bug libfortran/51119] MATMUL slow for large matrices

2015-11-23 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #24 from Jerry DeLisle --- (In reply to Jerry DeLisle from comment #16) > For what it's worth: > > $ gfc pr51119.f90 -lblas -fno-external-blas -Ofast -march=native > $ ./a.out > Time, MATMUL:21.2483196 21.25444964601

[Bug libfortran/51119] MATMUL slow for large matrices

2015-11-23 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #23 from Jerry DeLisle --- (In reply to Thomas Koenig from comment #21) > > Hidden behind a -fexternal-blas-n switch might be an option. Including GPUs > > seems even a tad more tricky. We have a paper on GPU (small) matrix > > multip

[Bug libfortran/51119] MATMUL slow for large matrices

2015-11-23 Thread Joost.VandeVondele at mat dot ethz.ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #22 from Joost VandeVondele --- (In reply to Thomas Koenig from comment #21) > I assume that for small matrices bordering on the silly > (say, a matrix multiplication with dimensions of (1,2) and (2,1)) > the inline code will be fas

[Bug libfortran/51119] MATMUL slow for large matrices

2015-11-23 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #21 from Thomas Koenig --- > Hidden behind a -fexternal-blas-n switch might be an option. Including GPUs > seems even a tad more tricky. We have a paper on GPU (small) matrix > multiplication, http://dbcsr.cp2k.org/_media/gpu_book_ch

[Bug libfortran/51119] MATMUL slow for large matrices

2015-11-22 Thread Joost.VandeVondele at mat dot ethz.ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #20 from Joost VandeVondele --- (In reply to Jerry DeLisle from comment #19) > If I can get something working I am thinking something like > -fexternal-blas-n, if -n not given then default to current libblas > behaviour. This way use

[Bug libfortran/51119] MATMUL slow for large matrices

2015-11-22 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #19 from Jerry DeLisle --- If I can get something working I am thinking something like -fexternal-blas-n, if -n not given then default to current libblas behaviour. This way users have some control. With GPUs, it is not unusual to hav

[Bug libfortran/51119] MATMUL slow for large matrices

2015-11-22 Thread Joost.VandeVondele at mat dot ethz.ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #18 from Joost VandeVondele --- (In reply to Jerry DeLisle from comment #17) > I have done some experimenting. Since gcc supports OMP and I think to some > extent ACC why not come up with a MATMUL that exploits these if present? On

[Bug libfortran/51119] MATMUL slow for large matrices

2015-11-22 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #17 from Jerry DeLisle --- I have done some experimenting. Since gcc supports OMP and I think to some extent ACC why not come up with a MATMUL that exploits these if present? On the darwin platform discussed in comment #12, the perf

[Bug libfortran/51119] MATMUL slow for large matrices

2015-11-08 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 Jerry DeLisle changed: What|Removed |Added CC||jvdelisle at gcc dot gnu.org --- Comment

[Bug libfortran/51119] MATMUL slow for large matrices

2015-11-01 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #15 from Thomas Koenig --- Another issue: What should we do if the user supplies an external subroutine DGEMM which does something unrelated? I suppose we should then make DGEMM (and SGEMM) an intrinsic subroutine.

[Bug libfortran/51119] MATMUL slow for large matrices

2015-10-31 Thread jb at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #14 from Janne Blomqvist --- (In reply to Dominique d'Humieres from comment #12) > I suppose most modern OS provide such optimized BLAS and, if not, one can > install libraries such as atlas. So I wonder if it would not be more > effe

[Bug libfortran/51119] MATMUL slow for large matrices

2015-10-31 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #13 from Thomas Koenig --- (In reply to Dominique d'Humieres from comment #12) > I suppose most modern OS provide such optimized BLAS and, if not, one can > install libraries such as atlas. So I wonder if it would not be more > effec

[Bug libfortran/51119] MATMUL slow for large matrices

2015-10-31 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #12 from Dominique d'Humieres --- Some new numbers for a four-core Core i7 (2.8 GHz, Turbo Boost 3.8 GHz, 1.6 GHz DDR3) on x86_64-apple-darwin14.5 for the following test: program t2 implicit none REAL time_begin, time_end integer, parame

[Bug libfortran/51119] MATMUL slow for large matrices

2013-04-01 Thread tkoenig at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 Thomas Koenig changed: What|Removed |Added Depends on||37131 --- Comment #11 from Thom

[Bug libfortran/51119] MATMUL slow for large matrices

2013-03-29 Thread Joost.VandeVondele at mat dot ethz.ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 Joost VandeVondele changed: What|Removed |Added Last reconfirmed|2011-11-14 00:00:00 |2013-03-29 --- Comment #10

[Bug libfortran/51119] MATMUL slow for large matrices

2012-06-29 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 Steven Bosscher changed: What|Removed |Added CC||steven at gcc dot gnu.org --- Comment #

[Bug libfortran/51119] MATMUL slow for large matrices

2012-06-29 Thread Joost.VandeVondele at mat dot ethz.ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 Joost VandeVondele changed: What|Removed |Added CC||Joost.VandeVondele at mat

[Bug libfortran/51119] MATMUL slow for large matrices

2012-06-28 Thread jb at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #7 from Janne Blomqvist 2012-06-28 12:15:05 UTC --- (In reply to comment #6) > Janne, have you had a chance to look at this? For larger matrices MATMUL is > really slow. Anything that includes even the most basic blocking scheme sho

[Bug libfortran/51119] MATMUL slow for large matrices

2012-06-28 Thread Joost.VandeVondele at mat dot ethz.ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #6 from Joost VandeVondele 2012-06-28 11:58:20 UTC --- Janne, have you had a chance to look at this? For larger matrices MATMUL is really slow. Anything that includes even the most basic blocking scheme should be faster. I think thi
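Editorial sketch of what "the most basic blocking scheme" in comment #6 could look like. This is a minimal illustration under stated assumptions, not the code that was later committed to libgfortran: the function names matmul_naive and matmul_blocked, the row-major layout (Fortran arrays are column-major), and the single square tile size bs are all choices made here for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the smaller of two sizes. */
static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Reference triple loop: c (m x n) = a (m x k) * b (k x n), row-major. */
void matmul_naive(size_t m, size_t n, size_t k,
                  const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t p = 0; p < k; p++)
                sum += a[i * k + p] * b[p * n + j];
            c[i * n + j] = sum;
        }
}

/* Blocked variant: walks the matrices in bs x bs tiles so that the
   tiles of a, b and c being combined are reused while still in cache.
   The products are accumulated in a different order than in
   matmul_naive, so results can differ in the last bits. */
void matmul_blocked(size_t m, size_t n, size_t k, size_t bs,
                    const double *a, const double *b, double *c)
{
    assert(bs > 0);
    for (size_t i = 0; i < m * n; i++)
        c[i] = 0.0;

    for (size_t ii = 0; ii < m; ii += bs) {
        size_t i_end = min_size(ii + bs, m);
        for (size_t pp = 0; pp < k; pp += bs) {
            size_t p_end = min_size(pp + bs, k);
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t j_end = min_size(jj + bs, n);
                /* Multiply one tile of a by one tile of b. */
                for (size_t i = ii; i < i_end; i++)
                    for (size_t p = pp; p < p_end; p++) {
                        double a_ip = a[i * k + p];
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += a_ip * b[p * n + j];
                    }
            }
        }
    }
}
```

A call such as matmul_blocked(m, n, k, 64, a, b, c) fills c with the product. The tile size 64 is only a placeholder; a sensible value depends on the cache sizes of the target machine and would need to be measured.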

[Bug libfortran/51119] MATMUL slow for large matrices

2011-11-15 Thread jb at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #5 from Janne Blomqvist 2011-11-15 15:47:54 UTC --- (In reply to comment #3) > I believe it would be more important to have actually highly efficient > (inlined) implementations for very small matrices. There's already PR 37131 for t

[Bug libfortran/51119] MATMUL slow for large matrices

2011-11-15 Thread Joost.VandeVondele at mat dot ethz.ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #4 from Joost VandeVondele 2011-11-15 12:31:10 UTC --- Created attachment 25826 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25826 comparison in performance for small matrix multiplies (libsmm vs mkl) added some data showing t

[Bug libfortran/51119] MATMUL slow for large matrices

2011-11-15 Thread Joost.VandeVondele at mat dot ethz.ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 --- Comment #3 from Joost VandeVondele 2011-11-15 12:19:59 UTC --- (In reply to comment #1) > I have a cunning plan. It is doable to come within a factor of 2 of highly efficient implementations using a cache-oblivious matrix multiply, which is

[Bug libfortran/51119] MATMUL slow for large matrices

2011-11-14 Thread burnus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 Tobias Burnus changed: What|Removed |Added CC||burnus at gcc dot gnu.org --- Comment #2

[Bug libfortran/51119] MATMUL slow for large matrices

2011-11-13 Thread jb at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119 Janne Blomqvist changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed|