Re: Performance differences between SystemML LibMatrixMult and Breeze with native BLAS

2016-12-06 Thread Berthold Reinwald
he.org Date: 12/01/2016 02:15 AM Subject: Re: Performance differences between SystemML LibMatrixMult and Breeze with native BLAS sure, I understand. Let me just quickly explain the matrix-vector behavior you've observed: I don't know your experimental setting, but for a 1kx1k matrix-

Re: Performance differences between SystemML LibMatrixMult and Breeze with native BLAS

2016-12-01 Thread Matthias Boehm
Sure, I understand. Let me just quickly explain the matrix-vector behavior you've observed: I don't know your experimental setting, but for a 1k x 1k matrix-vector multiplication, the small input (8MB) likely fits into the L3 cache. If you increased the data size, let's say to 8GB (where you actually read
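
The 8MB figure in the snippet above follows from storing a 1k x 1k matrix as dense 8-byte doubles. A minimal sketch of that arithmetic is given here. The class name, the helper names, the 20MB cache size, and the 1,000,000 x 1,000 shape used to reach 8GB are all illustrative assumptions, not values taken from the thread.

```java
public class DenseFootprint {

    // Raw payload of a dense rows x cols matrix of 8-byte doubles.
    // Object headers and any block metadata are ignored.
    static long denseBytes(long rows, long cols) {
        return rows * cols * Double.BYTES;
    }

    // True if the payload is no larger than the given cache size in bytes.
    static boolean fitsInCache(long payloadBytes, long cacheBytes) {
        return payloadBytes <= cacheBytes;
    }

    public static void main(String[] args) {
        // Hypothetical L3 size; check the actual value for your CPU.
        long assumedL3Bytes = 20_000_000L;

        long small = denseBytes(1_000, 1_000);     // 8,000,000 bytes (8MB)
        long large = denseBytes(1_000_000, 1_000); // 8,000,000,000 bytes (8GB), illustrative shape

        System.out.println("1k x 1k fits in assumed L3: " + fitsInCache(small, assumedL3Bytes));
        System.out.println("1M x 1k fits in assumed L3: " + fitsInCache(large, assumedL3Bytes));
    }
}
```

With an input that fits in cache, a matrix-vector benchmark mostly measures compute rather than memory bandwidth, which is why the input size matters when comparing the two libraries.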

Re: Performance differences between SystemML LibMatrixMult and Breeze with native BLAS

2016-11-30 Thread Matthias Boehm
OK, then let's sort this out one by one. 1) Benchmarks: There are a couple of things we should be aware of for these native/Java benchmarks. First, please specify k as the number of logical cores on your machine and use a sufficiently large heap with Xms=Xmx and Xmn=0.1*Xmx. Second, exclude
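
The heap and thread settings recommended above can be derived as in the following sketch. The class and method names are hypothetical, and the 8192MB heap is only an example value. Note that Runtime.availableProcessors() reports the processors available to the JVM, which usually equals the number of logical cores but can be lower in a container or with CPU affinity restrictions.

```java
public class BenchmarkJvmSettings {

    // Builds HotSpot heap flags with Xms = Xmx and Xmn = 0.1 * Xmx
    // (young generation rounded down to whole megabytes).
    static String heapFlags(int heapMb) {
        int youngMb = Math.max(1, heapMb / 10);
        return "-Xms" + heapMb + "m -Xmx" + heapMb + "m -Xmn" + youngMb + "m";
    }

    // Candidate value for k: the processors visible to this JVM.
    static int logicalCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        int heapMb = 8192; // example heap size, an assumption
        System.out.println("JVM flags: " + heapFlags(heapMb));
        System.out.println("k = " + logicalCores());
    }
}
```

Setting the initial and maximum heap to the same value avoids heap resizing during the measured runs, so the timings are less affected by garbage-collection side effects.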

Re: Performance differences between SystemML LibMatrixMult and Breeze with native BLAS

2016-11-30 Thread Matthias Boehm
Could you please make sure you're comparing the right thing? Even on old Sandy Bridge CPUs, our matrix mult for 1k x 1k usually takes 40-50ms. We also did the same experiments with larger matrices, and SystemML was about 2x faster than Breeze. Please uncomment the timings in

Performance differences between SystemML LibMatrixMult and Breeze with native BLAS

2016-11-30 Thread fschueler
Hi all, I have run a very quick comparison between SystemML's LibMatrixMult and Breeze matrix multiplication using native BLAS (OpenBLAS through netlib-java). In this very small comparison, I see a performance difference for dense-dense matrices of size 1000 x 1000