he.org
Date: 12/01/2016 02:15 AM
Subject: Re: Performance differences between SystemML LibMatrixMult
and Breeze with native BLAS
Sure, I understand. Let me quickly explain the matrix-vector behavior
you've observed. I don't know your experimental setting, but for a
1k x 1k matrix-vector multiplication the small input (8MB) likely fits
into the L3 cache. If you increased the data size to, say, 8GB (where you
actually read
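The 8MB figure above follows from storing a dense matrix as 8-byte
doubles. A minimal sketch of that arithmetic, assuming double precision
and decimal megabytes; the class and method names are hypothetical and
not part of SystemML or Breeze:

```java
// Hypothetical helper for sizing dense double-precision inputs.
final class DenseFootprint {
    static final long BYTES_PER_DOUBLE = 8L;

    // Raw payload of a dense rows x cols matrix of doubles, in bytes
    // (ignores object headers and any padding).
    static long bytes(long rows, long cols) {
        return Math.multiplyExact(Math.multiplyExact(rows, cols), BYTES_PER_DOUBLE);
    }

    // Largest n such that an n x n dense double matrix fits in targetBytes.
    static long squareDimForBytes(long targetBytes) {
        return (long) Math.floor(Math.sqrt(targetBytes / (double) BYTES_PER_DOUBLE));
    }

    public static void main(String[] args) {
        System.out.println(bytes(1000, 1000));                 // 8000000
        System.out.println(squareDimForBytes(8_000_000_000L)); // 31622
    }
}
```

Whether a given input actually fits into L3 depends on the cache size of
the machine, which is not stated in this thread.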
OK, then let's sort this out one by one.
1) Benchmarks: There are a couple of things we should be aware of for
these native/java benchmarks. First, please specify k as the number of
logical cores on your machine and use a sufficiently large heap with
Xms=Xmx and Xmn=0.1*Xmx. Second, exclude
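The settings named in the first point (k as the number of logical cores,
Xms=Xmx, Xmn=0.1*Xmx) can be derived as in the sketch below. The class and
method names are hypothetical, and the heap size passed in is an
assumption chosen for illustration, not a value from this thread:

```java
// Hypothetical helper that derives the benchmark settings described above.
final class BenchSetup {
    // Processors available to the JVM; on typical machines this is the
    // number of logical cores, but container or affinity limits can lower it.
    static int logicalCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Builds JVM options with Xms = Xmx and Xmn = 0.1 * Xmx (integer MB).
    static String jvmFlags(long heapMb) {
        long youngMb = heapMb / 10;
        return "-Xms" + heapMb + "m -Xmx" + heapMb + "m -Xmn" + youngMb + "m";
    }

    public static void main(String[] args) {
        System.out.println("k = " + logicalCores());
        System.out.println(jvmFlags(8192)); // 8192 MB is an assumed heap size
    }
}
```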
Could you please make sure you're comparing the right thing? Even on old
Sandy Bridge CPUs our matrix multiplication for 1k x 1k usually takes
40-50ms. We also ran the same experiments with larger matrices, and
SystemML was about 2x faster than Breeze. Please uncomment the timings in
Hi all,
I have run a very quick comparison between SystemML's LibMatrixMult and
Breeze matrix multiplication using native BLAS (OpenBLAS through
netlib-java). In this small comparison I see a performance difference
for dense-dense matrices of size 1000 x 1000