Re: [math] matrix multiplication surprisingly slow?

Gilles Thu, 07 Apr 2016 07:40:14 -0700

On Wed, 6 Apr 2016 12:55:06 +0100, Chris Lucas wrote:

Thanks for the quick reply!

I've pasted a small self-contained example at the bottom. It createsthematrices in advance, but nothing meaningful changes if they'recreated on a

per-operation basis.

Results for 50 multiplications of [size]x[size] matrices:
   Size: 10, total time: 0.012 seconds, time per: 0.000 seconds
   Size: 100, total time: 0.062 seconds, time per: 0.001 seconds
   Size: 300, total time: 3.050 seconds, time per: 0.061 seconds
   Size: 500, total time: 15.186 seconds, time per: 0.304 seconds
   Size: 600, total time: 17.532 seconds, time per: 0.351 seconds

For comparison:

Results for 50 additions of [size]x[size] matrices (which should befaster,

be the extent of the difference is nonetheless striking to me):
   Size: 10, total time: 0.011 seconds, time per: 0.000 seconds
   Size: 100, total time: 0.012 seconds, time per: 0.000 seconds
   Size: 300, total time: 0.020 seconds, time per: 0.000 seconds
   Size: 500, total time: 0.032 seconds, time per: 0.001 seconds
   Size: 600, total time: 0.050 seconds, time per: 0.001 seconds

Results for 50 inversions of a [size]x[size] matrix, which I'd expectto be

slower than multiplication for larger matrices:
   Size: 10, total time: 0.014 seconds, time per: 0.000 seconds
   Size: 100, total time: 0.074 seconds, time per: 0.001 seconds
   Size: 300, total time: 1.005 seconds, time per: 0.020 seconds
   Size: 500, total time: 5.024 seconds, time per: 0.100 seconds
   Size: 600, total time: 9.297 seconds, time per: 0.186 seconds

I hope this is useful, and if I'm doing something wrong that'sleading to

this performance gap, I'd love to know.

Here is the result of a JMH run (only benchmarking matrixmultiplication):

---CUT---
[...]

Benchmark Mode Cnt Score ErrorUnitsMyBenchmarkMatrix.size100x100_a2d thrpt 200 738.787 ± 8.501ops/sMyBenchmarkMatrix.size100x100_b thrpt 200 1527.522 ± 16.158ops/sMyBenchmarkMatrix.size10x10_a2d thrpt 200 697662.468 ± 4807.916ops/sMyBenchmarkMatrix.size10x10_b thrpt 200 1003800.610 ± 6612.299ops/sMyBenchmarkMatrix.size600x600_a2d thrpt 200 0.669 ± 0.007ops/sMyBenchmarkMatrix.size600x600_b thrpt 200 7.794 ± 0.071ops/s

---CUT---

where the suffixindicates the layout type:
  a2d -> Array2DRowRealMatrix
    b -> BlockRealMatrix
[So, even for small matrices, "BlockRealMatrix" is a clear win.]

As for the multiplication getting slower for larger sizes, I'd assumethat it'srelated to the O(n^2) nature of the algorithm (where "n" is the numberof elements

of the matrix).

For the two layouts provided in CM and for the other library, you couldperhapsconfirm the exact dependency by fitting the parameters "a", "b", "c" ofthe

function
  t(n) = a + b n + c n^2

using e.g.http://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/fitting/PolynomialCurveFitter.html

(and more data points).


Regards,
Gilles

----
import org.apache.commons.math3.linear.LUDecomposition;
import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.RealMatrix;


public class AMCMatrices {

  public static void main(String[] args) {
    miniTest(0);
  }

  public static void miniTest(int tType) {
    int samples = 50;

    int sizes[] = { 10, 100, 300, 500, 600 };

    for (int sI = 0; sI < sizes.length; sI++) {
      int mSize = sizes[sI];
org.apache.commons.math3.linear.RealMatrix m0 = buildM(mSize,mSize);
      RealMatrix m1 = buildM(mSize, mSize);

      long start = System.nanoTime();
      for (int n = 0; n < samples; n++) {
        switch (tType) {
        case 0:
          m0.multiply(m1);
          break;
        case 1:
          m0.add(m1);
          break;
        case 2:
          new LUDecomposition(m0).getSolver().getInverse();
          break;
        }

      }
      long end = System.nanoTime();

      double dt = ((double) (end - start)) / 1E9;
      System.out.println(String.format(
"Size: %d, total time: %3.3f seconds, time per: %3.3fseconds",
          mSize, dt, dt / samples));
    }
  }

  public static Array2DRowRealMatrix buildM(int r, int c) {
    double[][] matVals = new double[r][c];
    for (int i = 0; i < r; i++) {
      for (int j = 0; j < c; j++) {
        matVals[i][j] = Math.random();
      }
    }
    return new Array2DRowRealMatrix(matVals);
  }
}

----
On 5 April 2016 at 19:36, Gilles <[email protected]>wrote:
Hi.

On Tue, 5 Apr 2016 15:43:04 +0100, Chris Lucas wrote:
I recently ran a benchmark comparing the performance math commons3.6.1's
linear algebra library to the that of scala Breeze (
https://github.com/scalanlp/breeze).

I looked at det, inverse, Cholesky factorization, addition, and
multiplication, including matrices with 10, 100, 500, and 1000elements,with symmetric, non-symmetric, and non-square cases whereapplicable.
It would be interesting to add this to the CM documentation:
  https://issues.apache.org/jira/browse/MATH-1194
In general, I was pleasantly surprised: math commons performed aboutas
well as Breeze, despite the latter relying on native libraries.There was
one exception, however:

    m0.multiply(m1)
where m0 and m1 are both Array2DRowRealMatrix instances. It scaledverypoorly in math commons, being much slower than nominally moreexpensiveoperations like inv and the Breeze implementation. Does anyone havea
thought as to what's going on?
Could your provide more information such as a plot of performance vssize?
A self-contained and minimal code to run would be nice too.
See the CM micro-benchmarking tool here:


https://github.com/apache/commons-math/blob/master/src/test/java/org/apache/commons/math4/PerfTestUtils.java
And an example of how to use it:


https://github.com/apache/commons-math/blob/master/src/userguide/java/org/apache/commons/math4/userguide/FastMathTestPerformance.java

In case it's useful, one representative test
involves multiplying two instances of

    new Array2DRowRealMatrix(matVals)
where matVals is 1000x1000 entries of math.random and the secondinstanceis created as part of the loop. This part of the benchmark is notspecificto the expensive multiplication step, and takes very little timerelativeto the multiplication itself. I'm using System.nanotime for thetiming,
and
take the average time over several consecutive iterations, on a 3.5GHz
Intel Core i7, Oracle JRE (build 1.8.0_05-b13).
You might want to confirm the behaviour using JMH (becoming the Java
standard
for benchmarking):
  http://openjdk.java.net/projects/code-tools/jmh/


Best regards,
Gilles
Thanks,

Chris



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [math] matrix multiplication surprisingly slow?

Reply via email to