Thanks for suggesting the BlockRealMatrix. I've run a few benchmarks comparing the two, along with some other libraries.
I'm using JMH 1.11.3, JDK 1.8.0_05, VM 25.5-b02, with 2 warmup and 10 test iterations. The suffixes in the benchmark names denote what's being tested:

  MC: Apache Commons Math (AMC) 3.6.1 using Array2DRowRealMatrix
  SB: Scala Breeze 0.12 with native libraries
  OJ: OJAlgo 39.0 (some other benchmarks suggest OJAlgo is an especially speedy pure-Java library)
  BMC: AMC using a BlockRealMatrix

I'm using the same matrix creation/multiplication/inversion code as I mentioned in my previous note. When testing BlockRealMatrix instances, I'm using fixed random matrices and annotating my class with @State(Scope.Benchmark). For the curious, my rationale for building matrices on the spot is that I'm already caching results pretty heavily and rarely perform the same operation on the same matrix twice -- I'm most curious about performance in the absence of caching benefits. (A stripped-down sketch of the benchmark class follows the results, for anyone who wants to reproduce the numbers.)

Test 1a: Creating and multiplying two random/uniform (i.e., all entries drawn from Math.random()) 100x100 matrices:

[info] MatrixBenchmarks.buildMultTestMC100 thrpt 10 836.579 ± 7.120 ops/s
[info] MatrixBenchmarks.buildMultTestSB100 thrpt 10 1649.089 ± 170.649 ops/s
[info] MatrixBenchmarks.buildMultTestOJ100 thrpt 10 1485.014 ± 44.158 ops/s
[info] MatrixBenchmarks.multBMC100 thrpt 10 1051.055 ± 2.290 ops/s

Test 1b: Creating and multiplying two random/uniform 500x500 matrices:

[info] MatrixBenchmarks.buildMultTestMC500 thrpt 10 8.997 ± 0.064 ops/s
[info] MatrixBenchmarks.buildMultTestSB500 thrpt 10 80.558 ± 0.627 ops/s
[info] MatrixBenchmarks.buildMultTestOJ500 thrpt 10 34.626 ± 2.505 ops/s
[info] MatrixBenchmarks.multBMC500 thrpt 10 9.132 ± 0.059 ops/s
[info] MatrixBenchmarks.multCacheBMC500 thrpt 10 13.630 ± 0.107 ops/s [no matrix creation]

---

Test 2a: Creating a single 500x500 matrix (to get a sense of whether the multiplication itself is driving the differences):

[info] MatrixBenchmarks.buildMC500 thrpt 10 155.026 ± 2.456 ops/s
[info] MatrixBenchmarks.buildSB500 thrpt 10 197.257 ± 4.619 ops/s
[info] MatrixBenchmarks.buildOJ500 thrpt 10 176.036 ± 2.121 ops/s

Test 2b: Creating a single 1000x1000 matrix (to show how it scales with N):

[info] MatrixBenchmarks.buildMC1000 thrpt 10 36.112 ± 2.775 ops/s
[info] MatrixBenchmarks.buildSB1000 thrpt 10 45.557 ± 2.893 ops/s
[info] MatrixBenchmarks.buildOJ1000 thrpt 10 47.438 ± 1.370 ops/s
[info] MatrixBenchmarks.buildBMC1000 thrpt 10 37.779 ± 0.871 ops/s

---

Test 3a: Inverting a random 100x100 matrix:

[info] MatrixBenchmarks.invMC100 thrpt 10 1033.011 ± 9.928 ops/s
[info] MatrixBenchmarks.invSB100 thrpt 10 1664.833 ± 15.170 ops/s
[info] MatrixBenchmarks.invOJ100 thrpt 10 1174.044 ± 52.083 ops/s
[info] MatrixBenchmarks.invBMC100 thrpt 10 1043.858 ± 22.144 ops/s

Test 3b: Inverting a random 500x500 matrix:

[info] MatrixBenchmarks.invMC500 thrpt 10 9.089 ± 0.284 ops/s
[info] MatrixBenchmarks.invSB500 thrpt 10 43.857 ± 1.083 ops/s
[info] MatrixBenchmarks.invOJ500 thrpt 10 23.444 ± 1.484 ops/s
[info] MatrixBenchmarks.invBMC500 thrpt 10 9.156 ± 0.052 ops/s
[info] MatrixBenchmarks.invCacheBMC500 thrpt 10 9.627 ± 0.084 ops/s [no matrix creation]

Test 3c: Inverting a random 1000x1000 matrix:

[info] MatrixBenchmarks.invMC1000 thrpt 10 0.704 ± 0.040 ops/s
[info] MatrixBenchmarks.invSB1000 thrpt 10 8.611 ± 0.557 ops/s
[info] MatrixBenchmarks.invOJ1000 thrpt 10 3.539 ± 0.229 ops/s
[info] MatrixBenchmarks.invBMC1000 thrpt 10 0.691 ± 0.095 ops/s

Also, isn't matrix multiplication at least O(N^2.37), rather than O(N^2)? The schoolbook algorithm is O(N^3), and even the asymptotically fastest known algorithms are around O(N^2.37).
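As promised, here is a stripped-down sketch of how the 500x500 Commons Math benchmarks are structured. The method names match the output above; everything else is abbreviated (the full harness also has the Breeze and OJAlgo methods and the other sizes), and randomValues is just an illustrative helper name. The inversion benchmarks follow the same pattern, using new LUDecomposition(m).getSolver().getInverse() as in the standalone example quoted below.

----
import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.BlockRealMatrix;
import org.apache.commons.math3.linear.RealMatrix;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 2)
@Measurement(iterations = 10)
public class MatrixBenchmarks {

    // Fixed random matrices for the "cache" variants, built once per trial.
    private RealMatrix blockA;
    private RealMatrix blockB;

    @Setup(Level.Trial)
    public void setup() {
        blockA = new BlockRealMatrix(randomValues(500, 500));
        blockB = new BlockRealMatrix(randomValues(500, 500));
    }

    // "MC" variant: build the Array2DRowRealMatrix inputs inside the benchmark,
    // so matrix creation is included in the measured time.
    @Benchmark
    public RealMatrix buildMultTestMC500() {
        RealMatrix a = new Array2DRowRealMatrix(randomValues(500, 500));
        RealMatrix b = new Array2DRowRealMatrix(randomValues(500, 500));
        return a.multiply(b); // returned so JMH doesn't dead-code the work
    }

    // "BMC" variant: same, but with BlockRealMatrix inputs.
    @Benchmark
    public RealMatrix multBMC500() {
        RealMatrix a = new BlockRealMatrix(randomValues(500, 500));
        RealMatrix b = new BlockRealMatrix(randomValues(500, 500));
        return a.multiply(b);
    }

    // "cache" variant: multiply the pre-built matrices only, no creation.
    @Benchmark
    public RealMatrix multCacheBMC500() {
        return blockA.multiply(blockB);
    }

    private static double[][] randomValues(int rows, int cols) {
        double[][] v = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                v[i][j] = Math.random();
            }
        }
        return v;
    }
}
----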
On 6 April 2016 at 12:55, Chris Lucas <[email protected]> wrote:

> Thanks for the quick reply!
>
> I've pasted a small self-contained example at the bottom. It creates the
> matrices in advance, but nothing meaningful changes if they're created on a
> per-operation basis.
>
> Results for 50 multiplications of [size]x[size] matrices:
> Size: 10, total time: 0.012 seconds, time per: 0.000 seconds
> Size: 100, total time: 0.062 seconds, time per: 0.001 seconds
> Size: 300, total time: 3.050 seconds, time per: 0.061 seconds
> Size: 500, total time: 15.186 seconds, time per: 0.304 seconds
> Size: 600, total time: 17.532 seconds, time per: 0.351 seconds
>
> For comparison:
>
> Results for 50 additions of [size]x[size] matrices (which should be
> faster, but the extent of the difference is nonetheless striking to me):
> Size: 10, total time: 0.011 seconds, time per: 0.000 seconds
> Size: 100, total time: 0.012 seconds, time per: 0.000 seconds
> Size: 300, total time: 0.020 seconds, time per: 0.000 seconds
> Size: 500, total time: 0.032 seconds, time per: 0.001 seconds
> Size: 600, total time: 0.050 seconds, time per: 0.001 seconds
>
> Results for 50 inversions of a [size]x[size] matrix, which I'd expect to
> be slower than multiplication for larger matrices:
> Size: 10, total time: 0.014 seconds, time per: 0.000 seconds
> Size: 100, total time: 0.074 seconds, time per: 0.001 seconds
> Size: 300, total time: 1.005 seconds, time per: 0.020 seconds
> Size: 500, total time: 5.024 seconds, time per: 0.100 seconds
> Size: 600, total time: 9.297 seconds, time per: 0.186 seconds
>
> I hope this is useful, and if I'm doing something wrong that's leading to
> this performance gap, I'd love to know.
>
> ----
> import org.apache.commons.math3.linear.LUDecomposition;
> import org.apache.commons.math3.linear.Array2DRowRealMatrix;
> import org.apache.commons.math3.linear.RealMatrix;
>
> public class AMCMatrices {
>
>     public static void main(String[] args) {
>         miniTest(0);
>     }
>
>     public static void miniTest(int tType) {
>         int samples = 50;
>
>         int[] sizes = { 10, 100, 300, 500, 600 };
>
>         for (int sI = 0; sI < sizes.length; sI++) {
>             int mSize = sizes[sI];
>
>             RealMatrix m0 = buildM(mSize, mSize);
>             RealMatrix m1 = buildM(mSize, mSize);
>
>             long start = System.nanoTime();
>             for (int n = 0; n < samples; n++) {
>                 switch (tType) {
>                 case 0:
>                     m0.multiply(m1);
>                     break;
>                 case 1:
>                     m0.add(m1);
>                     break;
>                 case 2:
>                     new LUDecomposition(m0).getSolver().getInverse();
>                     break;
>                 }
>             }
>             long end = System.nanoTime();
>
>             double dt = ((double) (end - start)) / 1E9;
>             System.out.println(String.format(
>                 "Size: %d, total time: %3.3f seconds, time per: %3.3f seconds",
>                 mSize, dt, dt / samples));
>         }
>     }
>
>     public static Array2DRowRealMatrix buildM(int r, int c) {
>         double[][] matVals = new double[r][c];
>         for (int i = 0; i < r; i++) {
>             for (int j = 0; j < c; j++) {
>                 matVals[i][j] = Math.random();
>             }
>         }
>         return new Array2DRowRealMatrix(matVals);
>     }
> }
>
> ----
>
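(For completeness: the BMC/BlockRealMatrix numbers at the top come from the JMH harness rather than from this standalone example, but rerunning the example with block matrices should only need an extra import of org.apache.commons.math3.linear.BlockRealMatrix and a variant of buildM along these lines -- buildBlockM is just an illustrative name:)

----
public static BlockRealMatrix buildBlockM(int r, int c) {
    double[][] matVals = new double[r][c];
    for (int i = 0; i < r; i++) {
        for (int j = 0; j < c; j++) {
            matVals[i][j] = Math.random();
        }
    }
    // BlockRealMatrix copies the raw values into its internal block layout.
    return new BlockRealMatrix(matVals);
}
----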
> On 5 April 2016 at 19:36, Gilles <[email protected]> wrote:
>
>> Hi.
>>
>> On Tue, 5 Apr 2016 15:43:04 +0100, Chris Lucas wrote:
>>
>>> I recently ran a benchmark comparing the performance of Commons Math
>>> 3.6.1's linear algebra library to that of Scala Breeze
>>> (https://github.com/scalanlp/breeze).
>>>
>>> I looked at det, inverse, Cholesky factorization, addition, and
>>> multiplication, including matrices of size 10, 100, 500, and 1000,
>>> with symmetric, non-symmetric, and non-square cases where applicable.
>>
>> It would be interesting to add this to the CM documentation:
>> https://issues.apache.org/jira/browse/MATH-1194
>>
>>> In general, I was pleasantly surprised: Commons Math performed about as
>>> well as Breeze, despite the latter relying on native libraries. There was
>>> one exception, however:
>>>
>>> m0.multiply(m1)
>>>
>>> where m0 and m1 are both Array2DRowRealMatrix instances. It scaled very
>>> poorly in Commons Math, being much slower than nominally more expensive
>>> operations like inv, and much slower than the Breeze implementation. Does
>>> anyone have a thought as to what's going on?
>>
>> Could you provide more information, such as a plot of performance vs. size?
>> Self-contained, minimal code to run would be nice too.
>> See the CM micro-benchmarking tool here:
>> https://github.com/apache/commons-math/blob/master/src/test/java/org/apache/commons/math4/PerfTestUtils.java
>> And an example of how to use it:
>> https://github.com/apache/commons-math/blob/master/src/userguide/java/org/apache/commons/math4/userguide/FastMathTestPerformance.java
>>
>>> In case it's useful, one representative test involves multiplying two
>>> instances of
>>>
>>> new Array2DRowRealMatrix(matVals)
>>>
>>> where matVals is 1000x1000 entries of Math.random() and the second
>>> instance is created as part of the loop. This part of the benchmark is
>>> not specific to the expensive multiplication step, and takes very little
>>> time relative to the multiplication itself. I'm using System.nanoTime()
>>> for the timing, and take the average time over several consecutive
>>> iterations, on a 3.5 GHz Intel Core i7, Oracle JRE (build 1.8.0_05-b13).
>>
>> You might want to confirm the behaviour using JMH (becoming the Java
>> standard for benchmarking):
>> http://openjdk.java.net/projects/code-tools/jmh/
>>
>> Best regards,
>> Gilles
>>
>>> Thanks,
>>>
>>> Chris
