> On a 4-core machine (8 with hyperthreading) I'm observing a 10x speedup.
> The parallelism-related speedup is 4x. There is an additional 2.5x speedup
> which appears to be related to the lower-level access to the Matrix memory
> done by RMatrix (and perhaps some elimination of copying).
>
It turns
Here's a parallel version:
https://github.com/jjallaire/RcppParallel/blob/master/inst/examples/parallel-distance-matrix.cpp
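
For anyone who doesn't want to follow the link, here is a rough sketch of the general idea in plain C++. It is not the code at that URL and it doesn't use RcppParallel at all: it assumes std::thread for the workers, Euclidean distance as the metric, and a hypothetical MatrixView struct (plus the helper names row_distance, distance_rows and parallel_distance) that I made up purely for illustration. It also fills the full matrix rather than just one triangle, to keep the indexing simple.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical read-only view over a column-major matrix (R's layout).
// Illustrative only; this is NOT RcppParallel's RMatrix.
struct MatrixView {
    const double* data;
    std::size_t nrow, ncol;
    // Element (i, j) lives at offset i + j * nrow in column-major storage.
    double at(std::size_t i, std::size_t j) const { return data[i + j * nrow]; }
};

// Euclidean distance between rows a and b (the metric is an assumption).
double row_distance(const MatrixView& m, std::size_t a, std::size_t b) {
    double sum = 0.0;
    for (std::size_t j = 0; j < m.ncol; ++j) {
        const double d = m.at(a, j) - m.at(b, j);
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Fill rows [begin, end) of the nrow x nrow output (also column-major).
void distance_rows(const MatrixView& m, std::vector<double>& out,
                   std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        for (std::size_t j = 0; j < m.nrow; ++j)
            out[i + j * m.nrow] = row_distance(m, i, j);
}

// Split the output rows into contiguous chunks, one worker thread per chunk.
// Each worker writes a disjoint set of output elements, so no locking is needed.
std::vector<double> parallel_distance(const MatrixView& m, unsigned n_threads) {
    std::vector<double> out(m.nrow * m.nrow, 0.0);
    n_threads = std::max(1u, n_threads);
    const std::size_t chunk = (m.nrow + n_threads - 1) / n_threads;
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < m.nrow; begin += chunk) {
        const std::size_t end = std::min(m.nrow, begin + chunk);
        workers.emplace_back([&m, &out, begin, end] {
            distance_rows(m, out, begin, end);
        });
    }
    for (auto& w : workers) w.join();
    return out;
}
```

Calling parallel_distance(view, 4) on a view of the input returns the distances as a flat column-major vector. On some toolchains you may need to compile with -pthread.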
To make the code reasonable I introduced a new RMatrix class in
RcppParallel that makes offsetting into rows and columns safe and
straightforward. This class has no connecti