Re: spark-itemsimilarity scalability / Spark parallelism issues (SimilarityAnalysis.cooccurrencesIDSs)

2017-08-16 Ted Dunning
It is common with large numerical codes that things run faster in memory on just a few cores if the communication required outweighs the parallel speedup. The issue is that memory bandwidth is lower than arithmetic throughput by a wide margin. If you just have to move stuff into the CPU and m
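To make both points concrete, here is a toy sketch. It is not Ted's model and not anything in Mahout or Spark; the functions `run_time`, `best_core_count`, and `attainable_gflops` and all numbers are hypothetical. The first model assumes compute splits perfectly across p cores while communication cost grows linearly with p, so the best core count is roughly sqrt(work / comm). The second is the standard roofline bound: throughput is the lower of peak arithmetic and memory bandwidth times arithmetic intensity (FLOPs per byte moved).

```python
# Toy models (hypothetical, for illustration only): not taken from Mahout or Spark.

def run_time(p: int, work: float, comm: float) -> float:
    """Compute time that splits perfectly across p cores, plus a
    communication cost assumed to grow linearly with p."""
    return work / p + comm * (p - 1)


def best_core_count(work: float, comm: float, max_cores: int) -> int:
    """Core count in 1..max_cores that minimizes the toy run time."""
    return min(range(1, max_cores + 1), key=lambda p: run_time(p, work, comm))


def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float, flops_per_byte: float) -> float:
    """Roofline bound: throughput is capped by peak arithmetic or by how fast
    memory can feed operands, whichever is lower."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


if __name__ == "__main__":
    work = 100.0  # arbitrary time units on one core
    for comm in (0.01, 1.0, 25.0):
        p = best_core_count(work, comm, max_cores=64)
        print(f"comm={comm}: best p={p}, time={run_time(p, work, comm):.2f}")

    # Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
    for intensity in (0.25, 50.0):
        print(f"{intensity} FLOP/byte -> {attainable_gflops(1000.0, 100.0, intensity)} GFLOP/s")
```

In this toy setup, cheap communication lets all 64 cores help, but when communication is expensive the best choice drops to 2 cores. Likewise, code doing 0.25 FLOP per byte reaches only 25 of the hypothetical machine's 1000 GFLOP/s, because it is waiting on memory rather than arithmetic.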

2017-08-16 Pat Ferrel
This uses the Mahout BLAS optimizing solver, which I just use and do not know well. Mahout virtualizes some things having to do with partitioning, and I’ve never quite understood how they work. There is a .par() on one of the matrix classes that has a similar function to partition but in all case