Horizontally scaling / speeding up Mahout's LDA

2013-12-13 Thread Vishnu Modi
I have been experimenting with Mahout's LDA algorithm. My corpus has around 8 small documents and roughly 45,000 terms. The results are good, but the algorithm takes too long to run: the mapper takes around an hour on every iteration, so with 10 iterations it takes a little over 10

Re: Horizontally scaling / speeding up Mahout's LDA

2013-12-13 Thread Gokhan Capan
Hi Vishnu, you can reduce the split size by setting Hadoop's mapred.max.split.size configuration parameter. The number of map tasks will then equal the number of splits (roughly input size / split size). Best
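
To make the "input size / split size" arithmetic in the reply concrete, here is a rough Python sketch of how a file-based Hadoop input format is commonly described as choosing splits: the split size is the block size clamped between the minimum and maximum split size, and the last split is allowed to run up to 10% over. The function names and the 256 MB / 128 MB / 16 MB figures are made up for illustration and do not come from the thread; the actual behaviour depends on the Hadoop version and on the input format the job uses.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # assumed tolerance: the final split may be up to 10% oversized


def compute_split_size(block_size, min_size, max_size):
    """Clamp the block size between the minimum and maximum split size."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, split_size):
    """Estimate the number of splits (and so map tasks) for one splittable file."""
    splits = 0
    remaining = file_size
    # Carve off full-size splits while clearly more than one split remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    file_size = 256 * MB    # hypothetical input size
    block_size = 128 * MB   # hypothetical HDFS block size

    # No effective cap on the split size: one split per block.
    default_size = compute_split_size(block_size, 1, 2**63 - 1)
    print(count_splits(file_size, default_size))  # 2

    # Capping the split size at 16 MB yields more, smaller map tasks.
    capped_size = compute_split_size(block_size, 1, 16 * MB)
    print(count_splits(file_size, capped_size))   # 16
```

If the job is launched through a driver that accepts Hadoop's generic options, the cap could presumably be passed as -Dmapred.max.split.size=<bytes>. Whether a particular Mahout command honours it is worth checking in the job's reported number of map tasks.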