I was experimenting with using Mahout's LDA algorithm. My corpus has around
8 small documents, and roughly 45,000 terms. I was getting good
results, but the algorithm takes too long to run. On every iteration the
mapper takes around an hour, so with 10 iterations it takes a little over
10
Hi Vishnu,
You can reduce the split size by setting Hadoop's mapred.max.split.size
configuration parameter (the value is in bytes).
The number of map tasks will then be roughly equal to the number of
splits (input size / split size).
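
To make the arithmetic concrete, here is a rough sketch in Python of the
estimate above. The function name and the example sizes are made up for
illustration. It only does the division, rounded up; Hadoop computes splits
per input file and also takes the block size and minimum split size into
account, so the real number of map tasks can differ.

```python
def estimate_map_tasks(input_bytes: int, max_split_bytes: int) -> int:
    """Estimate the map task count as ceil(input size / split size).

    Hypothetical helper for illustration only; it is not part of Hadoop
    or Mahout and ignores per-file split boundaries.
    """
    if input_bytes < 0:
        raise ValueError("input_bytes must be non-negative")
    if max_split_bytes <= 0:
        raise ValueError("max_split_bytes must be positive")
    # Integer ceiling division: a partial trailing split still needs a mapper.
    return (input_bytes + max_split_bytes - 1) // max_split_bytes


if __name__ == "__main__":
    mb = 1024 * 1024
    # Assumed example: 64 MB of input with a 4 MB maximum split size.
    print(estimate_map_tasks(64 * mb, 4 * mb))
```

So for an input that currently fits in one split, lowering the maximum
split size is what lets the work spread over more than one mapper.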
Best
On Dec 13, 2013, at 21:08, Vishnu Modi vishnu.modi...@gmail.com wrote:
I