Thanks for the feedback, everybody. I'll give 0.9 a run. Thanks!
On Sep 26, 2014, at 5:10 PM, Suneel Marthi suneel.mar...@gmail.com wrote:
What's the Mahout version? Please work off of 0.9; there was a performance
issue in RSJ (RowSimilarityJob) that was fixed in 0.9.
On Fri, Sep 26, 2014 at 4:23 PM, Burke Webster bu...@collectiveip.com
wrote:
I've been implementing the RowSimilarityJob on our 40-node cluster and have
run into some serious
Can you say how many words you are seeing?
How many unique bigrams?
As Suneel asked, which version of Mahout?
We are currently using 0.7, so that could be the issue. Last I looked, I
believe we had around 22 million unique bigrams in the dictionary.
I can look into the newer code and see if that fixes our problems.
Yeah... that is pretty ancient.
I had seen the issue you are reporting when running CooccurrencesMapper on a
2M-document corpus on an 80-node cluster.
The job would get stuck in CooccurrencesMapper forever.
This has been fixed in 0.9 (I have not had a chance to try it out at the same
corpus size and on the same cluster I had before), so it would be good if