Re: Temp directories in RecommenderJob/ItemSimilarityJob

Sebastian Schelter Tue, 07 Feb 2012 01:32:52 -0800

Hi Bala,

The different naming is simply the result of my inattention, sorry. There
is a JIRA issue that directly addresses your problem:


https://issues.apache.org/jira/browse/MAHOUT-609

Maybe you would like to work on that? You would have to modify
RecommenderJob to add a MapReduce pass that converts and outputs the
similarity matrix. You could probably reuse
ItemSimilarityJob.MostSimilarItemPairsMapper and
ItemSimilarityJob.MostSimilarItemPairsReducer for that pass.

--sebastian


On 06.02.2012 23:40, Bala Rajagopal wrote:
> 
> Hi,
> I need a quick clarification that pertains to the RecommenderJob (RJ) and the 
> ItemSimilarityJob (ISJ).
> We are using the RJ and the ISJ to compute item-based recommendations and
> item similarities. We have an input dataset of about 100 million rows and
> would like to share the computed output (of the intermediate steps) between
> the two jobs to make the entire process quicker. In short, the
> first two phases of the RJ and the ISJ are doing the same thing - running the 
> PreparePreferenceMatrixJob and the RowSimilarityJob. So we want to run the RJ 
> with endPhase = 1, then fork off to run the RJ (with startPhase = 2) and the 
> ISJ (with startPhase = 2) with the same temp directories as reference.
> While doing this, we noticed that there is a difference in the name of the 
> "prepPath" temp directory used in RJ and ISJ. RJ calls it 
> "preparePreferenceMatrix" and ISJ calls it "prepareRatingMatrix". Is there a 
> reason why this is different? This makes it impossible to share the computed 
> information between the two jobs. Are we overlooking something?
> Any help would be appreciated.
> Thanks,
> Bala Rajagopal
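
For illustration, the phased run described in the quoted message could be sketched as three argument lists that share one temp directory. This is only a sketch under assumptions: the class PhasedRunArgs, its helper method, and all paths are hypothetical, the flag names are assumed to be the ones Mahout's AbstractJob parses, and other required job options (such as the similarity measure) are left out. It does not work around the differing "prepPath" directory names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: only builds argument lists, it does not start any job.
public class PhasedRunArgs {

    // Flag names are assumed to match the options parsed by Mahout's AbstractJob.
    // Pass null for startPhase or endPhase to leave that flag out.
    public static String[] phasedArgs(String input, String output, String tempDir,
                                      Integer startPhase, Integer endPhase) {
        List<String> args = new ArrayList<>(Arrays.asList(
                "--input", input, "--output", output, "--tempDir", tempDir));
        if (startPhase != null) {
            args.add("--startPhase");
            args.add(String.valueOf(startPhase));
        }
        if (endPhase != null) {
            args.add("--endPhase");
            args.add(String.valueOf(endPhase));
        }
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        // All paths below are placeholders.
        String input = "/data/prefs";
        String tmp = "/tmp/mahout-shared";

        // Step 1: RJ up to and including phase 1 (prepare + row similarity).
        String[] shared = phasedArgs(input, "/out/recs", tmp, null, 1);
        // Step 2a: RJ resumed from phase 2 on the same temp directory.
        String[] rjRest = phasedArgs(input, "/out/recs", tmp, 2, null);
        // Step 2b: ISJ resumed from phase 2 on the same temp directory.
        String[] isjRest = phasedArgs(input, "/out/sims", tmp, 2, null);

        System.out.println("RJ  (shared): " + String.join(" ", shared));
        System.out.println("RJ  (rest):   " + String.join(" ", rjRest));
        System.out.println("ISJ (rest):   " + String.join(" ", isjRest));
    }
}
```

Each array would then be handed to the corresponding job, for example through Hadoop's ToolRunner. As long as the two jobs use different names for the prepare directory, step 2b will not find the output of step 1.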
