Hi Bala,

The different naming is simply the result of my inattention, sorry. There is a JIRA issue that directly addresses your problem:
https://issues.apache.org/jira/browse/MAHOUT-609

Maybe you would like to work on that? You would have to modify RecommenderJob to add a MapReduce pass that converts and outputs the similarity matrix. You could probably reuse ItemSimilarityJob.MostSimilarItemPairsMapper and ItemSimilarityJob.MostSimilarItemPairsReducer.

--sebastian

On 06.02.2012 23:40, Bala Rajagopal wrote:
>
> Hi,
> I need a quick clarification that pertains to the RecommenderJob (RJ) and the
> ItemSimilarityJob (ISJ).
> We are using the RJ and the ISJ to compute Itembased recommendations and
> Item similarities. We have an input dataset of about ~100 million rows and
> would like to share the computed output (of the intermediate steps) between
> the two jobs in favor of making the entire process quicker. In short, the
> first two phases of the RJ and the ISJ are doing the same thing - running the
> PreparePreferenceMatrixJob and the RowSimilarityJob. So we want to run the RJ
> with endPhase = 1, then fork off to run the RJ (with startPhase = 2) and the
> ISJ (with startPhase = 2) with the same temp directories as reference.
> While doing this, we noticed that there is a difference in the name of the
> "prepPath" temp directory used in RJ and ISJ. RJ calls it
> "preparePreferenceMatrix" and ISJ calls it "prepareRatingMatrix". Is there a
> reason why this is different? This makes it impossible to share the computed
> information between the two jobs. Are we overlooking something?
> Any help would be appreciated.
> Thanks,
> Bala Rajagopal