Hi,
I need a quick clarification regarding the RecommenderJob (RJ) and the
ItemSimilarityJob (ISJ).
We are using the RJ and the ISJ to compute item-based recommendations and item
similarities. Our input dataset has about 100 million rows, and we would like
to share the output of the intermediate steps between the two jobs to speed up
the overall process. In short, the first two phases of the RJ and the ISJ do
the same thing: they run the PreparePreferenceMatrixJob and the
RowSimilarityJob. So we want to run the RJ with endPhase = 1, and then fork off
two runs, the RJ with startPhase = 2 and the ISJ with startPhase = 2, both
pointing at the same temp directory.
While doing this, we noticed that the "prepPath" temp directory has a different
name in the RJ and the ISJ: the RJ calls it "preparePreferenceMatrix" and the
ISJ calls it "prepareRatingMatrix". Is there a reason for this difference? As
things stand, the ISJ cannot pick up the output the RJ has already computed, so
the two jobs cannot share it. Are we overlooking something?
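To make the plan concrete, here is a minimal sketch of the three argument lists
we have in mind, assuming the standard --input/--output/--tempDir/--startPhase/
--endPhase options that Mahout jobs accept through AbstractJob. The class name,
helper methods and all paths are hypothetical, and other required options (for
example the similarity measure) are left out. The prepPath() helper only
encodes the two directory names we observed, to show where the mismatch bites.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedPhasePlan {
    // Hypothetical locations; substitute real HDFS paths.
    static final String INPUT = "/data/prefs";
    static final String TEMP = "/tmp/mahout-shared";

    // Builds one job's arguments; a null phase means "option not passed".
    static String[] jobArgs(String output, Integer startPhase, Integer endPhase) {
        List<String> a = new ArrayList<>(List.of(
                "--input", INPUT, "--output", output, "--tempDir", TEMP));
        if (startPhase != null) {
            a.add("--startPhase");
            a.add(String.valueOf(startPhase));
        }
        if (endPhase != null) {
            a.add("--endPhase");
            a.add(String.valueOf(endPhase));
        }
        return a.toArray(new String[0]);
    }

    // Directory each job looks in for the prepared matrix under the shared
    // temp dir, using the names we observed (not read from Mahout itself).
    static String prepPath(String job) {
        String dir = job.equals("RJ") ? "preparePreferenceMatrix" : "prepareRatingMatrix";
        return TEMP + "/" + dir;
    }

    public static void main(String[] argv) {
        String[] rjShared = jobArgs("/out/rj-shared", null, 1);   // RJ, shared phases
        String[] rjRest = jobArgs("/out/recommendations", 2, null); // RJ, remaining phases
        String[] isjRest = jobArgs("/out/similarities", 2, null);   // ISJ, remaining phases
        System.out.println("RJ  (up to endPhase 1): " + String.join(" ", rjShared));
        System.out.println("RJ  (from startPhase 2): " + String.join(" ", rjRest));
        System.out.println("ISJ (from startPhase 2): " + String.join(" ", isjRest));
        // The ISJ run would look in a directory the first RJ run never wrote.
        System.out.println("RJ prepPath:  " + prepPath("RJ"));
        System.out.println("ISJ prepPath: " + prepPath("ISJ"));
        System.out.println("Same prepPath? " + prepPath("RJ").equals(prepPath("ISJ")));
    }
}
```

In practice each array would be handed to the corresponding job, e.g. via
Hadoop's ToolRunner; the point is only that the same --tempDir does not help
while the two jobs use different subdirectory names.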
Any help would be appreciated.
Thanks,
Bala Rajagopal