What do you mean by "isolate the data model building step"? You can
run or re-run any step you want in the chain.

So I guess the answer to 2 is "yes", if by "model" you mean the computed
item-item similarities. These change slowly over time, though, so they
need to be recomputed periodically.

MapReduce is a batch system and never works in real time, so if your
question 3 is whether it can answer real-time queries: no. You would
always pre-compute your results and serve them up at runtime.
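
As a rough sketch of what "pre-compute and serve" could look like: the
batch job writes its results to files, and the online service only loads
them into memory and does lookups. This assumes each output line has the
form userID<TAB>[itemID:score,itemID:score,...]; check that against your
actual output. The function names here are made up for illustration and
are not part of Mahout.

```python
def parse_recommendations(lines):
    """Build {user_id: [(item_id, score), ...]} from precomputed output lines.

    Assumed line format (verify against your job's real output):
        userID<TAB>[itemID:score,itemID:score,...]
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, _, payload = line.partition("\t")
        pairs = []
        for entry in payload.strip("[]").split(","):
            if not entry:
                continue
            item, _, score = entry.partition(":")
            pairs.append((int(item), float(score)))
        # Sort explicitly rather than trusting the order in the file.
        pairs.sort(key=lambda p: p[1], reverse=True)
        table[int(user)] = pairs
    return table


def top_n(table, user_id, n=5):
    """Runtime lookup: no MapReduce involved, just a dictionary access."""
    return table.get(user_id, [])[:n]


# Hypothetical sample lines standing in for the batch job's output files.
sample = ["1\t[101:4.0,104:4.5,102:3.5]", "2\t[103:5.0]"]
table = parse_recommendations(sample)
print(top_n(table, 1, 2))
```

The batch job is then re-run on whatever schedule suits your data, and
the service reloads the table afterwards.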

It sounds like you are running on a very tiny data set. All of the
time is spent in Hadoop overhead, like starting up workers. It's not
efficient or necessary to use Hadoop at this scale.
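
To illustrate how little work 200 users is, here is a minimal in-memory
sketch of item-based recommendation in plain Python. It is not Mahout's
implementation: it assumes cosine similarity over co-rated items and a
similarity-weighted average for the estimate, and the data layout and
function names are made up for the example. It also shows the split you
asked about in 1 and 2: the similarities are built once, then reused for
any number of users.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(prefs):
    """Model-building step. prefs: {user: {item: rating}}.

    Returns {(item_a, item_b): cosine similarity} for co-rated item pairs.
    """
    dot = defaultdict(float)
    norm = defaultdict(float)
    for ratings in prefs.values():
        for a, ra in ratings.items():
            norm[a] += ra * ra
            for b, rb in ratings.items():
                if a != b:
                    dot[(a, b)] += ra * rb
    return {
        (a, b): d / (sqrt(norm[a]) * sqrt(norm[b]))
        for (a, b), d in dot.items()
    }


def recommend(prefs, sims, user, n=5):
    """Recommendation step: reuses the precomputed similarities."""
    seen = prefs[user]
    num = defaultdict(float)
    den = defaultdict(float)
    for (a, b), s in sims.items():
        if a in seen and b not in seen:
            num[b] += s * seen[a]
            den[b] += abs(s)
    scored = [(b, num[b] / den[b]) for b in num if den[b] > 0]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:n]


# Tiny made-up data set: three users, three items.
prefs = {
    1: {"A": 5, "B": 3},
    2: {"A": 4, "B": 2, "C": 5},
    3: {"B": 4, "C": 3},
}
sims = item_similarities(prefs)   # build once
print(recommend(prefs, sims, 1))  # reuse for each user
```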

Sean

On Wed, Sep 14, 2011 at 3:36 PM, Robert Evans <ev...@yahoo-inc.com> wrote:
> This should probably be directed more toward the Mahout list than the Hadoop
> Map/reduce one.
>
> mahout-u...@apache.org
>
> --Bobby Evans
>
> On 9/14/11 6:28 AM, "Amit Sangroya" <sangroyaa...@gmail.com> wrote:
>
> Hi all,
>
> I am trying to run the example from
> https://cwiki.apache.org/confluence/display/MAHOUT/Itembased+Collaborative+Filtering
> ,
>
> with the following command bin/mahout
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
> -Dmapred.input.dir=input -Dmapred.output.dir=output --itemsFile itemfile
> --tempDir tempDir
>
> The algorithm estimates the preference of a user towards an item which he/she
> has not yet seen. Once an algorithm can predict preferences it can also be
> used to do Top-N-Recommendation where the task is to find the N items a
> given user might like best. It is mentioned that given a DataModel, it can
> produce recommendations.
>
> The algorithm takes approx. 5 minutes to generate the top 5 recommendations
> for one user on a 10-node Hadoop cluster. The input has been cut down to only
> 200 users from the "1 Million MovieLens Dataset" from Grouplens.org.
>
> I have a few questions:
>
> 1) I want to know whether it is possible to isolate the data model building
> step from generating recommendations.
>
> 2) Can we use the model, once generated from the training data, for
> generating recommendations for a range of users?
>
> 3) To be specific, if I want to provide an on-line service that generates
> recommendations for users, can I minimize the cost of MapReduce interactions
> each time?
>
> I am not a data mining expert. Please help me to understand this in a better
> way.
>
>
> Thanks and Regards,
> Amit
>
>
