That said, my conclusion so far (sorry if I'm a bit slow here :)) is that I need both parts (offline and online) in place if I plan to build a truly scalable system that can do some of the recommendation calculations in real time and interact with the user dynamically.
But I'm still not quite sure I understand how that lets me scale. The more computation I push offline, the less concerned I am with retrieval time, so from that perspective I can scale. What I still don't see is how it helps me scale from a memory perspective: even if I compute all the similarities in advance, it seems I still need to load the entire similarity result file into memory so that the online part can do its share of the calculation. Or maybe I'm wrong here, and I don't necessarily need to load the entire intermediate file (the similarity results) into memory?

-----Original Message-----
From: Sean Owen [mailto:sro...@gmail.com]
Sent: Monday, March 26, 2012 11:48
To: user@mahout.apache.org
Subject: Re: Mahout beginner questions...

I'm sure he's referring to the off-line model-building bit, not an online component.

On Mon, Mar 26, 2012 at 9:27 AM, Razon, Oren <oren.ra...@intel.com> wrote:
> By saying "At Veoh, we built our models from several billion interactions
> on a tiny cluster", did you mean that you used the distributed code on your
> cluster as an online recommender?
> From what I've understood so far, I can't rely only on the Hadoop part if
> I want a truly real-time recommender that adjusts its recommendations
> and models on every click of the user (because you would need to rebuild
> the data in HDFS, run your batch job, and return an answer).
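(One common answer to the memory question above, sketched as a minimal example rather than Mahout's actual API: the online part usually does not keep the full item-item similarity matrix, only the top-k neighbours per item, which bounds memory at O(items * k) instead of O(items^2). The `prune_top_k` function and the streamed-pairs input format here are illustrative assumptions, not anything from the thread.)

```python
import heapq
from collections import defaultdict

def prune_top_k(similarity_pairs, k):
    """Keep only the k most similar neighbours per item.

    similarity_pairs: iterable of (item_a, item_b, score) tuples, e.g.
    streamed line by line from the offline job's output file, so the
    full similarity file never has to sit in memory at once.
    """
    neighbours = defaultdict(list)  # item -> min-heap of (score, other_item)
    for a, b, score in similarity_pairs:
        # Each pair contributes a neighbour to both items.
        for item, other in ((a, b), (b, a)):
            heap = neighbours[item]
            if len(heap) < k:
                heapq.heappush(heap, (score, other))
            elif score > heap[0][0]:
                # New score beats the weakest kept neighbour; replace it.
                heapq.heapreplace(heap, (score, other))
    # Convert heaps to plain dicts for fast online lookup.
    return {item: {o: s for s, o in heap} for item, heap in neighbours.items()}

# Example: three items, keep only the single best neighbour each.
pairs = [("A", "B", 0.9), ("A", "C", 0.2), ("B", "C", 0.5)]
model = prune_top_k(pairs, k=1)
# model["A"] == {"B": 0.9}; "C"'s weaker 0.2 link to "A" was dropped.
```

The online recommender then only consults this small pruned structure per request, which is the usual reason the offline/online split scales in memory as well as in compute.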