Re: Performance Issue using item-based approach!

2014-05-16 Thread Pat Ferrel
Can we step back a bit? Is query speed the only issue? Why do you care how long it takes? This is example data, not yours. Some of the techniques you mention below are Hadoop MapReduce-based approaches. These are batch-oriented by nature. The MapReduce item-based recommender may take

Re: Performance Issue using item-based approach!

2014-05-12 Thread Ted Dunning
Truer words than this were never said. On May 9, 2014, at 8:36, Pat Ferrel pat.fer...@gmail.com wrote: let your data determine this, not example data.

Re: Performance Issue using item-based approach!

2014-05-03 Thread Najum Ali
Hi there, I mentioned a problem with using the ItemBasedRecommender. It is so much slower than using the UserBasedRecommender. @Sebastian: You said limiting the precomputation file should work. For example: only 50 similarities per item. You also said this feature is not included in the

Fwd: Performance Issue using item-based approach!

2014-05-03 Thread Najum Ali
(Resending mail without my digital signature) Hi there, I mentioned a problem with using the ItemBasedRecommender. It is so much slower than using the UserBasedRecommender. @Sebastian: You said limiting the precomputation file should work. For example: only 50 similarities per item.

Re: Performance Issue using item-based approach!

2014-04-18 Thread Ted Dunning
You can always run Hadoop in local mode. Nothing prevents a single node from being a cluster. :-) On Thu, Apr 17, 2014 at 7:43 AM, Najum Ali naju...@googlemail.com wrote: Ted, Is it also possible to use ItemSimilarityJob in a non-distributed environment? On 17.04.2014 at 16:22,

Re: Performance Issue using item-based approach!

2014-04-18 Thread Sebastian Schelter
You can, but you shouldn't :) On 04/18/2014 07:23 PM, Ted Dunning wrote: You can always run Hadoop in a local mode. Nothing prevents a single node from being a cluster. :-) On Thu, Apr 17, 2014 at 7:43 AM, Najum Ali naju...@googlemail.com wrote: Ted, Is it also possible to use

Re: Performance Issue using item-based approach!

2014-04-18 Thread Ted Dunning
Shouldn't, yes. But for a toy dataset, it might work out. On Fri, Apr 18, 2014 at 10:25 AM, Sebastian Schelter ssc.o...@googlemail.com wrote: You can, but you shouldn't :) On 04/18/2014 07:23 PM, Ted Dunning wrote: You can always run Hadoop in a local mode. Nothing prevents a single

Re: Performance Issue using item-based approach!

2014-04-17 Thread Sebastian Schelter
Could you take the output of the precomputation, feed it into a standalone recommender and test it there? On 04/17/2014 11:37 AM, Najum Ali wrote: @Sebastian Are you sure that the precomputation is done only once and not on every request? Yes, a @Bean-annotated object is in Spring per

Re: Performance Issue using item-based approach!

2014-04-17 Thread Najum Ali
@Sebastian What do you mean by a standalone recommender? A simple offline Java main program? On 17.04.2014 at 11:41, Sebastian Schelter s...@apache.org wrote: Could you take the output of the precomputation, feed it into a standalone recommender and test it there? On 04/17/2014 11:37

Re: Performance Issue using item-based approach!

2014-04-17 Thread Sebastian Schelter
Yes, just to make sure the problem is in the Mahout code and not in the surrounding environment. On 04/17/2014 11:43 AM, Najum Ali wrote: @Sebastian What do you mean by a standalone recommender? A simple offline Java main program? On 17.04.2014 at 11:41, Sebastian Schelter

Re: Performance Issue using item-based approach!

2014-04-17 Thread Najum Ali
OK, here you go: I have created a simple class with a main method (no server and other stuff): public class RecommenderTest { public static void main(String[] args) throws IOException, TasteException { DataModel dataModel = new FileDataModel(new

Re: Performance Issue using item-based approach!

2014-04-17 Thread Najum Ali
@Sebastian Wow … you are right. The original CSV file is about 21 MB and the corresponding precomputed item-item similarity file is about 260 MB! And yes, there are far more than 50 "most similar items" for an item. Trying to restrict this to 50 (or something like that) most similar items for
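
The restriction discussed here (keeping only the 50 or so most similar items per item) could be sketched roughly as below. This is a plain-Java illustration, not Mahout code: the class name `TopNSimilarities`, the method `prune`, and the assumed line layout `itemA,itemB,similarity` are all hypothetical, and the sketch prunes per first-column item only, so it may need adapting if each pair is stored once for both directions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper (not part of Mahout): prunes a precomputed
// item-item similarity list to the n strongest entries per item.
public class TopNSimilarities {

    /** One input line together with its parsed similarity value. */
    private static final class Row {
        final String line;
        final double similarity;

        Row(String line, double similarity) {
            this.line = line;
            this.similarity = similarity;
        }
    }

    /**
     * Assumes each line looks like "itemA,itemB,similarity".
     * Returns, for every itemA (in first-seen order), its n lines with
     * the highest similarity, strongest first. Lines with fewer than
     * three fields are skipped.
     */
    public static List<String> prune(List<String> lines, int n) {
        Comparator<Row> bySimilarity = Comparator.comparingDouble((Row r) -> r.similarity);
        Map<String, PriorityQueue<Row>> perItem = new LinkedHashMap<>();

        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length < 3) {
                continue;
            }
            String itemA = parts[0].trim();
            double similarity = Double.parseDouble(parts[2].trim());

            // Min-heap per item: the weakest kept entry sits at the head.
            PriorityQueue<Row> heap =
                    perItem.computeIfAbsent(itemA, k -> new PriorityQueue<>(bySimilarity));
            heap.add(new Row(line, similarity));
            if (heap.size() > n) {
                heap.poll(); // drop the weakest entry once we exceed n
            }
        }

        List<String> result = new ArrayList<>();
        for (PriorityQueue<Row> heap : perItem.values()) {
            List<Row> rows = new ArrayList<>(heap);
            rows.sort(bySimilarity.reversed());
            for (Row row : rows) {
                result.add(row.line);
            }
        }
        return result;
    }
}
```

Calling `TopNSimilarities.prune(lines, 50)` on the lines of the precomputed file and writing the result back out would give a smaller file with at most 50 entries per item, at the cost of holding up to 50 rows per item in memory while pruning.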

Re: Performance Issue using item-based approach!

2014-04-17 Thread Najum Ali
Ted, Is it also possible to use ItemSimilarityJob in a non-distributed environment? On 17.04.2014 at 16:22, Ted Dunning ted.dunn...@gmail.com wrote: Najum, You should also be able to use the ItemSimilarityJob to compute a limited indicator set. This is stepping off of the path you have