Hadoop implementation of ParallelSGDFactorizer

2013-09-07 Thread Tevfik Aytekin
Hi, There seems to be no Hadoop implementation of ParallelSGDFactorizer. ALSWRFactorizer has a Hadoop implementation. ParallelSGDFactorizer (since it is based on stochastic gradient descent) is much faster than ALSWRFactorizer. I don't know much about Hadoop. But it seems to me that a Hadoop

Re: Kmeans - clustering help

2013-09-07 Thread P Kal
It seems that I've had the wrong idea the entire time. Thanks for the help. On Fri, Sep 6, 2013 at 3:45 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: seq2sparse uses Lucene Standard tokenization to generate the tfidf vectors. But since your data is in CSV format (from the example u had

Re: Hadoop implementation of ParallelSGDFactorizer

2013-09-07 Thread Sebastian Schelter
IIRC the algorithm behind ParallelSGDFactorizer needs shared memory, which is not given in a shared-nothing environment. On 07.09.2013 19:08, Tevfik Aytekin wrote: Hi, There seems to be no Hadoop implementation of ParallelSGDFactorizer. ALSWRFactorizer has a Hadoop implementation.
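To illustrate the shared-memory point, here is a minimal sketch of plain SGD matrix factorization in Python. It is not Mahout's ParallelSGDFactorizer code; the function names, hyperparameters, and toy ratings are assumptions made for illustration. The thing to notice is that every single rating update reads and writes one row of the user factors and one row of the item factors in place. Parallel threads can share those arrays cheaply in one process, whereas independent Hadoop map tasks cannot see each other's updates between steps.

```python
import math
import random


def sgd_factorize(ratings, num_users, num_items, rank=2,
                  lr=0.05, reg=0.02, epochs=200, seed=0):
    """Sequential SGD factorization (illustrative, hypothetical parameters)."""
    rng = random.Random(seed)
    # U and V are the shared state: every update below mutates them in place.
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(num_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(num_items)]
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            pred = sum(a * b for a, b in zip(U[u], V[i]))
            err = r - pred
            for k in range(rank):
                uk, vk = U[u][k], V[i][k]
                # One rating touches one user row and one item row.
                U[u][k] += lr * (err * vk - reg * uk)
                V[i][k] += lr * (err * uk - reg * vk)
    return U, V


def rmse(ratings, U, V):
    """Root mean squared error of the factorization on the given ratings."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(a * b for a, b in zip(U[u], V[i]))
        total += (r - pred) ** 2
    return math.sqrt(total / len(ratings))


# Toy (user, item, rating) triples, made up for this sketch.
toy_ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
               (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
```

In a multi-threaded variant, each thread would run the inner loop over its own slice of the ratings while updating the same U and V, which is why a shared-nothing environment is a poor fit without extra synchronization or partitioning.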

Re: Solr recommender

2013-09-07 Thread Ted Dunning
On Fri, Sep 6, 2013 at 9:33 AM, Pat Ferrel pat.fer...@gmail.com wrote: One of the unique things about the Solr recommender is online recs. Two scenarios come to mind: 1) ask the user to pick from among a list of videos, taking the picks as preferences and making recs. Make more and see if

Re: Hadoop implementation of ParallelSGDFactorizer

2013-09-07 Thread Tevfik Aytekin
Sebastian, what is IIRC? On Sat, Sep 7, 2013 at 8:24 PM, Sebastian Schelter ssc.o...@googlemail.com wrote: IIRC the algorithm behind ParallelSGDFactorizer needs shared memory, which is not given in a shared-nothing environment. On 07.09.2013 19:08, Tevfik Aytekin wrote: Hi, There seems to

Re: Hadoop implementation of ParallelSGDFactorizer

2013-09-07 Thread Ted Dunning
That means "If I Recall Correctly". It is internet slang. See also http://en.wiktionary.org/wiki/Appendix:English_internet_slang On Sat, Sep 7, 2013 at 12:39 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote: Sebastian, what is IIRC? On Sat, Sep 7, 2013 at 8:24 PM, Sebastian Schelter

Re: Mahout readable output

2013-09-07 Thread Ted Dunning
Darius' comments are good. You also have to think about what similar means to you. From the data you describe, I see several possibilities: - geo-location from machine id (if it includes IP address) - content from the query - frequency of posting - diurnal phase of posting (tells us time

Re: Solr recommender

2013-09-07 Thread Pat Ferrel
On Sep 7, 2013, at 10:36 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Fri, Sep 6, 2013 at 9:33 AM, Pat Ferrel pat.fer...@gmail.com wrote: One of the unique things about the Solr recommender is online recs. Two scenarios come to mind: 1) ask the user to pick from among a list of

Re: Solr recommender

2013-09-07 Thread Ted Dunning
On Sat, Sep 7, 2013 at 2:35 PM, Pat Ferrel p...@occamsmachete.com wrote: ... Clustering can be done by doing SVD or ALS on the user x thing matrix first or by directly clustering the columns of the user x thing matrix after some kind of IDF weighting. I think that only the streaming
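As a rough illustration of the "IDF weighting" step mentioned above, the following Python sketch down-weights the columns of a small binary user x thing matrix before any clustering is applied. The message only says "some kind of IDF weighting", so the specific formula here (log of the number of users divided by the number of users who interacted with the thing) and the function name are assumptions, not the method used in Mahout or the Solr recommender.

```python
import math


def idf_weight_columns(matrix):
    """Scale each column of a user x thing matrix by an IDF-style weight.

    Assumed weight: log(num_users / document_frequency), so a thing that
    every user touched gets weight 0 and rarer things get larger weights.
    """
    num_users = len(matrix)
    num_things = len(matrix[0])
    # Document frequency: how many users have a non-zero entry per thing.
    df = [sum(1 for row in matrix if row[j]) for j in range(num_things)]
    idf = [math.log(num_users / d) if d else 0.0 for d in df]
    return [[row[j] * idf[j] for j in range(num_things)] for row in matrix]


# Toy interaction matrix (rows are users, columns are things), made up here.
interactions = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
]
weighted = idf_weight_columns(interactions)
```

The columns of `weighted` could then be passed to whatever clustering routine is in use; the first column, which every user touched, contributes nothing after weighting.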