Re: MongoDBDataModel in memory ?

2012-03-19 Thread Sebastian Schelter
I've created a guide for scaling out a recommender system; it may be useful for you: http://ssc.io/deploying-a-massively-scalable-recommender-system-with-apache-mahout/ On 19.03.2012 06:20, Mridul Kapoor mridulkap...@gmail.com wrote: On 19 March 2012 02:24, Ted Dunning ted.dunn...@gmail.com

Re: MongoDBDataModel in memory ?

2012-03-19 Thread Ted Dunning
Session data never needs to be in memory. It can be processed sequentially or with MapReduce. The item-item data is all you need in memory. On Mar 18, 2012, at 10:19 PM, Mridul Kapoor mridulkap...@gmail.com wrote: On 19 March 2012 02:24, Ted Dunning
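To illustrate the point about keeping only item-item data in memory, here is a minimal sketch in plain Java. It is an assumption-laden illustration, not Mahout code: the class name CooccurrenceSketch and its methods are hypothetical, and raw co-occurrence counts stand in for whatever item similarity measure is actually used. Each session is consumed once and then discarded, so only the item-item map grows.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: stream sessions one at a time, keep only item-item counts.
final class CooccurrenceSketch {
    // item -> (other item -> number of sessions containing both)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Consumes one session (the items seen together); the session is not retained. */
    void addSession(Collection<String> items) {
        // De-duplicate so that an item repeated within a session counts once.
        TreeSet<String> distinct = new TreeSet<>(items);
        for (String a : distinct) {
            for (String b : distinct) {
                if (!a.equals(b)) {
                    counts.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                }
            }
        }
    }

    /** Number of sessions in which both items appeared; 0 if they never co-occurred. */
    int cooccurrence(String a, String b) {
        Map<String, Integer> row = counts.getOrDefault(a, Collections.emptyMap());
        return row.getOrDefault(b, 0);
    }
}
```

In practice the sessions would be read line by line from a file or a database cursor and passed to addSession as they arrive; memory use then depends on the number of co-occurring item pairs rather than on the number of sessions.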

Edit Distance

2012-03-19 Thread Ahmed Abdeen Hamed
Hello, Does Mahout have support for Edit Distance between two Strings? I looked on the web but can't find anything. Please let me know if it does. Thanks very much, -Ahmed

Re: Edit Distance

2012-03-19 Thread Sean Owen
No, I don't think that really comes into play in any of the ML algorithms here. At least I do not recall seeing it. On Mon, Mar 19, 2012 at 3:44 PM, Ahmed Abdeen Hamed ahmed.elma...@gmail.com wrote: Hello, Does Mahout have support for Edit Distance between two Strings? I looked on the web

Re: Edit Distance

2012-03-19 Thread Ahmed Abdeen Hamed
Thanks very much! -Ahmed On Mon, Mar 19, 2012 at 11:46 AM, Sean Owen sro...@gmail.com wrote: No, I don't think that really comes into play in any of the ML algorithms here. At least I do not recall seeing it.

Re: Edit Distance

2012-03-19 Thread reinhard schwab
Lucene has some classes to calculate the edit distance: http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/api/contrib-spellchecker/org/apache/lucene/search/spell/package-summary.html (package org.apache.lucene.search.spell). Regards, Reinhard On 19.03.2012 16:44, Ahmed Abdeen

Re: Edit Distance

2012-03-19 Thread David Kincaid
Mahout doesn't, but Lucene does, and the Lucene libraries are shipped with Mahout: http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/spell/LevensteinDistance.html On Mon, Mar 19, 2012 at 10:49, Ahmed Abdeen Hamed ahmed.elma...@gmail.com wrote: Thanks
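For readers who only need the raw edit distance between two strings, a minimal sketch follows. It is a plain dynamic-programming Levenshtein distance written for illustration; it is not the code of Lucene's LevensteinDistance class, and the class and method names (EditDistanceSketch, levenshtein) are hypothetical.

```java
// Hypothetical sketch: classic two-row dynamic-programming Levenshtein distance.
// Not Lucene's implementation; names are made up for this example.
final class EditDistanceSketch {
    private EditDistanceSketch() {}

    /**
     * Returns the minimum number of single-character insertions, deletions,
     * and substitutions needed to turn a into b.
     */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));
    }
}
```

The two-row layout keeps memory proportional to the length of the second string, which is usually enough when comparing short keys or names.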

Re: Edit Distance

2012-03-19 Thread Ted Dunning
While I didn't do as nice a job as your friend, TFIDF of n-grams has consistently done very well for me. The soft TFIDF that they examine is something that I haven't previously looked at, but everything else seems in order based on what I have seen. On Mon, Mar 19, 2012 at 1:06 PM, Dawid Weiss
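As a rough illustration of comparing strings by TFIDF of n-grams, here is a small sketch in plain Java. It is not taken from the paper under discussion or from Mahout; the class NGramTfIdfSketch, its method names, the choice of character n-grams, and the smoothed IDF formula log((1 + N) / (1 + df)) + 1 are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: character n-gram TF-IDF vectors compared by cosine similarity.
final class NGramTfIdfSketch {
    private NGramTfIdfSketch() {}

    /** Counts the overlapping character n-grams of s. */
    static Map<String, Integer> ngramCounts(String s, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= s.length(); i++) {
            counts.merge(s.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    /** Builds one sparse TF-IDF vector per input string (IDF smoothing is an assumption). */
    static List<Map<String, Double>> tfidf(List<String> docs, int n) {
        List<Map<String, Integer>> termCounts = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            Map<String, Integer> counts = ngramCounts(doc.toLowerCase(), n);
            termCounts.add(counts);
            for (String gram : counts.keySet()) {
                docFreq.merge(gram, 1, Integer::sum);
            }
        }
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (Map<String, Integer> counts : termCounts) {
            Map<String, Double> weights = new HashMap<>();
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                double idf = Math.log((1.0 + docs.size()) / (1.0 + docFreq.get(e.getKey()))) + 1.0;
                weights.put(e.getKey(), e.getValue() * idf);
            }
            vectors.add(weights);
        }
        return vectors;
    }

    /** Cosine similarity of two sparse vectors; 0 if either vector is empty. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }
}
```

With trigrams (n = 3), two strings that differ by a single typo still share most of their n-grams and score close to each other, while unrelated strings share none and score 0.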

Re: Edit Distance

2012-03-19 Thread Dawid Weiss
Hmm... I just realized I've sent an incorrect link. That is: the link is fine (and the paper as well), but none of these folks are among my friends :) The one I meant to send is this one: http://www.pubzone.org/dblp/conf/ltconf/PiskorskiSW07 Dawid On Mon, Mar 19, 2012 at 9:30 PM, Ted Dunning

Re: can't get point-id, cluster-id thru -p

2012-03-19 Thread Pat Ferrel
I guess you figured this out, but the cluster drivers take -cl, which tells them to put points into the calculated clusters and write the output to the clusterPoints directory. You then pass that directory to clusterdump. Instructions here: https://cwiki.apache.org/confluence/display/MAHOUT/K-Means+Clustering

Re: can't get point-id, cluster-id thru -p

2012-03-19 Thread Baoqiang Cao
Thanks again for the reference! One more question, if you don't mind: how do I get the text keys for each cluster? At the beginning we have the input file for seq2sparse, and we know the format is textkey1 text1 textkey2 text2 ... After mahout kmeans and clusterdump, how do I know, for

Re: MongoDBDataModel in memory ?

2012-03-19 Thread Mridul Kapoor
Thanks Sebastian, That guide turned out to be useful for me. Accordingly, I have changed my approach. I have written a script to export the virtual sessions data from MongoDB into a preferences.csv file. Now I need to pre-compute ItemSimilarities. I probably won't be using a Hadoop cluster. Is there a

Re: MongoDBDataModel in memory ?

2012-03-19 Thread Ted Dunning
On Mon, Mar 19, 2012 at 10:06 PM, Mridul Kapoor mridulkap...@gmail.com wrote: Is there a way that I run the ItemSimilarityJob on a single machine? Yes. There is a sequential invocation as well.