I've written a guide to scaling out a recommender system; maybe it is
useful for you:
http://ssc.io/deploying-a-massively-scalable-recommender-system-with-apache-mahout/
On 19.03.2012 06:20, Mridul Kapoor mridulkap...@gmail.com wrote:
On 19 March 2012 02:24, Ted Dunning ted.dunn...@gmail.com
Session data never needs to be in memory. It can be processed sequentially or
using MapReduce.
The item-item data is all you need in memory.
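
A minimal sketch of that idea, assuming each session arrives as one line of
comma-separated item IDs (a hypothetical layout, not the format discussed in
this thread). Sessions are read one at a time and discarded; only the
item-item co-occurrence counts stay in memory. The function names are
illustrative and are not part of Mahout.

```python
from collections import defaultdict
from itertools import combinations


def read_sessions(lines):
    """Lazily parse lines of comma-separated item IDs (assumed layout)."""
    for line in lines:
        items = [tok.strip() for tok in line.split(",") if tok.strip()]
        if items:
            yield items


def cooccurrence_counts(sessions):
    """Stream sessions and count how often each pair of items co-occurs.

    Only the item-item counts are held in memory; each session is dropped
    as soon as its pairs have been counted.
    """
    counts = defaultdict(int)
    for items in sessions:
        # Deduplicate and sort so every unordered pair has one canonical key.
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return dict(counts)


if __name__ == "__main__":
    log = ["a,b,c", "b,c", "a,c,c"]
    print(cooccurrence_counts(read_sessions(log)))
```

Because `read_sessions` is a generator, the same code works on an open file
handle of any size.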
Sent from my iPhone
On Mar 18, 2012, at 10:19 PM, Mridul Kapoor mridulkap...@gmail.com wrote:
On 19 March 2012 02:24, Ted Dunning
Hello,
Does Mahout have support for Edit Distance between two Strings? I looked on
the web but can't find anything. Please let me know if it does.
Thanks very much,
-Ahmed
No, I don't think that really comes into play in any of the ML algorithms
here. At least I do not recall seeing it.
On Mon, Mar 19, 2012 at 3:44 PM, Ahmed Abdeen Hamed ahmed.elma...@gmail.com
wrote:
Hello,
Does Mahout have support for Edit Distance between two Strings? I looked on
the web
Thanks very much!
-Ahmed
On Mon, Mar 19, 2012 at 11:46 AM, Sean Owen sro...@gmail.com wrote:
No I don't think that really comes into play in any of the ML algorithms
here. At least I do not recall seeing it.
Lucene has some classes to calculate the edit distance.
http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/api/contrib-spellchecker/org/apache/lucene/search/spell/package-summary.html
Package org.apache.lucene.search.spell
regards
reinhard
On 19.03.2012 16:44, Ahmed Abdeen wrote:
Mahout doesn't, but Lucene does, and the Lucene libraries are shipped with
Mahout:
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/spell/LevensteinDistance.html
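
For reference, a minimal sketch of the edit distance itself in plain Python.
This is the textbook dynamic-programming algorithm, not Lucene's code. As far
as I recall, Lucene's LevensteinDistance reports a normalized score between 0
and 1 rather than a raw edit count (worth checking against the Javadoc linked
above), so `normalized_similarity` below shows one such normalization as an
assumption.

```python
def levenshtein(s, t):
    """Return the number of single-character edits turning s into t."""
    if len(s) < len(t):
        s, t = t, s  # keep the shorter string in the inner loop
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_similarity(s, t):
    """Map the distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(s, t) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(normalized_similarity("kitten", "sitting"))
```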
On Mon, Mar 19, 2012 at 10:49, Ahmed Abdeen Hamed
ahmed.elma...@gmail.com wrote:
Thanks
While I didn't do as nice a job as your friend, TFIDF of n-grams has
consistently done very well for me. The soft TFIDF that they examine is
something that I haven't previously looked at, but everything else seems
to be in order based on what I have seen.
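
A minimal sketch of TFIDF over character n-grams for string matching. The
trigram size, the smoothed IDF formula, and the function names are all
assumptions chosen for illustration; this is not the soft TFIDF variant from
the paper.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split a lower-cased string into overlapping character n-grams."""
    text = text.lower()
    if len(text) <= n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(strings, n=3):
    """Build one sparse TFIDF vector (a dict) per input string."""
    grams = [Counter(char_ngrams(s, n)) for s in strings]
    df = Counter()
    for g in grams:
        df.update(g.keys())  # document frequency: one count per string
    total = len(strings)
    vectors = []
    for g in grams:
        # Smoothed IDF, so n-grams found in every string keep a small weight.
        vectors.append({
            k: tf * (math.log((1 + total) / (1 + df[k])) + 1.0)
            for k, tf in g.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    names = ["apache mahout", "apache mahuot", "lucene"]
    vecs = tfidf_vectors(names)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The misspelled string shares most of its trigrams with the original, so it
scores far higher than the unrelated one.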
On Mon, Mar 19, 2012 at 1:06 PM, Dawid Weiss
Hmm... I just realized I've sent an incorrect link. That is: the link
is fine (and the paper as well), but none of these folks are among
my friends :)
The one I meant to send is this one:
http://www.pubzone.org/dblp/conf/ltconf/PiskorskiSW07
Dawid
On Mon, Mar 19, 2012 at 9:30 PM, Ted Dunning
I guess you figured this out, but the cluster drivers take -cl, which
tells them to assign points to the calculated clusters and write the output
to the clusteredPoints directory. You then pass that directory to clusterdump.
Instructions here:
https://cwiki.apache.org/confluence/display/MAHOUT/K-Means+Clustering
Thanks again for the reference!
One more question, if you don't mind. How do I get the text keys for
each cluster? I mean, at the beginning we have an input file for
seq2sparse, and we know the format is
textkey1 text1
textkey2 text2
..
after mahout kmeans, and clusterdump, how do I know, for
Thanks Sebastian,
That guide turned out to be useful for me. Accordingly, I have changed my
approach. I have written a script to export the virtual session data from
MongoDB into a preferences.csv file.
Now I need to pre-compute ItemSimilarities. I probably won't be using a
Hadoop cluster. Is there a
On Mon, Mar 19, 2012 at 10:06 PM, Mridul Kapoor mridulkap...@gmail.com wrote:
Is there a way to run the ItemSimilarityJob on a single
machine?
Yes. There is a sequential invocation as well.
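
To illustrate what a single-machine pass over preferences.csv computes, here
is a plain Python sketch. It is not Mahout's ItemSimilarityJob or its
sequential counterpart. It assumes rows of the form userID,itemID,preference
and uses cosine similarity as the measure; the function names are
hypothetical.

```python
import csv
import io
import math
from collections import defaultdict


def load_item_vectors(csv_file):
    """Read userID,itemID,preference rows into {item: {user: value}}."""
    items = defaultdict(dict)
    for row in csv.reader(csv_file):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        user, item, value = row[0], row[1], float(row[2])
        items[item][user] = value
    return items


def item_similarities(items):
    """Cosine similarity for every item pair that shares at least one user."""
    norms = {
        i: math.sqrt(sum(v * v for v in vec.values()))
        for i, vec in items.items()
    }
    sims = {}
    ids = sorted(items)
    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            # Iterate over the smaller vector when computing the dot product.
            small, large = sorted((items[a], items[b]), key=len)
            dot = sum(v * large[u] for u, v in small.items() if u in large)
            if dot != 0:
                sims[(a, b)] = dot / (norms[a] * norms[b])
    return sims


if __name__ == "__main__":
    sample = io.StringIO("1,101,5.0\n1,102,3.0\n2,101,4.0\n2,103,2.0\n")
    for pair, sim in item_similarities(load_item_vectors(sample)).items():
        print(pair, round(sim, 4))
```

The all-pairs loop is quadratic in the number of items, which is fine for a
sketch but is exactly the cost a pre-computation job is meant to manage.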