Recommend a set of users for an item

2012-04-13 Thread Will C
So I've seen methods to have Mahout Taste recommend items for a user, such as: https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/recommender/Recommender.html#recommend(long, int) Is there the equivalent for the opposite, where I want to find a set of users that can be

Re: Recommended way to consume Nutch data in Mahout

2012-04-13 Thread Ted Dunning
Nutch has scalability limits that other crawlers are able to avoid, so it isn't quite as fashionable lately. Ken Krugler's work with Common Crawl and Bixo is a bit more current. On Fri, Apr 13, 2012 at 6:36 PM, Pat Ferrel wrote: > Thanks I'll check that out. > > Actually it was pretty easy to wr

Re: Recommended way to consume Nutch data in Mahout

2012-04-13 Thread Pat Ferrel
Thanks I'll check that out. Actually it was pretty easy to write a custom SequenceFilesFromDirectoryFilter. I'm just a little surprised no one is using crawled data from Nutch already. On 4/13/12 4:22 PM, Peyman Mohajerian wrote: One solution is to use Solr, which integrates nicely with Nutc

Re: Recommended way to consume Nutch data in Mahout

2012-04-13 Thread Peyman Mohajerian
One solution is to use Solr, which integrates nicely with Nutch. Read data off Solr using SolrReader API. On Fri, Apr 13, 2012 at 7:03 AM, Pat Ferrel wrote: > I'd like to use Nutch to gather data to process with Mahout. Nutch creates > parsed text for the pages it crawls. Nutch also has several

Re: Why my evaluator.evaluate return NaN

2012-04-13 Thread Sean Owen
NaN means "not a number". It is kind of like "null". This means no answer could be computed, and usually that is because your data is too sparse. For the kind of recommender you are running, you don't want such sparse data, and it sounds like you are making the data sparse by removing a lot of rati
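A minimal sketch of how an evaluation score can come out as NaN when the data is too sparse. This is plain Python for illustration, not Mahout's evaluator code; the function name `average_absolute_difference` and the use of `None` to mark "no estimate possible" are assumptions made for this example.

```python
import math


def average_absolute_difference(actual, estimated):
    """Mean absolute error over the pairs that could be estimated.

    `estimated` holds None wherever no estimate was possible (an assumed
    convention for this sketch). Returns NaN when nothing could be scored.
    """
    diffs = [abs(a - e) for a, e in zip(actual, estimated) if e is not None]
    if not diffs:
        # No estimates at all: there is no meaningful average to report.
        return float("nan")
    return sum(diffs) / len(diffs)


# Enough data: every held-out rating received an estimate.
dense = average_absolute_difference([4.0, 3.0, 5.0], [3.5, 3.0, 4.0])

# Too sparse: no held-out rating could be estimated.
sparse = average_absolute_difference([4.0, 3.0, 5.0], [None, None, None])

print(dense, math.isnan(sparse))
```

In this sketch, a NaN result signals that no test preference could be estimated, which matches the "data is too sparse" explanation above.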

RE: Classification: using the Java API always returns the same category

2012-04-13 Thread Verachten Bruno
> This shows that category3 is being selected for your input string. I don't see > any apparent problems. The problem is that category3 is always selected, whatever the input string is... > Can you try to run over the training data and see if the model is > predicting right in your api vers

Re: Classification: using the Java API always returns the same category

2012-04-13 Thread Robin Anil
This shows that category3 is being selected for your input string. I don't see any apparent problems. Can you try to run over the training data and see if the model is predicting right in your API version, just as a sanity check? Again, send logs of the run. -- Robin Anil 2012/4/13 Verachten

Re: Mahout recommendation engine with Map Reduce.

2012-04-13 Thread Sean Owen
The code you have been using is quite separate from the Hadoop-based recommender code. There is really no way to just modify it a little to use Hadoop. The Hadoop-based code in org.apache.mahout.cf.taste.hadoop operates as a big batch job that computes results in bulk, not real-time. The closest t

Recommended way to consume Nutch data in Mahout

2012-04-13 Thread Pat Ferrel
I'd like to use Nutch to gather data to process with Mahout. Nutch creates parsed text for the pages it crawls. Nutch also has several command-line tools to turn the data into a text file (readseg for instance). The tools I've found either create one big text file with markers in it for records or allow

Re: LDA clustering documentation (mahout-07-snapshot)

2012-04-13 Thread antonio d'agata
Hi Jake, I didn't understand what you meant before. I tried the cvb command as follows (N.B. for now I'm working without Hadoop): *mahout cvb -i DB-vectors/tfidf-vectors -dict DB-vectors/dictionary.file-0 -o DB-CVB-output -dt DB-CVB-document -k 50 -mt DB-CVB-states -x 10 -tf 0.2* but it gives me

Re: LDA clustering documentation (mahout-07-snapshot)

2012-04-13 Thread antonio d'agata
Thanks for answering me. I don't get an error, but the output file doesn't show me the document IDs. (50 topics set) {0:0.002547369011977743,1:0.00233198734746577,2:0.0027053304459988474,,46:0.002681078237741154,47:0.0022995728183704102,48:0.0023898609263648157,49:0.0025577382030260733} {0:

RE: Classification: using the Java API always returns the same category

2012-04-13 Thread Verachten Bruno
Here you are: 17:48:12.402 [main]INFO o.a.m.c.b.SequenceFileModelReader - Read 5 feature weights 17:48:12.678 [main]INFO o.a.m.c.b.SequenceFileModelReader - Read 10 feature weights 17:48:13.187 [main]INFO o.a.m.c.b.SequenceFileModelReader - Read 150