date:20120413

Recommend a set of users for an item

2012-04-13 Thread Will C

So I've seen methods to have Mahout Taste recommend items for a user, such as: https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/recommender/Recommender.html#recommend(long, int) Is there an equivalent for the opposite, where I want to find a set of users that can be

Re: Recommended way to consume Nutch data in Mahout

2012-04-13 Thread Ted Dunning

Nutch has scalability limits that other crawlers are able to avoid, so it isn't quite as fashionable lately. Ken Krugler's work with Common Crawl and Bixo is a bit more current.
On Fri, Apr 13, 2012 at 6:36 PM, Pat Ferrel wrote:
> Thanks I'll check that out.
>
> Actually it was pretty easy to wr

Re: Recommended way to consume Nutch data in Mahout

2012-04-13 Thread Pat Ferrel

Thanks, I'll check that out. Actually, it was pretty easy to write a custom SequenceFilesFromDirectoryFilter. I'm just a little surprised no one is using crawled data from Nutch already.
On 4/13/12 4:22 PM, Peyman Mohajerian wrote:
One solution is to use Solr, which integrates nicely with Nutc

Re: Recommended way to consume Nutch data in Mahout

2012-04-13 Thread Peyman Mohajerian

One solution is to use Solr, which integrates nicely with Nutch. Read the data from Solr using the SolrReader API.
On Fri, Apr 13, 2012 at 7:03 AM, Pat Ferrel wrote:
> I'd like to use Nutch to gather data to process with Mahout. Nutch creates parsed text for the pages it crawls. Nutch also has several

Re: Why my evaluator.evaluate return NaN

2012-04-13 Thread Sean Owen

NaN means "not a number". It is kind of like "null". This means no answer could be computed, and usually that is because your data is too sparse. For the kind of recommender you are running, you don't want such sparse data, and it sounds like you are making the data sparse by removing a lot of rati
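To illustrate the point about sparse data: a rough sketch, in plain Java, of how an averaged error metric ends up as NaN when no prediction could be made for any test pair. This is not Mahout's evaluator code; the class and method names are hypothetical, and it only assumes that the score is an average over the pairs for which an estimate exists.

```java
// Illustrative only: not Mahout's evaluator code. All names here are hypothetical.
final class NaNAverageDemo {

    /** Mean absolute error over the pairs for which a prediction exists (is not NaN). */
    static double meanAbsoluteError(double[] actual, double[] predicted) {
        double sum = 0.0;
        int count = 0;
        for (int i = 0; i < actual.length; i++) {
            if (Double.isNaN(predicted[i])) {
                continue; // no estimate was possible for this pair, so skip it
            }
            sum += Math.abs(actual[i] - predicted[i]);
            count++;
        }
        // With zero usable pairs this is 0.0 / 0, which evaluates to NaN in Java.
        return sum / count;
    }

    public static void main(String[] args) {
        double[] actual = {4.0, 3.0};
        double[] noEstimates = {Double.NaN, Double.NaN};
        double[] someEstimates = {3.5, Double.NaN};
        System.out.println(meanAbsoluteError(actual, noEstimates));   // prints NaN
        System.out.println(meanAbsoluteError(actual, someEstimates)); // prints 0.5
    }
}
```

If every test preference falls into the "no estimate" case, the average has nothing to average over, which matches the "no answer could be computed" reading above.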

RE: Classification: using the Java API always returns the same category

2012-04-13 Thread Verachten Bruno

> This shows that category3 is being selected for your input string. I don't see any apparent problems.
The problem is that category3 is always selected, whatever the input string is...
> Can you try to run over the training data and see if the model is predicting right in your api vers

Re: Classification: using the Java API always returns the same category

2012-04-13 Thread Robin Anil

This shows that category3 is being selected for your input string. I don't see any apparent problems. Can you try to run over the training data and see if the model is predicting right in your API version, just as a sanity check? Again, send logs of the run.
-- Robin Anil
2012/4/13 Verachten

Re: Mahout recommendation engine with Map Reduce.

2012-04-13 Thread Sean Owen

The code you have been using is quite separate from the Hadoop-based recommender code. There is really no way to just modify it a little to use Hadoop. The Hadoop-based code in org.apache.mahout.cf.taste.hadoop operates as a big batch job that computes results in bulk, not real-time. The closest t

Recommended way to consume Nutch data in Mahout

2012-04-13 Thread Pat Ferrel

I'd like to use Nutch to gather data to process with Mahout. Nutch creates parsed text for the pages it crawls. Nutch also has several command-line tools to turn the data into a text file (readseg, for instance). The tools I've found either create one big text file with markers in it for records or allow

Re: LDA clustering documentation (mahout-07-snapshot)

2012-04-13 Thread antonio d'agata

Hi Jake, I didn't understand what you meant before. I tried the cvb command as follows (N.B. for now I'm working without Hadoop): *mahout cvb -i DB-vectors/tfidf-vectors -dict DB-vectors/dictionary.file-0 -o DB-CVB-output -dt DB-CVB-document -k 50 -mt DB-CVB-states -x 10 -tf 0.2* but it gives me

Re: LDA clustering documentation (mahout-07-snapshot)

2012-04-13 Thread antonio d'agata

Thanks for answering me. I don't get an error, but the output file doesn't show me the document IDs. (50 topics set) {0:0.002547369011977743,1:0.00233198734746577,2:0.0027053304459988474,,46:0.002681078237741154,47:0.0022995728183704102,48:0.0023898609263648157,49:0.0025577382030260733} {0:

RE: Classification: using the Java API always returns the same category

2012-04-13 Thread Verachten Bruno

Here you are: 17:48:12.402 [main]INFO o.a.m.c.b.SequenceFileModelReader - Read 5 feature weights 17:48:12.678 [main]INFO o.a.m.c.b.SequenceFileModelReader - Read 10 feature weights 17:48:13.187 [main]INFO o.a.m.c.b.SequenceFileModelReader - Read 150

12 matches
