So I've seen methods to have Mahout Taste recommend items for a user, such
as:
https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/recommender/Recommender.html#recommend(long, int)
Is there the equivalent for the opposite, where I want to find a set of
users that can be
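(One common workaround — an assumption on my part, not a dedicated Taste API — is to transpose the preference data, swapping user and item IDs, and feed the transposed model to the same recommend(long, int) call, so that "recommend items for a user" becomes "recommend users for an item." A minimal plain-Java sketch of the transposition step, with toy data and no Mahout dependency:)

```java
import java.util.*;

public class TransposePrefs {
    // Turn user -> items preferences into item -> users. A user-based
    // recommender built over the transposed data then returns user IDs
    // when asked to "recommend" for an item ID.
    static Map<Long, Set<Long>> transpose(Map<Long, Set<Long>> userToItems) {
        Map<Long, Set<Long>> itemToUsers = new TreeMap<>();
        for (Map.Entry<Long, Set<Long>> e : userToItems.entrySet()) {
            for (long item : e.getValue()) {
                itemToUsers.computeIfAbsent(item, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return itemToUsers;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> prefs = new TreeMap<>();
        prefs.put(1L, new TreeSet<>(Arrays.asList(10L, 20L)));
        prefs.put(2L, new TreeSet<>(Arrays.asList(10L)));
        // Item 10 was preferred by users 1 and 2; item 20 by user 1 only.
        System.out.println(transpose(prefs));
    }
}
```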
Nutch has scalability limits that other crawlers are able to avoid so it
isn't quite as fashionable lately.
Ken Krugler's work with common crawl and Bixo is a bit more current.
On Fri, Apr 13, 2012 at 6:36 PM, Pat Ferrel wrote:
> Thanks I'll check that out.
>
> Actually it was pretty easy to write a custom SequenceFilesFromDirectoryFilter.
Thanks I'll check that out.
Actually it was pretty easy to write a custom
SequenceFilesFromDirectoryFilter. I'm just a little surprised no one is
using crawled data from Nutch already.
On 4/13/12 4:22 PM, Peyman Mohajerian wrote:
> One solution is to use Solr, which integrates nicely with Nutch.
One solution is to use Solr, which integrates nicely with Nutch. Read the data
off Solr using the SolrReader API.
On Fri, Apr 13, 2012 at 7:03 AM, Pat Ferrel wrote:
> I'd like to use Nutch to gather data to process with Mahout. Nutch creates
> parsed text for the pages it crawls. Nutch also has several
NaN means "not a number". It is kind of like "null". This means no
answer could be computed, and usually that is because your data is too
sparse. For the kind of recommender you are running, you don't want
such sparse data, and it sounds like you are making the data sparse by
removing a lot of ratings.
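(For intuition on where the NaN comes from: a Pearson-style similarity divides by the standard deviations over the co-rated items, and with too little overlap that denominator is zero, so the division yields 0/0 = NaN. A self-contained plain-Java sketch with toy numbers — not Mahout's actual implementation:)

```java
public class PearsonNaN {
    // Plain Pearson correlation over the co-rated items of two users.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double num = 0, dx = 0, dy = 0;
        for (int i = 0; i < n; i++) {
            num += (x[i] - mx) * (y[i] - my);
            dx  += (x[i] - mx) * (x[i] - mx);
            dy  += (y[i] - my) * (y[i] - my);
        }
        // When either variance is zero (e.g. a single co-rated item),
        // this is 0/0, which is NaN.
        return num / Math.sqrt(dx * dy);
    }

    public static void main(String[] args) {
        // Only one co-rated item: both deviations are zero -> NaN.
        System.out.println(pearson(new double[]{4.0}, new double[]{5.0}));
        // Healthy overlap: a real similarity comes out (1.0 here).
        System.out.println(pearson(new double[]{1, 2, 3}, new double[]{2, 4, 6}));
    }
}
```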
> This shows that category3 is being selected for your input string. I dont see
> any apparent problems.
The problem is that category3 is always selected, whatever the input string
is...
> Can you try to run over the training data and see if the model is
> predicting right in your API version, just as a sanity check?
This shows that category3 is being selected for your input string; I don't
see any apparent problems. Can you try running over the training data and
seeing if the model predicts correctly in your API version, just as a sanity
check? Again, send logs of the run.
--
Robin Anil
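(The suggested sanity check boils down to: classify every training document and verify the model mostly reproduces the labels it was trained on; if one category wins even on training data, the model or the feature pipeline is broken. A plain-Java sketch of that accuracy/label-distribution check — the toy labels below stand in for real model output:)

```java
import java.util.*;

public class SanityCheck {
    // Compare predicted labels against the known training labels and
    // count how often each category is predicted. A single dominant
    // category on training data points at a broken model or features.
    static Map<String, Object> check(List<String> actual, List<String> predicted) {
        int correct = 0;
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < actual.size(); i++) {
            if (actual.get(i).equals(predicted.get(i))) correct++;
            counts.merge(predicted.get(i), 1, Integer::sum);
        }
        Map<String, Object> report = new TreeMap<>();
        report.put("accuracy", (double) correct / actual.size());
        report.put("predictedCounts", counts);
        return report;
    }

    public static void main(String[] args) {
        List<String> actual = Arrays.asList("cat1", "cat2", "cat3", "cat1");
        // Symptom from the thread: category3 predicted for everything.
        List<String> predicted = Arrays.asList("cat3", "cat3", "cat3", "cat3");
        // Low accuracy plus a count map dominated by one category.
        System.out.println(check(actual, predicted));
    }
}
```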
2012/4/13 Verachten
The code you have been using is quite separate from the Hadoop-based
recommender code. There is really no way to just modify it a little to
use Hadoop. The Hadoop-based code in org.apache.mahout.cf.taste.hadoop
operates as a big batch job that computes results in bulk, not
real-time.
The closest t
I'd like to use Nutch to gather data to process with Mahout. Nutch
creates parsed text for the pages it crawls. Nutch also has command-line
tools to turn the data into a text file (readseg, for instance). The
tools I've found either create one big text file with markers in it for
records or allow
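(The "one big text file with markers" output can be split back into one record per page with a few lines of code before vectorizing. This sketch assumes a `Recno::` marker line as in readseg's dump output — the exact marker may differ by Nutch version, so treat it as a parameter:)

```java
import java.util.*;

public class SplitDump {
    // Split a concatenated segment dump into one record per marker line.
    static List<String> split(String dump, String marker) {
        List<String> records = new ArrayList<>();
        StringBuilder cur = null;
        for (String line : dump.split("\n")) {
            if (line.startsWith(marker)) {
                if (cur != null) records.add(cur.toString().trim());
                cur = new StringBuilder();          // start a new record
            } else if (cur != null) {
                cur.append(line).append('\n');      // accumulate record body
            }
        }
        if (cur != null) records.add(cur.toString().trim());
        return records;
    }

    public static void main(String[] args) {
        String dump = "Recno:: 0\npage one text\nRecno:: 1\npage two text\n";
        // Two separate records, ready to write out as individual files.
        System.out.println(split(dump, "Recno::"));
    }
}
```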
Hi Jake,
I didn't understand what you meant before.
I tried the cvb command as follows (N.B. for now I'm working without Hadoop):
*mahout cvb -i DB-vectors/tfidf-vectors -dict DB-vectors/dictionary.file-0
-o DB-CVB-output -dt DB-CVB-document -k 50 -mt DB-CVB-states -x 10 -tf 0.2*
but it gives me
Thanks for answering me.
I don't get an error, but the output file doesn't show me the document IDs.
(50 topics set)
{0:0.002547369011977743,1:0.00233198734746577,2:0.0027053304459988474,…,46:0.002681078237741154,47:0.0022995728183704102,48:0.0023898609263648157,49:0.0025577382030260733}
{0:
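(The dumped value is just the per-topic distribution; the document ID is the sequence-file key, which some dump options omit. Parsing the printed vector itself is straightforward — a plain-Java sketch that pulls the strongest topics out of one such line, using a toy line with abbreviated weights:)

```java
import java.util.*;

public class TopTopics {
    // Parse a "{topic:weight,topic:weight,...}" dump line and return the
    // topic IDs sorted by descending weight.
    static List<Integer> topTopics(String line, int k) {
        String body = line.substring(1, line.length() - 1);  // strip { }
        List<Map.Entry<Integer, Double>> entries = new ArrayList<>();
        for (String pair : body.split(",")) {
            if (pair.isEmpty()) continue;  // tolerate elided entries
            String[] kv = pair.split(":");
            entries.add(new AbstractMap.SimpleEntry<>(
                Integer.parseInt(kv[0]), Double.parseDouble(kv[1])));
        }
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<Integer> top = new ArrayList<>();
        for (int i = 0; i < k && i < entries.size(); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        String line = "{0:0.0025,1:0.0023,2:0.0027,3:0.0021}";
        // Strongest topics first: topic 2, then topic 0.
        System.out.println(topTopics(line, 2));
    }
}
```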
Here you are:
17:48:12.402 [main] INFO o.a.m.c.b.SequenceFileModelReader - Read 5 feature weights
17:48:12.678 [main] INFO o.a.m.c.b.SequenceFileModelReader - Read 10 feature weights
17:48:13.187 [main] INFO o.a.m.c.b.SequenceFileModelReader - Read 150