Hi.
Let me first say a BIG THANKS to the Mahout community and the authors of
the book Mahout In Action. I have just started with Hadoop and Mahout and
am finding them extremely useful.
I am running the following code:
hadoop jar mahout-core-0.5-job.jar
I think you have to modify the source code by incorporating some filters.
-Original Message-
From: Akshay Jain [mailto:jaks...@gmail.com]
Sent: November 8, 2011 17:24
To: user@mahout.apache.org
Subject: Collaborative filtering help needed
Hi.
Let me first say a BIG THANKS to the Mahout community and the
I think specifying an item file with one item does it? You should get one
rec for most users that is that item.
On Nov 8, 2011 9:25 AM, Akshay Jain jaks...@gmail.com wrote:
Hi.
Let me first say a BIG THANKS to the Mahout community and the authors of
the book Mahout In Action. I have just
Sean, that can be done, but I don't want to get the same item's prediction
for all the users. Is there no other way than to manually code it? (I don't
know Java, so I don't think that would be possible for me :( )
On Tue, Nov 8, 2011 at 3:02 PM, Sean Owen sro...@gmail.com wrote:
I think
Hi,
my data is available in XML, it's looking something like that:
<data>
  <doc>
    <title> ... </title>
    <abstract> ... </abstract>
    <keyword> ... </keyword>
    ...
    <keyword> ... </keyword>
    <keyword> ... </keyword>
  </doc>
  <doc>
    ...
  </doc>
</data>
I looked into the wikipedia-example and I have a few
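For illustration, a layout like the one above can be flattened into one record per document before any vectorizing step. This is only a minimal Python sketch: it assumes the elements are named data, doc, title, abstract and keyword as in the listing, and the sample content is made up because the real values are not shown in the message.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the layout described in the message;
# the titles, abstracts and keywords are invented for illustration.
SAMPLE = """<data>
  <doc>
    <title>First title</title>
    <abstract>First abstract</abstract>
    <keyword>alpha</keyword>
    <keyword>beta</keyword>
  </doc>
  <doc>
    <title>Second title</title>
    <abstract>Second abstract</abstract>
    <keyword>gamma</keyword>
  </doc>
</data>"""


def parse_docs(xml_text):
    """Return one dict per <doc> holding its title, abstract and keywords."""
    root = ET.fromstring(xml_text)
    docs = []
    for doc in root.iter("doc"):
        docs.append({
            "title": (doc.findtext("title") or "").strip(),
            "abstract": (doc.findtext("abstract") or "").strip(),
            # A doc may carry any number of <keyword> children.
            "keywords": [(k.text or "").strip() for k in doc.findall("keyword")],
        })
    return docs


docs = parse_docs(SAMPLE)
```

Each record keeps the keyword list separate from the text, so the keywords can later serve as labels or as extra features.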
I'm not sure what you are asking then. Taking your first email literally it
sounds like you want to estimate the ratings you already know! Do you want
1 rec per user?
On Nov 8, 2011 9:47 AM, Akshay Jain jaks...@gmail.com wrote:
Sean, that can be done, but I don't want to get the same item's
I think he means that he wants an estimated score for a specific item
for each user, where the user has not yet scored that item.
-Original Message-
From: Sean Owen [mailto:sro...@gmail.com]
Sent: November 8, 2011 17:52
To: user@mahout.apache.org
Subject: Re: Collaborative filtering help needed
Sorry for the bad English.
I want to get the predicted rating for users that I
specify, for specific items (which will vary from user to user).
E.g.
User1: I want to get the predicted rating for Item_67
User1: I want to get the predicted rating for Item_23
User1: I want to get the predicted
OK, for that, you'd have to modify the code.
You are not talking about getting ratings for items you have already rated,
right? So user 1 has not already expressed a rating for item 67 in your
example.
On Tue, Nov 8, 2011 at 10:05 AM, Akshay Jain jaks...@gmail.com wrote:
Sorry for the bad
Hi,
I have a general question about multi-label classification. Binary- or
single-label classification is working, as shown in several examples
(Wikipedia and 20Newsgroup, Mahout In Action book...).
Are there some working examples of multi-label classification to try
out?
Or is there some
Yes. I want to predict ratings for new things which the user has not rated
yet.
On Tue, Nov 8, 2011 at 4:00 PM, Sean Owen sro...@gmail.com wrote:
OK, for that, you'd have to modify the code.
You are not talking about getting ratings for the ratings you already know
right? so, user 1 does not
What exactly do you mean by multi-label classification?
The 20 newsgroup example has many possible label values.
Are you asking for an example where multiple labels might be applied to a
single example? If so, no, we don't have a nice example of that.
On Tue, Nov 8, 2011 at 5:36 AM, David
Yes, I was asking for an example where multiple labels might be applied to a
single example.
Thanks and regards,
David
2011/11/8 Ted Dunning ted.dunn...@gmail.com
What exactly do you mean by multi-label classification?
The 20 newsgroup example has many possible label values.
Are you asking
Hi everybody,
I have completed my first Mahout experiment with a Hadoop local
installation (single machine) and I obtained different results from Scilab
and the Mahout Distributed Lanczos Solver. Could someone explain why this
happens? Am I doing something wrong?
This is my matrix
2,0,8,6,0
The practical techniques for such problems are pretty diverse.
One method is to simply define multiple binary classifiers. If you can
stratify your labels, then you can have some labels depend on others.
Another option is to find commonly occurring sets of labels and build
classifiers for those
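The first method mentioned above (one binary classifier per label) can be sketched in a few lines. This is a hedged Python illustration, not Mahout code: the perceptron learner, the function names and the toy documents are all assumptions made for the example.

```python
from collections import defaultdict


def train_binary(docs, positives, epochs=10):
    """Train a perceptron separating docs that carry a label from the rest."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for tokens, is_pos in zip(docs, positives):
            score = bias + sum(weights[t] for t in tokens)
            target = 1 if is_pos else -1
            if target * score <= 0:  # misclassified (or undecided): update
                for t in tokens:
                    weights[t] += target
                bias += target
    return weights, bias


def train_binary_relevance(docs, label_sets):
    """Fit one independent binary model per label seen in the training data."""
    labels = sorted(set().union(*label_sets))
    return {
        label: train_binary(docs, [label in ls for ls in label_sets])
        for label in labels
    }


def predict(models, tokens):
    """Return every label whose binary model scores the document above zero."""
    return {
        label
        for label, (weights, bias) in models.items()
        if bias + sum(weights.get(t, 0.0) for t in tokens) > 0
    }


# Hypothetical toy corpus: each document may carry more than one label.
train_docs = [["election", "vote"], ["society", "survey"],
              ["election", "society"], ["protein", "cell"]]
train_labels = [{"politics"}, {"social sciences"},
                {"politics", "social sciences"}, {"biology"}]
models = train_binary_relevance(train_docs, train_labels)
```

Because each label gets its own model, a document can come back with zero, one, or several labels; any binary learner could be swapped in for the perceptron.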
I have a lot of data from where I work. The data are documents (title +
abstract) and each document can have one or more categories (e.g. social
sciences + politics). We want to build a recommender and analyze the output
for further testing.
Thanks and regards,
David
2011/11/8 Ted Dunning
Recommender? Recommenders are not normally used for adding categories to
documents.
Is it possible for you to release blinded data in which all terms and
categories are replaced by numbers and permuted? Or even just stemmed and
sorted as with the RCV1 corpus?
Having such a test corpus would
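One way to read the blinding suggestion above is to replace every term and every category with an integer drawn from a random permutation, and to sort the IDs inside each document so the original word order is lost. The sketch below is a Python illustration under that reading; the function name, the seed parameter and the sample documents are assumptions, not something given in the thread.

```python
import random


def blind_corpus(docs, seed=0):
    """Replace terms and categories with shuffled integer IDs.

    docs is a list of (tokens, categories) pairs. The returned mappings
    are what lets the data owner translate results back, so they would
    stay private.
    """
    rng = random.Random(seed)
    terms = sorted({t for tokens, _ in docs for t in tokens})
    cats = sorted({c for _, cs in docs for c in cs})

    def random_ids(values):
        ids = list(range(len(values)))
        rng.shuffle(ids)  # random permutation of 0..n-1
        return dict(zip(values, ids))

    term_ids = random_ids(terms)
    cat_ids = random_ids(cats)
    blinded = [
        # Sorting the IDs discards the original word order.
        (sorted(term_ids[t] for t in tokens), sorted(cat_ids[c] for c in cs))
        for tokens, cs in docs
    ]
    return blinded, term_ids, cat_ids


# Hypothetical sample documents (tokens, categories).
sample = [
    (["social", "policy", "policy"], ["social sciences", "politics"]),
    (["policy", "cell"], ["biology"]),
]
blinded, term_ids, cat_ids = blind_corpus(sample)
```

Term frequencies and co-occurrences survive the substitution, which is what a classifier needs.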
Good morning,
Would it be possible for anyone to point me in the right direction with a
minor problem I am having? I'm trying to run a job using the
BayesFeatureDriver (version 0.5) from my webapp which would use a remote
hadoop cluster. The problem I am having is that the driver executes
Sorry, I mean classifier, my bad...
Unfortunately I can't release the raw data. And I don't really know how to
blind the data correctly. I could give the categories numbers and hash the
text (title and abstract) if that would be enough...
Thanks and regards,
David
2011/11/8 Ted Dunning
Hi,
I'm trying to use RowSimilarityJob (current trunk) to calculate pairwise
similarities between feature vectors but I'm struggling a bit with the
correct input format.
I used SparseVectorsFromSequenceFiles to create a bunch of vectors from
documents. But using the tfidf vectors directly
Hi Sören,
RowSimilarityJob expects IntWritable,VectorWritable as input. It should
be a reasonable choice for comparing the pairwise similarities between
text documents. I suggest you throw away the 1% most frequent terms as
described in http://terpconnect.umd.edu/~oard/pdf/acl08elsayed2.pdf. I
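To make the two points above concrete (integer row keys and dropping the most frequent terms before comparing documents), here is a small in-memory Python sketch. It is not RowSimilarityJob and does not use Mahout's vector classes; the function names, the cosine measure and the toy vectors are assumptions for illustration.

```python
import math
from collections import Counter


def prune_frequent_terms(rows, fraction=0.01):
    """Drop the given fraction of terms with the highest document frequency."""
    df = Counter(term for vec in rows.values() for term in vec)
    n_drop = int(len(df) * fraction)
    dropped = {term for term, _ in df.most_common(n_drop)}
    return {
        row_id: {t: w for t, w in vec.items() if t not in dropped}
        for row_id, vec in rows.items()
    }


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pairwise_similarities(rows):
    """Similarity for every unordered pair of integer row IDs."""
    ids = sorted(rows)
    return {
        (i, j): cosine(rows[i], rows[j])
        for pos, i in enumerate(ids)
        for j in ids[pos + 1:]
    }


# Hypothetical rows keyed by int, echoing the IntWritable keys.
rows = {
    0: {"the": 1.0, "mahout": 2.0},
    1: {"the": 1.0, "hadoop": 2.0},
    2: {"the": 1.0, "mahout": 1.0},
}
# The toy vocabulary has only three terms, so a larger fraction is used
# here than the 1% suggested in the message.
sims = pairwise_similarities(prune_frequent_terms(rows, fraction=0.5))
```

With the ubiquitous term removed, rows 0 and 2 stay similar while row 1 no longer matches either of them.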
I am a Mahout newbie, so I might be wrong, but I strongly
suspect it has to do with one of your eigenvalues being 0. That implies a
singular matrix. You will see that your first two eigenvalues are equal to the
singular values. Parsing the structure in smaller eigenvalues get
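The relationship described above can be checked by hand on a tiny case. The sketch below is a Python illustration that assumes a symmetric 2x2 matrix, for which the singular values are the absolute values of the eigenvalues; the matrix is hypothetical and is not the one from the original question, which is not fully shown.

```python
import math


def symmetric_2x2_eigenvalues(a, b, d):
    """Eigenvalues of [[a, b], [b, d]], largest first (closed form)."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean + radius, mean - radius


def symmetric_2x2_singular_values(a, b, d):
    """Singular values: square roots of the eigenvalues of A^T A."""
    # For symmetric A, A^T A equals A times A, which is again symmetric.
    p, q, r = a * a + b * b, a * b + b * d, b * b + d * d
    hi, lo = symmetric_2x2_eigenvalues(p, q, r)
    return math.sqrt(max(hi, 0.0)), math.sqrt(max(lo, 0.0))


# Hypothetical matrix [[1, 2], [2, 4]]: the second row is twice the first.
a, b, d = 1.0, 2.0, 4.0
determinant = a * d - b * b
eigenvalues = symmetric_2x2_eigenvalues(a, b, d)
singular_values = symmetric_2x2_singular_values(a, b, d)
```

Here the determinant is zero, one eigenvalue is zero, and the remaining eigenvalue coincides with the nonzero singular value.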
From MAHOUT-344 from the patch author:
The idea behind keyGroups is to concatenate hashes from multiple hash functions
to reduce the probability of collision between 2 users that agreed on 1 or more
individual hash values. This essentially improves the average similarity of
users in a cluster.
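As a rough illustration of that idea: if two users' item sets have Jaccard similarity s, a single min-hash agrees with probability s, while a key built from p concatenated min-hashes agrees with probability s^p. The Python sketch below is not the MAHOUT-344 patch; the MD5-based hash family, the non-overlapping grouping and the parameter names num_hashes and key_groups are assumptions made for the example.

```python
import hashlib


def make_hash(seed):
    """Build one member of a simple seeded hash family (illustrative only)."""
    def h(item):
        digest = hashlib.md5(f"{seed}:{item}".encode("utf-8")).hexdigest()
        return int(digest, 16)
    return h


def minhash_cluster_keys(items, num_hashes=4, key_groups=2):
    """Return cluster keys, each concatenating key_groups min-hash values."""
    hashes = [make_hash(seed) for seed in range(num_hashes)]
    # One min-hash value per hash function over the user's item set.
    mins = [min(h(item) for item in items) for h in hashes]
    keys = []
    for start in range(0, num_hashes - key_groups + 1, key_groups):
        group = mins[start:start + key_groups]
        keys.append("-".join(str(value) for value in group))
    return keys


# Hypothetical users described by the items they interacted with.
keys_a = minhash_cluster_keys(["item1", "item2", "item3"])
keys_b = minhash_cluster_keys(["item3", "item2", "item1"])
keys_c = minhash_cluster_keys(["item7", "item8"])
```

Two users land in the same cluster only when every hash in a group agrees, so larger groups give fewer but tighter clusters.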
On Tue, Nov 8, 2011 at 4:35 AM, Ted Dunning ted.dunn...@gmail.com wrote:
The practical techniques for such problems are pretty diverse.
One method is to simply define multiple binary classifiers.
Only? You mean the only method we have currently implemented, right?
Labeled
Thanks for the response; it helped.
I fixed the problem by adding a core-site.xml file to the root of my jar,
and added the fs.name.dir and mapred.job.tracker properties to it and it
worked.
On Nov 8, 2011 9:24 AM, Sean Owen sro...@gmail.com wrote:
This is more a Hadoop question; I don't think
Sorry, meant to say fs.default.name for the namenode property.
On Nov 8, 2011 1:32 PM, Jamal B jm151...@gmail.com wrote:
Thanks for the response it helped.
I fixed the problem by adding a core-site.xml file to the root of my jar,
and added the fs.name.dir and mapred.job.tracker properties to
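For reference, the file described above could be generated along these lines. This Python sketch only renders the standard Hadoop configuration layout with the two properties named in the thread (using the corrected fs.default.name); the host names and ports are placeholders, not values from the messages.

```python
import xml.etree.ElementTree as ET


def build_core_site(properties):
    """Render a Hadoop-style <configuration> document from a dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder endpoints: replace with the real namenode and jobtracker.
core_site_xml = build_core_site({
    "fs.default.name": "hdfs://namenode.example.com:8020",
    "mapred.job.tracker": "jobtracker.example.com:8021",
})
```

Writing the resulting string to a file named core-site.xml at the root of the jar matches the fix reported in the message.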
Ok after simply converting the vector keys from Text to IntWritable, it
worked fine for me.
Took a while though, but it ran only on my local machine with default
vectorization settings and almost no preprocessing, so there's much room
for improvement.
Thanks for your help!
Sören
On 08/11/11
Team
I saw a presentation on Mahout at Hadoop World NY. The room was full, with a lot
of interest from attendees.
Most questions were on basics. However, there was some interest in seeing more
Mahout algorithms enabled for MapReduce mode.
Good to see more interest in Mahout!
Hi all,
Sometimes my cluster labels are terms that hardly occur in the
combined text of the documents of a cluster. I would expect to see a
label of a term that occurs very frequently across documents of the
cluster.
For example, suppose there is a cluster of tweets about Mahout. You
would see a
Hello All,
I am trying to use Clustering algorithms to recover Software Architecture
by using static features of code (e.g. method invocations, field accesses,
etc).
To start with, I ran the TestClusterDumper (using the testDirichlet2()
function) on the sample example given. But I am not able to
Akshay-
The Mahout 0.5 release has bugs. We advise everyone to use the Mahout 0.6
trunk.
Lance
On Tue, Nov 8, 2011 at 2:53 AM, Akshay Jain jaks...@gmail.com wrote:
Yes. I want to predict ratings for new things which the user has not rated
yet.
On Tue, Nov 8, 2011 at 4:00 PM, Sean Owen
Should you not be using Recommender.estimatePreference?
On Wed, Nov 9, 2011 at 12:19 AM, Lance Norskog goks...@gmail.com wrote:
Akshay-
The Mahout 0.5 release has bugs. We advise everyone to use the Mahout 0.6
trunk.
Lance
On Tue, Nov 8, 2011 at 2:53 AM, Akshay Jain jaks...@gmail.com
Could this project be done with symbol sequences instead of hash codes? The
advantage of symbol sequences is that you can unpack them.
On Tue, Nov 8, 2011 at 9:54 AM, Vishal Santoshi
vishal.santo...@gmail.com wrote:
Yep.
By concatenating p hash-keys ( generated from p functions ) for each
@Steven- Can you please tell me how to use
Recommender.estimatePreference?
@lance - Thanks for letting me know. I will update it to the latest
version.
Thanks
Akshay
On Wed, Nov 9, 2011 at 5:51 AM, Steven Bourke sbou...@gmail.com wrote:
Should you not be using Recommender.estimatePreference?
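In the non-distributed recommender API, Recommender.estimatePreference(long userID, long itemID) returns an estimated preference for a single user and item. What such an estimate does conceptually can be sketched as a similarity-weighted average of other users' ratings. The Python below is an illustration only, not Mahout's implementation; the Pearson weighting, the function names and the ratings are assumptions, with the item number 67 borrowed from the example earlier in the thread.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def estimate_preference(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item.

    Returns None when no positively correlated user has rated the item.
    """
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        similarity = pearson(ratings[user], other_ratings)
        if similarity > 0:
            numerator += similarity * other_ratings[item]
            denominator += similarity
    return numerator / denominator if denominator else None


# Hypothetical ratings: user 1 has not rated item 67 yet.
ratings = {
    1: {1: 5.0, 2: 3.0, 3: 4.0},
    2: {1: 5.0, 2: 3.0, 3: 4.0, 67: 4.0},
    3: {1: 1.0, 2: 5.0, 3: 2.0, 67: 1.0},
}
estimate = estimate_preference(ratings, user=1, item=67)
```

Calling the function once per (user, item) pair of interest gives per-user estimates for different items without producing a full recommendation list.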
Bummer that I couldn't be there. I am in town, but had to meet with
customers during all the talks.
On Tue, Nov 8, 2011 at 4:24 PM, manjuna...@yahoo.com wrote:
Team
I saw a presentation on Mahout at hadoop world NY. The room was filled up.
Lot of interest from attendees.
Most questions were
On Tue, Nov 8, 2011 at 1:07 PM, Jake Mannix jake.man...@gmail.com wrote:
On Tue, Nov 8, 2011 at 4:35 AM, Ted Dunning ted.dunn...@gmail.com wrote:
The practical techniques for such problems are pretty diverse.
One method is to simply define multiple binary classifiers.
Only? You mean
@Steven this is in the distributed part. There is no such method. But
Akshay if your data is not large, yeah, you could save a whole lot of
time and trouble by not using the Hadoop-based code.
@Lance I am not sure there's any evidence he's running into a bug. I
don't know that the general there