Collaborative filtering help needed

2011-11-08 Thread Akshay Jain
Hi. Let me first say a BIG THANKS to the Mahout community and the authors of the book Mahout In Action. I have just started with Hadoop and Mahout and am finding them extremely useful. I am running the following code: hadoop jar mahout-core-0.5-job.jar

Re: Collaborative filtering help needed

2011-11-08 Thread 张玉东
I think you have to modify the source code by incorporating some filters. -Original Message- From: Akshay Jain [mailto:jaks...@gmail.com] Sent: November 8, 2011 17:24 To: user@mahout.apache.org Subject: Collaborative filtering help needed Hi. Let me first say a BIG THANKS to the Mahout community and the

Re: Collaborative filtering help needed

2011-11-08 Thread Sean Owen
I think specifying an item file with one item does it? You should get one rec for most users that is that item. On Nov 8, 2011 9:25 AM, Akshay Jain jaks...@gmail.com wrote: Hi. Let me first say a BIG THANKS to the Mahout community and the authors of the book Mahout In Action. I have just

Re: Collaborative filtering help needed

2011-11-08 Thread Akshay Jain
Sean, that can be done, but I don't want to get the same item's prediction for all the users. Is there no other way than to manually code it? (I don't know Java, so I don't think that would be possible for me :( ) On Tue, Nov 8, 2011 at 3:02 PM, Sean Owen sro...@gmail.com wrote: I think

Using XML-Data

2011-11-08 Thread David Rahman
Hi, my data is available in XML; it looks something like this: <data> <doc> <title> ... </title> <abstract> ... </abstract> <keyword> ... </keyword> ... <keyword> ... </keyword> <keyword> ... </keyword> </doc> <doc> ... </doc> </data> I looked into the wikipedia-example and I have a few
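
For readers with data in the same shape, here is a minimal Python sketch of reading it into per-document records before any vectorization step. The element names (data, doc, title, abstract, keyword) come from the message above; the sample values and the parse_docs helper are hypothetical, and this is not Mahout code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the layout described in the message.
SAMPLE = """
<data>
  <doc>
    <title>First title</title>
    <abstract>First abstract text.</abstract>
    <keyword>alpha</keyword>
    <keyword>beta</keyword>
  </doc>
  <doc>
    <title>Second title</title>
    <abstract>Second abstract text.</abstract>
    <keyword>gamma</keyword>
  </doc>
</data>
"""


def parse_docs(xml_text):
    """Return one dict per <doc> with its title, abstract and keyword list."""
    root = ET.fromstring(xml_text.strip())
    docs = []
    for doc in root.iter("doc"):
        docs.append({
            "title": (doc.findtext("title") or "").strip(),
            "abstract": (doc.findtext("abstract") or "").strip(),
            # A document may carry any number of <keyword> elements.
            "keywords": [k.text.strip() for k in doc.findall("keyword") if k.text],
        })
    return docs


docs = parse_docs(SAMPLE)
for d in docs:
    print(d["title"], d["keywords"])
```

Each record could then be written out as one text document (title plus abstract), with the keywords kept as labels.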

Re: Collaborative filtering help needed

2011-11-08 Thread Sean Owen
I'm not sure what you are asking then. Taking your first email literally it sounds like you want to estimate the ratings you already know! Do you want 1 rec per user? On Nov 8, 2011 9:47 AM, Akshay Jain jaks...@gmail.com wrote: Sean, that can be done, but I don't want to get the same item's

Re: Collaborative filtering help needed

2011-11-08 Thread 张玉东
I think he means that he wants to get an estimated score for a specific item for each user, where the item has not been scored by that user. -Original Message- From: Sean Owen [mailto:sro...@gmail.com] Sent: November 8, 2011 17:52 To: user@mahout.apache.org Subject: Re: Collaborative filtering help needed

Re: Collaborative filtering help needed

2011-11-08 Thread Akshay Jain
Sorry for the bad English. I want to get the predicted rating for users that I specify, for specific items (which will vary from user to user). E.g. User1: I want to get predicted rating for Item_67 User1: I want to get predicted rating for Item_23 User1: I want to get predicted

Re: Collaborative filtering help needed

2011-11-08 Thread Sean Owen
OK, for that, you'd have to modify the code. You are not talking about getting ratings for the ratings you already know, right? So, user 1 does not already express any rating for item 67 in your example. On Tue, Nov 8, 2011 at 10:05 AM, Akshay Jain jaks...@gmail.com wrote: Sorry for the bad

Mahout and multi-label classification

2011-11-08 Thread David Rahman
Hi, I have a general question about multi-label classification. Binary- or single-label classification is working, as shown in several examples (Wikipedia and 20Newsgroup, Mahout In Action book...). Are there some working examples on multi-label classification for trying out? Or is there some

Re: Collaborative filtering help needed

2011-11-08 Thread Akshay Jain
Yes. I want to predict ratings for new things which the user has not rated yet. On Tue, Nov 8, 2011 at 4:00 PM, Sean Owen sro...@gmail.com wrote: OK, for that, you'd have to modify the code. You are not talking about getting ratings for the ratings you already know, right? So, user 1 does not

Re: Mahout and multi-label classification

2011-11-08 Thread Ted Dunning
What exactly do you mean by multi-label classification? The 20 newsgroup example has many possible label values. Are you asking for an example where multiple labels might be applied to a single example? If so, no, we don't have a nice example of that. On Tue, Nov 8, 2011 at 5:36 AM, David

Re: Mahout and multi-label classification

2011-11-08 Thread David Rahman
Yes, I was asking for an example where multiple labels might be applied to a single example. Thanks and regards, David 2011/11/8 Ted Dunning ted.dunn...@gmail.com What exactly do you mean by multi-label classification? The 20 newsgroup example has many possible label values. Are you asking

Comparing results of Mahout SVD and Scilab

2011-11-08 Thread motta
Hi everybody, I have completed my first Mahout experiment with a local Hadoop installation (single machine) and I obtained different results from Scilab and the Mahout Distributed Lanczos Solver. Could someone explain why this happens? Am I doing something wrong? This is my matrix 2,0,8,6,0

Re: Mahout and multi-label classification

2011-11-08 Thread Ted Dunning
The practical techniques for such problems are pretty diverse. One method is to simply define multiple binary classifiers. If you can stratify your labels, then you can have some labels depend on others. Another option is to find commonly occurring sets of labels and build classifiers for those
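
Two of the options mentioned above can be sketched as plain data transformations: one binary target per label, and a count of label sets that occur together often. This is a minimal Python illustration, not Mahout code; the function names and the tiny example data are hypothetical.

```python
from collections import Counter


def binary_relevance_targets(examples):
    """Turn multi-label examples into one 0/1 target list per label.

    examples: list of (document, set_of_labels) pairs.
    Each returned list could be used to train a separate binary classifier.
    """
    labels = sorted({label for _, label_set in examples for label in label_set})
    return {
        label: [1 if label in label_set else 0 for _, label_set in examples]
        for label in labels
    }


def common_label_sets(examples, min_count=2):
    """Count whole label sets and keep those seen at least min_count times."""
    counts = Counter(frozenset(label_set) for _, label_set in examples)
    return {label_set: n for label_set, n in counts.items() if n >= min_count}


# Hypothetical data: four documents, some with more than one label.
examples = [
    ("doc a", {"politics", "social sciences"}),
    ("doc b", {"politics"}),
    ("doc c", {"politics", "social sciences"}),
    ("doc d", {"economics"}),
]

targets = binary_relevance_targets(examples)
frequent_sets = common_label_sets(examples)
```

With the first transformation each label gets its own yes/no problem; with the second, a frequent combination of labels can be treated as a single class.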

Re: Mahout and multi-label classification

2011-11-08 Thread David Rahman
I have lots of data from where I work. The data are documents (title + abstract) and each document can have one or more categories (e.g. social sciences + politics). We want to build a recommender and analyze the output for further testing. Thanks and regards, David 2011/11/8 Ted Dunning

Re: Mahout and multi-label classification

2011-11-08 Thread Ted Dunning
Recommender? Recommenders are not normally used for adding categories to documents. Is it possible for you to release blinded data in which all terms and categories are replaced by numbers and permuted? Or even just stemmed and sorted as with the RCV1 corpus? Having such a test corpus would

BayesFeatureDriver Execution on remote cluster

2011-11-08 Thread Jamal B
Good morning. Would it be possible for anyone to point me in the right direction with a minor problem I am having? I'm trying to run a job using the BayesFeatureDriver (version 0.5) from my webapp which would use a remote Hadoop cluster. The problem I am having is that the driver executes

Re: Mahout and multi-label classification

2011-11-08 Thread David Rahman
Sorry, I mean classifier, my bad... Unfortunately I can't release the raw data. And I don't really know how to blind the data correctly. I could give the categories numbers and hash the text (title and abstract) if that would be enough... Thanks and regards, David 2011/11/8 Ted Dunning

RowSimilarityJob input

2011-11-08 Thread Sören Brunk
Hi, I'm trying to use RowSimilarityJob (current trunk) to calculate pairwise similarities between feature vectors but I'm struggling a bit with the correct input format. I used SparseVectorsFromSequenceFiles to create a bunch of vectors from documents. But using the tfidf vectors directly

Re: RowSimilarityJob input

2011-11-08 Thread Sebastian Schelter
Hi Sören, RowSimilarityJob expects IntWritable,VectorWritable as input. It should be a reasonable choice for comparing the pairwise similarities between text documents. I suggest you throw away the 1% most frequent terms as described in http://terpconnect.umd.edu/~oard/pdf/acl08elsayed2.pdf. I
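
The suggestion to discard the most frequent terms can be illustrated with a short Python sketch that ranks terms by document frequency and removes the top fraction before vectorization. This is not Mahout code; the function name is hypothetical, and rounding the cutoff up (so that at least one term is dropped for a non-empty vocabulary) is an assumption of this sketch.

```python
import math
from collections import Counter


def prune_frequent_terms(docs, fraction=0.01):
    """Drop the given fraction of terms with the highest document frequency.

    docs: list of token lists. Returns (pruned_docs, dropped_terms).
    """
    document_frequency = Counter()
    for tokens in docs:
        document_frequency.update(set(tokens))  # count each term once per doc

    # Assumption: round up, so small vocabularies still lose their top term.
    n_drop = math.ceil(len(document_frequency) * fraction)
    ranked = sorted(document_frequency, key=lambda t: (-document_frequency[t], t))
    dropped = set(ranked[:n_drop])

    pruned = [[t for t in tokens if t not in dropped] for tokens in docs]
    return pruned, dropped


# Hypothetical toy corpus: "the" appears in every document.
docs = [["the", "cat"], ["the", "dog"], ["the", "bird"]]
pruned, dropped = prune_frequent_terms(docs, fraction=0.25)
```

Very frequent terms co-occur in a large share of document pairs, so removing them reduces the number of pairs a pairwise similarity computation has to consider.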

Re: Comparing results of Mahout SVD and Scilab

2011-11-08 Thread Ed Fine
I am a Mahout newbie, so I might be wrong, but I strongly suspect it has to do with one of your eigenvalues being 0. That implies a singular matrix. You will see that your first two eigenvalues are equal to the singular values. Parsing the structure in smaller eigenvalues get

Re: Minhash key groups

2011-11-08 Thread Grant Ingersoll
From MAHOUT-344 from the patch author: The idea behind keyGroups is to concatenate hashes from multiple hash functions to reduce the probability of collision between 2 users that agreed on 1 or more individual hash values. This essentially improves the average similarity of users in a cluster.
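
The effect of concatenating hash values can be seen in a small Python sketch. It computes a MinHash signature with several seeded hash functions and joins consecutive values into cluster keys. The hashing scheme, the function names, and the non-overlapping grouping are assumptions made for illustration; this is not the code from the MAHOUT-344 patch.

```python
import hashlib


def seeded_hash(seed, value):
    """Deterministic 32-bit hash of value under a given seed (illustrative)."""
    digest = hashlib.md5(f"{seed}:{value}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16)


def minhash_signature(items, num_hashes):
    """One minimum hash value per hash function over a non-empty item set."""
    if not items:
        raise ValueError("items must not be empty")
    return [min(seeded_hash(seed, item) for item in items)
            for seed in range(num_hashes)]


def cluster_keys(items, num_hashes=4, key_groups=2):
    """Join every key_groups consecutive min-hash values into one key.

    Two users land in the same cluster only if they agree on all values
    in a group, which is less likely by chance than agreeing on one.
    """
    if num_hashes % key_groups != 0:
        raise ValueError("num_hashes must be a multiple of key_groups")
    signature = minhash_signature(items, num_hashes)
    return ["-".join(str(v) for v in signature[i:i + key_groups])
            for i in range(0, num_hashes, key_groups)]


keys_a = cluster_keys({"item1", "item2", "item3"})
keys_b = cluster_keys(["item3", "item2", "item1"])  # same items, other order
```

Larger groups give fewer, stricter keys: users with identical item sets always share them, while users who overlap only slightly rarely do.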

Re: Mahout and multi-label classification

2011-11-08 Thread Jake Mannix
On Tue, Nov 8, 2011 at 4:35 AM, Ted Dunning ted.dunn...@gmail.com wrote: The practical techniques for such problems are pretty diverse. One method is to simply define multiple binary classifiers. Only? You mean the only method we have currently implemented, right? Labeled

Re: BayesFeatureDriver Execution on remote cluster

2011-11-08 Thread Jamal B
Thanks for the response; it helped. I fixed the problem by adding a core-site.xml file to the root of my jar, and added the fs.name.dir and mapred.job.tracker properties to it and it worked. On Nov 8, 2011 9:24 AM, Sean Owen sro...@gmail.com wrote: This is more a Hadoop question; I don't think

Re: BayesFeatureDriver Execution on remote cluster

2011-11-08 Thread Jamal B
Sorry, meant to say fs.default.name for the namenode property. On Nov 8, 2011 1:32 PM, Jamal B jm151...@gmail.com wrote: Thanks for the response; it helped. I fixed the problem by adding a core-site.xml file to the root of my jar, and added the fs.name.dir and mapred.job.tracker properties to
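
For anyone reproducing this fix, the following Python sketch generates a core-site.xml with the two properties named in the thread (fs.default.name, per the correction, and mapred.job.tracker). The host names and ports are placeholders, not values from the original message, and the build_core_site helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_core_site(properties):
    """Render a Hadoop-style <configuration> document from a name/value dict."""
    configuration = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(configuration, encoding="unicode")


# Placeholder endpoints: replace with the real namenode and jobtracker.
core_site_xml = build_core_site({
    "fs.default.name": "hdfs://namenode.example.com:8020",
    "mapred.job.tracker": "jobtracker.example.com:8021",
})
print(core_site_xml)
```

The resulting text would be saved as core-site.xml at the root of the jar, as described in the message.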

Re: RowSimilarityJob input

2011-11-08 Thread Sören Brunk
Ok after simply converting the vector keys from Text to IntWritable, it worked fine for me. Took a while though, but it ran only on my local machine with default vectorization settings and almost no preprocessing, so there's much room for improvement. Thanks for your help! Sören On 08/11/11

Mahout session at hadoop world 2011

2011-11-08 Thread manjunaths
Team, I saw a presentation on Mahout at Hadoop World NY. The room was full. There was a lot of interest from attendees. Most questions were on basics. However, there was some interest in seeing more Mahout algorithms enabled for MapReduce mode. Good to see more interest in Mahout!

Cluster labeling

2011-11-08 Thread Frank Scholten
Hi all, Sometimes my cluster labels are terms that hardly occur in the combined text of the documents of a cluster. I would expect to see a label of a term that occurs very frequently across documents of the cluster. For example, suppose there is a cluster of tweets about Mahout. You would see a

Dirichlet Clustering Output

2011-11-08 Thread praneet mhatre
Hello All, I am trying to use clustering algorithms to recover software architecture by using static features of code (e.g. method invocations, field accesses, etc.). To start with, I ran the TestClusterDumper (using the testDirichlet2() function) on the sample example given. But I am not able to

Re: Collaborative filtering help needed

2011-11-08 Thread Lance Norskog
Akshay- The Mahout 0.5 release has bugs. We advise everyone to use the Mahout 0.6 trunk. Lance On Tue, Nov 8, 2011 at 2:53 AM, Akshay Jain jaks...@gmail.com wrote: Yes. I want to predict ratings for new things which the user has not rated yet. On Tue, Nov 8, 2011 at 4:00 PM, Sean Owen

Re: Collaborative filtering help needed

2011-11-08 Thread Steven Bourke
Should you not be using Recommender.estimatePreference? On Wed, Nov 9, 2011 at 12:19 AM, Lance Norskog goks...@gmail.com wrote: Akshay- The Mahout 0.5 release has bugs. We advise everyone to use the Mahout 0.6 trunk. Lance On Tue, Nov 8, 2011 at 2:53 AM, Akshay Jain jaks...@gmail.com

Re: Minhash key groups

2011-11-08 Thread Lance Norskog
Could this project be done with symbol sequences instead of hash codes? The advantage of symbol sequences is that you can unpack them. On Tue, Nov 8, 2011 at 9:54 AM, Vishal Santoshi vishal.santo...@gmail.com wrote: Yep. By concatenating p hash-keys (generated from p functions) for each

Re: Collaborative filtering help needed

2011-11-08 Thread Akshay Jain
@Steven - Can you please tell me how to use Recommender.estimatePreference? @Lance - Thanks for letting me know. I will update it to the latest version. Thanks Akshay On Wed, Nov 9, 2011 at 5:51 AM, Steven Bourke sbou...@gmail.com wrote: Should you not be using Recommender.estimatePreference?

Re: Mahout session at hadoop world 2011

2011-11-08 Thread Ted Dunning
Bummer that I couldn't be there. I am in town, but had to meet with customers during all the talks. On Tue, Nov 8, 2011 at 4:24 PM, manjuna...@yahoo.com wrote: Team, I saw a presentation on Mahout at Hadoop World NY. The room was full. There was a lot of interest from attendees. Most questions were

Re: Mahout and multi-label classification

2011-11-08 Thread Ted Dunning
On Tue, Nov 8, 2011 at 1:07 PM, Jake Mannix jake.man...@gmail.com wrote: On Tue, Nov 8, 2011 at 4:35 AM, Ted Dunning ted.dunn...@gmail.com wrote: The practical techniques for such problems are pretty diverse. One method is to simply define multiple binary classifiers. Only? You mean

Re: Collaborative filtering help needed

2011-11-08 Thread Sean Owen
@Steven this is in the distributed part. There is no such method. But Akshay if your data is not large, yeah, you could save a whole lot of time and trouble by not using the Hadoop-based code. @Lance I am not sure there's any evidence he's running into a bug. I don't know that the general there
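
What the thread is asking for, a predicted rating for specific (user, item) pairs that the user has not rated, can be sketched in a few lines for small data. The Python below is a plain user-based estimate (a similarity-weighted average of other users' ratings); it illustrates the idea only and is not Mahout's implementation. The function names and the toy ratings are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two {item: rating} dicts."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def estimate_preference(ratings, user, item):
    """Similarity-weighted average of other users' ratings for item.

    Returns None when no similar user has rated the item.
    """
    own = ratings.get(user, {})
    weighted_sum = weight_total = 0.0
    for other, prefs in ratings.items():
        if other == user or item not in prefs:
            continue
        sim = cosine(own, prefs)
        if sim > 0:
            weighted_sum += sim * prefs[item]
            weight_total += sim
    return weighted_sum / weight_total if weight_total else None


# Hypothetical ratings; user1 has not rated Item_67.
ratings = {
    "user1": {"Item_1": 5, "Item_2": 3},
    "user2": {"Item_1": 5, "Item_2": 3, "Item_67": 4},
    "user3": {"Item_1": 1, "Item_67": 2},
}

# The caller chooses which (user, item) pairs to score.
requests = [("user1", "Item_67"), ("user1", "Item_999")]
estimates = {pair: estimate_preference(ratings, *pair) for pair in requests}
```

The list of requested pairs can differ per user, which is the behavior asked for earlier in the thread.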