Re: Re: Re: how to add -mapred.job.queue.name support for mahout modules ?

2011-12-14 Thread Konstantin Shmakov
Mahout is just a library that runs on Hadoop, so best practices for writing Hadoop drivers should be applicable: Implement the Tool interface. If you are writing a Java driver, then consider implementing the Tool (http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/util/Tool.html)
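
The benefit of implementing Tool is that ToolRunner parses Hadoop's generic options, such as `-D mapred.job.queue.name=...`, into the job's Configuration before the driver's own `run()` sees the remaining arguments. The sketch below is a simplified, standard-library stand-in for that parsing step. It is not Hadoop's GenericOptionsParser, it only handles the `-D key=value` and `-Dkey=value` forms, and the class name and the queue name "research" are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Simplified stand-in for the generic-option step that ToolRunner performs before it
 * calls Tool.run(). Only "-D key=value" and "-Dkey=value" are handled; Hadoop's real
 * GenericOptionsParser supports more (for example -fs, -jt, -files, -libjars).
 */
class GenericOptionsSketch {

    /** Properties taken from -D options (a Hadoop driver would find these in its Configuration). */
    final Map<String, String> properties = new LinkedHashMap<>();
    /** Everything else, i.e. what the driver's own run(String[]) would receive. */
    final List<String> remainingArgs = new ArrayList<>();

    GenericOptionsSketch(String[] args) {
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (!arg.startsWith("-D")) {
                remainingArgs.add(arg); // not a generic option: pass it through untouched
                continue;
            }
            // Accept both "-D key=value" (two tokens) and "-Dkey=value" (one token).
            String assignment = arg.length() > 2 ? arg.substring(2)
                    : (i + 1 < args.length ? args[++i] : "");
            int eq = assignment.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected -D key=value, got: " + assignment);
            }
            properties.put(assignment.substring(0, eq), assignment.substring(eq + 1));
        }
    }

    public static void main(String[] args) {
        GenericOptionsSketch parsed = new GenericOptionsSketch(new String[] {
                "-D", "mapred.job.queue.name=research", "--input", "in", "--output", "out"});
        System.out.println(parsed.properties);    // {mapred.job.queue.name=research}
        System.out.println(parsed.remainingArgs); // [--input, in, --output, out]
    }
}
```

In an actual Hadoop driver you would not parse these yourself: extend Configured, implement Tool, and launch with `ToolRunner.run(new Configuration(), new MyDriver(), args)` (MyDriver is a hypothetical class name); inside `run()`, `getConf()` then already holds the queue setting. With Hadoop, generic options are expected before the job's own arguments.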

Re: Re: Re: how to add -mapred.job.queue.name support for mahout modules ?

2011-12-14 Thread Sean Owen
Yes, Mahout already does all this. I think you need to set HADOOP_OPTS when using the runner script. On Wed, Dec 14, 2011 at 7:59 AM, Konstantin Shmakov kshma...@gmail.com wrote: Mahout is just a library that runs on Hadoop, so best practices for writing Hadoop drivers should be applicable:

JVM setting DisableExplicitGC

2011-12-14 Thread Aleksei Udatšnõi
I have noticed that in Mahout's wiki (https://cwiki.apache.org/MAHOUT/recommender-documentation.html), it is recommended to set the JVM option -XX:-DisableExplicitGC. However, this setting is not mentioned in Appendix A (JVM tuning) of the book Mahout in Action. Assuming I use out-of-the-box Mahout item-

Re: JVM setting DisableExplicitGC

2011-12-14 Thread Sean Owen
It's very minor, and is irrelevant unless you are embedding the recommender in another app, and that app may call System.gc(). You can ignore it or set it as you like. 2011/12/14 Aleksei Udatšnõi a.udac...@gmail.com I have noticed that in Mahout's wiki
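
Sean's point is that the flag only matters when something in the process calls `System.gc()`; with `-XX:+DisableExplicitGC`, HotSpot ignores such calls. One way to see the difference is the hypothetical probe below, which compares the collection counts reported by the standard `GarbageCollectorMXBean`s before and after a single explicit `System.gc()`. Run it with and without the flag. `System.gc()` is only a request, so a zero result without the flag is possible on some JVMs and collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/**
 * Reports whether an explicit System.gc() call produced any collections.
 * Try: java ExplicitGcProbe.java   vs   java -XX:+DisableExplicitGC ExplicitGcProbe.java
 */
class ExplicitGcProbe {

    /** Sum of collection counts over all collectors; beans reporting -1 (undefined) are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Number of collections observed across one explicit System.gc() call. */
    static long collectionsCausedByExplicitGc() {
        long before = totalCollections();
        System.gc(); // only a request; ignored entirely under -XX:+DisableExplicitGC
        return totalCollections() - before;
    }

    public static void main(String[] args) {
        long delta = collectionsCausedByExplicitGc();
        System.out.println(delta > 0
                ? "System.gc() triggered " + delta + " collection(s)"
                : "System.gc() had no visible effect (explicit GC disabled or ignored)");
    }
}
```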

Re: SequenceFile cast problems

2011-12-14 Thread Daniele Volpi
The version is 0.6-SNAPSHOT. From the terminal, both the trainclassifier and testclassifier commands work. Actually, my real purpose is to use the TrainNaiveBayesJob in order to obtain a StandardNaiveBayesClassifier that I can use with the ModelDissector class, similar to chapter 15 in Mahout in Action,

SlopeOneRecommender and MySQLJDBCDataModel in RecommenderBuilder Implementation

2011-12-14 Thread Chee Kin Lim
Hi there, I am writing a Grails plugin for the Mahout recommender (collaborative filtering) at the moment. As I am a beginner with Mahout, I have some doubts about the implementation of RecommenderBuilder for SlopeOneRecommender and MySQLJDBCDataModel. Please see the code at

Re: SlopeOneRecommender and MySQLJDBCDataModel in RecommenderBuilder Implementation

2011-12-14 Thread Sean Owen
Yes, because the DataModel you receive from the eval framework is certainly not a MySQL-backed one; it is an artificial one (GenericDataModel) for testing. If you need to inject a custom DataModel, you need DataModelBuilder. This may take some work, to push the test data into a table, create a
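
The flow Sean describes can be pictured with a toy mirror of the evaluation framework. Every type below is a hypothetical stand-in, not a Mahout class: the evaluator builds the training model itself and hands it to the recommender builder, so a builder has to work with the model it receives. Supplying a model builder (the role DataModelBuilder plays) is the only way to change what that model is. The data split is omitted; the whole map plays the role of the training set.

```java
import java.util.Map;

/** Toy mirror of the recommender evaluation flow; all types are hypothetical stand-ins. */
class EvalFlowSketch {

    /** Stand-in for a DataModel. */
    interface Model {
        String kind();
    }

    /** Stand-in for RecommenderBuilder: it must build on the model it is handed. */
    interface RecommenderFactory {
        String buildRecommender(Model trainingModel);
    }

    /** Stand-in for DataModelBuilder: turns the training data into a model of your choosing. */
    interface ModelFactory {
        Model buildDataModel(Map<Long, Map<Long, Float>> trainingData);
    }

    /** Stand-in for the evaluator: builds the training model, then the recommender. */
    static String evaluate(RecommenderFactory recommenderFactory,
                           ModelFactory modelFactory,
                           Map<Long, Map<Long, Float>> trainingData) {
        Model trainingModel = (modelFactory == null)
                ? () -> "in-memory"                        // artificial model, as in the eval framework
                : modelFactory.buildDataModel(trainingData); // your custom model
        return recommenderFactory.buildRecommender(trainingModel);
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Float>> data = Map.of(1L, Map.of(10L, 4.0f));
        RecommenderFactory builder = model -> "recommender over " + model.kind() + " model";
        System.out.println(evaluate(builder, null, data));                 // recommender over in-memory model
        System.out.println(evaluate(builder, d -> () -> "custom", data));  // recommender over custom model
    }
}
```

For a MySQL-backed model, the custom builder would have to write the training data into a table first, which is the extra work Sean mentions.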

Re: SequenceFile cast problems

2011-12-14 Thread Ted Dunning
I think that using the model dissector with NaiveBayes will not work easily. The assumption inside the model dissector is that there is a model matrix compatible with logistic regression to be had. The easy way to get everything to work is to simply use a single categorical variable that can

Re: JVM setting DisableExplicitGC

2011-12-14 Thread Sebastian Schelter
I ran into a case where this setting was very important. I ran my recommender app in Apache Tomcat and used precomputed item similarities that were held in memory (these occupied something like 2 or 3 GB if I recall correctly). Tomcat (I used 5.5) ensures that each object is checked for GC at

Re: Fwd: A MapReduce Algorithm for Matrix Multiplication

2011-12-14 Thread chwaqas254
Hi all, the implementation at the following link only works in standalone mode, not in pseudo-distributed or distributed mode: http://homepage.mac.com/j.norstad/matrix-multiply To be precise, for sparse matrices it works fine, but with dense matrices strategy 1
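
For reference, the data flow of a common two-pass MapReduce scheme for C = A * B can be simulated in memory. This is a generic join-on-k approach with inputs given as sparse (row, col, value) entries; it is not the code or the "strategy 1" from the linked page, and the class name is hypothetical. Note that for dense inputs every k group yields rows × cols partial products, so the intermediate data grows much faster than for sparse inputs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory simulation of a two-pass MapReduce matrix multiplication over sparse entries. */
class MapReduceMatMulSketch {

    /** One non-zero matrix entry. */
    static final class Entry {
        final int row;
        final int col;
        final double value;

        Entry(int row, int col, double value) {
            this.row = row;
            this.col = col;
            this.value = value;
        }
    }

    /** Returns C = A * B as a map from "i,j" to value; absent keys are zero. */
    static Map<String, Double> multiply(List<Entry> a, List<Entry> b) {
        // Pass 1, map: key each A entry by its column k and each B entry by its row k.
        Map<Integer, List<Entry>> aByK = new HashMap<>();
        Map<Integer, List<Entry>> bByK = new HashMap<>();
        for (Entry e : a) aByK.computeIfAbsent(e.col, k -> new ArrayList<>()).add(e);
        for (Entry e : b) bByK.computeIfAbsent(e.row, k -> new ArrayList<>()).add(e);

        // Pass 1, reduce (one group per k): form every partial product A[i][k] * B[k][j].
        // Pass 2: sum the partial products sharing the key (i, j); merge() plays that role here.
        Map<String, Double> c = new TreeMap<>();
        for (Map.Entry<Integer, List<Entry>> group : aByK.entrySet()) {
            List<Entry> matchingB = bByK.getOrDefault(group.getKey(), Collections.emptyList());
            for (Entry x : group.getValue()) {
                for (Entry y : matchingB) {
                    c.merge(x.row + "," + y.col, x.value * y.value, Double::sum);
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        // A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]]
        List<Entry> a = List.of(new Entry(0, 0, 1), new Entry(0, 1, 2), new Entry(1, 0, 3), new Entry(1, 1, 4));
        List<Entry> b = List.of(new Entry(0, 0, 5), new Entry(0, 1, 6), new Entry(1, 0, 7), new Entry(1, 1, 8));
        System.out.println(multiply(a, b)); // {0,0=19.0, 0,1=22.0, 1,0=43.0, 1,1=50.0}
    }
}
```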

Query on clusterdumper output and clusteredPoints

2011-12-14 Thread ipshita chatterji
Hi, I am a newbie in Mahout and have only an elementary knowledge of clustering. I managed to cluster my data using meanshift and then ran clusterdumper, and I get the following output: MSV-21{n=1 c=[1:0...] So I assume that the cluster above has converged and that n=1 indicates that there is only one
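
When scanning a long clusterdump file, it can help to pull out the cluster identifier and the `n=` point count from each line, for example to find singleton clusters. The hypothetical parser below assumes lines begin like the one quoted above (`PREFIX-ID{n=COUNT ...`); lines from other versions or cluster types may be formatted differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts identifier prefix, cluster id and point count from the start of a clusterdump line. */
class ClusterDumpLine {

    // Assumed header shape, e.g. "MSV-21{n=1 ..."; the rest of the line is ignored.
    private static final Pattern HEADER = Pattern.compile("^([A-Z]+)-(\\d+)\\{n=(\\d+)");

    final String prefix;  // e.g. "MSV"
    final int id;         // e.g. 21
    final long numPoints; // the n= field

    ClusterDumpLine(String line) {
        Matcher m = HEADER.matcher(line.trim());
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized clusterdump line: " + line);
        }
        prefix = m.group(1);
        id = Integer.parseInt(m.group(2));
        numPoints = Long.parseLong(m.group(3));
    }

    public static void main(String[] args) {
        ClusterDumpLine c = new ClusterDumpLine("MSV-21{n=1 c=[...]}");
        System.out.println(c.prefix + " " + c.id + " has " + c.numPoints + " point(s)");
        // MSV 21 has 1 point(s)
    }
}
```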

Re: Understanding TrainLogistic's output

2011-12-14 Thread magicalo
Ted Dunning ted.dunning at gmail.com writes: This is pretty confusing. What has happened is that you have encoded a single categorical variable that has four states as four numerical variables. Unfortunately, Mahout has gotten the message that you are using four categorical variables
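
The difference Ted describes can be made concrete. One categorical variable with four states switches on exactly one of four slots, while four 0/1 columns each read as a separate categorical variable give every column its own "0" and "1" feature, i.e. eight features. The sketch below uses fixed slots and hypothetical level names only to keep the picture readable; Mahout's own feature encoders hash features into a vector instead.

```java
import java.util.Arrays;
import java.util.List;

/** Contrast: one four-state categorical vs. four 0/1 columns treated as four categoricals. */
class CategoricalEncodingSketch {

    /** One categorical variable: the value switches on exactly one of the level slots. */
    static double[] oneHot(List<String> levels, String value) {
        int index = levels.indexOf(value);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown level: " + value);
        }
        double[] v = new double[levels.size()];
        v[index] = 1.0;
        return v;
    }

    /**
     * Each 0/1 column read as its own categorical variable: every column gets one slot
     * for "0" and one for "1", so four columns turn into eight features.
     */
    static double[] columnsAsCategoricals(int[] indicatorColumns) {
        double[] v = new double[indicatorColumns.length * 2];
        for (int col = 0; col < indicatorColumns.length; col++) {
            int bit = indicatorColumns[col];
            if (bit != 0 && bit != 1) {
                throw new IllegalArgumentException("Indicator columns must be 0 or 1");
            }
            v[col * 2 + bit] = 1.0; // slot for "column col = bit"
        }
        return v;
    }

    public static void main(String[] args) {
        List<String> levels = List.of("a", "b", "c", "d");
        System.out.println(Arrays.toString(oneHot(levels, "c")));
        // [0.0, 0.0, 1.0, 0.0]
        System.out.println(Arrays.toString(columnsAsCategoricals(new int[] {0, 0, 1, 0})));
        // [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
    }
}
```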

Re: Query on clusterdumper output and clusteredPoints

2011-12-14 Thread Gary Snider
What was on your command line? e.g. seqFileDir, pointsDir, etc On Wed, Dec 14, 2011 at 10:54 AM, ipshita chatterji sigmare...@gmail.com wrote: Hi, I am a newbie in Mahout and also have elementary knowledge of clustering. I managed to cluster my data using meanshift and then ran

Re: Query on clusterdumper output and clusteredPoints

2011-12-14 Thread ipshita chatterji
For clusterdumper I had the following command line: $MAHOUT_HOME/bin/mahout clusterdump --seqFileDir output/clusters-6 --output clusteranalyze.txt I have written a separate program to read the clusteredOutput directory, because clusterdumper with --pointsDir output/clusteredPoints was giving an OOM exception.

Re: Query on clusterdumper output and clusteredPoints

2011-12-14 Thread Suneel Marthi
Ensure that you increase the JVM memory settings when running the clusterdump program to avoid OOM. From: ipshita chatterji sigmare...@gmail.com To: user@mahout.apache.org Sent: Wednesday, December 14, 2011 12:37 PM Subject: Re: Query on clusterdumper output
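
A quick way to confirm that a larger heap actually reached the JVM is to print the maximum heap from inside the process. With the bin/mahout script the heap is normally set through the MAHOUT_HEAPSIZE environment variable (in MB); it is worth checking the script in your version to confirm. The class below is a minimal, hypothetical check.

```java
/** Prints the maximum heap the running JVM may use, to confirm a heap setting took effect. */
class HeapCheck {

    /** Maximum heap in megabytes as reported by the runtime (roughly the -Xmx value). */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```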

Re: SequenceFile cast problems

2011-12-14 Thread Daniele Volpi
OK, I was thinking I could easily use the ModelDissector class, because it requires an AbstractVectorClassifier and the StandardNaiveBayesClassifier in the naivebayes package extends that class. On 14 December 2011 14:42, Ted Dunning ted.dunn...@gmail.com wrote: I think that using the model

Re: Austin SIGKDD - Next Meeting Wednesday, December 14, 2011, 7:00 - 8:00 pm

2011-12-14 Thread Isabel Drost
On 14.12.2011 David Boney wrote: Sure, we are studying machine learning using Mahout. We have started a weekly hackers dojo to learn how to implement Hadoop-based machine learning programs using Mahout. Once the group gets some experience using Mahout, we are going to focus on projects to add

Re: Query on clusterdumper output and clusteredPoints

2011-12-14 Thread Gary Snider
Ok. See if you can get the --pointsDir working and post what you get. Also for seqFileDir do you have a directory with the word 'final' in it? On Dec 14, 2011, at 12:37 PM, ipshita chatterji sigmare...@gmail.com wrote: For clusterdumper I had following commandline: $MAHOUT_HOME/bin/mahout
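
Gary's question refers to the convention of marking the last iteration's cluster directory with a `-final` suffix (ipshita later renames a directory to match it). The sketch below looks for such a directory in a local copy of the output, assuming names of the form `clusters-N-final`; on HDFS, the Hadoop FileSystem API would be needed instead of `java.nio.file`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.stream.Stream;

/** Finds the final-iteration cluster directory under a local clustering output directory. */
class FinalClustersFinder {

    /** Assumed naming convention for the last iteration's output, e.g. "clusters-6-final". */
    private static final Pattern FINAL_DIR = Pattern.compile("clusters-\\d+-final");

    /** Returns a child directory whose name matches; if several match, which one is unspecified. */
    static Optional<Path> findFinalClustersDir(Path outputDir) throws IOException {
        try (Stream<Path> children = Files.list(outputDir)) {
            return children
                    .filter(Files::isDirectory)
                    .filter(p -> FINAL_DIR.matcher(p.getFileName().toString()).matches())
                    .findFirst();
        }
    }

    public static void main(String[] args) throws IOException {
        Path output = Paths.get(args.length > 0 ? args[0] : "output");
        if (!Files.isDirectory(output)) {
            System.out.println("No such directory: " + output);
            return;
        }
        Optional<Path> finalDir = findFinalClustersDir(output);
        System.out.println(finalDir.isPresent()
                ? "Final clusters: " + finalDir.get()
                : "No clusters-N-final directory under " + output);
    }
}
```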

Re: SequenceFile cast problems

2011-12-14 Thread Grant Ingersoll
While Ted answered the Dissector question, your original issue, I believe, is that Mahout currently has two different NB implementations. trainclassifier/testclassifier use the old, word based package which requires Text as input. The new package, which TrainNaiveBayesJob uses, requires

Re: Query on clusterdumper output and clusteredPoints

2011-12-14 Thread ipshita chatterji
Actually, clustering was done using version 0.5 of Mahout, but I am using the clusterdumper code from the current version of Mahout in trunk to analyze the clusters. To make it run, I renamed the final cluster by appending -final. I got the OOM error even after increasing the mahout heapsize

Re: Query on clusterdumper output and clusteredPoints

2011-12-14 Thread Paritosh Ranjan
You don't need to write your own code for analyzing clustered points. You can use ClusterOutputPostProcessorDriver which will post process your clusters and group clusters belonging to different clusters in their respective directories. You won't get any OOM here. Example of using it is here

Re: Query on clusterdumper output and clusteredPoints

2011-12-14 Thread Paritosh Ranjan
There was a typo in my previous mail. Please read: ...which will post process your clustering output and group vectors belonging to different clusters in their respective directories... On 15-12-2011 10:34, Paritosh Ranjan wrote: You don't need to write your own code for analyzing clustered points.
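
The grouping Paritosh describes does not require holding all points in memory: each point can be written to its cluster's directory as it is read. The class below is not ClusterOutputPostProcessorDriver. It is a standard-library illustration of the same grouping idea for text records on a local disk, and the record format and the `points.txt` file name are assumptions.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Streams (clusterId, vector text) records into one directory per cluster. */
class ClusterGroupingSketch {

    /** Writes each record's text to outputDir/<clusterId>/points.txt, one line per record. */
    static void groupByCluster(Iterator<Map.Entry<Integer, String>> records, Path outputDir)
            throws IOException {
        Map<Integer, BufferedWriter> writers = new HashMap<>(); // grows with clusters, not points
        try {
            while (records.hasNext()) {
                Map.Entry<Integer, String> record = records.next();
                BufferedWriter writer = writers.get(record.getKey());
                if (writer == null) {
                    Path clusterDir = Files.createDirectories(outputDir.resolve(String.valueOf(record.getKey())));
                    writer = Files.newBufferedWriter(clusterDir.resolve("points.txt"), StandardCharsets.UTF_8);
                    writers.put(record.getKey(), writer);
                }
                writer.write(record.getValue());
                writer.newLine();
            }
        } finally {
            // Close every writer even if one close fails; report the first failure.
            IOException closeFailure = null;
            for (BufferedWriter writer : writers.values()) {
                try {
                    writer.close();
                } catch (IOException e) {
                    if (closeFailure == null) closeFailure = e;
                }
            }
            if (closeFailure != null) throw closeFailure;
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("clustered-points");
        List<Map.Entry<Integer, String>> records = List.of(
                Map.entry(1, "[1.0, 2.0]"), Map.entry(2, "[9.0, 9.5]"), Map.entry(1, "[1.5, 2.5]"));
        groupByCluster(records.iterator(), out);
        System.out.println(Files.readAllLines(out.resolve("1").resolve("points.txt")));
        // [[1.0, 2.0], [1.5, 2.5]]
    }
}
```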

Re: Re: Re: how to add -mapred.job.queue.name support for mahout modules ?

2011-12-14 Thread Lance Norskog
Have you tried using the bin/mahout program? On Wed, Dec 14, 2011 at 1:43 AM, Sean Owen sro...@gmail.com wrote: Yes Mahout already does all this. I think you need to set HADOOP_OPTS when using the runner script. On Wed, Dec 14, 2011 at 7:59 AM, Konstantin Shmakov kshma...@gmail.com wrote:

Sentiment analysis data sets

2011-12-14 Thread ramprakash.ramamoorthy
Hello all, I am developing a sentiment analyser using Apache Mahout, for which I need training data sets. I would be obliged if you could let me know about data sets on word polarity, classifying words into positive, negative and neutral categories. P.S.: I am ready to purchase

Re: Sentiment analysis data sets

2011-12-14 Thread JAGANADH G
Hi Ramaprakash, I think you are looking for a sentiment data set. You can find some free data at http://www.cs.jhu.edu/~mdredze/datasets/sentiment/ and http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html. If you want to develop specific data sets, there are ways to do it automatically or semi