Mahout is just a library that runs on Hadoop, so best practices for writing
Hadoop drivers should be applicable:
Implement the Tool interface
If you are writing a Java driver, then consider implementing the Tool
interface: http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/util/Tool.html
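The Tool pattern above can be sketched roughly as follows; the class name and job setup are hypothetical placeholders, but the Configured/Tool/ToolRunner wiring is the standard Hadoop idiom:

```java
// Minimal sketch of a Hadoop driver implementing Tool, so that generic
// options (-D, -conf, -libjars) are parsed for you by ToolRunner.
// "MyDriver" and the job body are placeholders, not Mahout code.
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    // getConf() already reflects any -D options from the command line.
    // ... configure and submit your Job here ...
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new MyDriver(), args));
  }
}
```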
Yes, Mahout already does all this. I think you need to set HADOOP_OPTS when
using the runner script.
On Wed, Dec 14, 2011 at 7:59 AM, Konstantin Shmakov kshma...@gmail.comwrote:
Mahout is just a library that runs on Hadoop, so best practices for writing
Hadoop drivers should be applicable:
I have noticed that in Mahout's wiki
(https://cwiki.apache.org/MAHOUT/recommender-documentation.html), it
is recommended to set JVM setting -XX:-DisableExplicitGC
However in the book Mahout in Action Appendix A JVM tuning this
setting is not mentioned.
Assuming I use the out-of-the-box Mahout item-
It's very minor, and is irrelevant unless you are embedding the recommender
in another app, and that app may call System.gc(). You can ignore it or set
it as you like.
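To make the point concrete: the DisableExplicitGC flag only changes the behavior of explicit System.gc() calls, which is why it is irrelevant unless some embedding app makes them. A trivial demo:

```java
// Demonstrates the call that the DisableExplicitGC flag controls.
// Under the default (-XX:-DisableExplicitGC), System.gc() requests a full
// collection; under -XX:+DisableExplicitGC the call becomes a no-op.
// Either way this program's own logic is unaffected.
public class ExplicitGcDemo {
  public static void main(String[] args) {
    System.gc(); // honored by default; ignored with -XX:+DisableExplicitGC
    System.out.println("explicit gc requested");
  }
}
```

Run it with `java -XX:+DisableExplicitGC ExplicitGcDemo` to see that nothing breaks when the call is suppressed.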
2011/12/14 Aleksei Udatšnõi a.udac...@gmail.com
I have noticed that in Mahout's wiki
The version is 0.6-SNAPSHOT
From terminal both commands trainclassifier and testclassifier work.
Actually my real purpose is to use the TrainNaiveBayesJob in order to
obtain a StandardNaiveBayesClassifier that I can use with the
ModelDissector class, similar to chapter 15 in Mahout in Action,
Hi there,
I am writing a Grails plugin for the Mahout recommender (collaborative
filtering) at the moment. As I am a beginner with Mahout, I have some
doubts about the implementation of RecommenderBuilder for SlopeOneRecommender
and MySQLJDBCDataModel.
Please see the code at
Yes, because the DataModel you receive from the eval framework is certainly
not a MySQL-backed one; it is an artificial one (GenericDataModel) for
testing. If you need to inject a custom DataModel, you need
DataModelBuilder. This may take some work, to push the test data into a
table, create a
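A minimal sketch of the DataModelBuilder approach, assuming the Taste evaluation API; the class name is a placeholder, and the MySQL step is only indicated in a comment since the table setup depends on your schema:

```java
// Hedged sketch: injecting a custom DataModel into the evaluator.
// By default the eval framework hands the builder a GenericDataModel's raw
// data; a DataModelBuilder lets you wrap that training data however you
// need (e.g. push it into a MySQL table first and return a
// MySQLJDBCDataModel over it -- that part is left as a comment here).
import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;

public class CustomModelBuilder {
  public static DataModelBuilder modelBuilder() {
    return new DataModelBuilder() {
      @Override
      public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
        // For a real MySQL-backed test: insert trainingData into a table
        // here and return a MySQLJDBCDataModel instead (assumption).
        return new GenericDataModel(trainingData);
      }
    };
  }
}
```

The builder is then passed to the evaluator alongside your RecommenderBuilder so both the recommender and the model come from your code.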
I think that using the model dissector with NaiveBayes will not work
easily. The assumption inside the model dissector is that there is a model
matrix compatible with logistic regression to be had.
The easy way to get everything to work is to simply use a single
categorical variable that can
I ran into a case where this setting was very important.
I ran my recommender app in Apache Tomcat and used precomputed item
similarities that were held in memory (these occupied something like 2
or 3 GB if I recall correctly).
Tomcat (I used 5.5) ensures that each object is checked for GC at
Hi all,
The implementation given in the following link only works in standalone
mode and not in pseudo-distributed mode or distributed
mode:
http://homepage.mac.com/j.norstad/matrix-multiply
To be precise, for sparse matrices it works fine, but with dense matrices
strategy 1
Hi,
I am a newbie in Mahout and also have elementary knowledge of
clustering. I managed to cluster my data using meanshift and then ran
clusterdumper; I get the following output:
MSV-21{n=1 c=[1:0...]
So I assume that the cluster above has converged and n=1 indicates
that there is only one
Ted Dunning ted.dunning at gmail.com writes:
This is pretty confusing.
What has happened is that you have encoded a single categorical variable
that has four states as four numerical variables. Unfortunately, Mahout
has gotten the message that you are using four categorical variables
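One way to keep the four states as a single categorical variable is Mahout's feature encoders (as used in Mahout in Action); the field and value names below are made up for illustration:

```java
// Hedged sketch: encode one 4-state categorical variable as a single
// hashed feature, instead of spreading it across four numeric columns.
// "state" and the vector size are illustrative placeholders.
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder;
import org.apache.mahout.vectorizer.encoders.StaticWordValueEncoder;

public class CategoricalEncoding {
  public static Vector encode(String state) {
    Vector v = new RandomAccessSparseVector(100);
    // One encoder per categorical variable; each observed value of
    // "state" hashes to its own feature location.
    FeatureVectorEncoder enc = new StaticWordValueEncoder("state");
    enc.addToVector(state, v);
    return v;
  }
}
```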
What was on your command line? E.g. seqFileDir, pointsDir, etc.
On Wed, Dec 14, 2011 at 10:54 AM, ipshita chatterji sigmare...@gmail.comwrote:
Hi,
I am a newbie in Mahout and also have elementary knowledge of
clustering. I managed to cluster my data using meanshift and then ran
For clusterdumper I had the following command line:
$MAHOUT_HOME/bin/mahout clusterdump --seqFileDir output/clusters-6
--output clusteranalyze.txt
I have written a separate program to read the clusteredOutput directory, as
clusterdumper with --pointsDir output/clusteredPoints was giving an
OOM exception.
Ensure that you increase the JVM memory settings when running the clusterdump
program to avoid OOM.
From: ipshita chatterji sigmare...@gmail.com
To: user@mahout.apache.org
Sent: Wednesday, December 14, 2011 12:37 PM
Subject: Re: Query on clusterdumper output
OK, I was thinking I could easily use the ModelDissector class because it
requires an AbstractVectorClassifier and the
StandardNaiveBayesClassifier in the naivebayes package extends that
class.
On 14 December 2011 14:42, Ted Dunning ted.dunn...@gmail.com wrote:
I think that using the model
On 14.12.2011 David Boney wrote:
Sure, we are studying machine learning using Mahout. We have started a
weekly hackers' dojo to learn how to implement Hadoop-based machine
learning programs using Mahout. Once the group gets some experience using
Mahout, we are going to focus on projects to add
Ok. See if you can get the --pointsDir working and post what you get. Also
for seqFileDir do you have a directory with the word 'final' in it?
On Dec 14, 2011, at 12:37 PM, ipshita chatterji sigmare...@gmail.com wrote:
For clusterdumper I had the following command line:
$MAHOUT_HOME/bin/mahout
While Ted answered the Dissector question, your original issue, I believe, is
that Mahout currently has two different NB implementations.
trainclassifier/testclassifier use the old, word-based package, which requires
Text as input. The new package, which TrainNaiveBayesJob uses, requires
Actually clustering was done using version 0.5 of Mahout, but I am
using the clusterdumper code from the current version of Mahout in
trunk to analyze the clusters. To make it run I renamed the final
cluster by appending -final.
I got the OOM error even after increasing the Mahout heap size
You don't need to write your own code for analyzing clustered points.
You can use ClusterOutputPostProcessorDriver which will post process
your clusters and group clusters belonging to different clusters in
their respective directories. You won't get any OOM here.
Example of using it is here
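A hedged sketch of invoking the driver programmatically; the paths are placeholders and the static run(...) signature is an assumption based on the post-processor classes in trunk at the time:

```java
// Hypothetical sketch: post-processing clustering output so that vectors
// belonging to each cluster land in their own directory, avoiding the
// clusterdump OOM. Paths and the run(...) signature are assumptions.
import org.apache.hadoop.fs.Path;
import org.apache.mahout.clustering.topdown.postprocessor.ClusterOutputPostProcessorDriver;

public class PostProcessClusters {
  public static void main(String[] args) throws Exception {
    Path clusterOutput = new Path("output");         // clustering output dir
    Path postProcessed = new Path("output-grouped"); // one dir per cluster
    boolean runSequential = true;                    // false = run as MapReduce
    ClusterOutputPostProcessorDriver.run(clusterOutput, postProcessed, runSequential);
  }
}
```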
Some typo in previous mail. Please read :
...which will post process your clustering output and group vectors
belonging to different clusters in their respective directories...
On 15-12-2011 10:34, Paritosh Ranjan wrote:
You don't need to write your own code for analyzing clustered points.
Have you tried using the bin/mahout program?
On Wed, Dec 14, 2011 at 1:43 AM, Sean Owen sro...@gmail.com wrote:
Yes, Mahout already does all this. I think you need to set HADOOP_OPTS when
using the runner script.
On Wed, Dec 14, 2011 at 7:59 AM, Konstantin Shmakov kshma...@gmail.comwrote:
Hello all,
I am developing a sentiment analyser using Apache Mahout, for
which I need training data sets. I would be obliged if you could let me know
about data sets on word polarity, classifying words into positive, negative
and neutral categories.
P.S : I am ready to purchase
Hi Ramaprakash,
I think you are looking for a sentiment data set.
You can find some free data at
http://www.cs.jhu.edu/~mdredze/datasets/sentiment/
http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
If you want to develop specific data sets, there are ways to do it
automatically or semi