This is what I am running from the command line:
mahout trainnb -i /trainingMahout/ -o /outputNaiveBayes/ -el -li /labelIndex -a .5 -ow
On Mon, Feb 23, 2015 at 1:17 PM, chirag lakhani chirag.lakh...@gmail.com
wrote:
I am trying to train a Naive Bayes model in Mahout and I keep getting a
java heap
Thanks! Is that standard practice or do people typically serialize their
encoders and then load the binaries later?
On Wed, Jan 7, 2015 at 5:25 PM, Ted Dunning ted.dunn...@gmail.com wrote:
On Wed, Jan 7, 2015 at 2:20 PM, chirag lakhani chirag.lakh...@gmail.com
wrote:
In the Mahout
I find the Java documentation for the classifyFull method in Naive Bayes
I have instantiated a Naive Bayes classifier
StandardNaiveBayesClassifier classifier = new StandardNaiveBayesClassifier(model);
and then I try to evaluate a particular vector
Vector resultVector =
I meant to say that I found the java documentation to be confusing
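For context, as far as I understand it, classifyFull returns one score per label rather than a single predicted class, so the caller still has to pick the best-scoring index and map it back through the label index. Below is a minimal sketch of that last step in plain Java. It assumes a double[] standing in for Mahout's Vector; the class name, the label array, and the score values are made up for illustration and do not come from Mahout.

```java
// Hypothetical helper for illustration; not part of Mahout.
class BestLabel {

    // Returns the index of the largest score (the first one wins on ties).
    static int argMax(double[] scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("no scores to compare");
        }
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed label order and made-up scores, one per label.
        String[] labels = {"tech", "apparel", "finance"};
        double[] scores = {-42.1, -17.3, -30.8};
        System.out.println(labels[argMax(scores)]);
    }
}
```

With a real model, the scores would come from the classifier and the label names from the label index written during training.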
On Thu, Jan 8, 2015 at 1:49 PM, chirag lakhani chirag.lakh...@gmail.com
wrote:
I find the Java documentation for the classifyFull method in Naive Bayes
I have instantiated a Naive Bayes classifier
I am trying to vectorize text data for a Naive Bayes classifier that will be
trained in Hadoop, after which the model will be deployed in a
Java app. My basic approach is to tokenize a string of text data using
Lucene and then encode each token using a StaticWordValueEncoder. Here are
the
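The tokenize-then-encode approach described above can be illustrated without Mahout or Lucene as a plain hashed bag-of-words. This is only a sketch under stated assumptions: whitespace splitting stands in for a Lucene analyzer, String.hashCode stands in for whatever hashing the Mahout encoder uses, and the class and method names are hypothetical. The point is that the same function must run at training time in Hadoop and at scoring time in the Java app so that tokens land in the same positions.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Mahout or Lucene.
class HashedTextEncoder {

    // Encodes text as a fixed-size vector of token counts (the hashing trick).
    // Assumption: lower-casing plus whitespace splitting replaces a real analyzer.
    static double[] encode(String text, int cardinality) {
        double[] vector = new double[cardinality];
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace or empty input yields an empty token
            }
            // floorMod keeps the index non-negative even for negative hash codes.
            int index = Math.floorMod(token.hashCode(), cardinality);
            vector[index] += 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode("limited box limited offer", 8)));
    }
}
```

Because the encoding depends only on the token text and the vector size, nothing needs to be serialized besides those two settings; distinct tokens can collide in the same slot, which is the usual trade-off of hashing.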
are not being included. How
would I include them in my MapReduce job?
Chirag
On Mon, Jan 5, 2015 at 5:28 PM, chirag lakhani chirag.lakh...@gmail.com
wrote:
I am trying to emulate something similar to what was done in this chimpler
example
https://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/
If you have data like this
tech	308215054011194110	Limited 3-Box $20 BOGO,
I have looked around for information about this but I still feel unsure
about whether this is possible. I have recently built a Naive Bayes
classifier in scikit-learn that takes some short text data (along with a few
other categorical features). I would like
to
solution to
this given all that is included in the Vector class or do I need to create
my own method?
Chirag
--
*Chirag Lakhani*
Data Scientist
Zaloni, Inc. | www.zaloni.com
633 Davis Dr., Suite 200
Durham, NC 27713
e: clakh...@zaloni.com
p: 919.602.4965 x7020
So how does the column mean get calculated if the --pcaOffset option is not
specified? I would think you are just doing SVD at that point.
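To make the distinction in the question concrete: PCA amounts to an SVD of the column-centered matrix, so if no mean is subtracted the decomposition is of A exactly as given. The sketch below shows only the centering step on a small dense matrix in plain Java. It is not how Mahout's SSVD handles the offset internally; the class and method names are hypothetical, and it assumes a non-empty rectangular matrix.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not Mahout's SSVD code.
class ColumnCentering {

    // Mean of each column of a dense matrix (assumes at least one row,
    // and that all rows have the same length).
    static double[] columnMeans(double[][] a) {
        double[] means = new double[a[0].length];
        for (double[] row : a) {
            for (int j = 0; j < row.length; j++) {
                means[j] += row[j];
            }
        }
        for (int j = 0; j < means.length; j++) {
            means[j] /= a.length;
        }
        return means;
    }

    // Returns a new matrix with each column's mean subtracted from its entries.
    static double[][] center(double[][] a) {
        double[] means = columnMeans(a);
        double[][] centered = new double[a.length][means.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < means.length; j++) {
                centered[i][j] = a[i][j] - means[j];
            }
        }
        return centered;
    }

    public static void main(String[] args) {
        double[][] a = {{1.0, 2.0}, {3.0, 6.0}};
        System.out.println(Arrays.deepToString(center(a)));
    }
}
```

Running an SVD on the output of center gives principal components; running it on the raw input gives a plain SVD.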
On Tue, Jul 2, 2013 at 5:52 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
On Tue, Jul 2, 2013 at 1:52 PM, Chirag Lakhani clakh...@zaloni.com
wrote
will require one additional
MR pass over A.
Bottom line, typically one wants something along the lines
ssvd --pca=true -u=false -v=false -us=true ...
On Wed, Jul 3, 2013 at 8:58 AM, Dmitriy Lyubimov dlie...@gmail.com
wrote:
On Jul 3, 2013 6:56 AM, Chirag Lakhani clakh...@zaloni.com wrote
(SSVDSolver.java:328)
at pca_factory.main(pca_factory.java:97)
On Wed, Jul 3, 2013 at 3:25 PM, Chirag Lakhani clakh...@zaloni.com wrote:
Okay thanks for that. After working on that issue I am still having
trouble running the SSVD solver. I know I have asked this before but I
still
Okay, thanks. It looks like I have that part running so I will go back to
the SSVDCli to finish the rest. Thanks for your help.
Chirag
On Wed, Jul 3, 2013 at 4:19 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
On Wed, Jul 3, 2013 at 12:25 PM, Chirag Lakhani clakh...@zaloni.com
wrote:
Okay
?
ready to cluster.
Then use the streaming k-means stuff.
On Mon, Jun 24, 2013 at 4:43 PM, Chirag Lakhani clakh...@zaloni.com
wrote:
What database interfaces are there for Mahout? The website mentions
MongoDB and Cassandra. I get the feeling these are for recommender systems
only. Are there any databases that Mahout can interface with directly in order
to perform clustering?
I am thinking of an example where I have a large table
dynamics goes away since you
aren't predicting ratings in any case.
On Tue, Apr 30, 2013 at 7:18 AM, Chirag Lakhani clakh...@zaloni.com
wrote:
I was wondering if the collaborative filtering library in Mahout has any
algorithms that incorporate concept drift, i.e., time dynamics. From my own
research I have come across the BellKor algorithm called TimeSVD++ and
there is a recent paper using hidden Markov models with collaborative
Do you know of any other large scale machine learning platforms that do
incorporate it?
On Tue, Apr 30, 2013 at 10:21 AM, Sean Owen sro...@gmail.com wrote:
No, time is in the data model but nothing uses it that I know of.
On Tue, Apr 30, 2013 at 3:18 PM, Chirag Lakhani clakh...@zaloni.com
it into a distributed row matrix type?
On Fri, Apr 12, 2013 at 1:19 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
On Fri, Apr 12, 2013 at 8:42 AM, Dmitriy Lyubimov dlie...@gmail.com
wrote:
No, this is not right.
I will explain later when I have a moment.
On Apr 12, 2013 8:08 AM, Chirag
I am having trouble understanding whether the following code is sufficient
for running PCA.
I have a sequence file of dense vectors that I am reading, and then I am
trying to run the following code:
SSVDSolver pcaFactory = new SSVDSolver(conf, new Path(vectorsFolder), new