Re: java heap space error - Naive Bayes

2015-02-23 Thread chirag lakhani
This is what I am running from the command line: mahout trainnb -i /trainingMahout/ -o /outputNaiveBayes/ -el -li /labelIndex -a .5 -ow On Mon, Feb 23, 2015 at 1:17 PM, chirag lakhani chirag.lakh...@gmail.com wrote: I am trying to train a Naive Bayes model in Mahout and I keep getting a java heap

Re: consistency of StaticWordValueEncoder

2015-01-08 Thread chirag lakhani
Thanks! Is that standard practice or do people typically serialize their encoders and then load the binaries later? On Wed, Jan 7, 2015 at 5:25 PM, Ted Dunning ted.dunn...@gmail.com wrote: On Wed, Jan 7, 2015 at 2:20 PM, chirag lakhani chirag.lakh...@gmail.com wrote: In the Mahout

output of Naive Bayes Classifer

2015-01-08 Thread chirag lakhani
I find the java documentation for the classifyfull method in Naive Bayes I have instantiated a Naive Bayes classifier StandardNaiveBayesClassifier classifier = new StandardNaiveBayesClassifier(model); and then I try to evaluate a particular vector Vector resultVector =
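The question here is how to read the vector that comes back from the classifier. A minimal plain-Java sketch of one common interpretation follows, assuming the result holds one score per label, indexed the same way as the label index, and that the highest score marks the predicted label. The class name, labels, and scores below are hypothetical stand-ins, not Mahout's API or real output.

```java
import java.util.Arrays;
import java.util.List;

final class BestLabelSketch {
    /** Returns the index of the largest score; ties resolve to the lowest index. */
    static int argMax(double[] scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("scores must not be empty");
        }
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical label index and per-label scores (assumption: higher is better).
        List<String> labels = Arrays.asList("tech", "sports", "finance");
        double[] scores = {-12.4, -9.1, -15.0};
        System.out.println(labels.get(argMax(scores)));
    }
}
```

If the scores are unnormalized, as is typical for Naive Bayes log-scores, only their ordering is meaningful, not their absolute values.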

Re: output of Naive Bayes Classifer

2015-01-08 Thread chirag lakhani
I meant to say that I found the java documentation to be confusing On Thu, Jan 8, 2015 at 1:49 PM, chirag lakhani chirag.lakh...@gmail.com wrote: I find the java documentation for the classifyfull method in Naive Bayes I have instantiated a Naive Bayes classifier

consistency of StaticWordValueEncoder

2015-01-07 Thread chirag lakhani
I am trying to vectorize text data for a Naive Bayes classifier that will be trained in Hadoop and then the corresponding model will be deployed in a Java app. My basic approach is to tokenize a string of text data using Lucene and then encode each token using a StaticWordValueEncoder. Here are the
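The consistency concern can be illustrated without Mahout. A hash-based encoder has no learned dictionary, so two processes produce the same vector for the same tokens as long as they use the same hash function and the same vector size. The sketch below is a plain-Java illustration of that idea under those assumptions; it is not StaticWordValueEncoder's implementation, and the class and method names are hypothetical.

```java
import java.util.Arrays;

final class HashedEncoderSketch {
    /** Maps a token to a bucket in [0, numFeatures). */
    static int bucket(String token, int numFeatures) {
        // String.hashCode() is defined by the Java specification, so the
        // result is the same on every JVM and every machine.
        return Math.floorMod(token.hashCode(), numFeatures);
    }

    /** Builds a fixed-size count vector by adding 1.0 at each token's bucket. */
    static double[] encode(String[] tokens, int numFeatures) {
        double[] vector = new double[numFeatures];
        for (String token : tokens) {
            vector[bucket(token, numFeatures)] += 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        String[] tokens = {"limited", "box", "bogo"};
        // Encoding "at training time" and "at serving time" with the same size.
        double[] training = encode(tokens, 16);
        double[] serving = encode(tokens, 16);
        System.out.println(Arrays.equals(training, serving));
    }
}
```

Changing the vector size or the hash function on either side would break the correspondence, so those settings need to be shared between the training job and the deployed app.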

Re: example of hashing vectorizer for text data using mapreduce code

2015-01-06 Thread chirag lakhani
are not being included. How would I include them in my MapReduce job? Chirag On Mon, Jan 5, 2015 at 5:28 PM, chirag lakhani chirag.lakh...@gmail.com wrote: I am trying to emulate something similar to what was done in this chimpler example https://chimpler.wordpress.com/2013/03/13/using-the-mahout

example of hashing vectorizer for text data using mapreduce code

2015-01-05 Thread chirag lakhani
I am trying to emulate something similar to what was done in this chimpler example https://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/ If you have data like this tech308215054011194110 Limited 3-Box $20 BOGO,

deploying Naive Bayes model in Java App

2014-12-18 Thread chirag lakhani
I have looked around for information about this but I still feel unsure about whether this is possible. I have recently developed a Naive Bayes model in scikit-learn that takes some short text data (along with a few other categorical features) and develops a classifier model. I would like to

sparsification of a Mahout vector

2014-03-02 Thread Chirag Lakhani
solution to this given all that is included in the Vector class or do I need to create my own method? Chirag -- *Chirag Lakhani* Data Scientist Zaloni, Inc. | www.zaloni.com 633 Davis Dr., Suite 200 Durham, NC 27713 e: clakh...@zaloni.com p: 919.602.4965 x7020

Re: How to SSVD output to generate Clusters

2013-08-01 Thread Chirag Lakhani

Re: PCA using Java Code

2013-07-03 Thread Chirag Lakhani
So how does the column mean get calculated if the --pcaOffset option is not specified? I would think you are just doing SVD at that point. On Tue, Jul 2, 2013 at 5:52 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Tue, Jul 2, 2013 at 1:52 PM, Chirag Lakhani clakh...@zaloni.com wrote
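The distinction raised here is the textbook one: PCA is an SVD of the matrix after each column's mean has been subtracted, while plain SVD works on the matrix as given. A minimal plain-Java sketch of that centering step follows. It is independent of Mahout's SSVD code and says nothing about how the --pcaOffset option is handled internally; the class and method names are hypothetical.

```java
import java.util.Arrays;

final class MeanCenterSketch {
    /** Computes the mean of each column of a non-empty rectangular matrix. */
    static double[] columnMeans(double[][] a) {
        int rows = a.length;
        int cols = a[0].length;
        double[] means = new double[cols];
        for (double[] row : a) {
            for (int j = 0; j < cols; j++) {
                means[j] += row[j];
            }
        }
        for (int j = 0; j < cols; j++) {
            means[j] /= rows;
        }
        return means;
    }

    /** Returns a new matrix with the column mean subtracted from every entry. */
    static double[][] center(double[][] a) {
        double[] means = columnMeans(a);
        double[][] centered = new double[a.length][a[0].length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[0].length; j++) {
                centered[i][j] = a[i][j] - means[j];
            }
        }
        return centered;
    }

    public static void main(String[] args) {
        double[][] a = {{1.0, 2.0}, {3.0, 6.0}};
        System.out.println(Arrays.toString(columnMeans(a)));
        System.out.println(Arrays.deepToString(center(a)));
    }
}
```

Running an SVD on the centered matrix gives the principal components; running it on the original matrix does not, unless the column means already happen to be zero.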

Re: PCA using Java Code

2013-07-03 Thread Chirag Lakhani
will require one additional MR pass over A. Bottom line, typically one wants something along the lines of: ssvd --pca=true -u=false -v=false -us=true ... On Wed, Jul 3, 2013 at 8:58 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Jul 3, 2013 6:56 AM, Chirag Lakhani clakh...@zaloni.com wrote

Re: PCA using Java Code

2013-07-03 Thread Chirag Lakhani
(SSVDSolver.java:328) at pca_factory.main(pca_factory.java:97) On Wed, Jul 3, 2013 at 3:25 PM, Chirag Lakhani clakh...@zaloni.com wrote: Okay thanks for that. After working on that issue I am still having trouble running the SSVD solver. I know I have asked this before but I still

Re: PCA using Java Code

2013-07-03 Thread Chirag Lakhani
Okay, thanks. It looks like I have that part running, so I will go back to the SSVDCli to finish the rest. Thanks for your help. Chirag On Wed, Jul 3, 2013 at 4:19 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Wed, Jul 3, 2013 at 12:25 PM, Chirag Lakhani clakh...@zaloni.com wrote: Okay

PCA using Java Code

2013-07-02 Thread Chirag Lakhani

Re: database support for clustering

2013-06-25 Thread Chirag Lakhani
ready to cluster. Then use the streaming k-means stuff. On Mon, Jun 24, 2013 at 4:43 PM, Chirag Lakhani clakh...@zaloni.com wrote: What database interfaces are there for Mahout? The website mentions MongoDB and Cassandra. I get the feeling these are for recommender systems only

database support for clustering

2013-06-24 Thread Chirag Lakhani
What database interfaces are there for Mahout? The website mentions MongoDB and Cassandra. I get the feeling these are for recommender systems only. Are there any databases that Mahout can interface with directly in order to perform clustering? I am thinking of an example where I have a large table

Re: Time Based Recommender System

2013-05-01 Thread Chirag Lakhani
dynamics goes away since you aren't predicting ratings in any case. On Tue, Apr 30, 2013 at 7:18 AM, Chirag Lakhani clakh...@zaloni.com wrote: I was wondering if the collaborative filtering library in Mahout has any algorithms that incorporate concept drift, i.e., time dynamics. From my own

Time Based Recommender System

2013-04-30 Thread Chirag Lakhani
I was wondering if the collaborative filtering library in Mahout has any algorithms that incorporate concept drift, i.e., time dynamics. From my own research I have come across the BellKor algorithm called TimeSVD++ and there is a recent paper using hidden Markov models with collaborative

Re: Time Based Recommender System

2013-04-30 Thread Chirag Lakhani
Do you know of any other large scale machine learning platforms that do incorporate it? On Tue, Apr 30, 2013 at 10:21 AM, Sean Owen sro...@gmail.com wrote: No, time is in the data model but nothing uses it that I know of. On Tue, Apr 30, 2013 at 3:18 PM, Chirag Lakhani clakh...@zaloni.com

Re: Java Code for PCA

2013-04-16 Thread Chirag Lakhani
it into a distributed row matrix type? On Fri, Apr 12, 2013 at 1:19 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Fri, Apr 12, 2013 at 8:42 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: No, this is not right. I will explain later when I have a moment. On Apr 12, 2013 8:08 AM, Chirag

Java Code for PCA

2013-04-12 Thread Chirag Lakhani
I am having trouble understanding whether the following code is sufficient for running PCA. I have a sequence file of dense vectors that I am calling, and then I am trying to run the following code: SSVDSolver pcaFactory = new SSVDSolver(conf, new Path(vectorsFolder), new