RE: Logistic Regression in Mahout

2013-01-30 Thread Prabhu
Thanks, I thought of that, but that doesn't seem to be the right explanation either. For one, in the output I see an equation like TargetVariable ~ -0.001*InterceptTerm + -0.0006*predictor1 + -0.0004*predictor2. Also, if I look at, say, predictor1, the coefficient in R is 1.02 and for predi

RE: Logistic Regression in Mahout

2013-01-30 Thread Prabhu
Hi, Thanks for your response. The class that I am using is org.apache.mahout.classifier.sgd.TrainLogistic. Each line in the input file is of the form Targetvalue, predictor1value, predictor2value, predictor20value, e.g. lines 1, 1.4, 1.9, 2.3,01.0 0, 1.2,0,3.4,..0.0 ,

Re: cvb/lda run time

2013-01-30 Thread Andy Schlaikjer
I assume you mean input *matrix* with 600,000 doc-term *vectors*. You need to ensure these vectors are split evenly across many part files. The number of part files will determine input splits and in turn map-side parallelism. Could you let us know how much input each of your 70 mappers is proces

Re: can i run mahout algorithms on mobile device..

2013-01-30 Thread Jake Mannix
The *training* of many Mahout algorithms is done on Hadoop, but the output classifiers (e.g. a binary text classifier [trained with L1 regularization to sparsify] for spam filtering) could certainly fit on a small footprint like a mobile phone. On Wed, Jan 30, 2013 at 7:46 AM, Mahesh Balija wrote: >

cvb/lda run time

2013-01-30 Thread David LaBarbera
I ran cvb on AWS (Mahout 0.7 and Amazon's Hadoop 1.0.3). I'm running it with hadoop jar mahout-fat.jar org.apache.mahout.driver.MahoutDriver \ cvb \ -i /lda/matrix-converted/matrix \ -o s3n://.../lda/results \ -dict /lda/dictionary.file-0 \ -dt s3n://.../lda/doc-topics \ -k 10 -x 10 The diction

Re: can i run mahout algorithms on mobile device..

2013-01-30 Thread Saikat Kanjilal
Hi Vignesh, Do you really need Mahout for this? You could just write the classification algorithm yourself and run it on a mobile lite offline storage as needed. Ping me offline if you want to discuss in more detail. Regards On Jan 30, 2013, at 7:46 AM, Mahesh Balija wrot

Re: Logistic Regression in Mahout

2013-01-30 Thread Jake Mannix
Looks like you're looking at weights which are logs of the weights you think you want. On Wed, Jan 30, 2013 at 4:12 AM, Prabhu wrote: > Hi all, > > I am trying to use Mahout to run logistic regression analysis on some > data. The data is about 7 Million rows, with about 20 predictor variabl
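To make the log relationship mentioned above concrete, here is a minimal sketch. It assumes the number on the other side is being read as an odds ratio, which the thread does not confirm. The function names are hypothetical and this is not Mahout or R code; the value -0.0006 is the predictor1 weight quoted elsewhere in this thread.

```python
import math


def odds_ratio(coefficient):
    """Convert a logistic-regression weight (a log-odds value) to an odds ratio."""
    return math.exp(coefficient)


def log_odds(ratio):
    """Inverse of odds_ratio: convert an odds ratio back to a log-odds weight."""
    return math.log(ratio)


# Weight for predictor1 as quoted from the Mahout output in this thread.
mahout_weight = -0.0006

# Assumption: if a tool reported odds ratios, this is the number it would show.
ratio = odds_ratio(mahout_weight)

# Round trip: taking the log of the odds ratio recovers the original weight.
recovered = log_odds(ratio)

print(f"weight={mahout_weight}, odds ratio={ratio:.6f}, recovered={recovered:.6f}")
```

A weight of zero maps to an odds ratio of exactly 1, and a small negative weight maps to an odds ratio just below 1, so comparing the two scales is a quick way to check whether a log transform explains a mismatch.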

Re: Logistic Regression in Mahout

2013-01-30 Thread Ted Dunning
What classes are you using and how are you using them? How are you producing the training vectors? On Wed, Jan 30, 2013 at 4:12 AM, Prabhu wrote: > Hi all, > > I am trying to use Mahout to run logistic regression analysis on some > data. The data is about 7 Million rows, with about 20 predi

Re: Clustering using Solr Index vs Lucene Index : Different Results

2013-01-30 Thread Vinay B,
Just a set of Mahout commands. Here they are. https://gist.github.com/4674331 For what it's worth, the relevant Solr config from the schema was Thank You On Wed, Jan 30, 2013 at 4:37 AM, Grant Ingersoll wrote: > Can you gist (gist.github.org) or pastebin your code? > > On Jan 29, 2013, at 5

Re: can i run mahout algorithms on mobile device..

2013-01-30 Thread Mahesh Balija
AFAIK it is NOT possible, as Mahout runs on top of Hadoop. Hadoop is a distributed computing framework that runs on a cluster of machines, so it may NOT be possible to run it on a mobile device. On Wed, Jan 30, 2013 at 8:46 PM, VIGNESH S wrote: > I am trying to implement some classification

can i run mahout algorithms on mobile device..

2013-01-30 Thread VIGNESH S
I am trying to implement some classification on an Android mobile device. Is it possible to use Mahout on a mobile device? Please kindly help me. -- Thanks and Regards Vignesh Srinivasan 9739135640

Logistic Regression in Mahout

2013-01-30 Thread Prabhu
Hi all, I am trying to use Mahout to run logistic regression analysis on some data. The data is about 7 million rows, with about 20 predictor variables (all of them numeric). The target variable is Boolean - 0 or 1. I run a logistic regression with this data in R and I get good coefficients

MiA NewsKMeansClustering Example Help

2013-01-30 Thread Chris Harrington
Hi all, I'm new to Mahout and I've been going through the MiA book. Lately I've been trying Chapter 10's NewsKMeansClustering example, as it looks like a good starting point for my own stuff, but I've run into a problem just trying to run and view the output. I'm trying to view the output of

Re: Clustering using Solr Index vs Lucene Index : Different Results

2013-01-30 Thread Grant Ingersoll
Can you gist (gist.github.org) or pastebin your code? On Jan 29, 2013, at 5:12 PM, vybe3142 wrote: > Reposting - I wasn't subscribed to the group earlier > > > VS > > first ingesting the data into SOLR and then invoking mahout on the SOLR > index (clustering on the contents of the field "text

Re: Using setPreference() to update recommendations in DataModel in Memory

2013-01-30 Thread Sean Owen
It throws an exception except in a few implementations, mostly the ones based on a database. It isn't something that's really used -- you instead update the backing store indirectly. Yes, the model is batch re-reads of data once in a while. Updates are not in real time in this model. On Wed, Jan 3
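The batch re-read model described above can be sketched roughly as follows. This is an illustration only: `InMemoryModel`, its methods, and the dict used as a backing store are hypothetical stand-ins, not Mahout's DataModel API.

```python
class InMemoryModel:
    """Hypothetical in-memory snapshot of preferences; not Mahout's API."""

    def __init__(self, store):
        self._store = store  # backing store, here just a dict keyed by (user, item)
        self._prefs = {}
        self.refresh()

    def refresh(self):
        # Batch re-read: replace the snapshot with whatever the store holds now.
        self._prefs = dict(self._store)

    def preference(self, user, item):
        # Reads only ever see the last snapshot, never the live store.
        return self._prefs.get((user, item))


# The backing store starts with one preference.
store = {("u1", "i1"): 4.0}
model = InMemoryModel(store)

# A new preference is written to the backing store, not to the model.
store[("u1", "i2")] = 5.0
before = model.preference("u1", "i2")  # not visible yet

# The update only shows up after the next batch reload.
model.refresh()
after = model.preference("u1", "i2")

print(before, after)
```

The point of the sketch is the ordering: writes go to the store, and the in-memory view changes only when it is reloaded, which is why updates are not real time in this model.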

Re: Boolean preferences and evaluation

2013-01-30 Thread Sean Owen
No, this is usually discussed under the name "implicit feedback" in the literature. On Wed, Jan 30, 2013 at 2:49 AM, Zia mel wrote: > I tried to find more details about the boolean preferences but > couldn't find any. Did you discover this idea or it has been known and > used before?

Re: Using setPreference() to update recommendations in DataModel in Memory

2013-01-30 Thread Henning Kuich
So what does the method do instead? And basically the conclusion is: To "update" your recommender with new preference values, you need to reload the data model and everything that follows? Thanks, Henning On Tue, Jan 29, 2013 at 7:30 PM, Sean Owen wrote: > It doesn't really work this way. Th