Changing output columns of logistic regression

2014-05-27 Thread Chhaya Vishwakarma
Hi, Logistic regression gives output with three columns: Target, Model-output, and likelihood. Is it possible to add more columns to the output? I would like to add an ID column so that I can join the logistic regression results with the input data. Regards, Chhaya Vishwakarma The c
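A minimal sketch of one workaround follows. It assumes the scoring step writes exactly one output row per input row, in input order, after a single header line. That is an assumption, not something the thread confirms. The class `AttachIds` and its method `attachIds` are hypothetical. They pair each score row with the input row at the same position and prepend that row's ID field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prepends an ID column to logistic-regression score rows.
// Assumes one score row per input row, in input order, each list starting with a header.
final class AttachIds {

    static List<String> attachIds(List<String> inputCsv, List<String> scoreCsv, int idColumn) {
        if (inputCsv.size() != scoreCsv.size()) {
            throw new IllegalArgumentException("row count mismatch: " + inputCsv.size()
                + " input lines vs " + scoreCsv.size() + " score lines");
        }
        List<String> joined = new ArrayList<>(scoreCsv.size());
        for (int i = 0; i < scoreCsv.size(); i++) {
            // Naive CSV split: assumes no quoted fields containing commas.
            String[] fields = inputCsv.get(i).split(",", -1);
            joined.add(fields[idColumn] + "," + scoreCsv.get(i));
        }
        return joined;
    }

    public static void main(String[] args) {
        // Toy rows for illustration only.
        List<String> input = List.of("id,x,y,label", "a17,0.5,1.2,1", "b42,0.1,0.3,0");
        List<String> scores = List.of("target,model-output,likelihood", "1,0.91,-0.09", "0,0.20,-0.22");
        attachIds(input, scores, 0).forEach(System.out::println);
    }
}
```

The positional pairing breaks silently if rows are reordered or filtered between input and output. The row-count check is therefore only a partial safeguard.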

Re: Confusion on runtime of mahout.

2014-05-27 Thread dongdan39
Yes, those nodes are running tasks. For Logistic Regression that is expected, since this algorithm only has a sequential implementation. But for Naive Bayes and Random Forest, it's hard to understand. By the way, how do I know/check whether I am running the distributed implementation of these algorithms? An
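Amdahl's law makes the point about sequential implementations concrete. It is a standard model, not something cited in the thread. If a fraction p of the runtime parallelizes across n workers, the best-case speedup is 1 / ((1 - p) + p / n). The hypothetical `Amdahl.speedup` below evaluates this formula. With p = 0, as for a purely sequential implementation, the slave counts from the original question (2, 8, or 20) all give a speedup of exactly 1.

```java
// Amdahl's law: best-case speedup on n workers when a fraction p of the work parallelizes.
final class Amdahl {

    static double speedup(double parallelFraction, int workers) {
        if (parallelFraction < 0.0 || parallelFraction > 1.0) {
            throw new IllegalArgumentException("parallelFraction must be in [0, 1]");
        }
        if (workers < 1) {
            throw new IllegalArgumentException("workers must be >= 1");
        }
        double serial = 1.0 - parallelFraction;
        return 1.0 / (serial + parallelFraction / workers);
    }

    public static void main(String[] args) {
        // Slave counts taken from the original question in this thread.
        for (int n : new int[] {2, 8, 20}) {
            System.out.printf("n=%2d  sequential: %.2fx  90%% parallel: %.2fx%n",
                n, speedup(0.0, n), speedup(0.9, n));
        }
    }
}
```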

Re: Confusion on runtime of mahout.

2014-05-27 Thread Jay Vyas
Have you verified that all the slaves are running tasks? Sometimes only a few slaves on a cluster will pick up a task because of other limitations. Also, some algorithms in Mahout aren't distributed. Also, obviously you will want to make sure that you're running the distributed implementations of the
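One common limitation of this kind is that Hadoop launches roughly one map task per input split. This is offered as a possible explanation, not a diagnosis of this cluster. A small input may therefore produce only one or two map tasks, no matter how many slaves are listed in conf/slaves. The sketch below mirrors the split-size arithmetic of the new-API `FileInputFormat` for a single splittable file: `max(minSize, min(maxSize, blockSize))`, with a 1.1 slop factor on the last split. The class and method names are hypothetical, and the 128 MB block size is an assumed example value.

```java
// Sketch of the split arithmetic in Hadoop's new-API FileInputFormat for one
// splittable file. By default each split becomes one map task.
final class SplitEstimate {

    // FileInputFormat lets the last split be up to 10% larger than splitSize.
    private static final double SPLIT_SLOP = 1.1;

    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static int countSplits(long fileLength, long blockSize, long minSize, long maxSize) {
        if (fileLength <= 0) {
            throw new IllegalArgumentException("fileLength must be positive");
        }
        long size = splitSize(blockSize, minSize, maxSize);
        long remaining = fileLength;
        int splits = 0;
        while ((double) remaining / size > SPLIT_SLOP) {
            splits++;
            remaining -= size;
        }
        if (remaining != 0) {
            splits++; // the tail split, at most SPLIT_SLOP * size bytes
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 128 * mb; // assumed HDFS block size
        System.out.println("100 MB input  -> " + countSplits(100 * mb, block, 1, Long.MAX_VALUE) + " map task(s)");
        System.out.println("1024 MB input -> " + countSplits(1024 * mb, block, 1, Long.MAX_VALUE) + " map task(s)");
    }
}
```

Under these assumptions, a 100 MB input yields a single map task, so adding slaves cannot shorten that phase.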

Confusion on runtime of mahout.

2014-05-27 Thread dongdan39
Hi experts, I'm confused about the runtime of Mahout on, e.g., Random Forest (the same goes for Logistic Regression): whether I set the number of slaves in conf/slaves in Hadoop to 2, 8, or 20, the runtime of the program is basically the same. Shouldn't it be faster when the program runs on mo

Re: Trouble with AdaptiveLogistic command line

2014-05-27 Thread Duncan Lawie
Just briefly... It looks like org/apache/mahout/classifier/sgd/CsvRecordFactory.java throws a null pointer exception at line 197 when there is no target column:

    196: // record target column and establish dictionary for decoding target
    197: target = vars.get(targetName);

Letting vars.get(tar
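The failure mode can be reproduced outside Mahout. Assume `vars` maps column names to `Integer` indexes and `target` is a primitive `int`; both are assumptions about CsvRecordFactory's fields. Then `vars.get(targetName)` returns `null` for a missing column, and auto-unboxing that `null` throws a `NullPointerException`. The hypothetical `TargetLookup` class contrasts that with a checked lookup that reports which column is missing. It is an illustration, not a patch to Mahout.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only (not Mahout's code): why a missing target column can surface
// as a NullPointerException, and a lookup that fails with a clearer message.
final class TargetLookup {

    // Maps each header name to its column index, as a CSV record factory might.
    static Map<String, Integer> indexHeader(List<String> header) {
        Map<String, Integer> vars = new HashMap<>();
        for (int i = 0; i < header.size(); i++) {
            vars.put(header.get(i), i);
        }
        return vars;
    }

    // Fragile: get() returns null for an unknown name, and unboxing null into int throws NPE.
    static int targetIndexUnchecked(Map<String, Integer> vars, String targetName) {
        int target = vars.get(targetName);
        return target;
    }

    // Defensive: name the missing column instead of failing with a bare NPE.
    static int targetIndexChecked(Map<String, Integer> vars, String targetName) {
        Integer target = vars.get(targetName);
        if (target == null) {
            throw new IllegalArgumentException(
                "target column '" + targetName + "' not found in header " + vars.keySet());
        }
        return target;
    }

    public static void main(String[] args) {
        Map<String, Integer> vars = indexHeader(List.of("x", "y", "color"));
        System.out.println(targetIndexChecked(vars, "color")); // 2
        try {
            targetIndexUnchecked(vars, "label");
        } catch (NullPointerException e) {
            System.out.println("unchecked lookup threw NullPointerException");
        }
    }
}
```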

Re: Indicator Matrix and Mahout + Solr recommender

2014-05-27 Thread Sebastian Schelter
I added the threshold merely as a way to improve the performance of RowSimilarityJob. If a threshold is given, some item pairs don't need to be looked at. A simple example: if you use cooccurrence count as the similarity measure and set a threshold of n cooccurrences, then any pair contain
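The pruning idea rests on a simple bound. A pair's cooccurrence count can never exceed the number of users who interacted with the rarer of its two items. With a threshold of n, any item seen by fewer than n users can therefore be dropped before pairs are counted. The sketch below illustrates that bound with hypothetical names (`CooccurrencePruning`, `pairsAtLeast`) and in-memory sets. It is not RowSimilarityJob's actual code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration of threshold pruning for cooccurrence counts.
// A pair (a, b) cooccurs at most min(count(a), count(b)) times, so items seen by
// fewer than `threshold` users cannot appear in any pair that meets the threshold.
final class CooccurrencePruning {

    // Number of users who interacted with each item.
    static Map<String, Integer> itemCounts(List<Set<String>> users) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> items : users) {
            for (String item : items) {
                counts.merge(item, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Pairs keyed "a|b" with a < b (assumes item IDs contain no '|') that cooccur >= threshold times.
    static Map<String, Integer> pairsAtLeast(List<Set<String>> users, int threshold) {
        Map<String, Integer> counts = itemCounts(users);
        Map<String, Integer> pairs = new TreeMap<>();
        for (Set<String> items : users) {
            List<String> kept = new ArrayList<>();
            for (String item : items) {
                if (counts.get(item) >= threshold) { // prune items that cannot qualify
                    kept.add(item);
                }
            }
            Collections.sort(kept);
            for (int i = 0; i < kept.size(); i++) {
                for (int j = i + 1; j < kept.size(); j++) {
                    pairs.merge(kept.get(i) + "|" + kept.get(j), 1, Integer::sum);
                }
            }
        }
        pairs.values().removeIf(c -> c < threshold);
        return pairs;
    }

    public static void main(String[] args) {
        List<Set<String>> users = List.of(
            Set.of("a", "b", "c"), Set.of("a", "b"), Set.of("a", "b", "d"), Set.of("c", "d"));
        System.out.println(pairsAtLeast(users, 2)); // {a|b=3}
    }
}
```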

Re: Indicator Matrix and Mahout + Solr recommender

2014-05-27 Thread Pat Ferrel
> On May 27, 2014, at 8:15 AM, Ted Dunning wrote:
>
> The threshold should not normally be used in the Mahout+Solr deployment
> style.

Understood and that’s why an alternative way of specifying a cutoff may be a good idea.

> This need is better supported by specifying the maximum number

Re: Indicator Matrix and Mahout + Solr recommender

2014-05-27 Thread Ted Dunning
The threshold should not normally be used in the Mahout+Solr deployment style. This need is better supported by specifying the maximum number of indicators. This is mathematically equivalent to specifying a fraction of values, but is more meaningful to users since good values for this number are
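Limiting each item to its k strongest indicators can be done per row with a size-k min-heap. The sketch below is a hypothetical illustration; `TopKIndicators.topK` is not a Mahout API. The weakest retained score sits at the head of the heap and is evicted whenever a stronger candidate arrives. Each row therefore costs O(m log k) for m candidate indicators.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: keep only the k strongest indicators in one item's similarity row.
final class TopKIndicators {

    static List<String> topK(Map<String, Double> row, int k) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be >= 0");
        }
        // Min-heap ordered by score: the weakest retained indicator is at the head.
        PriorityQueue<Map.Entry<String, Double>> heap =
            new PriorityQueue<>(Map.Entry.<String, Double>comparingByValue());
        for (Map.Entry<String, Double> e : row.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // evict the current weakest
            }
        }
        List<String> result = new ArrayList<>(heap.size());
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result); // strongest first
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> row = Map.of("a", 0.9, "b", 0.1, "c", 0.5, "d", 0.7);
        System.out.println(topK(row, 2)); // [a, d]
    }
}
```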

Indicator Matrix and Mahout + Solr recommender

2014-05-27 Thread Pat Ferrel
I was talking with Ken Krugler off-list about the Mahout + Solr recommender, and he had an interesting request. When calculating the indicator/item similarity matrix using ItemSimilarityJob, there is a --threshold option. Wouldn’t it be better to have an option that specified the fraction of va
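The request above is cut off, so the following is only one possible reading of a fraction-based cutoff. The idea is to pick the score threshold that keeps the top `fraction` of all similarity values. Keeping `ceil(fraction * n)` of n values also shows why a fraction and a maximum count can be treated as equivalent. `FractionCutoff.thresholdForFraction` is a hypothetical helper. When several values tie at the boundary, more than `ceil(fraction * n)` of them pass the threshold.

```java
import java.util.Arrays;

// Hypothetical helper: the score threshold that keeps the top `fraction` of values.
final class FractionCutoff {

    static double thresholdForFraction(double[] scores, double fraction) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("no scores");
        }
        if (!(fraction > 0.0 && fraction <= 1.0)) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        double[] sorted = scores.clone();
        Arrays.sort(sorted); // ascending
        // Equivalent maximum count: keep ceil(fraction * n) values.
        int keep = (int) Math.ceil(fraction * sorted.length);
        return sorted[sorted.length - keep];
    }

    public static void main(String[] args) {
        double[] scores = {0.1, 0.9, 0.5, 0.7, 0.3};
        System.out.println(thresholdForFraction(scores, 0.4)); // 0.7
    }
}
```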