Hi,
The logistic regression output has three columns:
Target
Model-output
likelihood
Is it possible to add more columns to the output?
I would like to add an ID column so that I can join the logistic regression
results with the input data.
Regards,
Chhaya Vishwakarma
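If the score rows are written one per input row and in the same order as the input (an assumption worth checking against your Mahout version), the ID can be attached afterwards by position. The sketch below is a hypothetical post-processing step, not a Mahout option: it assumes comma-separated rows with header lines already removed, and `AttachIds`, `attachIds` and `idColumn` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-processing step; not part of Mahout.
final class AttachIds {

    /**
     * Prefixes each score row with the ID taken from the matching input row.
     * ASSUMPTION: scores.get(i) was produced from inputRows.get(i), and both
     * lists are comma-separated with header lines already stripped.
     */
    static List<String> attachIds(List<String> inputRows, List<String> scores, int idColumn) {
        if (inputRows.size() != scores.size()) {
            // A count mismatch means the positional assumption does not hold.
            throw new IllegalArgumentException(
                "row counts differ: " + inputRows.size() + " inputs vs " + scores.size() + " scores");
        }
        List<String> joined = new ArrayList<>();
        for (int i = 0; i < inputRows.size(); i++) {
            String id = inputRows.get(i).split(",", -1)[idColumn];
            joined.add(id + "," + scores.get(i));
        }
        return joined;
    }

    public static void main(String[] args) {
        // Toy rows: input is "id,feature,label"; scores are
        // "target,model-output,likelihood".
        List<String> input = List.of("a17,0.3,1", "b42,0.9,0");
        List<String> scores = List.of("1,0.73,-0.31", "0,0.12,-0.13");
        attachIds(input, scores, 0).forEach(System.out::println);
        // prints a17,1,0.73,-0.31 then b42,0,0.12,-0.13
    }
}
```

Failing loudly on a row-count mismatch is deliberate: a silent positional join over misaligned files would attach the wrong IDs.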
The c
Yes, those nodes are running tasks. For Logistic Regression this is expected,
since the algorithm only has a sequential implementation. For Naive Bayes and
Random Forest, though, it's hard to understand. By the way, how do I check
whether I am running the distributed implementation of these algorithms? An
Have you verified that all the slaves are running tasks? Sometimes only a
few slaves on a cluster will pick up a task because of other limitations.
Also, some algorithms in Mahout aren't distributed.
And obviously you will want to make sure that you're running the distributed
implementations of the
Hi experts,
I'm confused about the runtime of Mahout on, e.g., Random Forest (the same
applies to Logistic Regression): whether I set the number of slaves in
conf/slaves in Hadoop to 2, 8 or 20, the runtime of the program is basically
the same. Shouldn't it be faster when the program runs on mo
Just briefly...
It looks like org/apache/mahout/classifier/sgd/CsvRecordFactory.java throws a
null pointer exception at line 197 when there is no target column:
196: // record target column and establish dictionary for decoding target
197: target = vars.get(targetName);
Letting vars.get(tar
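The sketch below illustrates how such a lookup fails and one way to guard it. It assumes, as the failure suggests, that `target` is a primitive `int` and `vars` is a `Map<String, Integer>` from header names to column indices; `TargetLookup`, `unguardedTarget` and `guardedTarget` are hypothetical names, not Mahout code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not Mahout's actual CsvRecordFactory code.
final class TargetLookup {

    // ASSUMPTION: the target index is stored in a primitive int. If targetName
    // is absent, get() returns null and unboxing null throws a NullPointerException.
    static int unguardedTarget(Map<String, Integer> vars, String targetName) {
        int target = vars.get(targetName);
        return target;
    }

    // Guarded lookup: fail with a message that names the missing column.
    static int guardedTarget(Map<String, Integer> vars, String targetName) {
        Integer target = vars.get(targetName);
        if (target == null) {
            throw new IllegalArgumentException(
                "Target column '" + targetName + "' not found in header " + vars.keySet());
        }
        return target;
    }

    public static void main(String[] args) {
        Map<String, Integer> vars = new HashMap<>();
        vars.put("x", 0);
        vars.put("y", 1);
        System.out.println(guardedTarget(vars, "y")); // prints 1
        try {
            guardedTarget(vars, "color");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```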
I added the threshold merely as a way to increase the performance
of RowSimilarityJob. If a threshold is given, some item pairs don't need
to be looked at. A simple example: if you use the cooccurrence count as the
similarity measure and set a threshold of n cooccurrences, then any
pair contain
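One way a threshold saves work with cooccurrence counts: a pair can never co-occur more often than its less frequent item appears, so an item seen by fewer than the threshold number of users can be dropped before pairs are generated. This is a minimal in-memory sketch of that idea with hypothetical names (`CooccurrencePruning`, `pairsAtLeast`); it is not RowSimilarityJob's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; not RowSimilarityJob's implementation.
final class CooccurrencePruning {

    /** Returns pairs "a|b" (a < b) that co-occur in at least threshold users. */
    static Map<String, Integer> pairsAtLeast(List<Set<Integer>> users, int threshold) {
        // Count how many users interacted with each item.
        Map<Integer, Integer> itemCounts = new HashMap<>();
        for (Set<Integer> items : users) {
            for (int item : items) {
                itemCounts.merge(item, 1, Integer::sum);
            }
        }
        // cooc(a, b) <= min(count(a), count(b)), so items below the threshold
        // cannot belong to any surviving pair and are skipped here.
        Map<String, Integer> cooc = new TreeMap<>();
        for (Set<Integer> items : users) {
            List<Integer> kept = new ArrayList<>();
            for (int item : new TreeSet<>(items)) {
                if (itemCounts.get(item) >= threshold) {
                    kept.add(item);
                }
            }
            for (int i = 0; i < kept.size(); i++) {
                for (int j = i + 1; j < kept.size(); j++) {
                    cooc.merge(kept.get(i) + "|" + kept.get(j), 1, Integer::sum);
                }
            }
        }
        // Drop the pairs that still fall below the threshold.
        cooc.values().removeIf(count -> count < threshold);
        return cooc;
    }

    public static void main(String[] args) {
        List<Set<Integer>> users = List.of(Set.of(1, 2, 3), Set.of(1, 2), Set.of(2, 3, 4));
        System.out.println(pairsAtLeast(users, 2)); // prints {1|2=2, 2|3=2}
    }
}
```

Item 4 appears for only one user, so with a threshold of 2 it never enters the pair loop; on real data the savings come from the many rare items.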
>
> On May 27, 2014, at 8:15 AM, Ted Dunning wrote:
>
> The threshold should not normally be used in the Mahout+Solr deployment
> style.
Understood, and that’s why an alternative way of specifying a cutoff may be a
good idea.
>
> This need is better supported by specifying the maximum number
The threshold should not normally be used in the Mahout+Solr deployment
style.
This need is better supported by specifying the maximum number of
indicators. This is mathematically equivalent to specifying a fraction of
values, but is more meaningful to users since good values for this number
are
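Specifying a maximum number of indicators, instead of a similarity threshold, amounts to keeping the top k entries of each item's similarity row. A minimal single-row sketch using a bounded min-heap; `TopIndicators`, `topK` and `maxIndicators` are hypothetical names, not Mahout's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch; not Mahout's ItemSimilarityJob/RowSimilarityJob code.
final class TopIndicators {

    /** Returns the item IDs of the maxIndicators highest scores, best first. */
    static List<Integer> topK(Map<Integer, Double> row, int maxIndicators) {
        // Min-heap on score: the weakest of the current top k sits at the head.
        PriorityQueue<Map.Entry<Integer, Double>> heap =
            new PriorityQueue<>(Map.Entry.<Integer, Double>comparingByValue());
        for (Map.Entry<Integer, Double> e : row.entrySet()) {
            heap.add(e);
            if (heap.size() > maxIndicators) {
                heap.poll(); // evict the lowest score seen so far
            }
        }
        List<Map.Entry<Integer, Double>> kept = new ArrayList<>(heap);
        kept.sort(Map.Entry.<Integer, Double>comparingByValue().reversed());
        List<Integer> items = new ArrayList<>();
        for (Map.Entry<Integer, Double> e : kept) {
            items.add(e.getKey());
        }
        return items;
    }

    public static void main(String[] args) {
        Map<Integer, Double> row = Map.of(10, 0.2, 11, 3.5, 12, 1.1, 13, 7.9);
        System.out.println(topK(row, 2)); // prints [13, 11]
    }
}
```

The heap never holds more than k + 1 entries, so memory per row stays bounded no matter how dense the row is, and every item keeps at most k indicators regardless of how its scores are scaled.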
I was talking with Ken Krugler off-list about the Mahout + Solr recommender, and
he had an interesting request.
When calculating the indicator/item similarity matrix using ItemSimilarityJob,
there is a --threshold option. Wouldn’t it be better to have an option that
specified the fraction of va