On Thursday, February 20, 2014 11:40 PM, Andrew Musselman
andrew.mussel...@gmail.com wrote:
It's an option when you run the examples as I recall. Search in
examples/bin and you can trace it out.
On Feb 20, 2014, at 8:02 PM, qiaoresearcher qiaoresearc...@gmail.com
wrote:
Does Mahout have a complementary naive Bayes implementation available?
I checked the Mahout source code, and it seems the author has not finished it
yet: as shown in the following, the thetaSummer job is not submitted.
public final class TrainNaiveBayesJob extends AbstractJob {
.
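For reference, the complementary weighting idea behind CNB (Rennie et al., 2003) is easy to sketch locally; the class and method names below are illustrative and are not Mahout's internal API:

```java
// Sketch of complementary naive Bayes weights (Rennie et al., 2003):
// each class's weights are estimated from the counts of every *other*
// class. Names here are illustrative, not Mahout's internal API.
public class ComplementaryNB {
  // termCounts[c][w] = count of term w in class c; alpha = smoothing.
  static double[][] complementWeights(double[][] termCounts, double alpha) {
    int numClasses = termCounts.length;
    int vocab = termCounts[0].length;
    // total count of each term across all classes
    double[] termTotals = new double[vocab];
    double grandTotal = 0;
    for (double[] row : termCounts) {
      for (int w = 0; w < vocab; w++) {
        termTotals[w] += row[w];
        grandTotal += row[w];
      }
    }
    double[][] weights = new double[numClasses][vocab];
    for (int c = 0; c < numClasses; c++) {
      double classTotal = 0;
      for (double v : termCounts[c]) classTotal += v;
      double complementTotal = grandTotal - classTotal;
      for (int w = 0; w < vocab; w++) {
        // count of term w everywhere except class c
        double complementCount = termTotals[w] - termCounts[c][w];
        weights[c][w] =
            Math.log((complementCount + alpha) / (complementTotal + alpha * vocab));
      }
    }
    return weights; // classify by argmin of the per-class weight sums
  }
}
```

In CNB a document is assigned to the class whose complement weights give the smallest sum over its terms, which tends to behave better on skewed class sizes than standard naive Bayes.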
That language is shell-scripting, e.g. bash.
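If it helps, the mechanism is roughly: bin/mahout is a bash script that mostly builds the classpath and hands the short command name to a Java driver (org.apache.mahout.driver.MahoutDriver), which looks the name up in a properties file (driver.classes.default.props in recent versions) and reflectively invokes that class's main(). A simplified sketch of the lookup, with the properties contents written inline for illustration:

```java
import java.util.Properties;

// Simplified illustration of the bin/mahout dispatch: the shell script
// sets up the classpath, and a Java driver maps the short command name
// to a fully qualified class and calls its main() via reflection.
public class MiniDriver {
  static String resolve(Properties shortNames, String command) {
    String className = shortNames.getProperty(command);
    if (className == null) {
      throw new IllegalArgumentException("unknown command: " + command);
    }
    return className;
  }

  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.setProperty("seq2sparse",
        "org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles");
    String cls = resolve(props, "seq2sparse");
    // The real driver would now do roughly:
    // Class.forName(cls).getMethod("main", String[].class)
    //      .invoke(null, (Object) remainingArgs);
    System.out.println(cls);
  }
}
```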
On Wed, Jan 29, 2014 at 2:15 PM, qiaoresearcher qiaoresearc...@gmail.com
wrote:
When running a command like:
mahout seq2sparse -i inputfile -o outputfile
where is the seq2sparse command defined? How does the system know to
actually run the SparseVectorsFromSequenceFiles class?
What is the language used in the mahout command, such as the language given
below:
Mahout has an example of using naive Bayes to classify the 20 newsgroups
data set, but how can one classify just paragraphs (e.g. twitter messages,
movie reviews)?
Text files have content like:
--
text paragraph 1 class
On Thursday, January 16, 2014 10:57 PM, qiaoresearcher
qiaoresearc...@gmail.com wrote:
and check the references.
On Fri, Jun 28, 2013 at 3:35 PM, qiaoresearcher qiaoresearc...@gmail.com
wrote:
The logistic regression code is difficult to follow: the trainlogistic and
runlogistic parts, how the likelihood is calculated, how the weights are
updated, etc.
does anyone know who
that stochastic gradient descent is a very common algorithm for
large scale logistic regression. You can find the basics anywhere with a
simple google search.
Sent from my iPhone
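For anyone following along, the core SGD update for logistic regression is a one-liner per example. This is a bare sketch of the idea only, without the regularization and learning-rate annealing that Mahout's OnlineLogisticRegression adds:

```java
// Minimal stochastic gradient descent for logistic regression: for each
// example, move the weights along the gradient of the log-likelihood.
public class SgdLogistic {
  static double sigmoid(double z) {
    return 1.0 / (1.0 + Math.exp(-z));
  }

  // x[i] is a feature vector (include a constant-1 bias feature),
  // y[i] is the label in {0, 1}.
  static double[] train(double[][] x, int[] y, double rate, int epochs) {
    double[] w = new double[x[0].length];
    for (int e = 0; e < epochs; e++) {
      for (int i = 0; i < x.length; i++) {
        double z = 0;
        for (int j = 0; j < w.length; j++) z += w[j] * x[i][j];
        double err = y[i] - sigmoid(z); // gradient of the log-likelihood
        for (int j = 0; j < w.length; j++) w[j] += rate * err * x[i][j];
      }
    }
    return w;
  }
}
```

The likelihood never appears explicitly: only its gradient, (y - p) * x, is used, which is why it can be hard to spot in the source.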
On Jul 1, 2013, at 11:59, qiaoresearcher qiaoresearc...@gmail.com wrote:
Ted,
Thanks, but I have looked
The logistic regression code is difficult to follow: the trainlogistic and
runlogistic parts, how the likelihood is calculated, how the weights are
updated, etc.
Does anyone know who wrote the Mahout logistic regression code? What are
the references on the logistic regression algorithm he was using to
Current Mahout does not have variable importance in its random forests.
Variable importance, especially the permutation variant, is trivial to
implement locally,
but how to do it with MapReduce? A mapper will only see one record at a
time, but the permutation needs to be done on the whole sample of
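The local version is indeed short: shuffle one feature column across the whole sample, re-score, and report the accuracy drop. A sketch, with an illustrative Model interface rather than Mahout's DecisionForest API:

```java
import java.util.Random;

// Local permutation importance: importance of feature j = baseline
// accuracy minus accuracy after shuffling column j across the sample.
public class PermutationImportance {
  interface Model {
    int predict(double[] x);
  }

  static double accuracy(Model m, double[][] x, int[] y) {
    int ok = 0;
    for (int i = 0; i < x.length; i++) {
      if (m.predict(x[i]) == y[i]) ok++;
    }
    return (double) ok / x.length;
  }

  static double importance(Model m, double[][] x, int[] y, int j, Random rnd) {
    double base = accuracy(m, x, y);
    double[][] perm = new double[x.length][];
    for (int i = 0; i < x.length; i++) perm[i] = x[i].clone();
    // Fisher-Yates shuffle applied only to column j
    for (int i = perm.length - 1; i > 0; i--) {
      int k = rnd.nextInt(i + 1);
      double t = perm[i][j];
      perm[i][j] = perm[k][j];
      perm[k][j] = t;
    }
    return base - accuracy(m, perm, y);
  }
}
```

The shuffle genuinely needs the whole column, so with MapReduce one possible approach is to materialize the held-out sample and score each permuted column in its own pass, rather than trying to permute inside a single mapper.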
Hi all,
Assume we want to run mahout examples like:
$HADOOP_HOME/bin/hadoop jar
$MAHOUT_HOME/core/target/mahout-core-VERSION-job.jar
org.apache.mahout.classifier.df.tools.Describe -p
testdata/KDDTrain+.arff -f testdata/KDDTrain+.info -d N 3 C 2 N C 4 N
C 8 N 2 C 19 N L
it works well in command
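For what it's worth, the -d descriptor in that command lists attribute types, where a number repeats the type token that follows it (N numeric, C categorical, L label). A minimal expansion sketch of that convention; this mirrors the idea behind the Describe tool's -d flag, not its actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Expand a Describe-style descriptor such as "N 3 C 2 N ... L":
// a number repeats the type token that follows it
// (N = numeric, C = categorical, L = label).
public class DescriptorExpander {
  static List<String> expand(String descriptor) {
    List<String> out = new ArrayList<>();
    int repeat = 1;
    for (String tok : descriptor.trim().split("\\s+")) {
      if (tok.matches("\\d+")) {
        repeat = Integer.parseInt(tok); // count applies to the next token
      } else {
        for (int i = 0; i < repeat; i++) out.add(tok);
        repeat = 1;
      }
    }
    return out;
  }
}
```

Expanding the full KDD descriptor above yields 41 attributes plus the label, matching the data set.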
I just ran the RF example (the non-distributed version, BreimanExample)
with the glass data, 10 iterations with 100 trees; here is the unexpected
output:
13/04/25 15:38:40 INFO df.BreimanExample:
13/04/25 15:38:40 INFO df.BreimanExample: Random Input Test
this data? You are
trying to classify users into what, for what purpose?
On Fri, Nov 9, 2012 at 4:20 PM, qiaoresearcher qiaoresearc...@gmail.com
wrote:
Hi All,
Assume the data is stored in a gzip file which includes many text files.
Within each text file, each line represents an activity.
On Nov 9, 2012 8:43 AM, qiaoresearcher qiaoresearc...@gmail.com wrote:
It is a supervised classification problem.
For example, a very simple case:
say, overall we collect 4 pages from the data set: { web_page 1 web_page 2 web_page 3 web_page 4 }
then users may have input vectors like:
user1 [1 1 0 0
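The encoding suggested above (user1 visited web_page 1 and 2 but not 3 and 4) is a binary occurrence vector over the collected pages; a small sketch with illustrative names:

```java
import java.util.List;
import java.util.Set;

// Turn a user's set of visited pages into a binary occurrence vector
// over a fixed page vocabulary. Names are illustrative.
public class PageVectors {
  static int[] toVector(List<String> vocabulary, Set<String> visited) {
    int[] v = new int[vocabulary.size()];
    for (int i = 0; i < vocabulary.size(); i++) {
      v[i] = visited.contains(vocabulary.get(i)) ? 1 : 0;
    }
    return v;
  }
}
```

Vectors like these, paired with class labels, are what a supervised classifier would then train on.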