Re: complementary naive bayes classifier

2014-02-23 Thread qiaoresearcher
On Thursday, February 20, 2014 11:40 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: It's an option when you run the examples as I recall. Search in examples/bin and you can trace it out. On Feb 20, 2014, at 8:02 PM, qiaoresearcher qiaoresearc...@gmail.com wrote: Does mahout

complementary naive bayes classifier

2014-02-20 Thread qiaoresearcher
Does Mahout have a complementary naive Bayes implementation available? I checked the Mahout source code, and it seems the author has not finished it yet: as shown in the following, the thetaSummer job is not submitted. public final class TrainNaiveBayesJob extends AbstractJob {

Re: mahout command seq2sparse

2014-01-30 Thread qiaoresearcher
. That language is shell scripting, e.g. bash. On Wed, Jan 29, 2014 at 2:15 PM, qiaoresearcher qiaoresearc...@gmail.com wrote: When I run a command like: mahout seq2sparse -i inputfile -o outputfile, where is the command seq2sparse defined? How does the system know to actually run

mahout command seq2sparse

2014-01-29 Thread qiaoresearcher
When I run a command like: mahout seq2sparse -i inputfile -o outputfile, where is the command seq2sparse defined? How does the system know to actually run the SparseFileFromSequenceFile class? What language is used in the mahout command script, such as the language given below:

mahout text mining

2014-01-16 Thread qiaoresearcher
Mahout has an example of using naive Bayes to classify the 20 newsgroups data. But how can one classify plain paragraphs (e.g. Twitter messages, movie reviews) stored in text files? The text files have content like: -- text paragraph 1 class

Re: mahout text mining

2014-01-16 Thread qiaoresearcher
-from-text.html On Thursday, January 16, 2014 10:57 PM, qiaoresearcher qiaoresearc...@gmail.com wrote: Mahout has an example of using naive Bayes to classify the 20 newsgroups data. But how can one classify plain paragraphs (e.g. Twitter messages, movie reviews) stored in text files? The text files have

Re: mahout logistic regression

2013-07-01 Thread qiaoresearcher
and check the references. On Fri, Jun 28, 2013 at 3:35 PM, qiaoresearcher qiaoresearc...@gmail.com wrote: The logistic regression code is difficult to follow, in particular the trainlogistic and runlogistic parts: how the likelihood is calculated, how the weights are updated, etc. Does anyone know who

Re: mahout logistic regression

2013-07-01 Thread qiaoresearcher
that stochastic gradient descent is a very common algorithm for large-scale logistic regression. You can find the basics anywhere with a simple Google search. On Jul 1, 2013, at 11:59, qiaoresearcher qiaoresearc...@gmail.com wrote: Ted, Thanks, but I have looked
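
For readers following this thread, the basic idea mentioned in the reply (stochastic gradient descent for logistic regression) can be sketched in a few lines. This is a generic textbook illustration, not Mahout's code; the function names `sgd_logistic` and `predict`, the learning rate, and the epoch count are all assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    # Probability that x belongs to class 1 under weights w and bias b.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def sgd_logistic(samples, labels, lr=0.1, epochs=50, seed=0):
    # Hypothetical helper: plain SGD on the logistic log-likelihood,
    # one weight update per training example.
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            # (y - p) is the gradient of the per-example log-likelihood
            # with respect to the linear score b + w.x
            err = y - predict(w, b, x)
            for j in range(len(w)):
                w[j] += lr * err * x[j]
            b += lr * err
    return w, b
```

Each example contributes log p if its label is 1 and log(1 - p) otherwise to the likelihood, and the weight update simply moves each weight in proportion to the prediction error times the feature value.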

mahout logistic regression

2013-06-28 Thread qiaoresearcher
The logistic regression code is difficult to follow, in particular the trainlogistic and runlogistic parts: how the likelihood is calculated, how the weights are updated, etc. Does anyone know who wrote the Mahout logistic regression code? What are the references on the logistic regression algorithm he was using to

mahout random forest variable importance implementation

2013-06-12 Thread qiaoresearcher
The current Mahout does not have variable importance for random forests. Variable importance, especially the permutation variant, is trivial to implement locally, but how can it be done with MapReduce? A mapper only sees one record at a time, but the permutation needs to be done across all the samples of
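
As an illustration of the local (single-machine) case the post calls trivial, here is a minimal sketch of permutation importance. It is not Mahout code and does not address the MapReduce question; it assumes an already trained model exposed as a `predict(row)` function and a held-out data set that fits in memory, and the names `accuracy` and `permutation_importance` are made up for the example.

```python
import random


def accuracy(predict, rows, labels):
    # Fraction of rows whose predicted label matches the true label.
    correct = sum(1 for r, y in zip(rows, labels) if predict(r) == y)
    return correct / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=5, seed=0):
    # For each feature, shuffle that column across all rows and record
    # the average drop in accuracy relative to the unshuffled baseline.
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model never uses gets an importance of exactly zero, because shuffling it cannot change any prediction. The need to shuffle a whole column at once is what makes the record-at-a-time mapper model awkward.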

how to run mahout examples from eclipse in linux?

2013-05-22 Thread qiaoresearcher
Hi all, Assume we want to run mahout examples like: $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/core/target/mahout-core-VERSION-job.jar org.apache.mahout.classifier.df.tools.Describe -p testdata/KDDTrain+.arff -f testdata/KDDTrain+.info -d N 3 C 2 N C 4 N C 8 N 2 C 19 N L it works well in command

Random Forest implementation in mahout

2013-04-25 Thread qiaoresearcher
I just ran the RF examples, non-distributed version: BreimanExample with the glass data, 10 iterations with 100 trees. Here is the unexpected output: 13/04/25 15:38:40 INFO df.BreimanExample: 13/04/25 15:38:40 INFO df.BreimanExample: Random Input Test

Re: need help on mahout

2012-11-09 Thread qiaoresearcher
this data? You are trying to classify users into what, for what purpose? On Fri, Nov 9, 2012 at 4:20 PM, qiaoresearcher qiaoresearc...@gmail.com wrote: Hi All, Assume the data is stored in a gzip file which includes many text files. Within each text file, each line represents an activity

Re: need help on mahout

2012-11-09 Thread qiaoresearcher
of the universe. On Nov 9, 2012 8:43 AM, qiaoresearcher qiaoresearc...@gmail.com wrote: It is a supervised classification problem. For example, a very simple case: say, overall we collect 4 pages from the data set: { web_page 1 web_page 2 web_page 3 web_page 4 } then users may have input

Re: need help on mahout

2012-11-09 Thread qiaoresearcher
labeled dataset to the rest of the universe. On Nov 9, 2012 8:43 AM, qiaoresearcher qiaoresearc...@gmail.com wrote: It is a supervised classification problem. For example, a very simple case: say, overall we collect 4 pages from the data set: { web_page 1 web_page 2 web_page 3

Re: need help on mahout

2012-11-09 Thread qiaoresearcher
, qiaoresearcher qiaoresearc...@gmail.com wrote: It is a supervised classification problem. For example, a very simple case: say, overall we collect 4 pages from the data set: { web_page 1 web_page 2 web_page 3 web_page 4 } then users may have input vectors like: user1 [1 1 0 0