Sam,

Per Ted's email below, please run with trunk for your work. Please look at 
Chapters 13-16 of the Mahout in Action book for sample code snippets for 
classifying 20 newsgroups with SGD.  As far as I know (I could be wrong), 
there is presently no command-line option for running the 20 newsgroups 
example with SGD.
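
If it helps to see the idea before opening the book, here is a minimal sketch 
in plain Java of what SGD classification over hashed word features does. It 
is not Mahout's API and not the book's code: the class name, the feature 
vector size, the learning rate and the toy two-class data are all assumptions 
made for illustration, and it handles only two classes where 20 newsgroups 
has twenty.

```java
// Illustrative only: a tiny two-class logistic regression trained by SGD
// on hashed bag-of-words features. Names and constants are hypothetical.
class HashedSgdSketch {
    static final int NUM_FEATURES = 1 << 12; // assumed vector size
    final double[] weights = new double[NUM_FEATURES];
    final double learningRate;

    HashedSgdSketch(double learningRate) {
        this.learningRate = learningRate;
    }

    // Hash each lowercased token into a fixed-size count vector.
    static int[] encode(String text) {
        int[] counts = new int[NUM_FEATURES];
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            counts[Math.floorMod(token.hashCode(), NUM_FEATURES)]++;
        }
        return counts;
    }

    // Probability that the document belongs to class 1.
    double predict(int[] x) {
        double z = 0.0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] != 0) z += weights[i] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // One SGD step on a single example (label is 0 or 1).
    void train(int[] x, int label) {
        double error = label - predict(x);
        for (int i = 0; i < x.length; i++) {
            if (x[i] != 0) weights[i] += learningRate * error * x[i];
        }
    }

    // Train on a made-up four-document corpus and return the model.
    static HashedSgdSketch demo() {
        String[] docs = {
            "rocket reached orbit", "goalie saved puck",
            "nasa launch rocket orbit", "hockey team puck goalie"
        };
        int[] labels = {1, 0, 1, 0}; // 1 = space, 0 = hockey
        HashedSgdSketch model = new HashedSgdSketch(0.5);
        for (int epoch = 0; epoch < 20; epoch++) {
            for (int i = 0; i < docs.length; i++) {
                model.train(encode(docs[i]), labels[i]);
            }
        }
        return model;
    }

    public static void main(String[] args) {
        HashedSgdSketch model = demo();
        System.out.println("P(space | rocket orbit) = " + model.predict(encode("rocket orbit")));
        System.out.println("P(space | puck goalie)  = " + model.predict(encode("puck goalie")));
    }
}
```

The point of hashing is that the model never needs a dictionary pass over the 
data, so each document can be encoded and learned from as it is read.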

The only command-line tools for SGD, trainlogistic and runlogistic, expect the 
input files to be in CSV format, which is not what you have.
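
If you want to try those tools anyway, one option is to flatten your data 
into CSV first. Below is a minimal sketch, assuming your data is laid out 
like 20 newsgroups (one directory per category, one file per message). The 
class name, the label,text column layout and the ISO-8859-1 decoding are my 
assumptions, and I have not checked how well trainlogistic copes with a 
free-text column.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: flattens root/<category>/<message file> into a
// two-column CSV. The layout and column names are assumptions.
class NewsgroupsToCsv {

    // Collapse whitespace runs (including newlines) to single spaces,
    // double any embedded quotes, and wrap the field in quotes.
    static String csvField(String raw) {
        String flat = raw.replaceAll("\\s+", " ").trim();
        return "\"" + flat.replace("\"", "\"\"") + "\"";
    }

    static void convert(Path root, Path out) throws IOException {
        try (PrintWriter w = new PrintWriter(
                 Files.newBufferedWriter(out, StandardCharsets.UTF_8));
             DirectoryStream<Path> categories = Files.newDirectoryStream(root)) {
            w.println("label,text");
            for (Path category : categories) {
                if (!Files.isDirectory(category)) continue;
                String label = category.getFileName().toString();
                try (DirectoryStream<Path> docs = Files.newDirectoryStream(category)) {
                    for (Path doc : docs) {
                        if (!Files.isRegularFile(doc)) continue;
                        // ISO-8859-1 maps every byte to a character, so
                        // decoding cannot fail on odd message bytes.
                        String body = new String(
                            Files.readAllBytes(doc), StandardCharsets.ISO_8859_1);
                        w.println(csvField(label) + "," + csvField(body));
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: NewsgroupsToCsv <input-dir> <output.csv>");
            return;
        }
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Each message becomes one row, with the directory name as its label.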

I have a sample program for classifying datasets (similar to the format you 
have) using SGD, which I can share with you later today.


Regards,
Suneel



________________________________
 From: Ted Dunning <ted.dunn...@gmail.com>
To: user@mahout.apache.org 
Sent: Saturday, December 10, 2011 3:20 AM
Subject: Re: PLEASE HELP! - MAHOUT CLASSIFICATION
 
a) run with trunk

b) see https://github.com/tdunning/Chapter-16

c) also see org.apache.mahout.classifier.sgd.TrainNewsGroups

Your training data is tiny.  The bayes classifiers are designed for large
data.  Poor results are not very surprising at this data size.

On Fri, Dec 9, 2011 at 8:03 PM, Sam Cunningham <sam_cun...@yahoo.com> wrote:

> I am running Mahout distribution v0.5. Though, I am not sure what difference
> would that make? I ran my dataset with bayes/cbayes only. I don't have any
> sample code for SGD or its command option. Is there any SGD example for
> 20news dataset so that I can follow (for training and testing)?
>
