Ted,
I've added the patch MAHOUT-509_1.patch in Jira
[https://issues.apache.org/jira/browse/MAHOUT-509].
Thank you
On Thu, Oct 7, 2010 at 12:57 PM, Ted Dunning wrote:
> Can you attach the patch there? The mailing list strips attachments.
>
> On Wed, Oct 6, 2010 at 9:22 PM, Gangadhar Nittala
Can you attach the patch there? The mailing list strips attachments.
On Wed, Oct 6, 2010 at 9:22 PM, Gangadhar Nittala
wrote:
> I have attached a patch which has the modified testclassifier.props
> and the parseInt fix. I think both of these belong to
> MAHOUT-509
>
Joe / others,
I was finally able to test the changes that were done as part of
MAHOUT-509 [https://issues.apache.org/jira/browse/MAHOUT-509] and
follow the instructions in the wiki for the Bayes example
[https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example].
The instruction
Joe,
I am out of town for this week and won't have access to my machine. I
will check this during the weekend and will get back to you. Will
follow the steps in the wiki.
Thank you
On Fri, Sep 24, 2010 at 8:44 AM, Joe Kumar wrote:
> Hi Gangadhar,
>
> I ran TestClassifier with similar parameters.
Hi Gangadhar,
I ran TestClassifier with similar parameters. It didn't take me 2 hrs, though.
I have documented the steps that worked for me at
https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example
Can you please get the patch available at MAHOUT-509, apply it, and then
try th
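For anyone following along: applying a JIRA patch from a source checkout
generally looks like the steps below. The download step, patch level, and
build flags are assumptions, not steps spelled out in this thread.

# download MAHOUT-509_1.patch from the JIRA issue page first
cd $MAHOUT_HOME
patch -p0 < MAHOUT-509_1.patch    # use -p1 if the patch paths have a leading prefix
mvn -DskipTests clean install     # rebuild so the examples job picks up the fix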
Joe,
Can you let me know what command you used to test the
classifier? With the ngrams set to 1 as suggested by Robin, I was
able to train the classifier. The command:
$HADOOP_HOME/bin/hadoop jar
$MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
org.apache.mahout.classifier.bay
There is a test program called TrainNewsGroups
in org.apache.mahout.classifier.sgd in the examples module.
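A hypothetical way to launch that test program, assuming the standard
bin/mahout launcher and a local copy of the 20 newsgroups training data
(neither the launcher usage nor the argument is confirmed in this thread):

# the single training-directory argument is an assumption
$MAHOUT_HOME/bin/mahout org.apache.mahout.classifier.sgd.TrainNewsGroups \
  /path/to/20news-bydate-train/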
I would love to work with you to get better documentation pulled together.
On Mon, Sep 20, 2010 at 8:13 PM, Gangadhar Nittala
wrote:
> Joe,
> I will try with the ngram setting of 1 and let
Joe,
I will try with the ngram setting of 1 and let you know how it goes.
Robin, the ngram parameter is used to check the number of subsequences
of characters, isn't it? Or is it evaluated differently w.r.t. the
Bayesian classifier?
Ted, like Joe mentioned, if you could point us to some informa
Robin / Gangadhar,
With ngram as 1 and all the countries in country.txt, the model is
getting created without any issues.
$MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
org.apache.mahout.classifier.bayes.TrainClassifier -ng 1 -i wikipediainput
-o wikipediamodel -type bayes -sour
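The command above is cut off at the data source flag; a complete invocation
consistent with the options quoted elsewhere in this thread would presumably
be (the trailing -source value is an assumption):

$HADOOP_HOME/bin/hadoop jar \
  $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job \
  org.apache.mahout.classifier.bayes.TrainClassifier \
  -ng 1 -i wikipediainput -o wikipediamodel -type bayes -source hdfs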
Robin,
Thanks for your tip.
Will try it out and post updates.
reg,
Joe.
On Mon, Sep 20, 2010 at 6:31 AM, Robin Anil wrote:
> Hi Guys, sorry about not replying. I see two possible problems. 1st, you
> need at least 2 countries, otherwise there is no classification. Secondly,
> ngram = 3 is a bit t
Hi Guys, sorry about not replying. I see two possible problems. 1st, you
need at least 2 countries, otherwise there is no classification. Secondly,
ngram = 3 is a bit too high. With Wikipedia this will result in a huge
number of features. Why don't you try with one and see?
Robin
On Mon, Sep 20, 201
Hi Ted,
Sure, will keep digging.
About SGD, I don't have an idea of how it works et al. If there is some
documentation / reference / quick summary to read about it, that'll be
great. Just saw one reference in
https://cwiki.apache.org/confluence/display/MAHOUT/Logistic+Regression.
I am assuming w
I don't know if it's related, but I remember getting a similar
Exception one year ago when I was working on the implementation of
Random Forests. In my case it was caused by
SequenceFile.Sorter.merge(). I ended up writing my own merge function
because I really didn't need to sort the output.
On M
Gangadhar,
Just to eliminate the usual suspects: I am using Mac OS X 10.5.8, Mahout 0.4
(revision 986659), Hadoop 0.20.2, 2GB Mem for Hadoop, 80 GB free space.
I had issues with my namenode and so did a format using hadoop namenode
-format.
Commands that I executed:
$MAHOUT_HOME/examples/src/test/
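The truncated line above points at the examples test resources; the
dataset-creation step being referenced presumably resembles the following
(the package and flag names are assumptions based on the driver class named
later in the thread, not commands confirmed here):

$HADOOP_HOME/bin/hadoop jar \
  $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job \
  org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver \
  -i wikipedia/chunks -o wikipediainput \
  -c $MAHOUT_HOME/examples/src/test/resources/country.txt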
I am watching these efforts with interest, but have been unable to
contribute much to the process. I would encourage Joe and others to keep
whittling this problem down so that we can understand what is causing it.
In the meantime, I think that the SGD classifiers are close to production
quality.
Joe,
I too tried reducing the number of countries in country.txt.
That didn't help. And in my case, I was monitoring the disk space and
at no time did it reach 0%, so I am not sure that is the cause. To
remove the dependency on the number of countries, I even tried with
the subjects.tx
Gangadhar,
I modified $MAHOUT_HOME/examples/src/test/resources/country.txt to just have
1 entry (spain) and used WikipediaDatasetCreatorDriver to create the
wikipediainput data set and then ran TrainClassifier, and it worked. When I
ran TestClassifier as below, I got blank results in the output.
$
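The TestClassifier command is cut off above; an invocation along the lines
being discussed might look like this (the package, the -m/-d/-source/-method
flags, and their values are assumptions; only the class name and the
model/input directory names appear in the thread):

$HADOOP_HOME/bin/hadoop jar \
  $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job \
  org.apache.mahout.classifier.bayes.TestClassifier \
  -m wikipediamodel -d wikipediainput -type bayes -ng 1 \
  -source hdfs -method mapreduce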
Gangadhar,
After running TrainClassifier again, the map task failed with the same
exception, and I am pretty sure it is a disk space issue.
As the map progressed, I watched my free disk space drop from 81GB. It
came down to 0 almost 66% of the way through the map task, and then
Joe,
I don't think disk space is the problem, because I did
have enough disk space (well, not 81GB, but around 40GB free). I
will try the suggestions in the thread you mentioned and see if they
make any difference. Will keep you posted.
Thank you
On Fri, Sep 17, 2010 at 11:33 PM, Joe Kuma
Gangadhar,
I couldn't find any concrete reason behind this error. Some have
reported it happening only sporadically. As per some suggestions in this
thread (
http://www.mail-archive.com/core-u...@hadoop.apache.org/msg09250.html) , I
have changed the location of hadoop tmp dir. Also I have cle
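For reference, on Hadoop 0.20.x the tmp dir relocation described above is
done in conf/core-site.xml; the value below is a placeholder, not the path
used in this thread:

<!-- conf/core-site.xml: point hadoop.tmp.dir at a disk with enough space -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop-tmp</value>
</property>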
Thank you Joe for the confirmation. I am also checking the code to see
what is causing this issue. Maybe others on the list will know what
can cause it. I am guessing the root cause is not Mahout but
something in Hadoop.
On Thu, Sep 16, 2010 at 11:34 PM, Joe Kumar wrote:
> Gangadhar,
>
>
Gangadhar,
After some system issues, I finally ran the TrainClassifier. Almost
65% into the map job, I got the same error that you mentioned.
INFO mapred.JobClient: Task Id : attempt_201009160819_0002_m_00_0,
Status : FAILED
org.apache.hadoop.util.DiskChecker$DiskErrorException: Cou
Hi Gangadhar,
Right. I did the same to execute the TrainClassifier, but since the
default datasource is hdfs, we should not be mandated to provide this
parameter.
I haven't finished executing the TrainClassifier yet. I'll do it tonight and
let you know if I run into trouble.
reg,
Joe.
On Wed,
I ran into the issue that Joe mentioned about the command line
parameters. I just added the datasource to the command line and executed
thus:
$HADOOP_HOME/bin/hadoop jar
$MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
org.apache.mahout.classifier.bayes.TrainClassifier --gramSize 3
--inp
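Spelled out with long option names, the same command would presumably read
as follows (--classifierType and --dataSource appear in the help text quoted
in this thread; the --input/--output long forms are assumptions):

$HADOOP_HOME/bin/hadoop jar \
  $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job \
  org.apache.mahout.classifier.bayes.TrainClassifier \
  --gramSize 3 --input wikipediainput --output wikipediamodel \
  --classifierType bayes --dataSource hdfs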
Robin,
Sure, I'll submit a patch.
The command line flag already has the default behavior specified.
--classifierType (-type) classifierType   Type of classifier: bayes|cbayes.
                                          Default: bayes
--dataSource (-source) dataSource         Location of
On Wed, Sep 15, 2010 at 10:26 AM, Joe Kumar wrote:
> Hi all,
>
> As I was going through wikipedia example, I encountered a situation with
> TrainClassifier wherein some of the options with default values are
> actually
> mandatory.
> The documentation / command line help says that
>
> 1. defaul
Hi all,
As I was going through wikipedia example, I encountered a situation with
TrainClassifier wherein some of the options with default values are actually
mandatory.
The documentation / command line help says that
1. default source (--datasource) is hdfs but TrainClassifier
has withRequi