Also, right before the screen dump I see:

13/04/11 15:46:40 INFO mapred.JobClient:     Combine output records=462236
13/04/11 15:46:40 INFO mapred.JobClient:     Physical memory (bytes) snapshot=1618497536
13/04/11 15:46:40 INFO mapred.JobClient:     Reduce output records=419058
13/04/11 15:46:40 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=4697526272
13/04/11 15:46:40 INFO mapred.JobClient:     Map output records=702535
13/04/11 15:46:40 INFO cbayes.CBayesDriver: Calculating Tf-Idf...
13/04/11 15:46:41 INFO common.BayesTfIdfDriver: Counts of documents in Each Label
13/04/11 15:46:42 INFO common.BayesTfIdfDriver: {ANGER_RAGE  family's
personal fucking bank.=1.0, ANGER_RAGE give up life...=1.0, ANGER_RAGE
understand peopleS=1.0, ANGER_RAGE many episodes record day?5=1.0,
ANGER_RAGE! need punching bag take out angerC=1.0, ANGER_RAGE& right
now�� insults make laugh.A=1.0, ANGER_RAGEunny a
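
For a sense of scale, the two numbers in the exception quoted below can be compared directly. This is only a rough sketch: the error string is copied from the stack trace, while the function name parse_jobconf_error and the MiB conversion are illustrative choices, not part of Hadoop or Mahout.

```python
import re

# Error text copied from the quoted stack trace (joined onto one line).
ERROR = "java.io.IOException: Exceeded max jobconf size: 10706309 limit: 5242880"

MIB = 1024 * 1024


def parse_jobconf_error(message):
    """Return (size_bytes, limit_bytes) from the error text, or None if absent.

    Hypothetical helper for illustration only. The \\s+ also tolerates the
    line break that mail wrapping inserts before "limit:".
    """
    match = re.search(r"Exceeded max jobconf size: (\d+)\s+limit: (\d+)", message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


size, limit = parse_jobconf_error(ERROR)
print(f"jobconf size: {size / MIB:.2f} MiB")
print(f"limit:        {limit / MIB:.2f} MiB")
print(f"ratio:        {size / limit:.2f}x")
```

The limit works out to exactly 5 MiB, and the submitted job configuration is roughly twice that.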

On Thu, Apr 11, 2013 at 3:58 PM, Ryan Compton <compton.r...@gmail.com> wrote:
> I'm trying to train a simple text classifier using cbayes. I've got
> formatted <Text,Text> sequence files created with
> com.twitter.elephantbird.pig.store.SequenceFileStorage(), e.g.:
>
> JOY      actually turning decent new year ☺
> JOY      best New Years tonight! ready 2013. 😉 🎊🎉
> JOY      playing Dream League Soccer iPad 2 earned 13 coins!
> JOY      Great way start new ear
> JOY      good sober New Years Eve
> ANGER_RAGE       Last night frank hasn't done revision prelims
> ANGER_RAGE       hell cut forehead such ball ache! Cheers pleb chucks glass bottles around!
> ANGER_RAGE       shops open today customer services shut apparently being paid "come back tomorrow".
>
> These are stored in a directory as:
> /emotion-training-labeled/part-m-0000*
>
> I pass the labeled data into cbayes:
>
> mahout trainclassifier -i /emotion-training-labeled/ -o emotion-model/ -type cbayes -ng 1 -source hdfs
>
> Both map and reduce get to 100%, then I see something about Tf-Idf,
> followed by what looks like a complete dump of my training data printed
> to the screen for the next few minutes, and then a stack trace:
>
> rything life teach lesson, willing observe learn.” YUP!GJOYB Halbrecht
> DAN CASTAIC CA found local Videographer. Register FREE:"JOY Palm Read
> Easy Created WorldJOY=1.0, ANGER_RAGE people fisty latelyK=1.0,
> ANGER_RAGE ew gon lot em ��=1.0, ANGER_RAGE ain't gonna love =1.0}
> 13/04/11 15:46:51 INFO common.BayesTfIdfDriver: {dataSource=hdfs, alpha_i=1.0, minDf=1, gramSize=1}
> 13/04/11 15:46:51 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 13/04/11 15:46:57 INFO mapred.FileInputFormat: Total input paths to process : 3
> 13/04/11 15:46:58 INFO mapred.JobClient: Cleaning up the staging area hdfs://master/user/rfcompton/.staging/job_201303271312_2786
> 13/04/11 15:46:58 ERROR security.UserGroupInformation: PriviledgedActionException as:rfcompton (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.IOException: Exceeded max jobconf size: 10706309 limit: 5242880
>         at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3766)
>         at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
> Caused by: java.io.IOException: Exceeded max jobconf size: 10706309 limit: 5242880
>         at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:406)
>         at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3764)
>         ... 10 more
>
> Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.IOException: Exceeded max jobconf size: 10706309 limit: 5242880
>         at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3766)
>         at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
> Caused by: java.io.IOException: Exceeded max jobconf size: 10706309 limit: 5242880
>         at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:406)
>         at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3764)
>         ... 10 more
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:1107)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
>         at org.apache.hadoop.mapred.$Proxy1.submitJob(Unknown Source)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:904)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1242)
>         at org.apache.mahout.classifier.bayes.mapreduce.common.BayesTfIdfDriver.runJob(BayesTfIdfDriver.java:97)
>         at org.apache.mahout.classifier.bayes.mapreduce.cbayes.CBayesDriver.runJob(CBayesDriver.java:51)
>         at org.apache.mahout.classifier.bayes.TrainClassifier.trainCNaiveBayes(TrainClassifier.java:58)
>         at org.apache.mahout.classifier.bayes.TrainClassifier.main(TrainClassifier.java:151)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
