Kevin, while this is fresh in your mind, can you prepare a javadoc patch that would have helped you out? And suggest other doc patches as well?
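A javadoc patch of the kind asked for here could state the input format explicitly. The sketch below only illustrates that idea and is not Mahout code. LabeledRow and parse are hypothetical names. It models the key/value pair Sebastian describes in the quoted thread (class label as Text, features as VectorWritable) with a plain String and double[], so it compiles with the JDK alone.

```java
import java.util.Arrays;

/**
 * One training example for Naive Bayes, parsed from a CSV row of the form
 * "label,feature1,feature2,...,featureN".
 *
 * <p>Hypothetical helper for illustration only. In the SequenceFile read by
 * Mahout's Naive Bayes trainer, the label would be the key (as
 * org.apache.hadoop.io.Text) and the features the value (as
 * org.apache.mahout.math.VectorWritable). Plain Java types are used here so
 * the sketch compiles with the JDK alone.
 */
final class LabeledRow {
    final String label;
    final double[] features;

    LabeledRow(String label, double[] features) {
        this.label = label;
        this.features = features;
    }

    /**
     * Parses one CSV row.
     *
     * @param line             a row such as "yes,0.5,3,1.25"
     * @param expectedFeatures number of feature columns after the label
     * @throws IllegalArgumentException if the column count is wrong or a
     *         feature is not a number
     */
    static LabeledRow parse(String line, int expectedFeatures) {
        // limit -1 keeps trailing empty cells, so the column count
        // reflects every comma in the row
        String[] cells = line.split(",", -1);
        if (cells.length != expectedFeatures + 1) {
            throw new IllegalArgumentException("expected " + (expectedFeatures + 1)
                + " columns but got " + cells.length);
        }
        double[] features = new double[expectedFeatures];
        for (int i = 0; i < expectedFeatures; i++) {
            // NumberFormatException is a subclass of IllegalArgumentException
            features[i] = Double.parseDouble(cells[i + 1].trim());
        }
        return new LabeledRow(cells[0].trim(), features);
    }

    public static void main(String[] args) {
        LabeledRow row = parse("yes, 0.5, 3, 1.25", 3);
        System.out.println(row.label + " -> " + Arrays.toString(row.features));
    }
}
```

For Kevin's layout (one label column followed by 1628 features), expectedFeatures would be 1628. Rejecting rows with the wrong column count at parse time means a malformed row fails during conversion instead of producing a vector of unexpected size.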
On Mon, Feb 24, 2014 at 3:00 AM, Kevin Moulart <kevinmoul...@gmail.com> wrote:

> Thanks, that's about the clearest answer I got so far :)
>
> 2014-02-24 11:59 GMT+01:00 Sebastian Schelter <s...@apache.org>:
>
> > NaiveBayes expects a SequenceFile as input. The key is the class label as
> > Text, the value is the feature vector as VectorWritable.
> >
> > --sebastian
> >
> > On 02/24/2014 11:51 AM, Kevin Moulart wrote:
> >
> >> Hi again,
> >> I finally decided to go through Java to make a sequence file for Naive
> >> Bayes, but I still can't find anywhere that states exactly what the
> >> sequence file should contain for Mahout to process it with Naive Bayes.
> >>
> >> I tried virtually every piece of code I found related to this subject,
> >> with no luck.
> >>
> >> My CSV file is like this:
> >> Label that I want to predict, feature 1, feature 2, ..., feature 1628
> >>
> >> Could someone tell me exactly what the Naive Bayes training procedure
> >> expects?
> >>
> >> 2014-02-20 13:56 GMT+01:00 Jay Vyas <jayunit...@gmail.com>:
> >>
> >>> This relates to a previous question I had: does Mahout have a concept
> >>> of adapters that let us read CSV-style data, with filters, to create
> >>> the exact format for its various inputs (e.g. the recommender
> >>> three-column format)? If not, is it worth a JIRA?
> >>>
> >>> On Feb 20, 2014, at 7:50 AM, Kevin Moulart <kevinmoul...@gmail.com> wrote:
> >>>
> >>>> Hi and thanks!
> >>>>
> >>>> What about the command line, is there a way to do that using the
> >>>> existing command line tools?
> >>>>
> >>>> 2014-02-20 12:02 GMT+01:00 Suneel Marthi <suneel_mar...@yahoo.com>:
> >>>>
> >>>>> To convert input CSV to vectors, you can either:
> >>>>>
> >>>>> a) use CSVIterator
> >>>>> b) use InputDriver
> >>>>>
> >>>>> Either of the above should generate vectors from input CSV that could
> >>>>> then be fed into Mahout classifier/clustering jobs.
> >>>>>
> >>>>> On Thursday, February 20, 2014 5:57 AM, Kevin Moulart <kevinmoul...@gmail.com> wrote:
> >>>>>
> >>>>> Hi, I'm trying to apply a Naive Bayes classifier to a large CSV file
> >>>>> from the command line.
> >>>>>
> >>>>> I know I have to feed the classifier with a seq file, so I tried to
> >>>>> put my CSV into one using the seqdirectory command, but even with a
> >>>>> really small CSV (less than 100 MB) I instantly get an
> >>>>> OutOfMemoryError (Java heap space):
> >>>>>
> >>>>>> mahout seqdirectory -i "/user/cacf/Echant/testSeq" -o "/user/cacf/resSeq" -ow
> >>>>>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> >>>>>> Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
> >>>>>> MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
> >>>>>> 14/02/20 11:47:22 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/user/cacf/Echant/testSeq], --keyPrefix=[], --output=[/user/cacf/resSeq], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
> >>>>>> 14/02/20 11:47:22 INFO common.HadoopUtil: Deleting /user/cacf/resSeq
> >>>>>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> >>>>>> 	at java.util.Arrays.copyOf(Arrays.java:2367)
> >>>>>> 	at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
> >>>>>> 	at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
> >>>>>> 	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
> >>>>>> 	at java.lang.StringBuilder.append(StringBuilder.java:132)
> >>>>>> 	at org.apache.mahout.text.PrefixAdditionFilter.process(PrefixAdditionFilter.java:62)
> >>>>>> 	at org.apache.mahout.text.SequenceFilesFromDirectoryFilter.accept(SequenceFilesFromDirectoryFilter.java:90)
> >>>>>> 	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1468)
> >>>>>> 	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1502)
> >>>>>> 	at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:98)
> >>>>>> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >>>>>> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> >>>>>> 	at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:53)
> >>>>>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>>>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >>>>>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>>>>> 	at java.lang.reflect.Method.invoke(Method.java:606)
> >>>>>> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> >>>>>> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
> >>>>>> 	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:196)
> >>>>>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>>>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >>>>>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>>>>> 	at java.lang.reflect.Method.invoke(Method.java:606)
> >>>>>> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> >>>>>
> >>>>> Do you have an idea or a simple way to use Naive Bayes against my
> >>>>> large CSV?
> >>>>>
> >>>>> Thanks in advance!
> >>>>> --
> >>>>> Kévin Moulart
> >>>>> Mobile (France): +33 7 81 06 10 10
> >>>>> Mobile (Belgium): +32 473 85 23 85
> >>>>> Landline: +32 2 771 88 45
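On the original seqdirectory error: the stack trace fails inside PrefixAdditionFilter.process while appending to a StringBuilder. That suggests each input file is accumulated into a single string before it is written out, so the heap caps the usable file size. Converting the CSV one line at a time avoids this, because only the current row is held in memory. The sketch below is hypothetical. StreamingCsvConverter, convert and the sink callback are made-up names, and it uses only the JDK. With Hadoop and Mahout on the classpath, the sink is where each row could be appended to a SequenceFile.Writer as a Text key and a VectorWritable value. That wiring is an assumption to check against the Mahout version in use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.BiConsumer;

/**
 * Hypothetical row-at-a-time CSV converter: only the current line is held in memory.
 *
 * <p>Assumption, not exercised here: with Mahout available, the sink could call
 * writer.append(new Text(label), new VectorWritable(new DenseVector(features)))
 * on an org.apache.hadoop.io.SequenceFile.Writer.
 */
final class StreamingCsvConverter {

    /**
     * Reads rows of the form "label,f1,...,fn" and hands each (label, features)
     * pair to the sink. Blank lines are skipped. Returns the number of rows emitted.
     */
    static long convert(Reader in, BiConsumer<String, double[]> sink) throws IOException {
        long rows = 0;
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // e.g. an empty last line at the end of the file
                }
                String[] cells = line.split(",");
                double[] features = new double[cells.length - 1];
                for (int i = 1; i < cells.length; i++) {
                    features[i - 1] = Double.parseDouble(cells[i].trim());
                }
                sink.accept(cells[0].trim(), features);
                rows++;
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String csv = "yes,1,0,2\nno,0,1,0\n\n";
        long n = convert(new StringReader(csv),
            (label, features) -> System.out.println(label + ": " + features.length + " features"));
        System.out.println(n + " rows converted");
    }
}
```

Taking a Reader rather than a file path keeps the loop independent of where the CSV lives. A FileReader, an HDFS input stream wrapped in an InputStreamReader, or a StringReader in a test all work the same way.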