Wikipedia Example has incorrect input Key
-----------------------------------------

                 Key: MAHOUT-91
                 URL: https://issues.apache.org/jira/browse/MAHOUT-91
             Project: Mahout
          Issue Type: Bug
          Components: Classification
            Reporter: Grant Ingersoll
            Assignee: Grant Ingersoll
            Priority: Minor
             Fix For: 0.1


Running the WikipediaDatasetCreator
{code}
bin/hadoop jar ~/projects/lucene/mahout/mahout-clean/examples/build/ \
  org.apache.mahout.examples.classifiers.cbayes.WikipediaDatasetCreator \
  -i wikipediadump -o wikipediainput \
  -c ~/projects/lucene/mahout/mahout-clean/examples/src/test/resources/country.txt
{code}

yielded:
08/10/31 11:15:26 INFO mapred.JobClient: Task Id : attempt_200810301619_0001_m_000000_0, Status : FAILED
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
        at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorMapper.map(WikipediaDatasetCreatorMapper.java:41)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)


The fix is to declare the mapper's input key as LongWritable instead of Text:
{code}
Index: src/main/java/org/apache/mahout/classifier/bayes/WikipediaDatasetCreatorMapper.java
===================================================================
--- src/main/java/org/apache/mahout/classifier/bayes/WikipediaDatasetCreatorMapper.java	(revision 709230)
+++ src/main/java/org/apache/mahout/classifier/bayes/WikipediaDatasetCreatorMapper.java	(working copy)
@@ -20,6 +20,7 @@
 import org.apache.commons.lang.StringEscapeUtils;
 import org.apache.hadoop.io.DefaultStringifier;
 import org.apache.hadoop.io.Text;
+import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.mapred.JobConf;
 import org.apache.hadoop.mapred.MapReduceBase;
 import org.apache.hadoop.mapred.Mapper;
@@ -39,11 +40,11 @@
 import java.util.Set;
 
 public class WikipediaDatasetCreatorMapper extends MapReduceBase implements
-    Mapper<Text, Text, Text, Text> {
+    Mapper<LongWritable, Text, Text, Text> {
 
   private static Set<String> countries = null;
   
-  public void map(Text key, Text value,
+  public void map(LongWritable key, Text value,
       OutputCollector<Text, Text> output, Reporter reporter)
       throws IOException {
     String document = value.toString();
{code}
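
For readers wondering why a wrong type parameter compiles cleanly and only fails inside the task, here is a minimal sketch of the mechanism. It does not use Hadoop: OffsetKey, LineText, SimpleMapper and KeyTypeSketch are hypothetical stand-ins for LongWritable, Text, Mapper and the job runner. The assumption, which the report does not state outright, is that the job's input format hands the mapper LongWritable keys (as Hadoop's TextInputFormat does, using byte offsets), and that the framework calls the mapper through erased generics, so the cast to the declared key type happens only at call time.

```java
class KeyTypeSketch {
  // Declares the key as LineText, like the pre-patch Mapper<Text, Text, Text, Text>.
  static class WrongKeyMapper implements SimpleMapper<LineText, LineText> {
    public String map(LineText key, LineText value) { return value.value; }
  }

  // Declares the key as OffsetKey, like the patched Mapper<LongWritable, Text, Text, Text>.
  static class FixedMapper implements SimpleMapper<OffsetKey, LineText> {
    public String map(OffsetKey key, LineText value) { return value.value; }
  }

  // Plays the role of a framework that only knows the mapper as a raw type
  // and always feeds it (offset, line) records.
  @SuppressWarnings({"rawtypes", "unchecked"})
  static String runRecord(SimpleMapper mapper, long offset, String line) {
    try {
      // The compiler-generated bridge method casts the key to the declared type here.
      return mapper.map(new OffsetKey(offset), new LineText(line));
    } catch (ClassCastException e) {
      return "ClassCastException";
    }
  }

  public static void main(String[] args) {
    System.out.println(runRecord(new WrongKeyMapper(), 0L, "a line"));
    System.out.println(runRecord(new FixedMapper(), 0L, "a line"));
  }
}

// Hypothetical stand-in for LongWritable (a byte offset into the input).
final class OffsetKey {
  final long offset;
  OffsetKey(long offset) { this.offset = offset; }
}

// Hypothetical stand-in for Text (one line of input).
final class LineText {
  final String value;
  LineText(String value) { this.value = value; }
}

// Simplified, hypothetical analogue of a generic mapper interface.
interface SimpleMapper<K, V> {
  String map(K key, V value);
}
```

In this sketch the mapper with the mismatched key type fails on its first record, while the one whose declared key type matches what the runner supplies processes the line normally. That mirrors the one-line change to the Mapper type parameters and the map signature in the patch.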
