[ https://issues.apache.org/jira/browse/GORA-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney resolved GORA-476.
---------------------------------------
    Resolution: Fixed

Correct [~alfonso.nishikawa]. I've removed all references to the AvroStore and DataFileAvroStore due to the issues you've highlighted. This issue is therefore being marked as resolved. Thanks for chiming in.

> Nutch 2.X GeneratorJob creates NullPointerException when using DataFileAvroStore
> --------------------------------------------------------------------------------
>
>                 Key: GORA-476
>                 URL: https://issues.apache.org/jira/browse/GORA-476
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: avro, gora-core
>    Affects Versions: 0.6.1
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 0.9
>
>
> When running the Nutch 2.X GeneratorJob I get the following
> {code}
> 2016-05-12 17:27:30,191 INFO crawl.GeneratorJob - GeneratorJob: starting
> 2016-05-12 17:27:30,191 INFO crawl.GeneratorJob - GeneratorJob: filtering: false
> 2016-05-12 17:27:30,191 INFO crawl.GeneratorJob - GeneratorJob: normalizing: false
> 2016-05-12 17:27:30,191 INFO crawl.GeneratorJob - GeneratorJob: topN: 50000
> 2016-05-12 17:27:30,319 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2016-05-12 17:27:30,333 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
> 2016-05-12 17:27:30,334 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
> 2016-05-12 17:27:30,334 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
> 2016-05-12 17:27:31,012 WARN conf.Configuration - file:/tmp/hadoop-lmcgibbn/mapred/staging/lmcgibbn997854508/.staging/job_local997854508_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
> 2016-05-12 17:27:31,014 WARN conf.Configuration - file:/tmp/hadoop-lmcgibbn/mapred/staging/lmcgibbn997854508/.staging/job_local997854508_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
> 2016-05-12 17:27:31,091 WARN conf.Configuration - file:/tmp/hadoop-lmcgibbn/mapred/local/localRunner/lmcgibbn/job_local997854508_0001/job_local997854508_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
> 2016-05-12 17:27:31,094 WARN conf.Configuration - file:/tmp/hadoop-lmcgibbn/mapred/local/localRunner/lmcgibbn/job_local997854508_0001/job_local997854508_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
> 2016-05-12 17:27:31,309 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
> 2016-05-12 17:27:31,309 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
> 2016-05-12 17:27:31,309 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
> 2016-05-12 17:27:31,381 WARN mapred.LocalJobRunner - job_local997854508_0001
> java.lang.Exception: java.lang.NullPointerException
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
> Caused by: java.lang.NullPointerException
>     at org.apache.nutch.util.TableUtil.unreverseUrl(TableUtil.java:88)
>     at org.apache.nutch.crawl.GeneratorMapper.map(GeneratorMapper.java:51)
>     at org.apache.nutch.crawl.GeneratorMapper.map(GeneratorMapper.java:1)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-05-12 17:27:32,107 ERROR crawl.GeneratorJob - GeneratorJob: java.lang.RuntimeException: job failed: name=[test]generate: 1463099249-21154, jobid=job_local997854508_0001
>     at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:119)
>     at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:232)
>     at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:272)
>     at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:343)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:351)
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)