Hi Tony,
Which Gora backend are you on, and which version of the backend itself,
please?
I use Gora 0.3 with gora-cassandra on some cron jobs, and I injected your
URLs into my db. It all works fine.
I did notice that these pages carry an awful lot of content which is not
displayed on the page: loads of CSS and other garbage.
Something else I noticed is that if you enable microformats-reltag and
parse the reltags out, you get loads of bad characters, which is not nice.
I will log a Jira as this should be fixed.





On Mon, Jun 17, 2013 at 10:08 AM, Tony Mullins <[email protected]> wrote:

> Hi,
>
> I am getting a weird error from the DbUpdaterJob in Nutch 2.x.
> I am crawling these two links:
>
>
> http://www.amazon.com/Degree-Antiperspirant-Deodorant-Extreme-Blast/dp/B001ET769Y
> http://www.amazon.com/Cisco-WAP4410N-Wireless-N-Access-Point/dp/B001IYCMNA
>
> All my other jobs run fine, but when I run my dbupdate job I get this
> error:
>
> Exception in thread "main" java.lang.RuntimeException: job failed:
> name=update-table, jobid=job_local482736560_0001
>     at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
>     at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:98)
>     at org.apache.nutch.crawl.DbUpdaterJob.updateTable(DbUpdaterJob.java:105)
>     at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:119)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.crawl.DbUpdaterJob.main(DbUpdaterJob.java:123)
>
> And the Hadoop log file says:
>
> 2013-06-17 21:51:41,478 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
> 2013-06-17 21:51:41,479 WARN  mapred.LocalJobRunner - job_local384125843_0001
> java.lang.IndexOutOfBoundsException
>     at java.nio.Buffer.checkBounds(Buffer.java:559)
>     at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:143)
>     at org.apache.avro.ipc.ByteBufferInputStream.read(ByteBufferInputStream.java:52)
>     at org.apache.avro.io.DirectBinaryDecoder.doReadBytes(DirectBinaryDecoder.java:183)
>     at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:265)
>     at org.apache.gora.mapreduce.FakeResolvingDecoder.readString(FakeResolvingDecoder.java:131)
>
> And if I crawl a simple page like www.google.nl, everything works fine,
> including the dbupdate job!
>
> Any clues on how to debug this issue? What could be the reason for it?
>
> Thanks.
> Tony.
>



-- 
*Lewis*
