Yes.
On Tue, Jun 18, 2013 at 12:34 AM, Lewis John Mcgibbney <[email protected]> wrote:

> And it is the same Exception you are getting each time you attempt to
> updatedb with this URL?
>
> On Mon, Jun 17, 2013 at 12:24 PM, Tony Mullins <[email protected]> wrote:
>
> > OK, I have tried some other URLs, and apparently I get the DbUpdate job
> > exception on this URL only:
> > http://www.amazon.com/Cisco-WAP4410N-Wireless-N-Access-Point/dp/B001IYCMNA
> > So there is some data on this URL which is causing a problem for my
> > Cassandra update-db job.
> >
> > Any ideas where I should look further to resolve this issue?
> >
> > Thanks,
> > Tony
> >
> > On Mon, Jun 17, 2013 at 11:35 PM, Tony Mullins <[email protected]> wrote:
> >
> > > I am using the Gora that comes with Nutch 2.x (I think it is 0.3) with
> > > Cassandra 1.2.5, and I am getting the above-mentioned error.
> > > Any hints on how I should tackle this problem, any suggestions please?
> > >
> > > If I do a simple crawl like www.google.com, all works fine!
> > >
> > > Thanks,
> > > Tony
> > >
> > > On Mon, Jun 17, 2013 at 11:21 PM, Lewis John Mcgibbney <[email protected]> wrote:
> > >
> > > > Hi Tony,
> > > > Which Gora backend are you on, including the version of the backend
> > > > itself, please?
> > > > I use Gora 0.3 with gora-cassandra on some cron jobs and injected your
> > > > URLs into my db. All works fine.
> > > > I did notice that these pages have a hellish lot of content which is
> > > > not displayed on the page: loads of CSS and garbage.
> > > > Something else I noticed is that if you enable microformats-reltag and
> > > > parse reltags out, you get loads of bad characters, which is not nice.
> > > > I will log a Jira as this should be fixed.
> > > >
> > > > On Mon, Jun 17, 2013 at 10:08 AM, Tony Mullins <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am getting a weird error in the DbUpdater job in Nutch 2.x.
> > > > > I am crawling these two links:
> > > > >
> > > > > http://www.amazon.com/Degree-Antiperspirant-Deodorant-Extreme-Blast/dp/B001ET769Y
> > > > > http://www.amazon.com/Cisco-WAP4410N-Wireless-N-Access-Point/dp/B001IYCMNA
> > > > >
> > > > > All my other jobs run fine, but when I run my dbupdate job I get this
> > > > > error:
> > > > >
> > > > > Exception in thread "main" java.lang.RuntimeException: job failed:
> > > > > name=update-table, jobid=job_local482736560_0001
> > > > >     at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
> > > > >     at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:98)
> > > > >     at org.apache.nutch.crawl.DbUpdaterJob.updateTable(DbUpdaterJob.java:105)
> > > > >     at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:119)
> > > > >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > > > >     at org.apache.nutch.crawl.DbUpdaterJob.main(DbUpdaterJob.java:123)
> > > > >
> > > > > And the hadoop log file says:
> > > > >
> > > > > 2013-06-17 21:51:41,478 WARN mapred.FileOutputCommitter - Output path is null in cleanup
> > > > > 2013-06-17 21:51:41,479 WARN mapred.LocalJobRunner - job_local384125843_0001
> > > > > java.lang.IndexOutOfBoundsException
> > > > >     at java.nio.Buffer.checkBounds(Buffer.java:559)
> > > > >     at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:143)
> > > > >     at org.apache.avro.ipc.ByteBufferInputStream.read(ByteBufferInputStream.java:52)
> > > > >     at org.apache.avro.io.DirectBinaryDecoder.doReadBytes(DirectBinaryDecoder.java:183)
> > > > >     at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:265)
> > > > >     at org.apache.gora.mapreduce.FakeResolvingDecoder.readString(FakeResolvingDecoder.java:131)
> > > > >
> > > > > And if I crawl a simple page like www.google.nl, everything works fine,
> > > > > including the dbupdate job!
> > > > >
> > > > > Any clues on how to debug this issue? What could be the reason for it?
> > > > >
> > > > > Thanks,
> > > > > Tony
> > > >
> > > > --
> > > > *Lewis*

--
*Lewis*
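For anyone hitting the same trace: the `java.nio.Buffer.checkBounds` frame means a read requested more bytes than the destination array can hold, which is consistent with a corrupt or truncated length prefix in the serialized record. A minimal standalone sketch (plain `java.nio`, not the actual Gora/Avro code path, and the length 20 is just an illustrative bad value) reproducing that exact failure mode:

```java
import java.nio.ByteBuffer;

public class BufferBoundsDemo {
    public static void main(String[] args) {
        // Pretend this buffer holds a serialized record.
        ByteBuffer buf = ByteBuffer.wrap(new byte[32]);

        // A destination array sized for the expected payload.
        byte[] dst = new byte[10];

        try {
            // A (corrupt) length prefix of 20 bytes copied into a 10-byte
            // destination fails Buffer.checkBounds, which throws
            // IndexOutOfBoundsException -- the same top frame as in the log.
            buf.get(dst, 0, 20);
            System.out.println("no exception");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IndexOutOfBoundsException");
        }
    }
}
```

So a useful next step may be to inspect the serialized row for that one Amazon URL (e.g. the garbage characters Lewis mentioned) rather than the update-job configuration, since the job itself works for other pages.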

