Lewis,

I am getting the same error on some other URLs as well.
So there is some issue with such URLs that is causing the exception in
the dbupdate job.

Any idea what could be the reason for this error?

Thanks,
Tony


On Tue, Jun 18, 2013 at 12:36 AM, Tony Mullins <[email protected]> wrote:

> Yes.
>
>
> On Tue, Jun 18, 2013 at 12:34 AM, Lewis John Mcgibbney <[email protected]> wrote:
>
>> And it is the same Exception you are getting each time you attempt to
>> updatedb with this URL?
>>
>>
>>
>> On Mon, Jun 17, 2013 at 12:24 PM, Tony Mullins <[email protected]> wrote:
>>
>> > OK, I have tried some other URLs, and apparently I get the DBUpdate job
>> > exception on this URL only:
>> > http://www.amazon.com/Cisco-WAP4410N-Wireless-N-Access-Point/dp/B001IYCMNA
>> > So there is some data on this URL which is causing a problem for my
>> > Cassandra update db job.
>> >
>> > Any ideas where I should look further to resolve this issue?
>> >
>> > Thanks
>> > Tony.
>> >
>> >
>> > On Mon, Jun 17, 2013 at 11:35 PM, Tony Mullins <[email protected]> wrote:
>> >
>> > > I am using the Gora that comes with Nutch 2.x (I think it's 0.3) with
>> > > Cassandra 1.2.5, and I am getting the above-mentioned error.
>> > > Any hints on how I should tackle this problem? Any suggestions, please?
>> > >
>> > > If I do a simple crawl like www.google.com, everything works fine!
>> > >
>> > > Thanks,
>> > > Tony.
>> > >
>> > >
>> > > On Mon, Jun 17, 2013 at 11:21 PM, Lewis John Mcgibbney <[email protected]> wrote:
>> > >
>> > >> Hi Tony,
>> > >> Which Gora backend are you on, including the version of the backend
>> > >> itself, please?
>> > >> I use Gora 0.3 with gora-cassandra on some cron jobs and injected your
>> > >> URLs into my db. All works fine.
>> > >> I did notice that these pages have a hellish lot of content which is
>> > >> not displayed on the page. Loads of CSS and garbage.
>> > >> Something else I did notice is that if you enable microformats-reltag
>> > >> and parse reltags out, you get loads of bad characters, which is not
>> > >> nice.
>> > >> I will log a Jira as this should be fixed.
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On Mon, Jun 17, 2013 at 10:08 AM, Tony Mullins <[email protected]> wrote:
>> > >>
>> > >> > Hi,
>> > >> >
>> > >> > I am getting a weird error on the DbUpdater job in Nutch 2.x.
>> > >> > I am crawling these two links:
>> > >> >
>> > >> > http://www.amazon.com/Degree-Antiperspirant-Deodorant-Extreme-Blast/dp/B001ET769Y
>> > >> > http://www.amazon.com/Cisco-WAP4410N-Wireless-N-Access-Point/dp/B001IYCMNA
>> > >> >
>> > >> > All my other jobs run fine, but when I run my dbupdate job I get this
>> > >> > error:
>> > >> >
>> > >> > Exception in thread "main" java.lang.RuntimeException: job failed:
>> > >> > name=update-table, jobid=job_local482736560_0001
>> > >> >     at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
>> > >> >     at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:98)
>> > >> >     at org.apache.nutch.crawl.DbUpdaterJob.updateTable(DbUpdaterJob.java:105)
>> > >> >     at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:119)
>> > >> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> > >> >     at org.apache.nutch.crawl.DbUpdaterJob.main(DbUpdaterJob.java:123)
>> > >> >
>> > >> > And the Hadoop log file says:
>> > >> >
>> > >> > 2013-06-17 21:51:41,478 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
>> > >> > 2013-06-17 21:51:41,479 WARN  mapred.LocalJobRunner - job_local384125843_0001
>> > >> > java.lang.IndexOutOfBoundsException
>> > >> >     at java.nio.Buffer.checkBounds(Buffer.java:559)
>> > >> >     at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:143)
>> > >> >     at org.apache.avro.ipc.ByteBufferInputStream.read(ByteBufferInputStream.java:52)
>> > >> >     at org.apache.avro.io.DirectBinaryDecoder.doReadBytes(DirectBinaryDecoder.java:183)
>> > >> >     at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:265)
>> > >> >     at org.apache.gora.mapreduce.FakeResolvingDecoder.readString(FakeResolvingDecoder.java:131)
>> > >> >
>> > >> > And if I crawl a simple page like www.google.nl, everything works
>> > >> > fine, including the dbupdate job!
>> > >> >
>> > >> > Any clues on how to debug this issue? What could be the reason for
>> > >> > this?
>> > >> >
>> > >> > Thanks.
>> > >> > Tony.
>> > >> >
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> *Lewis*
>> > >>
>> > >
>> > >
>> >
>>
>>
>>
>> --
>> *Lewis*
>>
>
>
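
For anyone reading this thread later: the top frames of the Hadoop log trace quoted above are java.nio.Buffer.checkBounds and HeapByteBuffer.get. The following is only a minimal standalone sketch of when a bulk ByteBuffer.get raises IndexOutOfBoundsException. It is not a diagnosis of the Nutch/Gora problem, and it does not use any Nutch, Gora, or Avro code. The class name BufferBoundsDemo and the helper bulkGet are hypothetical names made up for this illustration. One possible reading, and it is an assumption, is that a length value decoded from the serialized record did not fit the destination array by the time it reached the buffer.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Hypothetical demo class: shows which exception a bulk ByteBuffer.get raises.
public class BufferBoundsDemo {

    /**
     * Attempts buffer.get(dst, off, len) on a buffer holding `available` bytes
     * and a destination array of `dstSize` bytes, and reports the outcome.
     */
    public static String bulkGet(int available, int dstSize, int off, int len) {
        ByteBuffer buffer = ByteBuffer.wrap(new byte[available]);
        byte[] dst = new byte[dstSize];
        try {
            buffer.get(dst, off, len);
            return "ok";
        } catch (IndexOutOfBoundsException e) {
            // off/len do not describe a valid range inside dst
            return "IndexOutOfBoundsException";
        } catch (BufferUnderflowException e) {
            // the range is valid for dst, but the buffer has too few bytes left
            return "BufferUnderflowException";
        }
    }

    public static void main(String[] args) {
        System.out.println(bulkGet(16, 8, 0, 8));   // range fits dst and buffer
        System.out.println(bulkGet(16, 8, 0, 12));  // len larger than dst
        System.out.println(bulkGet(16, 8, 0, -1));  // negative len
        System.out.println(bulkGet(4, 8, 0, 8));    // buffer shorter than len
    }
}
```

The point of the sketch is that the bounds check concerns the offset and length relative to the destination array, while running out of buffered bytes is reported differently (BufferUnderflowException). So the exception in the log suggests looking at the length being passed down, for example by logging the row key being processed when the job fails.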
