Yes.

On Tue, Jun 18, 2013 at 12:34 AM, Lewis John Mcgibbney <
[email protected]> wrote:

> And it is the same Exception you are getting each time you attempt to
> updatedb with this URL?
>
>
>
> On Mon, Jun 17, 2013 at 12:24 PM, Tony Mullins <[email protected]
> >wrote:
>
> > OK, I have tried some other URLs and apparently I get the DbUpdater job
> > exception on this URL only:
> >
> > http://www.amazon.com/Cisco-WAP4410N-Wireless-N-Access-Point/dp/B001IYCMNA
> >
> > So there is some data on this URL which is causing a problem for my
> > Cassandra updatedb job.
> >
> > Any ideas where I should look further to resolve this issue?
> >
> > Thanks
> > Tony.
> >
> >
> > On Mon, Jun 17, 2013 at 11:35 PM, Tony Mullins <[email protected]
> > >wrote:
> >
> > > > I am using the Gora that comes with Nutch 2.x (I think it's 0.3) with
> > > > Cassandra 1.2.5, and I am getting the above-mentioned error.
> > > > Any hints on how I should tackle this problem? Any suggestions, please?
> > > >
> > > > If I do a simple crawl like www.google.com, all works fine!
> > >
> > > Thanks,
> > > Tony.
> > >
> > >
> > > On Mon, Jun 17, 2013 at 11:21 PM, Lewis John Mcgibbney <
> > > [email protected]> wrote:
> > >
> > > >> Hi Tony,
> > > >> Which Gora backend are you on, including the version of the backend
> > > >> itself, please?
> > > >> I use Gora 0.3 with gora-cassandra on some cron jobs and injected your
> > > >> URLs into my db. All works fine.
> > > >> I did notice that these pages have a hell of a lot of content which is
> > > >> not displayed on the page. Loads of CSS and garbage.
> > > >> Something else I did notice is that if you enable microformats-reltag
> > > >> and parse the reltags out, you get loads of bad characters, which is
> > > >> not nice.
> > > >> I will log a Jira as this should be fixed.
> > >>
> > >>
> > >>
> > >>
> > >>
> > > >> On Mon, Jun 17, 2013 at 10:08 AM, Tony Mullins <[email protected]> wrote:
> > >>
> > > >> > Hi,
> > > >> >
> > > >> > I am getting a weird error on the DbUpdater job in Nutch 2.x.
> > > >> > I am crawling these two links:
> > > >> >
> > > >> > http://www.amazon.com/Degree-Antiperspirant-Deodorant-Extreme-Blast/dp/B001ET769Y
> > > >> > http://www.amazon.com/Cisco-WAP4410N-Wireless-N-Access-Point/dp/B001IYCMNA
> > > >> >
> > > >> > All my other jobs run fine, but when I run my dbupdate job I get
> > > >> > this error:
> > >> >
> > > >> > Exception in thread "main" java.lang.RuntimeException: job failed: name=update-table, jobid=job_local482736560_0001
> > > >> >     at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
> > > >> >     at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:98)
> > > >> >     at org.apache.nutch.crawl.DbUpdaterJob.updateTable(DbUpdaterJob.java:105)
> > > >> >     at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:119)
> > > >> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > > >> >     at org.apache.nutch.crawl.DbUpdaterJob.main(DbUpdaterJob.java:123)
> > >> >
> > > >> > And the Hadoop log file says:
> > >> >
> > >> > 2013-06-17 21:51:41,478 WARN  mapred.FileOutputCommitter - Output
> path
> > >> is
> > >> > null in cleanup
> > >> > 2013-06-17 21:51:41,479 WARN  mapred.LocalJobRunner -
> > >> > job_local384125843_0001
> > >> > java.lang.IndexOutOfBoundsException
> > >> >     at java.nio.Buffer.checkBounds(Buffer.java:559)
> > >> >     at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:143)
> > >> >     at
> > >> >
> > >> >
> > >>
> >
> org.apache.avro.ipc.ByteBufferInputStream.read(ByteBufferInputStream.java:52)
> > >> >     at
> > >> >
> > >> >
> > >>
> >
> org.apache.avro.io.DirectBinaryDecoder.doReadBytes(DirectBinaryDecoder.java:183)
> > >> >     at
> > >> org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:265)
> > >> >     at
> > >> >
> > >> >
> > >>
> >
> org.apache.gora.mapreduce.FakeResolvingDecoder.readString(FakeResolvingDecoder.java:131)
> > >> >
> > > >> > And if I crawl a simple page like www.google.nl, everything works
> > > >> > fine, including the dbupdate job!
> > > >> >
> > > >> > Any clues on how to debug this issue? What could be the reason for
> > > >> > this?
> > >> >
> > >> > Thanks.
> > >> > Tony.
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> *Lewis*
> > >>
> > >
> > >
> >
>
>
>
> --
> *Lewis*
>
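A note on reading the Hadoop trace quoted above: the exception is raised in java.nio.Buffer.checkBounds, which is called from the bulk ByteBuffer.get(byte[], int, int). That check concerns the offset and length against the destination array; a source buffer that is merely too short raises BufferUnderflowException instead. So the trace would be consistent with readString being handed a bad length (negative, or larger than the array it reads into), though whether that is what happens inside Gora here is only an assumption. A minimal sketch of the JDK behaviour, using only java.nio (the class and method names are made up for illustration):

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

public class BufferBoundsDemo {

    // Attempts a bulk read and reports which exception, if any, the JDK raises.
    // duplicate() is used so the caller's buffer position is left untouched.
    static String tryGet(ByteBuffer buf, byte[] dst, int off, int len) {
        try {
            buf.duplicate().get(dst, off, len);
            return "ok";
        } catch (IndexOutOfBoundsException e) {
            // off/len do not fit the destination array
            return "IndexOutOfBoundsException";
        } catch (BufferUnderflowException e) {
            // fewer than len bytes remain in the source buffer
            return "BufferUnderflowException";
        }
    }

    public static void main(String[] args) {
        ByteBuffer eightBytes = ByteBuffer.wrap(new byte[8]);
        ByteBuffer twoBytes = ByteBuffer.wrap(new byte[2]);
        byte[] dst = new byte[4];

        // Length fits both the array and the buffer.
        System.out.println(tryGet(eightBytes, dst, 0, 4));
        // Negative length, e.g. from a corrupted length prefix.
        System.out.println(tryGet(eightBytes, dst, 0, -1));
        // Offset plus length runs past the end of the destination array.
        System.out.println(tryGet(eightBytes, dst, 2, 4));
        // Array is fine, but the source buffer holds too few bytes.
        System.out.println(tryGet(twoBytes, dst, 0, 4));
    }
}
```

If the same pattern applies here, logging the length that the decoder reads just before the failing readString call for the Amazon URL would be one place to start looking.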
