Hi, just a follow-up on this.
Since making the patch to Avro, and after a solid day of crawling, I can report that all is well. There is also a noticeable performance boost throughout, which has increased my crawling capacity: HBase 0.94.x is far more refined and better behaved than 0.90.x. While the upgrade process wasn't entirely straightforward, now that everything is running it has been fruitful and I'm very glad I did it. Many thanks for these improvements.

Az

On Fri, Sep 12, 2014 at 12:20 AM, Azhar Jassal <[email protected]> wrote:
> Hi Lewis
>
> Thanks for pointing me to that issue. It's helped me make some progress.
>
> The issue I've encountered is failing to deserialise Integers and Longs
> persisted by Nutch 2.2.1. The patch attached in AVRO-813 seems to suggest
> that the EOFExceptions thrown are unnecessary, so I followed its example
> to silence those exceptions in the three places where they are thrown that
> seem vulnerable: readInt, readLong and ensureBounds in BinaryDecoder.
>
> Making those changes to Avro has got Nutch 2.3-SNAPSHOT (Gora 0.4 / HBase
> 0.94.13) running against the tables that were filled by Nutch 2.2.1 (Gora
> 0.3 / HBase 0.90.4).
>
> Any thoughts? Has Avro changed some handling of reading Integers/Longs
> that has caused it to not be able to read the ints/longs persisted by
> Nutch 2.2.1?
>
> Az
>
> Below, Avro patch against trunk (disable EOFException in readInt,
> readLong, ensureBounds):
>
> --- a/lang/java/avro/src/main/java/org/apache/avro/io/BinaryDecoder.java
> +++ b/lang/java/avro/src/main/java/org/apache/avro/io/BinaryDecoder.java
> @@ -149,9 +149,9 @@ public class BinaryDecoder extends Decoder {
>        }
>      }
>      pos += len;
> -    if (pos > limit) {
> -      throw new EOFException();
> -    }
> +    //if (pos > limit) {
> +    //  throw new EOFException();
> +    //}
>      return (n >>> 1) ^ -(n & 1); // back to two's-complement
>    }
>
> @@ -186,9 +186,9 @@ public class BinaryDecoder extends Decoder {
>      } else {
>        l = n;
>      }
> -    if (pos > limit) {
> -      throw new EOFException();
> -    }
> +    //if (pos > limit) {
> +    //  throw new EOFException();
> +    //}
>      return (l >>> 1) ^ -(l & 1); // back to two's-complement
>    }
>
> @@ -469,8 +469,8 @@ public class BinaryDecoder extends Decoder {
>      if (remaining < num) {
>        // move remaining to front
>        source.compactAndFill(buf, pos, minPos, remaining);
> -      if (pos >= limit)
> -        throw new EOFException();
> +      //if (pos >= limit)
> +      //  throw new EOFException();
>      }
>    }
>
> On Thu, Sep 11, 2014 at 7:07 PM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
>> Hi Azhar,
>>
>> On Wed, Sep 10, 2014 at 10:28 PM, <[email protected]> wrote:
>>
>> > I am in the process of upgrading from Nutch 2.2.1 to Nutch 2.3-SNAPSHOT:
>> >
>> > I have upgraded HBase from 0.90.4 to 0.94.13 and can scan all of the
>> > pre-existing tables through HBase shell.
>>
>> Nice
>>
>> > If I inject new URL's into a new
>> > crawl table, everything works fine.
>>
>> Excellent
>>
>> > However, when running a job, e.g.
>> > FetcherJob against the tables that pre-exist, I encounter the following
>> > Exception coming from GoraRecordReader, and this is preventing
>> > FetcherMapper from running:
>> >
>> > java.io.EOFException
>> >
>> >     at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
>> >
>> >     at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
>>
>> This is nasty...
>>
>> https://issues.apache.org/jira/browse/AVRO-813
>>
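[For context on the patch above: the `(n >>> 1) ^ -(n & 1)` expression it leaves in place is Avro's zig-zag varint decode, which is what readInt/readLong compute from the bytes Nutch persisted. Below is a minimal standalone sketch of that decoding for illustration; it is not Avro's actual code, and the class/method names are invented.]

```java
// Sketch of variable-length zig-zag decoding, the scheme Avro's
// BinaryDecoder.readInt/readLong use (illustrative, not Avro source).
public final class ZigZagDemo {

    // Decode one zig-zag varint from the start of buf.
    static long readZigZagLong(byte[] buf) {
        long n = 0;
        int shift = 0;
        int pos = 0;
        long b;
        do {
            b = buf[pos++] & 0xff;        // next byte
            n |= (b & 0x7fL) << shift;    // low 7 bits carry payload
            shift += 7;
        } while ((b & 0x80) != 0);        // high bit set => more bytes follow
        return (n >>> 1) ^ -(n & 1);      // zig-zag back to two's-complement
    }

    public static void main(String[] args) {
        // zig-zag maps 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
        System.out.println(readZigZagLong(new byte[]{0x00}));               // 0
        System.out.println(readZigZagLong(new byte[]{0x01}));               // -1
        System.out.println(readZigZagLong(new byte[]{0x02}));               // 1
        System.out.println(readZigZagLong(new byte[]{(byte) 0xd8, 0x04}));  // 300
    }
}
```

[Note that the decode itself stops on the high bit of each byte, which is why the trailing `pos > limit` checks removed by the patch read like belt-and-braces bounds validation rather than part of the decoding logic itself.]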

