Hi

Just a follow-up on this.

Since making the patch to Avro, and after a solid day of crawling, I can
report that all is well.

There is also a noticeable performance boost throughout, which has
increased my crawling capacity: HBase 0.94.x is far more refined and
better behaved than 0.90.x.

While the upgrade process wasn't entirely straightforward, now that
everything is running it has proven fruitful, and I'm very glad I did it.
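For anyone else who hits the same EOFException: the decoding at issue is Avro's zig-zag varint encoding of ints and longs, which is what readInt/readLong in BinaryDecoder implement (the `(n >>> 1) ^ -(n & 1)` line in the patch below is the zig-zag-to-two's-complement step). As a rough sketch of the wire format only — this is not Avro's actual decoder, and the class and method names here are my own:

```java
// Minimal illustration of Avro-style zig-zag varint encode/decode.
// Not Avro's BinaryDecoder; just shows the wire format the patch touches.
public class ZigZagDemo {

    // Zig-zag encode a long into buf at pos; returns the new position.
    static int writeZigZagLong(long v, byte[] buf, int pos) {
        long n = (v << 1) ^ (v >> 63); // zig-zag: small magnitudes -> small varints
        while ((n & ~0x7fL) != 0) {    // emit 7 bits at a time, high bit = "more"
            buf[pos++] = (byte) ((n & 0x7f) | 0x80);
            n >>>= 7;
        }
        buf[pos++] = (byte) n;
        return pos;
    }

    // Decode a zig-zag varint from buf starting at pos[0]; advances pos[0].
    static long readZigZagLong(byte[] buf, int[] pos) {
        long n = 0;
        int shift = 0;
        byte b;
        do {
            b = buf[pos[0]++];
            n |= (long) (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (n >>> 1) ^ -(n & 1); // back to two's-complement, as in BinaryDecoder
    }

    public static void main(String[] args) {
        byte[] buf = new byte[10];
        writeZigZagLong(-42L, buf, 0);
        System.out.println(readZigZagLong(buf, new int[]{0})); // -42
    }
}
```

The bounds check that the patch comments out (`pos > limit` -> EOFException) guards against decoding past the end of the buffer; silencing it worked in my case because the records Gora hands the decoder are complete, but it is load-bearing for genuinely truncated input, so treat the patch as a workaround rather than a general fix.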

Many thanks for these improvements

Az

On Fri, Sep 12, 2014 at 12:20 AM, Azhar Jassal <[email protected]> wrote:

> Hi Lewis
>
> Thanks for pointing me to that issue. It's helped me make some progress.
>
> The issue I've encountered is a failure to deserialise the Integers and
> Longs persisted by Nutch 2.2.1. The patch attached to AVRO-813 seems to
> suggest that the EOFExceptions thrown are unnecessary, so I followed its
> example and silenced those exceptions in the three places where they are
> thrown: readInt, readLong and ensureBounds in BinaryDecoder.
>
> Making those changes to Avro has got Nutch 2.3-SNAPSHOT (Gora 0.4/ HBase
> 0.94.13) running against the tables that were filled by Nutch 2.2.1 (Gora
> 0.3/ HBase 0.90.4).
>
> Any thoughts? Has Avro changed its handling of reading Integers/Longs in
> a way that prevents it from reading the ints/longs persisted by Nutch
> 2.2.1?
>
>
> Az
>
> Below is the Avro patch against trunk (disabling the EOFException in
> readInt, readLong and ensureBounds):
>
>
> --- a/lang/java/avro/src/main/java/org/apache/avro/io/BinaryDecoder.java
> +++ b/lang/java/avro/src/main/java/org/apache/avro/io/BinaryDecoder.java
> @@ -149,9 +149,9 @@ public class BinaryDecoder extends Decoder {
>        }
>      }
>      pos += len;
> -    if (pos > limit) {
> -      throw new EOFException();
> -    }
> +    //if (pos > limit) {
> +    //  throw new EOFException();
> +    //}
>      return (n >>> 1) ^ -(n & 1); // back to two's-complement
>    }
>
> @@ -186,9 +186,9 @@ public class BinaryDecoder extends Decoder {
>      } else {
>        l = n;
>      }
> -    if (pos > limit) {
> -      throw new EOFException();
> -    }
> +    //if (pos > limit) {
> +    //  throw new EOFException();
> +    //}
>      return (l >>> 1) ^ -(l & 1); // back to two's-complement
>    }
>
> @@ -469,8 +469,8 @@ public class BinaryDecoder extends Decoder {
>      if (remaining < num) {
>        // move remaining to front
>        source.compactAndFill(buf, pos, minPos, remaining);
> -      if (pos >= limit)
> -        throw new EOFException();
> +      //if (pos >= limit)
> +      //  throw new EOFException();
>      }
>    }
>
>
> On Thu, Sep 11, 2014 at 7:07 PM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
>> Hi Azhar,
>>
>> On Wed, Sep 10, 2014 at 10:28 PM, <[email protected]>
>> wrote:
>>
>> >
>> > I am in the process of upgrading from Nutch 2.2.1 to Nutch 2.3-SNAPSHOT:
>> >
>> > I have upgraded HBase from 0.90.4 to 0.94.13 and can scan all of the
>> > pre-existing tables through HBase shell.
>>
>>
>> Nice
>>
>>
>> > If I inject new URLs into a new
>> > crawl table, everything works fine.
>>
>>
>> Excellent
>>
>>
>> > However, when running a job, e.g.
>> > FetcherJob against the tables that pre-exist, I encounter the following
>> > Exception coming from GoraRecordReader- this is preventing FetcherMapper
>> > from running :
>> >
>> > java.io.EOFException
>> >
>> >         at
>> > org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
>> >
>> >         at
>> org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
>> >
>> > This is nasty...
>>
>> https://issues.apache.org/jira/browse/AVRO-813
>>
>
>
