Hi Lewis

Thanks for pointing me to that issue. It's helped me make some progress.

The issue I've encountered is a failure to deserialise Integers and Longs
persisted by Nutch 2.2.1. The patch attached to AVRO-813 seems to suggest
that the EOFExceptions thrown are unnecessary, so I followed its example
and silenced those exceptions in the three places that seem vulnerable:
readInt, readLong and ensureBounds (all in BinaryDecoder).

Making those changes to Avro has got Nutch 2.3-SNAPSHOT (Gora 0.4 / HBase
0.94.13) running against the tables that were filled by Nutch 2.2.1 (Gora
0.3 / HBase 0.90.4).

Any thoughts? Has Avro changed how it reads Integers/Longs, so that it can
no longer read the ints/longs persisted by Nutch 2.2.1?
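
For reference, here is a minimal sketch of the variable-length zig-zag
encoding that Avro's binary format uses for ints and longs. It is a
simplified illustration only, not the actual BinaryDecoder code: the class
name ZigZagSketch and the methods encodeInt/decodeInt are hypothetical. It
shows why a value whose bytes end early makes the bounds check fire, which is
the check the patch below comments out.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical illustration class; not part of Avro.
public class ZigZagSketch {

    // Encode an int as a zig-zag varint: 7 data bits per byte,
    // high bit set on every byte except the last.
    public static byte[] encodeInt(int n) {
        int z = (n << 1) ^ (n >> 31); // zig-zag: small magnitudes -> small codes
        byte[] tmp = new byte[5];     // an int needs at most 5 bytes
        int len = 0;
        while ((z & ~0x7F) != 0) {
            tmp[len++] = (byte) ((z & 0x7F) | 0x80);
            z >>>= 7;
        }
        tmp[len++] = (byte) z;
        return Arrays.copyOf(tmp, len);
    }

    // Decode a zig-zag varint from buf[pos..limit).
    // Throws EOFException if the data ends before the value is complete.
    public static int decodeInt(byte[] buf, int pos, int limit) throws IOException {
        int n = 0;
        for (int shift = 0; shift <= 28; shift += 7) {
            if (pos >= limit) {
                throw new EOFException(); // ran out of bytes mid-value
            }
            int b = buf[pos++] & 0xFF;
            n |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return (n >>> 1) ^ -(n & 1); // back to two's-complement
            }
        }
        throw new IOException("Invalid int encoding");
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encodeInt(300);
        System.out.println(decodeInt(bytes, 0, bytes.length));
        try {
            // Same bytes, but the reader is told the data ends one byte early.
            decodeInt(bytes, 0, bytes.length - 1);
        } catch (EOFException e) {
            System.out.println("EOFException on truncated value");
        }
    }
}
```

With the check removed, a decoder of this shape would carry on reading
whatever bytes happen to follow in the buffer, so the value returned for a
truncated field would not be trustworthy.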


Az

Below is the Avro patch against trunk (it disables the EOFException in
readInt, readLong and ensureBounds):


--- a/lang/java/avro/src/main/java/org/apache/avro/io/BinaryDecoder.java
+++ b/lang/java/avro/src/main/java/org/apache/avro/io/BinaryDecoder.java
@@ -149,9 +149,9 @@ public class BinaryDecoder extends Decoder {
       }
     }
     pos += len;
-    if (pos > limit) {
-      throw new EOFException();
-    }
+    //if (pos > limit) {
+    //  throw new EOFException();
+    //}
     return (n >>> 1) ^ -(n & 1); // back to two's-complement
   }

@@ -186,9 +186,9 @@ public class BinaryDecoder extends Decoder {
     } else {
       l = n;
     }
-    if (pos > limit) {
-      throw new EOFException();
-    }
+    //if (pos > limit) {
+    //  throw new EOFException();
+    //}
     return (l >>> 1) ^ -(l & 1); // back to two's-complement
   }

@@ -469,8 +469,8 @@ public class BinaryDecoder extends Decoder {
     if (remaining < num) {
       // move remaining to front
       source.compactAndFill(buf, pos, minPos, remaining);
-      if (pos >= limit)
-        throw new EOFException();
+      //if (pos >= limit)
+      //  throw new EOFException();
     }
   }


On Thu, Sep 11, 2014 at 7:07 PM, Lewis John Mcgibbney <
[email protected]> wrote:

> Hi Azhar,
>
> On Wed, Sep 10, 2014 at 10:28 PM, <[email protected]>
> wrote:
>
> >
> > I am in the process of upgrading from Nutch 2.2.1 to Nutch 2.3-SNAPSHOT:
> >
> > I have upgraded HBase from 0.90.4 to 0.94.13 and can scan all of the
> > pre-existing tables through HBase shell.
>
>
> Nice
>
>
> > If I inject new URLs into a new
> > crawl table, everything works fine.
>
>
> Excellent
>
>
> > However, when running a job, e.g.
> > FetcherJob, against the pre-existing tables, I encounter the following
> > exception coming from GoraRecordReader; this is preventing FetcherMapper
> > from running:
> >
> > java.io.EOFException
> >
> >         at
> > org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
> >
> >         at
> > org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
> >
> > This is nasty...
>
> https://issues.apache.org/jira/browse/AVRO-813
>
