Just some additional info:

In order to output this Exception I had to hack my copy of Gora 0.4:

File: org/apache/gora/mapreduce/GoraRecordReader.java

Otherwise, as you can see below, the Exception is caught and suppressed. I had
to print it out because otherwise the Mapper fails silently.

Have I missed a required step while upgrading to Nutch 2.3 / Gora 0.4 / HBase
0.94.13, one that converts the existing data in some way?

The GoraRecordReader code that is causing the silent Mapper failures:

  @Override
  public boolean nextKeyValue() throws IOException, InterruptedException {
    try {
      if (counter.isModulo()) {
        boolean firstBatch = (this.result == null);
        if (!firstBatch) {
          this.query.setStartKey(this.result.getKey());
          if (this.query.getLimit() == counter.getRecordsMax()) {
            this.query.setLimit(counter.getRecordsMax() + 1);
          }
        }
        if (this.result != null) {
          this.result.close();
        }

        executeQuery();

        if (!firstBatch) {
          // skip first result
          this.result.next();
        }
      }

      counter.increment();
      return this.result.next();
    } catch (Exception e) {
      return false;
    }
  }
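
For comparison, here is a minimal, self-contained sketch of the difference between swallowing the exception and propagating it. This is not the Gora code and not a proposed patch: SimpleResult, SwallowingReader and PropagatingReader are hypothetical stand-ins I made up so the example runs without Gora or Hadoop on the classpath.

```java
import java.io.EOFException;
import java.io.IOException;

public class RecordReaderSketch {

  /** Hypothetical stand-in for a query result; NOT the real Gora interface. */
  public interface SimpleResult {
    boolean next() throws Exception;
  }

  /** Mirrors the catch-and-return-false pattern: a failure looks like end of input. */
  public static final class SwallowingReader {
    private final SimpleResult result;

    public SwallowingReader(SimpleResult result) {
      this.result = result;
    }

    public boolean nextKeyValue() {
      try {
        return result.next();
      } catch (Exception e) {
        return false; // the caller cannot tell an error from "no more records"
      }
    }
  }

  /** Alternative: let the failure reach the caller as an IOException. */
  public static final class PropagatingReader {
    private final SimpleResult result;

    public PropagatingReader(SimpleResult result) {
      this.result = result;
    }

    public boolean nextKeyValue() throws IOException {
      try {
        return result.next();
      } catch (IOException e) {
        throw e; // already the declared type, rethrow unchanged
      } catch (Exception e) {
        throw new IOException("Failed to read next record", e);
      }
    }
  }

  public static void main(String[] args) {
    // Simulated deserialization failure on every read.
    SimpleResult failing = () -> {
      throw new EOFException("simulated truncated record");
    };

    System.out.println("swallowing: " + new SwallowingReader(failing).nextKeyValue());

    try {
      new PropagatingReader(failing).nextKeyValue();
    } catch (IOException e) {
      System.out.println("propagating: " + e);
    }
  }
}
```

With the swallowing variant the caller just sees false, which is why the Mapper appears to finish with nothing to do. With the propagating variant the EOFException reaches the caller and would show up in the task logs.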




On Wed, Sep 10, 2014 at 4:03 PM, Azhar Jassal <[email protected]> wrote:

> Hi
>
> I am in the process of upgrading from Nutch 2.2.1 to Nutch 2.3-SNAPSHOT.
>
> I have upgraded HBase from 0.90.4 to 0.94.13 and can scan all of the
> pre-existing tables through the HBase shell. If I inject new URLs into a new
> crawl table, everything works fine. However, when running a job, e.g.
> FetcherJob, against the pre-existing tables, I encounter the following
> Exception coming from GoraRecordReader, which prevents FetcherMapper
> from running:
>
> java.io.EOFException
>         at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
>         at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
>         at org.apache.avro.io.ValidatingDecoder.readInt(ValidatingDecoder.java:83)
>         at org.apache.avro.generic.GenericDatumReader.readInt(GenericDatumReader.java:376)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:156)
>         at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
>         at org.apache.gora.hbase.util.HBaseByteInterface.fromBytes(HBaseByteInterface.java:145)
>         at org.apache.gora.hbase.util.HBaseByteInterface.fromBytes(HBaseByteInterface.java:114)
>         at org.apache.gora.hbase.store.HBaseStore.setField(HBaseStore.java:713)
>         at org.apache.gora.hbase.store.HBaseStore.setField(HBaseStore.java:679)
>         at org.apache.gora.hbase.store.HBaseStore.setField(HBaseStore.java:644)
>         at org.apache.gora.hbase.store.HBaseStore.newInstance(HBaseStore.java:625)
>         at org.apache.gora.hbase.query.HBaseResult.readNext(HBaseResult.java:48)
>         at org.apache.gora.hbase.query.HBaseScannerResult.nextInner(HBaseScannerResult.java:54)
>         at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:114)
>         at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:119)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:531)
>         at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
> Like I said, working against a new table is fine; it's only against the
> existing data (crawlIds) that this fails. There seems to be something that
> Avro doesn't like about the data. HBase seems to be fine, as I can scan
> tables and read data directly.
>
> Any ideas?
>
>
> Az
>
