FYI, this was due to the Hadoop version. 3.2.0 was throwing this error, but I
rolled back to the version in Google's pom.xml (2.7.4) and it is working fine now.

Kind of annoying, because I wasted several hours jumping through hoops trying
to get 3.2.0 working :(
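
In case it saves someone else the same hoops: a quick way to confirm which
Hadoop version actually ends up on the runtime classpath is to log
org.apache.hadoop.util.VersionInfo. Minimal sketch below; the class name is
just a placeholder, not part of our job:

####
import org.apache.hadoop.util.VersionInfo;

// Hypothetical helper (placeholder name): prints the Hadoop version that
// dependency resolution actually put on the classpath, handy for checking
// that the rollback to 2.7.4 took effect.
public class HadoopVersionCheck {
  public static void main(String[] args) {
    System.out.println("Hadoop on classpath: " + VersionInfo.getVersion());
  }
}
####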

On Wed, Sep 4, 2019 at 5:09 PM Shannon Duncan <[email protected]>
wrote:

> I have successfully been using the sequence file source located here:
>
>
> https://github.com/googleapis/java-bigtable-hbase/blob/master/bigtable-dataflow-parent/bigtable-beam-import/src/main/java/com/google/cloud/bigtable/beam/sequencefiles/SequenceFileSource.java
>
> However, we recently started doing block-level compression with bzip2 on
> the SequenceFile. This is supported out of the box on the Hadoop side of
> things.
>
> However, when reading the files back in, most records parse out just fine,
> but a handful of records throw:
>
> ####
> Exception in thread "main" java.lang.IndexOutOfBoundsException: offs(1368)
> + len(1369) > dest.length(1467).
> at
> org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.read(CBZip2InputStream.java:398)
> ####
>
> I've gone in circles looking at this. It seems that the last record read
> from the SequenceFile in each thread hits this on the value retrieval (the
> key reads just fine, but the value throws this error).
>
> Any clues as to what this could be?
>
> The file is KV<Text, Text>, i.e. its header reads
> "SEQorg.apache.hadoop.io.Textorg.apache.hadoop.io.Text(org.apache.hadoop.io.compress.BZip2Codec"
>
> Any help is appreciated!
>
> - Shannon
>
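
For anyone trying to reproduce this: below is a minimal sketch of how a
block-compressed KV<Text, Text> SequenceFile like the one described above can
be written with the stock Hadoop API. The output path, class name, and records
are placeholders, not our actual job:

####
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.BZip2Codec;

public class WriteBzip2SequenceFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("/tmp/example.seq"); // placeholder output path

    // BLOCK compression with BZip2Codec produces a header like the
    // "SEQ...Text...Text...BZip2Codec" one quoted above.
    try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(path),
        SequenceFile.Writer.keyClass(Text.class),
        SequenceFile.Writer.valueClass(Text.class),
        SequenceFile.Writer.compression(CompressionType.BLOCK, new BZip2Codec()))) {
      writer.append(new Text("key-1"), new Text("value-1"));
      writer.append(new Text("key-2"), new Text("value-2"));
    }
  }
}
####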
