FYI, this was due to the Hadoop version. 3.2.0 was throwing this error, but I rolled back to the version in Google's pom.xml (2.7.4) and it is working fine now.
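If anyone hits the same thing, a quick sanity check is to log which Hadoop version actually ends up on the classpath at runtime. Below is a minimal sketch using Hadoop's VersionInfo utility; the wrapper class is hypothetical and not part of bigtable-beam-import, and it assumes you have pinned Hadoop back to 2.7.4 in your own pom.xml.

```java
import org.apache.hadoop.util.VersionInfo;

// Hypothetical helper: prints the Hadoop version the job actually picked up,
// so you can confirm whether a 3.2.x jar sneaked onto the classpath.
public class HadoopVersionCheck {
  public static void main(String[] args) {
    // With the rollback described above this should print 2.7.4, not 3.2.0.
    System.out.println("Hadoop on classpath: " + VersionInfo.getVersion());
  }
}
```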
Kind of annoying because I wasted several hours jumping through hoops trying to get 3.2.0 working :(

On Wed, Sep 4, 2019 at 5:09 PM Shannon Duncan <[email protected]> wrote:

> I have successfully been using the sequence file source located here:
>
> https://github.com/googleapis/java-bigtable-hbase/blob/master/bigtable-dataflow-parent/bigtable-beam-import/src/main/java/com/google/cloud/bigtable/beam/sequencefiles/SequenceFileSource.java
>
> However, recently we started to do block-level compression with bzip2 on
> the SequenceFile. This is supported out of the box on the Hadoop side of
> things.
>
> However, when reading in the files, while most records parse out just fine,
> there are a handful of records that throw:
>
> ####
> Exception in thread "main" java.lang.IndexOutOfBoundsException: offs(1368)
> + len(1369) > dest.length(1467).
> at
> org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.read(CBZip2InputStream.java:398)
> ####
>
> I've gone in circles looking at this. It seems that the last record being
> read from the SequenceFile in each thread is hitting this on the value
> retrieval (the key retrieves just fine, but the value throws this error).
>
> Any clues as to what this could be?
>
> The file is KV<Text, Text>, aka
> "SEQorg.apache.hadoop.io.Textorg.apache.hadoop.io.Text(org.apache.hadoop.io.compress.BZip2Codec"
>
> Any help is appreciated!
>
> - Shannon
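For context, the writing side described in the quoted message looks roughly like the sketch below: a SequenceFile of Text/Text pairs written with block-level bzip2 compression. The output path and sample record here are made up for illustration, and the actual writer setup in our jobs may differ.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.BZip2Codec;

public class WriteBzip2SequenceFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // BLOCK-level compression with the BZip2 codec, matching the
    // "SEQ...Text...Text...BZip2Codec" header mentioned above.
    try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(new Path("/tmp/example.seq")),    // hypothetical path
        SequenceFile.Writer.keyClass(Text.class),
        SequenceFile.Writer.valueClass(Text.class),
        SequenceFile.Writer.compression(
            SequenceFile.CompressionType.BLOCK, new BZip2Codec()))) {
      writer.append(new Text("some-key"), new Text("some-value")); // hypothetical record
    }
  }
}
```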
