For cross-referencing, here is the SO thread[1]. Unfortunately, I don't
have a good answer for you, other than to try aligning the ORC versions
used by Flink and Hive.
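
To illustrate why the versions matter, here is a toy sketch of one way a
newer writer and an older reader can end up with an
ArrayIndexOutOfBoundsException while the file is being opened. It is not
ORC's actual code, and I have not confirmed that this is what happens in
your case. It assumes the reader maps a numeric writer-version id stored in
the file to a fixed table of versions it knows about; the class, enum, and
method names are hypothetical.

```java
// Toy model only: all names here are hypothetical, this is not ORC code.
class WriterVersionLookup {

    // Versions known to a hypothetical older reader (ids 0..5).
    enum KnownVersion { V0, V1, V2, V3, V4, V5 }

    // Fragile lookup: indexes the table directly, so an id written by a
    // newer library (for example 6) throws ArrayIndexOutOfBoundsException.
    static KnownVersion strictLookup(int idFromFile) {
        return KnownVersion.values()[idFromFile];
    }

    // Tolerant lookup: returns null for ids this reader does not know,
    // leaving it to the caller to decide how to handle a newer file.
    static KnownVersion tolerantLookup(int idFromFile) {
        KnownVersion[] known = KnownVersion.values();
        boolean inRange = idFromFile >= 0 && idFromFile < known.length;
        return inRange ? known[idFromFile] : null;
    }

    public static void main(String[] args) {
        System.out.println(strictLookup(5));   // id the old reader knows
        System.out.println(tolerantLookup(6)); // unknown id, handled
        try {
            strictLookup(6);                   // unknown id, not handled
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("strict lookup failed: " + e.getMessage());
        }
    }
}
```

If something like this is going on, only the reader or the writer library
version can fix it, which is why aligning them is the first thing to try.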


On Fri, Dec 4, 2020 at 9:00 AM Сергей Чернов wrote:

> Hello,
> My situation is as follows:
>    1. I write data in ORC format from Flink into HDFS:
>       - I implement the *Vectorizer* interface to process my data and
>       convert it into a *VectorizedRowBatch*
>       - I create an *OrcBulkWriterFactory*:
>       OrcBulkWriterFactory<MyData> orcBulkWriterFactory = new
>       OrcBulkWriterFactory<>(new MyVectorizerImpl(orcSchemaString));
>       - I configure the *StreamingFileSink*:
>       StreamingFileSink.forBulkFormat(hdfsPath, orcBulkWriterFactory)
>       .withBucketAssigner(new BaseBucketAssigner<>()).build();
>       - I deploy my job to the Flink cluster and see an ORC file in the
>       *hdfsPath* directory
>    2. I create a Hive table with the following command:
> CREATE TABLE flink_orc_test(a STRING, b BIGINT) STORED AS ORC LOCATION 'hdfsPath';
>    3. I try to execute the query:
>    SELECT * FROM flink_orc_test LIMIT 10;
>    4. I get an error:
>    Bad status for request TFetchResultsReq(fetchType=0, 
> operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, 
> operationType=0,
> operationId=THandleIdentifier(secret='a\x08\xc3U\xbb\xa7I\xce\x96\xa6\xdb\x82\xa4\xa9\xd1x',
>  guid='\xcc:\xca\xcb\x08\xa5KI\x8a}7\x95\xc5\xcd\xd2\xf0')),
>    orientation=4, maxRows=100): TFetchResultsResp(status=TStatus(errorCode=0, 
> errorMessage=' java.lang.ArrayIndexOutOfBoundsException: 
> 6',
>    sqlState=None, 
> infoMessages=['*
>  java.lang.ArrayIndexOutOfBoundsException: 6:25:24',
> '',
> '',
> '',
>    'sun.reflect.GeneratedMethodAccessor25:invoke::-1', 
> '',
>    '',
> '',
> 'org.apache.hive.service.cli.session.HiveSessionProxy:access$',
> 'org.apache.hive.service.cli.session.HiveSessionProxy$',
>    '',
>    '',
> '',
> '',
>    'com.sun.proxy.$Proxy37:fetchResults::-1',
>    '',
> '',
> 'org.apache.hive.service.rpc.thrift.TCLIService$Processor$',
> 'org.apache.hive.service.rpc.thrift.TCLIService$Processor$',
>    '',
>    '',
> '',
> 'org.apache.thrift.server.TThreadPoolServer$',
> '',
> 'java.util.concurrent.ThreadPoolExecutor$',
>    '',
>    '* 6:29:4',
> '',
> '',
>    '',
>    '', 
> '',
>  '*java.lang.ArrayIndexOutOfBoundsException:6:37:8', 
> 'org.apache.orc.OrcFile$',
>    '',
>    'org.apache.orc.impl.ReaderImpl:<init>',
>    '<init>',
>    '', 
> '',
> 'org.apache.hadoop.hive.ql.exec.FetchOperator$',
> '',
> ''],
>  statusCode=3), results=None, hasMoreRows=None)
> Dependencies:
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-client</artifactId>
>   <version>3.0.0-cdh6.1.1</version>
>   <scope>provided</scope>
> </dependency>
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-orc_2.12</artifactId>
>   <version>1.11.2</version>
> </dependency>
> I think the underlying problem is that *flink-orc* uses the *orc-core*
> dependency, but to work correctly with Hive I need *hive-orc*.
> I cannot simply replace *orc-core* with *hive-orc*, because it is
> incompatible with the *flink-orc* classes.
> How can I solve this?
> I would prefer to use *StreamingFileSink* for writing ORC files rather than
> the *Flink Table API with Hive*.
> Hive version: *2.1.1-cdh6.1.1*
> Flink Version: *1.11.2*
> --
> Best regards,
> Sergei Chernov


Arvid Heise | Senior Java Developer


