For cross-referencing, here is the Stack Overflow thread [1]. Unfortunately, I don't have a good answer for you, other than somehow aligning the ORC versions on the writer (Flink) and reader (Hive) sides.

[1] https://stackoverflow.com/questions/65126074/orc-files-generated-by-flink-can-not-be-used-in-hive
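The quoted stack trace below fails in org.apache.orc.OrcFile$WriterVersion.from (OrcFile.java:145), which maps the writer-version id stored in an ORC file's tail onto an array of known versions. flink-orc 1.11.2 bundles orc-core 1.5.x, which stamps writer-version id 6 (ORC_135), while the ORC classes in Hive 2.1.1 know ids at most up to 5, so the lookup overruns the array; that is the ArrayIndexOutOfBoundsException: 6. One untested way to attempt the alignment is to exclude orc-core from flink-orc and pin an older release that stamps an id Hive still recognizes. Whether flink-orc 1.11.2 actually compiles and runs against the older orc-core API is an assumption that needs verifying:

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-orc_2.12</artifactId>
        <version>1.11.2</version>
        <exclusions>
            <!-- Keep Flink's ORC writer code, swap out the bundled orc-core. -->
            <exclusion>
                <groupId>org.apache.orc</groupId>
                <artifactId>orc-core</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <!-- Assumption: orc-core 1.4.2 stamps writer version ORC_101 (id 5);
         verify that the CDH 6.1.1 build of Hive 2.1.1 accepts that id. -->
    <dependency>
        <groupId>org.apache.orc</groupId>
        <artifactId>orc-core</artifactId>
        <version>1.4.2</version>
    </dependency>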
On Fri, Dec 4, 2020 at 9:00 AM Сергей Чернов <chrnv...@mail.ru> wrote:

> Hello,
>
> My situation is the following:
>
> 1. I write data in ORC format with Flink into HDFS:
>    - I extend the *Vectorizer* abstract class to process my data and
>      convert it into a *VectorizedRowBatch* [a sketch of such an
>      implementation follows the quoted message].
>    - I create an *OrcBulkWriterFactory*:
>
>      OrcBulkWriterFactory<MyData> orcBulkWriterFactory =
>          new OrcBulkWriterFactory<>(new MyVectorizerImpl(orcSchemaString));
>
>    - I configure a *StreamingFileSink*:
>
>      StreamingFileSink.forBulkFormat(hdfsPath, orcBulkWriterFactory)
>          .withBucketAssigner(new BasePathBucketAssigner<>()).build();
>
>    - I deploy my job to the Flink cluster, and in the *hdfsPath* directory
>      I see the ORC files.
>
> 2. I create a Hive table with the following command:
>
>    CREATE TABLE flink_orc_test (a STRING, b BIGINT) STORED AS ORC LOCATION 'hdfsPath';
>
> 3. I try to execute the query:
>
>    SELECT * FROM flink_orc_test LIMIT 10;
>
> 4. I get the following error:
>
>    Bad status for request TFetchResultsReq(fetchType=0,
>    operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0,
>    operationId=THandleIdentifier(secret='a\x08\xc3U\xbb\xa7I\xce\x96\xa6\xdb\x82\xa4\xa9\xd1x',
>    guid='\xcc:\xca\xcb\x08\xa5KI\x8a}7\x95\xc5\xcd\xd2\xf0')),
>    orientation=4, maxRows=100): TFetchResultsResp(status=TStatus(errorCode=0,
>    errorMessage='java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 6',
>    sqlState=None, infoMessages=[
>    '*org.apache.hive.service.cli.HiveSQLException:java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 6:25:24',
>    'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:496',
>    'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:297',
>    'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:868',
>    'sun.reflect.GeneratedMethodAccessor25:invoke::-1',
>    'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43',
>    'java.lang.reflect.Method:invoke:Method.java:498',
>    'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78',
>    'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36',
>    'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63',
>    'java.security.AccessController:doPrivileged:AccessController.java:-2',
>    'javax.security.auth.Subject:doAs:Subject.java:422',
>    'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1731',
>    'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59',
>    'com.sun.proxy.$Proxy37:fetchResults::-1',
>    'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:507',
>    'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:708',
>    'org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1717',
>    'org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1702',
>    'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
>    'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39',
>    'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
>    'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286',
>    'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149',
>    'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624',
>    'java.lang.Thread:run:Thread.java:748',
>    '*java.io.IOException:java.lang.ArrayIndexOutOfBoundsException: 6:29:4',
>    'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:521',
>    'org.apache.hadoop.hive.ql.exec.FetchOperator:pushRow:FetchOperator.java:428',
>    'org.apache.hadoop.hive.ql.exec.FetchTask:fetch:FetchTask.java:146',
>    'org.apache.hadoop.hive.ql.Driver:getResults:Driver.java:2277',
>    'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:491',
>    '*java.lang.ArrayIndexOutOfBoundsException:6:37:8',
>    'org.apache.orc.OrcFile$WriterVersion:from:OrcFile.java:145',
>    'org.apache.orc.impl.OrcTail:getWriterVersion:OrcTail.java:74',
>    'org.apache.orc.impl.ReaderImpl:<init>:ReaderImpl.java:385',
>    'org.apache.hadoop.hive.ql.io.orc.ReaderImpl:<init>:ReaderImpl.java:62',
>    'org.apache.hadoop.hive.ql.io.orc.OrcFile:createReader:OrcFile.java:89',
>    'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat:getRecordReader:OrcInputFormat.java:1690',
>    'org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit:getRecordReader:FetchOperator.java:695',
>    'org.apache.hadoop.hive.ql.exec.FetchOperator:getRecordReader:FetchOperator.java:333',
>    'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:459'],
>    statusCode=3), results=None, hasMoreRows=None)
>
> Dependencies:
>
>    <dependency>
>        <groupId>org.apache.hadoop</groupId>
>        <artifactId>hadoop-client</artifactId>
>        <version>3.0.0-cdh6.1.1</version>
>        <scope>provided</scope>
>    </dependency>
>    <dependency>
>        <groupId>org.apache.flink</groupId>
>        <artifactId>flink-orc_2.12</artifactId>
>        <version>1.11.2</version>
>    </dependency>
>
> I think the general problem is that *flink-orc* depends on *orc-core*, but to
> interoperate correctly with Hive I need *hive-orc*.
>
> I cannot simply replace *orc-core* with *hive-orc*, because it is incompatible
> with the *flink-orc* classes.
>
> How can I solve this?
>
> I would prefer to keep using *StreamingFileSink* for writing the ORC files
> rather than the *Flink Table API with Hive*.
>
> Hive version: *2.1.1-cdh6.1.1*
> Flink version: *1.11.2*
>
> --
> Best regards,
> Sergei Chernov
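For readers reproducing the setup, here is a minimal, self-contained sketch of the pipeline described in step 1 of the quoted message. MyData, its two fields, and the schema string are illustrative stand-ins chosen to match the flink_orc_test table (a STRING, b BIGINT), not code from the original job:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    import org.apache.flink.core.fs.Path;
    import org.apache.flink.orc.vector.Vectorizer;
    import org.apache.flink.orc.writer.OrcBulkWriterFactory;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.BasePathBucketAssigner;
    import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
    import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
    import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

    // Illustrative record matching the two-column table (a STRING, b BIGINT).
    class MyData {
        final String a;
        final long b;
        MyData(String a, long b) { this.a = a; this.b = b; }
    }

    // Converts one element into the next free row of the shared VectorizedRowBatch.
    class MyVectorizerImpl extends Vectorizer<MyData> {
        MyVectorizerImpl(String schema) {
            super(schema); // e.g. "struct<a:string,b:bigint>"
        }

        @Override
        public void vectorize(MyData element, VectorizedRowBatch batch) throws IOException {
            int row = batch.size++; // claim the next row slot in the batch
            ((BytesColumnVector) batch.cols[0])
                    .setVal(row, element.a.getBytes(StandardCharsets.UTF_8));
            ((LongColumnVector) batch.cols[1]).vector[row] = element.b;
        }
    }

    // Wiring the factory into a StreamingFileSink, as in the quoted snippets.
    class SinkSetup {
        static StreamingFileSink<MyData> build(Path hdfsPath) {
            OrcBulkWriterFactory<MyData> factory =
                    new OrcBulkWriterFactory<>(new MyVectorizerImpl("struct<a:string,b:bigint>"));
            return StreamingFileSink
                    .forBulkFormat(hdfsPath, factory)
                    .withBucketAssigner(new BasePathBucketAssigner<>())
                    .build();
        }
    }

Note that none of this changes the failure mode: the writer version is stamped by whichever orc-core is on the job's classpath, which is why the version alignment sketched above matters.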
--
Arvid Heise | Senior Java Developer
Ververica GmbH
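For completeness: one way to confirm the writer-version mismatch is to read the metadata of a produced file with the same orc-core version the Flink job bundles, for example with a small check like this (the file path is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.orc.OrcFile;
    import org.apache.orc.Reader;

    public class WriterVersionCheck {
        public static void main(String[] args) throws Exception {
            // Illustrative path; point this at a file the Flink job produced.
            Reader reader = OrcFile.createReader(
                    new Path("hdfs:///path/to/output/part-0-0"),
                    OrcFile.readerOptions(new Configuration()));
            // Hive can only read files whose writer-version id it knows about.
            System.out.println("writer version: " + reader.getWriterVersion());
        }
    }

Running Hive's own dump tool (hive --orcfiledump <path>) against the same file shows whether the Hive side can parse the file tail at all; if it fails with the same ArrayIndexOutOfBoundsException, the versions are still misaligned.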