Let me try this out on my standalone Hive. I remember reading about a similar problem on SO[1]: there, the ORC files were generated externally by Spark and an external table was created over them using CDH. The OP answered his own question by referring to a community post[2] on Cloudera; it may be worth checking.
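One quick check that should narrow it down: dump the footer of one of the Flink-generated files and look at the writer version it records (the path below is a placeholder):

    hive --orcfiledump hdfs:///hdfsPath/part-0-0.orc

If I read the ORC WriterVersion enum correctly, the ArrayIndexOutOfBoundsException: 6 in the trace below means the file carries writer version id 6 (ORC_135, which the orc-core 1.5.x that flink-orc 1.11 pulls in writes), while the ORC reader in Hive 2.1.1-cdh6.1.1 only knows ids up to 5. If the dump itself fails with the same exception, that by itself would confirm the mismatch.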
[1] https://stackoverflow.com/questions/62791744/problem-of-compatibility-of-an-external-orc-and-claudera-s-hive
[2] https://community.cloudera.com/t5/Cloudera-Labs/Problem-of-compatibility-of-an-external-orc-and-Claudera-s/m-p/299395/highlight/false#M582

On Fri, Dec 4, 2020 at 4:47 PM Arvid Heise <ar...@ververica.com> wrote:

> For cross-referencing, here is the SO thread[1]. Unfortunately, I don't
> have a good answer for you, except to try to align the ORC versions somehow.
>
> [1] https://stackoverflow.com/questions/65126074/orc-files-generated-by-flink-can-not-be-used-in-hive
>
> On Fri, Dec 4, 2020 at 9:00 AM Сергей Чернов <chrnv...@mail.ru> wrote:
>
>> Hello,
>>
>> My situation is the following:
>>
>> 1. I write data in ORC format into HDFS with Flink:
>>    - I implement the Vectorizer interface to process my data and
>>      convert it into a VectorizedRowBatch (a minimal sketch follows
>>      the error trace below)
>>    - I create an OrcBulkWriterFactory:
>>        OrcBulkWriterFactory<MyData> orcBulkWriterFactory =
>>            new OrcBulkWriterFactory<>(new MyVectorizerImpl(orcSchemaString));
>>    - I configure a StreamingFileSink:
>>        StreamingFileSink.forBulkFormat(hdfsPath, orcBulkWriterFactory)
>>            .withBucketAssigner(new BasePathBucketAssigner<>())
>>            .build();
>>    - I deploy my job to the Flink cluster, and I see ORC files appear
>>      under hdfsPath
>>
>> 2. I create a Hive table with the following command:
>>
>>    CREATE EXTERNAL TABLE flink_orc_test (a STRING, b BIGINT)
>>    STORED AS ORC LOCATION 'hdfsPath';
>>
>> 3. I try to execute a query:
>>
>>    SELECT * FROM flink_orc_test LIMIT 10;
>>
>> 4. I get the following error:
>>
>> Bad status for request TFetchResultsReq(fetchType=0,
>>   operationHandle=TOperationHandle(hasResultSet=True,
>>   modifiedRowCount=None, operationType=0,
>>   operationId=THandleIdentifier(secret='a\x08\xc3U\xbb\xa7I\xce\x96\xa6\xdb\x82\xa4\xa9\xd1x',
>>   guid='\xcc:\xca\xcb\x08\xa5KI\x8a}7\x95\xc5\xcd\xd2\xf0')),
>>   orientation=4, maxRows=100):
>> TFetchResultsResp(status=TStatus(errorCode=0,
>>   errorMessage='java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 6',
>>   sqlState=None, infoMessages=[
>>     '*org.apache.hive.service.cli.HiveSQLException:java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 6:25:24',
>>     'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:496',
>>     'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:297',
>>     'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:868',
>>     'sun.reflect.GeneratedMethodAccessor25:invoke::-1',
>>     'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43',
>>     'java.lang.reflect.Method:invoke:Method.java:498',
>>     'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78',
>>     'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36',
>>     'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63',
>>     'java.security.AccessController:doPrivileged:AccessController.java:-2',
>>     'javax.security.auth.Subject:doAs:Subject.java:422',
>>     'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1731',
>>     'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59',
>>     'com.sun.proxy.$Proxy37:fetchResults::-1',
>>     'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:507',
>>     'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:708',
>>     'org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1717',
>>     'org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1702',
>>     'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
>>     'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39',
>>     'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
>>     'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286',
>>     'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149',
>>     'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624',
>>     'java.lang.Thread:run:Thread.java:748',
>>     '*java.io.IOException:java.lang.ArrayIndexOutOfBoundsException: 6:29:4',
>>     'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:521',
>>     'org.apache.hadoop.hive.ql.exec.FetchOperator:pushRow:FetchOperator.java:428',
>>     'org.apache.hadoop.hive.ql.exec.FetchTask:fetch:FetchTask.java:146',
>>     'org.apache.hadoop.hive.ql.Driver:getResults:Driver.java:2277',
>>     'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:491',
>>     '*java.lang.ArrayIndexOutOfBoundsException:6:37:8',
>>     'org.apache.orc.OrcFile$WriterVersion:from:OrcFile.java:145',
>>     'org.apache.orc.impl.OrcTail:getWriterVersion:OrcTail.java:74',
>>     'org.apache.orc.impl.ReaderImpl:<init>:ReaderImpl.java:385',
>>     'org.apache.hadoop.hive.ql.io.orc.ReaderImpl:<init>:ReaderImpl.java:62',
>>     'org.apache.hadoop.hive.ql.io.orc.OrcFile:createReader:OrcFile.java:89',
>>     'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat:getRecordReader:OrcInputFormat.java:1690',
>>     'org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit:getRecordReader:FetchOperator.java:695',
>>     'org.apache.hadoop.hive.ql.exec.FetchOperator:getRecordReader:FetchOperator.java:333',
>>     'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:459'],
>>   statusCode=3), results=None, hasMoreRows=None)
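>> For reference, a minimal, simplified sketch of my Vectorizer (assume
>> MyData is a POJO with a String field a and a long field b, matching
>> the schema struct<a:string,b:bigint>):
>>
>>    import org.apache.flink.orc.vector.Vectorizer;
>>    import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
>>    import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
>>    import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
>>
>>    import java.io.IOException;
>>    import java.nio.charset.StandardCharsets;
>>
>>    public class MyVectorizerImpl extends Vectorizer<MyData> {
>>
>>        public MyVectorizerImpl(String schema) {
>>            super(schema); // e.g. "struct<a:string,b:bigint>"
>>        }
>>
>>        @Override
>>        public void vectorize(MyData element, VectorizedRowBatch batch)
>>                throws IOException {
>>            // claim the next free row slot in the current batch
>>            int row = batch.size++;
>>            BytesColumnVector aCol = (BytesColumnVector) batch.cols[0];
>>            LongColumnVector bCol = (LongColumnVector) batch.cols[1];
>>            aCol.setVal(row, element.a.getBytes(StandardCharsets.UTF_8));
>>            bCol.vector[row] = element.b;
>>        }
>>    }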
>>
>> Dependencies:
>>
>>    <dependency>
>>        <groupId>org.apache.hadoop</groupId>
>>        <artifactId>hadoop-client</artifactId>
>>        <version>3.0.0-cdh6.1.1</version>
>>        <scope>provided</scope>
>>    </dependency>
>>    <dependency>
>>        <groupId>org.apache.flink</groupId>
>>        <artifactId>flink-orc_2.12</artifactId>
>>        <version>1.11.2</version>
>>    </dependency>
>>
>> I think the general problem is that flink-orc depends on orc-core,
>> while for Hive to read the files correctly I would need hive-orc.
>> I cannot simply replace orc-core with hive-orc, because it is
>> incompatible with the flink-orc classes.
>>
>> How can I solve this?
>>
>> I would prefer to keep using StreamingFileSink for writing the ORC
>> files rather than switch to the Flink Table API with Hive.
>>
>> Hive version: 2.1.1-cdh6.1.1
>> Flink version: 1.11.2
>>
>> --
>> Best regards,
>> Sergei Chernov
>
> --
> Arvid Heise | Senior Java Developer
> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason,
> Ji (Toni) Cheng
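P.S. On the orc-core vs. hive-orc point above: rather than swapping in hive-orc, one experiment that may be worth trying is to exclude orc-core from flink-orc and pin an older release whose writer version pre-dates the id 6 the CDH reader chokes on. This is only a sketch; the 1.4.4 pin is an assumption, and whether flink-orc 1.11.2's internal writer still runs against orc-core 1.4.x would need to be verified:

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-orc_2.12</artifactId>
        <version>1.11.2</version>
        <exclusions>
            <exclusion>
                <groupId>org.apache.orc</groupId>
                <artifactId>orc-core</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <!-- hypothetical pin: an orc-core release that records a writer
         version older than id 6, which the Hive 2.1.1 reader can map -->
    <dependency>
        <groupId>org.apache.orc</groupId>
        <artifactId>orc-core</artifactId>
        <version>1.4.4</version>
    </dependency>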