Just upgraded Hive from 3.0 to 3.1.1.

    Connected to: Apache Hive (version 3.1.1)
    Driver: Hive JDBC (version 3.1.1)

Created an ORC table through Spark as below:

    sql("use accounts")
    //
    // Drop and create table ll_18740868
    //
    sql("DROP TABLE IF EXISTS accounts.ll_18740868")
    var sqltext = ""
    sqltext = """
    CREATE TABLE accounts.ll_18740868 (
       TransactionDate         DATE
      ,TransactionType         String
      ,SortCode                String
      ,AccountNumber           String
      ,TransactionDescription  String
      ,DebitAmount             Double
      ,CreditAmount            Double
      ,Balance                 Double
    )
    COMMENT 'from csv file from excel sheet'
    STORED AS ORC
    TBLPROPERTIES ( "orc.compress"="ZLIB" )
    """
    sql(sqltext)

The table is created OK and populated from CSV files in HDFS. Data is inserted through a temp table registered on DataFrame "a" as below:

    a.toDF.registerTempTable("tmp")

    INSERT INTO TABLE accounts.ll_18740868
    SELECT ……...
    FROM tmp

So the data is there, as I can select rows from the ORC table:

    // example
    scala> sql("Select TransactionDate, DebitAmount, CreditAmount, Balance from ll_18740868 limit 3 ").collect.foreach(println)
    [2011-12-30,50.0,null,304.89]
    [2011-12-30,19.01,null,354.89]
    [2011-12-29,80.1,null,373.9]

However, the same select does not work from beeline:

    0: jdbc:hive2://rhes75:10099/default>
    Beeline version 3.1.1 by Apache Hive
    0: jdbc:hive2://rhes75:10099/default> use accounts;
    No rows affected (0.011 seconds)
    0: jdbc:hive2://rhes75:10099/default> Select TransactionDate, DebitAmount, CreditAmount, Balance from ll_18740868 limit 3;
    Error: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: java.lang.NoSuchMethodError: org.apache.hadoop.fs.FileStatus.compareTo(Lorg/apache/hadoop/fs/FileStatus;)I (state=,code=0)

I thought this problem would have gone away in this release? So it works through Spark, because it uses the Spark Tungsten optimiser, but not through Hive!
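
One thing that can be checked: a NoSuchMethodError generally means that the class loaded at run time is not the same version as the one the caller was compiled against. Below is a minimal sketch that reports whether the FileStatus class visible on a given classpath has a compareTo(FileStatus) method, and which jar it was loaded from. The names MethodCheck, hasMethod and loadedFrom are hypothetical helpers made up for illustration. The sketch also assumes it is run with the same classpath that HiveServer2 uses, which I have not verified; running it inside spark-shell would only describe Spark's classpath.

```scala
import scala.util.Try

object MethodCheck {
  // True if `className` can be loaded and has a public method `method`
  // taking exactly one parameter of type `paramClassName`.
  def hasMethod(className: String, method: String, paramClassName: String): Boolean =
    Try {
      val cls   = Class.forName(className)
      val param = Class.forName(paramClassName)
      cls.getMethod(method, param)
    }.isSuccess

  // Jar or directory the class was loaded from, if the JVM reports one.
  // Returns None for JDK bootstrap classes and for classes that are missing.
  def loadedFrom(className: String): Option[String] =
    Try {
      val cls = Class.forName(className)
      Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation.toString)
    }.toOption.flatten

  def main(args: Array[String]): Unit = {
    // Class name taken from the error message above.
    val fs = "org.apache.hadoop.fs.FileStatus"
    println(s"compareTo(FileStatus) present: ${hasMethod(fs, "compareTo", fs)}")
    println(s"FileStatus loaded from: ${loadedFrom(fs).getOrElse("unknown or not on classpath")}")
  }
}
```

If the first line prints false, that would suggest (but not prove) that the Hadoop jars on that classpath are older than the ones this Hive build expects.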

The explain plan:

    explain Select TransactionDate, DebitAmount, CreditAmount, Balance from ll_18740868 limit 3;
    +----------------------------------------------------+
    |                      Explain                       |
    +----------------------------------------------------+
    | STAGE DEPENDENCIES:                                |
    |   Stage-0 is a root stage                          |
    |                                                    |
    | STAGE PLANS:                                       |
    |   Stage: Stage-0                                   |
    |     Fetch Operator                                 |
    |       limit: 3                                     |
    |       Processor Tree:                              |
    |         TableScan                                  |
    |           alias: ll_18740868                       |
    |           Statistics: Num rows: 80 Data size: 53535 Basic stats: COMPLETE Column stats: NONE |
    |           Select Operator                          |
    |             expressions: transactiondate (type: date), debitamount (type: double), creditamount (type: double), balance (type: double) |
    |             outputColumnNames: _col0, _col1, _col2, _col3 |
    |             Statistics: Num rows: 80 Data size: 53535 Basic stats: COMPLETE Column stats: NONE |
    |             Limit                                  |
    |               Number of rows: 3                    |
    |               Statistics: Num rows: 3 Data size: 2007 Basic stats: COMPLETE Column stats: NONE |
    |               ListSink                             |
    |                                                    |
    +----------------------------------------------------+

BTW, I tried different settings for

    set hive.exec.orc.split.strategy

None of them worked.

Thanks