Douglas Drinka created SPARK-28099:
--------------------------------------

             Summary: Assertion when querying unpartitioned Hive table with partition-like naming
                 Key: SPARK-28099
                 URL: https://issues.apache.org/jira/browse/SPARK-28099
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.3
            Reporter: Douglas Drinka


{code:java}
import org.apache.spark.sql.SaveMode
import spark.implicits._

// Write a single ORC file into a directory whose path segments look like
// Hive partition directories, even though the table is unpartitioned.
val testData = List(1, 2, 3, 4, 5)
val dataFrame = testData.toDF()
dataFrame
  .coalesce(1)
  .write
  .mode(SaveMode.Overwrite)
  .format("orc")
  .option("compression", "zlib")
  .save("s3://ddrinka.sparkbug/testFail/dir1=1/dir2=2/")

spark.sql("DROP TABLE IF EXISTS ddrinka_sparkbug.testFail")
spark.sql("CREATE EXTERNAL TABLE ddrinka_sparkbug.testFail (val INT) STORED AS ORC LOCATION 's3://ddrinka.sparkbug/testFail/'")

val queryResponse = spark.sql("SELECT * FROM ddrinka_sparkbug.testFail")
// Throws AssertionError
// at org.apache.spark.sql.hive.HiveMetastoreCatalog.convertToLogicalRelation(HiveMetastoreCatalog.scala:214)
{code}
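For context, here is a small sketch of where the extra columns appear to come from (not part of the original repro; it assumes the same S3 location and Spark's default partition discovery): reading the table location directly with the native reader infers dir1 and dir2 as partition columns from the key=value directory names, so the inferred schema ends up with more columns than the Hive table declares.

{code:java}
// Sketch only: reading the same location with the DataFrame reader triggers
// partition discovery on the key=value directory names, so dir1 and dir2
// show up as extra (virtual) partition columns alongside the data column.
spark.read.orc("s3://ddrinka.sparkbug/testFail/").printSchema()
// Expected to show value, dir1, and dir2 - two more columns than the single
// `val` column declared on the Hive table.
{code}
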
It looks like the native ORC reader is creating virtual columns named dir1 and dir2, which don't exist in the Hive table. [The assertion|https://github.com/apache/spark/blob/c0297dedd829a92cca920ab8983dab399f8f32d5/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L257] checks that the number of columns matches, and it fails because of the extra virtual partition columns.

Actually getting data back from this query depends on SPARK-28098, which covers supporting subdirectories for Hive queries at all.
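
A possible workaround sketch (an assumption on my part, not verified against this exact setup): disabling the ORC metastore conversion makes Spark read the table through the Hive SerDe path instead of convertToLogicalRelation, which should sidestep the assertion.

{code:java}
// Workaround sketch (assumption): skip the native ORC conversion so the
// convertToLogicalRelation assertion is never reached. Whether any rows
// actually come back still depends on subdirectory support (SPARK-28098).
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
spark.sql("SELECT * FROM ddrinka_sparkbug.testFail").show()
{code}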


