Hi, I have created a Hive external table stored as textfile, partitioned by event_date Date.
How do we specify the CSV format of the underlying files when reading this Hive table from Spark?

The environment is:
1. Spark 1.5.0, CDH 5.5.1, Scala 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
2. Hive 1.1, CDH 5.5.1

Scala script:

sqlContext.setConf("hive.exec.dynamic.partition", "true")
sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

val distData = sc.parallelize(Array((1, 1, 1), (2, 2, 2), (3, 3, 3))).toDF
val distData_1 = distData.withColumn("event_date", current_date())
distData_1: org.apache.spark.sql.DataFrame = [_1: int, _2: int, _3: int, event_date: date]

scala> distData_1.show
+---+---+---+----------+
| _1| _2| _3|event_date|
+---+---+---+----------+
|  1|  1|  1|2016-03-25|
|  2|  2|  2|2016-03-25|
|  3|  3|  3|2016-03-25|
+---+---+---+----------+

distData_1.write.mode("append").partitionBy("event_date").saveAsTable("part_table")

scala> sqlContext.sql("select * from part_table").show
+-----+----+----+----------+
|    a|   b|   c|event_date|
+-----+----+----+----------+
|1,1,1|null|null|2016-03-25|
|2,2,2|null|null|2016-03-25|
|3,3,3|null|null|2016-03-25|
+-----+----+----+----------+

Hive table:

create external table part_table (a String, b int, c bigint)
partitioned by (event_date Date)
row format delimited fields terminated by ','
stored as textfile
LOCATION "/user/hdfs/hive/part_table";

In Hive, select * from part_table shows the expected rows:

+--------------+--------------+--------------+-----------------------+
| part_table.a | part_table.b | part_table.c | part_table.event_date |
+--------------+--------------+--------------+-----------------------+
| 1            | 1            | 1            | 2016-03-25            |
| 2            | 2            | 2            | 2016-03-25            |
| 3            | 3            | 3            | 2016-03-25            |
+--------------+--------------+--------------+-----------------------+

Looking at HDFS, the partition path /user/hdfs/hive/part_table/event_date=2016-03-25 has two part files:

part-00000 content:
1,1,1

part-00001 content:
2,2,2
3,3,3

So the files on disk are correct comma-delimited text and Hive reads them correctly, but when reading through sqlContext the whole line ends up in column a while b and c come back null.

P.S. If we store the table as ORC, Spark writes and reads the data as expected.
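For reference, the only workaround I can think of right now is to bypass the table and parse the text files by hand. Below is a minimal sketch of that, assuming the partition path and the column types from the DDL above; the split-based parsing and the names schema/rows/manualDf are my own additions, not anything from the table definition:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, LongType}

// Schema matching the Hive DDL above (a string, b int, c bigint).
val schema = StructType(Seq(
  StructField("a", StringType),
  StructField("b", IntegerType),
  StructField("c", LongType)))

// Read the partition's text files directly and apply the comma delimiter by hand.
// (event_date would still have to be added back from the directory name.)
val rows = sc.textFile("/user/hdfs/hive/part_table/event_date=2016-03-25")
  .map(_.split(","))
  .map(f => Row(f(0), f(1).toInt, f(2).toLong))

val manualDf = sqlContext.createDataFrame(rows, schema)
manualDf.show()

This gives the correct columns, but I would much rather have Spark apply the table's delimited SerDe automatically when I query part_table through sqlContext. Is there a way to tell Spark the file format of the Hive table?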