Hi,

I have created a Hive external table, stored as textfile and partitioned by
event_date (Date).

How do we specify the CSV format (the field delimiter) that Spark should use
when reading this Hive table?
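
For illustration, a manual workaround would presumably be to bypass the table
and parse the partition files with an explicit schema, as sketched below (the
path and column types are assumed from the table definition further down), but
I would rather read through the Hive table directly.

        // Sketch of a manual workaround (an assumption, not something I want
        // to rely on): read the partition files directly and impose the schema
        // myself, bypassing the Hive SerDe entirely.
        import org.apache.spark.sql.Row
        import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, LongType}

        val schema = StructType(Seq(
          StructField("a", StringType),
          StructField("b", IntegerType),
          StructField("c", LongType)))

        val rows = sc.textFile("/user/hdfs/hive/part_table/event_date=2016-03-25")
          .map(_.split(","))
          .map(p => Row(p(0), p(1).trim.toInt, p(2).trim.toLong))

        val manualDf = sqlContext.createDataFrame(rows, schema)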

The environment is

     1. Spark 1.5.0 - CDH 5.5.1, Scala 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
     2. Hive 1.1, CDH 5.5.1

Scala script

        sqlContext.setConf("hive.exec.dynamic.partition", "true")
        sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

        val distData = sc.parallelize(Array((1, 1, 1), (2, 2, 2), (3, 3, 3))).toDF
        val distData_1 = distData.withColumn("event_date", current_date())
        distData_1: org.apache.spark.sql.DataFrame = [_1: int, _2: int, _3: int, event_date: date]

        scala> distData_1.show
        +---+---+---+----------+
        | _1| _2| _3|event_date|
        +---+---+---+----------+
        |  1|  1|  1|2016-03-25|
        |  2|  2|  2|2016-03-25|
        |  3|  3|  3|2016-03-25|
        +---+---+---+----------+


        distData_1.write.mode("append").partitionBy("event_date").saveAsTable("part_table")


        scala> sqlContext.sql("select * from part_table").show
        +-----+----+----+----------+
        |    a|   b|   c|event_date|
        +-----+----+----+----------+
        |1,1,1|null|null|2016-03-25|
        |2,2,2|null|null|2016-03-25|
        |3,3,3|null|null|2016-03-25|
        +-----+----+----+----------+
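
For completeness, the only alternative write path I am aware of would be to
insert into the existing table instead of using saveAsTable, so that the
table's own delimited row format is (presumably) used for the write. I have
not tried this on 1.5, so the sketch below is an assumption (dist_data_tmp is
just an illustrative name):

        // Alternative write path (untested, an assumption): insert through the
        // existing table so Hive's own SerDe/delimiter is used; column order
        // must match the table, with the partition column last.
        distData_1.write.mode("append").insertInto("part_table")

        // or, via HiveQL with the dynamic partitioning already enabled above:
        distData_1.registerTempTable("dist_data_tmp")
        sqlContext.sql(
          "insert into table part_table partition (event_date) " +
          "select _1, _2, _3, event_date from dist_data_tmp")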



Hive table

        create external table part_table (a String, b int, c bigint)
        partitioned by (event_date Date)
        row format delimited fields terminated by ','
        stored as textfile  LOCATION "/user/hdfs/hive/part_table";

        select * from part_table shows

        | part_table.a | part_table.b | part_table.c | part_table.event_date |
        | 1            | 1            | 1            | 2016-03-25            |
        | 2            | 2            | 2            | 2016-03-25            |
        | 3            | 3            | 3            | 2016-03-25            |


Looking at HDFS


        The path /user/hdfs/hive/part_table/event_date=2016-03-25 has 2 part files:
        part-00000
        part-00001

          part-00000 content
            1,1,1
          part-00001 content
            2,2,2
            3,3,3


P.S. If we store the table as ORC, it writes and reads the data as expected.
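
By that I mean roughly the following write against a table declared as ORC
(sketch only; part_table_orc is an illustrative name):

        // Same append write, but the target table is stored as ORC; reading it
        // back through sqlContext.sql then returns the expected columns.
        distData_1.write.format("orc").mode("append").partitionBy("event_date").saveAsTable("part_table_orc")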
