Hi,

I have created a Hive external table, stored as textfile and partitioned by
event_date (Date).

How do we specify the CSV format (the field delimiter) that Spark should use
when reading this Hive table?
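
For illustration, a manual workaround would presumably be to bypass the table
and parse the partition files with an explicit schema, as sketched below (the
path and column types are assumed from the table definition further down), but
I would rather read through the Hive table directly.

        // Sketch of a manual workaround (an assumption, not something I want
        // to rely on): read the partition files directly and impose the schema
        // myself, bypassing the Hive SerDe entirely.
        import org.apache.spark.sql.Row
        import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, LongType}

        val schema = StructType(Seq(
          StructField("a", StringType),
          StructField("b", IntegerType),
          StructField("c", LongType)))

        val rows = sc.textFile("/user/hdfs/hive/part_table/event_date=2016-03-25")
          .map(_.split(","))
          .map(p => Row(p(0), p(1).trim.toInt, p(2).trim.toLong))

        val manualDf = sqlContext.createDataFrame(rows, schema)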

The environment is

     1. Spark 1.5.0 - CDH 5.5.1, Scala 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
     2. Hive 1.1, CDH 5.5.1

Scala script

        sqlContext.setConf("hive.exec.dynamic.partition", "true")
        sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

        val distData = sc.parallelize(Array((1, 1, 1), (2, 2, 2), (3, 3, 3))).toDF
        val distData_1 = distData.withColumn("event_date", current_date())
        distData_1: org.apache.spark.sql.DataFrame = [_1: int, _2: int, _3: int, event_date: date]

        scala> distData_1.show
        +---+---+---+----------+
        | _1| _2| _3|event_date|
        +---+---+---+----------+
        |  1|  1|  1|2016-03-25|
        |  2|  2|  2|2016-03-25|
        |  3|  3|  3|2016-03-25|
        +---+---+---+----------+


        distData_1.write.mode("append").partitionBy("event_date").saveAsTable("part_table")


        scala> sqlContext.sql("select * from part_table").show
        +-----+----+----+----------+
        |    a|   b|   c|event_date|
        +-----+----+----+----------+
        |1,1,1|null|null|2016-03-25|
        |2,2,2|null|null|2016-03-25|
        |3,3,3|null|null|2016-03-25|
        +-----+----+----+----------+
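
For completeness, the only alternative write path I am aware of would be to
insert into the existing table instead of using saveAsTable, so that the
table's own delimited row format is (presumably) used for the write. I have
not tried this on 1.5, so the sketch below is an assumption (dist_data_tmp is
just an illustrative name):

        // Alternative write path (untested, an assumption): insert through the
        // existing table so Hive's own SerDe/delimiter is used; column order
        // must match the table, with the partition column last.
        distData_1.write.mode("append").insertInto("part_table")

        // or, via HiveQL with the dynamic partitioning already enabled above:
        distData_1.registerTempTable("dist_data_tmp")
        sqlContext.sql(
          "insert into table part_table partition (event_date) " +
          "select _1, _2, _3, event_date from dist_data_tmp")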



Hive table

        create external table part_table (a String, b int, c bigint)
        partitioned by (event_date Date)
        row format delimited fields terminated by ','
        stored as textfile  LOCATION "/user/hdfs/hive/part_table";

        select * from part_table shows

        | part_table.a | part_table.b | part_table.c | part_table.event_date |
        | 1            | 1            | 1            | 2016-03-25            |
        | 2            | 2            | 2            | 2016-03-25            |
        | 3            | 3            | 3            | 2016-03-25            |


Looking at HDFS


        The path /user/hdfs/hive/part_table/event_date=2016-03-25 has 2 part files:
        part-00000
        part-00001

          part-00000 content
            1,1,1
          part-00001 content
            2,2,2
            3,3,3


P.S. If we store the table as ORC, it writes and reads the data as expected.
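
By that I mean roughly the following write against a table declared as ORC
(sketch only; part_table_orc is an illustrative name):

        // Same append write, but the target table is stored as ORC; reading it
        // back through sqlContext.sql then returns the expected columns.
        distData_1.write.format("orc").mode("append").partitionBy("event_date").saveAsTable("part_table_orc")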
