Hello Community Users,

I was able to resolve the issue. The problem was the input data format: by default, Excel writes dates as 2001/01/09, whereas Spark SQL expects the 2001-01-09 (yyyy-MM-dd) format.
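For anyone hitting the same symptom, here is a minimal sketch (plain JVM Scala, no Spark required) of why the slash format fails. It assumes the same behaviour as java.sql.Date.valueOf, which only accepts the JDBC escape format yyyy-MM-dd; pre-converting the slashes to dashes makes the value parseable:

```scala
import java.sql.Date
import scala.util.Try

object DateFormatDemo {
  // java.sql.Date.valueOf only understands the JDBC escape format yyyy-MM-dd;
  // anything else throws IllegalArgumentException, which we turn into None.
  def parse(s: String): Option[Date] = Try(Date.valueOf(s)).toOption

  // Pre-convert Excel's slash format to the dash format Spark SQL expects.
  def slashToDash(s: String): String = s.replace('/', '-')

  def main(args: Array[String]): Unit = {
    println(parse("2001/01/09"))               // None: slash format rejected
    println(parse(slashToDash("2001/01/09")))  // Some(2001-01-09)
  }
}
```

This is only an illustration of the format mismatch, not a claim about Spark's internal cast path; the practical takeaway is to normalise the date strings to yyyy-MM-dd before loading the CSV.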
Here is the sample code below:

SQL context available as sqlContext.

scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala> import org.apache.spark.sql.hive.orc._
import org.apache.spark.sql.hive.orc._

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
15/12/29 04:29:39 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
15/12/29 04:29:39 INFO HiveContext: Initializing execution hive, version 0.13.1
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@7312f6d8

scala> import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, FloatType, LongType, TimestampType, DateType}
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, FloatType, LongType, TimestampType, DateType}

scala> val customSchema = StructType(Seq(
     |   StructField("year", DateType, true),
     |   StructField("make", StringType, true),
     |   StructField("model", StringType, true),
     |   StructField("comment", StringType, true),
     |   StructField("blank", StringType, true)))
customSchema: org.apache.spark.sql.types.StructType = StructType(StructField(year,DateType,true), StructField(make,StringType,true), StructField(model,StringType,true), StructField(comment,StringType,true), StructField(blank,StringType,true))

scala> val df = hiveContext.read.format("com.databricks.spark.csv").option("header", "true").schema(customSchema).load("/tmp/TestDivya/carsdate.csv")
15/12/29 04:30:27 INFO HiveContext: Initializing HiveMetastoreConnection version 0.13.1 using Spark classes.
df: org.apache.spark.sql.DataFrame = [year: date, make: string, model: string, comment: string, blank: string]

scala> df.printSchema()
root
 |-- year: date (nullable = true)
 |-- make: string (nullable = true)
 |-- model: string (nullable = true)
 |-- comment: string (nullable = true)
 |-- blank: string (nullable = true)

scala> val selectedData = df.select("year", "model")
selectedData: org.apache.spark.sql.DataFrame = [year: date, model: string]

scala> selectedData.show()
15/12/29 04:31:20 INFO MemoryStore: ensureFreeSpace(216384) called with curMem=0, maxMem=278302556
15/12/29 04:31:20 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 211.3 KB, free 265.2 MB)
15/12/29 04:31:24 INFO YarnScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool
15/12/29 04:31:24 INFO DAGScheduler: ResultStage 2 (show at <console>:35) finished in 0.051 s
15/12/29 04:31:24 INFO DAGScheduler: Job 2 finished: show at <console>:35, took 0.063356 s
+----------+-----+
|      year|model|
+----------+-----+
|2001-01-01|    S|
|2010-12-10|     |
|2009-01-11| E350|
|2008-01-01| Volt|
+----------+-----+

On 30 December 2015 at 00:42, Annabel Melongo <melongo_anna...@yahoo.com> wrote:

> Divya,
>
> From reading the post, it appears that you resolved this issue. Great job!
>
> I would recommend putting the solution here as well so that it helps
> another developer down the line.
>
> Thanks
>
> On Monday, December 28, 2015 8:56 PM, Divya Gehlot <divya.htco...@gmail.com> wrote:
>
> Hi,
> Link to schema issue:
> <https://community.hortonworks.com/questions/8124/returns-empty-result-set-when-using-timestamptype.html>
> Please let me know if you have any issues viewing the above link.
>
> On 28 December 2015 at 23:00, Annabel Melongo <melongo_anna...@yahoo.com> wrote:
>
> Divya,
>
> Why don't you share how you create the dataframe using the schema as
> stated in 1)?
>
> On Monday, December 28, 2015 4:42 AM, Divya Gehlot <divya.htco...@gmail.com> wrote:
>
> Hi,
> I have an input data set which is a CSV file containing date columns.
> My output will also be a CSV file, and I will use this output CSV file
> for Hive table creation.
> I have a few queries:
> 1. I tried using a custom schema with TimestampType, but querying the
> dataframe returns an empty result set.
> 2. Can I use the String datatype in Spark for the date column, and then
> define it as a date type when creating the table? My Hive table will be
> partitioned by the date column.
>
> I would really appreciate it if you could share some sample code for
> timestamps in a DataFrame that can also be used while creating the Hive
> table.
>
> Thanks,
> Divya
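On question 2 from the quoted thread above (keeping the column as a String): one hedged approach is to normalise the date field to yyyy-MM-dd while it is still a String, before Spark or Hive ever parses it. The sketch below is illustrative only; the column index, input pattern, and comma delimiter are assumptions about the CSV, not something from the thread:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object CsvDateNormalizer {
  private val excelFmt = DateTimeFormatter.ofPattern("yyyy/MM/dd") // assumed input pattern
  private val hiveFmt  = DateTimeFormatter.ISO_LOCAL_DATE          // yyyy-MM-dd

  // Rewrite one CSV line, converting the date in column `dateCol`
  // from yyyy/MM/dd to the yyyy-MM-dd form Hive and Spark SQL expect.
  // Assumes a simple comma-delimited line with no quoted fields.
  def normalizeLine(line: String, dateCol: Int): String = {
    val fields = line.split(",", -1)
    fields(dateCol) = LocalDate.parse(fields(dateCol), excelFmt).format(hiveFmt)
    fields.mkString(",")
  }

  def main(args: Array[String]): Unit = {
    println(normalizeLine("2001/01/09,Tesla,S", 0)) // 2001-01-09,Tesla,S
  }
}
```

A function like this could be mapped over the raw text lines (or run as a preprocessing step on the file) so that the resulting strings are already in the format a Hive date partition column expects.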