Hello Community Users,

I was able to resolve the issue. The problem was the input data format: by default, Excel writes dates as 2001/01/09, whereas Spark SQL expects the 2001-01-09 (yyyy-MM-dd) format.
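For anyone hitting the same symptom, here is a minimal sketch (plain JVM Scala, no Spark required) of why the slash format fails. It assumes the same behaviour as java.sql.Date.valueOf, which only accepts the JDBC escape format yyyy-MM-dd; pre-converting the slashes to dashes makes the value parseable:

```scala
import java.sql.Date
import scala.util.Try

object DateFormatDemo {
  // java.sql.Date.valueOf only understands the JDBC escape format yyyy-MM-dd;
  // anything else throws IllegalArgumentException, which we turn into None.
  def parse(s: String): Option[Date] = Try(Date.valueOf(s)).toOption

  // Pre-convert Excel's slash format to the dash format Spark SQL expects.
  def slashToDash(s: String): String = s.replace('/', '-')

  def main(args: Array[String]): Unit = {
    println(parse("2001/01/09"))               // None: slash format rejected
    println(parse(slashToDash("2001/01/09")))  // Some(2001-01-09)
  }
}
```

This is only an illustration of the format mismatch, not a claim about Spark's internal cast path; the practical takeaway is to normalise the date strings to yyyy-MM-dd before loading the CSV.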
Here is the sample code below:

SQL context available as sqlContext.

scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala> import org.apache.spark.sql.hive.orc._
import org.apache.spark.sql.hive.orc._

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
15/12/29 04:29:39 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
15/12/29 04:29:39 INFO HiveContext: Initializing execution hive, version 0.13.1
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@7312f6d8

scala> import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, FloatType, LongType, TimestampType, DateType}
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, FloatType, LongType, TimestampType, DateType}

scala> val customSchema = StructType(Seq(
     |   StructField("year", DateType, true),
     |   StructField("make", StringType, true),
     |   StructField("model", StringType, true),
     |   StructField("comment", StringType, true),
     |   StructField("blank", StringType, true)))
customSchema: org.apache.spark.sql.types.StructType = StructType(StructField(year,DateType,true), StructField(make,StringType,true), StructField(model,StringType,true), StructField(comment,StringType,true), StructField(blank,StringType,true))

scala> val df = hiveContext.read.format("com.databricks.spark.csv").option("header", "true").schema(customSchema).load("/tmp/TestDivya/carsdate.csv")
15/12/29 04:30:27 INFO HiveContext: Initializing HiveMetastoreConnection version 0.13.1 using Spark classes.
df: org.apache.spark.sql.DataFrame = [year: date, make: string, model: string, comment: string, blank: string]

scala> df.printSchema()
root
 |-- year: date (nullable = true)
 |-- make: string (nullable = true)
 |-- model: string (nullable = true)
 |-- comment: string (nullable = true)
 |-- blank: string (nullable = true)

scala> val selectedData = df.select("year", "model")
selectedData: org.apache.spark.sql.DataFrame = [year: date, model: string]

scala> selectedData.show()
15/12/29 04:31:20 INFO MemoryStore: ensureFreeSpace(216384) called with curMem=0, maxMem=278302556
15/12/29 04:31:20 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 211.3 KB, free 265.2 MB)
15/12/29 04:31:24 INFO YarnScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool
15/12/29 04:31:24 INFO DAGScheduler: ResultStage 2 (show at <console>:35) finished in 0.051 s
15/12/29 04:31:24 INFO DAGScheduler: Job 2 finished: show at <console>:35, took 0.063356 s
+----------+-----+
|      year|model|
+----------+-----+
|2001-01-01|    S|
|2010-12-10|     |
|2009-01-11| E350|
|2008-01-01| Volt|
+----------+-----+

On 30 December 2015 at 00:42, Annabel Melongo <melongo_anna...@yahoo.com> wrote:

> Divya,
>
> From reading the post, it appears that you resolved this issue. Great job!
>
> I would recommend putting the solution here as well so that it helps
> another developer down the line.
>
> Thanks
>
> On Monday, December 28, 2015 8:56 PM, Divya Gehlot <divya.htco...@gmail.com> wrote:
>
> Hi,
> Link to schema issue:
> <https://community.hortonworks.com/questions/8124/returns-empty-result-set-when-using-timestamptype.html>
> Please let me know if you have any issues viewing the above link.
>
> On 28 December 2015 at 23:00, Annabel Melongo <melongo_anna...@yahoo.com> wrote:
>
> Divya,
>
> Why don't you share how you create the dataframe using the schema as
> stated in 1)?
>
> On Monday, December 28, 2015 4:42 AM, Divya Gehlot <divya.htco...@gmail.com> wrote:
>
> Hi,
> I have an input data set which is a CSV file containing date columns.
> My output will also be a CSV file, and I will use this output CSV file
> for Hive table creation.
> I have a few queries:
> 1. I tried using a custom schema with TimestampType, but querying the
> dataframe returns an empty result set.
> 2. Can I use the String datatype in Spark for the date column, and then
> define it as a date type when creating the table? My Hive table will be
> partitioned by the date column.
>
> I would really appreciate it if you could share some sample code for
> timestamps in a DataFrame that can also be used while creating the Hive
> table.
>
> Thanks,
> Divya
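On question 2 from the quoted thread above (keeping the column as a String): one hedged approach is to normalise the date field to yyyy-MM-dd while it is still a String, before Spark or Hive ever parses it. The sketch below is illustrative only; the column index, input pattern, and comma delimiter are assumptions about the CSV, not something from the thread:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object CsvDateNormalizer {
  private val excelFmt = DateTimeFormatter.ofPattern("yyyy/MM/dd") // assumed input pattern
  private val hiveFmt  = DateTimeFormatter.ISO_LOCAL_DATE          // yyyy-MM-dd

  // Rewrite one CSV line, converting the date in column `dateCol`
  // from yyyy/MM/dd to the yyyy-MM-dd form Hive and Spark SQL expect.
  // Assumes a simple comma-delimited line with no quoted fields.
  def normalizeLine(line: String, dateCol: Int): String = {
    val fields = line.split(",", -1)
    fields(dateCol) = LocalDate.parse(fields(dateCol), excelFmt).format(hiveFmt)
    fields.mkString(",")
  }

  def main(args: Array[String]): Unit = {
    println(normalizeLine("2001/01/09,Tesla,S", 0)) // 2001-01-09,Tesla,S
  }
}
```

A function like this could be mapped over the raw text lines (or run as a preprocessing step on the file) so that the resulting strings are already in the format a Hive date partition column expects.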