ANSHUMAN created SPARK-21763:
--------------------------------

             Summary: InferSchema option does not infer the correct schema (timestamp) from xlsx file.
                 Key: SPARK-21763
                 URL: https://issues.apache.org/jira/browse/SPARK-21763
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.0
         Environment: Environment is my personal laptop.
            Reporter: ANSHUMAN
            Priority: Minor

I have an xlsx file containing a date/time field (My Time) in the following format, with these sample records:

5/16/2017 12:19:00 AM
5/16/2017 12:56:00 AM
5/16/2017 1:17:00 PM
5/16/2017 5:26:00 PM
5/16/2017 6:26:00 PM

I am reading the xlsx file as follows:

{code:java}
val inputDF = spark.sqlContext.read.format("com.crealytics.spark.excel")
  .option("location", "file:///C:/Users/file.xlsx")
  .option("useHeader", "true")
  .option("treatEmptyValuesAsNulls", "true")
  .option("inferSchema", "true")
  .option("addColorColumns", "false")
  .load()
{code}

When I print the schema using

{code:java}
inputDF.printSchema()
{code}

the column type is inferred as *Double*. Sometimes it is even inferred as *String*. When I print the data, I get the following output:

+------------------+
|           My Time|
+------------------+
|42871.014189814814|
| 42871.03973379629|
|42871.553773148145|
| 42871.72765046296|
| 42871.76887731482|
+------------------+

This output is clearly not correct for the given input: the column should be inferred as a timestamp. Moreover, if I convert the xlsx file to CSV and read that instead, I get the correct output. This is how I read the CSV file:

{code:java}
spark.sqlContext.read.format("csv")
  .option("header", "true")
  .option("inferSchema", true)
  .load(fileLocation)
{code}

Please look into the issue. I could not find an answer to it anywhere.
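For reference, the Double values printed above look like Excel serial dates: a count of days, with the time of day stored as a fraction of a day. The following is a minimal sketch of how such a value could be converted back by hand, assuming the workbook uses Excel's default 1900 date system (so that day 0 corresponds to 1899-12-30). The names `ExcelSerial` and `toDateTime` are hypothetical helpers for illustration and are not part of Spark or of the spark-excel data source.

```scala
import java.time.{LocalDate, LocalDateTime}

object ExcelSerial {
  // Assumption: Excel's default 1900 date system, in which serial values
  // for dates after February 1900 count days from 1899-12-30.
  private val Epoch: LocalDateTime = LocalDate.of(1899, 12, 30).atStartOfDay()

  /** Converts an Excel serial date to a LocalDateTime, rounded to the nearest second. */
  def toDateTime(serial: Double): LocalDateTime = {
    val days = math.floor(serial).toLong                  // whole days since the epoch
    val seconds = math.round((serial - days) * 86400.0)   // day fraction as seconds
    Epoch.plusDays(days).plusSeconds(seconds)
  }
}
```

Under that assumption, the whole-day part 42871 falls on 2017-05-16, which matches the date in the sample records. To apply this to the DataFrame, one option would be to wrap the helper in a UDF that returns a `java.sql.Timestamp`; that step is not shown here.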