[jira] [Created] (SPARK-21763) InferSchema option does not infer the correct schema (timestamp) from xlsx file.

ANSHUMAN (JIRA) Thu, 17 Aug 2017 09:50:24 -0700

ANSHUMAN created SPARK-21763:
--------------------------------

             Summary: InferSchema option does not infer the correct schema 
(timestamp) from xlsx file.
                 Key: SPARK-21763
                 URL: https://issues.apache.org/jira/browse/SPARK-21763
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.0
         Environment: Environment is my personal laptop.
            Reporter: ANSHUMAN
            Priority: Minor



I have an xlsx file containing a date/time field (My Time) in the following format. Sample records:
5/16/2017  12:19:00 AM
5/16/2017  12:56:00 AM
5/16/2017  1:17:00 PM
5/16/2017  5:26:00 PM
5/16/2017  6:26:00 PM

I am reading the xlsx file as follows:

{code:java}
val inputDF = spark.sqlContext.read.format("com.crealytics.spark.excel")
    .option("location","file:///C:/Users/file.xlsx")
    .option("useHeader","true")
    .option("treatEmptyValuesAsNulls","true")
    .option("inferSchema","true")
    .option("addColorColumns","false")
    .load()
{code}

When I print the schema using
{code:java}
inputDF.printSchema()
{code}
the type of this column is reported as *Double*.
Sometimes it is even inferred as *String*.

When I print the data, I get the following output:
{noformat}
+------------------+
|           My Time|
+------------------+
|42871.014189814814|
| 42871.03973379629|
|42871.553773148145|
| 42871.72765046296|
| 42871.76887731482|
+------------------+
{noformat}

The above output is clearly incorrect for the given input.
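The numbers in this output appear to be Excel serial date/time values: in Excel's 1900 date system a date is stored as a count of days from an epoch that is effectively 30 Dec 1899, and the time of day is the fractional part. For example, 42871 is 16 May 2017, the date in the sample records. Below is a minimal sketch of that conversion in plain Scala. `ExcelSerial` and `toLocalDateTime` are hypothetical names (not part of Spark or spark-excel), and the code assumes the workbook uses the 1900 date system (not the 1904 system) and that the values are time-zone-naive local times.

```scala
import java.time.{Duration, LocalDateTime}

// Hypothetical helper, not part of Spark or spark-excel.
// Assumes Excel's 1900 date system and time-zone-naive local values.
object ExcelSerial {
  // Effective day 0 of the 1900 system. Using 30 Dec 1899 compensates for
  // Excel's fictitious 29 Feb 1900, so it is correct for dates from 1 Mar 1900 on.
  private val Epoch: LocalDateTime = LocalDateTime.of(1899, 12, 30, 0, 0)

  private val MillisPerDay: Double = 86400000.0

  def toLocalDateTime(serial: Double): LocalDateTime = {
    // Integer part = whole days since the epoch; fractional part = fraction of a day.
    // Rounding to whole milliseconds absorbs floating-point noise, e.g. a
    // fraction that works out to 47845.9999... seconds instead of 47846.
    val millis: Long = math.round(serial * MillisPerDay)
    Epoch.plus(Duration.ofMillis(millis))
  }
}
```

A similar conversion could be expressed in Spark as a column expression such as `((col("My Time") - 25569) * 86400).cast("timestamp")`, where 25569 is the serial number of 1 Jan 1970. Note, however, that casting a number to a timestamp treats it as seconds since the Unix epoch in UTC, so the displayed wall-clock time may be shifted by the local time-zone offset.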

Moreover, if I convert the xlsx file to CSV format and read it, I get the
correct output. This is how I read the CSV file:

{code:java}
spark.sqlContext.read.format("csv")
      .option("header", "true")
      .option("inferSchema", true)
      .load(fileLocation)
{code}

Please look into this issue. I could not find an answer anywhere.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
