[ https://issues.apache.org/jira/browse/DRILL-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Khurram Faraaz updated DRILL-6994:
----------------------------------
    Component/s: Execution - Data Types

> TIMESTAMP type DOB column in Spark parquet is treated as VARBINARY in Drill
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-6994
>                 URL: https://issues.apache.org/jira/browse/DRILL-6994
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.14.0
>            Reporter: Khurram Faraaz
>            Priority: Major
>
> A timestamp type column in a parquet file created from Spark is treated as
> VARBINARY by Drill 1.14.0. Trying to cast the DOB column to DATE results in an
> exception, although the monthOfYear field is in the allowed range.
>
> Data used in the test:
> {noformat}
> [test@md123 spark_data]# cat inferSchema_example.csv
> Name,Department,years_of_experience,DOB
> Sam,Software,5,1990-10-10
> Alex,Data Analytics,3,1992-10-10
> {noformat}
> Create the parquet file using the above CSV file:
> {noformat}
> [test@md123 bin]# ./spark-shell
> 19/01/22 21:21:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Spark context Web UI available at http://md123.qa.lab:4040
> Spark context available as 'sc' (master = local[*], app id = local-1548192099796).
> Spark session available as 'spark'.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.1-mapr-SNAPSHOT
>       /_/
>
> Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_191)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> import org.apache.spark.sql.{DataFrame, SQLContext}
> import org.apache.spark.sql.{DataFrame, SQLContext}
> scala> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.{SparkConf, SparkContext}
> scala> val sqlContext: SQLContext = new SQLContext(sc)
> warning: there was one deprecation warning; re-run with -deprecation for details
> sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@2e0163cb
> scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/apps/inferSchema_example.csv")
> df: org.apache.spark.sql.DataFrame = [Name: string, Department: string ... 2 more fields]
> scala> df.printSchema
> root
>  |-- Name: string (nullable = true)
>  |-- Department: string (nullable = true)
>  |-- years_of_experience: integer (nullable = true)
>  |-- DOB: timestamp (nullable = true)
> scala> df.write.parquet("/apps/infer_schema_example.parquet")
> // Read the parquet file
> scala> val data = sqlContext.read.parquet("/apps/infer_schema_example.parquet")
> data: org.apache.spark.sql.DataFrame = [Name: string, Department: string ... 2 more fields]
> // Print the schema of the parquet file from Spark
> scala> data.printSchema
> root
>  |-- Name: string (nullable = true)
>  |-- Department: string (nullable = true)
>  |-- years_of_experience: integer (nullable = true)
>  |-- DOB: timestamp (nullable = true)
> // Display the contents of the parquet file in spark-shell:
> // register a temp table and run show() on all records.
> scala> data.registerTempTable("employee")
> warning: there was one deprecation warning; re-run with -deprecation for details
> scala> val allrecords = sqlContext.sql("SELeCT * FROM employee")
> allrecords: org.apache.spark.sql.DataFrame = [Name: string, Department: string ... 2 more fields]
> scala> allrecords.show()
> +----+--------------+-------------------+-------------------+
> |Name|    Department|years_of_experience|                DOB|
> +----+--------------+-------------------+-------------------+
> | Sam|      Software|                  5|1990-10-10 00:00:00|
> |Alex|Data Analytics|                  3|1992-10-10 00:00:00|
> +----+--------------+-------------------+-------------------+
> {noformat}
> Querying the parquet file from Drill 1.14.0-mapr results in the DOB column
> (timestamp type in Spark) being treated as VARBINARY.
> {noformat}
> apache drill 1.14.0-mapr
> "a little sql for your nosql"
> 0: jdbc:drill:schema=dfs.tmp> select * from dfs.`/apps/infer_schema_example.parquet`;
> +-------+-----------------+----------------------+--------------+
> | Name  |   Department    | years_of_experience  |     DOB      |
> +-------+-----------------+----------------------+--------------+
> | Sam   | Software        | 5                    | [B@2bef51f2  |
> | Alex  | Data Analytics  | 3                    | [B@650eab8   |
> +-------+-----------------+----------------------+--------------+
> 2 rows selected (0.229 seconds)
> // typeof(DOB) returns VARBINARY, whereas the parquet schema in Spark shows DOB: timestamp (nullable = true)
> 0: jdbc:drill:schema=dfs.tmp> select typeof(DOB) from dfs.`/apps/infer_schema_example.parquet`;
> +------------+
> |   EXPR$0   |
> +------------+
> | VARBINARY  |
> | VARBINARY  |
> +------------+
> 2 rows selected (0.199 seconds)
> {noformat}
> // CAST to DATE type results in an exception, though the monthOfYear is in the range [1,12]
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select cast(DOB as DATE) from dfs.`/apps/infer_schema_example.parquet`;
> Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must be in the range [1,12]
> Fragment 0:0
> [Error Id: 536c67d8-77c4-4b36-8aec-743344141d31 on md123.qa.lab:31010] (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2019-01-22 22:13:27,334 [23b86a78-64fc-5873-87b5-7e95d9740e51:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must be in the range [1,12]
> Fragment 0:0
> [Error Id: 536c67d8-77c4-4b36-8aec-743344141d31 on md123.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must be in the range [1,12]
> Fragment 0:0
> [Error Id: 536c67d8-77c4-4b36-8aec-743344141d31 on md123.qa.lab:31010]
>     at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.14.0-mapr.jar:1.14.0-mapr]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361) [drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216) [drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327) [drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.14.0-mapr.jar:1.14.0-mapr]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
>     at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
> Caused by: org.joda.time.IllegalFieldValueException: Value 0 for monthOfYear must be in the range [1,12]
>     at org.joda.time.field.FieldUtils.verifyValueBounds(FieldUtils.java:252) ~[drill-hive-exec-shaded-1.14.0-mapr.jar:1.14.0-mapr]
>     at org.joda.time.chrono.BasicChronology.getDateMidnightMillis(BasicChronology.java:612) ~[drill-hive-exec-shaded-1.14.0-mapr.jar:1.14.0-mapr]
>     at org.joda.time.chrono.BasicChronology.getDateTimeMillis(BasicChronology.java:159) ~[drill-hive-exec-shaded-1.14.0-mapr.jar:1.14.0-mapr]
>     at org.joda.time.chrono.AssembledChronology.getDateTimeMillis(AssembledChronology.java:120) ~[drill-hive-exec-shaded-1.14.0-mapr.jar:1.14.0-mapr]
>     at org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getDate(StringFunctionHelpers.java:210) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>     at org.apache.drill.exec.test.generated.ProjectorGen977.doEval(ProjectorTemplate.java:41) ~[na:na]
>     at org.apache.drill.exec.test.generated.ProjectorGen977.projectRecords(ProjectorTemplate.java:67) ~[na:na]
>     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:231) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>     at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:117) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:142) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:103) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>     at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:83) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:294) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:281) ~[drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>     at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0_181]
>     at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_181]
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) ~[hadoop-common-2.7.0-mapr-1808.jar:na]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:281) [drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>     ... 4 common frames omitted
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
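
One possible reading of the `[B@...` values in the Drill output, offered as an assumption rather than a confirmed diagnosis: Spark 2.3 writes timestamp columns to Parquet as INT96 unless configured otherwise. An INT96 timestamp is 12 bytes, with the nanoseconds of the day in the first 8 bytes and the Julian day number in the last 4, both little-endian. A reader that does not apply that interpretation sees only raw bytes, which would be consistent with the VARBINARY type reported here. The Python sketch below decodes that layout. `encode_int96` and `decode_int96` are hypothetical helper names, not Drill or Spark APIs, and the report does not state which physical type the file actually uses.

```python
import struct
from datetime import datetime, timedelta

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def encode_int96(ts: datetime) -> bytes:
    """Pack a naive datetime into the 12-byte INT96 timestamp layout.

    Hypothetical helper for illustration only.
    """
    delta = ts - datetime(1970, 1, 1)
    julian_day = JULIAN_DAY_OF_UNIX_EPOCH + delta.days
    nanos_of_day = (delta.seconds * 1_000_000 + delta.microseconds) * 1_000
    # "<qi": little-endian int64 (nanos of day) followed by int32 (Julian day).
    return struct.pack("<qi", nanos_of_day, julian_day)


def decode_int96(raw: bytes) -> datetime:
    """Unpack a 12-byte INT96 value into a naive datetime.

    Precision is truncated to microseconds, the finest unit datetime supports.
    """
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return datetime(1970, 1, 1) + timedelta(
        days=days_since_epoch, microseconds=nanos_of_day // 1_000
    )


if __name__ == "__main__":
    # Round-trip the two DOB values from the CSV in this report.
    for dob in (datetime(1990, 10, 10), datetime(1992, 10, 10)):
        raw = encode_int96(dob)
        print(raw.hex(), "->", decode_int96(raw))
```

If that assumption holds, Drill's documented `store.parquet.reader.int96_as_timestamp` session option, or `CONVERT_FROM(DOB, 'TIMESTAMP_IMPALA')`, may be worth trying. Neither was tested in this report.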