Vandana Yadav created CARBONDATA-2397:
-----------------------------------------

             Summary: Error while fetching data from a table with complex data type (array_of_struct)
                 Key: CARBONDATA-2397
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2397
             Project: CarbonData
          Issue Type: Bug
          Components: data-query
    Affects Versions: 1.4.0
         Environment: Spark 2.1, Spark 2.2
            Reporter: Vandana Yadav
         Attachments: arrayofstruct.csv

Fetching data from a table with a complex data type (array_of_struct) fails with a ClassCastException.



Steps to reproduce:

Create Table:

create table ARRAY_OF_STRUCT_com (
  CUST_ID string, YEAR int, MONTH int, AGE int, GENDER string,
  EDUCATED string, IS_MARRIED string,
  ARRAY_OF_STRUCT array<struct<ID:int, COUNTRY:string, STATE:string, CITI:string, CHECK_DATE:timestamp>>,
  CARD_COUNT int, DEBIT_COUNT int, CREDIT_COUNT int,
  DEPOSIT double, HQ_DEPOSIT double)
STORED BY 'org.apache.carbondata.format'
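
Once the table exists, its schema can be verified before loading (a standard check, shown for completeness; not part of the original report):

describe ARRAY_OF_STRUCT_com;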

Load Data into the table:

LOAD DATA INPATH 'HDFS_URL/BabuStore/Data/complex/arrayofstruct.csv'
INTO TABLE ARRAY_OF_STRUCT_com
OPTIONS ('DELIMITER'=',', 'QUOTECHAR'='"',
  'FILEHEADER'='CUST_ID,YEAR,MONTH,AGE,GENDER,EDUCATED,IS_MARRIED,ARRAY_OF_STRUCT,CARD_COUNT,DEBIT_COUNT,CREDIT_COUNT,DEPOSIT,HQ_DEPOSIT',
  'COMPLEX_DELIMITER_LEVEL_1'='$',
  'COMPLEX_DELIMITER_LEVEL_2'='&')
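
With these options, COMPLEX_DELIMITER_LEVEL_1 ('$') separates the elements of the array, while COMPLEX_DELIMITER_LEVEL_2 ('&') separates the fields inside each struct. As a minimal sketch of the expected encoding (the values are hypothetical, not taken from the attached arrayofstruct.csv), an ARRAY_OF_STRUCT cell holding two structs would look like:

1&US&CA&LA&2015-07-23 00:00:00$2&US&NY&NYC&2015-07-24 00:00:00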

 

Execute Query:

select * from array_of_struct_com;

Expected Result: the query should return all the rows in the table.

Actual Result: the query fails with the following ClassCastException:

Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 1 times, most recent failure: Lost task 1.0 in stage 1.0 (TID 2, localhost, executor driver): java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData
 at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getArray(rows.scala:48)
 at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getArray(rows.scala:194)
 at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
 at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
 at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
 at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
 at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
 at org.apache.spark.scheduler.Task.run(Task.scala:108)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)

Driver stacktrace: (state=,code=0)
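
The ClassCastException suggests that the generated code expects an ArrayData value at an ordinal where the scan is actually returning an Integer. As a triage suggestion (not part of the original report), projecting the primitive columns and the complex column separately should show whether the failure is confined to the array<struct<...>> column:

select CUST_ID, YEAR, MONTH, AGE from array_of_struct_com;
select ARRAY_OF_STRUCT from array_of_struct_com;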

Error log:

18/04/25 11:13:58 INFO SparkExecuteStatementOperation: Running query 'select * from array_of_struct_com' with 4a55f43f-96c7-46c0-9a71-cc66e5dfa641
18/04/25 11:13:58 INFO CarbonSparkSqlParser: Parsing command: select * from array_of_struct_com
18/04/25 11:13:58 INFO HiveMetaStore: 7: get_table : db=bug tbl=array_of_struct_com
18/04/25 11:13:58 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_table : db=bug tbl=array_of_struct_com
18/04/25 11:13:58 INFO HiveMetaStore: 7: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/04/25 11:13:58 INFO ObjectStore: ObjectStore, initialize called
18/04/25 11:13:58 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
18/04/25 11:13:58 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/04/25 11:13:58 INFO ObjectStore: Initialized ObjectStore
18/04/25 11:13:58 INFO CatalystSqlParser: Parsing command: array<string>
18/04/25 11:13:58 INFO CarbonLRUCache: pool-23-thread-6 Removed entry from InMemory lru cache :: hdfs://localhost:54310/opt/CarbonStore/bug/array_of_struct_com/Fact/Part0/Segment_0/0_batchno0-0-1524572335034.carbonindex
18/04/25 11:13:58 INFO CarbonLRUCache: pool-23-thread-6 Removed entry from InMemory lru cache :: hdfs://localhost:54310/opt/CarbonStore/bug/array_of_struct_com/Fact/Part0/Segment_1/0_batchno0-0-1524575558281.carbonindex
18/04/25 11:13:58 INFO HiveMetaStore: 7: get_table : db=bug tbl=array_of_struct_com
18/04/25 11:13:58 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_table : db=bug tbl=array_of_struct_com
18/04/25 11:13:58 INFO CatalystSqlParser: Parsing command: array<string>
18/04/25 11:13:58 INFO HiveMetaStore: 7: get_database: bug
18/04/25 11:13:58 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_database: bug
18/04/25 11:13:58 INFO HiveMetaStore: 7: get_database: bug
18/04/25 11:13:58 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_database: bug
18/04/25 11:13:58 INFO HiveMetaStore: 7: get_tables: db=bug pat=*
18/04/25 11:13:58 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_tables: db=bug pat=*
18/04/25 11:13:58 INFO TableInfo: pool-23-thread-6 Table block size not specified for bug_array_of_struct_com. Therefore considering the default value 1024 MB
18/04/25 11:13:58 INFO CarbonLateDecodeRule: pool-23-thread-6 skip CarbonOptimizer
18/04/25 11:13:58 INFO CarbonLateDecodeRule: pool-23-thread-6 Skip CarbonOptimizer
18/04/25 11:13:58 INFO CodeGenerator: Code generated in 63.208107 ms
18/04/25 11:13:58 INFO TableInfo: pool-23-thread-6 Table block size not specified for bug_array_of_struct_com. Therefore considering the default value 1024 MB
18/04/25 11:13:58 INFO BlockletDataMap: pool-23-thread-6 Time taken to load blocklet datamap from file : hdfs://localhost:54310/opt/CarbonStore/bug/array_of_struct_com/Fact/Part0/Segment_0/0_batchno0-0-1524572335034.carbonindex is 2
18/04/25 11:13:58 INFO BlockletDataMap: pool-23-thread-6 Time taken to load blocklet datamap from file : hdfs://localhost:54310/opt/CarbonStore/bug/array_of_struct_com/Fact/Part0/Segment_1/0_batchno0-0-1524575558281.carbonindex is 3
18/04/25 11:13:58 INFO CarbonScanRDD:
 Identified no.of.blocks: 2,
 no.of.tasks: 2,
 no.of.nodes: 0,
 parallelism: 4

18/04/25 11:13:58 INFO SparkContext: Starting job: run at AccessController.java:0
18/04/25 11:13:58 INFO DAGScheduler: Got job 2 (run at AccessController.java:0) with 2 output partitions
18/04/25 11:13:58 INFO DAGScheduler: Final stage: ResultStage 2 (run at AccessController.java:0)
18/04/25 11:13:58 INFO DAGScheduler: Parents of final stage: List()
18/04/25 11:13:58 INFO DAGScheduler: Missing parents: List()
18/04/25 11:13:58 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[8] at run at AccessController.java:0), which has no missing parents
18/04/25 11:13:58 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 35.7 KB, free 366.2 MB)
18/04/25 11:13:58 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 27.7 KB, free 366.2 MB)
18/04/25 11:13:58 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.2.102:40679 (size: 27.7 KB, free: 366.2 MB)
18/04/25 11:13:58 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
18/04/25 11:13:58 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 2 (MapPartitionsRDD[8] at run at AccessController.java:0) (first 15 tasks are for partitions Vector(0, 1))
18/04/25 11:13:58 INFO TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
18/04/25 11:13:58 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 3, localhost, executor driver, partition 0, ANY, 6524 bytes)
18/04/25 11:13:58 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 4, localhost, executor driver, partition 1, ANY, 6534 bytes)
18/04/25 11:13:58 INFO Executor: Running task 0.0 in stage 2.0 (TID 3)
18/04/25 11:13:58 INFO Executor: Running task 1.0 in stage 2.0 (TID 4)
18/04/25 11:13:58 INFO TableInfo: Executor task launch worker for task 3 Table block size not specified for bug_array_of_struct_com. Therefore considering the default value 1024 MB
18/04/25 11:13:58 INFO TableInfo: Executor task launch worker for task 4 Table block size not specified for bug_array_of_struct_com. Therefore considering the default value 1024 MB
18/04/25 11:13:58 INFO AbstractQueryExecutor: [Executor task launch worker for task 3][partitionID:com;queryID:18146683300601] Query will be executed on table: array_of_struct_com
18/04/25 11:13:58 INFO AbstractQueryExecutor: [Executor task launch worker for task 4][partitionID:com;queryID:18146683300601] Query will be executed on table: array_of_struct_com
18/04/25 11:13:58 INFO ResultCollectorFactory: [Executor task launch worker for task 3][partitionID:com;queryID:18146683300601] Row based dictionary collector is used to scan and collect the data
18/04/25 11:13:58 INFO ResultCollectorFactory: [Executor task launch worker for task 4][partitionID:com;queryID:18146683300601] Restructure based dictionary collector is used to scan and collect the data
18/04/25 11:13:58 INFO UnsafeMemoryManager: [Executor task launch worker for task 4][partitionID:com;queryID:18146683300601] Total memory used after task 18146962815012 is 6581 Current tasks running now are : [17522539140626, 17607895858118, 18146866243005]
18/04/25 11:13:58 ERROR Executor: Exception in task 1.0 in stage 2.0 (TID 4)
java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData
 at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getArray(rows.scala:48)
 at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getArray(rows.scala:194)
 at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
 at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
 at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
 at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
 at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
 at org.apache.spark.scheduler.Task.run(Task.scala:108)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)

 


