[ 
https://issues.apache.org/jira/browse/SQOOP-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liz Szilagyi reassigned SQOOP-3088:
-----------------------------------

    Assignee:     (was: Liz Szilagyi)

> Sqoop export with Parquet data failure does not contain the MapTask error
> -------------------------------------------------------------------------
>
>                 Key: SQOOP-3088
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3088
>             Project: Sqoop
>          Issue Type: Bug
>          Components: tools
>            Reporter: Markus Kemper
>            Priority: Major
>
> *Test Case*
> {noformat}
> #################
> # STEP 01 - Setup Table and Data
> #################
> export MYCONN=jdbc:oracle:thin:@oracle.cloudera.com:1521/orcl12c;
> export MYUSER=sqoop
> export MYPSWD=cloudera
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "drop table t1_oracle"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "create table t1_oracle (c1 int, c2 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "insert into t1_oracle values (1, 'data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query 
> "select * from t1_oracle"
> Output:
> -------------------------------------
> | C1                   | C2         | 
> -------------------------------------
> | 1                    | data       | 
> -------------------------------------
> #################
> # STEP 02 - Import Data as Parquet
> #################
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T1_ORACLE --target-dir /user/user1/t1_oracle_parquet --delete-target-dir 
> --num-mappers 1 --as-parquetfile 
> Output:
> 16/12/21 07:11:47 INFO mapreduce.ImportJobBase: Transferred 1.624 KB in 
> 50.1693 seconds (33.1478 bytes/sec)
> 16/12/21 07:11:47 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> #################
> # STEP 03 - Verify Parquet Data
> #################
> hdfs dfs -ls /user/user1/t1_oracle_parquet/*.parquet
> parquet-tools schema -d 
> hdfs://namenode.cloudera.com/user/user1/t1_oracle_parquet/a6ba3dda-b5fc-42d7-9555-5837a12a036b.parquet
> Output:
> -rw-r--r--   3 user1 user1        597 2016-12-21 07:11 
> /user/user1/t1_oracle_parquet/a6ba3dda-b5fc-42d7-9555-5837a12a036b.parquet
> ---
> message T1_ORACLE {
>   optional binary C1 (UTF8);
>   optional binary C2 (UTF8);
> }
> creator: parquet-mr version 1.5.0-cdh5.8.3 (build ${buildNumber})
> extra: parquet.avro.schema = {"type":"record","name":"T1_ORACLE","doc":"Sqoop 
> import of 
> T1_ORACLE","fields":[{"name":"C1","type":["null","string"],"default":null,"columnName":"C1","sqlType":"2"},{"name":"C2","type":["null","string"],"default":null,"columnName":"C2","sqlType":"12"}],"tableName":"T1_ORACLE"}
> file schema: T1_ORACLE
> ------------------------------------------------------------------------------------------------------------------------
> C1: OPTIONAL BINARY O:UTF8 R:0 D:1
> C2: OPTIONAL BINARY O:UTF8 R:0 D:1
> row group 1: RC:1 TS:85
> ------------------------------------------------------------------------------------------------------------------------
> C1:  BINARY SNAPPY DO:0 FPO:4 SZ:40/38/0.95 VC:1 ENC:PLAIN,RLE,BIT_PACKED
> C2:  BINARY SNAPPY DO:0 FPO:44 SZ:49/47/0.96 VC:1 ENC:PLAIN,RLE,BIT_PACKED
> #################
> # STEP 04 - Export Parquet Data
> #################
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table 
> T1_ORACLE --export-dir /user/user1/t1_oracle_parquet --num-mappers 1 --verbose
> Output:
> [sqoop debug]
> 16/12/21 07:15:06 INFO mapreduce.Job:  map 0% reduce 0%
> 16/12/21 07:15:40 INFO mapreduce.Job:  map 100% reduce 0%
> 16/12/21 07:15:40 INFO mapreduce.Job: Job job_1481911879790_0026 failed with 
> state FAILED due to: Task failed task_1481911879790_0026_m_000000
> Job failed as tasks failed. failedMaps:1 failedReduces:0
> 16/12/21 07:15:40 INFO mapreduce.Job: Counters: 8
>       Job Counters 
>               Failed map tasks=1
>               Launched map tasks=1
>               Data-local map tasks=1
>               Total time spent by all maps in occupied slots (ms)=32125
>               Total time spent by all reduces in occupied slots (ms)=0
>               Total time spent by all map tasks (ms)=32125
>               Total vcore-seconds taken by all map tasks=32125
>               Total megabyte-seconds taken by all map tasks=32896000
> 16/12/21 07:15:40 WARN mapreduce.Counters: Group FileSystemCounters is 
> deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
> 16/12/21 07:15:40 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 
> 46.8304 seconds (0 bytes/sec)
> 16/12/21 07:15:40 WARN mapreduce.Counters: Group 
> org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
> org.apache.hadoop.mapreduce.TaskCounter instead
> 16/12/21 07:15:40 INFO mapreduce.ExportJobBase: Exported 0 records.
> 16/12/21 07:15:40 DEBUG util.ClassLoaderStack: Restoring classloader: 
> java.net.FactoryURLClassLoader@577cfae6
> 16/12/21 07:15:40 ERROR tool.ExportTool: Error during export: Export job 
> failed!
> [yarn debug]
> 2016-12-21 07:15:38,911 DEBUG [Thread-11] 
> org.apache.sqoop.mapreduce.AsyncSqlOutputFormat: Committing transaction of 0 
> statements
> 2016-12-21 07:15:38,914 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : parquet.io.ParquetDecodingException: Can not read 
> value at 1 in block 0 in file 
> hdfs://nameservice1/user/user1/t1_oracle_parquet/a6ba3dda-b5fc-42d7-9555-5837a12a036b.parquet
>       at 
> parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:241)
>       at 
> parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
>       at 
> org.kitesdk.data.spi.filesystem.AbstractCombineFileRecordReader.nextKeyValue(AbstractCombineFileRecordReader.java:68)
>       at 
> org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.nextKeyValue(CombineFileRecordReader.java:69)
>       at 
> org.kitesdk.data.spi.AbstractKeyRecordReaderWrapper.nextKeyValue(AbstractKeyRecordReaderWrapper.java:55)
>       at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
>       at 
> org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
>       at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>       at 
> org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.ClassCastException: T1_ORACLE cannot be cast to 
> org.apache.avro.generic.IndexedRecord
>       at 
> parquet.avro.AvroIndexedRecordConverter.start(AvroIndexedRecordConverter.java:185)
>       at 
> parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:391)
>       at 
> parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:216)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to