[ https://issues.apache.org/jira/browse/SQOOP-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sandish Kumar HN reassigned SQOOP-3147:
---------------------------------------

    Assignee: Sandish Kumar HN

> Import data to Hive Table in S3 in Parquet format
> -------------------------------------------------
>
>                 Key: SQOOP-3147
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3147
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>            Reporter: Ahmed Kamal
>            Assignee: Sandish Kumar HN
>
> The import command below succeeds only if the Hive table's location is in HDFS. If the
> table is backed by S3, the job throws an exception while trying to move the data from
> the HDFS tmp directory to S3:
>
> Job job_1486539699686_3090 failed with state FAILED due to: Job commit failed:
> org.kitesdk.data.DatasetIOException: Dataset merge failed
>     at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:333)
>     at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:56)
>     at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$MergeOutputCommitter.commitJob(DatasetKeyOutputFormat.java:370)
>     at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
>     at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Dataset merge failed during rename of
> hdfs://hdfs-path/tmp/dev_kamal/.temp/job_1486539699686_3090/mr/job_1486539699686_3090/0192f987-bd4c-4cb7-836f-562ac483e008.parquet
> to s3://bucket_name/dev_kamal/address/0192f987-bd4c-4cb7-836f-562ac483e008.parquet
>     at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:329)
>     ... 7 more
>
> The command used:
>
> sqoop import --connect "jdbc:mysql://connectionUrl" --table "tableName"
>   --as-parquetfile --verbose --username=uname --password=pass --hive-import
>   --delete-target-dir --hive-database dev_kamal --hive-table tableName
>   --hive-overwrite -m 150
>
> Another issue I noticed is that Sqoop puts the Avro schema into TBLPROPERTIES under
> the avro.schema.literal attribute. If the table has a lot of columns, the schema gets
> truncated, which later causes an exception like the one below.
>
> *Exception:*
>
> 17/03/07 12:13:13 INFO hive.metastore: Trying to connect to metastore with URI thrift://url:9083
> 17/03/07 12:13:13 INFO hive.metastore: Opened a connection to metastore, current connections: 1
> 17/03/07 12:13:13 INFO hive.metastore: Connected to metastore.
> 17/03/07 12:13:17 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@3e9b1010
> 17/03/07 12:13:17 ERROR sqoop.Sqoop: Got exception running Sqoop: org.apache.avro.SchemaParseException:
> org.codehaus.jackson.JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value
>  at [Source: java.io.StringReader@3fb42ec7; line: 1, column: 6001]
> org.apache.avro.SchemaParseException: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: was
> expecting closing quote for a string value
>  at [Source: java.io.StringReader@3fb42ec7; line: 1, column: 6001]
>     at org.apache.avro.Schema$Parser.parse(Schema.java:929)
>     at org.apache.avro.Schema$Parser.parse(Schema.java:917)
>     at org.kitesdk.data.DatasetDescriptor$Builder.schemaLiteral(DatasetDescriptor.java:475)
>     at org.kitesdk.data.spi.hive.HiveUtils.descriptorForTable(HiveUtils.java:154)
>     at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.load(HiveAbstractMetadataProvider.java:104)
>     at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:192)
>     at org.kitesdk.data.Datasets.load(Datasets.java:108)
>     at org.kitesdk.data.Datasets.load(Datasets.java:165)
>     at org.kitesdk.data.Datasets.load(Datasets.java:187)
>     at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:78)
>     at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:108)
>     at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
>     at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
>     at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
>     at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
>     at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
>     at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
>     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
>     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
>     at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
> Caused by: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value
>  at [Source: java.io.StringReader@3fb42ec7; line: 1, column: 6001]
>     at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
>     at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
>     at org.codehaus.jackson.impl.JsonParserMinimalBase._reportInvalidEOF(JsonParserMinimalBase.java:454)
>     at org.codehaus.jackson.impl.ReaderBasedParser._finishString2(ReaderBasedParser.java:1342)
>     at org.codehaus.jackson.impl.ReaderBasedParser._finishString(ReaderBasedParser.java:1330)
>     at org.codehaus.jackson.impl.ReaderBasedParser.getText(ReaderBasedParser.java:200)
>     at org.codehaus.jackson.map.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:203)
>     at org.codehaus.jackson.map.deser.std.BaseNodeDeserializer.deserializeArray(JsonNodeDeserializer.java:224)
>     at org.codehaus.jackson.map.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:200)
>     at org.codehaus.jackson.map.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:58)
>     at org.codehaus.jackson.map.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
>     at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2704)
>     at org.codehaus.jackson.map.ObjectMapper.readTree(ObjectMapper.java:1344)
>     at org.apache.avro.Schema$Parser.parse(Schema.java:927)
>     ... 21 more
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)