Sqoop2 delegates Parquet support to Kite. Each file format might be a different code path in Kite.
On Fri, Aug 14, 2015 at 6:52 PM, Lee S <[email protected]> wrote:

> Hi Abe:
> I'll try to add the Hive dependencies to the code and rebuild Sqoop2.
> But I have set the file format to FileFormat.CSV, so why does Parquet
> come into play?
>
> On Sat, Aug 15, 2015 at 1:00 AM, Abraham Elmahrek <[email protected]> wrote:
>
>> Hey man,
>>
>> It looks like certain Hive jars are missing from the job for some reason.
>> Seems like we need to add more jars at
>> https://github.com/apache/sqoop/blob/sqoop2/connector/connector-kite/src/main/java/org/apache/sqoop/connector/kite/KiteFromInitializer.java#L71
>> and
>> https://github.com/apache/sqoop/blob/sqoop2/connector/connector-kite/src/main/java/org/apache/sqoop/connector/kite/KiteToInitializer.java#L78.
>>
>> I've created https://issues.apache.org/jira/browse/SQOOP-2489 to track
>> this bug. I've also created
>> https://issues.apache.org/jira/browse/SQOOP-2490 to provide a facility
>> to work around these kinds of issues in the future.
>>
>> Sqoop2 is a work in progress and still needs some battle testing. With
>> that in mind, can you use the Avro integration instead? Otherwise, you
>> might need to rebuild Sqoop2 with the fix to get this working at the
>> moment. Again, https://issues.apache.org/jira/browse/SQOOP-2490 will
>> change all of that.
>>
>> -Abe
>>
>> On Fri, Aug 14, 2015 at 1:10 AM, Lee S <[email protected]> wrote:
>>
>>> Hi all:
>>> I'm trying to import from an RDBMS to Hive with the Kite connector in
>>> sqoop-shell. I submit the job successfully, but when I track the job
>>> status on the YARN web UI, I see the errors below in the container log,
>>> and the job keeps running and never stops.
>>> The log shows java.lang.NoClassDefFoundError:
>>> org/apache/hadoop/hive/ql/io/HiveOutputFormat.
>>>
>>> Can anybody help?
>>>
>>> Showing 4096 bytes.
>>> Click here
>>> <http://pdm-03:8042/node/containerlogs/container_1439241066552_0044_01_000002/root/stderr/?start=0>
>>> for the full log:
>>>
>>> utFormatLoadExecutor - SqoopOutputFormatLoadExecutor consumer thread is starting
>>> 2015-08-14 15:39:10,765 [OutputFormatLoader-consumer] INFO org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor - Running loader class org.apache.sqoop.connector.kite.KiteLoader
>>> 2015-08-14 15:39:10,771 [main] INFO org.apache.sqoop.job.mr.SqoopMapper - Starting progress service
>>> 2015-08-14 15:39:10,772 [main] INFO org.apache.sqoop.job.mr.SqoopMapper - Running extractor class org.apache.sqoop.connector.jdbc.GenericJdbcExtractor
>>> 2015-08-14 15:39:10,981 [OutputFormatLoader-consumer] INFO org.apache.sqoop.connector.kite.KiteLoader - Constructed temporary dataset URI: dataset:hive:wangjun/temp_5bfec97713e04374b2f2efde2dc5e4de?auth:host=pdm-03&auth:port=9083
>>> 2015-08-14 15:39:11,093 [main] INFO org.apache.sqoop.connector.jdbc.GenericJdbcExtractor - Using query: SELECT id FROM bcpdm.history WHERE 1368 <= id AND id <= 1546
>>> 2015-08-14 15:39:11,537 [OutputFormatLoader-consumer] ERROR org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor - Error while loading data out of MR job.
>>> java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/io/HiveOutputFormat
>>>     at java.lang.ClassLoader.defineClass1(Native Method)
>>>     at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
>>>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
>>>     at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>     at java.lang.Class.forName0(Native Method)
>>>     at java.lang.Class.forName(Class.java:190)
>>>     at org.kitesdk.data.spi.hive.HiveUtils.getHiveParquetOutputFormat(HiveUtils.java:446)
>>>     at org.kitesdk.data.spi.hive.HiveUtils.<clinit>(HiveUtils.java:91)
>>>     at org.kitesdk.data.spi.hive.HiveManagedMetadataProvider.create(HiveManagedMetadataProvider.java:83)
>>>     at org.kitesdk.data.spi.hive.HiveManagedDatasetRepository.create(HiveManagedDatasetRepository.java:77)
>>>     at org.kitesdk.data.Datasets.create(Datasets.java:239)
>>>     at org.kitesdk.data.Datasets.create(Datasets.java:307)
>>>     at org.kitesdk.data.Datasets.create(Datasets.java:335)
>>>     at org.apache.sqoop.connector.kite.KiteDatasetExecutor.createDataset(KiteDatasetExecutor.java:70)
>>>     at org.apache.sqoop.connector.kite.KiteLoader.getExecutor(KiteLoader.java:52)
>>>     at org.apache.sqoop.connector.kite.KiteLoader.load(KiteLoader.java:62)
>>>     at org.apache.sqoop.connector.kite.KiteLoader.load(KiteLoader.java:36)
>>>     at org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor$ConsumerThread.run(SqoopOutputFormatLoadExecutor.java:250)
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.HiveOutputFormat
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>     ... 31 more
>>> 2015-08-14 15:39:11,540 [main] INFO org.apache.sqoop.job.mr.SqoopMapper - Stopping progress service
>>> 2015-08-14 15:39:11,540 [main] INFO org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor - SqoopOutputFormatLoadExecutor::SqoopRecordWriter is about to be closed
>>>
>>> p.s. sqoop version: 1.99.6
>>> hadoop version: 2.6.0
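Abe's diagnosis (a Hive jar missing from the job's classpath) can be checked directly before rebuilding anything. The sketch below is a minimal, hypothetical probe, not part of Sqoop2 or Kite; the class name `ClasspathProbe` and the helper `locate` are invented for this example. It exercises the same lookup that fails inside Kite's `HiveUtils`: if `org.apache.hadoop.hive.ql.io.HiveOutputFormat` is not loadable, the probe returns null for it.

```java
// A stdlib-only probe (illustrative sketch, not part of Sqoop2): checks whether
// a class the Kite Hive code path needs is loadable in the current classloader
// and, if so, reports where it was loaded from.
public class ClasspathProbe {

    /** Returns where a class was loaded from, or null if it cannot be loaded. */
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className);
            java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
            // Bootstrap-loaded classes (e.g. java.util.*) may carry no CodeSource.
            return src == null ? "bootstrap classloader" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing from the classpath: the same condition that surfaces as
            // NoClassDefFoundError inside Kite's HiveUtils in the log above.
            return null;
        }
    }

    public static void main(String[] args) {
        // The class the container log reports as missing:
        System.out.println("HiveOutputFormat: "
                + locate("org.apache.hadoop.hive.ql.io.HiveOutputFormat"));
        // A class that is always present, for comparison:
        System.out.println("ArrayList: " + locate("java.util.ArrayList"));
    }
}
```

To approximate the MR task's environment, run it with the cluster classpath (for example `java -cp "$(hadoop classpath):." ClasspathProbe`); a null result for the Hive class reproduces the missing-jar condition without submitting a job.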
