Here is the (sanitized) command to create the saved Sqoop job in the Sqoop Metastore:
sqoop job \
  --create import__myscope__mydb__mytable \
  --meta-connect ... \
  -- import \
  --connect "jdbc:sqlserver://mydbserver:1433;databaseName=mydb;" \
  --username ... \
  --password-file ... \
  --num-mappers 4 \
  --target-dir ... \
  --as-parquetfile \
  --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --relaxed-isolation \
  --query "SELECT id, a, b, c, modified_datetime FROM mytable" \
  --split-by id \
  --merge-key id \
  --incremental lastmodified \
  --check-column modified_datetime \
  --last-value "1900-01-01 00:00:00.000"

Every night, Oozie would run a Sqoop action:

sqoop job --exec import__myscope__mydb__mytable

From: "Xu, Qian A"
Reply-To: [email protected]
Date: Wednesday, May 13, 2015 at 10:40 AM
To: [email protected]
Subject: RE: Sqoop 1.4.5 on CDH 5.2.1 crashes when doing an incremental import to Parquet files

Could you please share the command you are using for incremental import (as Parquet)?

From: Michael Arena [mailto:[email protected]]
Sent: Tuesday, May 12, 2015 6:56 AM
To: [email protected]
Subject: Re: Sqoop 1.4.5 on CDH 5.2.1 crashes when doing an incremental import to Parquet files

Yes, incremental lastmodified.

From: Abraham Elmahrek
Reply-To: [email protected]
Date: Monday, May 11, 2015 at 4:47 PM
To: [email protected]
Subject: Re: Sqoop 1.4.5 on CDH 5.2.1 crashes when doing an incremental import to Parquet files

Hey man,

Is this Incremental lastmodified mode?

-Abe

On Fri, May 8, 2015 at 1:21 PM, Michael Arena <[email protected]> wrote:

Cloudera backported Parquet support to their version of Sqoop 1.4.5. Originally, I was doing Sqoop incremental imports from SQL Server to text files (TSV). That worked fine, but the size and query speed of the text files were a problem. I then tried importing as Avro files, but Sqoop prohibits combining Avro with incremental mode. I then tried importing as Parquet files. The initial import worked fine and loaded 69,071 rows.
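As a quick sanity check that the initial import really produced Parquet, you can look for the 4-byte magic header PAR1 at the start of each data file; those are the same bytes that show up in the parse error below. A minimal sketch, where the path and file name are illustrative stand-ins for my sanitized --target-dir (actual file names under the target directory will vary):

hdfs dfs -ls /user/etl/mytable/
hdfs dfs -cat /user/etl/mytable/part-m-00000.parquet | head -c 4
# prints "PAR1" for a Parquet file; a TSV import would print the first data bytes instead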
The next time Sqoop ran, it pulled in the 1 changed row, but then the "merge" step failed, since it appears to treat the files as text (not Parquet):

15/05/08 16:05:45 INFO mapreduce.Job: Job job_1431092783319_0252 completed successfully
15/05/08 16:05:45 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=144566
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=34006
                HDFS: Number of bytes written=11556
                HDFS: Number of read operations=32
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=7
        Job Counters
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=16564
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=8282
                Total vcore-seconds taken by all map tasks=8282
                Total megabyte-seconds taken by all map tasks=33923072
        Map-Reduce Framework
                Map input records=1
                Map output records=1
                Input split bytes=119
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=82
                CPU time spent (ms)=4570
                Physical memory (bytes) snapshot=591728640
                Virtual memory (bytes) snapshot=3873914880
                Total committed heap usage (bytes)=1853882368
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=0
15/05/08 16:05:45 INFO mapreduce.ImportJobBase: Transferred 11.2852 KB in 31.1579 seconds (370.8854 bytes/sec)
15/05/08 16:05:45 INFO mapreduce.ImportJobBase: Retrieved 1 records.
15/05/08 16:05:45 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
15/05/08 16:05:46 INFO client.RMProxy: Connecting to ResourceManager at ue1b-labA02/10.74.50.172:8032
15/05/08 16:05:46 INFO input.FileInputFormat: Total input paths to process : 5
15/05/08 16:05:46 INFO mapreduce.JobSubmitter: number of splits:5
15/05/08 16:05:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1431092783319_0255
15/05/08 16:05:47 INFO impl.YarnClientImpl: Submitted application application_1431092783319_0255
15/05/08 16:05:47 INFO mapreduce.Job: The url to track the job: http://ue1b-labA02:8088/proxy/application_1431092783319_0255/
15/05/08 16:05:47 INFO mapreduce.Job: Running job: job_1431092783319_0255
15/05/08 16:06:01 INFO mapreduce.Job: Job job_1431092783319_0255 running in uber mode : false
15/05/08 16:06:01 INFO mapreduce.Job: map 0% reduce 0%
15/05/08 16:06:07 INFO mapreduce.Job: Task Id : attempt_1431092783319_0255_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: Can't parse input data: 'PAR1�� �� '
        at QueryResult.__loadFromFields(QueryResult.java:1413)
        at QueryResult.parse(QueryResult.java:1221)
        at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:53)
        at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:34)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.NumberFormatException: For input string: "PAR1�� �� "
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:492)
        at java.lang.Integer.valueOf(Integer.java:582)
        at QueryResult.__loadFromFields(QueryResult.java:1270)
        ... 11 more
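The failing frames are in org.apache.sqoop.mapreduce.MergeTextMapper, which parses its input as delimited text, so the merge step is reading the Parquet data files (the PAR1 magic header) as if they were TSV. One possible workaround, purely a sketch I have not verified on this CDH build: drop --merge-key and use --append, so no merge job ever runs, and resolve duplicate ids downstream (e.g., by taking the latest modified_datetime per id in a Hive/Impala view). This assumes --append is accepted alongside --as-parquetfile in this Sqoop build, and the __append job name is just a placeholder to distinguish it from the original:

sqoop job \
  --create import__myscope__mydb__mytable__append \
  --meta-connect ... \
  -- import \
  --connect "jdbc:sqlserver://mydbserver:1433;databaseName=mydb;" \
  --username ... \
  --password-file ... \
  --num-mappers 4 \
  --target-dir ... \
  --as-parquetfile \
  --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --relaxed-isolation \
  --query "SELECT id, a, b, c, modified_datetime FROM mytable" \
  --split-by id \
  --append \
  --incremental lastmodified \
  --check-column modified_datetime \
  --last-value "1900-01-01 00:00:00.000"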
