Hey man, is this incremental lastmodified mode?
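
I'm guessing it is, since as far as I know the follow-up "merge" MapReduce job only gets kicked off for a lastmodified import that also passes --merge-key. Roughly this shape of command is what I'd expect to hit it (the connection string, table, column names, and paths below are just placeholders, not taken from your setup):

    sqoop import \
      --connect "jdbc:sqlserver://dbhost:1433;database=mydb" \
      --username sqoopuser \
      --password-file /user/sqoop/db.password \
      --table MyTable \
      --as-parquetfile \
      --target-dir /user/sqoop/mytable_parquet \
      --incremental lastmodified \
      --check-column LastUpdated \
      --last-value "2015-05-08 00:00:00" \
      --merge-key id

If that roughly matches what you're running, the second job in your log (the one using MergeTextMapper) is that merge step.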
-Abe

On Fri, May 8, 2015 at 1:21 PM, Michael Arena <[email protected]> wrote:

> Cloudera backported Parquet support to their version of Sqoop 1.4.5.
>
> Originally, I was doing Sqoop incremental imports from SQL Server to
> text files (TSV). This worked fine but the size and query speed of the
> text files are a problem.
>
> I then tried importing as Avro files but Sqoop prohibits Avro and
> incremental mode.
>
> I then tried importing as Parquet files.
>
> The initial import worked fine and loaded 69,071 rows. Then next time
> Sqoop ran, it pulled in the 1 changed row but then the "merge" step
> failed since it appears to think the files are text (not Parquet):
>
> 15/05/08 16:05:45 INFO mapreduce.Job: Job job_1431092783319_0252 completed successfully
> 15/05/08 16:05:45 INFO mapreduce.Job: Counters: 30
>         File System Counters
>                 FILE: Number of bytes read=0
>                 FILE: Number of bytes written=144566
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=34006
>                 HDFS: Number of bytes written=11556
>                 HDFS: Number of read operations=32
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=7
>         Job Counters
>                 Launched map tasks=1
>                 Other local map tasks=1
>                 Total time spent by all maps in occupied slots (ms)=16564
>                 Total time spent by all reduces in occupied slots (ms)=0
>                 Total time spent by all map tasks (ms)=8282
>                 Total vcore-seconds taken by all map tasks=8282
>                 Total megabyte-seconds taken by all map tasks=33923072
>         Map-Reduce Framework
>                 Map input records=1
>                 Map output records=1
>                 Input split bytes=119
>                 Spilled Records=0
>                 Failed Shuffles=0
>                 Merged Map outputs=0
>                 GC time elapsed (ms)=82
>                 CPU time spent (ms)=4570
>                 Physical memory (bytes) snapshot=591728640
>                 Virtual memory (bytes) snapshot=3873914880
>                 Total committed heap usage (bytes)=1853882368
>         File Input Format Counters
>                 Bytes Read=0
>         File Output Format Counters
>                 Bytes Written=0
> 15/05/08 16:05:45 INFO mapreduce.ImportJobBase: Transferred 11.2852 KB in 31.1579 seconds (370.8854 bytes/sec)
> 15/05/08 16:05:45 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> 15/05/08 16:05:45 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
> 15/05/08 16:05:46 INFO client.RMProxy: Connecting to ResourceManager at ue1b-labA02/10.74.50.172:8032
> 15/05/08 16:05:46 INFO input.FileInputFormat: Total input paths to process : 5
> 15/05/08 16:05:46 INFO mapreduce.JobSubmitter: number of splits:5
> 15/05/08 16:05:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1431092783319_0255
> 15/05/08 16:05:47 INFO impl.YarnClientImpl: Submitted application application_1431092783319_0255
> 15/05/08 16:05:47 INFO mapreduce.Job: The url to track the job: http://ue1b-labA02:8088/proxy/application_1431092783319_0255/
> 15/05/08 16:05:47 INFO mapreduce.Job: Running job: job_1431092783319_0255
> 15/05/08 16:06:01 INFO mapreduce.Job: Job job_1431092783319_0255 running in uber mode : false
> 15/05/08 16:06:01 INFO mapreduce.Job: map 0% reduce 0%
> 15/05/08 16:06:07 INFO mapreduce.Job: Task Id : attempt_1431092783319_0255_m_000000_0, Status : FAILED
> Error: java.lang.RuntimeException: Can't parse input data: 'PAR1��
> ��
> '
>         at QueryResult.__loadFromFields(QueryResult.java:1413)
>         at QueryResult.parse(QueryResult.java:1221)
>         at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:53)
>         at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:34)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.NumberFormatException: For input string: "PAR1��
> ��
> "
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>         at java.lang.Integer.parseInt(Integer.java:492)
>         at java.lang.Integer.valueOf(Integer.java:582)
>         at QueryResult.__loadFromFields(QueryResult.java:1270)
>         ... 11 more
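
By the way, that "PAR1" in the NumberFormatException is the 4-byte magic header at the start of every Parquet file, which suggests the merge mapper (MergeTextMapper in the stack trace) is reading your Parquet part files as delimited text and trying to parse the raw bytes as a column value. If you want to sanity-check, something like this should print PAR1 for a Parquet file (the path is only a guess at your layout; point it at whatever part file the import actually wrote):

    hdfs dfs -cat /user/sqoop/mytable_parquet/part-m-00000.parquet | head -c 4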
