Thanks for the suggestions, Gwen Shapira. 


-----Original Message-----
From: Gwen Shapira [mailto:[email protected]] 
Sent: Thursday, August 14, 2014 12:24 PM
To: [email protected]
Subject: Re: Sqoop Import parallel sessions - Question

Sqoop needs to write to a directory that doesn't exist yet. Since both
of your jobs try to write to the same directory, one of them will
complain that the directory already exists.

You can use the --warehouse-dir or --target-dir parameter to make sure
each job writes to its own directory.
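For example, a rough, untested sketch of your January job (the staging
path under /user/<your_uid>/sqoop_staging is just a placeholder; adjust
it to your layout):

sqoop import --connect jdbc:oracle:thin:@<<ORACLE DB DETAILS>> \
  --table <Table_name> \
  --where "date between '01-JAN-2013' and '30-JAN-2013'" -m 1 \
  --target-dir /user/<your_uid>/sqoop_staging/<Table_name>/2013-01 \
  --hive-import --hive-table <hive tablename>

Give each of the 12 parallel jobs its own month-specific --target-dir
(.../2013-02, .../2013-03, and so on) and they will no longer race for
the same staging directory.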
Or, you can use the --hive-partition-key and --hive-partition-value
parameters to import the data into separate Hive partitions (which
makes sense from a table design perspective too).
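A sketch of that variant (also untested; the partition column name
load_month is just an example, so use whatever fits your table design,
and you will most likely still want a distinct --target-dir per job so
the temporary staging directories don't collide either):

sqoop import --connect jdbc:oracle:thin:@<<ORACLE DB DETAILS>> \
  --table <Table_name> \
  --where "date between '01-JAN-2013' and '30-JAN-2013'" -m 1 \
  --hive-import --hive-table <hive tablename> \
  --hive-partition-key load_month \
  --hive-partition-value 2013-01 \
  --target-dir /user/<your_uid>/sqoop_staging/<Table_name>/2013-01

Each job then lands in its own Hive partition (load_month=2013-01,
load_month=2013-02, ...), which also makes it easy to re-run a single
month later.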

On Thu, Aug 14, 2014 at 9:12 AM, Sethuramaswamy, Suresh
<[email protected]> wrote:
> Sure.
>
> These are my commands. When I run the two in parallel, I get the exception 
> mentioned below.
>
> sqoop import --connect jdbc:oracle:thin:@<<ORACLE DB DETAILS>> \
>   --table <Table_name> \
>   --where "date between '01-JAN-2013' and '30-JAN-2013'" -m 1 \
>   --hive-import --hive-table <hive tablename> \
>   --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
>   --null-string '\\N' --null-non-string '\\N' \
>   --hive-drop-import-delims;
>
> ...
>
> sqoop import --connect jdbc:oracle:thin:@<<ORACLE DB DETAILS>> \
>   --table <Table_name> \
>   --where "date between '01-DEC-2013' and '31-DEC-2013'" -m 1 \
>   --hive-import --hive-table <hive tablename> \
>   --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
>   --null-string '\\N' --null-non-string '\\N' \
>   --hive-drop-import-delims;
>
>
>
> Exception:
>
>
> 14/08/14 12:04:57 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory <SCHEMA>.<TABLENAME> already exists
>         at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:987)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:948)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:948)
>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:582)
>         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:612)
>         at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
>         at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
>         at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:247)
>         at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:614)
>         at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:436)
>         at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:413)
>         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:506)
>         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
>         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231)
>         at org.apache.sqoop.Sqoop.main(Sqoop.java:240)
>
>
> -----Original Message-----
> From: Jarek Jarcec Cecho [mailto:[email protected]] On Behalf Of Jarek Jarcec 
> Cecho
> Sent: Thursday, August 14, 2014 11:41 AM
> To: [email protected]
> Subject: Re: Sqoop Import parallel sessions - Question
>
> It would be helpful if you could share your complete Sqoop commands and the 
> exact exception with its stack trace.
>
> Jarcec
>
> On Aug 14, 2014, at 7:57 AM, Sethuramaswamy, Suresh 
> <[email protected]> wrote:
>
>> Team,
>>
>> We had to initiate a Sqoop import for one month's worth of records per 
>> session; to read a full year of data, I need to run 12 such statements in 
>> parallel.
>>
>> While doing this, I keep getting the error that the <SCHEMA>.<TABLENAME> 
>> folder already exists. This is because all of these sessions are initiated 
>> under the same uid, and each one writes to a temporary mapred HDFS folder 
>> under that user's home directory until it completes.
>>
>> Is there a better way to accomplish this?
>>
>>
>> Thanks
>> Suresh Sethuramaswamy
>>


