Thanks, Jarcec, for clearing that up!

Regards,
Nitin
On Mon, Jan 25, 2016 at 8:21 PM, Jarek Jarcec Cecho <[email protected]> wrote:
> Hi Nitin,
> here is my stab at answering the question:
>
> > • Does Sqoop perform a clean up of the already imported/exported data?
>
> Import happens into a temporary directory; if the job doesn’t finish, all partially imported data will get dropped. On the export side we have a lot of smaller transactions, so you will get a partial export in case of failure. However, we have an option to export with a staging table that is designed to deal with this partial export issue. I would suggest taking a look at our user guide [1].
>
> > • Does Sqoop automatically restart the job in the case of network failure?
>
> There are multiple levels of parallelism and re-tries. If one task fails, Hadoop will by default re-run it up to 3 times before killing the whole job. We’re not restarting the whole job, as we assume that if 3 re-tries didn’t help, there is no point in retrying it again.
>
> Jarcec
>
> Links:
> 1: http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_literal_sqoop_export_literal
>
> > On Jan 24, 2016, at 10:30 PM, Nitin Kumar <[email protected]> wrote:
> >
> > I am using Apache Sqoop 1.4.6 (distributed with the Hortonworks HDP 2.3 package) to import and export data between RDBMS systems and HDFS. I have to deploy this in a production environment and was wondering about the network resilience of Sqoop.
> > Say I'm done with about 90% of the import/export job and there is a network failure between the RDBMS and my Hadoop cluster. Since Sqoop internally executes a MapReduce job for this, I'm guessing the job will fail completely and require a manual restart. In this regard I have the following questions:
> >
> > • Does Sqoop perform a clean up of the already imported/exported data?
> > • Does Sqoop automatically restart the job in the case of network failure?
> > • If a manual clean up and restart is required, what other technology alongside Sqoop do people generally use to achieve network resilience?
> > • Is there a different version of Sqoop that offers this feature?
> >
> > Your answers and suggestions would be highly appreciated.
> >
> > Thanks!
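The staging-table export Jarcec mentions can be driven from a script. The following is a minimal Python sketch that only builds the `sqoop export` argument list; the helper name `build_staged_export_cmd` is hypothetical, and the JDBC URL, table names and HDFS path in the usage line are made-up placeholders. The `--staging-table` and `--clear-staging-table` options are documented in the Sqoop 1.4.6 export section of the user guide [1]. Per that guide, the staging table must have the same structure as the target table, and staging is not available for `--update-key` updates or stored-procedure exports (nor always for `--direct` exports).

```python
import shlex
from typing import List, Optional


def build_staged_export_cmd(connect_url: str, table: str, export_dir: str,
                            staging_table: str,
                            username: Optional[str] = None,
                            password_file: Optional[str] = None) -> List[str]:
    """Build a `sqoop export` argument list that stages rows before the final move.

    Rows are first written to `staging_table`; Sqoop moves them into `table`
    in a single transaction at the end, so a failed job should not leave a
    partial export in the target table.
    """
    cmd = [
        "sqoop", "export",
        "--connect", connect_url,
        "--table", table,
        "--export-dir", export_dir,
        "--staging-table", staging_table,
        # Allow Sqoop to delete rows a previous failed run left in the staging table.
        "--clear-staging-table",
    ]
    if username is not None:
        cmd += ["--username", username]
    if password_file is not None:
        cmd += ["--password-file", password_file]
    return cmd


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    example = build_staged_export_cmd("jdbc:mysql://db.example.com/sales",
                                      "orders", "/user/nitin/orders",
                                      "orders_stage")
    print(shlex.join(example))
```

Passing the list to `subprocess.run(cmd, check=True)` raises `CalledProcessError` when Sqoop exits non-zero, which gives an outside scheduler a clear signal to re-run the whole export once the network is back.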
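Jarcec's description of task-level re-tries can be modelled in a few lines. This is a simplified Python model, not Hadoop code: it runs tasks one after another (real map tasks run in parallel), the function names are hypothetical, and it assumes the Hadoop 2 default of `mapreduce.map.maxattempts = 4`, i.e. one initial attempt plus the 3 re-runs described in the reply.

```python
from typing import Callable, Dict

# Assumed Hadoop 2 default for mapreduce.map.maxattempts: one attempt
# plus three re-runs before the task is declared failed.
DEFAULT_MAX_ATTEMPTS = 4


class JobFailed(Exception):
    """Raised when any task exhausts its attempts; the job is not restarted."""


def run_task(attempt_fn: Callable[[int], bool],
             max_attempts: int = DEFAULT_MAX_ATTEMPTS) -> int:
    """Re-run one task until it succeeds or runs out of attempts.

    Returns the number of attempts used on success, or 0 if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if attempt_fn(attempt):
            return attempt
    return 0


def run_job(tasks: Dict[str, Callable[[int], bool]],
            max_attempts: int = DEFAULT_MAX_ATTEMPTS) -> Dict[str, int]:
    """Run every task; kill the whole job as soon as one task exhausts its retries."""
    used: Dict[str, int] = {}
    for name, attempt_fn in tasks.items():
        attempts = run_task(attempt_fn, max_attempts)
        if attempts == 0:
            raise JobFailed(f"task {name} failed {max_attempts} times; killing job")
        used[name] = attempts
    return used
```

In this model a short network blip that clears before the last attempt is absorbed by the per-task retries, while an outage that outlasts all attempts fails the job, which then has to be cleaned up and re-run from outside.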
