Hi all,
I'm currently researching whether Sqoop is the right tool for our task (basically setting up an RDBMS-to-RDBMS ETL system with Sqoop, HDFS, Spark and Oozie). So far it looks very promising :-)

Now I wonder about one thing: does the "sqoop import-all-tables" command use a single SQL transaction to fetch all the tables from the database, or is this avoided for (I guess) performance reasons, since it could then not run a parallel import?

The potential problem I see is that the database tables might change while the data is being read. If Sqoop reads the tables one by one without a transaction, it might get different "states" of the data, right? For example, for one table it gets what transaction t1 has committed, and for the next table it gets what a later transaction t2 has committed. This is what worries me a bit in our case.
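To make the concern concrete, here is a minimal sketch of the scenario I have in mind. It is not Sqoop: it uses Python's built-in sqlite3 module as a stand-in for the source database, and the table names (orders, order_items) and helper functions are made up purely for illustration. It only shows what can happen when two related tables are read by separate statements with a commit landing in between.

```python
import os
import sqlite3
import tempfile


def make_db(db_path):
    """Create two related tables (hypothetical names) with one consistent row each."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE order_items (order_id INTEGER)")
    conn.execute("INSERT INTO orders (id) VALUES (1)")
    conn.execute("INSERT INTO order_items (order_id) VALUES (1)")
    conn.commit()
    conn.close()


def read_tables_one_by_one(db_path):
    """Read both tables with separate statements while a writer commits in between."""
    # isolation_level=None means autocommit: each SELECT sees the latest committed state.
    reader = sqlite3.connect(db_path, isolation_level=None)
    writer = sqlite3.connect(db_path)

    # First table is read before the concurrent transaction commits.
    orders = reader.execute("SELECT id FROM orders ORDER BY id").fetchall()

    # A concurrent transaction adds an order together with its item.
    writer.execute("INSERT INTO orders (id) VALUES (2)")
    writer.execute("INSERT INTO order_items (order_id) VALUES (2)")
    writer.commit()

    # Second table is read after the commit, so it reflects a newer state.
    items = reader.execute(
        "SELECT order_id FROM order_items ORDER BY order_id"
    ).fetchall()

    reader.close()
    writer.close()
    return orders, items


def find_orphans(orders, items):
    """Return order_ids found in the items read that are missing from the orders read."""
    known = {row[0] for row in orders}
    return [row[0] for row in items if row[0] not in known]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        make_db(path)
        orders, items = read_tables_one_by_one(path)
        print("orders read:", orders)
        print("items read:", items)
        print("items without a matching order:", find_orphans(orders, items))
```

In this toy case the copy of order_items contains a row whose order is missing from the copy of orders, even though the source database was consistent at every point in time. Whether import-all-tables can end up in the same situation is exactly what I am asking.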

Regards

Frank
