Hi all,
I'm currently researching whether Sqoop is the right tool for our task
(basically setting up an RDBMS-to-RDBMS ETL system with Sqoop, HDFS,
Spark and Oozie). So far it looks very promising :-) Now I wonder about
one thing: does the "sqoop import-all-tables" command fetch all the
tables from the database within a single SQL transaction? Or is this not
done (for performance reasons, I guess), since a single transaction
would then rule out a parallel import?

The potential problem I see is that the database tables might change
while the data is being read. If Sqoop reads the tables one by one
without a shared transaction, it might capture different "states" of the
data, right? For example, for one table it would get what transaction t1
has committed, and for the next table it would get what transaction t2
has committed. This is what worries me a bit in our case.
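To make the concern concrete, here is a toy Python simulation of the
scenario. It is purely hypothetical: it does not use Sqoop or any real
database, and the table names "orders" and "order_items" are made up. A
writer always commits one row to both tables together. A table-by-table
copy, with that writer committing between the two reads, ends up with
tables that disagree. A copy taken from one snapshot does not.

```python
from copy import deepcopy


def fresh_db():
    # Hypothetical tables; transaction t1 has already committed a row to both.
    return {"orders": ["t1"], "order_items": ["t1"]}


def commit(db, txn_id):
    # A writer transaction that always touches both tables together.
    db["orders"].append(txn_id)
    db["order_items"].append(txn_id)


def import_table_by_table(db, concurrent_txn):
    # Each table is read separately, with no shared transaction/snapshot.
    copy = {}
    copy["orders"] = list(db["orders"])
    commit(db, concurrent_txn)  # another writer commits between the two reads
    copy["order_items"] = list(db["order_items"])
    return copy


def import_single_snapshot(db, concurrent_txn):
    # All tables are read as of the same instant (one snapshot).
    copy = deepcopy(db)
    commit(db, concurrent_txn)  # later commits are not visible in the copy
    return copy


def is_consistent(copy):
    # Both tables should reflect exactly the same set of committed transactions.
    return copy["orders"] == copy["order_items"]


a = import_table_by_table(fresh_db(), "t2")
b = import_single_snapshot(fresh_db(), "t2")
print(a, is_consistent(a))  # {'orders': ['t1'], 'order_items': ['t1', 't2']} False
print(b, is_consistent(b))  # {'orders': ['t1'], 'order_items': ['t1']} True
```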
Regards
Frank