Sqoop is probably the more mature tool for the job, and it does just this one thing. The argument for doing it in Spark is wanting to integrate ingestion into a larger workflow. I'd expect Sqoop to be more efficient and flexible for the ingest task alone, including continuously pulling deltas, which I'm not sure Spark really does for you.
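For the delta-pull point: Sqoop supports incremental imports out of the box via `--incremental`. A hedged sketch of an append-mode run (the connection string, table, column, and paths below are hypothetical placeholders):

```shell
# Incremental append import: on each run, pull only rows whose
# check-column value exceeds the last recorded value. All names
# and the JDBC URL here are illustrative, not from the thread.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user -P \
  --table orders \
  --incremental append \
  --check-column order_id \
  --last-value 1000000 \
  --target-dir /data/raw/orders
```

Sqoop records the new high-water mark after each run (or manages it for you if the command is saved as a Sqoop job), which is the "continuously pulling deltas" behavior Spark's JDBC source does not provide on its own.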
MapReduce won't matter here. The bottleneck is reading from the RDBMS in general.

On Wed, Aug 24, 2016 at 11:07 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Personally I prefer Spark JDBC.
>
> Both Sqoop and Spark rely on the same JDBC drivers.
>
> I think Spark is faster, and if you have many nodes you can partition your
> incoming data and take advantage of Spark's DAG and in-memory processing.
>
> By default Sqoop uses MapReduce, which is pretty slow.
>
> Remember that for Spark you will need sufficient memory.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> http://talebzadehmich.wordpress.com
>
> On 24 August 2016 at 22:39, Venkata Penikalapati
> <mail.venkatakart...@gmail.com> wrote:
>>
>> Team,
>> Please help me choose between Sqoop and Spark JDBC to fetch data from an
>> RDBMS. Sqoop has a lot of optimizations for fetching data; does Spark
>> JDBC have those too?
>>
>> I'm performing some analytics in Spark on data that resides in an RDBMS.
>>
>> Please guide me on this.
>>
>> Thanks
>> Venkata Karthik P
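Mich's point about partitioning the incoming data refers to Spark's JDBC partitioned reads: you give Spark a numeric column plus lower/upper bounds and a partition count, and it splits the range into one WHERE-clause predicate per partition, each fetched by a separate task in parallel. A minimal sketch of that range-splitting idea in plain Python (column name and bounds are hypothetical; Spark's real logic also handles strides and bounds more carefully):

```python
def column_partition(column, lower, upper, num_partitions):
    """Split [lower, upper) on `column` into one WHERE-clause predicate
    per partition, roughly mirroring how Spark's JDBC source assigns
    predicates to parallel read tasks. Simplified boundary handling."""
    stride = (upper - lower) // num_partitions
    predicates = []
    for i in range(num_partitions):
        lo = lower + i * stride
        hi = lower + (i + 1) * stride
        if i == 0:
            # first partition also sweeps up NULLs and anything below lower
            predicates.append(f"{column} < {hi} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # last partition is open-ended above, catching stragglers
            predicates.append(f"{column} >= {lo}")
        else:
            predicates.append(f"{lo} <= {column} AND {column} < {hi}")
    return predicates

# Example: 4 parallel reads over order_id in [0, 1,000,000)
preds = column_partition("order_id", 0, 1_000_000, 4)
for p in preds:
    print(p)
```

In Spark itself the equivalent is a partitioned `spark.read.jdbc(...)` call with `column`, `lowerBound`, `upperBound`, and `numPartitions` (PySpark parameter names); without those options the whole table comes through a single connection, and no amount of cluster memory helps.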