Sqoop is probably the more mature tool for the job, and it does just this one thing. The argument for doing it in Spark is wanting to integrate ingestion into a larger workflow. I'd expect Sqoop to be more efficient and flexible for the ingest task alone, including continuously pulling deltas, which I'm not sure Spark really does for you.
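For the delta-pull point: Sqoop supports incremental imports out of the box via `--incremental`. A hedged sketch of an append-mode run (the connection string, table, column, and paths below are hypothetical placeholders):

```shell
# Incremental append import: on each run, pull only rows whose
# check-column value exceeds the last recorded value. All names
# and the JDBC URL here are illustrative, not from the thread.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user -P \
  --table orders \
  --incremental append \
  --check-column order_id \
  --last-value 1000000 \
  --target-dir /data/raw/orders
```

Sqoop records the new high-water mark after each run (or manages it for you if the command is saved as a Sqoop job), which is the "continuously pulling deltas" behavior Spark's JDBC source does not provide on its own.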
MapReduce won't matter here. The bottleneck is reading from the RDBMS in general.

On Wed, Aug 24, 2016 at 11:07 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Personally I prefer Spark JDBC.
>
> Both Sqoop and Spark rely on the same JDBC drivers.
>
> I think Spark is faster, and if you have many nodes you can partition your
> incoming data and take advantage of Spark's DAG and in-memory processing.
>
> By default Sqoop uses MapReduce, which is pretty slow.
>
> Remember that for Spark you will need sufficient memory.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> http://talebzadehmich.wordpress.com
>
> On 24 August 2016 at 22:39, Venkata Penikalapati
> <mail.venkatakart...@gmail.com> wrote:
>>
>> Team,
>> Please help me choose between Sqoop and Spark JDBC to fetch data from an
>> RDBMS. Sqoop has a lot of optimizations for fetching data; does Spark
>> JDBC have those too?
>>
>> I'm performing some analytics in Spark on data that resides in an RDBMS.
>>
>> Please guide me on this.
>>
>> Thanks
>> Venkata Karthik P
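Mich's point about partitioning the incoming data refers to Spark's JDBC partitioned reads: you give Spark a numeric column plus lower/upper bounds and a partition count, and it splits the range into one WHERE-clause predicate per partition, each fetched by a separate task in parallel. A minimal sketch of that range-splitting idea in plain Python (column name and bounds are hypothetical; Spark's real logic also handles strides and bounds more carefully):

```python
def column_partition(column, lower, upper, num_partitions):
    """Split [lower, upper) on `column` into one WHERE-clause predicate
    per partition, roughly mirroring how Spark's JDBC source assigns
    predicates to parallel read tasks. Simplified boundary handling."""
    stride = (upper - lower) // num_partitions
    predicates = []
    for i in range(num_partitions):
        lo = lower + i * stride
        hi = lower + (i + 1) * stride
        if i == 0:
            # first partition also sweeps up NULLs and anything below lower
            predicates.append(f"{column} < {hi} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # last partition is open-ended above, catching stragglers
            predicates.append(f"{column} >= {lo}")
        else:
            predicates.append(f"{lo} <= {column} AND {column} < {hi}")
    return predicates

# Example: 4 parallel reads over order_id in [0, 1,000,000)
preds = column_partition("order_id", 0, 1_000_000, 4)
for p in preds:
    print(p)
```

In Spark itself the equivalent is a partitioned `spark.read.jdbc(...)` call with `column`, `lowerBound`, `upperBound`, and `numPartitions` (PySpark parameter names); without those options the whole table comes through a single connection, and no amount of cluster memory helps.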