Hi,

This is my line of thinking: Spark offers a wide range of transformations that would cover most of the use cases for replacing an ETL tool such as Informatica, so the E and T parts of ETL are well covered. Loading, however, generally requires more functionality. Spinning up an Informatica cluster, which also has a master-slave architecture, would be expensive. I know Pentaho and other such tools support this use case, but can we do the same with a Spark cluster?
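
For the loading side, one option I had in mind is to push the transformed data out with foreachPartition and a plain ODBC connection per partition. Below is a rough, untested sketch using PySpark and pyodbc; the DSN, credentials, staging path, and target table/column names are placeholders, not a real setup.

from pyspark.sql import SparkSession
import pyodbc

spark = SparkSession.builder.appName("etl-load-sketch").getOrCreate()

def upsert_partition(rows):
    # One connection per partition keeps the number of database sessions small.
    conn = pyodbc.connect("DSN=target_db;UID=etl_user;PWD=secret")  # placeholder DSN
    cursor = conn.cursor()
    for row in rows:
        # Plain UPDATE shown here; the same pattern works for DELETE or for a
        # stored procedure call, e.g. cursor.execute("{CALL my_proc (?, ?)}", ...).
        cursor.execute(
            "UPDATE target_table SET amount = ? WHERE id = ?",
            row.amount, row.id,
        )
    conn.commit()
    cursor.close()
    conn.close()

# Output of the E and T steps, written by an earlier Spark job (placeholder path).
df = spark.read.parquet("/staging/transformed")
df.foreachPartition(upsert_partition)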

Regards,
Upkar

Sent from my iPhone

> On 29-Jun-2017, at 22:06, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
> 
> SPARK + JDBC.
> 
> But Why?
> 
> Regards,
> Gourav Sengupta
> 
>> On Thu, Jun 29, 2017 at 3:44 PM, upkar_kohli <upkar.ko...@gmail.com> wrote:
>> Hi,
>> 
>> Has anyone tried mixing Spark with some of the other Python JDBC/ODBC 
>> packages to create an end-to-end ETL framework? The framework would need to 
>> support update, delete, and other DML operations, along with stored procedure / 
>> function calls, across a variety of databases. Ideally a setup that is easy to use.
>> 
>> I only know of a few ODBC Python packages that are production-ready and 
>> widely used, such as pyodbc or SQLAlchemy.
>> 
>> JayDeBeApi, which can interface with JDBC, is still in beta.
>> 
>> Would it be a bad use case to attempt this with foreachPartition in Spark? 
>> If not, what would be a good stack for such an implementation in Python?
>> 
>> Regards,
>> Upkar
> 
