Hi,

One more thing - I am talking about Spark in cluster mode without Hadoop.
Regards,
Upkar

Sent from my iPhone

> On 30-Jun-2017, at 07:55, upkar.ko...@gmail.com wrote:
>
> Hi,
>
> This is my line of thinking - Spark offers a variety of transformations that
> would support most of the use cases for replacing an ETL tool such as
> Informatica. The E and T parts of ETL are perfectly covered. Loading, though,
> may generally require more functionality. Spinning up an Informatica cluster,
> which also has a master-slave architecture, would cost $$. I know Pentaho and
> other such tools exist to support this use case. But can we do the same with
> a Spark cluster?
>
> Regards,
> Upkar
>
> Sent from my iPhone
>
>> On 29-Jun-2017, at 22:06, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>>
>> SPARK + JDBC.
>>
>> But why?
>>
>> Regards,
>> Gourav Sengupta
>>
>>> On Thu, Jun 29, 2017 at 3:44 PM, upkar_kohli <upkar.ko...@gmail.com> wrote:
>>> Hi,
>>>
>>> Has anyone tried mixing Spark with some of the other Python JDBC/ODBC
>>> packages to create an end-to-end ETL framework? The framework would enable
>>> update, delete, and other DML operations, along with stored procedure /
>>> function calls, across a variety of databases. Any setup that would be
>>> easy to use.
>>>
>>> I only know of a few ODBC Python packages that are production-ready and
>>> widely used, such as pyodbc or SQLAlchemy.
>>>
>>> JayDeBeApi, which can interface with JDBC, is still in beta.
>>>
>>> Would it be a bad use case if this were attempted with foreachPartition
>>> through Spark? If not, what could be a good stack for such an
>>> implementation using Python?
>>>
>>> Regards,
>>> Upkar
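
For what the foreachPartition approach could look like, here is a minimal
sketch, assuming a standalone (non-Hadoop) Spark cluster, pyodbc installed on
every worker, and a hypothetical DSN "TargetDB" with an illustrative "orders"
table - all names and connection details are placeholders, not a tested
implementation:

    # Sketch only: one DB connection per partition, batched DML via pyodbc.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("spark://master-host:7077")  # standalone cluster, no Hadoop
             .appName("etl-load-sketch")
             .getOrCreate())

    df = spark.read.parquet("/data/staged_orders")  # hypothetical staged input

    def load_partition(rows):
        """Runs on the worker: open a connection, flush updates in batches."""
        import pyodbc  # imported here because connections are not picklable

        conn = pyodbc.connect("DSN=TargetDB;UID=etl_user;PWD=secret")  # placeholder
        cursor = conn.cursor()
        sql = "UPDATE orders SET status = ? WHERE order_id = ?"
        batch = []
        for row in rows:
            batch.append((row.status, row.order_id))
            if len(batch) >= 1000:  # flush in chunks to bound memory
                cursor.executemany(sql, batch)
                batch.clear()
        if batch:
            cursor.executemany(sql, batch)
        conn.commit()  # one commit per partition; not one global transaction
        cursor.close()
        conn.close()

    df.foreachPartition(load_partition)

One caveat: a retried task re-runs its partition, so foreachPartition gives
at-least-once semantics and the DML should be idempotent (as the UPDATE above
is) or deduplicated in the target. Stored procedures can be invoked the same
way through the standard ODBC call escape, e.g.
cursor.execute("{CALL my_proc(?, ?)}", params).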