Re: spark architecture question -- Pleas Read

Sachin Naik Sat, 28 Jan 2017 09:08:53 -0800

I strongly agree with Jorn and Russell. There are different solutions for data 
movement depending on your needs: frequency, bi-directional drivers, workflow, 
and handling of duplicate records. This space is known as "Change Data 
Capture", or CDC for short. I built some products in this space that 
extensively used connection pooling over ODBC/JDBC.


Happy to chat if you need more information. 

-Sachin Naik

>> Hard to tell. Can you give more insights on what you try to achieve and 
>> what the data is about?
>> For example, depending on your use case sqoop can make sense or not.

> On Jan 27, 2017, at 11:22 PM, Russell Spitzer <russell.spit...@gmail.com> 
> wrote:
> 
> You can treat Oracle as a JDBC source 
> (http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases)
> and skip Sqoop and the Hive tables, going straight to queries. Then you can 
> skip Hive on the way back out (see the same link) and write directly to Oracle. 
> I'll leave the performance questions for someone else. 
> 
>> On Fri, Jan 27, 2017 at 11:06 PM Sirisha Cheruvu <siri8...@gmail.com> wrote:
>> 
>> On Sat, Jan 28, 2017 at 6:44 AM, Sirisha Cheruvu <siri8...@gmail.com> wrote:
>> Hi Team,
>> 
>> Right now our existing flow is
>> 
>> Oracle --> Sqoop --> Hive --> Hive queries on Spark SQL (Hive 
>> Context) --> destination Hive table --> Sqoop export to Oracle
>> 
>> Half of the required Hive UDFs are developed as Java UDFs.
>> 
>> So now I want to know: if I run native Scala UDFs rather than running Hive 
>> Java UDFs in Spark SQL, will there be any performance difference?
>> 
>> 
>> Can we skip the Sqoop import and export part and instead directly load 
>> data from Oracle into Spark, code Scala UDFs for the transformations, and 
>> export the output data back to Oracle?
>> 
>> Right now the architecture we are using is
>> 
>> Oracle --> Sqoop (import) --> Hive tables --> Hive queries --> Spark SQL --> Hive 
>> --> Oracle 
>> 
>> What would be the optimal architecture to process data from Oracle using 
>> Spark? Is there any way I can improve this process?
>> 
>> 
>> 
>> 
>> Regards,
>> Sirisha 
>>
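
Russell's suggestion above (read Oracle through Spark's JDBC data source, 
transform, and write back the same way) might look roughly like the PySpark 
sketch below. The thread talks about Scala, where the same format("jdbc") 
reader and writer calls exist; Python is used here only to keep the sketch 
short. Everything specific is an assumption: the table names, the 
aggregation query, the helper names oracle_jdbc_options and run_pipeline, 
and the connection details are hypothetical placeholders, and the Oracle 
JDBC driver jar is assumed to be on Spark's classpath.

```python
from typing import Dict

# Oracle's JDBC driver class; the driver jar itself must be supplied to Spark.
ORACLE_DRIVER = "oracle.jdbc.OracleDriver"


def oracle_jdbc_options(url: str, table: str, user: str, password: str) -> Dict[str, str]:
    """Build the option map for Spark's JDBC data source (hypothetical helper)."""
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": ORACLE_DRIVER,
    }


def run_pipeline(spark, url: str, user: str, password: str) -> None:
    """Oracle -> Spark SQL -> Oracle, with no Sqoop or Hive staging step.

    `spark` is an existing SparkSession; table names are placeholders.
    """
    # Read the source table straight from Oracle.
    src = (
        spark.read.format("jdbc")
        .options(**oracle_jdbc_options(url, "SRC_SCHEMA.ORDERS", user, password))
        .load()
    )
    src.createOrReplaceTempView("orders")

    # Placeholder transformation; the existing queries and UDFs would go here.
    result = spark.sql(
        "SELECT customer_id, SUM(amount) AS total "
        "FROM orders GROUP BY customer_id"
    )

    # Write the result directly to Oracle instead of Hive plus sqoop export.
    (
        result.write.format("jdbc")
        .options(**oracle_jdbc_options(url, "DST_SCHEMA.ORDER_TOTALS", user, password))
        .mode("append")
        .save()
    )
```

With a live session this would be called as something like 
run_pipeline(spark, "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "secret"), 
where the host, service name, and credentials are again made up. The sketch 
says nothing about the performance questions raised in the thread.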
