The SQL statements are embedded in a PL/1 program using DB2 running on z/OS. 
Quite powerful, but expensive and, above all, shared with other jobs in the 
company. The whole job takes approx. 20 minutes.  So I was thinking of using 
Spark and letting the Spark job run on 10 or 20 virtual instances, which I can 
spawn easily, on demand and almost for free, using a cloud infrastructure.   
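
To make the idea concrete, here is a rough sketch of how the PoC could look 
(Scala, Spark 1.6 DataFrame API). Everything specific in it is made up for 
illustration: the file path, JDBC URL, credentials, table and column names are 
placeholders, only two of the ~25 tables are shown, and it assumes the 
spark-csv package and the DB2 JDBC driver are available on the cluster.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.broadcast

object OrderBatchPoc {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("order-batch-poc"))
    val sqlContext = new SQLContext(sc)

    // Input file with 100k - 1 million rows, one (book) order per line.
    val orders = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("hdfs:///input/orders.csv")                // placeholder path

    // Read a DB2 table over JDBC. With 10k-700k rows per table they are small
    // enough to pull in whole; the smaller ones can even be broadcast.
    def db2Table(name: String) = sqlContext.read
      .format("jdbc")
      .option("url", "jdbc:db2://dbhost:50000/MYDB")   // placeholder URL
      .option("dbtable", name)
      .option("user", "batchuser")                     // placeholder credentials
      .option("password", "secret")
      .load()

    // Two of the ~15 processing steps, expressed as joins on the book id.
    val enriched = orders
      .join(broadcast(db2Table("BOOK_MASTER")), Seq("BOOK_ID"))
      .join(db2Table("ORDER_HISTORY"), Seq("BOOK_ID"))
      // ... remaining joins / per-row transformations ...

    enriched.write.parquet("hdfs:///output/enriched_orders")
  }
}

The point is that the per-row lookups against ~25 tables become a handful of 
distributed joins that Spark can spread over the 10-20 instances; whether that 
actually beats the 20-minute mainframe run is exactly what the PoC would have 
to show.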



What are you doing it on right now?



> On Jul 6, 2016, at 3:25 PM, dabuki <dabuks...@gmail.com> wrote:

>  
> I was thinking about replacing a legacy batch job with Spark, but I'm not
> sure if Spark is suited for this use case. Before I start the proof of
> concept, I wanted to ask for opinions.
>  
> The legacy job works as follows: A file (100k - 1 million entries) is iterated.
> Every row contains a (book) order with an id, and for each row approx. 15
> processing steps have to be performed that involve access to multiple
> database tables. In total, approx. 25 tables (each containing 10k-700k
> entries) have to be scanned using the book's id, and the retrieved data is
> joined together.  
>  
> As I'm new to Spark, I'm not sure if I can leverage Spark's processing model
> for this use case.

>  
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Is-Spark-suited-for-replacing-a-batch-job-using-many-database-tables-tp27300.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
