Right now, I am having "fun" with Spark and 26446249960843350 datapoints on my MacBook Air, but my small friend is suffering...
From my experience: you will be able to do the job with Spark. You can try to load everything on a dev machine; no need for a server, a workstation might be enough. I would not recommend VMs when you go to production, unless you already have them; bare metal seems more suitable. It's definitely worth a shot!

> On Jul 6, 2016, at 3:39 PM, Andreas Bauer <dabuks...@gmail.com> wrote:
>
> The SQL statements are embedded in a PL/1 program using DB2 running on z/OS.
> Quite powerful, but expensive and, foremost, shared with other jobs in the
> company. The whole job takes approx. 20 minutes.
>
> So I was thinking of using Spark and letting the Spark job run on 10 or 20
> virtual instances, which I can spawn easily, on demand and almost for free,
> using a cloud infrastructure.
>
> On 6. Juli 2016 um 21:29:53 MESZ, Jean Georges Perrin <j...@jgp.net> wrote:
>> What are you doing it on right now?
>>
>> > On Jul 6, 2016, at 3:25 PM, dabuki wrote:
>> >
>> > I was thinking about replacing a legacy batch job with Spark, but I'm not
>> > sure if Spark is suited for this use case. Before I start the proof of
>> > concept, I wanted to ask for opinions.
>> >
>> > The legacy job works as follows: a file (100k - 1 million entries) is
>> > iterated. Every row contains a (book) order with an id, and for each row
>> > approx. 15 processing steps have to be performed that involve access to
>> > multiple database tables. In total, approx. 25 tables (each containing
>> > 10k-700k entries) have to be scanned using the book's id, and the
>> > retrieved data is joined together.
>> >
>> > As I'm new to Spark, I'm not sure if I can leverage Spark's processing
>> > model for this use case.
>> >
>> > --
>> > View this message in context:
>> > http://apache-spark-user-list.1001560.n3.nabble.com/Is-Spark-suited-for-replacing-a-batch-job-using-many-database-tables-tp27300.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.