Right now, I am having "fun" with Spark and 26446249960843350 datapoints on my MacBook Air, but my small friend is suffering...
From my experience: you will be able to do the job with Spark. You can try to load everything on a dev machine; no need for a server, a workstation might be enough. I would not recommend VMs when you go to production, unless you already have them; bare metal seems more suitable. It's definitely worth a shot!

> On Jul 6, 2016, at 3:39 PM, Andreas Bauer <dabuks...@gmail.com> wrote:
>
> The SQL statements are embedded in a PL/1 program using DB2 running on z/OS.
> Quite powerful, but expensive and, foremost, shared with other jobs in the
> company. The whole job takes approx. 20 minutes.
>
> So I was thinking of using Spark and letting the Spark job run on 10 or 20
> virtual instances, which I can spawn easily, on demand and almost for free,
> using a cloud infrastructure.
>
> On 6. Juli 2016 um 21:29:53 MESZ, Jean Georges Perrin <j...@jgp.net> wrote:
>> What are you doing it on right now?
>>
>> > On Jul 6, 2016, at 3:25 PM, dabuki wrote:
>> >
>> > I was thinking about replacing a legacy batch job with Spark, but I'm not
>> > sure if Spark is suited for this use case. Before I start the proof of
>> > concept, I wanted to ask for opinions.
>> >
>> > The legacy job works as follows: a file (100k - 1 million entries) is
>> > iterated. Every row contains a (book) order with an id, and for each row
>> > approx. 15 processing steps have to be performed that involve access to
>> > multiple database tables. In total, approx. 25 tables (each containing
>> > 10k-700k entries) have to be scanned using the book's id, and the
>> > retrieved data is joined together.
>> >
>> > As I'm new to Spark, I'm not sure if I can leverage Spark's processing
>> > model for this use case.
>> >
>> > --
>> > View this message in context:
>> > http://apache-spark-user-list.1001560.n3.nabble.com/Is-Spark-suited-for-replacing-a-batch-job-using-many-database-tables-tp27300.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.