Spark Streaming: No parallelism in writing to database (MySQL)

2014-09-25 Thread maddenpj
I posted yesterday about a related issue but resolved it shortly after. I'm using Spark Streaming to summarize event data from Kafka and save it to a MySQL table. Currently the bottleneck is in writing to MySQL and I'm puzzled as to how to speed it up. I've tried repartitioning with several

Re: Spark Streaming: No parallelism in writing to database (MySQL)

2014-09-25 Thread maddenpj
Update for posterity: once again I solved the problem shortly after posting to the mailing list. updateStateByKey uses the default partitioner, which in my case appeared to be set to one partition. Changing my call from .updateStateByKey[Long](updateFn) to .updateStateByKey[Long](updateFn,
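For reference, the overload of updateStateByKey that takes an explicit partition count looks roughly like the sketch below. This is a hedged reconstruction, not the poster's actual job: the socket source standing in for Kafka, the key type, the checkpoint path, and the partition count of 16 are all assumptions made for illustration.

    // Hedged sketch, not the poster's code: source, checkpoint path and the
    // partition count of 16 are placeholders.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._

    object UpdateStateParallelism {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("update-state-parallelism")
        val ssc  = new StreamingContext(conf, Seconds(10))
        ssc.checkpoint("/tmp/checkpoint") // updateStateByKey requires checkpointing

        // Stand-in for the Kafka stream: a DStream of (key, count) pairs.
        val counts = ssc.socketTextStream("localhost", 9999).map(line => (line, 1L))

        // Running total per key.
        val updateFn = (values: Seq[Long], state: Option[Long]) =>
          Some(values.sum + state.getOrElse(0L))

        // The single-argument form uses the default partitioner/parallelism;
        // passing a partition count spreads the state (and the downstream
        // database writes) across that many partitions.
        val totals = counts.updateStateByKey[Long](updateFn, 16)

        totals.print()
        ssc.start()
        ssc.awaitTermination()
      }
    }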

Re: Spark Streaming: No parallelism in writing to database (MySQL)

2014-09-25 Thread Buntu Dev
Thanks for the update. I'm interested in writing the results to MySQL as well; can you shed some light or share a code sample on how you set up the driver/connection pool/etc.?

Re: Spark Streaming: No parallelism in writing to database (MySQL)

2014-09-25 Thread maddenpj
Yup, it's all in the gist: https://gist.github.com/maddenpj/5032c76aeb330371a6e6. Lines 6-9 deal with setting up the driver specifically. This sets up the driver on each partition, which keeps the connection around for the whole partition instead of opening one per record.
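The gist itself is not reproduced here; the following is only a rough sketch of that per-partition pattern for writing a DStream to MySQL. The JDBC URL, credentials, table, and upsert statement are placeholders, not the poster's actual setup.

    // Illustrative sketch only (not the linked gist): open one JDBC connection
    // per partition inside foreachPartition, reuse it for every record in that
    // partition, then close it. Connection details and table are made up.
    import java.sql.DriverManager

    import org.apache.spark.streaming.dstream.DStream

    object MySqlSinkSketch {
      def save(totals: DStream[(String, Long)]): Unit = {
        totals.foreachRDD { rdd =>
          rdd.foreachPartition { records =>
            // One connection per partition, not per record.
            val conn = DriverManager.getConnection(
              "jdbc:mysql://localhost:3306/events", "user", "password")
            val stmt = conn.prepareStatement(
              "INSERT INTO event_counts (event_key, total) VALUES (?, ?) " +
              "ON DUPLICATE KEY UPDATE total = VALUES(total)")
            try {
              records.foreach { case (key, total) =>
                stmt.setString(1, key)
                stmt.setLong(2, total)
                stmt.executeUpdate()
              }
            } finally {
              stmt.close()
              conn.close()
            }
          }
        }
      }
    }

Creating the connection inside foreachPartition means it is opened on the executor that owns the partition, rather than being serialized from the driver, which is why the setup happens per partition in the first place.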