I posted yesterday about a related issue but resolved it shortly after. I'm
using Spark Streaming to summarize event data from Kafka and save it to a
MySQL table. Currently the bottleneck is in writing to MySQL and I'm puzzled
as to how to speed it up. I've tried repartitioning with several
Update for posterity: once again I solved the problem shortly after
posting to the mailing list. It turns out updateStateByKey uses the default
partitioner, which in my case seemed to be set to one partition.
Changing my call from .updateStateByKey[Long](updateFn) to
.updateStateByKey[Long](updateFn,
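A sketch of the change described above, assuming a HashPartitioner; the actual partitioner and partition count were truncated from the original message, so the count of 8, the DStream name counts, and the update function body here are all illustrative:

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.StreamingContext._

// Hypothetical update function: running sum of event counts per key
val updateFn = (values: Seq[Long], state: Option[Long]) =>
  Some(values.sum + state.getOrElse(0L))

// counts is assumed to be a DStream[(String, Long)] built from the Kafka stream

// Before: the default partitioner, which here resolved to a single partition
val stateBefore = counts.updateStateByKey[Long](updateFn)

// After: an explicit partitioner spreads the state RDD across 8 partitions
val stateAfter = counts.updateStateByKey[Long](updateFn, new HashPartitioner(8))
```

The overload taking a Partitioner lets the state RDD parallelize across the cluster instead of funneling all keys through one task.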
Thanks for the update. I'm interested in writing the results to MySQL as
well; can you shed some light or share a code sample on how you set up the
driver/connection pool/etc.?
On Thu, Sep 25, 2014 at 4:00 PM, maddenpj madde...@gmail.com wrote:
Update for posterity, so once again I solved the problem
Yup it's all in the gist:
https://gist.github.com/maddenpj/5032c76aeb330371a6e6
Lines 6-9 deal with setting up the driver specifically. This sets the driver
up once per partition, which keeps a connection around per partition rather
than opening one per record.
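A minimal sketch of that per-partition pattern, assuming a DStream[(String, Long)] named counts and plain JDBC via DriverManager; the gist itself sets up a driver/connection pool, and the JDBC URL, credentials, and table schema below are all hypothetical:

```scala
import java.sql.DriverManager

counts.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    // One connection per partition, not per record (hypothetical URL/credentials)
    val conn = DriverManager.getConnection(
      "jdbc:mysql://localhost:3306/events", "user", "pass")
    val stmt = conn.prepareStatement(
      "INSERT INTO summary (event_key, total) VALUES (?, ?) " +
      "ON DUPLICATE KEY UPDATE total = total + ?")
    try {
      // Reuse the same statement for every record in this partition
      partition.foreach { case (key, count) =>
        stmt.setString(1, key)
        stmt.setLong(2, count)
        stmt.setLong(3, count)
        stmt.executeUpdate()
      }
    } finally {
      stmt.close()
      conn.close()
    }
  }
}
```

Because foreachPartition runs on the executors, the connection is created where the data lives and is never serialized from the driver, which is why the setup has to happen inside the partition closure.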