Re: Spark Streaming, updateStateByKey and mapPartitions() - and lazy "DatabaseConnection"

Cody Koeninger Fri, 12 Jun 2015 08:27:07 -0700

There are several database apis that use a thread local or singleton
reference to a connection pool (we use ScalikeJDBC currently, but there are
others).


You can use mapPartitions earlier in the chain to make sure the connection
pool is set up on that executor, then use it inside updateStateByKey

On Fri, Jun 12, 2015 at 10:07 AM, algermissen1971 <
algermissen1...@icloud.com> wrote:

> Hi,
>
> I have a scenario with spark streaming, where I need to write to a
> database from within updateStateByKey[1].
>
> That means that inside my update function I need a connection.
>
> I have so far understood that I should create a new (lazy) connection for
> every partition. But since I am not working in foreachRDD I wonder where I
> can iterate over the partitions.
>
> Should I use mapPartitions() somewhere up the chain?
>
> Jan
>
>
>
> [1] The use case being saving ‘done' sessions during web tracking.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>

Re: Spark Streaming, updateStateByKey and mapPartitions() - and lazy "DatabaseConnection"

Reply via email to