Cody, On 12 Jun 2015, at 17:26, Cody Koeninger <c...@koeninger.org> wrote:
> There are several database apis that use a thread local or singleton > reference to a connection pool (we use ScalikeJDBC currently, but there are > others). > > You can use mapPartitions earlier in the chain to make sure the connection > pool is set up on that executor, then use it inside updateStateByKey > Thanks. You are saying I should just make an arbitrary use of the ‘connection’ to invoke the ‘lazy’. E.g. like this: object SomeDB { lazy val conn = new SomeDB( “some serializable config") } Then somewhere else: theTrackingEvents.map(toPairs).mapPartitions(iter => iter.map( pair => { SomeDb.conn.init pair } )).updateStateByKey[Session](myUpdateFunction _) An in myUpdateFunction def myUpdateFunction( …) { SomeDb.conn.store( … ) } Correct? Jan > On Fri, Jun 12, 2015 at 10:07 AM, algermissen1971 > <algermissen1...@icloud.com> wrote: > Hi, > > I have a scenario with spark streaming, where I need to write to a database > from within updateStateByKey[1]. > > That means that inside my update function I need a connection. > > I have so far understood that I should create a new (lazy) connection for > every partition. But since I am not working in foreachRDD I wonder where I > can iterate over the partitions. > > Should I use mapPartitions() somewhere up the chain? > > Jan > > > > [1] The use case being saving ‘done' sessions during web tracking. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org