Hi Manas,

The approach is correct, with one caveat: an executor may run several
tasks in parallel. A single connection per JVM will either fail, if the
connection is not thread-safe, or become a bottleneck, because all tasks
will be competing for the same resource.
The best approach would be to extend your current idea with a pool of
connections, from which each task can borrow a connection and return it
after use.
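
A minimal sketch of that pool-per-executor idea, assuming HikariCP as the
pool implementation (the JDBC URL, credentials, pool size and the rdd value
below are placeholders to adapt to your job); a Scala object is initialized
at most once per JVM, so every task on the executor shares the same pool:

import java.sql.Connection
import com.zaxxer.hikari.{HikariConfig, HikariDataSource}

// One pool per executor JVM: the object is initialized lazily and at most
// once per JVM, so all tasks running on that executor share it.
object ConnectionPool {
  private lazy val dataSource: HikariDataSource = {
    val config = new HikariConfig()
    config.setJdbcUrl("jdbc:postgresql://db-host:5432/mydb") // placeholder
    config.setUsername("user")                               // placeholder
    config.setPassword("password")                           // placeholder
    config.setMaximumPoolSize(8) // ~ number of concurrent tasks per executor
    new HikariDataSource(config)
  }

  def borrow(): Connection = dataSource.getConnection()
}

rdd.foreachPartition { partition =>
  val conn = ConnectionPool.borrow() // borrow from the per-JVM pool
  try {
    partition.foreach { record =>
      // write the record using conn
    }
  } finally {
    conn.close() // with HikariCP, close() returns the connection to the pool
  }
}

The pool size then caps how many tasks from one executor talk to the
database at the same time, instead of all of them contending for a single
connection.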

-kr, Gerard.


On Thu, Mar 24, 2016 at 2:00 PM, Manas <manasjyo...@hotmail.com> wrote:

> I understand that with foreachPartition I can create one DB connection per
> partition. Is there a way to create a DB connection per executor and share
> it across all partitions/tasks running within that executor? One approach
> I am thinking of is a singleton with, say, a getConnection method. The
> connection object is not created in the driver; instead, the driver passes
> the DB connection details (host, port, user, password, etc.) to the
> singleton object, and the singleton object is also passed into
> foreachPartition. The getConnection method of the singleton creates the
> actual connection object only the first time it is called and returns the
> same connection instance for all later invocations. I believe that way
> each executor JVM will have one instance of the singleton/connection, and
> thus all partitions/tasks running within that executor would share the
> same connection. I'd like to validate this approach with the Spark
> experts. Does it have any inherent flaw, or is there a better way to
> create one instance of an object per executor?
>

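For reference, here is also a minimal sketch of the lazy-singleton approach
described in the quoted question, using plain JDBC (ConnectionDetails, the
JDBC URL and rdd are hypothetical names/placeholders). It does give you one
connection per executor JVM, but, as noted above, all tasks on that executor
then share a single connection:

import java.sql.{Connection, DriverManager}

// Hypothetical holder for the connection details; it is created on the
// driver and shipped to the executors (case classes are serializable).
case class ConnectionDetails(host: String, port: Int, user: String, password: String)

// One instance per executor JVM: the object itself never leaves the executor,
// only the ConnectionDetails travel from the driver.
object ConnectionSingleton {
  private var connection: Connection = _

  // Creates the connection on the first call in this JVM, reuses it afterwards.
  def getConnection(details: ConnectionDetails): Connection = synchronized {
    if (connection == null || connection.isClosed) {
      val url = s"jdbc:postgresql://${details.host}:${details.port}/mydb" // placeholder
      connection = DriverManager.getConnection(url, details.user, details.password)
    }
    connection
  }
}

val details = ConnectionDetails("db-host", 5432, "user", "password")

rdd.foreachPartition { partition =>
  val conn = ConnectionSingleton.getConnection(details) // shared across tasks in this JVM
  partition.foreach { record =>
    // write the record using conn
  }
  // do not close conn here: other tasks on this executor share it
}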