I get TD's recommendation of sharing a connection among tasks. Now, is there a
good way to determine when to close connections?
Gino B.
On Jul 17, 2014, at 7:05 PM, Yan Fang yanfang...@gmail.com wrote:
Hi Sean,
Thank you. I see your point. What I was thinking is that, do computation in
Thats, a good question. My first reach is timeout. Timing out after 10s of
seconds should be sufficient. So there should be a timer in the singleton
that runs a check every second, on when the singleton was last used, and
closes the connections after a time out. Any attempts to use the connection
Actually, let me clarify further. There are number of possibilities.
1. The easier, less efficient way is to create a connection object every
time you do foreachPartition (as shown in the pseudocode earlier in the
thread). For each partition, you create a connection, use it to push a all
the
Hi guys,
need some help in this problem. In our use case, we need to continuously
insert values into the database. So our approach is to create the jdbc
object in the main method and then do the inserting operation in the
DStream foreachRDD operation. Is this approach reasonable?
Then the
Could you share some code (or pseudo-code)?
Sounds like you're instantiating the JDBC connection in the driver,
and using it inside a closure that would be run in a remote executor.
That means that the connection object would need to be serializable.
If that sounds like what you're doing, it
And if Marcelo's guess is correct, then the right way to do this would be
to lazily / dynamically create the jdbc connection server as a singleton
in the workers/executors and use that. Something like this.
dstream.foreachRDD(rdd = {
rdd.foreachPartition((iterator: Iterator[...]) = {
Hi Marcelo and TD,
Thank you for the help. If I use TD's approache, it works and there is no
exception. Only drawback is that it will create many connections to the DB,
which I was trying to avoid.
Here is a snapshot of my code. Mark as red for the important code. What I
was thinking is that, if
Hi Sean,
Thank you. I see your point. What I was thinking is that, do computation in
a distributed fashion and do the storing from a single place. But you are
right, having multiple DB connections actually is fine.
Thanks for answering my questions. That helps me understand the system.
Cheers,