Actually, let me clarify further. There are a number of possibilities.
1. The easier, less efficient way is to create a connection object every
time you do foreachPartition (as shown in the pseudocode earlier in the
thread). For each partition, you create a connection, use it to push all
the recor
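
To make option 1 concrete, here is a minimal sketch of the per-partition pattern described above: open one connection, push every record in the partition through it, then close it. `DemoConnection`, `PartitionWriter` and `writePartition` are hypothetical names used only for illustration; a real job would open an actual database connection instead.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a real database connection.
final class DemoConnection {
  private val buffer = ArrayBuffer.empty[String]
  private var isClosed = false

  // Records one value; fails if the connection was already closed.
  def send(record: String): Unit = {
    require(!isClosed, "connection already closed")
    buffer += record
  }

  def sent: List[String] = buffer.toList
  def closed: Boolean = isClosed
  def close(): Unit = { isClosed = true }
}

object PartitionWriter {
  // Handles one partition: one connection is opened, used for every
  // record in the iterator, and closed even if a send fails.
  def writePartition(records: Iterator[String], open: () => DemoConnection): Unit = {
    val conn = open()
    try {
      records.foreach(r => conn.send(r))
    } finally {
      conn.close()
    }
  }
}
```

Inside a Spark job this would presumably be called as `rdd.foreachPartition(it => PartitionWriter.writePartition(it, () => new DemoConnection))`, so the connection is created on the executor rather than shipped from the driver.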
That's a good question. My first reaction is a timeout. Timing out after tens of
seconds should be sufficient. So there should be a timer in the singleton
that runs a check every second on when the singleton was last used, and
closes the connections after a timeout. Any attempts to use the connection
a
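
The timeout idea above can be sketched roughly as follows. This is only one possible reading of it: the holder remembers when it was last used, a daemon timer checks once per second, and the resource is closed after the idle limit and re-created on the next use. `IdleClosing`, `create`, `close` and `idleMillis` are hypothetical names, not part of any Spark or JDBC API.

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}

// Holds one lazily created resource and closes it once it has been idle too long.
// `create` and `close` are hypothetical hooks for a real connection.
final class IdleClosing[C](create: () => C, close: C => Unit, idleMillis: Long) {
  private var current: Option[C] = None
  private var lastUsed: Long = 0L

  // Returns the live resource, re-creating it if it was closed earlier,
  // and records the time of this use.
  def get(now: Long = System.currentTimeMillis()): C = synchronized {
    val c = current.getOrElse(create())
    current = Some(c)
    lastUsed = now
    c
  }

  // Closes the resource if it has not been used for at least idleMillis.
  def closeIfIdle(now: Long = System.currentTimeMillis()): Unit = synchronized {
    if (current.isDefined && now - lastUsed >= idleMillis) {
      current.foreach(close)
      current = None
    }
  }

  // Starts a daemon timer that runs the idle check once per second.
  def startTimer(): Unit = {
    val factory = new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "idle-check")
        t.setDaemon(true)
        t
      }
    }
    val timer = Executors.newSingleThreadScheduledExecutor(factory)
    timer.scheduleAtFixedRate(
      new Runnable { def run(): Unit = closeIfIdle() }, 1, 1, TimeUnit.SECONDS)
    ()
  }
}
```

Note that this sketch does not protect a task that is still writing through the resource when the timer fires; a real implementation would need to refresh the timestamp during long writes or track in-flight users.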
I get TD's recommendation of sharing a connection among tasks. Now, is there a
good way to determine when to close connections?
Gino B.
> On Jul 17, 2014, at 7:05 PM, Yan Fang wrote:
>
> Hi Sean,
>
> Thank you. I see your point. What I was thinking is that, do computation in a
> distributed
Hi Sean,
Thank you. I see your point. What I was thinking was to do the computation in
a distributed fashion and do the storing from a single place. But you are
right, having multiple DB connections is actually fine.
Thanks for answering my questions. That helps me understand the system.
Cheers,
On Thu, Jul 17, 2014 at 10:39 PM, Yan Fang wrote:
> Thank you for the help. If I use TD's approach, it works and there is no
> exception. The only drawback is that it will create many connections to the DB,
> which I was trying to avoid.
Connection-like objects aren't data that can be serialized. Wh
Hi Marcelo and TD,
Thank you for the help. If I use TD's approach, it works and there is no
exception. The only drawback is that it will create many connections to the DB,
which I was trying to avoid.
Here is a snapshot of my code. The important code is marked in red. What I
was thinking is that, if
And if Marcelo's guess is correct, then the right way to do this would be
to lazily / dynamically create the JDBC connection server as a singleton
in the workers/executors and use that. Something like this:
dstream.foreachRDD(rdd => {
  rdd.foreachPartition((iterator: Iterator[...]) => {
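
As a rough illustration of the lazily created singleton suggested in this message (not a reconstruction of the truncated pseudocode), a Scala `object` with a `lazy val` gives one instance per JVM, built on first use. `SharedClient` and `ClientHolder` are hypothetical names standing in for a real connection and its holder.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for an expensive-to-create client such as a JDBC connection.
final class SharedClient(val id: Int) {
  def send(record: String): String = s"client-$id <- $record"
}

// A Scala `object` is initialized at most once per JVM, so each executor
// process would build its own client on first use instead of receiving
// a serialized copy from the driver.
object ClientHolder {
  // Counts how many clients were actually constructed in this JVM.
  val created = new AtomicInteger(0)

  // Built on first access only; later accesses return the same instance.
  lazy val client: SharedClient = new SharedClient(created.incrementAndGet())
}
```

Code running inside `foreachPartition` would then refer to `ClientHolder.client` directly, so the closure captures nothing that has to be serialized.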
Could you share some code (or pseudo-code)?
Sounds like you're instantiating the JDBC connection in the driver,
and using it inside a closure that would be run in a remote executor.
That means that the connection object would need to be serializable.
If that sounds like what you're doing, it won't
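
The serialization point above can be checked without Spark. The sketch below assumes plain Java serialization and a hypothetical `SocketLike` class that, like most connection objects, does not implement `Serializable`; `SerializationCheck.canSerialize` is likewise an illustrative helper, not a Spark API.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical connection-like class; it does not extend Serializable.
final class SocketLike

object SerializationCheck {
  // Tries to Java-serialize the value into an in-memory buffer and
  // reports whether that succeeded.
  def canSerialize(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }
}
```

A closure that captures such an object would fail in the same way when it has to be shipped to a remote executor, which is why the connection should be created on the executor side.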
Hi guys,
I need some help with this problem. In our use case, we need to continuously
insert values into the database. So our approach is to create the JDBC
object in the main method and then do the inserting operation in the
DStream foreachRDD operation. Is this approach reasonable?
Then the problem