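I think it creates a new connection on each executor JVM: a Scala object is
initialized lazily, once per JVM, so the first time Processor references
Resources inside an executor, Resources gets initialized there.
There's no need for the driver to connect to the db in this case.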
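Here is a minimal, Spark-free sketch of that mechanism, assuming only standard Scala semantics: an `object` body runs lazily on first reference, at most once per JVM. `FakeMetricsClient`, `FakeResources` and `ConnectionCounter` are hypothetical stand-ins for the `MetricsClient` and `Resources` in the question. The three calls to `process` stand in for tasks running inside a single executor JVM. It is written as a top-level script (for example a scala-cli `.sc` file).

```scala
// Hypothetical counter: records how many "clients" (connections) get built.
object ConnectionCounter {
  var created = 0
}

// Hypothetical stand-in for MetricsClient; constructing one bumps the counter.
final class FakeMetricsClient(val host: String, val port: Int) {
  ConnectionCounter.created += 1
}

// Stand-in for Resources: this body runs lazily on first access,
// at most once per JVM (per class loader), and the client is then reused.
object FakeResources {
  val client = new FakeMetricsClient("localhost", 30089)
}

// Stand-in for Processor.process: every call reuses the same client.
def process(a: String): String = {
  val client = FakeResources.client // first call triggers initialization
  s"$a -> ${client.host}:${client.port}"
}

val before = ConnectionCounter.created          // FakeResources not touched yet
val results = Seq("a", "b", "c").map(process)   // three "tasks" in one JVM
val after = ConnectionCounter.created           // initialized exactly once
println(s"before=$before after=$after results=$results")
```

On a cluster each executor is a separate JVM, so each one builds its own client the first time a task touches `Resources`. The client created by `Resources.forceInit` on the driver lives only in the driver JVM and is not shipped to the executors, as long as the closure refers to `Resources` statically instead of capturing the client in a local variable. In local mode the driver and executors share one JVM, so there is only a single instance.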

On Thu, Jun 29, 2017 at 5:52 PM, salvador <sot.b...@gmail.com> wrote:

> Hi all,
> I am writing a Spark job from which, at some point, I want to send some
> metrics to InfluxDB. Here is some sample code showing how I am doing it at
> the moment.
> I have a Resources object which contains all the details for the db
> connection:
> object Resources {
>   def forceInit: () => Unit = () => ()
>   val influxHost: String = Config.influxHost.getOrElse("localhost")
>   val influxUdpPort: Int = Config.influxUdpPort.getOrElse(30089)
>   val influxDB = new MetricsClient(influxHost, influxUdpPort, "spark")
> }
> This is what my driver code looks like:
> object ProcessStuff extends App {
>   val spark = SparkSession
>     .builder()
>     .config(sparkConfig)
>     .getOrCreate()
>   val df = spark.read.parquet(Config.input)
>   Resources.forceInit
>   val annotatedSentences = df.rdd
>     .map {
>       case Row(a: String, b: String) => Processor.process(a, b)
>     }
>     .cache()
> }
> I am sending all the metrics I want from the process() method, which uses
> the client I initialised in the driver code. Currently this works and I am
> able to send millions of data points. I was just wondering how it works
> internally. Does it share the db connection or create a new connection
> every time?
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Understanding-how-spark-share-db-connections-created-on-driver-tp28806.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
