Re: JDBC sessionInitStatement for writes?

2021-11-25 Thread trsell
Sorry, I somehow missed the "Scope" column in the docs, which explicitly states it's for reads only. I don't suppose anyone knows of some other method by which I can submit SET statements for write sessions? On Fri, Nov 26, 2021 at 12:51 PM wrote:
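One possible workaround, sketched below since sessionInitStatement is read-only: open the JDBC connections yourself with foreachPartition and issue the SET statement before inserting each partition. The URL, credentials, SET statement, and target table are placeholders, not anything stated in the thread.

```scala
import java.sql.DriverManager
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-write-with-set").getOrCreate()
val df = spark.table("staging_table")  // placeholder: whatever DataFrame you need to write

df.rdd.foreachPartition { rows =>
  // One connection per partition, fully under our control.
  val conn = DriverManager.getConnection(
    "jdbc:postgresql://host:5432/db", "user", "password")
  try {
    // Run the per-session SET before any writes on this connection.
    val init = conn.createStatement()
    init.execute("SET synchronous_commit TO off")  // placeholder session setting
    init.close()

    val ps = conn.prepareStatement("INSERT INTO target_table (id, value) VALUES (?, ?)")
    rows.foreach { row =>
      ps.setLong(1, row.getLong(0))
      ps.setString(2, row.getString(1))
      ps.addBatch()
    }
    ps.executeBatch()
    ps.close()
  } finally {
    conn.close()
  }
}
```

The trade-off is that you give up the built-in JDBC sink's batching options and have to manage transactions and retries yourself.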

JDBC sessionInitStatement for writes?

2021-11-25 Thread trsell
Hello, Regarding JDBC sinks, the docs state: https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html sessionInitStatement: After each database session is opened to the remote DB and before starting to read data, this option executes a custom SQL statement (or a PL/SQL block). Use this to implement session initialization code.
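For reference, a minimal read-side example of that documented option; the connection details and the SET statement are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-session-init").getOrCreate()

// sessionInitStatement runs once per opened database session, before any data is read.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db")   // placeholder connection details
  .option("dbtable", "public.some_table")
  .option("user", "user")
  .option("password", "password")
  .option("sessionInitStatement", "SET search_path TO reporting")
  .load()
```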

[Spark Core]: Does Spark support group scheduling techniques like Drizzle?

2021-11-25 Thread Bowen Yu
Hi all, Spark's 100ms+ stage-launching overhead limits its applicability in low-latency stream processing and deep learning. The Drizzle paper published in SOSP '17 seems to solve this problem well by submitting a group of stages together to amortize the stage-launching overhead. It is also used

Re: Choosing architecture for on-premise Spark & HDFS on Kubernetes cluster

2021-11-25 Thread JHI Star
Thanks, I'll have a closer look at GKE and compare it with what some other sites running setups similar to ours have used (OpenStack). Well, no, I don't envisage any public cloud integration. There is no plan to use Hive, just PySpark using HDFS! On Wed, Nov 24, 2021 at 10:31 AM Mich Talebzadeh wrote:

Re: Spark salesforce connector

2021-11-25 Thread daniel queiroz
Hi, https://github.com/springml/spark-salesforce.git I've customized this code that I've found, maybe it can help you. import java.io._ import com.typesafe.config.{ Config, ConfigFactory } import scalaj.http.{Http, HttpOptions, HttpResponse} import org.apache.spark.sql.{Dataset, SparkSession}