Re: Best way to emit custom metrics to Prometheus in spark structured streaming

2020-11-02 Thread Jungtaek Lim
You can try out "Dataset.observe" added in Spark 3, which enables arbitrary metrics to be logged and exposed to streaming query listeners.
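The suggestion above can be sketched roughly as follows. This is a minimal, hedged sketch assuming Spark 3.x; the metric names, the `withMetrics` helper, and the assumption that the `value` column is numeric are all illustrative, not part of the original reply:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apacheache.spark.sql.streaming.StreamingQueryListener._

// Assumes an aggregated streaming DataFrame with (id, key, value) columns,
// as described in the original question.
def withMetrics(spark: SparkSession, df: DataFrame): DataFrame = {
  // Dataset.observe (Spark 3.0+) evaluates the named aggregate expressions
  // once per micro-batch and reports them to streaming query listeners.
  val observed = df.observe(
    "custom_metrics",
    count(lit(1)).as("totalRecords"),
    sum(when(col("key") === "droppedRecords", col("value")).otherwise(lit(0)))
      .as("droppedRecords"))

  spark.streams.addListener(new StreamingQueryListener {
    override def onQueryStarted(event: QueryStartedEvent): Unit = ()
    override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
    override def onQueryProgress(event: QueryProgressEvent): Unit = {
      // observedMetrics is keyed by the name passed to observe().
      Option(event.progress.observedMetrics.get("custom_metrics")).foreach { row =>
        val dropped = row.getAs[Long]("droppedRecords")
        // Push `dropped` to Prometheus here, e.g. via a Pushgateway client.
      }
    }
  })

  observed
}
```

The listener runs on the driver, so a Prometheus Pushgateway (rather than a scrape endpoint on each executor) is a natural fit for this pattern.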

Re: Cannot perform operation after producer has been closed

2020-11-02 Thread Jungtaek Lim
Which Spark version are you using? There's a known issue with the Kafka producer pool in Spark 2.x that was fixed in Spark 3.0, so you may want to check whether your case matches it: https://issues.apache.org/jira/browse/SPARK-21869

Passing authentication token to the user session in Spark Thrift Server

2020-11-02 Thread mhd wrk
What's the recommended way of associating an authentication token (the response to a successful login) with the user session from a custom authenticator (PasswdAuthenticationProvider)? Thanks, Mohammad

Best way to emit custom metrics to Prometheus in spark structured streaming

2020-11-02 Thread meetwes
Hi, I am looking for the right approach to emit custom metrics from a Spark Structured Streaming job. *Actual Scenario:* I have an aggregated dataframe, let's say with (id, key, value) columns. One of the KPIs could be 'droppedRecords', and the corresponding value column has the number of dropped records.

Cannot perform operation after producer has been closed

2020-11-02 Thread Eric Beabes
I know this is related to Kafka, but it happens during a Spark Structured Streaming job, which is why I am asking on this mailing list. How would you debug this or work around it in Spark Structured Streaming? Any tips would be appreciated. Thanks. java.lang.IllegalStateException: Cannot perform

Re: Integration testing Framework Spark SQL Scala

2020-11-02 Thread Lars Albertsson
Hi, sorry for the very slow reply; I am far behind on my mailing list subscriptions. You'll find a few slides covering the topic in this presentation: https://www.slideshare.net/lallea/test-strategies-for-data-processing-pipelines-67244458 Video here: https://vimeo.com/192429554 Regards, Lars

Executors across Stages

2020-11-02 Thread AVS Bharadwaj
Hi, I am running Spark in cluster mode (on K8s). When running the Word Count example, the number of executors assigned differs across stages. Our number of assigned executors is 20. While stage 1 gets all 20 of them allotted, stage 2 gets fewer than 10 executors. Is there any particular reason fo

[UNSUB me please]

2020-11-02 Thread northbright
Aurora Borealis