Re: Metrics Problem

2020-07-10 Thread Bryan Jeffrey
On Thu, Jul 2, 2020 at 2:33 PM Bryan Jeffrey wrote: > Srinivas, I finally broke a little bit of time free to look at this issue. I reduced the scope of my ambitions and simply cloned the ConsoleSink and ConsoleReporter classes. After doing so I can see the original version works, but

RE: [Spark 3.0 Kubernetes] Does Spark 3.0 support production deployment

2020-07-10 Thread Varshney, Vaibhav
Hi Prashant, It sounds encouraging. During scale-down of the cluster, a few of the Spark jobs are probably impacted due to re-computation of shuffle data. This is not of supreme importance for us for now. Is there any reference deployment architecture available, which is HA, scalable and

Application Upgrade - structured streaming

2020-07-10 Thread KhajaAsmath Mohammed
Hi, When doing an application upgrade for Spark Structured Streaming, do we need to delete the checkpoint, or does it start consuming offsets from the point where we left off? For the Kafka source we need to use the option "startingOffsets" with a JSON string like """ {"topicA":{"0":23,"1":-1},"topicB":{"0":-2}}
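
The offsets string quoted in this message can be produced programmatically rather than typed by hand. The following is only a minimal sketch, not anything posted in the thread: the helper name starting_offsets_json and the constants LATEST and EARLIEST are hypothetical, and the topic names and offsets simply mirror the values quoted above. In Spark's Kafka source, -1 stands for the latest offset and -2 for the earliest.

```python
import json

# Special offset values understood by Spark's Kafka source.
# The constant names are illustrative, not part of any Spark API.
LATEST = -1
EARLIEST = -2


def starting_offsets_json(offsets):
    """Serialize {topic: {partition: offset}} into a compact JSON string.

    Partition numbers become string keys, as JSON object keys must be strings.
    """
    normalized = {
        topic: {str(partition): int(offset) for partition, offset in partitions.items()}
        for topic, partitions in offsets.items()
    }
    return json.dumps(normalized, separators=(",", ":"))


# Hypothetical input mirroring the values quoted in the message.
offsets = {"topicA": {0: 23, 1: LATEST}, "topicB": {0: EARLIEST}}
starting_offsets = starting_offsets_json(offsets)
print(starting_offsets)
```

The resulting string would typically be passed as .option("startingOffsets", starting_offsets) on a Kafka readStream. According to the Spark Kafka integration guide, this option only takes effect when a new query is started; a query that resumes from an existing checkpoint picks up where it left off.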

Re: Strange WholeStageCodegen UI values

2020-07-10 Thread Michal Sankot
Hey guys, Thanks for the insights. Bobby, I see that it guesses those values from the run time of the whole task. But as the whole task took 6.6 minutes, how can it come up with 7.27 hours? Sean, yes, there is a data skew: one task takes tens of minutes while others take tens of seconds. What gave it

Building Spark 3.0.0 for Hive 1.2

2020-07-10 Thread Patrick McCarthy
I'm trying to build Spark 3.0.0 for my YARN cluster, with Hadoop 2.7.3 and Hive 1.2.1. I downloaded the source and created a runnable dist with ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive-1.2 -Phadoop-2.7 -Pyarn We're running Spark 2.4.0 in production so I