On Thu, Jul 2, 2020 at 2:33 PM Bryan Jeffrey
wrote:
> Srinivas,
>
> I finally freed up a little bit of time to look at this issue. I
> reduced the scope of my ambitions and simply cloned the ConsoleSink and
> ConsoleReporter classes. After doing so I can see the original version
> works, but
Hi Prashant,
That sounds encouraging. During scale-down of the cluster, a few of the
Spark jobs are probably impacted due to re-computation of shuffle data. This is
not of supreme importance for us for now.
Is there any reference deployment architecture available that is HA,
scalable and
Hi,
When doing an application upgrade for Spark Structured Streaming, do we need to
delete the checkpoint, or does it resume consuming offsets from the point we left off?
For the Kafka source we need to use the option "startingOffsets" with a JSON string like
""" {"topicA":{"0":23,"1":-1},"topicB":{"0":-2}} """
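A note on the semantics, per the Spark Kafka integration guide: in the per-partition JSON, -2 means "earliest" and -1 means "latest", and "startingOffsets" only applies on a fresh start; once a checkpoint exists, Spark resumes from the checkpointed offsets and ignores the option. A minimal sketch of building that JSON string (the SparkSession usage in the comment is hypothetical and assumes a broker at host:9092):

```python
import json

# Per-partition starting offsets for the Kafka source.
# Special values (from the Spark Kafka integration guide):
#   -2 = start from the earliest available offset
#   -1 = start from the latest offset
starting_offsets = {
    "topicA": {"0": 23, "1": -1},  # partition 0 at offset 23, partition 1 at latest
    "topicB": {"0": -2},           # partition 0 at earliest
}
offsets_json = json.dumps(starting_offsets)
print(offsets_json)

# Hypothetical usage with an existing SparkSession `spark` (not run here):
# df = (spark.readStream
#       .format("kafka")
#       .option("kafka.bootstrap.servers", "host:9092")
#       .option("subscribe", "topicA,topicB")
#       .option("startingOffsets", offsets_json)
#       .load())
```

Building the string with json.dumps avoids hand-escaping quotes inside a Scala triple-quoted literal or a Python string.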
Hey guys,
Thanks for the insights.
Bobby, I see that it guesses those values from the run time of the whole task.
But as the whole task took 6.6 minutes, how can it come up with 7.27 hours?
Sean, yes, there is data skew. One task takes tens of minutes while others
take tens of seconds. What gave it
I'm trying to build Spark 3.0.0 for my YARN cluster, with Hadoop 2.7.3 and
Hive 1.2.1. I downloaded the source and created a runnable dist with
./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr
-Phive-1.2 -Phadoop-2.7 -Pyarn
We're running Spark 2.4.0 in production so I