Can anyone else answer the questions below on performance tuning for Structured Streaming? @Jacek?

On Sun, May 3, 2020 at 12:07 AM Srinivas V <srini....@gmail.com> wrote:

> Hi Alex, I read the book. It is a good one, but I don't see the things
> I most want to understand.
> You are right about the partitions and tasks.
> 1. How do I use coalesce with Spark Structured Streaming?
>
> I also want to ask a few more questions:
> 2. How do I restrict the number of executors for Structured Streaming?
> Is --num-executors the minimum?
> To cap the maximum, can I use spark.dynamicAllocation.maxExecutors?
>
> 3. Do other streaming properties also apply to Structured Streaming,
> such as spark.streaming.dynamicAllocation.enabled?
> If not, which ones does it take into consideration?
>
> 4. Does Structured Streaming 2.4.5 allow dynamic allocation of
> executors/cores? For a Kafka consumer, when the cluster has to scale
> down, does it reconfigure the mapping of executor cores to Kafka
> partitions?
>
> 5. Why is the Spark Structured Streaming web UI (SQL tab) not as
> informative as the Streaming tab of Spark Streaming?
>
> It would be great if these questions were answered; otherwise the only
> option left is to go through the Spark code and figure it out.
>
> On Sat, Apr 18, 2020 at 1:09 PM Alex Ott <alex...@gmail.com> wrote:
>
>> Just to clarify - I didn't write this explicitly in my answer. When
>> you're working with Kafka, every partition in Kafka is mapped to a
>> Spark partition. And in Spark, every partition is mapped to a task.
>> But you can use `coalesce` to decrease the number of Spark
>> partitions, so you'll have fewer tasks...
>>
>> Srinivas V at "Sat, 18 Apr 2020 10:32:33 +0530" wrote:
>> SV> Thank you Alex. I will check it out and let you know if I have
>> SV> any questions.
>>
>> SV> On Fri, Apr 17, 2020 at 11:36 PM Alex Ott <alex...@gmail.com> wrote:
>>
>> SV> http://shop.oreilly.com/product/0636920047568.do has quite good
>> SV> information on it. For Kafka, you need to start with the
>> SV> approximation that processing each partition is a separate task
>> SV> that needs to be executed, so you need to plan the number of
>> SV> cores correspondingly.
>> SV>
>> SV> Srinivas V at "Thu, 16 Apr 2020 22:49:15 +0530" wrote:
>> SV> SV> Hello,
>> SV> SV> Can someone point me to a good video or document which
>> SV> SV> talks about performance tuning for a Structured Streaming app?
>> SV> SV> I am looking especially at listening to Kafka topics, say 5
>> SV> SV> topics each with 100 partitions.
>> SV> SV> Trying to figure out the best cluster size and the number of
>> SV> SV> executors and cores required.
>>
>>
>> --
>> With best wishes, Alex Ott
>> http://alexott.net/
>> Twitter: alexott_en (English), alexott (Russian)
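
To make the rule of thumb from this thread concrete (one Kafka partition maps to one Spark partition, one Spark partition maps to one task, and `coalesce` lowers the partition count), here is a rough sizing sketch in plain Python. It is arithmetic only, not Spark code. The function names `tasks_per_batch` and `waves` are hypothetical, the executor and core counts are illustrative rather than a recommendation, and it assumes each task occupies one core (Spark's default `spark.task.cpus=1`).

```python
import math
from typing import Optional


def tasks_per_batch(topics: int, partitions_per_topic: int,
                    coalesce_to: Optional[int] = None) -> int:
    """Approximate number of tasks per micro-batch (hypothetical helper)."""
    # One Kafka partition -> one Spark partition -> one task.
    kafka_partitions = topics * partitions_per_topic
    if coalesce_to is None:
        return kafka_partitions
    # coalesce can only reduce the partition count, never raise it.
    return min(kafka_partitions, coalesce_to)


def waves(tasks: int, executors: int, cores_per_executor: int) -> int:
    """Rounds of tasks needed when every core runs one task at a time."""
    total_cores = executors * cores_per_executor
    return math.ceil(tasks / total_cores)


# The case from the original question: 5 topics with 100 partitions each.
all_tasks = tasks_per_batch(5, 100)                    # 500 tasks
fewer_tasks = tasks_per_batch(5, 100, coalesce_to=100)  # 100 tasks

# Illustrative cluster: 25 executors with 4 cores each = 100 cores.
print(all_tasks, waves(all_tasks, 25, 4))      # 500 tasks, 5 waves
print(fewer_tasks, waves(fewer_tasks, 25, 4))  # 100 tasks, 1 wave
```

Fewer waves does not by itself mean a faster batch: after `coalesce`, each task reads several Kafka partitions, so the total work is unchanged and only the scheduling granularity differs.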