That's incorrect, it's spark.default.parallelism, but as the name suggests, that is merely a default. You control partitioning directly with .repartition()
On Tue, Mar 14, 2023 at 11:37 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote: > Check this link > > > https://sparkbyexamples.com/spark/difference-between-spark-sql-shuffle-partitions-and-spark-default-parallelism/ > > You can set it > > spark.conf.set("sparkDefaultParallelism", value]) > > > Have a look at Streaming statistics in Spark GUI, especially *Processing > Tim*e, defined by Spark GUI as Time taken to process all jobs of a batch. > *The **Scheduling Dela*y and *the **Total Dela*y are additional > indicators of health. > > > then decide how to set the value. > > > HTH > > > view my Linkedin profile > <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> > > > https://en.everybodywiki.com/Mich_Talebzadeh > > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > > On Tue, 14 Mar 2023 at 16:04, Emmanouil Kritharakis < > kritharakismano...@gmail.com> wrote: > >> Yes I need to check the performance of my streaming job in terms of >> latency and throughput. Is there any working example of how to increase the >> parallelism with spark structured streaming using Dataset data structures? >> Thanks in advance. >> >> Kind regards, >> >> ------------------------------------------------------------------ >> >> Emmanouil (Manos) Kritharakis >> >> Ph.D. candidate in the Department of Computer Science >> <https://sites.bu.edu/casp/people/ekritharakis/> >> >> Boston University >> >> >> On Tue, Mar 14, 2023 at 12:01 PM Mich Talebzadeh < >> mich.talebza...@gmail.com> wrote: >> >>> What benefits are you going with increasing parallelism? Better througput >>> >>> >>> >>> view my Linkedin profile >>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >>> >>> >>> https://en.everybodywiki.com/Mich_Talebzadeh >>> >>> >>> >>> *Disclaimer:* Use it at your own risk. Any and all responsibility for >>> any loss, damage or destruction of data or any other property which may >>> arise from relying on this email's technical content is explicitly >>> disclaimed. The author will in no case be liable for any monetary damages >>> arising from such loss, damage or destruction. >>> >>> >>> >>> >>> On Tue, 14 Mar 2023 at 15:58, Emmanouil Kritharakis < >>> kritharakismano...@gmail.com> wrote: >>> >>>> Hello, >>>> >>>> I hope this email finds you well! >>>> >>>> I have a simple dataflow in which I read from a kafka topic, perform a >>>> map transformation and then I write the result to another topic. Based on >>>> your documentation here >>>> <https://spark.apache.org/docs/3.3.2/structured-streaming-kafka-integration.html#content>, >>>> I need to work with Dataset data structures. Even though my solution works, >>>> I need to increase the parallelism. The spark documentation includes a lot >>>> of parameters that I can change based on specific data structures like >>>> *spark.default.parallelism* or *spark.sql.shuffle.partitions*. The >>>> former is the default number of partitions in RDDs returned by >>>> transformations like join, reduceByKey while the later is not recommended >>>> for structured streaming as it is described in documentation: "Note: For >>>> structured streaming, this configuration cannot be changed between query >>>> restarts from the same checkpoint location". >>>> >>>> So my question is how can I increase the parallelism for a simple >>>> dataflow based on datasets with a map transformation only? >>>> >>>> I am looking forward to hearing from you as soon as possible. Thanks in >>>> advance! >>>> >>>> Kind regards, >>>> >>>> ------------------------------------------------------------------ >>>> >>>> Emmanouil (Manos) Kritharakis >>>> >>>> Ph.D. candidate in the Department of Computer Science >>>> <https://sites.bu.edu/casp/people/ekritharakis/> >>>> >>>> Boston University >>>> >>>