Re: Change parallelism number in Spark Streaming

2019-06-27 Thread Jungtaek Lim
Thu, Jun 27, 2019 at 4:48 AM Rong, Jialei wrote: Thank you for your quick reply! Is there any plan to improve this? I asked this question due to some investigation

Re: Change parallelism number in Spark Streaming

2019-06-27 Thread Jacek Laskowski
I asked this question due to some investigation comparing those state-of-the-art streaming systems, among which Flink and Dataflow allow changing the parallelism, and from my knowledge of Spark Streaming, it seems it is also able to do that: if s

Re: Change parallelism number in Spark Streaming

2019-06-26 Thread Jungtaek Lim
knowledge of Spark Streaming, it seems it is also able to do that: if some “key interval” concept is used, then state can somehow be decoupled from the partition number by consistent hashing. Regards

Re: Change parallelism number in Spark Streaming

2019-06-26 Thread Jacek Laskowski
hashing. Regards Jialei From: Jacek Laskowski Date: Wednesday, June 26, 2019 at 11:00 AM To: "Rong, Jialei" Cc: "user @spark" Su

Re: Change parallelism number in Spark Streaming

2019-06-26 Thread Jacek Laskowski
, 2019 at 11:00 AM To: "Rong, Jialei" Cc: "user @spark" Subject: Re: Change parallelism number in Spark Streaming Hi, It's not allowed to change the number of partitions after your streaming query is star

Re: Change parallelism number in Spark Streaming

2019-06-26 Thread Rong, Jialei
Fantastic, thanks! From: Jungtaek Lim Date: Wednesday, June 26, 2019 at 2:59 PM To: "Rong, Jialei" Cc: Jacek Laskowski, "user @spark" Subject: Re: Change parallelism number in Spark Streaming Hi, you could consider the state operator's partition numbers as "max
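Jungtaek's suggestion to treat the state operator's partition count as a fixed upper bound (the snippet is cut off, so this is an interpretation) resembles Flink's key groups: keys are hashed into a fixed number of groups, and whole groups are handed out in contiguous ranges to however many tasks are running. The plain-Python sketch below illustrates that assumption only; `MAX_PARALLELISM`, `key_group`, `task_for_group` and `assign_groups` are hypothetical names, not Spark APIs.

```python
import hashlib

MAX_PARALLELISM = 200  # hypothetical fixed number of key groups (state partitions)

def key_group(key: str) -> int:
    # A key's group depends only on MAX_PARALLELISM, never on the current task count.
    h = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")
    return h % MAX_PARALLELISM

def task_for_group(group: int, parallelism: int) -> int:
    # Contiguous range assignment: the first MAX_PARALLELISM/parallelism groups go to task 0, etc.
    return group * parallelism // MAX_PARALLELISM

def assign_groups(parallelism: int) -> dict:
    if not 1 <= parallelism <= MAX_PARALLELISM:
        raise ValueError("parallelism must be between 1 and MAX_PARALLELISM")
    assignment = {t: [] for t in range(parallelism)}
    for g in range(MAX_PARALLELISM):
        assignment[task_for_group(g, parallelism)].append(g)
    return assignment

# Rescaling from 4 to 5 tasks moves whole groups; no key has to be rehashed.
before = assign_groups(4)
after = assign_groups(5)
print(before[0][:3], before[0][-1], after[0][-1])  # → [0, 1, 2] 49 39
```

The trade-off is that the group count caps parallelism and has to be chosen up front, which is consistent with reading the state partition count as a maximum.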

Re: Change parallelism number in Spark Streaming

2019-06-26 Thread Jungtaek Lim
9 at 11:00 AM To: "Rong, Jialei" Cc: "user @spark" Subject: Re: Change parallelism number in Spark Streaming Hi, It's not allowed to change the number of partitions after your streaming query is started.

Re: Change parallelism number in Spark Streaming

2019-06-26 Thread Rong, Jialei
to do that: if some “key interval” concept is used, then state can somehow be decoupled from the partition number by consistent hashing. Regards Jialei From: Jacek Laskowski Date: Wednesday, June 26, 2019 at 11:00 AM To: "Rong, Jialei" Cc: "user @spark" Subject: Re: Change p
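The consistent-hashing idea Jialei mentions can be sketched as a hash ring: each task owns several points on the ring, and a key belongs to the first point at or after its hash. When a task is added, only keys that fall just before the new task's points change owner, instead of most keys as with plain hash-mod-N. This is a generic plain-Python illustration, not how Spark places state; `HashRing`, `stable_hash` and the task names are hypothetical.

```python
import bisect
import hashlib

def stable_hash(s: str) -> int:
    # Deterministic 64-bit hash so the ring layout is reproducible.
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes  # points per node; more points give smoother balance
        self._hashes = []     # sorted ring positions
        self._nodes = []      # owner of each position, parallel to _hashes
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            h = stable_hash(f"{node}#{i}")
            idx = bisect.bisect_left(self._hashes, h)
            self._hashes.insert(idx, h)
            self._nodes.insert(idx, node)

    def node_for(self, key):
        if not self._hashes:
            raise ValueError("ring is empty")
        # First ring position at or after the key's hash, wrapping around to 0.
        idx = bisect.bisect_left(self._hashes, stable_hash(key))
        return self._nodes[idx % len(self._hashes)]

keys = [f"user-{i}" for i in range(1000)]
ring = HashRing([f"task-{t}" for t in range(4)])
before = {k: ring.node_for(k) for k in keys}
ring.add("task-4")  # scale out from 4 to 5 tasks
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
```

Every key that moved now belongs to the new task, which is the property that would let state be migrated incrementally rather than reshuffled wholesale.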

Re: Change parallelism number in Spark Streaming

2019-06-26 Thread Jacek Laskowski
Hi, It's not allowed to change the number of partitions after your streaming query is started. The reason is the number of state stores, which is exactly the number of partitions (perhaps multiplied by the number of stateful operators). I think you'll even get a warning or an exception whe
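The constraint Jacek describes follows from how keys are routed to state stores. Below is a minimal plain-Python sketch (no Spark required), assuming hash-mod-N routing: Spark's hash partitioning uses a Murmur3-based hash modulo the partition count, while this sketch uses an MD5 prefix only so the output is reproducible. `stable_hash`, `partition_for`, `build_state` and `lookup` are hypothetical names. To my understanding, the relevant setting in Structured Streaming is `spark.sql.shuffle.partitions`, whose value is recorded in the checkpoint when the query first starts and reused on restart.

```python
import hashlib

def stable_hash(key: str) -> int:
    # Deterministic 64-bit hash (MD5 prefix); Spark itself uses Murmur3, not MD5.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

def partition_for(key: str, num_partitions: int) -> int:
    # Hash partitioning: a key always lands in the same partition
    # only as long as num_partitions stays the same.
    return stable_hash(key) % num_partitions

def build_state(keys, num_partitions):
    # One "state store" (a dict) per partition, each holding only its own keys.
    stores = {p: {} for p in range(num_partitions)}
    for k in keys:
        stores[partition_for(k, num_partitions)][k] = 1
    return stores

def lookup(stores, key, num_partitions):
    # Route the key with the *current* partition count and read that store.
    return stores.get(partition_for(key, num_partitions), {}).get(key)

keys = [f"user-{i}" for i in range(1000)]
stores = build_state(keys, 4)  # state written by a query running with 4 partitions
hits_same = sum(lookup(stores, k, 4) is not None for k in keys)
hits_changed = sum(lookup(stores, k, 6) is not None for k in keys)  # "restarted" with 6
print(hits_same, hits_changed)
```

With 4 partitions every key is found; after switching to 6, only keys whose hash happens to map to the same index (roughly a third in this setup) are found, and the rest of the state is effectively lost to the query.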