Hi,

you could consider state operator's partition numbers as "max parallelism",
as parallelism can be reduced via applying coalesce. It would be
effectively working similar as key groups.

If you're also considering offline query, there's a tool to manipulate
state which enables reading and writing state in structured streaming,
achieving rescaling and schema evolution.

https://github.com/HeartSaVioR/spark-state-tools
(DISCLAIMER: I'm an author of this tool.)

Thanks,
Jungtaek Lim (HeartSaVioR)

On Thu, Jun 27, 2019 at 4:48 AM Rong, Jialei <jia...@amazon.com.invalid>
wrote:

> Thank you for your quick reply!
>
> Is there any plan to improve this?
>
> I asked this question due to some investigation on comparing those state
> of art streaming systems, among which Flink and DataFlow allow changing
> parallelism number, and by my knowledge of Spark Streaming, it seems it is
> also able to do that: if some “key interval” concept is used, then state
> can somehow decoupled from partition number by consistent hashing.
>
>
>
>
>
> Regards
>
> Jialei
>
>
>
> *From: *Jacek Laskowski <ja...@japila.pl>
> *Date: *Wednesday, June 26, 2019 at 11:00 AM
> *To: *"Rong, Jialei" <jia...@amazon.com.invalid>
> *Cc: *"user @spark" <user@spark.apache.org>
> *Subject: *Re: Change parallelism number in Spark Streaming
>
>
>
> Hi,
>
>
>
> It's not allowed to change the numer of partitions after your streaming
> query is started.
>
>
>
> The reason is exactly the number of state stores which is exactly the
> number of partitions (perhaps multiplied by the number of stateful
> operators).
>
>
>
> I think you'll even get a warning or an exception when you change it after
> restarting the query.
>
>
>
> The number of partitions is stored in a checkpoint location.
>
>
>
> Jacek
>
>
>
> On Wed, 26 Jun 2019, 19:30 Rong, Jialei, <jia...@amazon.com.invalid>
> wrote:
>
> Hi Dear Spark Expert
>
>
>
> I’m curious about a question regarding Spark Streaming/Structured
> Streaming: whether it allows to change parallelism number(the default one
> or the one specified in particular operator) in a stream having stateful
> transform/operator? Whether this will cause my checkpointed state get
> messed up?
>
>
>
>
>
> Regards
>
> Jialei
>
>
>
>

-- 
Name : Jungtaek Lim
Blog : http://medium.com/@heartsavior
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior

Reply via email to