退订
发自我的iPhone ------------------ Original ------------------ From: Tony Wei <tony19920...@gmail.com> Date: Tue,Mar 14,2023 1:11 PM To: David Anderson <dander...@apache.org> Cc: Hangxiang Yu <master...@gmail.com>, user <user@flink.apache.org> Subject: Re: is there any detrimental side-effect if i set the max parallelismas 32768 Hi Hangxiang, David, Thank you for your replies. Your responses are very helpful. Best regards, Tony Wei David Anderson <dander...@apache.org> 於 2023年3月14日 週二 下午12:12寫道: I believe there is some noticeable overhead if you are using the heap-based state backend, but with RocksDB I think the difference is negligible. David On Tue, Mar 7, 2023 at 11:10 PM Hangxiang Yu <master...@gmail.com> wrote: > > Hi, Tony. > "be detrimental to performance" means that some extra space overhead of the field of the key-group may influence performance. > As we know, Flink will write the key group as the prefix of the key to speed up rescaling. > So the format will be like: key group | key len | key | ...... > You could check the relationship between max parallelism and bytes of key group as below: > ------------------------------------------ > max parallelism bytes of key group > 128 1 > 32768 2 > ------------------------------------------ > So I think the cost will be very small if the real key length >> 2 bytes. > > On Wed, Mar 8, 2023 at 1:06 PM Tony Wei <tony19920...@gmail.com> wrote: >> >> Hi experts, >> >>> Setting the maximum parallelism to a very large value can be detrimental to performance because some state backends have to keep internal data structures that scale with the number of key-groups (which are the internal implementation mechanism for rescalable state). >>> >>> Changing the maximum parallelism explicitly when recovery from original job will lead to state incompatibility. >> >> >> I read the section above from Flink official document [1], and I'm wondering what the detail is regarding to the side-effect. >> >> Suppose that I have a Flink SQL job with large state, large parallelism and using RocksDB as my state backend. >> I would like to set the max parallelism as 32768, so that I don't bother if the max parallelism can be divided by the parallelism whenever I want to scale my job, >> because the number of key groups will not differ too much between each subtask. >> >> I'm wondering if this is a good practice, because based on the official document it is not recommended actually. >> If possible, I would like to know the detail about this side-effect. Which state backend will have this issue? and Why? >> Please give me an advice. Thanks in advance. >> >> Best regards, >> Tony Wei >> >> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution/parallel/#setting-the-maximum-parallelism > > > > -- > Best, > Hangxiang.