Unsubscribe


Sent from my iPhone


------------------ Original ------------------
From: Tony Wei <tony19920...@gmail.com>
Date: Tue, Mar 14, 2023 1:11 PM
To: David Anderson <dander...@apache.org>
Cc: Hangxiang Yu <master...@gmail.com>, user <user@flink.apache.org>
Subject: Re: is there any detrimental side-effect if i set the max parallelism as 32768



Hi Hangxiang, David,

Thank you for your replies. Your responses are very helpful.


Best regards,
Tony Wei


David Anderson <dander...@apache.org> wrote on Tue, Mar 14, 2023 at 12:12 PM:

I believe there is some noticeable overhead if you are using the
 heap-based state backend, but with RocksDB I think the difference is
 negligible.
 
 David
 
 On Tue, Mar 7, 2023 at 11:10 PM Hangxiang Yu <master...@gmail.com> wrote:
 >
 > Hi, Tony.
 > "be detrimental to performance" means that the extra space overhead of the key-group field may affect performance.
 > As we know, Flink writes the key group as the prefix of the key to speed up rescaling.
 > So the format will be like: key group | key len | key | ......
 > You can check the relationship between max parallelism and the bytes of the key group below:
 > ------------------------------------------
 > max parallelism    bytes of key group
 >       128                  1
 >      32768                 2
 > ------------------------------------------
 > So I think the cost will be very small if the real key length >> 2 bytes.
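
To make the table above concrete, here is a minimal Java sketch of the rule it implies. It is not Flink's own code: the class and method names are hypothetical, and the 128 threshold and the 32768 upper bound are assumptions taken from the table and from the maximum value discussed in this thread.

```java
// Illustrative sketch only; the names here are hypothetical, not Flink classes.
final class KeyGroupPrefixSketch {

    // Assumption from the table in this thread: up to 128 key groups fit in
    // one prefix byte, anything larger (up to 32768) needs two bytes.
    static int keyGroupPrefixBytes(int maxParallelism) {
        if (maxParallelism < 1 || maxParallelism > 32768) {
            throw new IllegalArgumentException(
                    "maxParallelism out of range: " + maxParallelism);
        }
        return maxParallelism > 128 ? 2 : 1;
    }

    // Fraction by which a stored key grows when the prefix goes from one byte
    // to two, counting only the prefix and the key bytes themselves.
    static double relativeGrowth(int keyBytes) {
        if (keyBytes < 1) {
            throw new IllegalArgumentException("keyBytes must be positive");
        }
        return 1.0 / (1 + keyBytes);
    }

    public static void main(String[] args) {
        for (int maxParallelism : new int[] {128, 32768}) {
            System.out.println(maxParallelism + " -> "
                    + keyGroupPrefixBytes(maxParallelism) + " byte(s)");
        }
        // Growth for a 31-byte key when the prefix needs a second byte.
        System.out.println(relativeGrowth(31));
    }
}
```

For a 31-byte key the second prefix byte grows each stored key by roughly 3 percent, which matches the point that the cost is small when the real key is much longer than 2 bytes.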
 >
 > On Wed, Mar 8, 2023 at 1:06 PM Tony Wei <tony19920...@gmail.com> wrote:
 >>
 >> Hi experts,
 >>
 >>> Setting the maximum parallelism to a very large value can be detrimental to performance because some state backends have to keep internal data structures that scale with the number of key-groups (which are the internal implementation mechanism for rescalable state).
 >>>
 >>> Changing the maximum parallelism explicitly when recovery from original job will lead to state incompatibility.
 >>
 >>
 >> I read the section above in the official Flink documentation [1], and I'm wondering what the details of this side effect are.
 >>
 >> Suppose that I have a Flink SQL job with large state and large parallelism, using RocksDB as my state backend.
 >> I would like to set the max parallelism to 32768, so that I don't have to worry about whether the max parallelism is divisible by the parallelism whenever I want to scale my job,
 >> because the number of key groups will not differ much between subtasks.
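
The balance argument above can be checked with a small Java sketch. It assumes key groups are split into contiguous ranges across subtasks, so each subtask gets either the floor or the ceiling of maxParallelism / parallelism. The class name, method names, and the range formula are illustrative assumptions, not a copy of Flink's implementation.

```java
// Illustrative sketch only; assumes contiguous key-group ranges per subtask.
final class KeyGroupSpreadSketch {

    // Number of key groups owned by subtask `index` (0-based) when
    // `maxParallelism` key groups are split across `parallelism` subtasks.
    static int keyGroupsForSubtask(int maxParallelism, int parallelism, int index) {
        long start = ((long) index * maxParallelism + parallelism - 1) / parallelism;
        long end = ((long) (index + 1) * maxParallelism - 1) / parallelism;
        return (int) (end - start + 1);
    }

    // Returns {min, max} key groups per subtask for the given settings.
    static int[] minMax(int maxParallelism, int parallelism) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < parallelism; i++) {
            int count = keyGroupsForSubtask(maxParallelism, parallelism, i);
            min = Math.min(min, count);
            max = Math.max(max, count);
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        // Compare a small and a large max parallelism at parallelism 100.
        for (int maxParallelism : new int[] {128, 32768}) {
            int[] mm = minMax(maxParallelism, 100);
            System.out.println("maxParallelism=" + maxParallelism
                    + " min=" + mm[0] + " max=" + mm[1]);
        }
    }
}
```

Under these assumptions, a parallelism of 100 with a max parallelism of 128 leaves some subtasks with twice as many key groups as others (2 versus 1), while with 32768 the gap is 328 versus 327.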
 >>
 >> I'm wondering if this is a good practice, because according to the official documentation it is actually not recommended.
 >> If possible, I would like to know the details of this side effect. Which state backends have this issue, and why?
 >> Please give me some advice. Thanks in advance.
 >>
 >> Best regards,
 >> Tony Wei
 >>
 >> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution/parallel/#setting-the-maximum-parallelism
 >
 >
 >
 > --
 > Best,
 > Hangxiang.
