Hi Jiangang,

Thanks for your suggestion.

>>> The config can affect the memory usage. Will the related memory configs
be changed?

I think we will not change the default network memory settings. My best
expectation is that the default values can work for most cases (though they
may not be the best) and for other cases, users may need to tune the memory
settings.
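
Just as an illustrative sketch (these are the standard network memory
options and the numbers below are placeholders, not recommendations): if
network memory does become a bottleneck, it can be adjusted in
flink-conf.yaml, for example:

    taskmanager.memory.network.fraction: 0.2
    taskmanager.memory.network.min: 128mb
    taskmanager.memory.network.max: 2gb

But as said above, my expectation is that the defaults should work for most
cases.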

>>> Can you share the TPC-DS results for different configs? Although we
change the default values, it is helpful for different users to know how to
tune them. In this case, the experience can help a lot.

I did not keep all of the previous TPC-DS results, but from those results, I
can tell that on HDD, always using sort-shuffle is a good choice. For small
jobs, using sort-shuffle may not bring much performance gain, probably
because all shuffle data can be cached in memory (page cache); this is the
case if the cluster has enough resources. However, if the whole cluster is
under heavy load or you are running large-scale jobs, the performance of
those small jobs can also be affected. For large-scale jobs, the
configurations I suggest tuning are
taskmanager.network.sort-shuffle.min-buffers and
taskmanager.memory.framework.off-heap.batch-shuffle.size; you can increase
these values for large-scale batch jobs, for example as sketched below.
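
As a rough sketch only (the concrete numbers are illustrative and depend on
the job scale, the parallelism and the available TaskManager memory), a
large-scale batch job could set something like the following in
flink-conf.yaml:

    taskmanager.network.sort-shuffle.min-buffers: 2048
    taskmanager.memory.framework.off-heap.batch-shuffle.size: 128m

Note that the batch-shuffle read memory is taken from the framework off-heap
memory, so when increasing it you may also need to increase
taskmanager.memory.framework.off-heap.size accordingly, and a larger
min-buffers value may require more network memory.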

BTW, I am still running TPC-DS tests these days and can share those results
soon.

Best,
Yingjie

刘建刚 <liujiangangp...@gmail.com> wrote on Fri, Dec 10, 2021 at 18:30:

> Glad to see the suggestion. In our test, we found that changing these
> configs does not improve the performance much for small jobs, just as in
> your test. I have some suggestions:
>
>    - The config can affect the memory usage. Will the related memory
>    configs be changed?
>    - Can you share the TPC-DS results for different configs? Although we
>    change the default values, it is helpful for different users to know how
>    to tune them. In this case, the experience can help a lot.
>
> Best,
> Liu Jiangang
>
> Yun Gao <yungao...@aliyun.com.invalid> wrote on Fri, Dec 10, 2021 at 17:20:
>
>> Hi Yingjie,
>>
>> Many thanks for drafting the FLIP and initiating the discussion!
>>
>> May I double-check one point about
>> taskmanager.network.sort-shuffle.min-parallelism: since other frameworks
>> like Spark already use sort-based shuffle for all cases, is our current
>> situation still different from theirs?
>>
>> Best,
>> Yun
>>
>>
>>
>>
>> ------------------------------------------------------------------
>> From:Yingjie Cao <kevin.ying...@gmail.com>
>> Send Time:2021 Dec. 10 (Fri.) 16:17
>> To:dev <d...@flink.apache.org>; user <u...@flink.apache.org>; user-zh <
>> user-zh@flink.apache.org>
>> Subject:Re: [DISCUSS] Change some default config values of blocking
>> shuffle
>>
>> Hi dev & users:
>>
>> I have created a FLIP [1] for it; feedback is highly appreciated.
>>
>> Best,
>> Yingjie
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-199%3A+Change+some+default+config+values+of+blocking+shuffle+for+better+usability
>> Yingjie Cao <kevin.ying...@gmail.com> wrote on Fri, Dec 3, 2021 at 17:02:
>>
>> Hi dev & users,
>>
>> We propose to change some default values of blocking shuffle to improve
>> the out-of-the-box experience for users (this does not affect streaming).
>> The default values we want to change are as follows (a combined
>> flink-conf.yaml sketch is given after the list):
>>
>> 1. Data compression
>> (taskmanager.network.blocking-shuffle.compression.enabled): Currently, the
>> default value is 'false'. Usually, data compression can reduce both disk
>> and network IO, which is good for performance. At the same time, it can
>> save storage space. We propose to change the default value to 'true'.
>>
>> 2. Default shuffle implementation
>> (taskmanager.network.sort-shuffle.min-parallelism): Currently, the default
>> value is 'Integer.MAX_VALUE', which means that by default, Flink jobs will
>> always use hash-shuffle. In fact, for high parallelism, sort-shuffle is
>> better for both stability and performance. So we propose to reduce the
>> default value to a proper smaller one, for example, 128. (We tested 128,
>> 256, 512 and 1024 with TPC-DS and 128 was the best one.)
>>
>> 3. Read buffer of sort-shuffle
>> (taskmanager.memory.framework.off-heap.batch-shuffle.size): Currently, the
>> default value is '32M'. Previously, when choosing the default value, both
>> '32M' and '64M' were OK in our tests and we chose the smaller one to be
>> cautious. However, it was recently reported on the mailing list that the
>> default value is not enough, which caused a buffer request timeout issue.
>> We have already created a ticket to improve the behavior. At the same time,
>> we propose to increase this default value to '64M', which can also help.
>>
>> 4. Sort buffer size of sort-shuffle
>> (taskmanager.network.sort-shuffle.min-buffers): Currently, the default
>> value is '64', which means 64 network buffers (32k per buffer by default).
>> This default value is quite modest and performance can be affected. We
>> propose to increase this value to a larger one, for example, 512 (with the
>> default TM and network buffer configuration, this can still serve more than
>> 10 result partitions concurrently).
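>>
>> As a quick summary (an illustrative flink-conf.yaml sketch only, using the
>> values proposed above), the new defaults would correspond to:
>>
>>     taskmanager.network.blocking-shuffle.compression.enabled: true
>>     taskmanager.network.sort-shuffle.min-parallelism: 128
>>     taskmanager.memory.framework.off-heap.batch-shuffle.size: 64m
>>     taskmanager.network.sort-shuffle.min-buffers: 512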
>>
>> We have already tested these default values together with the TPC-DS
>> benchmark in a cluster, and both the performance and stability improved a
>> lot. These changes can help to improve the out-of-the-box experience of
>> blocking shuffle. What do you think about these changes? Is there any
>> concern? If there are no objections, I will make these changes soon.
>>
>> Best,
>> Yingjie
>>
>>
