Hi community,
I would like to extend `RelDistribution` to support more distribution types in
order to address data skew under the normal hash distribution.
When we use hash distribution to bring all records with the same hash key
to the same place, job performance will be poor if there are hot
keys.
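To make the problem concrete, here is a minimal standalone sketch (plain Java, not Calcite code; the class name, key names, and row counts are invented for illustration). It assigns rows to partitions by key hash and counts the rows per partition when one key is hot:

```java
import java.util.ArrayList;
import java.util.List;

class HashSkewDemo {
    /** Assigns each key to a partition by its hash and counts rows per partition. */
    static int[] partitionCounts(List<String> keys, int partitions) {
        int[] counts = new int[partitions];
        for (String key : keys) {
            // floorMod keeps the index non-negative even for negative hash codes.
            counts[Math.floorMod(key.hashCode(), partitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        // Illustrative data: 900 rows share one hot key, 100 rows have distinct keys.
        for (int i = 0; i < 900; i++) {
            keys.add("hot");
        }
        for (int i = 0; i < 100; i++) {
            keys.add("key-" + i);
        }
        int[] counts = partitionCounts(keys, 4);
        for (int p = 0; p < counts.length; p++) {
            System.out.println("partition " + p + ": " + counts[p] + " rows");
        }
    }
}
```

Whichever partition the hot key hashes to receives at least 900 of the 1000 rows, so that one task dominates the job's running time no matter how many partitions are added.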
I think you should contribute a change that adds a new value to the enum. I
know that enums are not easily extensible, but in cases like this, that can be
a feature rather than a bug.
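As a rough illustration of that point, here is a standalone sketch, not Calcite code. The enum mirrors the values of `RelDistribution.Type` as I understand them; `SKEW_AWARE_HASH_DISTRIBUTED` is a hypothetical name invented for this example, and `isKeyBased` is an illustrative helper, not an existing method.

```java
class DistributionTypeSketch {
    /** Standalone mirror of the distribution types, plus one hypothetical addition. */
    enum Type {
        SINGLETON,
        HASH_DISTRIBUTED,
        RANGE_DISTRIBUTED,
        RANDOM_DISTRIBUTED,
        ROUND_ROBIN_DISTRIBUTED,
        BROADCAST_DISTRIBUTED,
        ANY,
        SKEW_AWARE_HASH_DISTRIBUTED // hypothetical new value, for illustration only
    }

    /** Whether the distribution is defined by a set of key columns. */
    static boolean isKeyBased(Type type) {
        switch (type) {
            case HASH_DISTRIBUTED:
            case RANGE_DISTRIBUTED:
                return true;
            case SKEW_AWARE_HASH_DISTRIBUTED:
                // Assumption: the hypothetical type is still driven by hash keys.
                return true;
            case SINGLETON:
            case RANDOM_DISTRIBUTED:
            case ROUND_ROBIN_DISTRIBUTED:
            case BROADCAST_DISTRIBUTED:
            case ANY:
                return false;
            default:
                // A value added later without a case here fails loudly.
                throw new AssertionError("unhandled distribution type: " + type);
        }
    }

    public static void main(String[] args) {
        for (Type type : Type.values()) {
            System.out.println(type + " key-based: " + isKeyBased(type));
        }
    }
}
```

Because the set of values is closed, each switch like this one is a place where someone has to decide what the new type means, rather than the new type slipping through unnoticed.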
There are not very many distribution types, and new distribution types are
rarely invented. Requiring people to
Hi Julian,
Makes sense.
So adding a new RelDistribution type requires a strong reason.
I have created a JIRA [1] to track this requirement.
[1] https://issues.apache.org/jira/browse/CALCITE-4957
Best,
Jing Zhang
Julian Hyde wrote on Wed, Dec 22, 2021 at 08:04:
> I think you should contribute a change th
Hi, Jing. I still don't see the point of adding new distribution types.
I think what you need is a new metadata indicating whether there are skewed
values. By looking it up through RelMetadataQuery, you could raise the cost of
two-pass agg or shuffled join (or even make it infinite) and make other
impl
Hi Jinpeng,
Thanks for your response.
I guess we are talking about different solutions to data skew; please correct me if
I am wrong.
What you describe is for cases where hot keys are known in advance and stored in
metadata, so we could query whether there is data skew, and what the skewed values
are, from RelMetadataQuery. The opti
Sorry for the typo.
so it would affect existing SQL behavior. => it would not affect existing SQL
behavior.
Jing Zhang wrote on Wed, Dec 22, 2021 at 12:36:
> Hi Jinpeng,
> Thanks for response.
> I guess we say different solution to solve data skew, please correct me if
> I am wrong.
>
> What you say is for cases
Hi, Jing.
I'm not worried about existing queries being affected; I am just
offering some suggestions. If you don't handle traits derivations
correctly, your new queries may not perform as well as those old queries
using classic distribution types.
Metadata queries can not only return concre
Hi Jinpeng,
Thanks a lot for your response.
> If you don't handle traits derivations
> correctly, your new queries may not perform as well as those old queries
> using classic distribution types.
Exactly, thanks for reminding me. I will handle trait derivations
carefully for the newly added distributio