[DISCUSS] Extends RelDistribution to support more distribution types

2021-12-21 Thread Jing Zhang
Hi community, I hope to extend `RelDistribution` to support more distribution types in order to solve data skew under normal hash distribution. When we use hash distribution to bring all records with the same hash key to the same place, job performance can be poor if there are hot keys.
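A small illustration of the problem described above, not taken from the thread: with plain hash partitioning, every record of a hot key lands on the same partition, so that one partition does most of the work. The class name `HashSkewDemo`, the partition count of 4, and the key values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustration only: plain hash partitioning sends every record of a hot key to one partition. */
final class HashSkewDemo {

    /** Picks a partition from the key's hash, as a hash exchange would. */
    static int partitionOf(String key, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Counts how many records land on each partition (sorted by partition index). */
    static Map<Integer, Integer> recordsPerPartition(List<String> keys, int numPartitions) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(partitionOf(key, numPartitions), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) keys.add("hot"); // one hot key
        for (int i = 0; i < 100; i++) keys.add("k" + i); // many cold keys
        // The partition that owns "hot" receives at least 1000 of the 1100 records.
        System.out.println(recordsPerPartition(keys, 4));
    }
}
```

Adding more partitions does not help here: the hot key still maps to exactly one of them.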

Re: [DISCUSS] Extends RelDistribution to support more distribution types

2021-12-21 Thread Julian Hyde
I think you should contribute a change that adds a new value to the enum. I know that enums are not easily extensible, but in cases like this, that can be a feature rather than a bug. There are not very many distribution types, and new distribution types are rarely invented. Requiring people to…
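One way to see why a closed enum can be "a feature rather than a bug": in Java 14+, a `switch` expression over an enum with no `default` branch only compiles if every constant is handled, so adding a value forces each such switch to decide how to treat it. In this sketch the first seven constants of the hypothetical `DistType` are intended to mirror the names in Calcite's `RelDistribution.Type`; `SKEWED_HASH` and the helper `usesKeys` are made up for illustration.

```java
/**
 * Hypothetical stand-in for a closed set of distribution types. The first seven
 * constants mirror Calcite's RelDistribution.Type; SKEWED_HASH is a made-up new value.
 */
enum DistType {
    SINGLETON, HASH_DISTRIBUTED, RANGE_DISTRIBUTED, RANDOM_DISTRIBUTED,
    ROUND_ROBIN_DISTRIBUTED, BROADCAST_DISTRIBUTED, ANY,
    SKEWED_HASH // hypothetical new distribution type
}

final class DistTypes {
    /**
     * Whether the distribution is defined in terms of key columns.
     * This switch expression has no default branch, so the compiler rejects it
     * unless every constant is covered: adding SKEWED_HASH forces an explicit decision.
     */
    static boolean usesKeys(DistType type) {
        return switch (type) {
            case HASH_DISTRIBUTED, RANGE_DISTRIBUTED -> true;
            case SINGLETON, RANDOM_DISTRIBUTED, ROUND_ROBIN_DISTRIBUTED,
                 BROADCAST_DISTRIBUTED, ANY -> false;
            case SKEWED_HASH -> true; // hypothetical: still partitions by key columns
        };
    }
}
```

With an open, pluggable type hierarchy instead, code like this would need a fallback branch, and a new type could silently take it.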

Re: [DISCUSS] Extends RelDistribution to support more distribution types

2021-12-21 Thread Jing Zhang
Hi Julian, Makes sense. So adding a new RelDistribution type requires a strong reason. I have created a JIRA issue [1] to track this requirement. [1] https://issues.apache.org/jira/browse/CALCITE-4957 Best, Jing Zhang. On Wed, Dec 22, 2021 at 08:04, Julian Hyde wrote: > I think you should contribute a change th…

Re: [DISCUSS] Extends RelDistribution to support more distribution types

2021-12-21 Thread Jinpeng Wu
Hi, Jing. I still don't see the point of adding new distribution types. I think what you need is a new kind of metadata indicating whether there are skewed values. By looking it up through RelMetadataQuery, you could boost the cost (or even set it to infinity) of a two-pass aggregation or a shuffled join and make other impl…
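A plain-Java sketch of the cost-boosting idea in the message above, assuming a skew flag is available from metadata. `SkewMetadata`, `Alternative`, `SkewAwareCosting`, the plan names, and the cost numbers are hypothetical stand-ins; they are not the `RelMetadataQuery` API or Calcite's cost model, which a real implementation would go through instead.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a skew metadata lookup; not a Calcite API. */
@FunctionalInterface
interface SkewMetadata {
    /** Returns true if the given key column is known to contain hot values. */
    boolean isSkewed(String column);
}

/** A physical alternative with a base cost and whether it shuffles rows by the key. */
record Alternative(String name, double baseCost, boolean shufflesByKey) {}

final class SkewAwareCosting {
    /** Base cost, or +infinity if the alternative shuffles by a key known to be skewed. */
    static double cost(Alternative alt, String keyColumn, SkewMetadata md) {
        if (alt.shufflesByKey() && md.isSkewed(keyColumn)) {
            return Double.POSITIVE_INFINITY; // make the skew-sensitive plan lose
        }
        return alt.baseCost();
    }

    /** Picks the alternative with the lowest (possibly boosted) cost. */
    static Alternative cheapest(List<Alternative> alts, String keyColumn, SkewMetadata md) {
        return alts.stream()
            .min(Comparator.comparingDouble(a -> cost(a, keyColumn, md)))
            .orElseThrow();
    }
}
```

For example, with a "shuffled hash join" at base cost 10 and a "broadcast hash join" at base cost 50, the shuffled plan wins when the key is not skewed, and the broadcast plan wins once metadata reports skew.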

Re: [DISCUSS] Extends RelDistribution to support more distribution types

2021-12-21 Thread Jing Zhang
Hi Jinpeng, Thanks for your response. I think we are describing different solutions to data skew; please correct me if I am wrong. What you describe is for cases where hot keys are known in advance and stored in metadata, so we could query RelMetadataQuery for whether there is data skew and what the skewed values are. The opti…
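For the case described above, where the hot values are known in advance (for example, read from metadata), a common generic mitigation is to spread only those values across partitions and hash everything else as usual. This is a sketch of that generic technique, not the design proposed in the thread; `KnownHotKeyPartitioner` and its round-robin policy are hypothetical.

```java
import java.util.Set;

/**
 * Hypothetical partitioner for hot keys known in advance: hot values are spread
 * round-robin over all partitions, other values are hash-partitioned as usual.
 * Not thread-safe; one instance per producer is assumed.
 */
final class KnownHotKeyPartitioner {
    private final Set<String> hotKeys;
    private final int numPartitions;
    private long nextHotSlot = 0; // round-robin counter for hot records

    KnownHotKeyPartitioner(Set<String> hotKeys, int numPartitions) {
        this.hotKeys = Set.copyOf(hotKeys);
        this.numPartitions = numPartitions;
    }

    int partitionOf(String key) {
        if (hotKeys.contains(key)) {
            // Rows with a hot key are no longer co-located, so a downstream
            // aggregate needs an extra merge step for these keys.
            return (int) (nextHotSlot++ % numPartitions);
        }
        // Cold keys keep ordinary hash partitioning.
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

The trade-off is that the output no longer guarantees that equal keys sit together, which is exactly the property downstream operators must not assume.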

Re: [DISCUSS] Extends RelDistribution to support more distribution types

2021-12-21 Thread Jing Zhang
Sorry for the typo: "so it would affect existing SQL behavior" should read "it would not affect existing SQL behavior". On Wed, Dec 22, 2021 at 12:36, Jing Zhang wrote: > Hi Jinpeng, > Thanks for response. > I guess we say different solution to solve data skew, please correct me if > I am wrong. > > What you say is for cases…

Re: [DISCUSS] Extends RelDistribution to support more distribution types

2021-12-21 Thread Jinpeng Wu
Hi, Jing. I'm not worried about existing queries being affected; I am just offering some suggestions. If you don't handle trait derivation correctly, your new queries may not perform as well as old queries that use the classic distribution types. Metadata queries can not only return concre…
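To make the trait-derivation concern above concrete, here is a simplified sketch loosely modeled on the idea behind Calcite's `RelTrait.satisfies`. The `DistKind` and `Dist` types and all rules are hypothetical. A skew-spreading type must not claim to satisfy plain hash on the same keys, or operators would wrongly assume co-location; if it satisfies too little, the planner adds extra exchanges, which is one way new queries end up slower than classic ones.

```java
import java.util.List;

/** Hypothetical, simplified distribution kinds; SKEWED_HASH is an illustrative new type. */
enum DistKind { SINGLETON, HASH, SKEWED_HASH, ANY }

/** Simplified distribution trait; not Calcite's RelDistribution implementation. */
record Dist(DistKind kind, List<Integer> keys) {

    Dist {
        keys = List.copyOf(keys); // defensive, immutable copy of the key column indexes
    }

    /** Whether input with this distribution meets {@code required} without an extra exchange. */
    boolean satisfies(Dist required) {
        boolean sameKeys = keys.equals(required.keys());
        return switch (required.kind()) {
            // Any distribution is acceptable.
            case ANY -> true;
            // Equal keys must be co-located: only a single node or a true hash on the same keys.
            // SKEWED_HASH spreads hot keys over several nodes, so it must not claim this.
            case HASH -> kind == DistKind.SINGLETON || (kind == DistKind.HASH && sameKeys);
            // A consumer that tolerates spread hot keys also accepts the stronger HASH guarantee.
            case SKEWED_HASH -> kind == DistKind.SINGLETON
                || ((kind == DistKind.HASH || kind == DistKind.SKEWED_HASH) && sameKeys);
            case SINGLETON -> kind == DistKind.SINGLETON;
        };
    }
}
```

The asymmetry is the key design choice: HASH satisfies a SKEWED_HASH requirement on the same keys, but not the other way round.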

Re: [DISCUSS] Extends RelDistribution to support more distribution types

2021-12-22 Thread Jing Zhang
Hi Jinpeng, Thanks a lot for your response. > If you don't handle traits derivations correctly, your new queries may not perform as well as those old queries using classic distribution types. Exactly, thanks for the reminder. I will handle trait derivation carefully for the newly added distributio…