[ https://issues.apache.org/jira/browse/FLINK-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fabian Hueske reassigned FLINK-5722:
------------------------------------

    Assignee: Fabian Hueske

> Implement DISTINCT as dedicated operator
> ----------------------------------------
>
>                 Key: FLINK-5722
>                 URL: https://issues.apache.org/jira/browse/FLINK-5722
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>    Affects Versions: 1.2.0, 1.3.0
>            Reporter: Fabian Hueske
>            Assignee: Fabian Hueske
>
> DISTINCT is currently implemented for batch Table API / SQL as an aggregate
> which groups on all fields. Grouped aggregates are implemented as GroupReduce
> with sort-based combiner.
> This operator can be more efficiently implemented by using ReduceFunction and
> hinting a HashCombine strategy. The same ReduceFunction can be used for all
> DISTINCT operations and can be assigned with appropriate forward field
> annotations.
> We would need a custom conversion rule which translates distinct aggregations
> (grouping on all fields and returning all fields) into a custom
> DataSetRelNode.
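
The idea in the description can be illustrated with a small, self-contained sketch. It is plain Python rather than Flink code, and it is not the implementation proposed in the issue: the names `distinct_reduce`, `hash_combine`, and `distinct` are hypothetical and only mimic the roles of a ReduceFunction, a hash-based combiner, and the final reduce. The sketch assumes rows are hashable tuples and that the grouping key is the entire row, which is what DISTINCT amounts to.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, ...]
ReduceFn = Callable[[Row, Row], Row]


def distinct_reduce(left: Row, right: Row) -> Row:
    """Stand-in for the shared ReduceFunction (hypothetical name).

    Both inputs belong to the same group, and the group key is the whole
    row, so they are equal. Returning either one unchanged is enough, which
    is why every field can be treated as forwarded.
    """
    return left


def hash_combine(rows: Iterable[Row], reduce_fn: ReduceFn) -> List[Row]:
    """Combine rows with a hash table keyed on the full row.

    No sorting is needed: each incoming row is either inserted or reduced
    into the entry that is already present for its key.
    """
    table: Dict[Row, Row] = {}
    for row in rows:
        if row in table:
            table[row] = reduce_fn(table[row], row)
        else:
            table[row] = row
    return list(table.values())


def distinct(partitions: Iterable[Iterable[Row]],
             reduce_fn: ReduceFn = distinct_reduce) -> List[Row]:
    """Pre-combine each partition, then reduce the merged partial results."""
    combined = [hash_combine(part, reduce_fn) for part in partitions]
    merged = [row for part in combined for row in part]
    return hash_combine(merged, reduce_fn)


if __name__ == "__main__":
    parts = [
        [(1, "a"), (2, "b"), (1, "a")],
        [(2, "b"), (3, "c")],
    ]
    print(distinct(parts))
```

Because the reduce function never inspects or modifies its inputs, the same function works for any row type, which matches the observation that one ReduceFunction could serve all DISTINCT operations.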