[ https://issues.apache.org/jira/browse/FLINK-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aljoscha Krettek closed FLINK-1915. ----------------------------------- Resolution: Won't Fix We're not actively developing the DataSet API anymore. > Faulty plan selection by optimizer > ---------------------------------- > > Key: FLINK-1915 > URL: https://issues.apache.org/jira/browse/FLINK-1915 > Project: Flink > Issue Type: Bug > Components: API / DataSet > Reporter: Till Rohrmann > Priority: Minor > > The optimizer selects for certain jobs a sub-optimal execution plan. > For example, the {{WebLogAnalysis}} example job contains a coGroup input > which consists of a {{Filter}} and a subsequent {{Projection}}. The optimizer > inserts a hash partitioning between the filter and the mapper (projection) > and a sorting after the projection. It would be more efficient if the hash > partitioning would have been done after the projection, because the data is > smaller at this stage. > I could observe a similar behaviour for a larger job, where the hash > partitioning was executed before a filter operation which was then used as > input for a join operator. I suspect that the optimizer considers the two > plans (hash partitioning before the filter and after the filter) as > equivalent in the absence of proper size estimates. However, executing the > hash partitioning after the filter should always be more efficient. -- This message was sent by Atlassian Jira (v8.3.4#803005)