[ 
https://issues.apache.org/jira/browse/FLINK-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed FLINK-1915.
-----------------------------------
    Resolution: Won't Fix

We're not actively developing the DataSet API anymore.

> Faulty plan selection by optimizer
> ----------------------------------
>
>                 Key: FLINK-1915
>                 URL: https://issues.apache.org/jira/browse/FLINK-1915
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataSet
>            Reporter: Till Rohrmann
>            Priority: Minor
>
> For certain jobs, the optimizer selects a sub-optimal execution plan.
> For example, the {{WebLogAnalysis}} example job contains a coGroup input
> that consists of a {{Filter}} followed by a {{Projection}}. The optimizer
> inserts a hash partitioning between the filter and the mapper (projection)
> and a sort after the projection. It would be more efficient to do the hash
> partitioning after the projection, because the data is smaller at that
> stage.
> I observed similar behaviour in a larger job, where the hash
> partitioning was executed before a filter operation whose output was then
> used as input to a join operator. I suspect that the optimizer considers the
> two plans (hash partitioning before the filter and after the filter)
> equivalent in the absence of proper size estimates. However, executing the
> hash partitioning after the filter should always be more efficient.
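
The argument in the description can be illustrated with a toy cost model. This is only a sketch and not Flink's optimizer: the function names, the parameters (`selectivity`, `projected_fraction`) and the numbers are hypothetical, and the model counts nothing but the bytes sent through the hash partitioning step.

```python
# Toy cost model (NOT Flink's optimizer): compare the bytes shuffled when
# hash partitioning runs before vs. after a size-reducing operator.
# All names and numbers here are assumptions made for illustration.


def shuffled_bytes(records, bytes_per_record):
    """Bytes sent over the network by one hash partitioning step."""
    return records * bytes_per_record


def plan_costs(records, bytes_per_record, selectivity, projected_fraction):
    """Return (partition_first, partition_last) shuffle volumes in bytes.

    selectivity: fraction of records that survive the filter (0..1).
    projected_fraction: fraction of each record's bytes that the
        projection keeps (0..1).
    """
    # Plan A: partition the raw input, then filter and project locally.
    partition_first = shuffled_bytes(records, bytes_per_record)
    # Plan B: filter and project locally, then partition the smaller result.
    partition_last = shuffled_bytes(records * selectivity,
                                    bytes_per_record * projected_fraction)
    return partition_first, partition_last


# Hypothetical input sizes, chosen only to make the difference visible.
first, last = plan_costs(records=1_000_000, bytes_per_record=200,
                         selectivity=0.1, projected_fraction=0.25)
print("partition before filter/projection:", first, "bytes")
print("partition after filter/projection: ", last, "bytes")
```

As long as both fractions are at most 1, the second plan never shuffles more data than the first. Without size estimates, an optimizer that treats both fractions as unknown has no basis in this model for preferring either plan, which matches the suspicion stated above.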



