Yes, that was my fault. I'm used to auto reply-all on my desktop machine, but my phone just did a simple reply. Sorry for the confusion,
Fabian
2016-06-29 19:24 GMT+02:00 Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr>:

> Thank you, Aljoscha!
> I received a similar update from Fabian; only now I see the user list was
> not in CC.
>
> Fabian: *The optimizer hasn't been touched (except for bugfixes and new
> operators) for quite some time. These limitations are still present and I
> don't expect them to be removed anytime soon. IMO, it is more likely that
> certain optimizations like join reordering will be done for Table API / SQL
> queries by the Calcite optimizer and pushed through the Flink DataSet
> optimizer.*
>
> I agree, for join reordering optimisations it makes sense to rely on
> Calcite.
> My goal is to understand how the current documentation relates to the
> status of the Flink framework.
>
> I did an experimental study comparing Flink and Spark on many workloads at
> very large scale (I'll share the results soon), and I would like to develop
> a few ideas on top of Flink (from the results, Flink is the winner in most
> of the use cases and it is our choice for the platform on which to develop
> and grow).
>
> My interest is in understanding more about Flink today. I am familiar with
> most of the papers written, and I am following the documentation as well.
> I am looking at the DataSet API, the runtime, and the current architecture.
>
> Best,
> Ovidiu
>
> On 29 Jun 2016, at 17:27, Aljoscha Krettek <aljos...@apache.org> wrote:
>
> Hi,
> I think this document is still up to date, since not much was done in these
> parts of the code for the 1.0 release and after that.
>
> Maybe Timo can give some insights into what optimizations are done in the
> Table API / SQL that will be released in an updated version in 1.1.
>
> Cheers,
> Aljoscha
>
> +Timo, explicitly adding Timo
>
> On Tue, 28 Jun 2016 at 21:41 Ovidiu-Cristian MARCU <
> ovidiu-cristian.ma...@inria.fr> wrote:
>
>> Hi,
>>
>> The optimizer internals described in this document [1] are probably not
>> up to date.
>> Can you please confirm whether this is still valid:
>>
>> *"The following optimizations are not performed:*
>>
>> - *Join reordering (or operator reordering in general): Joins /
>> filters / reducers are not reordered in Flink. This is a high-opportunity
>> optimization, but with high risk in the absence of good estimates about
>> the data characteristics. Flink is not doing these optimizations at this
>> point.*
>> - *Index vs. table scan selection: In Flink, all data sources are
>> always scanned. The data source (the input format) may apply clever
>> mechanisms to avoid scanning all the data, e.g. pre-selection and
>> projection. Examples are the RCFile / ORCFile / Parquet input formats."*
>>
>> Any update to this page would be very helpful.
>>
>> Thank you.
>>
>> Best,
>> Ovidiu
>> [1] https://cwiki.apache.org/confluence/display/FLINK/Optimizer+Internals
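[Editor's note: the join-reordering limitation discussed above means a DataSet program's joins execute in the order the author wrote them, so intermediate result sizes depend entirely on that ordering. The following Flink-independent Python sketch illustrates why this matters; all relation names, key distributions, and sizes are hypothetical, chosen only to make the effect visible.]

```python
# Illustrative only: why join order matters when the engine does not reorder.
# Relations are lists of (key, payload) tuples; sizes are hypothetical.

def hash_join(left, right):
    """Simple in-memory hash join on the first tuple field."""
    index = {}
    for k, v in right:
        index.setdefault(k, []).append(v)
    return [(k, (lv, rv)) for k, lv in left for rv in index.get(k, [])]

# Hypothetical relations: two large tables and one highly selective one.
orders = [(i % 100, f"order-{i}") for i in range(10_000)]
lineitems = [(i % 100, f"item-{i}") for i in range(10_000)]
filtered_customers = [(7, "customer-7")]  # result of a selective filter

# Order A: join the two big tables first -> huge intermediate result.
intermediate_a = hash_join(orders, lineitems)

# Order B: join the selective relation first -> tiny intermediate result.
intermediate_b = hash_join(filtered_customers, orders)

print(len(intermediate_a))  # 1000000 rows before the selective join kicks in
print(len(intermediate_b))  # 100 rows
```

A cost-based optimizer with cardinality estimates would pick order B automatically; without estimates (Flink's situation at the time, per Fabian), reordering risks making a plan drastically worse, which is why it was left to the author and, later, to Calcite for Table API / SQL.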