Thanks Xiao, a more up-to-date publication in a conference like VLDB will certainly turn the tide for many of us trying to defend Spark's Optimizer.

On Wed, Jan 15, 2020 at 9:39 AM Xiao Li <gatorsm...@gmail.com> wrote:

> In the upcoming Spark 3.0, we introduced a new framework for Adaptive
> Query Execution in Catalyst. This can adjust the plans based on runtime
> statistics. Based on my understanding, this is missing in Calcite.
>
> Catalyst is also very easy to enhance. We also use the dynamic programming
> approach in our cost-based join reordering. If needed, we can also improve
> the existing CBO in the future and make it more general. The Spark SQL
> paper was published 5 years ago. A lot of great contributions were made
> in the past 5 years.
>
> Cheers,
>
> Xiao
>
> On Wed, Jan 15, 2020 at 9:23 AM Debajyoti Roy <newroy...@gmail.com> wrote:
>
>> Thanks all, and Matei.
>>
>> TL;DR of the conclusion for my particular case:
>> Qualitatively, while Catalyst[1] tries to mitigate the learning curve and
>> maintenance burden, it lacks the dynamic programming approach used by
>> Calcite[2] and risks falling into local minima.
>> Quantitatively, there is no reproducible benchmark that fairly compares
>> optimizer frameworks apples to apples (excluding execution).
>>
>> References:
>> [1] - https://amplab.cs.berkeley.edu/wp-content/uploads/2015/03/SparkSQLSigmod2015.pdf
>> [2] - https://arxiv.org/pdf/1802.10233.pdf
>>
>> On Mon, Jan 13, 2020 at 5:37 PM Matei Zaharia <matei.zaha...@gmail.com>
>> wrote:
>>
>>> I'm pretty sure that Catalyst was built before Calcite, or at least in
>>> parallel. Calcite 1.0 was only released in 2015. From a technical
>>> standpoint, building Catalyst in Scala also made it more concise and easier
>>> to extend than an optimizer written in Java (you can find various
>>> presentations about how Catalyst works).
>>>
>>> Matei
>>>
>>> > On Jan 13, 2020, at 8:41 AM, Michael Mior <mm...@apache.org> wrote:
>>> >
>>> > It's fairly common for adapters (Calcite's abstraction of a data
>>> > source) to push down predicates. However, the API certainly looks a
>>> > lot different than Catalyst's.
>>> > --
>>> > Michael Mior
>>> > mm...@apache.org
>>> >
>>> > On Mon, Jan 13, 2020 at 9:45 AM, Jason Nerothin
>>> > <jasonnerot...@gmail.com> wrote:
>>> >>
>>> >> The implementation they chose supports push down predicates, Datasets
>>> >> and other features that are not available in Calcite:
>>> >>
>>> >> https://databricks.com/glossary/catalyst-optimizer
>>> >>
>>> >> On Mon, Jan 13, 2020 at 8:24 AM newroyker <newroy...@gmail.com> wrote:
>>> >>>
>>> >>> Was there a qualitative or quantitative benchmark done before a design
>>> >>> decision was made not to use Calcite?
>>> >>>
>>> >>> Are there limitations (for heuristic-based, cost-based, *-aware
>>> >>> optimizers) in Calcite, and in frameworks built on top of Calcite?
>>> >>> In the context of big data / TPC-H benchmarks.
>>> >>>
>>> >>> I was unable to dig up anything concrete from the user group / Jira.
>>> >>> I would appreciate it if any Catalyst veteran here could give me
>>> >>> pointers. I am trying to defend Spark/Catalyst.
>>> >>>
>>> >>> --
>>> >>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>> >>>
>>> >>> ---------------------------------------------------------------------
>>> >>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>> >>
>>> >> --
>>> >> Thanks,
>>> >> Jason
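
For readers following the thread: the "dynamic programming approach" to cost-based join reordering that Xiao mentions, and the "local minima" concern Debajyoti raises, can be illustrated with a small toy. The sketch below is not Spark's or Calcite's actual implementation. The function names, the cost model (sum of estimated intermediate result sizes), the restriction to left-deep plans, and the table sizes and selectivities are all assumptions made up for illustration.

```python
from itertools import combinations


def join_cardinality(tables, rows, selectivity):
    """Estimate the row count of joining `tables`: the product of the table
    sizes times the selectivity of every join predicate among them.
    A missing predicate counts as 1.0, i.e. a cross product."""
    card = 1.0
    for t in tables:
        card *= rows[t]
    for a, b in combinations(sorted(tables), 2):
        card *= selectivity.get((a, b), 1.0)
    return card


def best_join_order(rows, selectivity):
    """Dynamic programming over subsets of tables (left-deep plans only).

    best[S] holds (cost, order) for the cheapest way to join the set S,
    where cost is the sum of the estimated sizes of all intermediate
    results. Each larger set is built from the best plan of a smaller one.
    """
    names = sorted(rows)
    best = {frozenset([t]): (0.0, (t,)) for t in names}
    for size in range(2, len(names) + 1):
        for subset in combinations(names, size):
            s = frozenset(subset)
            out_rows = join_cardinality(s, rows, selectivity)
            candidates = []
            for last in subset:
                # Join the best plan for the other tables with `last`.
                prev_cost, prev_order = best[s - {last}]
                candidates.append((prev_cost + out_rows, prev_order + (last,)))
            best[s] = min(candidates)
    return best[frozenset(names)]


# Hypothetical statistics: three tables, two join predicates.
example_rows = {"a": 1000, "b": 100, "c": 10}
example_selectivity = {("a", "b"): 0.01, ("b", "c"): 0.1}

cost, order = best_join_order(example_rows, example_selectivity)
print("estimated cost:", cost)
print("join order:", order)
```

Because every subset is costed, the search cannot get stuck on an early choice that only looks good locally, at the price of exponential work in the number of tables. In Spark itself, cost-based join reordering is controlled by settings such as `spark.sql.cbo.enabled` and `spark.sql.cbo.joinReorder.enabled`, and Adaptive Query Execution by `spark.sql.adaptive.enabled`.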