Re: SQL Query Set Analyzer

2018-08-07 Thread Rajat Venkatesh
James, I am the creator of Quark (Hopefully you meant this project: https://github.com/qubole/quark). Quark was an experiment in federated materialized views. We noticed that data teams store views of their data in Redshift, Vertica etc but the relationships were not captured. I dont think Quark wi

Re: SQL Query Set Analyzer

2018-08-06 Thread Julian Hyde
Regarding the SCOPE paper you reference. That was on my mind too (I went to the talk at SIGMOD). A materialized view is created only if the same query is used *textually identically* in different parts of the ETL process, so it is mainly for optimizing batch jobs that are largely the same night

Re: SQL Query Set Analyzer

2018-08-06 Thread Jesus Camacho Rodriguez
You can find an overview of the work that has been done in Hive for materialized view integration in the following link: https://cwiki.apache.org/confluence/display/Hive/Materialized+views Materialized views can be stored in external tables such as Druid-backed tables too. Druid rules that in Calci

Re: SQL Query Set Analyzer

2018-08-06 Thread Julian Hyde
It’s hard to automatically recommend a set of MVs from past queries. The design space is just too large. But if you are designing MVs for interactive BI, you can use the “lattice” model. This works because many queries will be filter-join-aggregate queries on a star schema (i.e. a central fact t

Re: SQL Query Set Analyzer

2018-08-03 Thread James Taylor
Both the Lattice Suggestor and Quark sound like what I need for an automated solution, but I have some more basic follow up questions first. Here's our basic use case (very similar to Zheng Shao's, I believe): - Our company has stood up Presto for data analysts - Nightly ETL jobs populate Hive tabl

Re: SQL Query Set Analyzer

2018-07-26 Thread Julian Hyde
PS +1 for Babel. If you are analyzing a set of queries, it is very likely that these queries were written to be executed against another database. Babel aims to take such queries and convert them into Calcite relational algebra. The process might occasionally be lossy, if Calcite's algebra doe

Re: SQL Query Set Analyzer

2018-07-26 Thread Julian Hyde
There are many possible analyzers, but Lattice Suggester is one that I am working on and is relatively mature. It looks at lots of queries and builds lattices (star schemas with measures) from those queries. It finds commonality by “growing” lattices - adding measures, adding derived expressions

Re: SQL Query Set Analyzer

2018-07-26 Thread Khai Tran
building basic library code for comparing RexNodes, RelNodes (like two projects with different column order are equal) From: Devjyoti Patra Sent: Wednesday, July 25, 2018 10:28 PM To: dev@calcite.apache.org Subject: Re: SQL Query Set Analyzer Hi Zheng, At Qubole, we

Re: SQL Query Set Analyzer

2018-07-25 Thread Shuyi Chen
Hi Zheng, I think the Babel parser in the latest Calcite 1.17 might be helpful here. The Babel parser (JIRA here ) can allow Calcite to accept all SQL dialects (e.g. Hive, oracle, and etc.). Before 1.17, Calcite can only parse its own SQL dialect,

Re: SQL Query Set Analyzer

2018-07-25 Thread Devjyoti Patra
Hi Zheng, At Qubole, we are building something very similar to what you are looking for. And from experience, I can tell you that it is a lot easy to build it than what one may think. We use Calcite parser to parse the SQL into Sqlnode and then use different tree visitors to extract query attribut

SQL Query Set Analyzer

2018-07-25 Thread Zheng Shao
Hi, We are thinking about starting a project to analyze huge number of SQL queries (think millions) to identify common patterns: * Common sub queries * Common filtering conditions (columns) for a table * Common join keys for table pairs Are there any existing projects on that direction using Calc