Re: Hive SQL extension

Stamatis Zampetakis Mon, 26 Oct 2020 02:28:01 -0700

I do like extensions and things that simplify our life when
writing queries.


Regarding the partitioning syntax for Iceberg, there may be better
alternatives.
I was also leaning towards a syntax like the one proposed by Jesus (in
another thread) based on virtual columns, which is also part of SQL
standard.

Regarding the other use cases mentioned (temporal queries, time travel
etc.) there are things that are part of SQL standard so we could start from
there and then introduce extensions if needed.

Syntactic sugar is powerful but in terms of design I find it more
appropriate to perform "desugaring" after having an AST; either AST to AST
transformations or afterwards.
The syntax (sugared or not) is part and responsibility of the parser so an
architecture with sub-parser hooks seems a bit brittle, especially if we
start using it extensively.
Having said that you have thought of this much more than I did so maybe the
hook's approach is a better idea after all :)

Best,
Stamatis

On Fri, Oct 23, 2020 at 2:26 PM Pau Tallada <tall...@pic.es> wrote:

> Hi all,
>
> I do not know if that may be of interest to you, but there are other
> projects that could benefit from this.
> For instance, ADQL
> <https://www.ivoa.net/documents/ADQL/20180112/PR-ADQL-2.1-20180112.html>
> (Astronomical Data Query Language) is a SQL-like language that defines some
> higher-level functions that enable powerful geospatial queries. Projects
> like queryparser <https://github.com/aipescience/queryparser> are able to
> translate from ADQL to vendor-SQL for MySQL or PostreSQL. In this case, the
> syntactic sugar is implemented as an external layer on top, but could very
> well be implemented in a rewrite hook if available.
>
> Cheers,
>
> Pau.
>
> Missatge de Peter Vary <pv...@cloudera.com> del dia dj., 22 d’oct. 2020 a
> les 16:21:
>
>>
>> Let's assume that this feature would be useful for Iceberg tables, but
>> useless and even problematic/forbidden for other tables. :)
>>
>> My thinking is, that it could make Hive much more user friendly, if we
>> would allow for extensions in language.
>>
>> With Iceberg integration we plan to do several extensions which might not
>> be useful for other tables. Some examples:
>>
>>    - When creating tables we want to send additional information to the
>>    storage layer, and pushing everything in properties is a pain (not really
>>    user friendly)
>>    - We would like to allow querying table history for iceberg tables
>>    (previous snapshotId-s, timestamps, etc)
>>    - We would like to allow time travel for iceberg tables based on the
>>    data queried above
>>    - We would like to allow the user to see / manage / remove old
>>    snapshots
>>
>>
>> These are all very specific Iceberg related stuff, and most probably will
>> not work / useful for any other type of the tables, so I think adding them
>> to Hive parser would be a stretch.
>>
>> On the other hand if we do not provide SQL interface for accessing these
>> features then the users will turn to Spark/Impala/Presto to be able to work
>> with Iceberg tables.
>>
>> As for your specific question for handling syntax errors (I have just
>> started to think about how would I do it, so feel free to suggest better
>> methods):
>>
>>    - Let's assume that we have a hook which can get the sql command as
>>    an input and can rewrite it to a new SQL command
>>    - I would write simplified parser which tries to be as simple as
>>    possible for the specific command
>>    - Based on the parsing I would return the same command / throw an
>>    exception / rewrite the command
>>
>>
>> Admittedly this solution is working only if we can make every feature
>> work without changing other part of Hive, and we just want to add
>> "syntactic sugar" to it. (Do not underestimate the benefits of syntactic
>> sugar :))
>>
>> Thanks,
>> Peter
>>
>>
>> On Oct 22, 2020, at 11:44, Stamatis Zampetakis <zabe...@gmail.com> wrote:
>>
>> Hi Peter,
>>
>> I am nowhere near being an expert but just wanted to share my thoughts.
>>
>> If I understand correctly you would like some syntactic sugar in Hive to
>> support partitioning as per Iceberg. I cannot tell if that's really useful
>> or not but from my point of view it doesn't seem a very good idea to
>> introduce another layer of parsing before the actual parser (don't know if
>> there is one already). For instance, how are you gonna handle the situation
>> where there are syntax errors in your sugared part and what the end user
>> should see?
>>
>> No matter how it is added if you give the possibility to the user to
>> write such queries it becomes part of the Hive syntax and as such a job of
>> the parser.
>>
>> Best,
>> Stamatis
>>
>>
>> On Thu, Oct 22, 2020 at 9:49 AM Peter Vary <pv...@cloudera.com> wrote:
>>
>>> Hi Hive experts,
>>>
>>> I would like to extend Hive SQL language to provide a way to create
>>> Iceberg partitioned tables like this:
>>>
>>> create table iceberg_test(
>>>         level string,
>>>         event_time timestamp,
>>>         message string,
>>>         register_time date,
>>>         telephone array <string>
>>>     )
>>>     partition by spec(
>>>         level identity,
>>>         event_time identity,
>>>         event_time hour,
>>>         register_time day
>>>     )
>>>     stored as iceberg;
>>>
>>>
>>> The problem is that this syntax is very specific of Iceberg, and I think
>>> it is not a good idea to change the Hive syntax globally to accommodate a
>>> specific use-case.
>>> The following CREATE TABLE statement could archive the same thing:
>>>
>>> create table iceberg_test(
>>>         level string,
>>>         event_time timestamp,
>>>         message string,
>>>         register_time date,
>>>         telephone array <string>
>>>     )
>>>     STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
>>>     TBLPROPERTIES ('iceberg.mr.table.partition.spec'='...');
>>>
>>>
>>> I am looking for a way to rewrite the original (Hive syntactically not
>>> correct) query to a new (syntactically correct) one.
>>>
>>> I was checking the hooks as a possible solution, but I have found that:
>>>
>>>    - HiveDriverRunHook.preDriverRun can get the original /
>>>    syntactically not correct query, but I have found no way to rewrite it 
>>> to a
>>>    syntactically correct one (it looks like a read only query)
>>>    - HiveSemanticAnalyzerHook can rewrite the AST tree, but it needs a
>>>    syntactically correct query to start with
>>>
>>>
>>> Any other ideas how to archive the goals above? Either with Hooks, or
>>> with any other way?
>>>
>>> Thanks,
>>> Peter
>>>
>>
>>
>
> --
> ----------------------------------
> Pau Tallada Crespí
> Departament de Serveis
> Port d'Informació Científica (PIC)
> Tel: +34 93 170 2729
> ----------------------------------
>
>

Re: Hive SQL extension

Reply via email to