> However, ideally we wish to manipulate the original query as delivered by the 
> user (or as close to it as possible), and we’re finding that the tree has 
> been modified significantly by the time it hits the hook

That's CBO. The pipeline is Query -> AST -> Calcite tree -> AST -> hook, so 
the bushy join conversion is already done by the time the hook gets called.

We need a parser hook that runs ahead of CBO, not a semantic analyzer hook.

> Additionally we wish to track back ASTNodes to the character sequences in the 
> source HQL that were their origin (where sensible), and ultimately hope to be 
> able to regenerate the query text from the AST.

I started work on a Hive unparser a while back based on this class, but it is 
a world of verbose coding.

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java#L850

If you're doing active work on this, I'd like to help, because I need the AST 
-> query to debug CBO.
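
For the tracking-back part, here is a rough sketch of the idea, not Hive code: 
SpannedNode and its methods are hypothetical names, and the offsets are assumed 
to be recorded at parse time, before CBO rewrites the tree. Each node keeps the 
half-open character range it came from, so a node can be mapped back to its 
source text and a replacement can be spliced in without touching the rest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy AST node, NOT Hive's ASTNode. Each node records the
// half-open character range [start, stop) of the query text it was parsed from.
final class SpannedNode {
    final String label;
    final int start;
    final int stop;
    final List<SpannedNode> children = new ArrayList<>();

    SpannedNode(String label, int start, int stop) {
        if (start < 0 || stop < start) {
            throw new IllegalArgumentException("bad span: " + start + ".." + stop);
        }
        this.label = label;
        this.start = start;
        this.stop = stop;
    }

    // Attach a child and return this node so calls can be chained.
    SpannedNode add(SpannedNode child) {
        children.add(child);
        return this;
    }

    // Recover the exact source text this node originated from.
    String sourceText(String query) {
        return query.substring(start, stop);
    }

    // Splice a replacement over this node's span, leaving the rest verbatim.
    String replaceIn(String query, String replacement) {
        return query.substring(0, start) + replacement + query.substring(stop);
    }
}
```

With the query "SELECT a FROM t WHERE a > 10", a predicate node spanning 
[22, 28) maps back to "a > 10", which is the kind of lookup a survivor report 
would need.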

> The use case, if you are interested, is a mutation testing framework for HQL. 
> The testing of mutants is operational, but now we need to report on 
> survivors, hence the need to track back from specific query elements to 
> character sequences in the original query string.

This sounds a lot like the random query generator used at Cloudera for 
fuzzing, to keep Impala and Hive bug-for-bug compatible.

https://cwiki.apache.org/confluence/download/attachments/27362054/Random%20Query%20Gen-%20Hive%20Meetup.pptx

Cheers,
Gopal

