> However, ideally we wish to manipulate the original query as delivered by the
> user (or as close to it as possible), and we’re finding that the tree has
> been modified significantly by the time it hits the hook

That's CBO. The query goes Query -> AST -> Calcite Tree -> AST -> hook, so the
bushy join conversion is already done by the time the hook gets called. We need
a Parser hook that runs ahead of CBO, not a Semantic Analyzer hook.

> Additionally we wish to track back ASTNodes to the character sequences in the
> source HQL that were their origin (where sensible), and ultimately hope to be
> able to regenerate the query text from the AST.

I started work on a Hive unparser a while back based on this class, but it is a
world of verbose coding.

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java#L850

If you're doing active work on this, I'd like to help, because I need the
AST -> query direction to debug CBO.

> The use case, if you are interested, is a mutation testing framework for HQL.
> The testing of mutants is operational, but now we need to report on
> survivors, hence the need to track back from specific query elements to
> character sequences in the original query string.

This sounds a lot like the random query generator (fuzzing) used at Cloudera to
keep Impala and Hive bug-for-bug compatible.

https://cwiki.apache.org/confluence/download/attachments/27362054/Random%20Query%20Gen-%20Hive%20Meetup.pptx

Cheers,
Gopal
