Hello,

We are running and maintaining quite a big and complex Hive SELECT query
right now. It is basically a single SELECT query that JOINs the outputs of
about ten other SELECT queries.

The simplest refactoring we can think of is to break this query down into
multiple views and then join the views. A similar option is to create
intermediate tables.

However, creating multiple DDL statements just to maintain a single DML
statement is not very smooth. We would end up polluting the metadata
database with views / intermediate tables that are used only in this ETL.

What other efficient ways are there to maintain complex SQL queries written
in Hive? Are there better ways to break a Hive query into multiple modules?

-- Saumitra S. Shahapure
