Hello,

We are currently running and maintaining a fairly large and complex Hive SELECT query. It is essentially a single SELECT that JOINs the outputs of about ten other SELECT queries.
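
To make the structure concrete, here is a rough sketch of the query's shape. The table and column names (`orders`, `clicks`, `logins`, `user_id`, and so on) are hypothetical placeholders, not the actual schema, and only three of the roughly ten subqueries are shown.

```sql
-- Hypothetical sketch only: names are placeholders, not the real schema.
SELECT a.user_id, a.total_orders, b.total_clicks, c.last_login
FROM (
  SELECT user_id, COUNT(*) AS total_orders
  FROM orders
  GROUP BY user_id
) a
JOIN (
  SELECT user_id, COUNT(*) AS total_clicks
  FROM clicks
  GROUP BY user_id
) b ON a.user_id = b.user_id
JOIN (
  SELECT user_id, MAX(login_time) AS last_login
  FROM logins
  GROUP BY user_id
) c ON a.user_id = c.user_id;
-- The real query continues in the same way for the remaining subqueries.
```
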
The simplest way to refactor it that we can think of is to break the query down into multiple views and then join the views. Similarly, we could create intermediate tables. However, creating multiple DDLs in order to maintain a single DML is not very smooth: we would end up polluting the metadata database with views / intermediate tables that are used only in this one ETL.

What other efficient ways are there to maintain complex SQL queries written in Hive? Are there better ways to break a Hive query into multiple modules?

-- Saumitra S. Shahapure