Hi Team,

All of our data is in Hive tables, and until now we have processed it with Hive on MR. Now that HiveContext can run Hive queries on Spark, we are moving these complex Hive scripts over with an approach along the lines of hivecontext.sql(sc.textfile(hivescript)), i.e. we are simply running the existing Hive queries on Spark and have not written anything in Scala yet. Even so, just running the Hive queries on Spark already shows a big difference in run time compared with running them on MR.
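
To make the "run the whole script through hc.sql" approach concrete, here is a rough sketch in Python (the same idea carries over to Scala). It assumes that hc.sql takes one statement per call, so the script text has to be split on semicolons first instead of being passed in as a whole. The names split_statements and run_script are hypothetical helpers written for this illustration, not part of Spark or Hive, and the splitter is deliberately simple.

```python
from typing import Callable, List


def split_statements(script: str) -> List[str]:
    """Split a Hive script into individual statements on ';'.

    Semicolons inside single- or double-quoted literals are preserved.
    Whole-line '--' comments are dropped. Simplification (assumption):
    backslash-escaped quotes and trailing inline comments are not handled.
    """
    # Drop whole-line comments before scanning for statement boundaries.
    lines = [ln for ln in script.splitlines() if not ln.strip().startswith("--")]
    text = "\n".join(lines)

    statements: List[str] = []
    current: List[str] = []
    quote = None  # the quote character we are currently inside, if any
    for ch in text:
        if quote:
            if ch == quote:
                quote = None
            current.append(ch)
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)

    # Keep a final statement that has no trailing semicolon.
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


def run_script(script: str, run_sql: Callable[[str], object]) -> int:
    """Run each statement in order through run_sql; return how many ran.

    run_sql is whatever executes one statement, e.g. hc.sql on a HiveContext.
    """
    statements = split_statements(script)
    for stmt in statements:
        run_sql(stmt)
    return len(statements)
```

With a HiveContext called hc, usage would look something like run_script(open("myscript.hql").read(), hc.sql), where the file name is only a placeholder. Passing the runner in as a plain callable keeps the splitting logic checkable without a cluster.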

So, since we already have the Hive scripts, should we just run those complex scripts through hc.sql, given that hc.sql is able to handle them? Or is that not best practice, and even though Spark can do it, is it still better to load the individual Hive tables into Spark as RDDs and write Scala code that reproduces what the Hive scripts do? We are finding it hard to decide whether to leave the complex scripts to hc.sql or to code them in Scala. Would the manual effort be worth it in terms of performance?

An example of what our scripts look like:

use db;
create tempfunction1 as com.fgh.jkl.TestFunction;
create desttable in hive;
insert overwrite desttable
select (big complex transformations and usage of Hive UDFs)
from table1, table2, table3
join table4 on (some complex condition)
join table7 on (another complex condition)
where (complex filtering)

So please help: what would be the best approach, and why should I not hand the entire script to HiveContext and let it build its own RDDs and run on Spark, if that works? All the examples I see online only show hc.sql("select * from table1") and nothing more complex than that.