Hi,

What I'm planning to do is develop a reporting platform using existing data. I have an existing RDBMS with a large number of records, so I'm using the following architecture (see http://stackoverflow.com/questions/33635234/hadoop-2-7-spark-hive-jasperreports-scoop-architecuture):
- Sqoop - extract data from the RDBMS to Hadoop
- Hadoop - storage platform -> *Deployment Completed*
- Hive - data warehouse
- Spark - real-time processing -> *Deployment Completed*

I'm planning to deploy Hive on Spark, but I can't find the installation steps. I tried reading the official '[Hive on Spark][1]' guide, but it has problems. For example, under 'Configuring YARN' it says to set

`yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler`

but it does not say where this should be done. Also, according to the guide, configurations are set in the Hive runtime shell, which as far as I know is not permanent. I also read [this][2], but it does not contain any steps either.

Could you please provide the steps to run Hive on Spark on Ubuntu as a production system?

[1]: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
[2]: http://stackoverflow.com/questions/26018306/how-to-configure-hive-to-use-spark

--
Regards,
Dasun Hegoda, Software Engineer
www.dasunhegoda.com | dasunheg...@gmail.com
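In case it helps clarify what I'm asking: my guess is that the scheduler property from the guide belongs in `yarn-site.xml`, but that is an assumption on my part, not something the guide states:

```xml
<!-- yarn-site.xml: my guess at where the guide's scheduler setting goes.
     The property name and value are from the guide; the file location
     is my assumption. -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```

Is that the right place, and does it need to be set on the ResourceManager node only or on every node?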
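On the non-permanent configuration issue: the guide runs `set hive.execution.engine=spark;` in the Hive shell, which I believe only lasts for that session. My assumption is that making it permanent for production means putting the same property in `hive-site.xml`, something like:

```xml
<!-- hive-site.xml: my assumption for making the setting permanent
     instead of running `set hive.execution.engine=spark;` in each session. -->
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
```

Is that the correct approach for a production system?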