This works fine for me: spark-shell --master yarn-client
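For comparison with the failing standalone setup quoted below, the equivalent Hive-side settings for running the Spark engine against YARN in client mode would be roughly the following. This is only a sketch under assumptions: it presumes YARN is running and that the other Hive on Spark prerequisites from the getting-started guide are already in place.

    hive> set hive.execution.engine=spark;
    hive> set spark.master=yarn-client;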
On Tue, Nov 24, 2015 at 11:43 AM, Dasun Hegoda <dasunheg...@gmail.com> wrote:

> Hey folks,
>
> Any updates?
>
> On Mon, Nov 23, 2015 at 5:15 PM, Dasun Hegoda <dasunheg...@gmail.com> wrote:
>
>> Do you have any clue how to get this fixed?
>>
>> On Mon, Nov 23, 2015 at 4:27 PM, Dasun Hegoda <dasunheg...@gmail.com> wrote:
>>
>>> I get this now. It's different from what you get:
>>>
>>> hduser@master:~/spark-1.5.1-bin-hadoop2.6/bin$ ./spark-shell
>>> 15/11/23 05:56:13 INFO spark.SecurityManager: Changing view acls to: hduser
>>> 15/11/23 05:56:13 INFO spark.SecurityManager: Changing modify acls to: hduser
>>> 15/11/23 05:56:13 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hduser); users with modify permissions: Set(hduser)
>>> 15/11/23 05:56:13 INFO spark.HttpServer: Starting HTTP Server
>>> 15/11/23 05:56:13 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> 15/11/23 05:56:13 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:34334
>>> 15/11/23 05:56:13 INFO util.Utils: Successfully started service 'HTTP class server' on port 34334.
>>> Welcome to
>>>       ____              __
>>>      / __/__  ___ _____/ /__
>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.1
>>>       /_/
>>>
>>> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55)
>>> Type in expressions to have them evaluated.
>>> Type :help for more information.
>>> 15/11/23 05:56:17 INFO spark.SparkContext: Running Spark version 1.5.1
>>> 15/11/23 05:56:17 WARN spark.SparkConf:
>>> SPARK_JAVA_OPTS was detected (set to '-Dspark.driver.port=53411').
>>> This is deprecated in Spark 1.0+.
>>>
>>> Please instead use:
>>>  - ./spark-submit with conf/spark-defaults.conf to set defaults for an application
>>>  - ./spark-submit with --driver-java-options to set -X options for a driver
>>>  - spark.executor.extraJavaOptions to set -X options for executors
>>>  - SPARK_DAEMON_JAVA_OPTS to set java options for standalone daemons (master or worker)
>>>
>>> 15/11/23 05:56:17 WARN spark.SparkConf: Setting 'spark.executor.extraJavaOptions' to '-Dspark.driver.port=53411' as a work-around.
>>> 15/11/23 05:56:17 WARN spark.SparkConf: Setting 'spark.driver.extraJavaOptions' to '-Dspark.driver.port=53411' as a work-around.
>>> 15/11/23 05:56:17 INFO spark.SecurityManager: Changing view acls to: hduser
>>> 15/11/23 05:56:17 INFO spark.SecurityManager: Changing modify acls to: hduser
>>> 15/11/23 05:56:17 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hduser); users with modify permissions: Set(hduser)
>>> 15/11/23 05:56:18 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>> 15/11/23 05:56:18 INFO Remoting: Starting remoting
>>> 15/11/23 05:56:18 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.7.87:53411]
>>> 15/11/23 05:56:18 INFO util.Utils: Successfully started service 'sparkDriver' on port 53411.
>>> 15/11/23 05:56:18 INFO spark.SparkEnv: Registering MapOutputTracker
>>> 15/11/23 05:56:18 INFO spark.SparkEnv: Registering BlockManagerMaster
>>> 15/11/23 05:56:18 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-0232975c-c76b-444d-b7f7-1ef2f28e388c
>>> 15/11/23 05:56:18 INFO storage.MemoryStore: MemoryStore started with capacity 530.3 MB
>>> 15/11/23 05:56:18 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-2413b536-c845-4964-a96d-973e5ec02593/httpd-311975ea-ac22-493d-8fd5-0f48b562a9a5
>>> 15/11/23 05:56:18 INFO spark.HttpServer: Starting HTTP Server
>>> 15/11/23 05:56:18 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> 15/11/23 05:56:18 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:60477
>>> 15/11/23 05:56:18 INFO util.Utils: Successfully started service 'HTTP file server' on port 60477.
>>> 15/11/23 05:56:18 INFO spark.SparkEnv: Registering OutputCommitCoordinator
>>> 15/11/23 05:56:18 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> 15/11/23 05:56:18 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
>>> 15/11/23 05:56:18 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
>>> 15/11/23 05:56:18 INFO ui.SparkUI: Started SparkUI at http://192.168.7.87:4040
>>> 15/11/23 05:56:18 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
>>> 15/11/23 05:56:18 INFO client.AppClient$ClientEndpoint: Connecting to master spark://master:7077...
>>> 15/11/23 05:56:38 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main]
>>> java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@236f0e3a rejected from java.util.concurrent.ThreadPoolExecutor@500f1402[Running, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1]
>>>         at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>>>         at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>>>         at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>>>         at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
>>>         at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1.apply(AppClient.scala:96)
>>>         at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1.apply(AppClient.scala:95)
>>>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>>>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>>         at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>>>         at org.apache.spark.deploy.client.AppClient$ClientEndpoint.tryRegisterAllMasters(AppClient.scala:95)
>>>         at org.apache.spark.deploy.client.AppClient$ClientEndpoint.org$apache$spark$deploy$client$AppClient$ClientEndpoint$$registerWithMaster(AppClient.scala:121)
>>>         at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:132)
>>>         at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1119)
>>>         at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:124)
>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 15/11/23 05:56:38 INFO storage.DiskBlockManager: Shutdown hook called
>>> 15/11/23 05:56:38 INFO util.ShutdownHookManager: Shutdown hook called
>>> 15/11/23 05:56:38 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2413b536-c845-4964-a96d-973e5ec02593/httpd-311975ea-ac22-493d-8fd5-0f48b562a9a5
>>> 15/11/23 05:56:38 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-8fefb39a-09b5-443c-b7b4-9c54bce6e245
>>> 15/11/23 05:56:38 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2413b536-c845-4964-a96d-973e5ec02593/userFiles-b593fc93-c23a-4a9e-aede-ed051f149fcb
>>> 15/11/23 05:56:38 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2413b536-c845-4964-a96d-973e5ec02593
>>>
>>> On Mon, Nov 23, 2015 at 4:19 PM, Mich Talebzadeh <m...@peridale.co.uk> wrote:
>>>
>>>> As the example shows, these are all set in hive-site.xml:
>>>>
>>>> <property>
>>>>   <name>hive.execution.engine</name>
>>>>   *<value>spark</value>*
>>>>   <description>
>>>>     Expects one of [mr, tez, spark].
>>>>     Chooses execution engine. Options are: mr (Map reduce, default) or tez (hadoop 2 only)
>>>>   </description>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>spark.eventLog.enabled</name>
>>>>   *<value>true</value>*
>>>>   <description>
>>>>     Spark event log setting
>>>>   </description>
>>>> </property>
>>>>
>>>> Mich Talebzadeh
>>>>
>>>> http://talebzadehmich.wordpress.com
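A quick way to confirm what Hive actually picked up after editing hive-site.xml and restarting is to echo the property from the Hive shell; `set` with no value simply prints the current setting. A minimal check (the spark value shown assumes the change above has been applied):

    hive> set hive.execution.engine;
    hive.execution.engine=spark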
>>>> *From:* Dasun Hegoda [mailto:dasunheg...@gmail.com]
>>>> *Sent:* 23 November 2015 10:40
>>>> *To:* user@hive.apache.org
>>>> *Subject:* Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu
>>>>
>>>> Thank you very much. This is very informative. Do you know how to set these in hive-site.xml?
>>>>
>>>> hive> set spark.master=<Spark Master URL>
>>>> hive> set spark.eventLog.enabled=true;
>>>> hive> set spark.eventLog.dir=<Spark event log folder (must exist)>
>>>> hive> set spark.executor.memory=512m;
>>>> hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
>>>>
>>>> If we set these in hive-site.xml, I think we will be able to get through.
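Each of those session-level `set` commands maps onto an ordinary <property> entry in hive-site.xml, in the same way the hive.execution.engine snippet above does. A minimal sketch with illustrative placeholder values (the master URL and event log directory are assumptions and must be adapted to the cluster; the event log directory has to exist beforehand):

    <property>
      <name>spark.master</name>
      <value>spark://master:7077</value>   <!-- placeholder; could also be yarn-client -->
    </property>
    <property>
      <name>spark.eventLog.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>spark.eventLog.dir</name>
      <value>/tmp/spark-events</value>     <!-- placeholder; create this directory first -->
    </property>
    <property>
      <name>spark.executor.memory</name>
      <value>512m</value>
    </property>
    <property>
      <name>spark.serializer</name>
      <value>org.apache.spark.serializer.KryoSerializer</value>
    </property>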
>>>> On Mon, Nov 23, 2015 at 3:05 PM, Mich Talebzadeh <m...@peridale.co.uk> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am looking at the set up here:
>>>>
>>>> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
>>>>
>>>> First, this is about configuring Hive to work with Spark. This is my understanding:
>>>>
>>>> 1. Hive uses Yarn as its resource manager regardless.
>>>>
>>>> 2. Hive uses MapReduce as its execution engine by default.
>>>>
>>>> 3. The execution engine can be changed to Spark at the configuration level. If you look at the Hive configuration file -> $HIVE_HOME/conf/hive-site.xml, you will see that the default is mr (MapReduce):
>>>>
>>>> <property>
>>>>   <name>hive.execution.engine</name>
>>>>   *<value>mr</value>*
>>>>   <description>
>>>>     Expects one of [mr, tez].
>>>>     Chooses execution engine. Options are: mr (Map reduce, default) or tez (hadoop 2 only)
>>>>   </description>
>>>> </property>
>>>>
>>>> 4. If you change that to *spark and restart Hive*, you will force Hive to use Spark as its engine. So the choice is either to do it at the configuration level or at the session level (i.e. set hive.execution.engine=spark;). For the rest of the parameters you can do the same, i.e. in hive-site.xml or at the session level. Personally I would still want Hive to use the MR engine, so I will create spark-defaults.conf as mentioned (see the sketch after this message).
>>>>
>>>> 5. I then start Spark as standalone, and that works fine:
>>>>
>>>> *hduser@rhes564::/usr/lib/spark> ./sbin/start-master.sh*
>>>> starting org.apache.spark.deploy.master.Master, logging to /usr/lib/spark/sbin/../logs/spark-hduser-org.apache.spark.deploy.master.Master-1-rhes564.out
>>>> hduser@rhes564::/usr/lib/spark> more /usr/lib/spark/sbin/../logs/spark-hduser-org.apache.spark.deploy.master.Master-1-rhes564.out
>>>> Spark Command: /usr/java/latest/bin/java -cp /usr/lib/spark/sbin/../conf/:/usr/lib/spark/lib/spark-assembly-1.5.2-hadoop2.6.0.jar:/usr/lib/spark/lib/datanucleus-core-3.2.10.jar:/usr/lib/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/lib/spark/lib/datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip rhes564 --port 7077 --webui-port 8080
>>>> ========================================
>>>> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>>>> 15/11/21 21:41:58 INFO Master: Registered signal handlers for [TERM, HUP, INT]
>>>> 15/11/21 21:41:58 WARN Utils: Your hostname, rhes564 resolves to a loopback address: 127.0.0.1; using 50.140.197.217 instead (on interface eth0)
>>>> 15/11/21 21:41:58 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
>>>> 15/11/21 21:41:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>> 15/11/21 21:41:59 INFO SecurityManager: Changing view acls to: hduser
>>>> 15/11/21 21:41:59 INFO SecurityManager: Changing modify acls to: hduser
>>>> 15/11/21 21:41:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hduser); users with modify permissions: Set(hduser)
>>>> 15/11/21 21:41:59 INFO Slf4jLogger: Slf4jLogger started
>>>> 15/11/21 21:42:00 INFO Remoting: Starting remoting
>>>> 15/11/21 21:42:00 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@rhes564:7077]
>>>> 15/11/21 21:42:00 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
>>>> 15/11/21 21:42:00 INFO Master: Starting Spark master at spark://rhes564:7077
>>>> 15/11/21 21:42:00 INFO Master: Running Spark version 1.5.2
>>>> 15/11/21 21:42:00 INFO Utils: Successfully started service 'MasterUI' on port 8080.
>>>> 15/11/21 21:42:00 INFO MasterWebUI: Started MasterWebUI at http://50.140.197.217:8080
>>>> 15/11/21 21:42:00 INFO Utils: Successfully started service on port 6066.
>>>> 15/11/21 21:42:00 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066
>>>> 15/11/21 21:42:00 INFO Master: I have been elected leader! New state: ALIVE
>>>>
>>>> 6. Then I try to start the interactive spark-shell and it fails with the error that I reported before:
>>>>
>>>> *hduser@rhes564::/usr/lib/spark/bin> ./spark-shell --master spark://rhes564:7077*
>>>> log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
>>>> log4j:WARN Please initialize the log4j system properly.
>>>> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
>>>> Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
>>>> To adjust logging level use sc.setLogLevel("INFO")
>>>> Welcome to
>>>>       ____              __
>>>>      / __/__  ___ _____/ /__
>>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
>>>>       /_/
>>>>
>>>> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_25)
>>>> Type in expressions to have them evaluated.
>>>> Type :help for more information.
>>>> 15/11/23 09:33:56 WARN Utils: Your hostname, rhes564 resolves to a loopback address: 127.0.0.1; using 50.140.197.217 instead (on interface eth0)
>>>> 15/11/23 09:33:56 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
>>>> 15/11/23 09:33:57 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
>>>> Spark context available as sc.
>>>> 15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.server2.thrift.http.min.worker.threads does not exist
>>>> 15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.mapjoin.optimized.keys does not exist
>>>> 15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.mapjoin.lazy.hashtable does not exist
>>>> 15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.server2.thrift.http.max.worker.threads does not exist
>>>> 15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.server2.logging.operation.verbose does not exist
>>>> 15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.optimize.multigroupby.common.distincts does not exist
>>>> *java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------*
>>>>
>>>> That is where I am now, and I have reported this to the Spark user group but no luck yet.
>>>>
>>>> Mich Talebzadeh
>>>>
>>>> http://talebzadehmich.wordpress.com
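For that last error the message itself points at the fix: the /tmp/hive scratch directory on HDFS is only writable by its owner. A common remedy is to widen its permissions; this is a sketch, assuming the hduser account is allowed to administer HDFS and that /tmp/hive really is the scratch directory in use (a more restrictive mode may be enough depending on which users run queries):

    hdfs dfs -mkdir -p /tmp/hive
    hdfs dfs -chmod -R 777 /tmp/hive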
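And for point 4 above, the spark-defaults.conf sketch referred to: keeping hive.execution.engine=mr in hive-site.xml and putting the Spark settings in a spark-defaults.conf that Hive can pick up (the getting-started page linked above describes where it needs to live) might look roughly like this. The master URL and event log directory are placeholders:

    spark.master              spark://rhes564:7077
    spark.eventLog.enabled    true
    spark.eventLog.dir        /tmp/spark-events
    spark.executor.memory     512m
    spark.serializer          org.apache.spark.serializer.KryoSerializer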
>>>> *From:* Dasun Hegoda [mailto:dasunheg...@gmail.com]
>>>> *Sent:* 23 November 2015 07:05
>>>> *To:* user@hive.apache.org
>>>> *Subject:* Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu
>>>>
>>>> Anyone????
>>>>
>>>> On Sat, Nov 21, 2015 at 1:32 PM, Dasun Hegoda <dasunheg...@gmail.com> wrote:
>>>>
>>>> Thank you very much, but I would like to do the integration of these components myself rather than using a packaged distribution. I think I have come to the right place. Can you please kindly tell me the configuration steps to run Hive on Spark?
>>>>
>>>> At least someone please elaborate on these steps:
>>>> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
>>>>
>>>> Because in the latter part of the guide, configurations are set in the Hive runtime shell, which is not permanent as far as I know.
>>>>
>>>> Please help me to get this done. I'm also planning to write a detailed guide with configuration steps to run Hive on Spark, so that others can benefit from it and not be troubled like I was.
>>>>
>>>> Can someone please kindly tell me the configuration steps to run Hive on Spark?
>>>>
>>>> On Sat, Nov 21, 2015 at 12:28 PM, Sai Gopalakrishnan <sai.gopalakrish...@aspiresys.com> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> Thank you for your responses. I think Mich's suggestion is a great one; I will go with it. As Alan suggested, using the compactor in Hive should help with managing the delta files.
>>>>
>>>> @Dasun, pardon me for deviating from the topic. Regarding configuration, you could try a packaged distribution (Hortonworks, Cloudera or MapR) like Jörn Franke said. I use Hortonworks; it's open source and compatible with Linux and Windows, provides detailed documentation for installation, and can be installed in less than a day provided you're all set with the hardware.
>>>> http://hortonworks.com/hdp/downloads/
>>>>
>>>> Regards,
>>>> Sai
>>>>
>>>> *From:* Dasun Hegoda <dasunheg...@gmail.com>
>>>> *Sent:* Saturday, November 21, 2015 8:00 AM
>>>> *To:* user@hive.apache.org
>>>> *Subject:* Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu
>>>>
>>>> Hi Mich, Hi Sai, Hi Jörn,
>>>>
>>>> Thank you very much for the information. I think we are deviating from the original question: Hive on Spark on Ubuntu. Can you please kindly tell me the configuration steps?
>>>>
>>>> On Fri, Nov 20, 2015 at 11:10 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>>>
>>>> I think the most recent versions of Cloudera or Hortonworks should include all these components - try their Sandboxes.
>>>>
>>>> On 20 Nov 2015, at 12:54, Dasun Hegoda <dasunheg...@gmail.com> wrote:
>>>>
>>>> Where can I get a Hadoop distribution containing these technologies? Link?
>>>> On Fri, Nov 20, 2015 at 5:22 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>>>
>>>> I recommend using a Hadoop distribution containing these technologies. I think you also get other useful tools for your scenario, such as auditing using Sentry or Ranger.
>>>>
>>>> On 20 Nov 2015, at 10:48, Mich Talebzadeh <m...@peridale.co.uk> wrote:
>>>>
>>>> Well,
>>>>
>>>> "I'm planning to deploy Hive on Spark but I can't find the installation steps. I tried to read the official '[Hive on Spark][1]' guide but it has problems. As an example it says under 'Configuring Yarn' `yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler` but does not say where I should do it. Also as per the guide configurations are set in the Hive runtime shell which is not permanent according to my knowledge."
>>>>
>>>> You can do that in the yarn-site.xml file, which is normally under $HADOOP_HOME/etc/hadoop.
>>>>
>>>> HTH
>>>>
>>>> Mich Talebzadeh
>>>>
>>>> http://talebzadehmich.wordpress.com
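As a concrete illustration of that answer, the scheduler line from the guide goes into $HADOOP_HOME/etc/hadoop/yarn-site.xml as an ordinary property, roughly like this (a sketch; the ResourceManager has to be restarted for it to take effect):

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>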
>>>> *From:* Dasun Hegoda [mailto:dasunheg...@gmail.com]
>>>> *Sent:* 20 November 2015 09:36
>>>> *To:* user@hive.apache.org
>>>> *Subject:* Hive on Spark - Hadoop 2 - Installation - Ubuntu
>>>>
>>>> Hi,
>>>>
>>>> What I'm planning to do is develop a reporting platform using existing data. I have an existing RDBMS which has a large number of records, so I'm using the following stack (
>>>> http://stackoverflow.com/questions/33635234/hadoop-2-7-spark-hive-jasperreports-scoop-architecuture):
>>>>
>>>> - Sqoop - Extract data from the RDBMS to Hadoop
>>>> - Hadoop - Storage platform -> *Deployment Completed*
>>>> - Hive - Data warehouse
>>>> - Spark - Real-time processing -> *Deployment Completed*
>>>>
>>>> I'm planning to deploy Hive on Spark but I can't find the installation steps. I tried to read the official '[Hive on Spark][1]' guide but it has problems. As an example it says under 'Configuring Yarn' `yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler` but does not say where I should do it. Also, as per the guide, configurations are set in the Hive runtime shell, which is not permanent as far as I know.
>>>>
>>>> I also read [this][2], but it does not have any steps.
>>>>
>>>> Could you please provide the steps to run Hive on Spark on Ubuntu as a production system?
>>>>
>>>> [1]: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
>>>> [2]: http://stackoverflow.com/questions/26018306/how-to-configure-hive-to-use-spark
>>>>
>>>> --
>>>> Regards,
>>>> Dasun Hegoda, Software Engineer
>>>> www.dasunhegoda.com | dasunheg...@gmail.com