This works fine for me: spark-shell --master yarn-client
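For comparison with the failing standalone setup quoted below, the equivalent Hive-side settings for running the Spark engine against YARN in client mode would be roughly the following. This is only a sketch under assumptions: it presumes YARN is running and that the other Hive on Spark prerequisites from the getting-started guide are already in place.

    hive> set hive.execution.engine=spark;
    hive> set spark.master=yarn-client;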
On Tue, Nov 24, 2015 at 11:43 AM, Dasun Hegoda <dasunheg...@gmail.com> wrote:

> Hey folks,
>
> Any updates?
>
> On Mon, Nov 23, 2015 at 5:15 PM, Dasun Hegoda <dasunheg...@gmail.com> wrote:
>
>> Do you have any clue how to get this fixed?
>>
>> On Mon, Nov 23, 2015 at 4:27 PM, Dasun Hegoda <dasunheg...@gmail.com> wrote:
>>
>>> I get this now. It's different from what you get:
>>>
>>> hduser@master:~/spark-1.5.1-bin-hadoop2.6/bin$ ./spark-shell
>>> 15/11/23 05:56:13 INFO spark.SecurityManager: Changing view acls to: hduser
>>> 15/11/23 05:56:13 INFO spark.SecurityManager: Changing modify acls to: hduser
>>> 15/11/23 05:56:13 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hduser); users with modify permissions: Set(hduser)
>>> 15/11/23 05:56:13 INFO spark.HttpServer: Starting HTTP Server
>>> 15/11/23 05:56:13 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> 15/11/23 05:56:13 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:34334
>>> 15/11/23 05:56:13 INFO util.Utils: Successfully started service 'HTTP class server' on port 34334.
>>> Welcome to
>>>       ____              __
>>>      / __/__  ___ _____/ /__
>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.1
>>>       /_/
>>>
>>> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55)
>>> Type in expressions to have them evaluated.
>>> Type :help for more information.
>>> 15/11/23 05:56:17 INFO spark.SparkContext: Running Spark version 1.5.1
>>> 15/11/23 05:56:17 WARN spark.SparkConf:
>>> SPARK_JAVA_OPTS was detected (set to '-Dspark.driver.port=53411').
>>> This is deprecated in Spark 1.0+.
>>>
>>> Please instead use:
>>>  - ./spark-submit with conf/spark-defaults.conf to set defaults for an application
>>>  - ./spark-submit with --driver-java-options to set -X options for a driver
>>>  - spark.executor.extraJavaOptions to set -X options for executors
>>>  - SPARK_DAEMON_JAVA_OPTS to set java options for standalone daemons (master or worker)
>>>
>>> 15/11/23 05:56:17 WARN spark.SparkConf: Setting 'spark.executor.extraJavaOptions' to '-Dspark.driver.port=53411' as a work-around.
>>> 15/11/23 05:56:17 WARN spark.SparkConf: Setting 'spark.driver.extraJavaOptions' to '-Dspark.driver.port=53411' as a work-around.
>>> 15/11/23 05:56:17 INFO spark.SecurityManager: Changing view acls to: hduser
>>> 15/11/23 05:56:17 INFO spark.SecurityManager: Changing modify acls to: hduser
>>> 15/11/23 05:56:17 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hduser); users with modify permissions: Set(hduser)
>>> 15/11/23 05:56:18 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>> 15/11/23 05:56:18 INFO Remoting: Starting remoting
>>> 15/11/23 05:56:18 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.7.87:53411]
>>> 15/11/23 05:56:18 INFO util.Utils: Successfully started service 'sparkDriver' on port 53411.
>>> 15/11/23 05:56:18 INFO spark.SparkEnv: Registering MapOutputTracker
>>> 15/11/23 05:56:18 INFO spark.SparkEnv: Registering BlockManagerMaster
>>> 15/11/23 05:56:18 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-0232975c-c76b-444d-b7f7-1ef2f28e388c
>>> 15/11/23 05:56:18 INFO storage.MemoryStore: MemoryStore started with capacity 530.3 MB
>>> 15/11/23 05:56:18 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-2413b536-c845-4964-a96d-973e5ec02593/httpd-311975ea-ac22-493d-8fd5-0f48b562a9a5
>>> 15/11/23 05:56:18 INFO spark.HttpServer: Starting HTTP Server
>>> 15/11/23 05:56:18 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> 15/11/23 05:56:18 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:60477
>>> 15/11/23 05:56:18 INFO util.Utils: Successfully started service 'HTTP file server' on port 60477.
>>> 15/11/23 05:56:18 INFO spark.SparkEnv: Registering OutputCommitCoordinator
>>> 15/11/23 05:56:18 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> 15/11/23 05:56:18 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
>>> 15/11/23 05:56:18 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
>>> 15/11/23 05:56:18 INFO ui.SparkUI: Started SparkUI at http://192.168.7.87:4040
>>> 15/11/23 05:56:18 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
>>> 15/11/23 05:56:18 INFO client.AppClient$ClientEndpoint: Connecting to master spark://master:7077...
>>> 15/11/23 05:56:38 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main]
>>> java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@236f0e3a rejected from java.util.concurrent.ThreadPoolExecutor@500f1402[Running, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1]
>>>         at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>>>         at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>>>         at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>>>         at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
>>>         at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1.apply(AppClient.scala:96)
>>>         at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1.apply(AppClient.scala:95)
>>>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>>>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>>         at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>>>         at org.apache.spark.deploy.client.AppClient$ClientEndpoint.tryRegisterAllMasters(AppClient.scala:95)
>>>         at org.apache.spark.deploy.client.AppClient$ClientEndpoint.org$apache$spark$deploy$client$AppClient$ClientEndpoint$$registerWithMaster(AppClient.scala:121)
>>>         at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:132)
>>>         at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1119)
>>>         at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:124)
>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 15/11/23 05:56:38 INFO storage.DiskBlockManager: Shutdown hook called
>>> 15/11/23 05:56:38 INFO util.ShutdownHookManager: Shutdown hook called
>>> 15/11/23 05:56:38 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2413b536-c845-4964-a96d-973e5ec02593/httpd-311975ea-ac22-493d-8fd5-0f48b562a9a5
>>> 15/11/23 05:56:38 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-8fefb39a-09b5-443c-b7b4-9c54bce6e245
>>> 15/11/23 05:56:38 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2413b536-c845-4964-a96d-973e5ec02593/userFiles-b593fc93-c23a-4a9e-aede-ed051f149fcb
>>> 15/11/23 05:56:38 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2413b536-c845-4964-a96d-973e5ec02593
>>>
>>> On Mon, Nov 23, 2015 at 4:19 PM, Mich Talebzadeh <m...@peridale.co.uk> wrote:
>>>
>>>> As the example shows, these are all set in hive-site.xml:
>>>>
>>>> <property>
>>>>   <name>hive.execution.engine</name>
>>>>   *<value>spark</value>*
>>>>   <description>
>>>>     Expects one of [mr, tez, spark].
>>>>     Chooses execution engine. Options are: mr (Map reduce, default) or tez (hadoop 2 only)
>>>>   </description>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>spark.eventLog.enabled</name>
>>>>   *<value>true</value>*
>>>>   <description>
>>>>     Spark event log setting
>>>>   </description>
>>>> </property>
>>>>
>>>> Mich Talebzadeh
>>>>
>>>> http://talebzadehmich.wordpress.com
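A quick way to confirm what Hive actually picked up after editing hive-site.xml and restarting is to echo the property from the Hive shell; `set` with no value simply prints the current setting. A minimal check (the spark value shown assumes the change above has been applied):

    hive> set hive.execution.engine;
    hive.execution.engine=spark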
>>>> *From:* Dasun Hegoda [mailto:dasunheg...@gmail.com]
>>>> *Sent:* 23 November 2015 10:40
>>>> *To:* user@hive.apache.org
>>>> *Subject:* Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu
>>>>
>>>> Thank you very much. This is very informative. Do you know how to set these in hive-site.xml?
>>>>
>>>> hive> set spark.master=<Spark Master URL>
>>>> hive> set spark.eventLog.enabled=true;
>>>> hive> set spark.eventLog.dir=<Spark event log folder (must exist)>
>>>> hive> set spark.executor.memory=512m;
>>>> hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
>>>>
>>>> If we set these in hive-site.xml, I think we will be able to get through.
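Each of those session-level `set` commands maps onto an ordinary <property> entry in hive-site.xml, in the same way the hive.execution.engine snippet above does. A minimal sketch with illustrative placeholder values (the master URL and event log directory are assumptions and must be adapted to the cluster; the event log directory has to exist beforehand):

    <property>
      <name>spark.master</name>
      <value>spark://master:7077</value>   <!-- placeholder; could also be yarn-client -->
    </property>
    <property>
      <name>spark.eventLog.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>spark.eventLog.dir</name>
      <value>/tmp/spark-events</value>     <!-- placeholder; create this directory first -->
    </property>
    <property>
      <name>spark.executor.memory</name>
      <value>512m</value>
    </property>
    <property>
      <name>spark.serializer</name>
      <value>org.apache.spark.serializer.KryoSerializer</value>
    </property>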
>>>> On Mon, Nov 23, 2015 at 3:05 PM, Mich Talebzadeh <m...@peridale.co.uk> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am looking at the set up here:
>>>>
>>>> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
>>>>
>>>> First, this is about configuring Hive to work with Spark. This is my understanding:
>>>>
>>>> 1. Hive uses Yarn as its resource manager regardless.
>>>>
>>>> 2. Hive uses MapReduce as its execution engine by default.
>>>>
>>>> 3. The execution engine can be changed to Spark at the configuration level. If you look at the Hive configuration file -> $HIVE_HOME/conf/hive-site.xml, you will see that the default is mr (MapReduce):
>>>>
>>>> <property>
>>>>   <name>hive.execution.engine</name>
>>>>   *<value>mr</value>*
>>>>   <description>
>>>>     Expects one of [mr, tez].
>>>>     Chooses execution engine. Options are: mr (Map reduce, default) or tez (hadoop 2 only)
>>>>   </description>
>>>> </property>
>>>>
>>>> 4. If you change that to *spark and restart Hive*, you will force Hive to use Spark as its engine. So the choice is either to do it at the configuration level or at the session level (i.e. set hive.execution.engine=spark;). For the rest of the parameters you can do the same, i.e. in hive-site.xml or at the session level. Personally I would still want Hive to use the MR engine, so I will create spark-defaults.conf as mentioned (see the sketch after this message).
>>>>
>>>> 5. I then start Spark as standalone, and that works fine:
>>>>
>>>> *hduser@rhes564::/usr/lib/spark> ./sbin/start-master.sh*
>>>> starting org.apache.spark.deploy.master.Master, logging to /usr/lib/spark/sbin/../logs/spark-hduser-org.apache.spark.deploy.master.Master-1-rhes564.out
>>>> hduser@rhes564::/usr/lib/spark> more /usr/lib/spark/sbin/../logs/spark-hduser-org.apache.spark.deploy.master.Master-1-rhes564.out
>>>> Spark Command: /usr/java/latest/bin/java -cp /usr/lib/spark/sbin/../conf/:/usr/lib/spark/lib/spark-assembly-1.5.2-hadoop2.6.0.jar:/usr/lib/spark/lib/datanucleus-core-3.2.10.jar:/usr/lib/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/lib/spark/lib/datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip rhes564 --port 7077 --webui-port 8080
>>>> ========================================
>>>> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>>>> 15/11/21 21:41:58 INFO Master: Registered signal handlers for [TERM, HUP, INT]
>>>> 15/11/21 21:41:58 WARN Utils: Your hostname, rhes564 resolves to a loopback address: 127.0.0.1; using 50.140.197.217 instead (on interface eth0)
>>>> 15/11/21 21:41:58 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
>>>> 15/11/21 21:41:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>> 15/11/21 21:41:59 INFO SecurityManager: Changing view acls to: hduser
>>>> 15/11/21 21:41:59 INFO SecurityManager: Changing modify acls to: hduser
>>>> 15/11/21 21:41:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hduser); users with modify permissions: Set(hduser)
>>>> 15/11/21 21:41:59 INFO Slf4jLogger: Slf4jLogger started
>>>> 15/11/21 21:42:00 INFO Remoting: Starting remoting
>>>> 15/11/21 21:42:00 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@rhes564:7077]
>>>> 15/11/21 21:42:00 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
>>>> 15/11/21 21:42:00 INFO Master: Starting Spark master at spark://rhes564:7077
>>>> 15/11/21 21:42:00 INFO Master: Running Spark version 1.5.2
>>>> 15/11/21 21:42:00 INFO Utils: Successfully started service 'MasterUI' on port 8080.
>>>> 15/11/21 21:42:00 INFO MasterWebUI: Started MasterWebUI at http://50.140.197.217:8080
>>>> 15/11/21 21:42:00 INFO Utils: Successfully started service on port 6066.
>>>> 15/11/21 21:42:00 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066
>>>> 15/11/21 21:42:00 INFO Master: I have been elected leader! New state: ALIVE
>>>>
>>>> 6. Then I try to start the interactive spark-shell and it fails with the error that I reported before:
>>>>
>>>> *hduser@rhes564::/usr/lib/spark/bin> ./spark-shell --master spark://rhes564:7077*
>>>> log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
>>>> log4j:WARN Please initialize the log4j system properly.
>>>> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
>>>> Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
>>>> To adjust logging level use sc.setLogLevel("INFO")
>>>> Welcome to
>>>>       ____              __
>>>>      / __/__  ___ _____/ /__
>>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
>>>>       /_/
>>>>
>>>> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_25)
>>>> Type in expressions to have them evaluated.
>>>> Type :help for more information.
>>>> 15/11/23 09:33:56 WARN Utils: Your hostname, rhes564 resolves to a loopback address: 127.0.0.1; using 50.140.197.217 instead (on interface eth0)
>>>> 15/11/23 09:33:56 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
>>>> 15/11/23 09:33:57 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
>>>> Spark context available as sc.
>>>> 15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.server2.thrift.http.min.worker.threads does not exist
>>>> 15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.mapjoin.optimized.keys does not exist
>>>> 15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.mapjoin.lazy.hashtable does not exist
>>>> 15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.server2.thrift.http.max.worker.threads does not exist
>>>> 15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.server2.logging.operation.verbose does not exist
>>>> 15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.optimize.multigroupby.common.distincts does not exist
>>>> *java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------*
>>>>
>>>> That is where I am now, and I have reported this to the Spark user group but no luck yet.
>>>>
>>>> Mich Talebzadeh
>>>>
>>>> http://talebzadehmich.wordpress.com
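For that last error the message itself points at the fix: the /tmp/hive scratch directory on HDFS is only writable by its owner. A common remedy is to widen its permissions; this is a sketch, assuming the hduser account is allowed to administer HDFS and that /tmp/hive really is the scratch directory in use (a more restrictive mode may be enough depending on which users run queries):

    hdfs dfs -mkdir -p /tmp/hive
    hdfs dfs -chmod -R 777 /tmp/hive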
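And for point 4 above, the spark-defaults.conf sketch referred to: keeping hive.execution.engine=mr in hive-site.xml and putting the Spark settings in a spark-defaults.conf that Hive can pick up (the getting-started page linked above describes where it needs to live) might look roughly like this. The master URL and event log directory are placeholders:

    spark.master              spark://rhes564:7077
    spark.eventLog.enabled    true
    spark.eventLog.dir        /tmp/spark-events
    spark.executor.memory     512m
    spark.serializer          org.apache.spark.serializer.KryoSerializer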
>>>> *From:* Dasun Hegoda [mailto:dasunheg...@gmail.com]
>>>> *Sent:* 23 November 2015 07:05
>>>> *To:* user@hive.apache.org
>>>> *Subject:* Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu
>>>>
>>>> Anyone????
>>>>
>>>> On Sat, Nov 21, 2015 at 1:32 PM, Dasun Hegoda <dasunheg...@gmail.com> wrote:
>>>>
>>>> Thank you very much, but I would like to do the integration of these components myself rather than using a packaged distribution. I think I have come to the right place. Can you please kindly tell me the configuration steps to run Hive on Spark?
>>>>
>>>> At least someone please elaborate on these steps:
>>>> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
>>>>
>>>> Because in the latter part of the guide, configurations are set in the Hive runtime shell, which is not permanent as far as I know.
>>>>
>>>> Please help me to get this done. I'm also planning to write a detailed guide with configuration steps to run Hive on Spark, so that others can benefit from it and not be troubled like I was.
>>>>
>>>> Can someone please kindly tell me the configuration steps to run Hive on Spark?
>>>>
>>>> On Sat, Nov 21, 2015 at 12:28 PM, Sai Gopalakrishnan <sai.gopalakrish...@aspiresys.com> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> Thank you for your responses. I think Mich's suggestion is a great one; I will go with it. As Alan suggested, using the compactor in Hive should help with managing the delta files.
>>>>
>>>> @Dasun, pardon me for deviating from the topic. Regarding configuration, you could try a packaged distribution (Hortonworks, Cloudera or MapR) like Jörn Franke said. I use Hortonworks; it's open source and compatible with Linux and Windows, provides detailed documentation for installation, and can be installed in less than a day provided you're all set with the hardware.
>>>> http://hortonworks.com/hdp/downloads/
>>>>
>>>> Regards,
>>>> Sai
>>>>
>>>> *From:* Dasun Hegoda <dasunheg...@gmail.com>
>>>> *Sent:* Saturday, November 21, 2015 8:00 AM
>>>> *To:* user@hive.apache.org
>>>> *Subject:* Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu
>>>>
>>>> Hi Mich, Hi Sai, Hi Jörn,
>>>>
>>>> Thank you very much for the information. I think we are deviating from the original question: Hive on Spark on Ubuntu. Can you please kindly tell me the configuration steps?
>>>>
>>>> On Fri, Nov 20, 2015 at 11:10 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>>>
>>>> I think the most recent versions of Cloudera or Hortonworks should include all these components - try their Sandboxes.
>>>>
>>>> On 20 Nov 2015, at 12:54, Dasun Hegoda <dasunheg...@gmail.com> wrote:
>>>>
>>>> Where can I get a Hadoop distribution containing these technologies? Link?
>>>> On Fri, Nov 20, 2015 at 5:22 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>>>
>>>> I recommend using a Hadoop distribution containing these technologies. I think you also get other useful tools for your scenario, such as auditing using Sentry or Ranger.
>>>>
>>>> On 20 Nov 2015, at 10:48, Mich Talebzadeh <m...@peridale.co.uk> wrote:
>>>>
>>>> Well,
>>>>
>>>> "I'm planning to deploy Hive on Spark but I can't find the installation steps. I tried to read the official '[Hive on Spark][1]' guide but it has problems. As an example it says under 'Configuring Yarn' `yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler` but does not say where I should do it. Also as per the guide configurations are set in the Hive runtime shell which is not permanent according to my knowledge."
>>>>
>>>> You can do that in the yarn-site.xml file, which is normally under $HADOOP_HOME/etc/hadoop.
>>>>
>>>> HTH
>>>>
>>>> Mich Talebzadeh
>>>>
>>>> http://talebzadehmich.wordpress.com
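As a concrete illustration of that answer, the scheduler line from the guide goes into $HADOOP_HOME/etc/hadoop/yarn-site.xml as an ordinary property, roughly like this (a sketch; the ResourceManager has to be restarted for it to take effect):

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>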
>>>> *From:* Dasun Hegoda [mailto:dasunheg...@gmail.com]
>>>> *Sent:* 20 November 2015 09:36
>>>> *To:* user@hive.apache.org
>>>> *Subject:* Hive on Spark - Hadoop 2 - Installation - Ubuntu
>>>>
>>>> Hi,
>>>>
>>>> What I'm planning to do is develop a reporting platform using existing data. I have an existing RDBMS which has a large number of records, so I'm using the following stack (
>>>> http://stackoverflow.com/questions/33635234/hadoop-2-7-spark-hive-jasperreports-scoop-architecuture):
>>>>
>>>> - Sqoop - Extract data from the RDBMS to Hadoop
>>>> - Hadoop - Storage platform -> *Deployment Completed*
>>>> - Hive - Data warehouse
>>>> - Spark - Real-time processing -> *Deployment Completed*
>>>>
>>>> I'm planning to deploy Hive on Spark but I can't find the installation steps. I tried to read the official '[Hive on Spark][1]' guide but it has problems. As an example it says under 'Configuring Yarn' `yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler` but does not say where I should do it. Also, as per the guide, configurations are set in the Hive runtime shell, which is not permanent as far as I know.
>>>>
>>>> I also read [this][2], but it does not have any steps.
>>>>
>>>> Could you please provide the steps to run Hive on Spark on Ubuntu as a production system?
>>>>
>>>> [1]: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
>>>> [2]: http://stackoverflow.com/questions/26018306/how-to-configure-hive-to-use-spark
>>>>
>>>> --
>>>> Regards,
>>>> Dasun Hegoda, Software Engineer
>>>> www.dasunhegoda.com | dasunheg...@gmail.com