You can search last month's mailing list archive for the thread "Do you have more suggestions on when to use Hive on MapReduce or Hive on Spark?" I hope that is of some help to you.
Best wishes.

2015-12-08 6:18 GMT+08:00 Ashok Kumar <ashok34...@yahoo.com>:

> This is great news, sir. It shows perseverance pays at last.
>
> Can you let us know when the write-up is ready, so I can set it up as well, please?
>
> I know a bit about the advantages of having Hive use the Spark engine. However, the general question I have is: when should one use Hive on Spark as opposed to Hive on the MapReduce engine?
>
> Thanks again
>
>
> On Monday, 7 December 2015, 15:50, Mich Talebzadeh <m...@peridale.co.uk> wrote:
>
> For those interested
>
> From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
> Sent: 06 December 2015 20:33
> To: user@hive.apache.org
> Subject: Managed to make Hive run on Spark engine
>
> Thanks to all, especially to Xuefu, for the contributions. Finally it works, which means don't give up until it works :)
>
> hduser@rhes564::/usr/lib/hive/lib> hive
> Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
> hive> set spark.home=/usr/lib/spark-1.3.1-bin-hadoop2.6;
> hive> set hive.execution.engine=spark;
> hive> set spark.master=spark://50.140.197.217:7077;
> hive> set spark.eventLog.enabled=true;
> hive> set spark.eventLog.dir=/usr/lib/spark-1.3.1-bin-hadoop2.6/logs;
> hive> set spark.executor.memory=512m;
> hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
> hive> set hive.spark.client.server.connect.timeout=220000ms;
> hive> set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec;
> hive> use asehadoop;
> OK
> Time taken: 0.638 seconds
> hive> select count(1) from t;
> Query ID = hduser_20151206200528_4b85889f-e4ca-41d2-9bd2-1082104be42b
> Total jobs = 1
> Launching Job 1 out of 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Starting Spark Job = c8fee86c-0286-4276-aaa1-2a5eb4e4958a
>
> Query Hive on Spark job[0] stages:
> 0
> 1
>
> Status: Running (Hive on Spark job[0])
> Job Progress Format
> CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
> 2015-12-06 20:05:36,299 Stage-0_0: 0(+1)/1  Stage-1_0: 0/1
> 2015-12-06 20:05:39,344 Stage-0_0: 1/1 Finished  Stage-1_0: 0(+1)/1
> 2015-12-06 20:05:40,350 Stage-0_0: 1/1 Finished  Stage-1_0: 1/1 Finished
> Status: Finished successfully in 8.10 seconds
> OK
>
> The versions used for this project:
>
> OS: Linux version 2.6.18-92.el5xen (brewbuil...@ls20-bc2-13.build.redhat.com) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-41)) #1 SMP Tue Apr 29 13:31:30 EDT 2008
> Hadoop 2.6.0
> Hive 1.2.1
> spark-1.3.1-bin-hadoop2.6 (downloaded as the prebuilt spark-1.3.1-bin-hadoop2.6.gz, used to start the Spark standalone cluster)
>
> The jar file placed in $HIVE_HOME/lib to link Hive to Spark was spark-assembly-1.3.1-hadoop2.4.0.jar, built from the source downloaded as the zipped file spark-1.3.1.gz with the command line:
>   make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided"
>
> It is pretty picky about parameters, CLASSPATH, IP addresses or hostnames, etc. to make it work.
>
> I will create a full guide on how to build Hive and make it run with Spark as its engine (as opposed to MR).
>
> HTH
>
> Mich Talebzadeh
>
> Sybase ASE 15 Gold Medal Award 2008
> A Winning Strategy: Running the most Critical Financial Data on ASE 15
> http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
> Author of the books "A Practitioner's Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7.
> Co-author of "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4
> Publications due shortly:
> Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8
> Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly
>
> http://talebzadehmich.wordpress.com/
>
> NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only; if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free; therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility.
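As a side note for anyone setting this up: rather than issuing the `set` commands at every Hive session, the same properties from Mich's transcript could be persisted in hive-site.xml. A minimal sketch, assuming the same paths, master URL, and Spark 1.3.1 layout quoted above (adjust values to your own cluster):

```xml
<!-- Sketch of hive-site.xml entries mirroring the session-level "set"
     commands in the transcript above. Paths and the master URL are the
     ones from Mich's environment; substitute your own. -->
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>spark.home</name>
  <value>/usr/lib/spark-1.3.1-bin-hadoop2.6</value>
</property>
<property>
  <name>spark.master</name>
  <value>spark://50.140.197.217:7077</value>
</property>
<property>
  <name>spark.eventLog.enabled</name>
  <value>true</value>
</property>
<property>
  <name>spark.eventLog.dir</name>
  <value>/usr/lib/spark-1.3.1-bin-hadoop2.6/logs</value>
</property>
<property>
  <name>spark.executor.memory</name>
  <value>512m</value>
</property>
<property>
  <name>spark.serializer</name>
  <value>org.apache.spark.serializer.KryoSerializer</value>
</property>
<property>
  <name>hive.spark.client.server.connect.timeout</name>
  <value>220000ms</value>
</property>
```

This is just a convenience; the session-level `set` commands shown in the thread remain useful for experimenting before committing values to the site file.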