Thank you, Krishna! Could you please explain why I need to install Spark
on each node, if the Spark official site says: "If you have a Hadoop 2
cluster, you can run Spark without any installation needed"?

I have HDP 2 (YARN), and that's why I hope I don't need to install Spark
on each node.

Thank you,
Konstantin Kudryavtsev
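For context: on YARN, the spark-submit client uploads the Spark assembly
jar and the application jar to the cluster, and the driver and executors
then run inside ordinary YARN containers, so no per-node Spark install is
strictly required. A minimal sketch of submitting from a single
gateway/edge node, assuming the pre-built Spark 1.0 package is unpacked
under /opt/spark on that node only (the paths are illustrative, not from
the thread):

    # On the gateway node only; HADOOP_CONF_DIR must point at the
    # cluster's Hadoop configuration so Spark can find the
    # ResourceManager and HDFS.
    export HADOOP_CONF_DIR=/etc/hadoop/conf

    # spark-submit ships the Spark assembly and the application jar to
    # HDFS and asks YARN to launch the ApplicationMaster (the driver, in
    # yarn-cluster mode) and the executors in containers on the cluster.
    /opt/spark/bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master yarn-cluster \
      /opt/spark/lib/spark-examples-1.0.0-hadoop2.2.0.jar 10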
On Mon, Jul 7, 2014 at 1:57 PM, Krishna Sankar <ksanka...@gmail.com> wrote:

> Konstantin,
>
> 1. You need to install the Hadoop RPMs on all nodes. If it is Hadoop 2,
>    the nodes would have HDFS & YARN.
> 2. Then you need to install Spark on all nodes. I haven't had experience
>    with HDP, but the tech preview might have installed Spark as well.
> 3. In the end, one should have HDFS, YARN & Spark installed on all the
>    nodes.
> 4. After installation, check the web console to make sure HDFS, YARN &
>    Spark are running.
> 5. Then you are ready to start experimenting with/developing Spark
>    applications.
>
> HTH.
> Cheers
> <k/>
>
>
> On Mon, Jul 7, 2014 at 2:34 AM, Konstantin Kudryavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
>
>> Guys, I'm not talking about running Spark on a VM; I don't have a
>> problem with that.
>>
>> What confuses me is the following:
>> 1) Hortonworks describes the installation process as RPMs on each node.
>> 2) The Spark home page says that everything I need is YARN.
>>
>> And I'm stuck trying to understand what I need to do to run Spark on
>> YARN (do I need RPM installations, or only to build Spark on an edge
>> node?).
>>
>>
>> Thank you,
>> Konstantin Kudryavtsev
>>
>>
>> On Mon, Jul 7, 2014 at 4:34 AM, Robert James <srobertja...@gmail.com>
>> wrote:
>>
>>> I can say from my experience that getting Spark to work with Hadoop 2
>>> is not for the beginner; after solving one problem after another
>>> (dependencies, scripts, etc.), I went back to Hadoop 1.
>>>
>>> Spark's Maven build, EC2 scripts, and others all use Hadoop 1 - not
>>> sure why, but, given so, Hadoop 2 has too many bumps.
>>>
>>> On 7/6/14, Marco Shaw <marco.s...@gmail.com> wrote:
>>> > That is confusing based on the context you provided.
>>> >
>>> > This might take more time than I can spare to try to understand.
>>> >
>>> > For sure, you need to add Spark to run it in/on the HDP 2.1 express
>>> > VM.
>>> >
>>> > Cloudera's CDH 5 express VM includes Spark, but the service isn't
>>> > running by default.
>>> >
>>> > I can't remember for MapR...
>>> >
>>> > Marco
>>> >
>>> >> On Jul 6, 2014, at 6:33 PM, Konstantin Kudryavtsev
>>> >> <kudryavtsev.konstan...@gmail.com> wrote:
>>> >>
>>> >> Marco,
>>> >>
>>> >> Hortonworks provides a Tech Preview of Spark 0.9.1 with HDP 2.1
>>> >> that you can try, from
>>> >> http://hortonworks.com/wp-content/uploads/2014/05/SparkTechnicalPreview.pdf
>>> >> HDP 2.1 means YARN; at the same time they propose to install an RPM.
>>> >>
>>> >> On the other hand, http://spark.apache.org/ says:
>>> >> "Integrated with Hadoop
>>> >> Spark can run on Hadoop 2's YARN cluster manager, and can read any
>>> >> existing Hadoop data.
>>> >>
>>> >> If you have a Hadoop 2 cluster, you can run Spark without any
>>> >> installation needed."
>>> >>
>>> >> And this is confusing for me... do I need an RPM installation or
>>> >> not?...
>>> >>
>>> >>
>>> >> Thank you,
>>> >> Konstantin Kudryavtsev
>>> >>
>>> >>
>>> >>> On Sun, Jul 6, 2014 at 10:56 PM, Marco Shaw <marco.s...@gmail.com>
>>> >>> wrote:
>>> >>> Can you provide links to the sections that are confusing?
>>> >>>
>>> >>> My understanding is that the HDP1 binaries do not need YARN, while
>>> >>> the HDP2 binaries do.
>>> >>>
>>> >>> Now, you can also install the Hortonworks Spark RPM...
>>> >>>
>>> >>> For production, in my opinion, RPMs are better for manageability.
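Krishna's step 4 above can also be done from the shell; a quick sanity
check before submitting anything, assuming the Hadoop 2 client tools are
on the PATH and the web UIs run on their default ports (the hostname is
a placeholder):

    # Confirm HDFS is up and the DataNodes are reporting in.
    hdfs dfsadmin -report

    # Confirm the NodeManagers have registered with the ResourceManager.
    yarn node -list

    # Web consoles on default ports: ResourceManager UI on 8088,
    # NameNode UI on 50070.
    curl -s http://resourcemanager-host:8088/cluster | head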
>>> >>>
>>> >>>> On Jul 6, 2014, at 5:39 PM, Konstantin Kudryavtsev
>>> >>>> <kudryavtsev.konstan...@gmail.com> wrote:
>>> >>>>
>>> >>>> Hello, thanks for your message... I'm confused: Hortonworks
>>> >>>> suggests installing the Spark RPM on each node, but the Spark
>>> >>>> main page says that YARN is enough and I don't need to install
>>> >>>> it... What's the difference?
>>> >>>>
>>> >>>> sent from my HTC
>>> >>>>
>>> >>>>> On Jul 6, 2014 8:34 PM, "vs" <vinayshu...@gmail.com> wrote:
>>> >>>>> Konstantin,
>>> >>>>>
>>> >>>>> HWRK provides a Tech Preview of Spark 0.9.1 with HDP 2.1 that
>>> >>>>> you can try from
>>> >>>>> http://hortonworks.com/wp-content/uploads/2014/05/SparkTechnicalPreview.pdf
>>> >>>>>
>>> >>>>> Let me know if you see issues with the tech preview.
>>> >>>>>
>>> >>>>> "spark PI example on HDP 2.0
>>> >>>>>
>>> >>>>> I downloaded the Spark 1.0 pre-built package from
>>> >>>>> http://spark.apache.org/downloads.html (for HDP2).
>>> >>>>> Then I ran the example from the Spark web site:
>>> >>>>> ./bin/spark-submit --class org.apache.spark.examples.SparkPi
>>> >>>>> --master yarn-cluster --num-executors 3 --driver-memory 2g
>>> >>>>> --executor-memory 2g --executor-cores 1
>>> >>>>> ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 2
>>> >>>>>
>>> >>>>> I got this error:
>>> >>>>> Application application_1404470405736_0044 failed 3 times due to
>>> >>>>> AM Container for appattempt_1404470405736_0044_000003 exited
>>> >>>>> with exitCode: 1
>>> >>>>> due to: Exception from container-launch:
>>> >>>>> org.apache.hadoop.util.Shell$ExitCodeException:
>>> >>>>> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>>> >>>>> at org.apache.hadoop.util.Shell.run(Shell.java:379)
>>> >>>>> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>>> >>>>> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>>> >>>>> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
>>> >>>>> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
>>> >>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>> >>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>>>> at java.lang.Thread.run(Thread.java:744)
>>> >>>>> .Failing this attempt.. Failing the application.
>>> >>>>>
>>> >>>>> Unknown/unsupported param List(--executor-memory, 2048,
>>> >>>>> --executor-cores, 1, --num-executors, 3)
>>> >>>>> Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options]
>>> >>>>> Options:
>>> >>>>>   --jar JAR_PATH     Path to your application's JAR file (required)
>>> >>>>>   --class CLASS_NAME Name of your application's main class (required)
>>> >>>>> ...bla-bla-bla
>>> >>>>> "
>>> >>>>>
>>> >>>>>
>>> >>>>> --
>>> >>>>> View this message in context:
>>> >>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-run-Spark-1-0-SparkPi-on-HDP-2-0-tp8802p8873.html
>>> >>>>> Sent from the Apache Spark User List mailing list archive at
>>> >>>>> Nabble.com.
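The "Unknown/unsupported param" usage text above is printed by the Spark
ApplicationMaster when it receives flags it does not recognize; Spark
0.9.x AMs predate --executor-memory/--executor-cores/--num-executors, so
it usually indicates a version mismatch where an older assembly (e.g.
the 0.9.1 tech preview already present on the cluster) ends up launching
the AM instead of the 1.0.0 build being submitted. A hedged sketch of
pinning the client to its own 1.0.0 assembly, assuming the tarball is
unpacked at /opt/spark-1.0.0 and that the Spark 1.0 YARN client honors
the SPARK_JAR environment variable (both the path and that variable are
assumptions, not confirmed in the thread):

    # Force the YARN client to upload the 1.0.0 assembly, so the 1.0.0
    # ApplicationMaster (which understands the flags above) is the one
    # launched in the AM container.
    export SPARK_JAR=/opt/spark-1.0.0/lib/spark-assembly-1.0.0-hadoop2.2.0.jar
    export HADOOP_CONF_DIR=/etc/hadoop/conf

    /opt/spark-1.0.0/bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master yarn-cluster \
      --num-executors 3 --driver-memory 2g \
      --executor-memory 2g --executor-cores 1 \
      /opt/spark-1.0.0/lib/spark-examples-1.0.0-hadoop2.2.0.jar 2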