That worked. Why? Can you share a comprehensive list of examples?
On Mon, Aug 3, 2015 at 4:59 PM, Alex <abezzu...@nflabs.com> wrote:

> Hi,
>
> Inside %spark you do not need to create a SQLContext manually: as with
> "sc" for the SparkContext, the interpreter has already injected a "sqlc"
> val.
>
> Also, AFAIK the println statement should be in a separate paragraph.
>
> Can you try that and see if it helps?
>
> --
> Kind regards,
> Alexander
>
> On 04 Aug 2015, at 05:58, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>
> I am unable to see the visualization with Zeppelin from this blog:
> http://hortonworks.com/blog/introduction-to-data-science-with-apache-spark/
>
> Notebook:
>
> %spark
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.implicits._
> import java.sql.Date
> import org.apache.spark.sql.Row
>
> case class Log(level: String, date: Date, fileName: String)
>
> import java.text.SimpleDateFormat
>
> val df = new SimpleDateFormat("yyyy-mm-dd HH:mm:ss,SSS")
>
> val ambari = ambariLogs.map { line =>
>   val s = line.split(" ")
>   val logLevel = s(0)
>   val dateTime = df.parse(s(1) + " " + s(2))
>   val fileName = s(3).split(":")(0)
>   Log(logLevel, new Date(dateTime.getTime()), fileName)
> }.toDF()
> ambari.registerTempTable("ambari")
>
> //ambari.groupBy("level").count()
> sqlContext.sql("SELECT COUNT(*) from ambari")
>
> Output:
>
> sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@5ca68ee6
> import sqlContext.implicits._
> import java.sql.Date
> import org.apache.spark.sql.Row
> defined class Log
> import java.text.SimpleDateFormat
> df: java.text.SimpleDateFormat = java.text.SimpleDateFormat@98f267e7
> ambari: org.apache.spark.sql.DataFrame = [level: string, date: date, fileName: string]
> res74: org.apache.spark.sql.DataFrame = [c0: bigint]
>
> Hence the table ambari is created successfully.
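One detail worth flagging in the notebook code above: the date pattern "yyyy-mm-dd HH:mm:ss,SSS" uses lowercase "mm", which java.text.SimpleDateFormat treats as minutes; the month field is "MM". A minimal sketch of the same parsing logic, runnable outside Spark (the sample log line below is made up for illustration, not taken from a real Ambari log):

```scala
import java.text.SimpleDateFormat

// Same shape as the case class in the notebook above.
case class Log(level: String, date: java.sql.Date, fileName: String)

// "MM" (month), not "mm" (minutes) as in the original pattern.
val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS")

// Mirrors the map body from the notebook: level, timestamp, file name.
def parseLine(line: String): Log = {
  val s = line.split(" ")
  val dateTime = fmt.parse(s(1) + " " + s(2))
  Log(s(0), new java.sql.Date(dateTime.getTime()), s(3).split(":")(0))
}

// Hypothetical log line in the format the split expects:
val log = parseLine("INFO 2015-08-03 13:30:50,262 Controller.py:123 Heartbeat")
println(log.level)    // INFO
println(log.fileName) // Controller.py
```

With the lowercase "mm" pattern the parse still succeeds, but every row gets January as the month, which would silently skew any date-based grouping later.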
>
> In a new note, I wrote this:
>
> %spark
> import org.apache.spark.sql.Row
>
> val result = sqlContext.sql("SELECT level, COUNT(1) from ambari group by level").map {
>   case Row(level: String, count: Long) => {
>     level + "\t" + count
>   }
> }.collect()
>
> println("%table Log Level\tCount\n" + result.mkString("\n"))
>
> Output:
>
> import org.apache.spark.sql.Row
> result: Array[String] = Array(INFO 2444, WARNING 3)
> %table Log Level Count
> INFO 2444
> WARNING 3
>
> I did not get table/graph rendering even though I am outputting %table
> from println.
>
> Any suggestions?
>
> On Mon, Aug 3, 2015 at 1:47 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>
>> Fixed it:
>>
>> mvn clean package -Pspark-1.3 -Dspark.version=1.3.1 -Dhadoop.version=2.7.0 -Phadoop-2.6 -Pyarn -DskipTests
>>
>> Earlier I had:
>>
>> mvn clean install -DskipTests -Pspark-1.3 -Dspark.version=1.3.1 -Phadoop-2.7 -Pyarn
>>
>> On Mon, Aug 3, 2015 at 1:31 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>
>>> I have a Hadoop cluster up using Ambari. It also allowed me to install
>>> Spark 1.3.1, and I can run sample Spark and YARN applications, so the
>>> cluster is up and running fine.
>>>
>>> I got Zeppelin set up on a new box and was able to launch the UI.
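On the %table question above: Zeppelin switches to table rendering only when the paragraph's output itself begins with "%table". When the imports and vals sit in the same paragraph as the println, their REPL echoes ("import ...", "result: Array[String] = ...") come first and push the marker off the start, which is the likely reason Alexander suggests putting the println in a separate paragraph. A small sketch of building the expected payload, using the counts quoted in the thread:

```scala
// The %table payload: marker first, tab-separated columns,
// newline-separated rows. Counts are the INFO/WARNING totals
// reported earlier in this thread.
val rows = Seq(("INFO", 2444L), ("WARNING", 3L))

val table = "%table Log Level\tCount\n" +
  rows.map { case (level, count) => level + "\t" + count }.mkString("\n")

// In Zeppelin this println would be the only statement in its paragraph,
// so the paragraph output starts exactly with "%table".
println(table)
```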
>>>
>>> I modified the Spark interpreter to set:
>>>
>>> master                          yarn-client
>>> spark.app.name                  Zeppelin
>>> spark.cores.max                 (blank)
>>> spark.driver.extraJavaOptions   -Dhdp.version=2.3.1.0-2574
>>> spark.executor.memory           512m
>>> spark.home                      /usr/hdp/2.3.1.0-2574/spark
>>> spark.yarn.am.extraJavaOptions  -Dhdp.version=2.3.1.0-2574
>>> spark.yarn.jar                  /home/zeppelin/incubator-zeppelin/interpreter/spark/zeppelin-spark-0.6.0-incubating-SNAPSHOT.jar
>>> zeppelin.dep.localrepo          local-repo
>>>
>>> When I run a Spark notebook:
>>>
>>> %spark
>>> val ambariLogs = sc.textFile("file:///var/log/ambari-agent/ambari-agent.log")
>>> ambariLogs.take(10).mkString("\n")
>>>
>>> (The location exists.)
>>>
>>> I see two exceptions in the Zeppelin Spark interpreter logs:
>>>
>>> ERROR [2015-08-03 13:30:50,262] ({pool-1-thread-2} ProcessFunction.java[process]:41) - Internal error processing getProgress
>>> java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$
>>>   at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:38)
>>>   at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:55)
>>>   at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
>>>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:381)
>>>   at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:301)
>>>   at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
>>>   at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:423)
>>>   at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
>>>   at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
>>>   at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:109)
>>>   at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getProgress(RemoteInterpreterServer.java:298)
>>>   at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1068)
>>>   at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1053)
>>>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>>>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>>>   at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>   at java.lang.Thread.run(Thread.java:745)
>>>
>>> AND
>>>
>>> WARN [2015-08-03 13:30:50,085] ({pool-1-thread-2} Logging.scala[logWarning]:71) - Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
>>> INFO [2015-08-03 13:30:50,112] ({pool-1-thread-2} Server.java[doStart]:272) - jetty-8.y.z-SNAPSHOT
>>> WARN [2015-08-03 13:30:50,123] ({pool-1-thread-2} AbstractLifeCycle.java[setFailed]:204) - FAILED SelectChannelConnector@0.0.0.0:4042: java.net.BindException: Address already in use
>>> java.net.BindException: Address already in use
>>>   at sun.nio.ch.Net.bind0(Native Method)
>>>   at sun.nio.ch.Net.bind(Net.java:444)
>>>   at sun.nio.ch.Net.bind(Net.java:436)
>>>   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>>>
>>> Any suggestions?
>>>
>>> On Mon, Aug 3, 2015 at 11:00 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>
>>>> Thanks a lot for all these documents. Appreciate your effort & time.
>>>>
>>>> On Mon, Aug 3, 2015 at 10:15 AM, Christian Tzolov <ctzo...@pivotal.io> wrote:
>>>>
>>>>> ÐΞ€ρ@Ҝ (๏̯͡๏),
>>>>>
>>>>> I've successfully run Zeppelin with Spark on YARN. I'm using Ambari
>>>>> and PivotalHD30. PHD30 is ODP compliant, so you should be able to
>>>>> repeat the configuration for HDP (e.g. Hortonworks).
>>>>>
>>>>> 1. Before you start with Zeppelin, make sure that your Spark/YARN
>>>>> environment works from the command line (e.g. run the Pi test). If it
>>>>> doesn't work, make sure that hdp.version is set correctly, or hardcode
>>>>> the stack.name and stack.version properties as Ambari custom yarn-site
>>>>> properties (that is what I did).
>>>>>
>>>>> 2. Your Zeppelin should be built with the proper Spark and Hadoop
>>>>> versions and YARN support enabled. In my case I used this build command:
>>>>>
>>>>> mvn clean package -Pspark-1.4 -Dspark.version=1.4.1 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -DskipTests -Pbuild-distr
>>>>>
>>>>> 3. Open the Spark interpreter configuration and set the 'master'
>>>>> property to 'yarn-client' (i.e. master=yarn-client), then press Save.
>>>>>
>>>>> 4. In conf/zeppelin-env.sh, set HADOOP_CONF_DIR; for PHD and HDP it
>>>>> will look like this:
>>>>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>>>>
>>>>> 5. (Optional) I restarted the Zeppelin daemon, but I don't think this
>>>>> is required.
>>>>>
>>>>> 6. Make sure the /user/<zeppelin user> folder exists in HDFS and the
>>>>> Zeppelin user has HDFS write permissions on it. Otherwise you can
>>>>> create it like this:
>>>>> sudo -u hdfs hdfs dfs -mkdir /user/<zeppelin user>
>>>>> sudo -u hdfs hdfs dfs -chown -R <zeppelin user>:hdfs /user/<zeppelin user>
>>>>>
>>>>> Good to go!
>>>>>
>>>>> Cheers,
>>>>> Christian
>>>>>
>>>>> On 3 August 2015 at 17:50, Vadla, Karthik <karthik.va...@intel.com> wrote:
>>>>>
>>>>>> Hi Deepak,
>>>>>>
>>>>>> I have documented everything here.
>>>>>> Please check the published document.
>>>>>>
>>>>>> https://software.intel.com/sites/default/files/managed/bb/bf/Apache-Zeppelin.pdf
>>>>>>
>>>>>> Thanks
>>>>>> Karthik Vadla
>>>>>>
>>>>>> *From:* ÐΞ€ρ@Ҝ (๏̯͡๏) [mailto:deepuj...@gmail.com]
>>>>>> *Sent:* Sunday, August 2, 2015 9:25 PM
>>>>>> *To:* users@zeppelin.incubator.apache.org
>>>>>> *Subject:* Yarn + Spark + Zepplin ?
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I would like to try out Zeppelin, so I have a 7-node Hadoop cluster
>>>>>> with the Spark history server set up. I am able to run sample Spark
>>>>>> applications on my YARN cluster.
>>>>>>
>>>>>> I have no clue how to get Zeppelin to connect to this YARN cluster.
>>>>>> Under https://zeppelin.incubator.apache.org/docs/install/install.html
>>>>>> I see MASTER should point to a Spark master, but I do not have a
>>>>>> Spark master running.
>>>>>>
>>>>>> How do I get Zeppelin to read data from the YARN cluster? Please
>>>>>> share documentation.
>>>>>>
>>>>>> Regards,
>>>>>> Deepak
>>>>>
>>>>> --
>>>>> Christian Tzolov <http://www.linkedin.com/in/tzolov> | Solution
>>>>> Architect, EMEA Practice Team | Pivotal <http://pivotal.io/>
>>>>> ctzo...@pivotal.io | +31610285517

--
Deepak
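For anyone landing on this thread later: Christian's steps 1 and 6 above, sketched as shell commands. This is a config/ops fragment, not something to copy verbatim: the hdp.version, Spark home, example-jar path, and the "zeppelin" user name are assumptions based on the versions quoted in this thread, so adjust them for your stack.

```shell
# Step 1 smoke test: run the SparkPi example on YARN from the command line
# before touching Zeppelin. Paths assume the HDP 2.3.1.0-2574 layout
# mentioned earlier in the thread.
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_HOME=/usr/hdp/2.3.1.0-2574/spark

"$SPARK_HOME"/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-client \
  --driver-java-options "-Dhdp.version=2.3.1.0-2574" \
  "$SPARK_HOME"/lib/spark-examples-*.jar 10

# Step 6: give the Zeppelin user an HDFS home directory
# (replace "zeppelin" with your actual user).
sudo -u hdfs hdfs dfs -mkdir -p /user/zeppelin
sudo -u hdfs hdfs dfs -chown -R zeppelin:hdfs /user/zeppelin
```

If the Pi job fails here with the same YarnSparkHadoopUtil NoClassDefFoundError seen in the interpreter logs above, the problem is in the Spark/YARN build or hdp.version setup, not in Zeppelin.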