I am unable to see the visualization in Zeppelin from this blog post:
http://hortonworks.com/blog/introduction-to-data-science-with-apache-spark/


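Notebook
%spark
// Note: Zeppelin's Spark interpreter normally provides its own sqlContext; creating a
// new one here means tables registered below may not be visible to %sql paragraphs.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
import java.sql.Date
import org.apache.spark.sql.Row

case class Log(level: String, date: Date, fileName: String)

import java.text.SimpleDateFormat

// "MM" is month of year; "mm" would be minutes and produce wrong dates.
val df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS")

// ambariLogs is the RDD[String] read earlier with sc.textFile(...).
// Assumed line layout: LEVEL DATE TIME FILE:LINE - message
val ambari = ambariLogs.map { line =>
  val s = line.split(" ")
  val logLevel = s(0)
  val dateTime = df.parse(s(1) + " " + s(2))
  val fileName = s(3).split(":")(0)
  Log(logLevel, new Date(dateTime.getTime()), fileName)
}.toDF()
ambari.registerTempTable("ambari")

//ambari.groupBy("level").count()
sqlContext.sql("SELECT COUNT(*) FROM ambari")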

Output:

sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@5ca68ee6
import sqlContext.implicits._
import java.sql.Date
import org.apache.spark.sql.Row
defined class Log
import java.text.SimpleDateFormat
df: java.text.SimpleDateFormat = java.text.SimpleDateFormat@98f267e7
ambari: org.apache.spark.sql.DataFrame = [level: string, date: date, fileName: string]
res74: org.apache.spark.sql.DataFrame = [c0: bigint]


So the ambari table was created successfully.
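A minimal sketch of the same parsing step in plain Scala, so it can be tried without a cluster. It assumes, as the notebook code does, that each ambari-agent.log line starts with `LEVEL DATE TIME FILE:LINE`; the object name `AmbariLogParser` is hypothetical. Unlike the notebook version, it returns `None` for lines that do not fit that layout (for example wrapped traceback lines) instead of failing the whole job. Note that the month field must be `MM`; to SimpleDateFormat, `mm` means minutes.

```scala
import java.sql.Date
import java.text.{ParseException, SimpleDateFormat}

// Same shape as the notebook's Log case class.
case class Log(level: String, date: Date, fileName: String)

object AmbariLogParser {
  // A fresh formatter per call, because SimpleDateFormat is not thread-safe.
  // "MM" = month of year, "mm" = minute of hour.
  private def dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS")

  /** Parses one line assumed to look like "LEVEL DATE TIME FILE:LINE - message".
    * Returns None when the line is too short or the timestamp does not parse. */
  def parse(line: String): Option[Log] = {
    val s = line.split(" ")
    if (s.length < 4) None
    else
      try {
        val parsed = dateFormat.parse(s(1) + " " + s(2))
        // Keep only the file name, dropping the ":LINE" suffix.
        val fileName = s(3).takeWhile(_ != ':')
        Some(Log(s(0), new Date(parsed.getTime), fileName))
      } catch {
        case _: ParseException => None
      }
  }
}
```

In the notebook, the map step could then become `ambariLogs.flatMap(line => AmbariLogParser.parse(line).toList).toDF()`, with the sketch's `Log` class standing in for the notebook's.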

In a new note, I wrote this:

%spark
import org.apache.spark.sql.Row

// Count log lines per level and format each as "level<TAB>count".
val result = sqlContext.sql("SELECT level, COUNT(1) FROM ambari GROUP BY level").map {
  case Row(level: String, count: Long) => level + "\t" + count
}.collect()

println("%table Log Level\tCount\n" + result.mkString("\n"))


Output:
import org.apache.spark.sql.Row
result: Array[String] = Array(INFO 2444, WARNING 3)
%table Log Level Count
INFO 2444
WARNING 3

I do not get a rendered graph, even though I am printing %table output with println.

Any suggestions?
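One likely explanation, which this thread does not confirm: Zeppelin seems to decide how to display a paragraph from the very start of its output, and in the output above the REPL echo (the `import ...` and `result: Array[String] = ...` lines) comes before `%table`, so everything is shown as plain text. Two things worth trying are a `%sql` paragraph such as `%sql SELECT level, COUNT(1) FROM ambari GROUP BY level` (this needs the table to be registered in the SQLContext Zeppelin itself uses), or making the printed `%table` text the paragraph's only output. The hypothetical helper below only builds that text: header row first, columns separated by tabs, rows separated by newlines.

```scala
object ZeppelinTable {
  /** Builds a Zeppelin "%table" string: the header row first, then one line
    * per row, with tab-separated columns and newline-separated rows. */
  def render(header: Seq[String], rows: Seq[Seq[Any]]): String = {
    val lines = header.mkString("\t") +: rows.map(_.mkString("\t"))
    "%table " + lines.mkString("\n")
  }
}
```

In the notebook, defining the helper in its own paragraph and then running the query, the collect and the println inside a single `{ ... }` block that evaluates to Unit should leave the REPL nothing to echo before the `%table` line; that REPL behaviour is an assumption worth checking.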


On Mon, Aug 3, 2015 at 1:47 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:

> Fixed it. The build command that worked:
>
> mvn clean package -Pspark-1.3 -Dspark.version=1.3.1 -Dhadoop.version=2.7.0 -Phadoop-2.6 -Pyarn -DskipTests
>
> Earlier I had:
>
> mvn clean install -DskipTests -Pspark-1.3 -Dspark.version=1.3.1 -Phadoop-2.7 -Pyarn
>
> On Mon, Aug 3, 2015 at 1:31 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>
>> I have a Hadoop cluster up, deployed with Ambari. Ambari also allowed me to
>> install Spark 1.3.1, and I can run sample Spark and YARN applications, so the
>> cluster is up and running fine.
>>
>> I set up Zeppelin on a new box and was able to launch the UI.
>>
>> I modified the Spark interpreter to set:
>>
>> master                          yarn-client
>> spark.app.name                  Zeppelin
>> spark.cores.max
>> spark.driver.extraJavaOptions   -Dhdp.version=2.3.1.0-2574
>> spark.executor.memory           512m
>> spark.home                      /usr/hdp/2.3.1.0-2574/spark
>> spark.yarn.am.extraJavaOptions  -Dhdp.version=2.3.1.0-2574
>> spark.yarn.jar                  /home/zeppelin/incubator-zeppelin/interpreter/spark/zeppelin-spark-0.6.0-incubating-SNAPSHOT.jar
>> zeppelin.dep.localrepo          local-repo
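Christian's reply further down says that on HDP the hdp.version value must be set correctly. Below is a hypothetical sanity check in plain Scala for the interpreter properties listed above. It extracts `-Dhdp.version=...` from `spark.driver.extraJavaOptions` and `spark.yarn.am.extraJavaOptions` and reports when either is missing or when the two disagree. The object name and messages are illustrative. The rule that both options should carry the same stack version (as they do in the settings above) is an assumption, not something taken from Spark or Ambari documentation.

```scala
object HdpVersionCheck {
  private val HdpOpt = """-Dhdp\.version=(\S+)""".r

  /** Extracts the hdp.version value from a JVM options string, if present. */
  def hdpVersion(javaOpts: String): Option[String] =
    HdpOpt.findFirstMatchIn(javaOpts).map(_.group(1))

  /** Returns human-readable problems; an empty list means the settings agree. */
  def problems(props: Map[String, String]): List[String] = {
    val driver = props.get("spark.driver.extraJavaOptions").flatMap(hdpVersion)
    val am = props.get("spark.yarn.am.extraJavaOptions").flatMap(hdpVersion)
    (driver, am) match {
      case (Some(d), Some(a)) if d != a => List(s"hdp.version mismatch: driver=$d, AM=$a")
      case (Some(_), Some(_))           => Nil
      case _ =>
        List(
          if (driver.isEmpty) Some("spark.driver.extraJavaOptions lacks -Dhdp.version") else None,
          if (am.isEmpty) Some("spark.yarn.am.extraJavaOptions lacks -Dhdp.version") else None
        ).flatten
    }
  }
}
```

For example, `HdpVersionCheck.problems(Map("spark.driver.extraJavaOptions" -> "-Dhdp.version=2.3.1.0-2574"))` reports only that the AM option is missing.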
>>
>> When I run this Spark paragraph:
>> %spark
>> val ambariLogs = sc.textFile("file:///var/log/ambari-agent/ambari-agent.log")
>> ambariLogs.take(10).mkString("\n")
>>
>> (The location exists)
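Before suspecting Spark, it can help to confirm that the file is readable at all. Below is a minimal sketch in plain Scala that does what `ambariLogs.take(10).mkString("\n")` does, but directly in the JVM where it runs. The object name `LocalPeek` is hypothetical. One caveat: with master=yarn-client, a `file:///` path given to `sc.textFile` is generally read on the cluster side, and Spark's documentation requires a local-filesystem path to be accessible at the same path on the worker nodes, not just on the Zeppelin box.

```scala
import scala.io.Source

object LocalPeek {
  /** Returns the first n lines of a local file joined with newlines,
    * mirroring rdd.take(n).mkString("\n") without Spark. */
  def head(path: String, n: Int = 10): String = {
    val src = Source.fromFile(path)
    try src.getLines().take(n).mkString("\n")
    finally src.close() // always release the file handle
  }
}
```

For example, `println(LocalPeek.head("/var/log/ambari-agent/ambari-agent.log"))` on each node shows whether the path exists there.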
>>
>> I see two exceptions in the Zeppelin Spark interpreter logs:
>>
>> ERROR [2015-08-03 13:30:50,262] ({pool-1-thread-2} ProcessFunction.java[process]:41) - Internal error processing getProgress
>> java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$
>>     at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:38)
>>     at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:55)
>>     at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
>>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:381)
>>     at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:301)
>>     at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
>>     at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:423)
>>     at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
>>     at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
>>     at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:109)
>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getProgress(RemoteInterpreterServer.java:298)
>>     at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1068)
>>     at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1053)
>>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
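"Could not initialize class" means the JVM already tried, and failed, to run the static initializer of `YarnSparkHadoopUtil$`; the original cause is reported only on that first attempt, as an `ExceptionInInitializerError`. Below is a hypothetical diagnostic sketch for a JVM with the same classpath as the Zeppelin Spark interpreter. It tells a missing class apart from one whose initialization fails and surfaces the underlying cause. It relies only on `Class.forName`, which loads and initializes the named class. In this thread the problem went away after rebuilding Zeppelin with different Hadoop build flags.

```scala
object ClassProbe {
  sealed trait Result
  case object Loaded extends Result
  case object Missing extends Result
  final case class InitFailed(cause: Throwable) extends Result

  /** Loads and initializes `name`, classifying what happened. */
  def probe(name: String): Result =
    try {
      Class.forName(name) // runs static initializers on first use
      Loaded
    } catch {
      case _: ClassNotFoundException => Missing
      // First failed attempt: the real cause is attached here.
      case e: ExceptionInInitializerError => InitFailed(Option(e.getCause).getOrElse(e))
      // Later attempts ("Could not initialize class ...") or a missing dependency.
      case e: NoClassDefFoundError => InitFailed(e)
    }
}
```

For example, `ClassProbe.probe("org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$")`, run before anything else touches that class, should return `InitFailed` carrying the root exception if initialization is what breaks.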
>>
>>
>> AND
>>
>> WARN [2015-08-03 13:30:50,085] ({pool-1-thread-2} Logging.scala[logWarning]:71) - Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
>> INFO [2015-08-03 13:30:50,112] ({pool-1-thread-2} Server.java[doStart]:272) - jetty-8.y.z-SNAPSHOT
>> WARN [2015-08-03 13:30:50,123] ({pool-1-thread-2} AbstractLifeCycle.java[setFailed]:204) - FAILED SelectChannelConnector@0.0.0.0:4042: java.net.BindException: Address already in use
>> java.net.BindException: Address already in use
>>     at sun.nio.ch.Net.bind0(Native Method)
>>     at sun.nio.ch.Net.bind(Net.java:444)
>>     at sun.nio.ch.Net.bind(Net.java:436)
>>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>>
>> Any suggestions?
>>
>>
>> On Mon, Aug 3, 2015 at 11:00 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>
>> wrote:
>>
>>> Thanks a lot for all these documents. I appreciate your effort and time.
>>>
>>> On Mon, Aug 3, 2015 at 10:15 AM, Christian Tzolov <ctzo...@pivotal.io>
>>> wrote:
>>>
>>>> ÐΞ€ρ@Ҝ (๏̯͡๏),
>>>>
>>>> I've successfully run Zeppelin with Spark on YARN, using Ambari and
>>>> PivotalHD30 (PHD30). PHD30 is ODP compliant, so you should be able to repeat
>>>> the configuration for HDP (i.e. Hortonworks).
>>>>
>>>> 1. Before you start with Zeppelin, make sure that your Spark/YARN environment
>>>> works from the command line (e.g. run the Pi test). If it doesn't work, make
>>>> sure that hdp.version is set correctly, or hardcode the stack.name and
>>>> stack.version properties as Ambari custom yarn-site properties (that is what
>>>> I did).
>>>>
>>>> 2. Your Zeppelin should be built with the proper Spark and Hadoop versions
>>>> and with YARN support enabled. In my case I used this build command:
>>>>
>>>> mvn clean package -Pspark-1.4 -Dspark.version=1.4.1 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -DskipTests -Pbuild-distr
>>>>
>>>> 3. Open the Spark interpreter configuration, set the 'master' property to
>>>> 'yarn-client' (i.e. master=yarn-client), then press Save.
>>>>
>>>> 4. In conf/zeppelin-env.sh, set HADOOP_CONF_DIR. For PHD and HDP it will
>>>> look like this:
>>>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>>>
>>>> 5. (optional) I restarted the Zeppelin daemon, but I don't think this is
>>>> required.
>>>>
>>>> 6. Make sure that the HDFS folder /user/<zeppelin user> exists and that the
>>>> Zeppelin user can write to it. Otherwise you can create it like this:
>>>>   sudo -u hdfs hdfs dfs -mkdir /user/<zeppelin user>
>>>>   sudo -u hdfs hdfs dfs -chown -R <zeppelin user>:hdfs /user/<zeppelin user>
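Below is a hypothetical check for step 4 in plain Scala. It reads HADOOP_CONF_DIR from an environment map and reports which client configuration files are missing from that directory. The file list (`core-site.xml`, `yarn-site.xml`) is an assumption. Spark's YARN documentation only says that HADOOP_CONF_DIR (or YARN_CONF_DIR) must point at the directory holding the cluster's client-side configuration files. Passing `sys.env` checks the environment of whichever JVM runs the code, which may differ from the environment Zeppelin starts the interpreter with.

```scala
import java.nio.file.{Files, Paths}

object HadoopConfCheck {
  // Assumed minimum set of client config files for YARN; adjust as needed.
  val Required: Seq[String] = Seq("core-site.xml", "yarn-site.xml")

  /** Left(reason) if HADOOP_CONF_DIR is unusable, otherwise Right(missing files). */
  def missingFiles(env: Map[String, String]): Either[String, Seq[String]] =
    env.get("HADOOP_CONF_DIR") match {
      case None => Left("HADOOP_CONF_DIR is not set")
      case Some(dir) =>
        val base = Paths.get(dir)
        if (!Files.isDirectory(base)) Left(s"$dir is not a directory")
        else Right(Required.filterNot(f => Files.isRegularFile(base.resolve(f))))
    }
}
```

`HadoopConfCheck.missingFiles(sys.env)` returns `Right(Nil)` when both files are present in the configured directory.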
>>>>
>>>> Good to go!
>>>>
>>>> Cheers,
>>>> Christian
>>>>
>>>> On 3 August 2015 at 17:50, Vadla, Karthik <karthik.va...@intel.com>
>>>> wrote:
>>>>
>>>>> Hi Deepak,
>>>>>
>>>>> I have documented everything here. Please check the published document:
>>>>>
>>>>> https://software.intel.com/sites/default/files/managed/bb/bf/Apache-Zeppelin.pdf
>>>>>
>>>>> Thanks
>>>>>
>>>>> Karthik Vadla
>>>>>
>>>>> *From:* ÐΞ€ρ@Ҝ (๏̯͡๏) [mailto:deepuj...@gmail.com]
>>>>> *Sent:* Sunday, August 2, 2015 9:25 PM
>>>>> *To:* users@zeppelin.incubator.apache.org
>>>>> *Subject:* Yarn + Spark + Zepplin ?
>>>>>
>>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> I would like to try out Zeppelin, so I got a 7-node Hadoop cluster with a
>>>>> Spark history server set up. I am able to run sample Spark applications
>>>>> on my YARN cluster.
>>>>>
>>>>> I have no clue how to get Zeppelin to connect to this YARN cluster. Under
>>>>> https://zeppelin.incubator.apache.org/docs/install/install.html
>>>>> I see that MASTER should point to a Spark master, but I do not have a
>>>>> Spark master running.
>>>>>
>>>>> How do I get Zeppelin to read data from the YARN cluster? Please share
>>>>> documentation.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Deepak
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Christian Tzolov <http://www.linkedin.com/in/tzolov> | Solution
>>>> Architect, EMEA Practice Team | Pivotal <http://pivotal.io/>
>>>> ctzo...@pivotal.io|+31610285517
>>>>
>>>
>>>
>>>
>>> --
>>> Deepak
>>>
>>>
>>
>>
>> --
>> Deepak
>>
>>
>
>
> --
> Deepak
>
>


-- 
Deepak
