I have only set HADOOP_CONF_DIR, as follows (my Hadoop conf files are in /usr/local/lib/hadoop/etc/hadoop/, e.g. /usr/local/lib/hadoop/etc/hadoop/yarn-site.xml):
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# export JAVA_HOME=
# export MASTER=                             # Spark master url. eg. spark://master_addr:7077. Leave empty if you want to use local mode.
# export ZEPPELIN_JAVA_OPTS                  # Additional jvm options. for example, export ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=8g -Dspark.cores.max=16"
# export ZEPPELIN_MEM                        # Zeppelin jvm mem options Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
# export ZEPPELIN_INTP_MEM                   # zeppelin interpreter process jvm mem options. Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
# export ZEPPELIN_INTP_JAVA_OPTS             # zeppelin interpreter process jvm options.
# export ZEPPELIN_SSL_PORT                   # ssl port (used when ssl environment variable is set to true)
# export ZEPPELIN_LOG_DIR                    # Where log files are stored. PWD by default.
# export ZEPPELIN_PID_DIR                    # The pid files are stored. ${ZEPPELIN_HOME}/run by default.
# export ZEPPELIN_WAR_TEMPDIR                # The location of jetty temporary directory.
# export ZEPPELIN_NOTEBOOK_DIR               # Where notebook saved
# export ZEPPELIN_NOTEBOOK_HOMESCREEN        # Id of notebook to be displayed in homescreen. ex) 2A94M5J1Z
# export ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE   # hide homescreen notebook from list when this value set to "true". default "false"
# export ZEPPELIN_NOTEBOOK_S3_BUCKET         # Bucket where notebook saved
# export ZEPPELIN_NOTEBOOK_S3_ENDPOINT       # Endpoint of the bucket
# export ZEPPELIN_NOTEBOOK_S3_USER           # User in bucket where notebook saved. For example bucket/user/notebook/2A94M5J1Z/note.json
# export ZEPPELIN_IDENT_STRING               # A string representing this instance of zeppelin. $USER by default.
# export ZEPPELIN_NICENESS                   # The scheduling priority for daemons. Defaults to 0.
# export ZEPPELIN_INTERPRETER_LOCALREPO      # Local repository for interpreter's additional dependency loading
# export ZEPPELIN_NOTEBOOK_STORAGE           # Refers to pluggable notebook storage class, can have two classes simultaneously with a sync between them (e.g. local and remote).
# export ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC      # If there are multiple notebook storages, should we treat the first one as the only source of truth?

#### Spark interpreter configuration ####

## Use provided spark installation ##
## defining SPARK_HOME makes Zeppelin run spark interpreter process using spark-submit
##
# export SPARK_HOME                          # (required) When it is defined, load it instead of Zeppelin embedded Spark libraries
# export SPARK_SUBMIT_OPTIONS                # (optional) extra options to pass to spark submit. eg) "--driver-memory 512M --executor-memory 1G".
# export SPARK_APP_NAME                      # (optional) The name of spark application.

## Use embedded spark binaries ##
## without SPARK_HOME defined, Zeppelin still able to run spark interpreter process using embedded spark binaries.
## however, it is not encouraged when you can define SPARK_HOME
##
# Options read in YARN client mode
export HADOOP_CONF_DIR = /usr/local/lib/hadoop/etc/hadoop/  # yarn-site.xml is located in configuration directory in HADOOP_CONF_DIR.

# Pyspark (supported with Spark 1.2.1 and above)
# To configure pyspark, you need to set spark distribution's path to 'spark.home' property in Interpreter setting screen in Zeppelin GUI
# export PYSPARK_PYTHON                      # path to the python command. must be the same path on the driver(Zeppelin) and all workers.
# export PYTHONPATH

## Spark interpreter options ##
##
# export ZEPPELIN_SPARK_USEHIVECONTEXT       # Use HiveContext instead of SQLContext if set true. true by default.
# export ZEPPELIN_SPARK_CONCURRENTSQL        # Execute multiple SQL concurrently if set true. false by default.
# export ZEPPELIN_SPARK_IMPORTIMPLICIT       # Import implicits, UDF collection, and sql if set true. true by default.
# export ZEPPELIN_SPARK_MAXRESULT            # Max number of Spark SQL result to display. 1000 by default.
# export ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE  # Size in characters of the maximum text message to be received by websocket. Defaults to 1024000

#### HBase interpreter configuration ####

## To connect to HBase running on a cluster, either HBASE_HOME or HBASE_CONF_DIR must be set
# export HBASE_HOME=                         # (require) Under which HBase scripts and configuration should be
# export HBASE_CONF_DIR=                     # (optional) Alternatively, configuration directory can be set to point to the directory that has hbase-site.xml

#### ZeppelinHub connection configuration ####
# export ZEPPELINHUB_API_ADDRESS             # Refers to the address of the ZeppelinHub service in use
# export ZEPPELINHUB_API_TOKEN               # Refers to the Zeppelin instance token of the user
# export ZEPPELINHUB_USER_KEY                # Optional, when using Zeppelin with authentication.
I also tried simply /usr/local/lib/hadoop, and I also created a conf directory within /usr/local/lib/hadoop/etc/hadoop and placed yarn-site.xml in that folder.
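One detail in the pasted file may matter: bash does not allow spaces around `=` in an assignment, so a line of the form `export HADOOP_CONF_DIR = ...` is not a valid export and the variable will not be set. A minimal sketch of that entry in valid shell syntax (using the same path as above) would be:

```shell
# Valid bash assignment: no spaces around '='
export HADOOP_CONF_DIR=/usr/local/lib/hadoop/etc/hadoop
```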
Thanks
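For context on the retry log quoted further down: 8032 is YARN's default resourcemanager port and 0.0.0.0 its default hostname, so a client retrying 0.0.0.0/0.0.0.0:8032 is typically falling back to YARN's built-in defaults because yarn-site.xml was never loaded. A quick, hypothetical sanity check against the paths used here:

```shell
# Check that the conf dir the client will read actually contains yarn-site.xml;
# 0.0.0.0:8032 in the logs means the built-in YARN defaults are in effect instead.
CONF_DIR=${HADOOP_CONF_DIR:-/usr/local/lib/hadoop/etc/hadoop}
if [ -f "$CONF_DIR/yarn-site.xml" ]; then
  grep -A1 'yarn.resourcemanager' "$CONF_DIR/yarn-site.xml"
else
  echo "yarn-site.xml not found in $CONF_DIR"
fi
```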
On Wed, Nov 2, 2016 at 10:06 AM, Hyung Sung Shim <[email protected]> wrote:
> Could you share your zeppelin-env.sh ?
> On Wed, Nov 2, 2016 at 4:57 PM, Benoit Hanotte <[email protected]> wrote:
>
>> Thanks for your reply,
>> I have tried setting it within zeppelin-env.sh but it doesn't work any
>> better.
>>
>> Thanks
>>
>> On Wed, Nov 2, 2016 at 2:13 AM, Hyung Sung Shim <[email protected]>
>> wrote:
>>
>> Hello.
>> You should set the HADOOP_CONF_DIR to /usr/local/lib/hadoop/etc/hadoop/
>> in the conf/zeppelin-env.sh.
>> Thanks.
>> On Wed, Nov 2, 2016 at 5:07 AM, Benoit Hanotte <[email protected]> wrote:
>>
>> Hello,
>>
>> I'd like to use Zeppelin on my local computer and use it to run Spark
>> executors on a remote YARN cluster, since I can't easily install Zeppelin
>> on the cluster gateway.
>>
>> I installed the correct Hadoop version (2.6) and compiled Zeppelin (from
>> the master branch) as follows:
>>
>> *mvn clean package -DskipTests -Phadoop-2.6
>> -Dhadoop.version=2.6.0-cdh5.5.0 -Pyarn -Pspark-2.0 -Pscala-2.11*
>>
>> I also set HADOOP_HOME_DIR to /usr/local/lib/hadoop where my hadoop is
>> installed (I also tried with /usr/local/lib/hadoop/etc/hadoop/ where the
>> conf files such as yarn-site.xml are). I set yarn.resourcemanager.hostname
>> to the resource manager of the cluster (I copied the value from the config
>> file on the cluster) but when I start a spark command it still tries to
>> connect to 0.0.0.0:8032 as one can see in the logs:
>>
>> *INFO [2016-11-01 20:48:26,581] ({pool-2-thread-2} Client.java[handleConnectionFailure]:862) - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)*
>>
>> Am I missing something? Are there any additional parameters to set?
>>
>> Thanks!
>>
>> Benoit
>>