Hi Dharmin
With the first approach, you will have to read the properties from the file shipped via --files. SparkFiles.get gives you its local path on each node:
SparkFiles.get("hbase-site.xml")

Alternatively, you can copy the file to HDFS, read it using sc.textFile, and apply the properties it contains.

If you add files using --files, they get copied to each executor's working
directory, but you still have to read them yourself and set the properties
on the conf.
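For example, something like this (a rough sketch, assuming the file was submitted as --files hbase-site.xml and that you have hbase-client on the classpath; HBaseConfiguration and addResource do the property loading):

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.spark.SparkFiles

// --files places the file in each node's working directory;
// SparkFiles.get resolves its local path after the context is up.
val hbaseSitePath = SparkFiles.get("hbase-site.xml")

// Load the properties into an HBase configuration explicitly,
// instead of relying on hbase-site.xml being on the classpath.
val hbaseConf = HBaseConfiguration.create()
hbaseConf.addResource(new Path("file://" + hbaseSitePath))
```

Note this only works once the SparkContext has started, and the resulting conf must be handed to whatever HBase client code you use.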
Thanks
Deepak

On Fri, Feb 23, 2018 at 10:25 AM, Dharmin Siddesh J <
siddeshjdhar...@gmail.com> wrote:

> I am trying to write a Spark program that reads data from HBase and stores
> it in a DataFrame.
>
> I am able to run it perfectly with hbase-site.xml in the $SPARK_HOME/conf
> folder, but I am facing a few issues here.
>
> Issue 1
>
> The first issue is passing the hbase-site.xml location with the --files
> parameter when submitting in client mode (it works in cluster mode).
>
>
> When I remove hbase-site.xml from $SPARK_HOME/conf and try to execute in
> client mode by passing it with the --files parameter over YARN, I keep
> getting the exception below (which I think means it is not picking up the
> ZooKeeper configuration from hbase-site.xml):
>
> spark-submit \
>   --master yarn \
>   --deploy-mode client \
>   --files /home/siddesh/hbase-site.xml \
>   --class com.orzota.rs.json.HbaseConnector \
>   --packages com.hortonworks:shc:1.0.0-2.0-s_2.11 \
>   --repositories http://repo.hortonworks.com/content/groups/public/ \
>   target/scala-2.11/test-0.1-SNAPSHOT.jar
>
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
>
> 18/02/22 01:43:09 INFO ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
> 18/02/22 01:43:09 WARN ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
>
> However, it works fine when I run it in cluster mode.
>
>
> Issue 2
>
> The second issue is passing the HBase configuration details through the
> Spark session, which I can't get to work in either client or cluster mode.
>
>
> Instead of passing the entire hbase-site.xml, I am trying to add the
> configuration directly in the code, as configuration parameters on the
> SparkSession, e.g.:
>
>
> val spark = SparkSession
>   .builder()
>   .appName(name)
>   .config("hbase.zookeeper.property.clientPort", "2181")
>   .config("hbase.zookeeper.quorum", "ip1,ip2,ip3")
>   .config("spark.hbase.host", "zookeeperquorum")
>   .getOrCreate()
>
>
> val json_df = spark.read.
>   option("catalog", catalog_read).
>   format("org.apache.spark.sql.execution.datasources.hbase").
>   load()
>
> This is not working in cluster mode either.
>
>
> Can anyone help me with a solution or an explanation of why this is
> happening? Are there any workarounds?
>
>
>


-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
