I think security has nothing to do with which API you use, Spark SQL or the
RDD API.

Assuming you're running on a YARN cluster (currently that is the only
cluster manager that supports Kerberos).

First you need to obtain a Kerberos TGT in your local spark-submit process.
Once authenticated by Kerberos, Spark can get delegation tokens from HDFS,
so that it can communicate with a secure Hadoop cluster. In your case,
since you have to communicate with other remote HDFS clusters as well, you
need tokens from all of those clusters: you can set
"spark.yarn.access.namenodes" to list all the secure HDFS clusters you want
to access, and the Hadoop client API will fetch tokens from each of them.
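
For example, here is a rough sketch of the submit side (the realm, keytab
path, and namenode address below are placeholders, not your actual values):

    # Obtain a Kerberos TGT in the local process before submitting
    kinit -kt /path/to/user.keytab user@YOUR.REALM

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --conf "spark.yarn.access.namenodes=hdfs://namenode-c:8020" \
      --class com.example.YourApp \
      your-app.jar

(For long-running jobs you can also pass --principal and --keytab to
spark-submit so that Spark can renew the tokens itself.)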

For details, see
https://spark.apache.org/docs/latest/running-on-yarn.html.

I haven't tried this myself since I haven't had such a requirement, so it
may require additional steps I've missed, but you could give it a try.
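
To make the write side concrete, here is a minimal, untested sketch using
the RDD API (the namenode host and output path are placeholders; the point
is the fully-qualified hdfs:// URI pointing at the secure cluster):

    import org.apache.hadoop.io.{NullWritable, Text}
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    object RemoteHdfsWrite {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("remote-hdfs-write"))
        // saveAsNewAPIHadoopFile works on key/value pairs, so wrap each line.
        val pairs = sc.parallelize(Seq("a", "b", "c"))
          .map(line => (NullWritable.get(), new Text(line)))
        // Fully-qualified URI of the secure (Kerberized) cluster's namenode;
        // the delegation token fetched at submit time is used for the write.
        pairs.saveAsNewAPIHadoopFile(
          "hdfs://namenode-c:8020/user/denis/output",
          classOf[NullWritable],
          classOf[Text],
          classOf[TextOutputFormat[NullWritable, Text]])
        sc.stop()
      }
    }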


On Thu, Oct 13, 2016 at 6:38 PM, Denis Bolshakov <bolshakov.de...@gmail.com>
wrote:

> The problem happens when writing (reading works fine):
>
> rdd.saveAsNewAPIHadoopFile
>
> We use just the RDD API and HDFS, nothing else.
> Spark version 1.6.1.
> `Cluster A` - CDH 5.7.1
> `Cluster B` - vanilla hadoop 2.6.5
> `Cluster C` - CDH 5.8.0
>
> Best regards,
> Denis
>
> On 13 October 2016 at 13:06, ayan guha <guha.a...@gmail.com> wrote:
>
>> And a few more details on the Spark version, Hadoop version and
>> distribution would also help...
>>
>> On Thu, Oct 13, 2016 at 9:05 PM, ayan guha <guha.a...@gmail.com> wrote:
>>
>>> I think one point you need to mention is your target - HDFS, Hive or
>>> HBase (or something else) - and which endpoints are used.
>>>
>>> On Thu, Oct 13, 2016 at 8:50 PM, dbolshak <bolshakov.de...@gmail.com>
>>> wrote:
>>>
>>>> Hello community,
>>>>
>>>> We have a challenge and no idea how to solve it.
>>>>
>>>> The problem:
>>>>
>>>> Say we have the following environment:
>>>> 1. `cluster A`: the cluster does not use Kerberos and we use it as a
>>>> source of data; the important thing is that we don't manage this cluster.
>>>> 2. `cluster B`: a small cluster where our Spark application runs and
>>>> performs some logic (we manage this cluster and it does not have
>>>> Kerberos).
>>>> 3. `cluster C`: the cluster uses Kerberos and we use it to keep the
>>>> results of our Spark application; we manage this cluster.
>>>>
>>>> Our requirements and conditions that are not mentioned yet:
>>>> 1. All clusters are in a single data center, but in different
>>>> subnetworks.
>>>> 2. We cannot turn on Kerberos on `cluster A`.
>>>> 3. We cannot turn off Kerberos on `cluster C`.
>>>> 4. We can turn Kerberos on/off on `cluster B`; currently it's turned
>>>> off.
>>>> 5. The Spark app is built on top of the RDD API and does not depend on
>>>> spark-sql.
>>>>
>>>> Does anybody know how to write data using the RDD API to a remote
>>>> cluster which is running with Kerberos?
>>>>
>>>> --
>>>> //with Best Regards
>>>> --Denis Bolshakov
>>>> e-mail: bolshakov.de...@gmail.com
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-kerberos-tp27894.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
>
>
> --
> //with Best Regards
> --Denis Bolshakov
> e-mail: bolshakov.de...@gmail.com
>
>
>
