Hi Koert,

I read this document of yours; it is indeed interesting and pretty recent
(9th Dec).

I am more focused on GCP and GKE
<https://cloud.google.com/kubernetes-engine?hl=en>, but obviously the
concepts are the same. One thing I noticed was the lack of any mention of
Workload Identity Federation or an equivalent, which is the recommended way
for workloads running on k8s to access cloud services in a secure and
manageable way. Specifically, I quote: "Workload Identity
<https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity>
allows workloads in your GKE clusters to impersonate Identity and Access
Management (IAM) service accounts to access Google Cloud services."
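
For illustration, here is a minimal sketch of what the Spark side can look
like once Workload Identity is in place (the app name and bucket are
hypothetical, and I am assuming the gcs-connector 2.x "fs.gs.auth.type"
knob); the connector then obtains credentials through Application Default
Credentials, so no key files or static credentials appear in the
configuration:

import org.apache.spark.sql.SparkSession;

public class WorkloadIdentityDemo {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("workload-identity-demo")   // hypothetical app name
        .getOrCreate();

    // With Workload Identity binding the pod's Kubernetes service account
    // to an IAM service account, APPLICATION_DEFAULT tells the
    // gcs-connector to resolve credentials via Application Default
    // Credentials instead of a key file.
    spark.sparkContext().hadoopConfiguration()
        .set("fs.gs.auth.type", "APPLICATION_DEFAULT");

    spark.read().parquet("gs://my-bucket/path").show();  // hypothetical bucket
  }
}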

Cheers

Mich Talebzadeh
Dad | Technologist | Solutions Architect | Engineer
London, United Kingdom

LinkedIn: https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/
https://en.everybodywiki.com/Mich_Talebzadeh

On Thu, 14 Dec 2023 at 07:58, Koert Kuipers <ko...@tresata.com> wrote:

> Yes, it does, using IAM Roles for Service Accounts (IRSA).
> See:
>
> https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
>
> I also wrote a little bit about this here:
> https://technotes.tresata.com/spark-on-k8s/
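>
> As a minimal sketch (assuming the driver and executor pods run under a
> Kubernetes service account annotated with the IAM role, so that EKS
> injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE into the
> containers), the S3A configuration can then be as simple as:
>
> // hadoop-aws 3.3.x with the bundled v1 AWS SDK: this provider reads the
> // injected environment variables itself, so no fs.s3a.assumed.role.*
> // settings are needed on top.
> sparkSession.sparkContext().hadoopConfiguration().set(
>     "fs.s3a.aws.credentials.provider",
>     "com.amazonaws.auth.WebIdentityTokenCredentialsProvider");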
>
> On Wed, Dec 13, 2023 at 7:52 AM Atul Patil <atulsp...@gmail.com> wrote:
>
>> Hello Team,
>>
>>
>>
>> Does Spark support role-based authentication and access to Amazon S3 for
>> a Kubernetes deployment?
>>
>> Note: we have deployed our Spark application in the Kubernetes cluster.
>>
>>
>>
>> Below is the hadoop-aws dependency we are using:
>>
>> <dependency>
>>    <groupId>org.apache.hadoop</groupId>
>>    <artifactId>hadoop-aws</artifactId>
>>    <version>3.3.4</version>
>> </dependency>
>>
>>
>>
>> We are using the following configuration when creating the Spark session,
>> but it is not working:
>>
>> sparkSession.sparkContext().hadoopConfiguration().set(
>>     "fs.s3a.aws.credentials.provider",
>>     "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider");
>> sparkSession.sparkContext().hadoopConfiguration().set(
>>     "fs.s3a.assumed.role.arn", System.getenv("AWS_ROLE_ARN"));
>> sparkSession.sparkContext().hadoopConfiguration().set(
>>     "fs.s3a.assumed.role.credentials.provider",
>>     "com.amazonaws.auth.WebIdentityTokenCredentialsProvider");
>> sparkSession.sparkContext().hadoopConfiguration().set(
>>     "fs.s3a.assumed.role.sts.endpoint", "s3.eu-central-1.amazonaws.com");
>> sparkSession.sparkContext().hadoopConfiguration().set(
>>     "fs.s3a.assumed.role.sts.endpoint.region", Regions.EU_CENTRAL_1.getName());
>> sparkSession.sparkContext().hadoopConfiguration().set(
>>     "fs.s3a.web.identity.token.file", System.getenv("AWS_WEB_IDENTITY_TOKEN_FILE"));
>> sparkSession.sparkContext().hadoopConfiguration().set(
>>     "fs.s3a.assumed.role.session.duration", "30m");
>>
>>
>>
>> Thank you!
>>
>>
>>
>> Regards,
>>
>> Atul
>>
>
