Hi Tamir,

Thanks for providing the information. I'm not aware of an existing solution
right now; perhaps another user has an idea. I do find your input valuable
for future improvements to the S3 client support in Hadoop.

Best regards,

Martijn

On Fri, 19 Nov 2021 at 09:21, Tamir Sagi <tamir.s...@niceactimize.com>
wrote:

> Hey Martijn,
>
> Sorry for the late response.
>
> We wanted to replace the default client with our custom S3 client and not
> use the AmazonS3Client provided by the plugin.
>
> We used flink-s3-fs-hadoop v1.12.2, and for our needs we had to upgrade to
> v1.14.0 [1].
>
> The AmazonS3 client factory is initialized by S3AFileSystem [2]. If the
> property "fs.s3a.s3.client.factory.impl" [3] is not set, the default factory
> is created [4], which provides an AmazonS3Client; that client does not
> support what we need.
> I know that both the property and the factory interface are annotated with
>
> @InterfaceAudience.Private
> @InterfaceStability.Unstable
>
> from a very early version, but we found this solution cleaner than
> extending the whole class and overriding the #setAmazonS3Client method.
>
> Bottom line, all we had to do was create our own implementation of the
> S3ClientFactory interface [5],
> add to flink-conf.yaml :  s3.s3.client.factory.impl: <our factory
> canonical name> ,
> and place both the plugin and our artifact (with the factory and client
> implementations) under ${FLINK_HOME}/plugins/s3.
> A rough sketch of such a factory follows below.
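>
> For illustration, a minimal, untested sketch of such a factory. It assumes
> the four-argument createS3Client signature from Hadoop branch-3.2.2 (the
> interface changed in later versions); the package, class name, region, and
> key handling here are placeholders, not our actual implementation:
>
> package com.example.s3; // hypothetical package name
>
> import java.io.IOException;
> import java.net.URI;
> import java.security.NoSuchAlgorithmException;
> import javax.crypto.KeyGenerator;
> import javax.crypto.SecretKey;
>
> import com.amazonaws.auth.AWSCredentialsProvider;
> import com.amazonaws.services.s3.AmazonS3;
> import com.amazonaws.services.s3.AmazonS3EncryptionClientV2Builder;
> import com.amazonaws.services.s3.model.CryptoConfigurationV2;
> import com.amazonaws.services.s3.model.CryptoMode;
> import com.amazonaws.services.s3.model.EncryptionMaterials;
> import com.amazonaws.services.s3.model.StaticEncryptionMaterialsProvider;
> import org.apache.hadoop.conf.Configured;
> import org.apache.hadoop.fs.s3a.S3ClientFactory;
>
> // Hands Hadoop an AmazonS3EncryptionClientV2 instead of the default
> // AmazonS3Client.
> public class EncryptedS3ClientFactory extends Configured
>     implements S3ClientFactory {
>
>   @Override
>   public AmazonS3 createS3Client(URI name, String bucket,
>       AWSCredentialsProvider credentials, String userAgentSuffix)
>       throws IOException {
>     return AmazonS3EncryptionClientV2Builder.standard()
>         .withCredentials(credentials)
>         .withRegion("us-east-1") // example; real code would read the conf
>         .withEncryptionMaterialsProvider(new StaticEncryptionMaterialsProvider(
>             new EncryptionMaterials(placeholderKey())))
>         .withCryptoConfiguration(new CryptoConfigurationV2()
>             .withCryptoMode(CryptoMode.StrictAuthenticatedEncryption))
>         .build();
>   }
>
>   // Placeholder only: a real factory would load the key from KMS or a
>   // keystore rather than generate a throwaway one.
>   private static SecretKey placeholderKey() {
>     try {
>       KeyGenerator gen = KeyGenerator.getInstance("AES");
>       gen.init(256);
>       return gen.generateKey();
>     } catch (NoSuchAlgorithmException e) {
>       throw new IllegalStateException(e);
>     }
>   }
> }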
>
> One important note: the flink-s3-fs-hadoop plugin bundles the whole
> com.amazonaws.s3 source code. To avoid plugin class loader issues, we
> needed to remove the aws-java-sdk-s3 dependency and declare the plugin
> dependency with scope "provided".
> If the job itself needs to do some work with S3, then shading com.amazonaws
> was also necessary, roughly as sketched below.
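>
> The relocation we used was roughly of this shape (maven-shade-plugin; the
> shaded prefix is just an example, adjust to taste):
>
> <plugin>
>     <groupId>org.apache.maven.plugins</groupId>
>     <artifactId>maven-shade-plugin</artifactId>
>     <version>3.2.4</version>
>     <executions>
>         <execution>
>             <phase>package</phase>
>             <goals>
>                 <goal>shade</goal>
>             </goals>
>             <configuration>
>                 <relocations>
>                     <relocation>
>                         <!-- keep the SDK copy in the job jar from clashing
>                              with the copy bundled in the plugin -->
>                         <pattern>com.amazonaws</pattern>
>                         <shadedPattern>shaded.com.amazonaws</shadedPattern>
>                     </relocation>
>                 </relocations>
>             </configuration>
>         </execution>
>     </executions>
> </plugin>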
>
> [1]
> https://mvnrepository.com/artifact/org.apache.flink/flink-s3-fs-hadoop/1.14.0
>
> [2]
> https://github.com/apache/hadoop/blob/branch-3.2.2/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L264-L266
>
> [3]
> https://github.com/apache/hadoop/blob/branch-3.2.2/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java#L366-L369
>
> [4]
> https://github.com/apache/hadoop/blob/branch-3.2.2/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java#L66
>
> [5]
> https://github.com/apache/hadoop/blob/branch-3.2.2/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ClientFactory.java
>
> Best,
> Tamir.
>
> ------------------------------
> *From:* Martijn Visser <mart...@ververica.com>
> *Sent:* Wednesday, October 13, 2021 8:28 PM
> *To:* Tamir Sagi <tamir.s...@niceactimize.com>; user@flink.apache.org <
> user@flink.apache.org>
> *Subject:* Re: Replacing S3 Client in Hadoop plugin
>
>
> Hi,
>
> Could you elaborate on why you would like to replace the S3 client?
>
> Best regards,
>
> Martijn
>
> On Wed, 13 Oct 2021 at 17:18, Tamir Sagi <tamir.s...@niceactimize.com>
> wrote:
>
> I found the dependency
>
> <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-aws</artifactId>
>     <version>3.3.1</version>
> </dependency>
>
> Apparently it's possible: S3AFileSystem exposes a setAmazonS3Client
> method. A rough sketch of that route:
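>
> (Untested sketch: setAmazonS3Client looks test-oriented, and its visibility
> may require the subclass to live in the org.apache.hadoop.fs.s3a package;
> buildCustomClient is a hypothetical helper, not an existing API.)
>
> package org.apache.hadoop.fs.s3a;
>
> import java.io.IOException;
> import java.net.URI;
>
> import com.amazonaws.services.s3.AmazonS3;
> import org.apache.hadoop.conf.Configuration;
>
> // Swaps in a custom AmazonS3 client right after the file system initializes.
> public class CustomClientS3AFileSystem extends S3AFileSystem {
>
>   @Override
>   public void initialize(URI name, Configuration conf) throws IOException {
>     super.initialize(name, conf);
>     setAmazonS3Client(buildCustomClient(conf));
>   }
>
>   // Hypothetical helper: build whatever AmazonS3 implementation is needed.
>   private AmazonS3 buildCustomClient(Configuration conf) {
>     throw new UnsupportedOperationException("plug in your client here");
>   }
> }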
>
> I think I found the solution.
>
> Thanks.
>
> Tamir.
>
> ------------------------------
> *From:* Tamir Sagi <tamir.s...@niceactimize.com>
> *Sent:* Wednesday, October 13, 2021 5:44 PM
> *To:* user@flink.apache.org <user@flink.apache.org>
> *Subject:* Replacing S3 Client in Hadoop plugin
>
> Hey community.
>
> I would like to know if there is any way to replace the S3 client in the
> Hadoop plugin [1] with a custom client (AmazonS3).
>
> I did notice that the Hadoop plugin supports replacing the implementation of
> S3AFileSystem using
> "fs.s3a.impl" (in flink-conf.yaml it will be "s3.impl"), but not the client
> itself [2]:
>
> <property>
>   <name>fs.s3a.impl</name>
>   <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
>   <description>The implementation class of the S3A Filesystem</description>
> </property>
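>
> In flink-conf.yaml the equivalent setting (as I understand the s3. to
> fs.s3a. prefix mapping) would be:
>
> s3.impl: org.apache.hadoop.fs.s3a.S3AFileSystem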
>
> I delved into the Hadoop plugin source code [3]; the client itself is of
> type AmazonS3Client and cannot be replaced (for example) with a client of
> type AmazonS3EncryptionV2.
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/filesystems/s3/#hadooppresto-s3-file-systems-plugins
>
> [2]
> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html
> [3]
> https://github.com/apache/hadoop/blob/master/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
>
> Thank you,
>
> Best,
> Tamir.
>
