RE: S3 for state backend in Flink 1.4.0

Marchant, Hayden Thu, 01 Feb 2018 00:22:51 -0800

Edward,

We are using Object Storage for checkpointing. I'd like to point out that we 
were seeing performance problems using the S3 protocol. Btw, we had quite a few 
problems using the flink-s3-fs-hadoop jar with Object Storage and had to do 
some ugly hacking to get it working all over. We recently discovered an 
alternative connector developed by IBM Research called stocator. It's a 
streaming writer and performs better than using the S3 protocol.


Here is a link to the library - https://github.com/SparkTC/stocator, and a blog 
explaining about it - 
http://www.spark.tc/stocator-the-fast-lane-connecting-object-stores-to-spark/

Good luck!!

-----Original Message-----
From: Edward Rojas [mailto:edward.roja...@gmail.com] 
Sent: Wednesday, January 31, 2018 3:02 PM
To: user@flink.apache.org
Subject: RE: S3 for state backend in Flink 1.4.0

Hi,

We are having a similar problem when trying to use Flink 1.4.0 with IBM Object 
Storage for reading and writing data. 

We followed
https://urldefense.proofpoint.com/v2/url?u=https-3A__ci.apache.org_projects_flink_flink-2Ddocs-2Drelease-2D1.4_ops_deployment_aws.html&d=DwICAg&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=g-5xYRH8L3aCnCNTROw5LrsB5gbTayWjXSm6Nil9x0c&m=jSLW4Ugl1FvEta-R0h_thQVZ6tQ2LsUX10cRoIWNNkk&s=gY41yFjnJzQNaL3R1YK7HzG8XUyBn0kJ6_3m-4t7E7k&e=
and the suggestion on 
https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_FLINK-2D851&d=DwICAg&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=g-5xYRH8L3aCnCNTROw5LrsB5gbTayWjXSm6Nil9x0c&m=jSLW4Ugl1FvEta-R0h_thQVZ6tQ2LsUX10cRoIWNNkk&s=bDXNhnIV4KFTK9Byg5w2R_8UlWiXH05uAp9rkWJm_jo&e=.

We put the flink-s3-fs-hadoop jar from the opt/ folder to the lib/ folder and 
we added the configuration on the flink-config.yaml:

s3.access-key: <ACCESS_KEY>
s3.secret-key: <SECRET_KEY>
s3.endpoint: s3.us-south.objectstorage.softlayer.net 

With this we can read from IBM Object Storage without any problem when using 
env.readTextFile("s3://flink-test/flink-test.txt");

But we are having problems when trying to write. 
We are using a kafka consumer to read from the bus, we're making some 
processing and after saving  some data on Object Storage.

When using stream.writeAsText("s3://flink-test/data.txt").setParallelism(1);
The file is created but only when the job finish (or we stop it). But we need 
to save the data without stopping the job, so we are trying to use a Sink.

But when using a BucketingSink, we get the error: 
java.io.IOException: No FileSystem for scheme: s3 at 
org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2798)
        at
org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.createHadoopFileSystem(BucketingSink.java:1196)
        at
org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initFileSystem(BucketingSink.java:411)
        at
org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:355)


Do you have any idea how could we make it work using Sink?

Thanks,
Regards,

Edward



--
Sent from: 
https://urldefense.proofpoint.com/v2/url?u=http-3A__apache-2Dflink-2Duser-2Dmailing-2Dlist-2Darchive.2336050.n4.nabble.com_&d=DwICAg&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=g-5xYRH8L3aCnCNTROw5LrsB5gbTayWjXSm6Nil9x0c&m=jSLW4Ugl1FvEta-R0h_thQVZ6tQ2LsUX10cRoIWNNkk&s=vN9sFldnlnzHZPgOBi42Rwfq1Hbq79gUPUNLgi0zmSM&e=

RE: S3 for state backend in Flink 1.4.0

Reply via email to