Re: S3 checkpointing in AWS in Frankfurt

2016-11-24 Thread Stephan Ewen
We have been looking for a while for some way to decouple the S3 filesystem support from Hadoop. Does anyone know a good S3 connector library that works independent of Hadoop and EMRFS? Best, Stephan On Wed, Nov 23, 2016 at 7:57 PM, Greg Hogan wrote: > EMRFS looks to

Re: S3 checkpointing in AWS in Frankfurt

2016-11-23 Thread Greg Hogan
EMRFS looks to *add* cost (and consistency). Storing an object to S3 costs "$0.005 per 1,000 requests", so $0.432/day at 1 Hz. Is the number of checkpoint files simply parallelism * number of operators? That could add up quickly. Is the recommendation to run HDFS on EBS? On Wed, Nov 23, 2016 at
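
The per-day figure above can be checked with a few lines of arithmetic. This is only a rough sketch: the price is the "$0.005 per 1,000 requests" quoted in the message, the function names are made up for illustration, and the `parallelism * operators` file count is the unconfirmed guess raised in the question, not a statement of how Flink actually lays out checkpoint files.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# USD per 1,000 store requests, as quoted in the message above.
PRICE_PER_1000_REQUESTS = 0.005


def daily_request_cost(requests_per_second, price_per_1000=PRICE_PER_1000_REQUESTS):
    """Cost in USD of issuing store requests at a steady rate for one day."""
    requests_per_day = requests_per_second * SECONDS_PER_DAY
    return requests_per_day / 1000 * price_per_1000


def files_per_checkpoint(parallelism, operators):
    """Hypothetical file count per checkpoint.

    Assumption taken from the question in the message
    (parallelism * number of operators); not confirmed in the thread.
    """
    return parallelism * operators


# One request per second, as in the message.
one_hz_cost = daily_request_cost(1)

# Illustrative values only: parallelism 4, 5 operators, one checkpoint per second.
example_cost = daily_request_cost(files_per_checkpoint(4, 5) * 1)

print(f"1 request/s: ${one_hz_cost:.3f} per day")
print(f"20 requests/s: ${example_cost:.2f} per day")
```

Under that assumption the request cost scales linearly with both parallelism and operator count, which is why it "could add up quickly".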

Re: S3 checkpointing in AWS in Frankfurt

2016-11-23 Thread Jonathan Share
<rmetz...@apache.org> > Reply-To: "user@flink.apache.org" <user@flink.apache.org> > Date: Wednesday, November 23, 2016 at 8:24 AM > To: "user@flink.apache.org" <user@flink.apache.org> > Subject: Re: S3 checkpointing in AWS in Frankfurt > > >

Re: S3 checkpointing in AWS in Frankfurt

2016-11-23 Thread Jonathan Share
Hi Scott, Thanks for the suggestion. It sounds like you and I think alike; going over to HDFS sounds to me like the simplest solution. There are no requirements to use S3, just another team member who is generally sceptical, fearing that adding HDFS will add a new class of maintenance problems to

Re: S3 checkpointing in AWS in Frankfurt

2016-11-23 Thread Jonathan Share
Hi Greg, Standard storage class; everything is on defaults, and we've not done anything special with the bucket. CloudWatch only appears to give me total billing for S3 in general; I don't see a breakdown unless that's something I can configure somewhere. Regards, Jonathan On 23 November 2016 at

Re: S3 checkpointing in AWS in Frankfurt

2016-11-23 Thread Foster, Craig
etz...@apache.org> Reply-To: "user@flink.apache.org" <user@flink.apache.org> Date: Wednesday, November 23, 2016 at 8:24 AM To: "user@flink.apache.org" <user@flink.apache.org> Subject: Re: S3 checkpointing in AWS in Frankfurt Hi Jonathan, have you tried using Amazon's latest

Re: S3 checkpointing in AWS in Frankfurt

2016-11-23 Thread Robert Metzger
Hi Jonathan, have you tried using Amazon's latest EMR Hadoop distribution? Maybe they've fixed the issue in their distribution for older Hadoop releases? On Wed, Nov 23, 2016 at 4:38 PM, Scott Kidder wrote: > Hi Jonathan, > > You might be better off creating a small Hadoop HDFS

Re: S3 checkpointing in AWS in Frankfurt

2016-11-23 Thread Scott Kidder
Hi Jonathan, You might be better off creating a small Hadoop HDFS cluster just for the purpose of storing Flink checkpoint & savepoint data. Like you, I tried using S3 to persist Flink state, but encountered AWS SDK issues and felt like I was going down an ill-advised path. I then created a small

S3 checkpointing in AWS in Frankfurt

2016-11-22 Thread Jonathan Share
Hi, I'm interested in hearing if anyone else has experience with using Amazon S3 as a state backend in the Frankfurt region. For political reasons we've been asked to keep all European data in Amazon's Frankfurt region. This causes a problem as the S3 endpoint in Frankfurt requires the use of AWS