[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-04 Thread Jamie Grier (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426132#comment-16426132 ] Jamie Grier commented on FLINK-9061: Yup, sounds good to me :)

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-04 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426124#comment-16426124 ] Steven Zhen Wu commented on FLINK-9061: --- [~jgrier] Amazon doesn't want to reveal internal details,

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-04 Thread Jamie Grier (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426103#comment-16426103 ] Jamie Grier commented on FLINK-9061: Okay, this is the best documentation I've found on this: 

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-03 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424625#comment-16424625 ] Steven Zhen Wu commented on FLINK-9061: --- [~jgrier] [~StephanEwen] Here is our thinking. If you

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-03 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424552#comment-16424552 ] Steven Zhen Wu commented on FLINK-9061: --- It seems that S3 walks through the prefix from left to right

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-02 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423335#comment-16423335 ] Steven Zhen Wu commented on FLINK-9061: --- I don't know if it has to be "the very first characters".

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-02 Thread Jamie Grier (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423089#comment-16423089 ] Jamie Grier commented on FLINK-9061: As I understand it the above doesn't work – maybe if you ask

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-02 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423080#comment-16423080 ] Steven Zhen Wu commented on FLINK-9061: --- reversing the components (split by slash char) doesn't give 

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-02 Thread Jamie Grier (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423072#comment-16423072 ] Jamie Grier commented on FLINK-9061: So, what I'm suggesting is that we, optionally, split on '/' and

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-02 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422976#comment-16422976 ] Steven Zhen Wu commented on FLINK-9061: --- I think S3 has more sophisticated pattern searching for

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-02 Thread Greg Hogan (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422890#comment-16422890 ] Greg Hogan commented on FLINK-9061: --- Since S3 key names are opaque it sounds like any "prefix" is

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-02 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422843#comment-16422843 ] Steven Zhen Wu commented on FLINK-9061: --- Usually a 4-char random prefix can go a long way. Even 2-char

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-02 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422645#comment-16422645 ] Steve Loughran commented on FLINK-9061: --- something less than 8, maybe 5, though it's mostly all

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-02 Thread Stephan Ewen (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422534#comment-16422534 ] Stephan Ewen commented on FLINK-9061: - Do we know how many characters are used for the partitioning?

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-30 Thread Jamie Grier (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420885#comment-16420885 ] Jamie Grier commented on FLINK-9061: Maybe we should keep this super simple and make a change at the

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-28 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417964#comment-16417964 ] Steve Loughran commented on FLINK-9061: --- [~greghogan] I cut the link as it was just a duplicate of

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-28 Thread Greg Hogan (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417402#comment-16417402 ] Greg Hogan commented on FLINK-9061: --- [~ste...@apache.org], not sure why your

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-28 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417145#comment-16417145 ] Steve Loughran commented on FLINK-9061: --- The s3a connector will have the same issue, though there we

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-28 Thread Stephan Ewen (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417032#comment-16417032 ] Stephan Ewen commented on FLINK-9061: - One advantage of making the changes in the state backend would

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-27 Thread Jamie Grier (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416271#comment-16416271 ] Jamie Grier commented on FLINK-9061: [~StephanEwen] I don't know if the s3a-based connector exhibits

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-27 Thread Jamie Grier (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416254#comment-16416254 ] Jamie Grier commented on FLINK-9061: Yeah, so I completely agree that should be a 503 but it's not.  I

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-27 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415501#comment-16415501 ] Steve Loughran commented on FLINK-9061: --- [~StephanEwen]: I knew that, but it's still the same AWS

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-27 Thread Stephan Ewen (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415368#comment-16415368 ] Stephan Ewen commented on FLINK-9061: - [~ste...@apache.org] I think they are not using the

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Jamie Grier (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414828#comment-16414828 ] Jamie Grier commented on FLINK-9061: [~ste...@apache.org] Here's the full stack trace:      

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414696#comment-16414696 ] Steven Zhen Wu commented on FLINK-9061: --- It seems that our internal change *only* works with

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414280#comment-16414280 ] Steve Loughran commented on FLINK-9061: --- you can get it on delete requests too, if you try hard.

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414085#comment-16414085 ] Steven Zhen Wu commented on FLINK-9061: --- [~StephanEwen] [~jgrier] We run into S3 throttling issue

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Stephan Ewen (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16413628#comment-16413628 ] Stephan Ewen commented on FLINK-9061: - That would be a great contribution, valuable for many S3 users.

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-24 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412814#comment-16412814 ] Steven Zhen Wu commented on FLINK-9061: --- [~jgrier] Yes, we want to contribute this back. We can

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-24 Thread Jamie Grier (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412737#comment-16412737 ] Jamie Grier commented on FLINK-9061: [~stevenz3wu] Did you contribute those changes back to Flink?  I

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-23 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412326#comment-16412326 ] Steven Zhen Wu commented on FLINK-9061: --- Jamie, yes, we run into the same issue at Netflix. We did