[jira] [Created] (FLINK-10712) RestartPipelinedRegionStrategy does not restore state

2018-10-29 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10712: -- Summary: RestartPipelinedRegionStrategy does not restore state Key: FLINK-10712 URL: https://issues.apache.org/jira/browse/FLINK-10712 Project: Flink

[jira] [Assigned] (FLINK-8995) Add a test operator with keyed state that uses custom, stateful serializer

2018-10-23 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter reassigned FLINK-8995: - Assignee: Stefan Richter > Add a test operator with keyed state that uses cus

[jira] [Closed] (FLINK-10630) Update migration tests for Flink 1.6

2018-10-23 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-10630. -- Resolution: Duplicate > Update migration tests for Flink

[jira] [Created] (FLINK-10652) Update StatefulJobWBroadcastStateMigrationITCase for 1.6

2018-10-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10652: -- Summary: Update StatefulJobWBroadcastStateMigrationITCase for 1.6 Key: FLINK-10652 URL: https://issues.apache.org/jira/browse/FLINK-10652 Project: Flink

[jira] [Created] (FLINK-10652) Update StatefulJobWBroadcastStateMigrationITCase for 1.6

2018-10-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10652: -- Summary: Update StatefulJobWBroadcastStateMigrationITCase for 1.6 Key: FLINK-10652 URL: https://issues.apache.org/jira/browse/FLINK-10652 Project: Flink

[jira] [Created] (FLINK-10651) Update CEPMigrationTest for 1.6

2018-10-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10651: -- Summary: Update CEPMigrationTest for 1.6 Key: FLINK-10651 URL: https://issues.apache.org/jira/browse/FLINK-10651 Project: Flink Issue Type: Sub-task

[jira] [Created] (FLINK-10650) Update BucketingSinkMigrationTest for 1.6

2018-10-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10650: -- Summary: Update BucketingSinkMigrationTest for 1.6 Key: FLINK-10650 URL: https://issues.apache.org/jira/browse/FLINK-10650 Project: Flink Issue Type

[jira] [Created] (FLINK-10650) Update BucketingSinkMigrationTest for 1.6

2018-10-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10650: -- Summary: Update BucketingSinkMigrationTest for 1.6 Key: FLINK-10650 URL: https://issues.apache.org/jira/browse/FLINK-10650 Project: Flink Issue Type

[jira] [Created] (FLINK-10651) Update CEPMigrationTest for 1.6

2018-10-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10651: -- Summary: Update CEPMigrationTest for 1.6 Key: FLINK-10651 URL: https://issues.apache.org/jira/browse/FLINK-10651 Project: Flink Issue Type: Sub-task

[jira] [Created] (FLINK-10649) Update FlinkKafkaConsumerBaseMigrationTest for 1.6

2018-10-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10649: -- Summary: Update FlinkKafkaConsumerBaseMigrationTest for 1.6 Key: FLINK-10649 URL: https://issues.apache.org/jira/browse/FLINK-10649 Project: Flink Issue

[jira] [Created] (FLINK-10649) Update FlinkKafkaConsumerBaseMigrationTest for 1.6

2018-10-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10649: -- Summary: Update FlinkKafkaConsumerBaseMigrationTest for 1.6 Key: FLINK-10649 URL: https://issues.apache.org/jira/browse/FLINK-10649 Project: Flink Issue

[jira] [Created] (FLINK-10648) Update ContinuousFileProcessingMigrationTest for 1.6

2018-10-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10648: -- Summary: Update ContinuousFileProcessingMigrationTest for 1.6 Key: FLINK-10648 URL: https://issues.apache.org/jira/browse/FLINK-10648 Project: Flink

[jira] [Created] (FLINK-10648) Update ContinuousFileProcessingMigrationTest for 1.6

2018-10-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10648: -- Summary: Update ContinuousFileProcessingMigrationTest for 1.6 Key: FLINK-10648 URL: https://issues.apache.org/jira/browse/FLINK-10648 Project: Flink

[jira] [Created] (FLINK-10646) Update AbstractOperatorRestoreTestBase for 1.6

2018-10-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10646: -- Summary: Update AbstractOperatorRestoreTestBase for 1.6 Key: FLINK-10646 URL: https://issues.apache.org/jira/browse/FLINK-10646 Project: Flink Issue

[jira] [Created] (FLINK-10647) Update WindowOperatorMigrationTest for 1.6

2018-10-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10647: -- Summary: Update WindowOperatorMigrationTest for 1.6 Key: FLINK-10647 URL: https://issues.apache.org/jira/browse/FLINK-10647 Project: Flink Issue Type

[jira] [Created] (FLINK-10646) Update AbstractOperatorRestoreTestBase for 1.6

2018-10-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10646: -- Summary: Update AbstractOperatorRestoreTestBase for 1.6 Key: FLINK-10646 URL: https://issues.apache.org/jira/browse/FLINK-10646 Project: Flink Issue

[jira] [Created] (FLINK-10647) Update WindowOperatorMigrationTest for 1.6

2018-10-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10647: -- Summary: Update WindowOperatorMigrationTest for 1.6 Key: FLINK-10647 URL: https://issues.apache.org/jira/browse/FLINK-10647 Project: Flink Issue Type

[jira] [Created] (FLINK-10645) Update StatefulJobSavepointMigrationITCase for 1.6

2018-10-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10645: -- Summary: Update StatefulJobSavepointMigrationITCase for 1.6 Key: FLINK-10645 URL: https://issues.apache.org/jira/browse/FLINK-10645 Project: Flink Issue

[jira] [Created] (FLINK-10645) Update StatefulJobSavepointMigrationITCase for 1.6

2018-10-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10645: -- Summary: Update StatefulJobSavepointMigrationITCase for 1.6 Key: FLINK-10645 URL: https://issues.apache.org/jira/browse/FLINK-10645 Project: Flink Issue

[jira] [Assigned] (FLINK-10630) Update migration tests for Flink 1.6

2018-10-23 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter reassigned FLINK-10630: -- Assignee: Stefan Richter (was: vinoyang) > Update migration tests for Flink

[jira] [Closed] (FLINK-10567) Lost serialize fields when ttl state store with the mutable serializer

2018-10-18 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-10567. -- Resolution: Fixed Fix Version/s: 1.6.3 Merged in: master: 4be1afa3a6 release-1.6

[jira] [Closed] (FLINK-4052) Unstable test ConnectionUtilsTest

2018-10-16 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-4052. - Resolution: Fixed Merged in: master: f195a6ed5c > Unstable test ConnectionUtilsT

[jira] [Closed] (FLINK-10565) Refactor SchedulerTestBase

2018-10-16 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-10565. -- Resolution: Fixed Merged in: master: 05a4789042 > Refactor SchedulerTestB

Re: Error restoring from savepoint while there's no modification to the job

2018-10-15 Thread Stefan Richter
Hi, I see, then the important question for me is if the problem exists on the release/master code or just on your branches. Of course we can hardly give any advice for custom builds and without any code. In general, you should debug in HeapKeyedStateBackend lines lines 774-776 (the write part)

Re: Error restoring from savepoint while there's no modification to the job

2018-10-15 Thread Stefan Richter
Hi, I think it is rather unlikely that this is the problem because it should give a different kind of exception. Would it be possible to provide a minimal and self-contained example code for a problematic job? Best, Stefan > On 15. Oct 2018, at 08:29, Averell wrote: > > Hi everyone, > >

Re: Flink leaves a lot RocksDB sst files in tmp directory

2018-10-12 Thread Stefan Richter
Hi, Can you maybe show us what is inside of one of the directory instance? Furthermore, your TM logs show multiple instances of OutOfMemoryErrors, so that might also be a problem. Also how was the job moved? If a TM is killed, of course it cannot cleanup. That is why the data goes to tmp dir

Re: Large rocksdb state restore/checkpoint duration behavior

2018-10-10 Thread Stefan Richter
Hi, I would assume that the problem about blocked processing during a checkpoint is caused by [1], because you mentioned the use of RocksDB incremental checkpoints and it could be that you use it in combination with heap-based timers. This is the one combination that currently still uses a

Re: Error restoring from savepoint while there's no modification to the job

2018-10-10 Thread Stefan Richter
Hi, adding to Dawids questions, it would also be very helpful to know which Flink version was used to create the savepoint, which Flink version was used in the restore attempt, if the savepoint was moved or modified. Outside of potential conflicts with those things, I would not expect anything

[jira] [Closed] (FLINK-10289) Classify Exceptions to different category for apply different failover strategy

2018-10-09 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-10289. -- Resolution: Implemented Merged in: master: 20b9326397 > Classify Exceptions to differ

Re: ***UNCHECKED*** Error while confirming Checkpoint

2018-10-08 Thread Stefan Richter
Hi Pedro, unfortunately the interesting parts are all removed from the log, we already know about the exception itself. In particular, what I would like to see is what checkpoints have been triggered and completed before the exception happens. Best, Stefan > Am 08.10.2018 um 10:23 schrieb

Re: Data loss when restoring from savepoint

2018-10-04 Thread Stefan Richter
ht be a factor. How would you assume that backpressure would influence your updates? Updates to each local state still happen event-by-event, in a single reader/writing thread. > > On Thu, Oct 4, 2018 at 4:18 PM Stefan Richter <mailto:s.rich...@data-artisans.com>> wrote: > Hi

Re: Data loss when restoring from savepoint

2018-10-04 Thread Stefan Richter
Hi, you could take a look at Bravo [1] to query your savepoints and to check if the state in the savepoint complete w.r.t your expectations. I somewhat doubt that there is a general problem with the state/savepoints because many users are successfully running it on a large state and I am not

[jira] [Created] (FLINK-10490) OperatorSnapshotUtil should probably use SavepointV2Serializer

2018-10-04 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10490: -- Summary: OperatorSnapshotUtil should probably use SavepointV2Serializer Key: FLINK-10490 URL: https://issues.apache.org/jira/browse/FLINK-10490 Project: Flink

[jira] [Created] (FLINK-10490) OperatorSnapshotUtil should probably use SavepointV2Serializer

2018-10-04 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10490: -- Summary: OperatorSnapshotUtil should probably use SavepointV2Serializer Key: FLINK-10490 URL: https://issues.apache.org/jira/browse/FLINK-10490 Project: Flink

[jira] [Created] (FLINK-10433) Stepwise creation of the ExecutionGraph sub-structures

2018-09-26 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10433: -- Summary: Stepwise creation of the ExecutionGraph sub-structures Key: FLINK-10433 URL: https://issues.apache.org/jira/browse/FLINK-10433 Project: Flink

[jira] [Created] (FLINK-10433) Stepwise creation of the ExecutionGraph sub-structures

2018-09-26 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10433: -- Summary: Stepwise creation of the ExecutionGraph sub-structures Key: FLINK-10433 URL: https://issues.apache.org/jira/browse/FLINK-10433 Project: Flink

[jira] [Created] (FLINK-10432) Introduce bulk/group-aware scheduling

2018-09-26 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10432: -- Summary: Introduce bulk/group-aware scheduling Key: FLINK-10432 URL: https://issues.apache.org/jira/browse/FLINK-10432 Project: Flink Issue Type: Sub

[jira] [Created] (FLINK-10432) Introduce bulk/group-aware scheduling

2018-09-26 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10432: -- Summary: Introduce bulk/group-aware scheduling Key: FLINK-10432 URL: https://issues.apache.org/jira/browse/FLINK-10432 Project: Flink Issue Type: Sub

[jira] [Created] (FLINK-10431) Extract scheduling-related code from SlotPool

2018-09-26 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10431: -- Summary: Extract scheduling-related code from SlotPool Key: FLINK-10431 URL: https://issues.apache.org/jira/browse/FLINK-10431 Project: Flink Issue Type

[jira] [Created] (FLINK-10431) Extract scheduling-related code from SlotPool

2018-09-26 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10431: -- Summary: Extract scheduling-related code from SlotPool Key: FLINK-10431 URL: https://issues.apache.org/jira/browse/FLINK-10431 Project: Flink Issue Type

[jira] [Created] (FLINK-10430) Extract scheduling-related code from Executions

2018-09-26 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10430: -- Summary: Extract scheduling-related code from Executions Key: FLINK-10430 URL: https://issues.apache.org/jira/browse/FLINK-10430 Project: Flink Issue

[jira] [Created] (FLINK-10430) Extract scheduling-related code from Executions

2018-09-26 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10430: -- Summary: Extract scheduling-related code from Executions Key: FLINK-10430 URL: https://issues.apache.org/jira/browse/FLINK-10430 Project: Flink Issue

[jira] [Created] (FLINK-10429) Redesign Flink Scheduling, introducing dedicated Scheduler component

2018-09-26 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10429: -- Summary: Redesign Flink Scheduling, introducing dedicated Scheduler component Key: FLINK-10429 URL: https://issues.apache.org/jira/browse/FLINK-10429 Project

[jira] [Created] (FLINK-10429) Redesign Flink Scheduling, introducing dedicated Scheduler component

2018-09-26 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10429: -- Summary: Redesign Flink Scheduling, introducing dedicated Scheduler component Key: FLINK-10429 URL: https://issues.apache.org/jira/browse/FLINK-10429 Project

Re: Rocksdb Metrics

2018-09-25 Thread Stefan Richter
Hi, this feature is tracked here https://issues.apache.org/jira/browse/FLINK-10423 Best, Stefan > Am 25.09.2018 um 17:51 schrieb Sayat Satybaldiyev : > > Flink provides a rich number of metrics. However, I didn't find any metrics > for

Re: ArrayIndexOutOfBoundsException

2018-09-25 Thread Stefan Richter
der Smirnov > : > > Thanks Stefan. > > is it only Flink runtime should be updated, or the job should be recompiled > too? > Is there a workaround to start the job without upgrading Flink? > > Alex > > On Tue, Sep 25, 2018 at 5:48 PM Stefan Richter <mailto:s.

Re: ArrayIndexOutOfBoundsException

2018-09-25 Thread Stefan Richter
Hi, this problem looks like https://issues.apache.org/jira/browse/FLINK-8836 which would also match to your Flink version. I suggest to update to 1.4.3 or higher to avoid the issue in the future. Best, Stefan > Am 25.09.2018 um 16:37 schrieb

Re: ***UNCHECKED*** Error while confirming Checkpoint

2018-09-25 Thread Stefan Richter
Hi, I cannot spot anything bad or „wrong“ about your job configuration. Maybe you can try to save and send the logs if it happens again? Did you observe this only once, often, or is it something that is even reproduceable? Best, Stefan > Am 24.09.2018 um 10:15 schrieb PedroMrChaves : > >

[jira] [Closed] (FLINK-10157) Allow `null` user values in map state with TTL

2018-09-21 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-10157. -- Resolution: Fixed Fix Version/s: 1.6.2 1.7.0 Merged in: master

Re: ***UNCHECKED*** Error while confirming Checkpoint

2018-09-21 Thread Stefan Richter
Hi, could you provide some logs for this problematic job because I would like to double check the reason why this violated precondition did actually happen? Thanks, Stefan > Am 20.09.2018 um 17:24 schrieb Stefan Richter : > > FYI, here a link to my PR: https://github.com/apache/f

Re: Flink can't initialize operator state backend when starting from checkpoint

2018-09-21 Thread Stefan Richter
Hi, that is correct. If you are using custom serializers you should double check their correctness, maybe using our test base for type serializers. Another reason could be that you modified the job in a way that silently changed the schema somehow. Concurrent use of serializers across

Re: multiple flink applications on yarn are shown in one application.

2018-09-21 Thread Stefan Richter
gt; I browse the yarn application. As the picture shows I got 2 applications(0013 > / 0012) but the job in both applications is the same. I can’t find the job > submitted secondly. The job in application_XXX_0013 should be > rpt_company_user_s. This will not happen in session mode.

[jira] [Commented] (FLINK-10382) Writer has already been opened while using AvroKeyValueSinkWriter and BucketingSink

2018-09-20 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622246#comment-16622246 ] Stefan Richter commented on FLINK-10382: Thanks for reporting this. [~aljoscha

Re: Flink takes 40ms ~ 100ms to proceed from one operator to another?

2018-09-20 Thread Stefan Richter
Oh yes exactly, enable is right. > Am 20.09.2018 um 17:48 schrieb Hequn Cheng : > > Hi Stefan, > > Do you mean enable object reuse? > If you want to reduce latency between chained operators, you can also try to > disable object-reuse: > > On Thu, Sep 20, 2018

[jira] [Commented] (FLINK-10374) [Map State] Let user value serializer handle null values

2018-09-20 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622213#comment-16622213 ] Stefan Richter commented on FLINK-10374: Thanks for creating a Jira to track this! +1 > [

Re: ***UNCHECKED*** Error while confirming Checkpoint

2018-09-20 Thread Stefan Richter
FYI, here a link to my PR: https://github.com/apache/flink/pull/6723 > Am 20.09.2018 um 14:52 schrieb Stefan Richter : > > Hi, > > I think the failing precondition is too strict because sometimes a checkpoint > can overtake another checkpoint and in that case the commit is a

[jira] [Updated] (FLINK-10377) Remove precondition in TwoPhaseCommitSinkFunction.notifyCheckpointComplete

2018-09-20 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter updated FLINK-10377: --- Description: The precondition {{checkState(pendingTransactionIterator.hasNext

Re: Unit/Integration test for stateful source

2018-09-20 Thread Stefan Richter
Hi, maybe you can use AbstractStreamOperatorTestHarness to test your source, including the snapshotting. You can take a look at the tests of some other source, e.g. StatefulSequenceSourceTest#testCheckpointRestore. Best, Stefan > Am 20.09.2018 um 15:29 schrieb Darshan Singh : > > Hi, > > I

Re: Flink takes 40ms ~ 100ms to proceed from one operator to another?

2018-09-20 Thread Stefan Richter
Sorry, forgot the link for reference [1], which is https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/datastream_api.html#controlling-latency > Am 20.09.2018 um 16:36 schrieb Stefan Richter : > > Hi, > > you provide not very much information for this question, e.

Re: Flink takes 40ms ~ 100ms to proceed from one operator to another?

2018-09-20 Thread Stefan Richter
Hi, you provide not very much information for this question, e.g. what and how exactly your measure, if this is a local or distributed setting etc. I assume that it is distributed and that the cause for your observation is the buffer timeout, i.e. the maximum time that Flink waits until

[jira] [Created] (FLINK-10377) Remove precondition in TwoPhaseCommitSinkFunction.notifyCheckpointComplete

2018-09-20 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10377: -- Summary: Remove precondition in TwoPhaseCommitSinkFunction.notifyCheckpointComplete Key: FLINK-10377 URL: https://issues.apache.org/jira/browse/FLINK-10377

[jira] [Created] (FLINK-10377) Remove precondition in TwoPhaseCommitSinkFunction.notifyCheckpointComplete

2018-09-20 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10377: -- Summary: Remove precondition in TwoPhaseCommitSinkFunction.notifyCheckpointComplete Key: FLINK-10377 URL: https://issues.apache.org/jira/browse/FLINK-10377

Re: ***UNCHECKED*** Error while confirming Checkpoint

2018-09-20 Thread Stefan Richter
Hi, I think the failing precondition is too strict because sometimes a checkpoint can overtake another checkpoint and in that case the commit is already subsumed. I will open a Jira and PR with a fix. Best, Stefan > Am 19.09.2018 um 10:04 schrieb PedroMrChaves : > > Hello, > > I have a

Re: Trying to figure out why a slot takes a long time to checkpoint

2018-09-20 Thread Stefan Richter
A debug log for state backend and checkpoint coordinator could also help. > Am 20.09.2018 um 14:19 schrieb Stefan Richter : > > Hi, > > if some tasks take like 50 minutes, could you wait until such a checkpoint is > in progress and (let’s say after 10 minutes) log into t

Re: Trying to figure out why a slot takes a long time to checkpoint

2018-09-20 Thread Stefan Richter
Hi, if some tasks take like 50 minutes, could you wait until such a checkpoint is in progress and (let’s say after 10 minutes) log into the node and create a (or multiple over time) thread-dump(s) for the JVM that runs the slow checkpointing task. This could help to figure out where it is

Re: How to get the location of keytab when using flink on yarn

2018-09-20 Thread Stefan Richter
Hi, maybe Aljoscha or Eron (both in CC) can help you with this problem, I think they might know best about the Kerberos security. Best, Stefan > Am 20.09.2018 um 11:20 schrieb 杨光 : > > Hi, > i am using the " per-job YARN session " mode deploy flink job on yarn and my > flink > version is

Re: Writer has already been opened on BucketingSink to S3

2018-09-20 Thread Stefan Richter
Hi, thanks for putting some effort into debugging the problem. Could you open a Jira with the problem and your analysis so that we can discuss how to proceed with it? Best, Stefan > Am 18.09.2018 um 23:16 schrieb Chengzhi Zhao : > > After checking the code, I think the issue might be related

Re: S3 connector Hadoop class mismatch

2018-09-20 Thread Stefan Richter
Hi, I could not find any open Jira for the problem you describe. Could you please open one? Best, Stefan > Am 19.09.2018 um 09:54 schrieb Paul Lam : > > Hi, > > I’m using StreamingFileSink of 1.6.0 to write logs to S3 and encounter a > classloader problem. It seems that there are conflicts

Re: multiple flink applications on yarn are shown in one application.

2018-09-20 Thread Stefan Richter
Hi, currently, Flink still has to use session mode under the hood if you submit the job in attached-mode. The reason is that the job could consists of multiple parts that require to run one after the other. This will be changed in the future and also should not happen if you submit the job

Re: Checkpointing not working

2018-09-20 Thread Stefan Richter
Hi, in the absence of any logs, my guess would be that your checkpoints are just not able to complete within 10 seconds, the state might be to large or the network and fs to slow. Are you using full or incremental checkpoints? For your relative small interval, I suggest that you try using

Re: Why FlinkKafkaConsumer08 does not support exactly once or at least once semantic?

2018-09-20 Thread Stefan Richter
Hi, I think this part of the documentation is talking about KafkaProducer, and you are reading in the source code of KafkaConsumer. Best, Stefan > Am 20.09.2018 um 10:48 schrieb 徐涛 : > > Hi All, > In document of Flink 1.6, it says that "Before 0.9 Kafka did not > provide any mechanisms

Re: Flink 1.5.2 process keeps reference to deleted blob files.

2018-09-20 Thread Stefan Richter
Hi, I think it would be very helpful if you could identify what data is behind. For example, I could imagine that it can be a jar file that was used by the TM and some classes are still in use or loaded by a classloader that was not yet GCed. Depending on that, there could be a problem in the

Re: How to use checkpoint in flink1.5.3

2018-09-20 Thread Stefan Richter
Hi, did you introduce some custom modifications to the code? Your stack trace does not match the lines in the code of release-1.5.3, e.g. line 230 is not in method internalTimeServiceManager(…) which makes it hard to draw any conclusions. Best, Stefan > Am 19.09.2018 um 14:03 schrieb

Re: [PATCH] firewire: nosy: don't read packets bigger than requested

2018-09-18 Thread Stefan Richter
atch and (b) clean out my mailbox and update my mail sorting filters (long overdue after my mail service provider changed backends). Again sorry, and thank you for your extraordinary patience. -- Stefan Richter -==---=- =--= =--=- http://arcgraph.de/sr/

Re: [PATCH] firewire: nosy: don't read packets bigger than requested

2018-09-18 Thread Stefan Richter
atch and (b) clean out my mailbox and update my mail sorting filters (long overdue after my mail service provider changed backends). Again sorry, and thank you for your extraordinary patience. -- Stefan Richter -==---=- =--= =--=- http://arcgraph.de/sr/

Re: Trying to figure out why a slot takes a long time to checkpoint

2018-09-18 Thread Stefan Richter
Adding to my previous email, I start to doubt a little bit about the explanation because also alignment times are very low. Could it be possible that it takes very long for the checkpoint operation (for whatever reason) to get the checkpointing lock? > Am 18.09.2018 um 11:58 schrieb Ste

Re: Trying to figure out why a slot takes a long time to checkpoint

2018-09-18 Thread Stefan Richter
Hi, from your screenshot, it looks like everything is running fine as soon as the snapshots are actually running, sync and async part times are normal. So I think the explanation is the time that the checkpoint barrier needs to reach this particular operator. It seems like there is a large

Re: Flink 1.3.2 RocksDB map segment fail if configured as state backend

2018-09-17 Thread Stefan Richter
Hi, I think the exception pretty much says what is wrong, the native library cannot be mapped into the process because of some access rights problem. Please make sure that your path /tmp has the exec right. Best, Stefan > Am 17.09.2018 um 11:37 schrieb Andrea Spina : > > Hi everybody, > > I

[jira] [Commented] (FLINK-10346) MemoryStateBackend does not clean up checkpoint directory

2018-09-17 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617533#comment-16617533 ] Stefan Richter commented on FLINK-10346: I think the checkpoint directory is the basis to create

[jira] [Closed] (FLINK-10267) [State] Fix arbitrary iterator access on RocksDBMapIterator

2018-09-11 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-10267. -- Resolution: Fixed Merged in: master: 16dc5978c7 release-1.6: a19e1ee3ba release-1.5

[jira] [Commented] (FLINK-9486) Introduce TimerState in keyed state backend

2018-09-11 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610602#comment-16610602 ] Stefan Richter commented on FLINK-9486: --- Exactly, that's how it is right now. In the future, we

[jira] [Commented] (FLINK-9486) Introduce TimerState in keyed state backend

2018-09-11 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610497#comment-16610497 ] Stefan Richter commented on FLINK-9486: --- Hi, you can migrate an old savepoint from earlier Flink

[jira] [Closed] (FLINK-9735) Potential resource leak in RocksDBStateBackend#getDbOptions

2018-09-10 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-9735. - Resolution: Fixed Merged in: master: 1a94c2094b > Potential resource l

Re: [ANNOUNCE] New committer Gary Yao

2018-09-07 Thread Stefan Richter
Congrats Gary! > Am 07.09.2018 um 15:14 schrieb Till Rohrmann : > > Hi everybody, > > On behalf of the PMC I am delighted to announce Gary Yao as a new Flink > committer! > > Gary started contributing to the project in June 2017. He helped with the > Flip-6 implementation, implemented many of

Re: After OutOfMemoryError State can not be readed

2018-09-07 Thread Stefan Richter
Hi, what I can say is that any failures like OOMs should not corrupt checkpoint files, because only successfully completed checkpoints are used for recovery by the job manager. Just to get a bit more info, are you using full or incremental checkpoints? Unfortunately, it is a bit hard to say

[jira] [Comment Edited] (FLINK-10297) PostVersionedIOReadableWritable ignores result of InputStream.read(...)

2018-09-07 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606844#comment-16606844 ] Stefan Richter edited comment on FLINK-10297 at 9/7/18 8:35 AM: Yes

[jira] [Commented] (FLINK-10297) PostVersionedIOReadableWritable ignores result of InputStream.read(...)

2018-09-07 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606844#comment-16606844 ] Stefan Richter commented on FLINK-10297: Yes, that is a very common trap that can have very bad

[jira] [Created] (FLINK-10297) PostVersionedIOReadableWritable ignores result of InputStream.read(...)

2018-09-07 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10297: -- Summary: PostVersionedIOReadableWritable ignores result of InputStream.read(...) Key: FLINK-10297 URL: https://issues.apache.org/jira/browse/FLINK-10297 Project

[jira] [Created] (FLINK-10297) PostVersionedIOReadableWritable ignores result of InputStream.read(...)

2018-09-07 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10297: -- Summary: PostVersionedIOReadableWritable ignores result of InputStream.read(...) Key: FLINK-10297 URL: https://issues.apache.org/jira/browse/FLINK-10297 Project

[jira] [Commented] (FLINK-8098) LeaseExpiredException when using FsStateBackend for checkpointing due to multiple mappers tries to access the same file.

2018-09-06 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605446#comment-16605446 ] Stefan Richter commented on FLINK-8098: --- Maybe you could first give us some more details

Re: Increased Size of Incremental Checkpoint

2018-09-06 Thread Stefan Richter
Hi, you should expect that the size can vary for some checkpoints, even if the change rate is constant. Some checkpoints will upload compacted replacements for previous checkpoints to prevent that the checkpoint history will grow without bounds. Whenever that happens, the checkpoint does some

[jira] [Created] (FLINK-10200) Consolidate FileBasedStateOutputStream and FsCheckpointStateOutputStream

2018-08-22 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10200: -- Summary: Consolidate FileBasedStateOutputStream and FsCheckpointStateOutputStream Key: FLINK-10200 URL: https://issues.apache.org/jira/browse/FLINK-10200 Project

[jira] [Created] (FLINK-10200) Consolidate FileBasedStateOutputStream and FsCheckpointStateOutputStream

2018-08-22 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10200: -- Summary: Consolidate FileBasedStateOutputStream and FsCheckpointStateOutputStream Key: FLINK-10200 URL: https://issues.apache.org/jira/browse/FLINK-10200 Project

[jira] [Created] (FLINK-10198) Set Env object in DBOptions for RocksDB

2018-08-22 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-10198: -- Summary: Set Env object in DBOptions for RocksDB Key: FLINK-10198 URL: https://issues.apache.org/jira/browse/FLINK-10198 Project: Flink Issue Type

[jira] [Closed] (FLINK-10176) Replace ByteArrayData[Input|Output]View with Data[Output|InputDe]Serializer

2018-08-22 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-10176. -- Resolution: Implemented Merged in: master: 3fd6587 > Replace ByteArrayData[Input|Output]V

[jira] [Closed] (FLINK-10175) Fix concurrent access to shared buffer in map state / querable state

2018-08-22 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-10175. -- Resolution: Fixed Merged in: master: 2b6beed > Fix concurrent access to shared buffer in

[jira] [Closed] (FLINK-10151) [State TTL] Fix false recursion call in TransformingStateTableKeyGroupPartitioner.tryAddToSource

2018-08-21 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-10151. -- Resolution: Fixed Merged in: master: c73747b release-1.6: 00a2e81 > [State TTL] Fix fa

[jira] [Closed] (FLINK-10042) Extract snapshot algorithms from inner classes into full classes

2018-08-21 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-10042. -- Resolution: Implemented Fix Version/s: 1.7.0 Merged in: master: f803280bb9 > Extr

[jira] [Closed] (FLINK-10068) Add documentation for async/RocksDB-based timers

2018-08-20 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-10068. -- Resolution: Implemented Fix Version/s: 1.7.0 1.6.1 Merged

[jira] [Commented] (FLINK-10176) Replace ByteArrayData[Input|Output]View with Data[Output|InputDe]Serializer

2018-08-20 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585700#comment-16585700 ] Stefan Richter commented on FLINK-10176: Yes, they almost did what I was looking for. In my PR

<    1   2   3   4   5   6   7   8   9   10   >