Re: Memory Leak in ProcessingTimeSessionWindow

2018-06-20 Thread Stefan Richter
ers anyone might have. > > Thanks, Ashish > > On Monday, June 18, 2018, 12:54:03 PM EDT, ashish pok > wrote: > > > Right, thats where I am headed now but was wondering there are any “gochas” I > am missing before I try and dig into a few gigs of heap dump. > > >

[jira] [Closed] (FLINK-9571) Switch to internal states in StateBinder

2018-06-18 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-9571. - Resolution: Implemented Merged in: master: 0bdde8377c > Switch to internal sta

[jira] [Closed] (FLINK-9601) Snapshot of CopyOnWriteStateTable will failed when the amount of record is more than MAXIMUM_CAPACITY

2018-06-18 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-9601. - Resolution: Fixed Merged in: master: 0e9b066aab > Snapshot of CopyOnWriteStateTable will fai

Re: Memory Leak in ProcessingTimeSessionWindow

2018-06-18 Thread Stefan Richter
Hi, can you take a heap dump from a JVM that runs into the problem and share it with us? That would make finding the cause a lot easier. Best, Stefan > Am 15.06.2018 um 23:01 schrieb ashish pok : > > All, > > I have another slow Memory Leak situation using basic TimeSession Window >

[jira] [Closed] (FLINK-9487) Prepare InternalTimerHeap for asynchronous snapshots

2018-06-15 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-9487. - Resolution: Implemented Merged in: master: 7e0eafa74d > Prepare InternalTimerH

[jira] [Closed] (FLINK-9506) Flink ReducingState.add causing more than 100% performance drop

2018-06-13 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-9506. - Resolution: Not A Problem > Flink ReducingState.add causing more than 100% performance d

[jira] [Commented] (FLINK-9506) Flink ReducingState.add causing more than 100% performance drop

2018-06-13 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510870#comment-16510870 ] Stefan Richter commented on FLINK-9506: --- [~yow] I would suggest that you discuss it on the user

[jira] [Commented] (FLINK-9506) Flink ReducingState.add causing more than 100% performance drop

2018-06-13 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510747#comment-16510747 ] Stefan Richter commented on FLINK-9506: --- Hi, I think this discussion has no connection

Re: Clarity on Flink 1.5 Rescale mechanism

2018-06-12 Thread Stefan Richter
Hi, it means that you can now modify the parallelism of a running job with a new „modify“ command in the CLI, see [1]. Adding task manager will add their offered slots to the pool of available slots, it will not automatically change the parallelism. Best, Stefan [1]

Re: Stopping of a streaming job empties state store on HDFS

2018-06-11 Thread Stefan Richter
Hi, > Am 08.06.2018 um 01:16 schrieb Peter Zende : > > Hi all, > > We have a streaming pipeline (Flink 1.4.2) for which we implemented stoppable > sources to be able to gracefully exit from the job with Yarn state > "finished/succeeded". > This works fine, however after creating a savepoint,

Re: Having a backoff while experiencing checkpointing failures

2018-06-11 Thread Stefan Richter
Hi, I think the behaviour of min_pause_between_checkpoints is either buggy or we should at least discuss if it would not be better to respect a pause also for failed checkpoints. As far as I know there is no ongoing work to add backoff, so I suggest you open a jira issue and make a case for

[jira] [Closed] (FLINK-9440) Allow cancelation and reset of timers

2018-06-05 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-9440. - Resolution: Fixed Fix Version/s: 1.6.0 Merged in: master: a0f4239fae > Allow cancelat

[jira] [Closed] (FLINK-8790) Improve performance for recovery from incremental checkpoint

2018-06-05 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8790. - Resolution: Fixed Merged in: master: bbf7ff2273 > Improve performance for recovery f

[jira] [Closed] (FLINK-7866) Weigh list of preferred locations for scheduling

2018-06-04 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-7866. - Resolution: Fixed Merged in: master: 8868ff5b05 > Weigh list of preferred locati

[jira] [Commented] (FLINK-9506) Flink ReducingState.add causing more than 100% performance drop

2018-06-04 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500165#comment-16500165 ] Stefan Richter commented on FLINK-9506: --- [~yow] I had another look at your code and can point out

[jira] [Commented] (FLINK-9506) Flink ReducingState.add causing more than 100% performance drop

2018-06-04 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499947#comment-16499947 ] Stefan Richter commented on FLINK-9506: --- >From what I can see, the problem is purely rela

[jira] [Updated] (FLINK-9491) Implement timer data structure based on RocksDB

2018-06-01 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter updated FLINK-9491: -- Summary: Implement timer data structure based on RocksDB (was: Implement timer service based

[jira] [Created] (FLINK-9491) Implement timer service based on RocksDB

2018-06-01 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9491: - Summary: Implement timer service based on RocksDB Key: FLINK-9491 URL: https://issues.apache.org/jira/browse/FLINK-9491 Project: Flink Issue Type: Sub

[jira] [Created] (FLINK-9490) Provide backwards compatibility for timer state of Flink 1.5

2018-06-01 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9490: - Summary: Provide backwards compatibility for timer state of Flink 1.5 Key: FLINK-9490 URL: https://issues.apache.org/jira/browse/FLINK-9490 Project: Flink

[jira] [Created] (FLINK-9490) Provide backwards compatibility for timer state of Flink 1.5

2018-06-01 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9490: - Summary: Provide backwards compatibility for timer state of Flink 1.5 Key: FLINK-9490 URL: https://issues.apache.org/jira/browse/FLINK-9490 Project: Flink

[jira] [Updated] (FLINK-9490) Provide backwards compatibility for timer state of Flink 1.5

2018-06-01 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter updated FLINK-9490: -- Fix Version/s: 1.6.0 Component/s: State Backends, Checkpointing > Provide backwa

[jira] [Created] (FLINK-9489) Checkpoint timers as part of managed keyed state instead of raw keyed state

2018-06-01 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9489: - Summary: Checkpoint timers as part of managed keyed state instead of raw keyed state Key: FLINK-9489 URL: https://issues.apache.org/jira/browse/FLINK-9489 Project

[jira] [Created] (FLINK-9489) Checkpoint timers as part of managed keyed state instead of raw keyed state

2018-06-01 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9489: - Summary: Checkpoint timers as part of managed keyed state instead of raw keyed state Key: FLINK-9489 URL: https://issues.apache.org/jira/browse/FLINK-9489 Project

[jira] [Commented] (FLINK-9486) Introduce TimerState in keyed state backend

2018-06-01 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497952#comment-16497952 ] Stefan Richter commented on FLINK-9486: --- Hi, I am already creating the issues for planning, but I

[jira] [Created] (FLINK-9487) Prepare InternalTimerHeap for asynchronous snapshots

2018-06-01 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9487: - Summary: Prepare InternalTimerHeap for asynchronous snapshots Key: FLINK-9487 URL: https://issues.apache.org/jira/browse/FLINK-9487 Project: Flink Issue

[jira] [Assigned] (FLINK-9486) Introduce TimerState in keyed state backend

2018-06-01 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter reassigned FLINK-9486: - Assignee: Stefan Richter > Introduce TimerState in keyed state back

[jira] [Created] (FLINK-9487) Prepare InternalTimerHeap for asynchronous snapshots

2018-06-01 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9487: - Summary: Prepare InternalTimerHeap for asynchronous snapshots Key: FLINK-9487 URL: https://issues.apache.org/jira/browse/FLINK-9487 Project: Flink Issue

[jira] [Created] (FLINK-9486) Introduce TimerState in keyed state backend

2018-06-01 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9486: - Summary: Introduce TimerState in keyed state backend Key: FLINK-9486 URL: https://issues.apache.org/jira/browse/FLINK-9486 Project: Flink Issue Type: Sub

[jira] [Created] (FLINK-9486) Introduce TimerState in keyed state backend

2018-06-01 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9486: - Summary: Introduce TimerState in keyed state backend Key: FLINK-9486 URL: https://issues.apache.org/jira/browse/FLINK-9486 Project: Flink Issue Type: Sub

[jira] [Created] (FLINK-9485) Improving Flink’s timer management for large state

2018-06-01 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9485: - Summary: Improving Flink’s timer management for large state Key: FLINK-9485 URL: https://issues.apache.org/jira/browse/FLINK-9485 Project: Flink Issue

[jira] [Created] (FLINK-9485) Improving Flink’s timer management for large state

2018-06-01 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9485: - Summary: Improving Flink’s timer management for large state Key: FLINK-9485 URL: https://issues.apache.org/jira/browse/FLINK-9485 Project: Flink Issue

[jira] [Closed] (FLINK-9423) Implement efficient deletes for heap based timer service

2018-05-31 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-9423. - Resolution: Fixed Merged in: master: ff0b9c1eed > Implement efficient deletes for heap ba

[jira] [Closed] (FLINK-9436) Remove generic parameter namespace from InternalTimeServiceManager

2018-05-31 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-9436. - Resolution: Fixed Merged in: master: 57b950796d > Remove generic parameter namespace f

[jira] [Commented] (FLINK-9480) Let local recovery support rescaling

2018-05-30 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494897#comment-16494897 ] Stefan Richter commented on FLINK-9480: --- Maybe [~StephanEwen] or [~till.rohrmann] can also give

[jira] [Commented] (FLINK-9480) Let local recovery support rescaling

2018-05-30 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494892#comment-16494892 ] Stefan Richter commented on FLINK-9480: --- Can you give some more details why you think

[jira] [Commented] (FLINK-9268) RockDB errors from WindowOperator

2018-05-30 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494851#comment-16494851 ] Stefan Richter commented on FLINK-9268: --- Please notice that this is our attempt to "fix"

[jira] [Assigned] (FLINK-9440) Allow cancelation and reset of timers

2018-05-28 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter reassigned FLINK-9440: - Assignee: Stefan Richter > Allow cancelation and reset of tim

[jira] [Closed] (FLINK-9153) TaskManagerRunner should support rpc port range

2018-05-28 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-9153. - Resolution: Fixed Fix Version/s: 1.6.0 Merged in: master: 7c90447849 release-1.5

[jira] [Comment Edited] (FLINK-9450) Job hangs if S3 access it denied during checkpoints

2018-05-28 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492587#comment-16492587 ] Stefan Richter edited comment on FLINK-9450 at 5/28/18 11:55 AM: - Do you

[jira] [Commented] (FLINK-9450) Job hangs if S3 access it denied during checkpoints

2018-05-28 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492587#comment-16492587 ] Stefan Richter commented on FLINK-9450: --- Do you have some logs for this problem, and/or a thread

[jira] [Commented] (FLINK-9440) Allow cancelation and reset of timers

2018-05-28 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492547#comment-16492547 ] Stefan Richter commented on FLINK-9440: --- Yes, was just about to answer to this. Once FLINK-9423

[jira] [Closed] (FLINK-9355) Simplify configuration of local recovery to a simple on/off

2018-05-28 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-9355. - Resolution: Fixed Fix Version/s: 1.6.0 Merged in: master: 7f42259 release-1.5: b907af2ed6

Re: [PROPOSAL] Improving Flink’s timer management for large state

2018-05-28 Thread Stefan Richter
the most highly anticipated > features from Flink users, and it's finally coming, officially. I also > would love to see bringing timer more closely to state backend, for the > sake of easier development and maintenance of code. > > On Fri, May 25, 2018 at 7:13 AM, Stefan Richte

[PROPOSAL] Improving Flink’s timer management for large state

2018-05-25 Thread Stefan Richter
Hi, I am currently planning how to improve Flink’s timer management for large state. In particular, I would like to introduce timer state that is managed in RocksDB and also to improve the capabilities of the heap-based timer service, e.g. support for asynchronous checkpoints. You can find a

Re: Increasing Disk Read Throughput and IOPS

2018-05-25 Thread Stefan Richter
ist-archive.2336050.n4.nabble.com/checkpoint-stuck-with-rocksdb-statebackend-and-s3-filesystem-td18679.html> > Am 25.05.2018 um 09:59 schrieb Stefan Richter <s.rich...@data-artisans.com>: > > Hi, > > if the problem is seemingly from reads, I think incremental checkpoints

[jira] [Created] (FLINK-9436) Remove generic parameter namespace from InternalTimeServiceManager

2018-05-25 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9436: - Summary: Remove generic parameter namespace from InternalTimeServiceManager Key: FLINK-9436 URL: https://issues.apache.org/jira/browse/FLINK-9436 Project: Flink

[jira] [Created] (FLINK-9436) Remove generic parameter namespace from InternalTimeServiceManager

2018-05-25 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9436: - Summary: Remove generic parameter namespace from InternalTimeServiceManager Key: FLINK-9436 URL: https://issues.apache.org/jira/browse/FLINK-9436 Project: Flink

Re: Kryo Exception

2018-05-25 Thread Stefan Richter
I agree, it looks like one of the two mentioned issues. > Am 25.05.2018 um 06:15 schrieb sihua zhou : > > Hi Gordon, > > I think this might not be caused by > https://issues.apache.org/jira/browse/FLINK-9263 > , the bug in

Re: Increasing Disk Read Throughput and IOPS

2018-05-25 Thread Stefan Richter
Hi, if the problem is seemingly from reads, I think incremental checkpoints are less likely to cause the problem. What Flink version are you using? Since you mentioned the use of map state, what comes to my mind as a potential cause is described in this issue

[jira] [Closed] (FLINK-9426) Harden RocksDBWriteBatchPerformanceTest.benchMark()

2018-05-23 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-9426. - Resolution: Fixed Fix Version/s: 1.5.1 Merged in: master: b485f8cc60 release-1.5

[jira] [Closed] (FLINK-9064) Add Scaladocs link to documentation

2018-05-23 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-9064. - Resolution: Fixed Fix Version/s: 1.5.1 1.6.0 Merged in: master

[jira] [Updated] (FLINK-9423) Implement efficient deletes for heap based timer service

2018-05-23 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter updated FLINK-9423: -- Description: The current data structures in the `HeapInternalTimerService` are not able

[jira] [Assigned] (FLINK-9423) Implement efficient deletes for heap based timer service

2018-05-23 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter reassigned FLINK-9423: - Assignee: Stefan Richter > Implement efficient deletes for heap based timer serv

[jira] [Created] (FLINK-9423) Implement efficient deletes for heap based timer service

2018-05-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9423: - Summary: Implement efficient deletes for heap based timer service Key: FLINK-9423 URL: https://issues.apache.org/jira/browse/FLINK-9423 Project: Flink

Re: [VOTE] Enable GitBox integration (#2)

2018-05-23 Thread Stefan Richter
+1 > Am 23.05.2018 um 14:31 schrieb Stephan Ewen : > > +1 > > On Wed, May 23, 2018 at 11:11 AM, Fabian Hueske wrote: > >> +1 >> >> 2018-05-23 8:49 GMT+02:00 Aljoscha Krettek : >> >>> +1 >>> On 22. May 2018, at 15:36, Thomas

[jira] [Closed] (FLINK-8845) Use WriteBatch to improve performance for recovery in RocksDB backend

2018-05-23 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8845. - Resolution: Fixed Fix Version/s: 1.5.1 Merged in: master: 1c7341ad1a release-1.5

Re: [PROPOSAL] Introduce Elastic Bloom Filter For Flink

2018-05-23 Thread Stefan Richter
Hi, In general, I like the proposal as well. We should try to integrate all forms of keyed state with the backend, to avoid the problems that we are currently facing with the timer service. We should discuss which exact implementation of bloom filters are the best fit. @Fabian: There are also

Re: Missing MapState when Timer fires after restored state

2018-05-18 Thread Stefan Richter
effects. > > As for the bug I faced, indeed I was able to reproduce it consistently. And I > have provided TRACE-level logs personally to Stefan. If there is no Jira > ticket for this yet, would you like me to create one? > > On Thu, May 17, 2018 at 1:00 PM, Stefan Richter <

Re: are there any ways to test the performance of rocksdb state backend?

2018-05-18 Thread Stefan Richter
Hi, Sihua is right, of course we would like to update our RocksDB version but are currently blocked on a performance regression. Here is our issue in the RocksDB tracker for this: https://github.com/facebook/rocksdb/issues/3865 . Best, Stefan

[jira] [Commented] (FLINK-9375) Introduce AbortCheckpoint message from JM to TMs

2018-05-17 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479298#comment-16479298 ] Stefan Richter commented on FLINK-9375: --- [~yanghua] this task maybe a bit more tricky than it sounds

[jira] [Closed] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-17 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-9373. - Resolution: Fixed Fix Version/s: 1.6.0 Merged in: master: 105b30686f release-1.5

[jira] [Commented] (FLINK-9390) Shutdown of KafkaProducer causes confusing log message

2018-05-17 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478941#comment-16478941 ] Stefan Richter commented on FLINK-9390: --- >From the log, it seems that Kafka09 connectors are u

[jira] [Commented] (FLINK-9390) Shutdown of KafkaProducer causes confusing log message

2018-05-17 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478936#comment-16478936 ] Stefan Richter commented on FLINK-9390: --- [~triones] Unfortunately I don't know a way to reproduce

[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-17 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478866#comment-16478866 ] Stefan Richter commented on FLINK-9373: --- Great, thanks a lot! > Fix potential data los

[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-17 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478861#comment-16478861 ] Stefan Richter commented on FLINK-9373: --- [~sihuazhou] Will you be able to do this quickly or can I

[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-17 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478857#comment-16478857 ] Stefan Richter commented on FLINK-9373: --- Well, but we are also at risk to have only a partial fix

[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-17 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478842#comment-16478842 ] Stefan Richter commented on FLINK-9373: --- Maybe we can also take the PR basically "as is"

[jira] [Created] (FLINK-9390) Shutdown of KafkaProducer causes confusing log message

2018-05-17 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9390: - Summary: Shutdown of KafkaProducer causes confusing log message Key: FLINK-9390 URL: https://issues.apache.org/jira/browse/FLINK-9390 Project: Flink Issue

[jira] [Created] (FLINK-9390) Shutdown of KafkaProducer causes confusing log message

2018-05-17 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9390: - Summary: Shutdown of KafkaProducer causes confusing log message Key: FLINK-9390 URL: https://issues.apache.org/jira/browse/FLINK-9390 Project: Flink Issue

[jira] [Created] (FLINK-9388) Inconsistency in job shutdown produces confusing log message

2018-05-17 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9388: - Summary: Inconsistency in job shutdown produces confusing log message Key: FLINK-9388 URL: https://issues.apache.org/jira/browse/FLINK-9388 Project: Flink

[jira] [Created] (FLINK-9388) Inconsistency in job shutdown produces confusing log message

2018-05-17 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9388: - Summary: Inconsistency in job shutdown produces confusing log message Key: FLINK-9388 URL: https://issues.apache.org/jira/browse/FLINK-9388 Project: Flink

[jira] [Closed] (FLINK-8910) Introduce automated end-to-end test for local recovery (including sticky scheduling)

2018-05-17 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8910. - Resolution: Fixed > Introduce automated end-to-end test for local recovery (including sti

[jira] [Updated] (FLINK-8910) Introduce automated end-to-end test for local recovery (including sticky scheduling)

2018-05-17 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter updated FLINK-8910: -- Fix Version/s: 1.5.0 > Introduce automated end-to-end test for local recovery (including sti

[jira] [Reopened] (FLINK-8910) Introduce automated end-to-end test for local recovery (including sticky scheduling)

2018-05-17 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter reopened FLINK-8910: --- Change fix version > Introduce automated end-to-end test for local recovery (including sti

[jira] [Closed] (FLINK-8910) Introduce automated end-to-end test for local recovery (including sticky scheduling)

2018-05-17 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8910. - Resolution: Fixed Release Note: We changed the default SLOT_IDLE_TIMEOUT

Re: [DISCUSS] GitBox

2018-05-16 Thread Stefan Richter
+1 > Am 16.05.2018 um 12:40 schrieb Chesnay Schepler : > > Hello, > > during the discussion about how to better manage pull requests [1] the topic > of GitBox integration came up again. > > This seems like a good opportunity to restart this discussion that we had > about

Re: Missing MapState when Timer fires after restored state

2018-05-16 Thread Stefan Richter
se but INFO > level logs: > > > > > > > > > > > > > > > > > > Maybe I need to edit the log4j.properties instead(?). Indeed it's Flink >

[jira] [Created] (FLINK-9375) Introduce AbortCheckpoint message from JM to TMs

2018-05-16 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9375: - Summary: Introduce AbortCheckpoint message from JM to TMs Key: FLINK-9375 URL: https://issues.apache.org/jira/browse/FLINK-9375 Project: Flink Issue Type

[jira] [Created] (FLINK-9375) Introduce AbortCheckpoint message from JM to TMs

2018-05-16 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9375: - Summary: Introduce AbortCheckpoint message from JM to TMs Key: FLINK-9375 URL: https://issues.apache.org/jira/browse/FLINK-9375 Project: Flink Issue Type

Re: Missing MapState when Timer fires after restored state

2018-05-15 Thread Stefan Richter
t simply exceed the limit. If you have an overview how many parallel operator instances with keyed state were running on the machine and assume some reasonable number of files per RocksDB instance and the limit configured in your OS, could that be the case? > > Thanks! Thanks for you

Re: Missing MapState when Timer fires after restored state

2018-05-15 Thread Stefan Richter
open files“ problem only happen with local recovery (asking since it should actually not add the the amount of open files), and did you deactivate it on the second cluster for the restart or changed your OS settings? > Am 15.05.2018 um 10:09 schrieb Stefan Richter <s.rich...@data-artisa

Re: Missing MapState when Timer fires after restored state

2018-05-15 Thread Stefan Richter
e tweaking, because I > don't want to tangle with the same kafka consumer group offsets or send old > data again to production endpoint. > > Please keep in mind that there was that Too Many Open Files issue on the > cluster that created the problematic checkpoint, if you think that's relevant. >

Re: Missing MapState when Timer fires after restored state

2018-05-15 Thread Stefan Richter
Hi, I agree, this looks like a bug. Can you tell us your exact configuration of the state backend, e.g. if you are using incremental checkpoints or not. Are you using the local recovery feature? Are you restarting the job from a checkpoint or a savepoint? Can you provide logs for both the job

Re: Taskmanager JVM crash

2018-05-14 Thread Stefan Richter
mperma...@okkam.it>: > > My job is a batch one, not a streaming job. Is it possible that the cause is > the one you mentioned? > > On Mon, 14 May 2018, 14:23 Stefan Richter, <s.rich...@data-artisans.com > <mailto:s.rich...@data-artisans.com>> wrote: > Hi, > &g

Re: Taskmanager JVM crash

2018-05-14 Thread Stefan Richter
Hi, that looks like a known issue where Flink did not wait for the shutdown of the timer service before disposing state backends. This is problem fixed in the >= 1.4 branches. Best, Stefan > Am 14.05.2018 um 14:12 schrieb Flavio Pompermaier : > > Hi to all, > I have a

[jira] [Created] (FLINK-9355) Simplify configuration of local recovery to a simple on/off

2018-05-14 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9355: - Summary: Simplify configuration of local recovery to a simple on/off Key: FLINK-9355 URL: https://issues.apache.org/jira/browse/FLINK-9355 Project: Flink

[jira] [Created] (FLINK-9355) Simplify configuration of local recovery to a simple on/off

2018-05-14 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9355: - Summary: Simplify configuration of local recovery to a simple on/off Key: FLINK-9355 URL: https://issues.apache.org/jira/browse/FLINK-9355 Project: Flink

Re: [DISCUSS] Configuration for local recovery

2018-05-14 Thread Stefan Richter
I think that is a good idea +1. > Am 11.05.2018 um 20:41 schrieb Stephan Ewen : > > Hi! > > The configuration option (in flink-conf.yaml) for local recovery is currently > an enumeration with the values "DISABLED" and "ENABLE_FILE_BASED". > > I would suggest to change that,

[jira] [Commented] (FLINK-9302) Checkpoints continues to fail when using filesystem state backend with CIRCULAR REFERENCE:java.io.IOException

2018-05-08 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467013#comment-16467013 ] Stefan Richter commented on FLINK-9302: --- Thanks for the update, I appreciate it. This information

[jira] [Created] (FLINK-9304) Timer service shutdown should not be interrupted

2018-05-07 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9304: - Summary: Timer service shutdown should not be interrupted Key: FLINK-9304 URL: https://issues.apache.org/jira/browse/FLINK-9304 Project: Flink Issue Type

[jira] [Created] (FLINK-9304) Timer service shutdown should not be interrupted

2018-05-07 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-9304: - Summary: Timer service shutdown should not be interrupted Key: FLINK-9304 URL: https://issues.apache.org/jira/browse/FLINK-9304 Project: Flink Issue Type

[jira] [Closed] (FLINK-9302) Checkpoints continues to fail when using filesystem state backend with CIRCULAR REFERENCE:java.io.IOException

2018-05-07 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-9302. - Resolution: Not A Problem > Checkpoints continues to fail when using filesystem state back

[jira] [Commented] (FLINK-9302) Checkpoints continues to fail when using filesystem state backend with CIRCULAR REFERENCE:java.io.IOException

2018-05-07 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465575#comment-16465575 ] Stefan Richter commented on FLINK-9302: --- If you take a look at this stack trace, it tells you

Re: Wrong joda lib

2018-05-04 Thread Stefan Richter
ars/jruby-cloudera-1.0.0.jar > > On Fri, May 4, 2018 at 6:11 PM, Stefan Richter <s.rich...@data-artisans.com > <mailto:s.rich...@data-artisans.com>> wrote: > Hi, > > you can try to figure out the jar with > org.joda.time.DateTime.class.getProtectionDomain().ge

Re: Wrong joda lib

2018-05-04 Thread Stefan Richter
hat the problem occurs during the initialization of a static > variable: > > private static final DateTime MIN_DATE = new DateTime(1850, 01, 01, 0, 0); > > On Fri, May 4, 2018 at 6:11 PM, Stefan Richter <s.rich...@data-artisans.com > <mailto:s.rich...@data-artisans.com>>

Re: Wrong joda lib

2018-05-04 Thread Stefan Richter
Hi, you can try to figure out the jar with org.joda.time.DateTime.class.getProtectionDomain().getCodeSource().getLocation() in the right context. Best, Stefan > Am 04.05.2018 um 18:02 schrieb Flavio Pompermaier : > > Hi to all, > I'm trying to run a job on a test

Re: Stashing key with AggregateFunction

2018-05-04 Thread Stefan Richter
Hi, I have two possible options to achieve this. The first option is that you could obviously always derive the key again from the input of the aggregate function. The second option is combining an AggregateFunction with a ProcessWindowFunction. With the second solution you get incremental

[jira] [Updated] (FLINK-8978) End-to-end test: Job upgrade

2018-05-04 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter updated FLINK-8978: -- Fix Version/s: (was: 1.5.1) 1.5.0 > End-to-end test: Job upgr

[jira] [Closed] (FLINK-8978) End-to-end test: Job upgrade

2018-05-04 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8978. - Resolution: Fixed > End-to-end test: Job upgr

[jira] [Reopened] (FLINK-8978) End-to-end test: Job upgrade

2018-05-04 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter reopened FLINK-8978: --- Change fix version > End-to-end test: Job upgr

<    2   3   4   5   6   7   8   9   10   11   >