[jira] [Comment Edited] (FLINK-8837) Move DataStreamUtils to package 'experimental'.

2018-03-02 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383502#comment-16383502 ] Stefan Richter edited comment on FLINK-8837 at 3/2/18 11:47 AM: I think

[jira] [Commented] (FLINK-8837) Move DataStreamUtils to package 'experimental'.

2018-03-02 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383502#comment-16383502 ] Stefan Richter commented on FLINK-8837: --- I think there is one problem with keeping

[jira] [Closed] (FLINK-8751) Canceling a job results in a InterruptedException in the TM

2018-03-02 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8751. - Resolution: Fixed Assignee: Stefan Richter Fix Version/s: 1.5.0 Fixed in f9a583b

[jira] [Closed] (FLINK-8557) OperatorSubtaskDescriptionText causes failures on Windows

2018-02-28 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8557. - Resolution: Fixed Fixed in af8efe92c3. > OperatorSubtaskDescriptionText causes failu

[jira] [Assigned] (FLINK-8557) OperatorSubtaskDescriptionText causes failures on Windows

2018-02-28 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter reassigned FLINK-8557: - Assignee: Stefan Richter > OperatorSubtaskDescriptionText causes failures on Wind

[jira] [Closed] (FLINK-8515) update RocksDBMapState to replace deprecated remove() with delete()

2018-02-28 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8515. - Resolution: Fixed Already fixed in ff6662c. > update RocksDBMapState to replace depreca

[jira] [Commented] (FLINK-8413) Snapshot state of aggregated data is not maintained in flink's checkpointing

2018-02-28 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380262#comment-16380262 ] Stefan Richter commented on FLINK-8413: --- [~suganyap] and [~aljoscha] are there any updates

[jira] [Commented] (FLINK-7901) Detecting whether an operator is restored doesn't work with chained state (Flink 1.3)

2018-02-28 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380251#comment-16380251 ] Stefan Richter commented on FLINK-7901: --- [~aljoscha] can we close this issue or will this still

[jira] [Commented] (FLINK-7167) Job can't set checkpoint meta directory to cover the system setting state.checkpoints.dir

2018-02-28 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380246#comment-16380246 ] Stefan Richter commented on FLINK-7167: --- [~StephanEwen]  isn't this already covered by PR

[jira] [Closed] (FLINK-8777) improve resource release when recovery from failover

2018-02-28 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8777. - Resolution: Fixed Merged in 296f9ff742. > improve resource release when recovery from failo

[jira] [Commented] (FLINK-8467) Restoring job that does not use checkpointing from savepoint breaks

2018-02-28 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380155#comment-16380155 ] Stefan Richter commented on FLINK-8467: --- [~jelmer] any updates on this problem or can this be closed

[jira] [Closed] (FLINK-8360) Implement task-local state recovery

2018-02-27 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8360. - Resolution: Fixed > Implement task-local state recov

[jira] [Closed] (FLINK-8781) Try to reschedule failed tasks to previous allocation

2018-02-26 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8781. - Resolution: Fixed Merged in d63bc75ffa. > Try to reschedule failed tasks to previous allocat

[jira] [Created] (FLINK-8781) Try to reschedule failed tasks to previous allocation

2018-02-26 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-8781: - Summary: Try to reschedule failed tasks to previous allocation Key: FLINK-8781 URL: https://issues.apache.org/jira/browse/FLINK-8781 Project: Flink Issue

[jira] [Created] (FLINK-8781) Try to reschedule failed tasks to previous allocation

2018-02-26 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-8781: - Summary: Try to reschedule failed tasks to previous allocation Key: FLINK-8781 URL: https://issues.apache.org/jira/browse/FLINK-8781 Project: Flink Issue

[jira] [Closed] (FLINK-8679) RocksDBKeyedBackend.getKeys(stateName, namespace) doesn't filter data with namespace

2018-02-23 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8679. - Resolution: Fixed Merged in eeac022f05. > RocksDBKeyedBackend.getKeys(stateName, namesp

[jira] [Closed] (FLINK-8639) Fix always need to seek multiple times when iterator RocksDBMapState

2018-02-22 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8639. - Resolution: Fixed Merged in 1e315f0. > Fix always need to seek multiple times when itera

[jira] [Closed] (FLINK-8674) Efficiently handle alwaysFlush case (0ms flushTimeout)

2018-02-22 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8674. - Resolution: Fixed Fix Version/s: 1.5.0 Merged in a144d0f77f. > Efficiently han

[jira] [Closed] (FLINK-8547) Implement CheckpointBarrierHandler not to spill data for exactly-once based on credit-based flow control

2018-02-22 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8547. - Resolution: Fixed Fix Version/s: 1.5.0 Merged in 3126bf5229. > Implem

Re: Serious memory leak in DefaultOperatorStateBackend

2018-02-22 Thread Stefan Richter
nough, but what about completely > dropping a keyed state? > > Gyula > > Stefan Richter <s.rich...@data-artisans.com> ezt írta (időpont: 2018. febr. > 22., Cs, 11:46): > >> >> Hi, >> >> I don’t think that this is a bug, but rather a necessity

Re: Serious memory leak in DefaultOperatorStateBackend

2018-02-22 Thread Stefan Richter
ieb Stefan Richter <s.rich...@data-artisans.com>: > > Right now, I don’t think there is a way of doing that. I don’t think there is > something fundament against having a method that drops a state complete, data > and registered meta data. But so far that never existed and it

Re: Serious memory leak in DefaultOperatorStateBackend

2018-02-22 Thread Stefan Richter
Hi, I don’t think that this is a bug, but rather a necessity that comes with the (imo questionable) design of allowing lazy state registration. In this design, just because a state is *currently* not registered does not mean that you can simply drop it. Imagine that your code did *not yet*

[jira] [Commented] (FLINK-8709) Flaky test: SlotPoolRpcTest.testCancelSlotAllocationWithoutResourceManager

2018-02-21 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371146#comment-16371146 ] Stefan Richter commented on FLINK-8709: --- Also still seeing this. https://api.travis-ci.org/v3/job

Re: Get which key groups are assigned to an operator

2018-02-20 Thread Stefan Richter
ways serialized when stored in the state and I'm not sure if there is even > some disk access (maybe not synchronously) that could hurt performance. > > Gerard > > On Tue, Feb 20, 2018 at 2:42 PM, Stefan Richter <s.rich...@data-artisans.com > <mailto:s.rich...@data-artisans

Re: Get which key groups are assigned to an operator

2018-02-20 Thread Stefan Richter
Hi, from what I read, I get the impression that you attempt to implement you own "keyed state" with a hashmap? Why not using the keyed state that is already provided by Flink and gives you efficient rescaling etc. out of the box? Please see [1] for the details. [1]

[jira] [Closed] (FLINK-8657) Fix incorrect description for external checkpoint vs savepoint

2018-02-19 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8657. - Resolution: Fixed Merged in 5909b5bb7f. > Fix incorrect description for external checkpoint

[jira] [Closed] (FLINK-8571) Provide an enhanced KeyedStream implementation to use ForwardPartitioner

2018-02-09 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8571. - Resolution: Fixed Fix Version/s: 1.4.1 1.5.0 Merged in 91eea376ee

[jira] [Closed] (FLINK-8385) Fix exceptions in AbstractEventTimeWindowCheckpointingITCase

2018-02-07 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8385. - Resolution: Fixed Merged in  b32b835. > Fix excepti

[jira] [Assigned] (FLINK-8571) Provide an enhanced KeyedStream implementation to use ForwardPartitioner

2018-02-07 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter reassigned FLINK-8571: - Assignee: Stefan Richter > Provide an enhanced KeyedStream implementation to

[jira] [Assigned] (FLINK-8569) Provide a hook to override the default KeyGroupRangeAssignment

2018-02-07 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter reassigned FLINK-8569: - Assignee: Stefan Richter > Provide a hook to override the defa

[jira] [Assigned] (FLINK-8569) Provide a hook to override the default KeyGroupRangeAssignment

2018-02-07 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter reassigned FLINK-8569: - Assignee: (was: Stefan Richter) > Provide a hook to override the defa

Re: RocksDB / checkpoint questions

2018-02-05 Thread Stefan Richter
Hi, you are correct that RocksDB has a „working directory“ on local disk and checkpoints + savepoints go to a distributed filesystem. - if I have 3 TaskManager I should expect more or less (depending on how the tasks are balanced) to find a third of my overall state stored on disk on each of

[jira] [Closed] (FLINK-4809) Operators should tolerate checkpoint failures

2018-02-05 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-4809. - Resolution: Fixed Fix Version/s: (was: 1.4.0) 1.5.0 Merged

[jira] [Reopened] (FLINK-4809) Operators should tolerate checkpoint failures

2018-02-05 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter reopened FLINK-4809: --- > Operators should tolerate checkpoint failu

[jira] [Closed] (FLINK-4809) Operators should tolerate checkpoint failures

2018-02-05 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-4809. - Resolution: Fixed Fix Version/s: 1.4.0 Merged in  {{[7c63526|https://github.com/apache

[git pull] FireWire (IEEE 1394) updates post v4.15

2018-02-02 Thread Stefan Richter
394: allow user-configured MTU of up to 4096 bytes Hector Martin (1): firewire-ohci: work around oversized DMA reads on JMicron controllers Stefan Richter (1): firewire: net: max MTU off by one drivers/firewire/net.c | 7 ++- drivers/firewire/ohci.c | 8 +++- 2 files c

[git pull] FireWire (IEEE 1394) updates post v4.15

2018-02-02 Thread Stefan Richter
394: allow user-configured MTU of up to 4096 bytes Hector Martin (1): firewire-ohci: work around oversized DMA reads on JMicron controllers Stefan Richter (1): firewire: net: max MTU off by one drivers/firewire/net.c | 7 ++- drivers/firewire/ohci.c | 8 +++- 2 files c

Re: Sync and Async checkpoint time

2018-01-31 Thread Stefan Richter
gt; So you mean that any window use in the stream will result in synchronous > snapshotting? > When are you planning to fix this? > And is there a workaround? > > Thanks again, > Tovi > From: Stefan Richter [mailto:s.rich...@data-artisans.com] > Sent: יום ג 30 י

Re: Joining data in Streaming

2018-01-31 Thread Stefan Richter
den <hayden.march...@citi.com>: > > Stefan, > > So are we essentially saying that in this case, for now, I should stick to > DataSet / Batch Table API? > > Thanks, > Hayden > > -----Original Message- > From: Stefan Richter [mailto:s.rich...@data-artisans.com]

Re: Sync and Async checkpoint time

2018-01-30 Thread Stefan Richter
Hi, this looks like the timer service is the culprit for this problem. Timers are currently not stored in the state backend, but in a separate on-heap data structure that does not support copy-on-write or async snapshots in general. Therefore, writing the timers for a snapshot is always

Re: Joining data in Streaming

2018-01-30 Thread Stefan Richter
Hi, as far as I know, this is not easily possible. What would be required is something like a CoFlatmap function, where one input stream is blocking until the second stream is fully consumed to build up the state to join against. Maybe Aljoscha (in CC) can comment on future plans to support

Re: How back pressure is handled by source?

2018-01-30 Thread Stefan Richter
Hi, backpressure comes into play when the source is attempting to pass the generated events to downstream operators. If the downstream operators build up backpressure, passing data to them can block. You might think of this like a bounded queue that is full in case of backpressure and blocks

[jira] [Commented] (FLINK-8411) HeapListState#add(null) will wipe out entire list state

2018-01-30 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344895#comment-16344895 ] Stefan Richter commented on FLINK-8411: --- Fine with me, we can revert the commit on the 1.4 branch

[jira] [Commented] (FLINK-8411) HeapListState#add(null) will wipe out entire list state

2018-01-30 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344870#comment-16344870 ] Stefan Richter commented on FLINK-8411: --- Ola, I always assumed that to be a copy-paste error when

[jira] [Commented] (FLINK-8411) HeapListState#add(null) will wipe out entire list state

2018-01-30 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344793#comment-16344793 ] Stefan Richter commented on FLINK-8411: --- That is a tough one. I feel like the previous behaviour

[jira] [Commented] (FLINK-8522) DefaultOperatorStateBackend writes data in checkpoint that is never read.

2018-01-29 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343622#comment-16343622 ] Stefan Richter commented on FLINK-8522: --- I don't think that we ever need to read the int

[jira] [Commented] (FLINK-8522) DefaultOperatorStateBackend writes data in checkpoint that is never read.

2018-01-29 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343580#comment-16343580 ] Stefan Richter commented on FLINK-8522: --- Option 1 looks good for me. > DefaultOperatorStateBack

[jira] [Comment Edited] (FLINK-3089) State API Should Support Data Expiration (State TTL)

2018-01-26 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341298#comment-16341298 ] Stefan Richter edited comment on FLINK-3089 at 1/26/18 5:13 PM

[jira] [Comment Edited] (FLINK-3089) State API Should Support Data Expiration (State TTL)

2018-01-26 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341298#comment-16341298 ] Stefan Richter edited comment on FLINK-3089 at 1/26/18 5:12 PM

[jira] [Commented] (FLINK-3089) State API Should Support Data Expiration (State TTL)

2018-01-26 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341298#comment-16341298 ] Stefan Richter commented on FLINK-3089: --- [~phoenixjiangnan] I agree that we could start

[jira] [Closed] (FLINK-7719) Send checkpoint id to task as part of deployment descriptor when resuming

2018-01-25 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-7719. - Resolution: Fixed Fix Version/s: 1.5.0 Merged in 402a2e30c750e1bcb753643ed66c6df0dd861112

[jira] [Closed] (FLINK-7720) Centralize creation of backends and state related resources

2018-01-25 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-7720. - Resolution: Fixed Fix Version/s: 1.5.0 Merged in 517b3f87214168a445b5751cda210ecf3a292fd6

[jira] [Closed] (FLINK-8441) serialize values and value separator directly to stream in RocksDBListState

2018-01-23 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8441. - Resolution: Fixed Merged in ce25688bac1c1ecfa53a03ab1857bb82963b0696 . > serialize val

[jira] [Closed] (FLINK-8365) Relax List type in HeapListState and HeapKeyedStateBackend

2018-01-23 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8365. - Resolution: Fixed Merged in e075da5d5eb0f5ae8c394ea0c549f9dbce28fcf3 . > Relax List t

[jira] [Closed] (FLINK-8411) HeapListState#add(null) will wipe out entire list state

2018-01-23 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8411. - Resolution: Fixed Merged in e157cfa77f83608a0cd6d7d41a96edb0ca1f97f6 . > HeapListState#add(n

[jira] [Closed] (FLINK-8469) relocate and unify RocksDB option params in RocksDBPerformanceTest

2018-01-23 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-8469. - Resolution: Fixed Merged in 1c9c1e36c1dd7b3f2a160216e405302d7854c148 . > relocate and un

[jira] [Commented] (FLINK-8487) State loss after multiple restart attempts

2018-01-23 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335551#comment-16335551 ] Stefan Richter commented on FLINK-8487: --- Afaik [~aljoscha] already fixed this in FLINK-7783

[jira] [Closed] (FLINK-7938) support addAll() in ListState

2018-01-19 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-7938. - Resolution: Implemented Merged in 14840809b1. > support addAll() in ListSt

Re: [PATCH] firewire-ohci: work around oversized DMA reads on JMicron controllers

2018-01-13 Thread Stefan Richter
On Jan 11 Hector Martin 'marcan' wrote: > On 2017-11-13 06:05, Stefan Richter wrote: > > Thanks Hector for the troubleshooting and for the patch. > > Thanks Clemens for the review. > > > > It's been a while since I last reviewed and tested kernel patches, and > >

Re: [PATCH] firewire-ohci: work around oversized DMA reads on JMicron controllers

2018-01-13 Thread Stefan Richter
On Jan 11 Hector Martin 'marcan' wrote: > On 2017-11-13 06:05, Stefan Richter wrote: > > Thanks Hector for the troubleshooting and for the patch. > > Thanks Clemens for the review. > > > > It's been a while since I last reviewed and tested kernel patches, and > >

[jira] [Comment Edited] (FLINK-8413) Snapshot state of aggregated data is not maintained in flink's checkpointing

2018-01-11 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322191#comment-16322191 ] Stefan Richter edited comment on FLINK-8413 at 1/11/18 1:25 PM: Can you

[jira] [Commented] (FLINK-8413) Snapshot state of aggregated data is not maintained in flink's checkpointing

2018-01-11 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322191#comment-16322191 ] Stefan Richter commented on FLINK-8413: --- Can you share a (minimal) code example that shows

[jira] [Comment Edited] (FLINK-8411) inconsistent behavior between HeapListState#add() and RocksDBListState#add()

2018-01-11 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321968#comment-16321968 ] Stefan Richter edited comment on FLINK-8411 at 1/11/18 9:46 AM: Yes, I

[jira] [Commented] (FLINK-8411) inconsistent behavior between HeapListState#add() and RocksDBListState#add()

2018-01-11 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321968#comment-16321968 ] Stefan Richter commented on FLINK-8411: --- Yes, I agree. Would you like to make a PR or should I just

[jira] [Closed] (FLINK-7475) support update() in ListState

2018-01-10 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter closed FLINK-7475. - Resolution: Fixed Merged in 438e4e3742. > support update() in ListSt

Re: Stream job failed after increasing number retained checkpoints

2018-01-10 Thread Stefan Richter
Hi, there is no known limitation in the strict sense, but you might run out of dfs space or job manager memory if you keep around a huge number checkpoints. I wonder what reason you might have that you ever want such a huge number of retained checkpoints? Usually keeping one checkpoint should

Re: does the flink sink only support bio?

2018-01-08 Thread Stefan Richter
ay that allows you to opportunistically cancel all potential orphaned transactions. > 2018-01-04 22:15 GMT+08:00 Stefan Richter <s.rich...@data-artisans.com > <mailto:s.rich...@data-artisans.com>>: > Yes, that is how it works. > > > Am 04.01.2018 um 14:47 schrieb Jinhua Luo <l

[jira] [Created] (FLINK-8385) Fix exceptions in AbstractEventTimeWindowCheckpointingITCase

2018-01-08 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-8385: - Summary: Fix exceptions in AbstractEventTimeWindowCheckpointingITCase Key: FLINK-8385 URL: https://issues.apache.org/jira/browse/FLINK-8385 Project: Flink

[jira] [Created] (FLINK-8385) Fix exceptions in AbstractEventTimeWindowCheckpointingITCase

2018-01-08 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-8385: - Summary: Fix exceptions in AbstractEventTimeWindowCheckpointingITCase Key: FLINK-8385 URL: https://issues.apache.org/jira/browse/FLINK-8385 Project: Flink

Re: Separate checkpoint directories

2018-01-04 Thread Stefan Richter
//some-bucket/data/topic1/ > s3://some-bucket/data/topic2/ > . > . > . > s3://some-bucket/data/topic1000/ > > Thanks very much for your help Stefan! > > On Wed, Jan 3, 2018 at 10:51 AM Stefan Richter <s.rich...@data-artisans.com > <mailto:s.rich...@data-artisans.com>

Re: does the flink sink only support bio?

2018-01-04 Thread Stefan Richter
in the later restart, the > commit would be triggered again, correct? So the commit would not be > forgotten, correct? > > 2018-01-03 22:54 GMT+08:00 Stefan Richter <s.rich...@data-artisans.com>: >> I think a mix of async UPDATES and exactly-once all this might be tricky, >>

[jira] [Issue Comment Deleted] (FLINK-8360) Implement task-local state recovery

2018-01-04 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Richter updated FLINK-8360: -- Comment: was deleted (was: More general implementation that also works for non-filebased state

[jira] [Commented] (FLINK-8360) Implement task-local state recovery

2018-01-04 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16311322#comment-16311322 ] Stefan Richter commented on FLINK-8360: --- More general implementation that also works for non

[jira] [Created] (FLINK-8360) Implement task-local state recovery

2018-01-04 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-8360: - Summary: Implement task-local state recovery Key: FLINK-8360 URL: https://issues.apache.org/jira/browse/FLINK-8360 Project: Flink Issue Type: New Feature

[jira] [Created] (FLINK-8360) Implement task-local state recovery

2018-01-04 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-8360: - Summary: Implement task-local state recovery Key: FLINK-8360 URL: https://issues.apache.org/jira/browse/FLINK-8360 Project: Flink Issue Type: New Feature

Re: Lower Parallelism derives better latency

2018-01-04 Thread Stefan Richter
,000 messages per second. > Regarding the code example, this is a bit confidential, let me think what I > can do and get back to you. > Am I the first one who encountered such an issue? > > Thanks, > Liron > > > From: Stefan Richter [mailto:s.rich...@data-artisan

Re: Job Manager not able to fetch job info when restarted

2018-01-04 Thread Stefan Richter
> Hi Stefan, > It was just ran without HA, via yarn. Will try running a yarn > session with HA and test, thanks for you time. Also is there a way we can get > alerts if a pipeline restarts or gets cancelled in Flink? > > Regards, > Sushil Ks > > > On Jan 3, 2018

Re: Lower Parallelism derives better latency

2018-01-04 Thread Stefan Richter
uffer timeout of zero: > env.setBufferTimeout(0); > so this means that the buffers were flushed after each record. > > Any other explanation? J > > Thanks, > Liron > > > From: Stefan Richter [mailto:s.rich...@data-artisans.com] > Sent: Wednesday, January 03

Re: does the flink sink only support bio?

2018-01-03 Thread Stefan Richter
n). > On the other hand, if I update and commit per record, the sql/stored > procedure have to handle duplicate updates at failure restart. > > So, when or where to commit so that we could get exactly-once db ingress. > > 2018-01-03 21:57 GMT+08:00 Stefan Richter <s.rich...@data-a

Re: does the flink sink only support bio?

2018-01-03 Thread Stefan Richter
state> . It does not require a KeyedSteam. Best, Stefan > > > 2018-01-03 18:43 GMT+08:00 Stefan Richter <s.rich...@data-artisans.com>: >> >> >>> Am 01.01.2018 um 15:22 schrieb Jinhua Luo <luajit...@gmail.com>: >>> >>> 2017-12-08 18

Re: Job Manager not able to fetch job info when restarted

2018-01-03 Thread Stefan Richter
Hi, did you configure high availability with Zookeeper, as described here: https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html ? This should

Re: Lower Parallelism derives better latency

2018-01-03 Thread Stefan Richter
Hi, one possible explanation that I see is the following: in a shuffle, each there are input and output buffers for each parallel subtask to which data could be shuffled. Those buffers are flushed either when full or after a timeout interval. If you increase the parallelism, there are more

Re: Separate checkpoint directories

2018-01-03 Thread Stefan Richter
Hi, first, let my ask why you want to have a different checkpoint directory per topic? It is perfectly ok to have just a single checkpoint directory, so I wonder what the intention is? Flink will already create proper subdirectories and filenames and can identify the right checkpoint data for

Re: Apache Flink - Question about rolling window function on KeyedStream

2018-01-03 Thread Stefan Richter
Hi, I would interpret this as: the reduce produces an output for every new reduce call, emitting the updated value. There is no need for a window because it kicks in on every single invocation. Best, Stefan > Am 31.12.2017 um 22:28 schrieb M Singh : > > Hi: > > Apache

Re: does the flink sink only support bio?

2018-01-03 Thread Stefan Richter
> Am 01.01.2018 um 15:22 schrieb Jinhua Luo <luajit...@gmail.com>: > > 2017-12-08 18:25 GMT+08:00 Stefan Richter <s.rich...@data-artisans.com>: >> You need to be a bit careful if your sink needs exactly-once semantics. In >> this case things should either be

Re: JobManager not receiving resource offers from Mesos

2018-01-03 Thread Stefan Richter
Hi, did you see this exception right at the head of your log? java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set. at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:265) at org.apache.hadoop.util.Shell.(Shell.java:290) at

[jira] [Commented] (FLINK-8309) JVM sigsegv crash when enabling async checkpoints

2017-12-22 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16301756#comment-16301756 ] Stefan Richter commented on FLINK-8309: --- Looks like this is very likely related to this Java problem

Re: Facing issue of long running Flink job on Yarn

2017-12-20 Thread Stefan Richter
hanks > > On Wed, Dec 20, 2017 at 4:24 PM, Stefan Richter <s.rich...@data-artisans.com >> wrote: > >> Hi, >> >> did you see that the problem starts from a Kafka exception „Failed to send >> data to Kafka: This server is not the leader for that topic-part

Re: Facing issue of long running Flink job on Yarn

2017-12-20 Thread Stefan Richter
Hi, did you see that the problem starts from a Kafka exception „Failed to send data to Kafka: This server is not the leader for that topic-partition.“? Is it possible that you had a network issue and the producer could not find the leader broker? Best, Stefan > Am 20.12.2017 um 10:57

Re: Static Variables

2017-12-20 Thread Stefan Richter
Hi, I think the cause is very likely a race condition between the tasks checking and setting the static value, because tasks run in different threads. You could try to use an Atomic reference or synchronization for setting the state variable’s value. Best, Stefan > Am 20.12.2017 um 00:29

[jira] [Comment Edited] (FLINK-8281) org.apache.flink.runtime.fs.hdfs.HadoopDataOutputStream cannot be cast to org.apache.flink.core.fs.WrappingProxyCloseable

2017-12-18 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295806#comment-16295806 ] Stefan Richter edited comment on FLINK-8281 at 12/18/17 10:58 PM: -- Hi, I

[jira] [Commented] (FLINK-8281) org.apache.flink.runtime.fs.hdfs.HadoopDataOutputStream cannot be cast to org.apache.flink.core.fs.WrappingProxyCloseable

2017-12-18 Thread Stefan Richter (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295806#comment-16295806 ] Stefan Richter commented on FLINK-8281: --- Hi, I had a look at your stacktraces and something seems

Re: [VOTE] Release 1.4.0, release candidate #3

2017-12-11 Thread Stefan Richter
+1 (non-binding) - did extensive cluster tests on Google Cloud with special focus on checkpointing and recovery and Kafka 0.11 end-to-end exactly-once + at-least-once. - build from source. > Am 11.12.2017 um 09:53 schrieb Piotr Nowojski : > > Hi, > > +1 (non-binding)

Re: does the flink sink only support bio?

2017-12-08 Thread Stefan Richter
> I have two new questions: > > 1) the async operator must emit some value to the async collector > (even it acts as a sink), right? > I think so, but you should be able to simply return empty collection. > 2) How could I use CheckpointListener with async operator? Could you > give a simple

Re: Manual Classloading on the job does not work

2017-12-08 Thread Stefan Richter
Hi, I think the general approach looks good and should be fine. There is even a dedicated end-to-end test checking that this works („ClassLoaderTestProgram“). Have you double checked that all the paths are correct (i.e. the classloading code works in a standalone program with the paths and can

Re: Problem with runGatherSumApplyIteration

2017-12-08 Thread Stefan Richter
Hi, it would be helpful if you could tell us the Flink version you are using and the full stacktrace. However, this looks like there could be a version conflict, e.g. is your cluster running the same version of Flink that you build your job against? Best, Stefan > Am 08.12.2017 um 10:23

Re: does the flink sink only support bio?

2017-12-08 Thread Stefan Richter
Hi, Flink currently does not offer async sinks out of the box, but there is no fundamental problem against having them and we will probably offer something is this direction in the future. In the meantime, you can build something like this by replacing the sink with an async io operator that

Re: Incremental RocksDB checkpointing

2017-12-01 Thread Stefan Richter
ieb vijayakumar palaniappan > <vijayakuma...@gmail.com>: > > I observed the job for 18 hrs, it went from 118kb to 1.10MB. > > I am using version 1.3.0 flink > > On Fri, Dec 1, 2017 at 11:39 AM, Stefan Richter <s.rich...@data-artisans.com > <mailto:s.rich...@data

Re: Incremental RocksDB checkpointing

2017-12-01 Thread Stefan Richter
Maybe one more question: is the size always increasing, or will it also reduce eventually? Over what period of time did you observe growth? From the way how RocksDB works, it does persist updates in a way that is sometimes closer to a log than in-place updates. So it is perfectly possible that

Re: [VOTE] Release 1.4.0, release candidate #2

2017-11-28 Thread Stefan Richter
+1 (non-binding) I tested Flink in a cluster setup on Google Cloud, YARN-per-job, checked that for all backends that HA, recovery, at-least-once, end-to-end exactly once (with Kafka11 Producer), savepoints, externalized checkpoints, and rescaling work correctly. > Am 28.11.2017 um 11:47

<    4   5   6   7   8   9   10   11   12   13   >