Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-27 Thread Tony Wei
Hi, As a follow-up, it seems that a savepoint can't be subsumed, so its notification can still be sent to each TM. Is this a bug that needs to be fixed in TwoPhaseCommitSinkFunction? Best, Tony Wei On Wed, Nov 27, 2019 at 3:43 PM Tony Wei wrote: > Hi, > > I want to raise this question again, since I

Some doubts about window start time and end time

2019-11-27 Thread Jun Zhang
Hi, I defined a tumbling window with a size of one hour, and the resulting windows are [00:00:00-01:00:00], [01:00:00-02:00:00]. This meets my expectations, but when I set the size to 6 hours, the resulting windows are [02:00:00-08:00:00], [08:00

Re: Some doubts about window start time and end time

2019-11-27 Thread Caizhi Weng
Hi Jun, How do you define your window? Could you please show us the code? Thanks. On Wed, Nov 27, 2019 at 5:22 PM Jun Zhang <825875...@qq.com> wrote: > Hi, > I defined a Tumbling window, I set the time size to one hour, and the > resulting windows are [00:00:00-01:00:00], [01:00:00-02:00:00]. ...

Re: Some doubts about window start time and end time

2019-11-27 Thread Jun Zhang
Hi Caizhi, the code is like this: dataStream.keyBy("device").window(TumblingProcessingTimeWindows.of(Time.hours(6))).trigger(ContinuousProcessingTimeTrigger.of(Time.seconds(5))).aggregate(new MyAggre(), new WindowResultFunction())

Re: Some doubts about window start time and end time

2019-11-27 Thread Caizhi Weng
Hi Jun, You have to specify an offset when defining the windows. According to the Javadoc of TumblingProcessingTimeWindows: "if you are living in somewhere which is not using UTC±00:00 time, such as China which is using UTC+08:00, and you want a time window with size of one day, and window

Re: Some doubts about window start time and end time

2019-11-27 Thread Jun Zhang
Hi Caizhi, 1. If I add an offset, window(TumblingProcessingTimeWindows.of(Time.hours(6), Time.hours(-8))), I get an error: TumblingProcessingTimeWindows parameters must satisfy abs(offset) < size 2. If it is caused by not adding an offset, then why the same code, I set the window size to b
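
The 6-hour result in this thread follows from how tumbling windows are aligned: window starts are multiples of the window size counted from the Unix epoch (UTC), shifted by the offset. The sketch below reproduces that arithmetic in plain Java, using the same formula as Flink's TimeWindow.getWindowStartWithOffset. The class name WindowStartSketch, the sample timestamp, and the choice of a -2 hour offset are illustrative assumptions, not something taken from the thread.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

final class WindowStartSketch {

    // Start of the tumbling window that contains timestampMs.
    // Same arithmetic as Flink's TimeWindow.getWindowStartWithOffset.
    static long windowStart(long timestampMs, long offsetMs, long sizeMs) {
        return timestampMs - (timestampMs - offsetMs + sizeMs) % sizeMs;
    }

    public static void main(String[] args) {
        long size = Duration.ofHours(6).toMillis();
        ZoneOffset china = ZoneOffset.ofHours(8);

        // Assumed sample event: 2019-11-27 03:30 in UTC+08:00.
        long ts = Instant.parse("2019-11-26T19:30:00Z").toEpochMilli();

        // No offset: windows are aligned to UTC, so they start at 02:00 local time.
        long utcAligned = windowStart(ts, 0L, size);
        // Offset of -2 hours: shifts the boundaries to local midnight.
        long localAligned = windowStart(ts, Duration.ofHours(-2).toMillis(), size);

        System.out.println(Instant.ofEpochMilli(utcAligned).atOffset(china).toLocalTime());   // 02:00
        System.out.println(Instant.ofEpochMilli(localAligned).atOffset(china).toLocalTime()); // 00:00
    }
}
```

Under these assumptions, 1-hour windows look unaffected because UTC+08:00 is a whole multiple of one hour, while 6 hours does not divide 8 hours, so the boundaries land at 02:00, 08:00, 14:00 and 20:00 local time. An offset of Time.hours(-2) (or equivalently Time.hours(4)) satisfies abs(offset) < size and should move them to 00:00, 06:00, 12:00 and 18:00.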

Re: Flink behavior as a slow consumer - out of Heap MEM

2019-11-27 Thread Robert Metzger
Hi Hanan, Flink does handle backpressure gracefully. I guess your custom ZMQ source is receiving events in a separate thread? In a Flink source, the SourceContext.collect() method will not return if the downstream operators are not able to process incoming data fast enough. If my assumptions are
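
If the custom source does receive events on a separate thread, one way to keep heap usage flat is to hand events to the emitting loop through a bounded buffer, so the receiver slows down whenever the loop that calls SourceContext.collect() is blocked. The sketch below shows only the bounded hand-off in plain Java. BoundedHandoffSketch and its method names are hypothetical, and the real ZMQ source from the thread is not shown here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class BoundedHandoffSketch {

    // Fixed-capacity buffer between the receiver thread and the emitting loop.
    private final BlockingQueue<String> buffer;

    BoundedHandoffSketch(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    // Receiver side: returns false instead of growing the buffer when it is full.
    // A real receiver thread would more likely call buffer.put(event) and block.
    boolean tryReceive(String event) {
        return buffer.offer(event);
    }

    // Emitting side: stand-in for the loop that would pass the event to collect().
    // Returns null when nothing is buffered.
    String nextToEmit() {
        return buffer.poll();
    }

    int buffered() {
        return buffer.size();
    }

    public static void main(String[] args) {
        BoundedHandoffSketch source = new BoundedHandoffSketch(2);
        System.out.println(source.tryReceive("a")); // true
        System.out.println(source.tryReceive("b")); // true
        System.out.println(source.tryReceive("c")); // false, buffer is full
        System.out.println(source.nextToEmit());    // a
        System.out.println(source.tryReceive("c")); // true, one slot was freed
    }
}
```

The design choice is that memory is capped by the buffer capacity, so a slow downstream shows up as a stalled receiver rather than as a growing heap.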

Re: What happens to a Source's Operator State if it stops being initialized and snapshotted? Accidentally exponential?

2019-11-27 Thread Congxian Qiu
Hi, Do you use UNION state in your scenario? When using UNION state, the JM may encounter an OOM because each TDD will contain all the state of all subtasks [1]. [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#using-managed-operator-state Best, Congxian Aaron

Re: ArrayIndexOutOfBoundException on checkpoint creation

2019-11-27 Thread Congxian Qiu
Hi, Which version are you using now? (If you are on an old version, could you please check whether this exception is still there on Flink 1.9?) Also, did you try RocksDBStateBackend for this? Best, Congxian On Tue, Nov 26, 2019 at 6:52 PM Theo Diefenthal wrote: > Hi, > > We have a pipeline with a custom Proce

Re: How to recover state from savepoint on embedded mode?

2019-11-27 Thread Congxian Qiu
Hi, You can recover from a checkpoint/savepoint if the JM and TM can read from the given path, no matter which mode the job is running in. Best, Congxian On Tue, Nov 26, 2019 at 12:18 PM Reo Lei wrote: > > > -- Forwarded message - > From: Reo Lei > Date: Tue, Nov 26, 2019 at 9:53 AM > Subject: Re: How to reco

Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-27 Thread Chesnay Schepler
This looks to me like the TwoPhaseCommitSinkFunction is a bit too strict. The notification for completed checkpoints is not reliable; it may be late, not come at all, or possibly even arrive in a different order than expected. As such, if you have a simple case of snapshot -> snapshot -> notify -> notify the s
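
A minimal sketch of notification handling that tolerates late or out-of-order notifications, assuming pending transactions are tracked per checkpoint id. TolerantCommitSketch and its methods are hypothetical names for illustration, and this is not the actual TwoPhaseCommitSinkFunction code. A notification for checkpoint N commits everything pre-committed up to N, and a notification that finds nothing pending is simply ignored instead of failing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class TolerantCommitSketch {

    // Pending transactions, keyed by the checkpoint id that pre-committed them.
    private final NavigableMap<Long, String> pending = new TreeMap<>();
    // Transactions committed so far, in commit order.
    private final List<String> committed = new ArrayList<>();

    // Called when a checkpoint or savepoint snapshot is taken.
    void snapshot(long checkpointId, String transaction) {
        pending.put(checkpointId, transaction);
    }

    // Called when a completion notification arrives. Commits every pending
    // transaction with an id <= checkpointId; a late notification that finds
    // nothing pending is a no-op rather than an error.
    void notifyComplete(long checkpointId) {
        NavigableMap<Long, String> ready = pending.headMap(checkpointId, true);
        committed.addAll(ready.values());
        ready.clear(); // headMap is a view, so this also removes the entries from pending
    }

    List<String> committed() {
        return committed;
    }

    public static void main(String[] args) {
        TolerantCommitSketch sink = new TolerantCommitSketch();
        sink.snapshot(1L, "txn-1"); // e.g. a savepoint
        sink.snapshot(2L, "txn-2"); // the following checkpoint
        sink.notifyComplete(2L);    // commits txn-1 and txn-2
        sink.notifyComplete(1L);    // arrives late, nothing left to commit
        System.out.println(sink.committed()); // [txn-1, txn-2]
    }
}
```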

Re: Apache Flink - Troubleshooting exception while starting the job from savepoint

2019-11-27 Thread Congxian Qiu
Hi, As the doc [1] says, we should assign a uid to all stateful operators. If you do not set a uid for an operator, Flink will generate an operatorId for it. AFAIK, the operatorId will not change as long as the job DAG does not change. You can skip the operator's state which is not in the new job, pleas

Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-27 Thread Piotr Nowojski
Hi, Maybe you are right, Chesnay, but I’m not sure. TwoPhaseCommitSink was based on Pravega’s sink for Flink, which was implemented by Stephan, and it has the same logic [1]. If I remember the discussions with Stephan/Till, the way Flink uses Akka probably guarantees that messages will b

AW: ArrayIndexOutOfBoundException on checkpoint creation

2019-11-27 Thread theo.diefent...@scoop-software.de
Sorry, I forgot to mention the environment. We use Flink 1.9.1 on a Cloudera CDH 6.3.1 cluster (with Hadoop 3.0.0 but using Flink shaded 2.8.3-7. Might this be a problem? As it seems to arise from Kryo, I doubt it). Our Flink is configured with defaults. Our job uses FsStateBackend and exactly once

[PROPOSAL/SURVEY] Enable background cleanup for state with TTL by default

2019-11-27 Thread Andrey Zagrebin
Hi all, We were thinking about enabling background cleanup for the state with TTL by default: StateTtlConfig#Builder#cleanupInBackground() Previously, we did not have it in the first implementation of TTL if you remember. So technically, we were a bit conservative to not enable it by default at o

What happens to the channels when there is backpressure?

2019-11-27 Thread Felipe Gutierrez
Hi community, I have a question about backpressure. Suppose a scenario that I have a map and a reducer, and the reducer is back pressuring the map operator. I know that the reducer is processing tuples at a lower rate than it is receiving. However, can I say that at least one channel between the

Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-27 Thread Tony Wei
Hi Piotrek, The case here was that the first snapshot is a savepoint. I know that if the following checkpoint succeeded before the previous one, the previous one will be subsumed by the JobManager. However, if that previous one is a savepoint, it won't be subsumed. That leads to the case that Chesnay

Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-27 Thread Piotr Nowojski
Hi Tony, Thanks for the explanation. Assuming that’s what’s happening, then I agree, this checkState should be removed. I created a ticket for this issue https://issues.apache.org/jira/browse/FLINK-14979 Piotrek > On 27 Nov 2019, at 16:28, T

Re: Apache Flink - Throttling stream flow

2019-11-27 Thread Rong Rong
Hi Mans, is this what you are looking for [1][2]? -- Rong [1] https://issues.apache.org/jira/browse/FLINK-11501 [2] https://github.com/apache/flink/pull/7679 On Mon, Nov 25, 2019 at 3:29 AM M Singh wrote: > Thanks Caizhi & Thomas for your responses. > > I read the throttling example but want

Re: What happens to a Source's Operator State if it stops being initialized and snapshotted? Accidentally exponential?

2019-11-27 Thread Aaron Levin
Hi, Yes, we're using UNION state. I would assume, though, that if you are not reading the UNION state it would either stick around as a constant factor in your state size, or get cleared. Looks like I should try to recreate a small example and submit a bug if this is true. Otherwise it's imp

Re: What happens to a Source's Operator State if it stops being initialized and snapshotted? Accidentally exponential?

2019-11-27 Thread Gyula Fóra
You are right Aaron. I would say this is like this by design as Flink doesn't require you to initialize state in the open method so it has no safe way to delete the non-referenced ones. What you can do is restore the state and clear it on all operators and not reference it again. I know this feel

Converting streaming to batch execution

2019-11-27 Thread Nicholas Walton
Hi, I’ve been working with a pipeline that was initially aimed at processing high speed sensor data, but for a proof of concept I’m feeding simulated data from a CSV file. Each row of the file is a sample across a number of time series, and I’ve been using the streaming environment to process

Re: ***UNCHECKED*** Error while confirming Checkpoint

2019-11-27 Thread Tony Wei
Hi Piotrek, There was already an issue [1] and PR for this thread. Should we mark it as a duplicate or a related issue? Best, Tony Wei [1] https://issues.apache.org/jira/browse/FLINK-10377 On Thu, Nov 28, 2019 at 12:17 AM Piotr Nowojski wrote: > Hi Tony, > > Thanks for the explanation. Assuming that’s what’

Re: Converting streaming to batch execution

2019-11-27 Thread Caizhi Weng
Hi Nick, It seems to me that the slow part of the whole pipeline is the Derby sink. Could you change it into other sinks (for example, csv sink or even a "discard everything" sink) and see if the throughput improves? If this is the case, are you using the JDBC connector? If yes, you might conside

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-27 Thread yingjie
Piotr is right, that depends on the size of the data you are reading and the memory pressure. The memory occupied by mmapped regions can be recycled and used by other processes if memory pressure is high; that is, other processes or services on the same node won't be affected because the OS will recycle the

Re: What happens to the channels when there is backpressure?

2019-11-27 Thread yingjie cao
Hi Felipe, That depends on what you mean by 'bandwidth'. If you mean the capability of the network stack, the answer would be no. Here is a post about the Flink network stack which may help: https://flink.apache.org/2019/06/05/flink-network-stack.html. Thanks, Yingjie Felipe Gutierrez wrote on 2019-11

Re: JobGraphs not cleaned up in HA mode

2019-11-27 Thread seuzxc
Hi, I have the same problem with Flink 1.9.1. Is there any solution to fix it? When k8s redeploys the jobmanager, the error looks like this (it seems ZK does not remove the submitted job info, but the jobmanager removes the file): Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted JobGraph from stat

Re: JobGraphs not cleaned up in HA mode

2019-11-27 Thread Vijay Bhaskar
The following are the mandatory conditions to run in HA: a) You should have a persistent common external store for the jobmanager and task managers to use while writing the state b) You should have a persistent external store for ZooKeeper to store the JobGraph. ZooKeeper is referring to the path: /flink/checkpoints/su

Re: JobGraphs not cleaned up in HA mode

2019-11-27 Thread 曾祥才
/flink/checkpoints is an external persistent store (a NAS directory mounted on the job manager) -- Original message -- From: "Vijay Bhaskar" http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: JobGraphs not cleaned up in HA mode

2019-11-27 Thread Vijay Bhaskar
Is it a filesystem or Hadoop? If it's NAS, then why the exception "Caused by: org.apache.hadoop.hdfs.BlockMissingException: "? It seems you configured a Hadoop state store while giving it a NAS mount. Regards Bhaskar On Thu, Nov 28, 2019 at 11:36 AM 曾祥才 wrote: > /flink/checkpoints is a external persistent

Re: JobGraphs not cleaned up in HA mode

2019-11-27 Thread 曾祥才
If I clean the ZooKeeper data, it runs fine, but the next time the jobmanager fails and is redeployed, the error occurs again -- Original message -- From: "Vijay Bhaskar" http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: JobGraphs not cleaned up in HA mode

2019-11-27 Thread Vijay Bhaskar
Can you share the Flink configuration? Regards Bhaskar On Thu, Nov 28, 2019 at 12:09 PM 曾祥才 wrote: > if i clean the zookeeper data , it runs fine . but next time when the > jobmanager failed and redeploy the error occurs again > > > > > -- Original message -- > *From:*

Re: JobGraphs not cleaned up in HA mode

2019-11-27 Thread 曾祥才
the config (/flink is the NAS directory): jobmanager.rpc.address: flink-jobmanager taskmanager.numberOfTaskSlots: 16 web.upload.dir: /flink/webUpload blob.server.port: 6124 jobmanager.rpc.port: 6123 taskmanager.rpc.port: 6122 jobmanager.heap.size: 1024m taskmanager.heap.size: 1024m high-availa

Flink 'Job Cluster' mode Ui Access

2019-11-27 Thread Jatin Banger
Hi, It seems there is a web UI for a Flink session cluster, but for a Flink job cluster it is showing {"errors":["Not found."]} Is this the expected behavior for Flink job cluster mode? Best Regards, Jatin

Re: Flink 'Job Cluster' mode Ui Access

2019-11-27 Thread vino yang
Hi Jatin, The Flink web UI does not depend on the deployment mode. You should check whether there are errors in the log file and whether the job is in the RUNNING state. Best, Vino On Thu, Nov 28, 2019 at 3:43 PM Jatin Banger wrote: > Hi, > > It seems there is Web Ui for Flink Session cluster, But for Flink Job > Clust