Re: about the checkpoint and state backend

2018-01-05 Thread Jinhua Luo
together into one directory, but > the handles to those directories are sent to the CheckpointCoordinator, which > creates a checkpoint that stores handles to all the states stored in DFS. > > Best, > Aljoscha > >> On 4. Jan 2018, at 15:06, Jinhua Luo wrote:
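
A minimal sketch of the setup this answer describes, assuming a RocksDB state backend, an HDFS checkpoint directory, and the flink-statebackend-rocksdb dependency on the classpath; the path and checkpoint interval below are illustrative, not from the thread:

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class StateBackendSetup {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Each parallel task keeps its working state in a local RocksDB instance;
            // on a checkpoint, the RocksDB snapshots are copied to this DFS directory
            // and only the handles to those files go to the CheckpointCoordinator.
            env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));
            env.enableCheckpointing(60000);   // checkpoint every 60 seconds
            // ... build the topology and call env.execute() ...
        }
    }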

Re: about the checkpoint and state backend

2018-01-04 Thread Jinhua Luo
> >> On 4. Jan 2018, at 14:56, Jinhua Luo wrote: >> >> ok, I see. >> >> But as known, one rocksdb instance occupies one directory, so I am still >> wondering what the relationship between the states and rocksdb >> instances is.

Re: about the checkpoint and state backend

2018-01-04 Thread Jinhua Luo
> CheckpointCoordinator figures out which handles need to be sent to which > operators for restoring. > > Best, > Aljoscha > >> On 4. Jan 2018, at 14:44, Jinhua Luo wrote: >> >> OK, I think I get the point. >> >> But another question arises: how t

Re: does the flink sink only support bio?

2018-01-04 Thread Jinhua Luo
e-a-java-transaction-manager-that-works-with-postgresql/ > . > > Does this help or do you really need async read-modify-update? > > Best, > Stefan > > > On 03.01.2018 at 15:08, Jinhua Luo wrote: > > No, I mean how to implement exactly-once db commit (given our async i

Re: about the checkpoint and state backend

2018-01-04 Thread Jinhua Luo
constructor. > > Best, > Aljoscha > > >> On 4. Jan 2018, at 14:23, Jinhua Luo wrote: >> >> I still do not understand the relationship between rocksdb backend and >> the filesystem (here I refer to any filesystem impl, including local, >> hdfs, s3).

Re: keyby() issue

2018-01-04 Thread Jinhua Luo
> would fail afterwards immediately. 2018-01-04 21:31 GMT+08:00 Jinhua Luo : > 2018-01-04 21:04 GMT+08:00 Aljoscha Krettek : >> Memory usage should grow linearly with the number of windows you have active >> at any given time, the number of keys and the number of different window >> operations you have.

Re: keyby() issue

2018-01-04 Thread Jinhua Luo
2018-01-04 21:04 GMT+08:00 Aljoscha Krettek : > Memory usage should grow linearly with the number of windows you have active > at any given time, the number of keys and the number of different window > operations you have. But the memory usage is still too much, especially when the incremental a

Re: about the checkpoint and state backend

2018-01-04 Thread Jinhua Luo
https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/state_backends.html > > > > On 1/1/18 at 3:50 PM, Jinhua Luo wrote: > >> Hi All, >> >> I have two questions: >> >> a) would the records/elements themselves be checkpointed, or just the >> record offsets?

Re: keyby() issue

2018-01-04 Thread Jinhua Luo
> aggregate elements of the windows? > > Regards, > Timo > > > > On 1/1/18 at 6:00 AM, Jinhua Luo wrote: > >> I checked the logs, but no information indicates what happens. >> >> In fact, in the same app, there is another stream, but its kafka

Re: does the flink sink only support bio?

2018-01-03 Thread Jinhua Luo
You can use non-keyed state, aka operator state, to store such information. > See here: > https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html#using-managed-operator-state > . It does not require a KeyedStream. > > Best, > Stefan
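
For reference, this is roughly the pattern the linked docs page describes: a sink on a plain (non-keyed) DataStream that keeps its buffer in operator ListState via CheckpointedFunction. The element type and the missing flush logic are simplifications for the sketch:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.runtime.state.FunctionInitializationContext;
    import org.apache.flink.runtime.state.FunctionSnapshotContext;
    import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
    import org.apache.flink.streaming.api.functions.sink.SinkFunction;

    // Buffers elements in non-keyed (operator) state; works on a plain DataStream.
    public class BufferingSink implements SinkFunction<String>, CheckpointedFunction {

        private transient ListState<String> checkpointedState;
        private final List<String> buffer = new ArrayList<>();

        @Override
        public void invoke(String value) throws Exception {
            buffer.add(value);
            // ... write the buffer to the external system once it is large enough ...
        }

        @Override
        public void snapshotState(FunctionSnapshotContext context) throws Exception {
            checkpointedState.clear();
            for (String element : buffer) {
                checkpointedState.add(element);
            }
        }

        @Override
        public void initializeState(FunctionInitializationContext context) throws Exception {
            ListStateDescriptor<String> descriptor =
                    new ListStateDescriptor<>("buffered-elements", String.class);
            checkpointedState = context.getOperatorStateStore().getListState(descriptor);
            for (String element : checkpointedState.get()) {
                buffer.add(element);
            }
        }
    }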

Re: does the flink sink only support bio?

2018-01-03 Thread Jinhua Luo
operators, e.g. fold, so it normally faces the DataStream but not the KeyedStream, and DataStream only supports ListState, right? 2018-01-03 18:43 GMT+08:00 Stefan Richter : > > >> On 01.01.2018 at 15:22, Jinhua Luo wrote: >> >> 2017-12-08 18:25 GMT+08:00 Stefan Richter :

about the checkpoint and state backend

2018-01-01 Thread Jinhua Luo
Hi All, I have two questions: a) would the records/elements themselves be checkpointed, or just the record offsets? That is, what data is included in the checkpoint besides the states? b) where does flink store the state globally, so that the job manager could restore it on each task manager?

Re: does the flink sink only support bio?

2018-01-01 Thread Jinhua Luo
2017-12-08 18:25 GMT+08:00 Stefan Richter : > You need to be a bit careful if your sink needs exactly-once semantics. In > this case things should either be idempotent or the db must support rolling > back changes between checkpoints, e.g. via transactions. Commits should be > triggered for confirmed checkpoints („notifyCheckpointComplete“).
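
A skeletal sketch of the commit-on-confirmed-checkpoint idea using plain JDBC; the connection URL and table are made up, and a production version would also have to separate writes arriving after the checkpoint barrier from those the checkpoint covers (e.g. per-checkpoint batches), which this skeleton glosses over:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.runtime.state.CheckpointListener;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    // Writes stay inside an open JDBC transaction and are committed only once a
    // checkpoint is confirmed, as described in the answer above.
    public class TransactionalJdbcSink extends RichSinkFunction<String> implements CheckpointListener {

        private transient Connection connection;
        private transient PreparedStatement insert;

        @Override
        public void open(Configuration parameters) throws Exception {
            connection = DriverManager.getConnection("jdbc:postgresql://localhost/mydb");
            connection.setAutoCommit(false);
            insert = connection.prepareStatement("INSERT INTO events (payload) VALUES (?)");
        }

        @Override
        public void invoke(String value) throws Exception {
            insert.setString(1, value);
            insert.executeUpdate();    // buffered in the open transaction, not yet visible
        }

        @Override
        public void notifyCheckpointComplete(long checkpointId) throws Exception {
            connection.commit();       // make everything up to the confirmed checkpoint durable
        }

        @Override
        public void close() throws Exception {
            if (connection != null) {
                connection.close();
            }
        }
    }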

Re: keyby() issue

2017-12-31 Thread Jinhua Luo
ook. > > have you tried to do a thread dump? > How is the GC pause? > do you see flink restart? check the exception tab in Flink web UI for your > job. > > > > On Sun, Dec 31, 2017 at 6:20 AM, Jinhua Luo wrote: >> >> I take time to read some source codes ab

Re: keyby() issue

2017-12-31 Thread Jinhua Luo
But I am really frustrated that flink could not fulfill its features just as the doc said. Thank you all. 2017-12-29 17:42 GMT+08:00 Jinhua Luo : > I misused the key selector. I checked the doc and found it must return a > deterministic key, so using random is wrong, but I still could not > understand

Re: keyby() issue

2017-12-29 Thread Jinhua Luo
I misused the key selector. I checked the doc and found it must return a deterministic key, so using random is wrong, but I still could not understand why it would cause OOM. 2017-12-28 21:57 GMT+08:00 Jinhua Luo : > It's very strange, when I change the key selector to use a random key,
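
To illustrate the point about deterministic keys (the Tuple2<String, Long> element type is a stand-in, not from the thread): the selector must derive the key purely from the element, so the same element always maps to the same partition and the same keyed state, rather than creating fresh state per call:

    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.api.java.tuple.Tuple2;

    // A key selector must return the same key for the same element on every call;
    // Flink uses it both for partitioning and for locating keyed state, so a
    // random key would scatter state across ever-new keys.
    public class FieldKeySelector implements KeySelector<Tuple2<String, Long>, String> {
        @Override
        public String getKey(Tuple2<String, Long> event) {
            return event.f0;                          // deterministic: derived from the element
            // return UUID.randomUUID().toString();   // wrong: a fresh key (and state) per call
        }
    }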

Re: keyby() issue

2017-12-28 Thread Jinhua Luo
at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51) Could anybody explain the internals of keyBy()? 2017-12-28 17:33 GMT+08:00 Ufuk Celebi : > Hey Jinhua, > > On Thu, Dec 28, 2017 at 9:57 AM, Jinhua Luo wrote: >> The keyby() upon the field would generate unique key as the field >> value

Re: keyby() issue

2017-12-28 Thread Jinhua Luo
? 2017-12-28 17:33 GMT+08:00 Ufuk Celebi : > Hey Jinhua, > > On Thu, Dec 28, 2017 at 9:57 AM, Jinhua Luo wrote: >> The keyby() upon the field would generate unique key as the field >> value, so if the number of unique values is huge, flink would have >> trouble both on c

keyby() issue

2017-12-28 Thread Jinhua Luo
Hi All, I need to aggregate some field of the event; at first I used keyby(), but I found flink performs very slowly (it even stops producing results) because the number of keys is around half a million per minute. So I used windowAll() instead, and flink works as expected. The keyby() upon the field
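
A sketch of the two variants discussed in this thread, on a placeholder Tuple2<String, Long> stream (keyed by the string field, summing the long field); note that windowAll() runs with parallelism 1:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class WindowAllExample {
        public static void demo(DataStream<Tuple2<String, Long>> events) {
            // keyed variant: one window (and its state) per distinct key
            events.keyBy(0)
                  .timeWindow(Time.minutes(1))
                  .sum(1);

            // non-keyed variant: a single window over the whole stream,
            // but the window operator runs with parallelism 1
            events.timeWindowAll(Time.minutes(1))
                  .sum(1);
        }
    }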

Add new slave to running cluster?

2017-12-19 Thread Jinhua Luo
Hi All, If I add a new slave (start a taskmanager on a new host) which is not included in conf/slaves, I see the log below continuously printed: ...Trying to register at JobManager...(attempt 147, timeout: 3 milliseconds) Is this normal? And is the new slave successfully added to the cluster?

How flink assigns task slots to the streams of the same app?

2017-12-18 Thread Jinhua Luo
Hi All, I start an app which contains multiple streams, and I found that some of them get processed very slowly; those streams use keyBy() and the number of keys can be up to a million. I also found that all streams share only one task slot. Is it possible to assign more slots to the app?
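
One knob that is relevant here, sketched under assumptions (the group name, the map operator, and the parallelism are illustrative, not from the thread): operators can be moved into their own slot sharing group and given a higher parallelism, so the heavy stream's tasks no longer have to fit into the same slots as everything else:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;

    public class SlotSharingExample {
        public static void configure(DataStream<String> heavyStream) {
            heavyStream
                .map(new MapFunction<String, String>() {
                    @Override
                    public String map(String value) {
                        return value.toUpperCase();
                    }
                })
                .slotSharingGroup("heavy")   // this and downstream operators use their own slots
                .setParallelism(4);
        }
    }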

Re: how flink extracts timestamp from transformed elements?

2017-12-18 Thread Jinhua Luo
of > windows for the same time have the same timestamp. > > 2017-12-18 11:30 GMT+01:00 Jinhua Luo : >> >> Thanks. >> >> The keyBy() splits the stream into multiple logical streams; if I do >> timeWindow(), then how does flink merge all logical windows into one?

Re: how flink extracts timestamp from transformed elements?

2017-12-18 Thread Jinhua Luo
assigned. > If records are aggregated in a time window, the aggregation result has the > maximum allowed timestamp of the window. For example a tumbling window of > size 1 hour that starts at 14:00 emits its results with a timestamp of > 14:59:59.999. > > Best, Fabian
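
Because each window result carries its window's end timestamp, a keyed window followed by a non-keyed window of the same size groups exactly the per-key results of the same hour; a sketch on a placeholder Tuple2<String, Long> stream:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class ChainedWindows {
        public static void demo(DataStream<Tuple2<String, Long>> events) {
            events.keyBy(0)
                  .timeWindow(Time.hours(1))
                  .sum(1)                       // one partial sum per key and hour,
                                                // stamped with the window end (e.g. 14:59:59.999)
                  .timeWindowAll(Time.hours(1))
                  .sum(1);                      // combines all keys of the same hour
        }
    }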

how flink extracts timestamp from transformed elements?

2017-12-16 Thread Jinhua Luo
Hi All, The timestamp assigner is for one type, normally the type coming from the source, but after several operators the element type changes and the elements get aggregated; if I do timeWindow() again, how does flink extract timestamps from the elements? For example, the fold operator aggregates 10

netty conflict using lettuce redis client

2017-12-13 Thread Jinhua Luo
Hi All, The io.netty package included in flink 1.3.2 is 4.0.23, while the latest lettuce-core (4.4) depends on netty 4.0.35. If I include netty 4.0.35 in the app jar, it throws java.nio.channels.UnresolvedAddressException. It seems the netty classes are mixed between the versions from the app jar and

Re: when does the timed window ends?

2017-12-12 Thread Jinhua Luo
Jinhua Luo : > If the window contains only one element and no more elements come in, > then by default (with EventTimeTrigger) the window would be fired by the > next element if that element advances the watermark past the end > of the window, correct? > That is, even if the window

Re: when does the timed window ends?

2017-12-12 Thread Jinhua Luo
window is created when the first > element arrives". > Otherwise, you'd have to fire empty windows for all possible keys (in case > of a window operator on a keyed stream), which is obviously not possible. > > 2017-12-12 9:30 GMT+01:00 Jinhua Luo : >> >> OK, I see.

Re: what's the best practice to determine event-time watermark for unordered stream?

2017-12-12 Thread Jinhua Luo
lateness). If a late event arrives, you can update the result and emit an > update. In this case your downstream operators/systems have to be able to > deal with updates. 3) send the late events to a different channel via side outputs and handle > them later.
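
Options 2 and 3 from this answer, sketched on a placeholder Tuple2<String, Long> stream; the window size and lateness bound are arbitrary values for the sketch:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.util.OutputTag;

    public class LateDataExample {
        public static void demo(DataStream<Tuple2<String, Long>> events) {
            final OutputTag<Tuple2<String, Long>> lateTag =
                    new OutputTag<Tuple2<String, Long>>("late-events") {};

            SingleOutputStreamOperator<Tuple2<String, Long>> result =
                    events.keyBy(0)
                          .timeWindow(Time.minutes(10))
                          .allowedLateness(Time.minutes(5))   // re-fire updated results for late events
                          .sideOutputLateData(lateTag)        // events later than that go to the side output
                          .sum(1);

            DataStream<Tuple2<String, Long>> lateEvents = result.getSideOutput(lateTag);
            // ... handle lateEvents separately, e.g. write them to a correction channel ...
        }
    }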

Re: what's the best practice to determine event-time watermark for unordered stream?

2017-12-12 Thread Jinhua Luo
> [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/event_timestamps_watermarks.html > [2] > https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/windows.html#allowed-lateness > [3] > https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/w

what's the best practice to determine event-time watermark for unordered stream?

2017-12-12 Thread Jinhua Luo
Hi All, The watermark is monotonically increasing in a stream, correct? Given an extremely out-of-order stream, e.g. e4(12:04:33) --> e3 (15:00:22) --> e2(12:04:21) --> e1 (12:03:01) Here e1 appears first, so the watermark starts from 12:03:01, so e3 is an early event; it would be placed in another window
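
One common answer for a mildly unordered stream (the 10-minute bound and the Tuple2<String, Long> element type are assumptions for the sketch): let the watermark trail the highest timestamp seen so far by a fixed bound, so it still only moves forward:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // The watermark emitted by this extractor is (max timestamp seen so far - 10 min),
    // so it never goes backwards even if individual events do.
    public class BoundedLatenessExtractor
            extends BoundedOutOfOrdernessTimestampExtractor<Tuple2<String, Long>> {

        public BoundedLatenessExtractor() {
            super(Time.minutes(10));
        }

        @Override
        public long extractTimestamp(Tuple2<String, Long> event) {
            return event.f1;   // the event-time field, as milliseconds since the epoch
        }
    }

It would be attached with stream.assignTimestampsAndWatermarks(new BoundedLatenessExtractor()); events arriving later than the bound are then late and need allowed lateness or side outputs, as discussed in the thread above.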

Re: when does the timed window ends?

2017-12-12 Thread Jinhua Luo
built-in session window. > You can also define custom windows like that. > > Best, Fabian > > 2017-12-12 7:57 GMT+01:00 Jinhua Luo : >> >> Hi All, >> >> The document said "a window is created as soon as the first element >> that should belong to this
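
The built-in session window mentioned here, sketched on a placeholder keyed Tuple2<String, Long> stream (the 30-minute gap is an arbitrary value): the window stays open as long as elements keep arriving for the key and closes once the gap passes without new elements:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class SessionWindowExample {
        public static void demo(DataStream<Tuple2<String, Long>> events) {
            events.keyBy(0)
                  .window(EventTimeSessionWindows.withGap(Time.minutes(30)))
                  .sum(1);
        }
    }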

could I chain two timed window?

2017-12-11 Thread Jinhua Luo
Hi All, Given one stream source which generates 20k events/sec, I need to aggregate the element count using a sliding window of 1 hour. The problem is, the window may buffer too many elements (which may cause a lot of block I/O because of checkpointing?), and in fact it is not necessary
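
One way around the buffering concern, as a sketch under assumptions (the stream is assumed to have been mapped to (key, 1L) pairs of type Tuple2<String, Long>, and the slide interval is arbitrary): with incremental aggregation the window keeps a single running value per key and window instead of every buffered element:

    import org.apache.flink.api.common.functions.ReduceFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class IncrementalCountExample {
        public static void demo(DataStream<Tuple2<String, Long>> countsPerEvent) {
            countsPerEvent
                .keyBy(0)
                .timeWindow(Time.hours(1), Time.minutes(5))   // 1-hour window sliding every 5 minutes
                .reduce(new ReduceFunction<Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> reduce(Tuple2<String, Long> a, Tuple2<String, Long> b) {
                        return new Tuple2<>(a.f0, a.f1 + b.f1);   // only the running count is stored
                    }
                });
        }
    }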

when does the timed window ends?

2017-12-11 Thread Jinhua Luo
Hi All, The document said "a window is created as soon as the first element that should belong to this window arrives, and the window is completely removed when the time (event or processing time) passes its end timestamp plus the user-specified allowed lateness (see Allowed Lateness).". I am sti

Re: does the flink sink only support bio?

2017-12-08 Thread Jinhua Luo
must support rolling > back changes between checkpoints, e.g. via transactions. Commits should be > triggered for confirmed checkpoints („notifyCheckpointComplete“). > > Your assumptions about the blocking behavior of the non-async sinks are > correct. > > Best, > Stefan

does the flink sink only support bio?

2017-12-07 Thread Jinhua Luo
Hi, all. The invoke method of the sink seems to have no way to do async io, e.g. to return a Future? For example, the redis connector uses the jedis lib to execute redis commands synchronously: https://github.com/apache/bahir-flink/blob/master/flink-connector-redis/src/main/java/org/apache/flink/streaming/connecto
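
SinkFunction.invoke() is indeed a plain synchronous call. One workaround, sketched only (writeToExternalSystem() is an empty stand-in for a real asynchronous client call, e.g. via lettuce): start writes asynchronously in invoke() and wait for all in-flight writes in snapshotState(), so a completed checkpoint never covers pending writes. This gives at-least-once delivery; exactly-once still needs idempotence or transactions as discussed in the other thread.

    import java.util.Queue;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentLinkedQueue;

    import org.apache.flink.runtime.state.FunctionInitializationContext;
    import org.apache.flink.runtime.state.FunctionSnapshotContext;
    import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    public class NonBlockingSink extends RichSinkFunction<String> implements CheckpointedFunction {

        private transient Queue<CompletableFuture<Void>> pending;

        @Override
        public void invoke(String value) throws Exception {
            // start the write and return immediately
            pending.add(CompletableFuture.runAsync(() -> writeToExternalSystem(value)));
            pending.removeIf(CompletableFuture::isDone);   // drop already-completed requests
        }

        @Override
        public void snapshotState(FunctionSnapshotContext context) throws Exception {
            // block until every request issued before the checkpoint barrier has finished;
            // a failed write surfaces here and fails the checkpoint
            for (CompletableFuture<Void> f : pending) {
                f.get();
            }
            pending.clear();
        }

        @Override
        public void initializeState(FunctionInitializationContext context) throws Exception {
            pending = new ConcurrentLinkedQueue<>();
        }

        private void writeToExternalSystem(String value) {
            // placeholder for the actual asynchronous client call
        }
    }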