Re: [ANNOUNCE] Yu Li is now part of the Flink PMC

2020-06-16 Thread Jingsong Li
Congratulations Yu, well deserved! Best, Jingsong On Wed, Jun 17, 2020 at 1:42 PM Yuan Mei wrote: > Congrats, Yu! > > GXGX & well deserved!! > > Best Regards, > > Yuan > > On Wed, Jun 17, 2020 at 9:15 AM jincheng sun > wrote: > >> Hi all, >> >> On behalf of the Flink PMC, I'm happy to announce

Re: [ANNOUNCE] Yu Li is now part of the Flink PMC

2020-06-16 Thread Yuan Mei
Congrats, Yu! GXGX & well deserved!! Best Regards, Yuan On Wed, Jun 17, 2020 at 9:15 AM jincheng sun wrote: > Hi all, > > On behalf of the Flink PMC, I'm happy to announce that Yu Li is now > part of the Apache Flink Project Management Committee (PMC). > > Yu Li has been very active on Flink'

Re: Reading from AVRO files

2020-06-16 Thread Lorenzo Nicora
Thanks Arvid, now it makes sense. Unfortunately, the problematic schema comes from a 3rd party we cannot control; we have to ingest it and do some work with it before being able to map out of it. But at least now the boundary of the problem is clear. Thanks to the whole community. Lorenzo On Tue, 16

Re: Blink Planner Retracting Streams

2020-06-16 Thread godfrey he
hi John, You can use Tuple2[Boolean, Row] to replace CRow, the StreamTableEnvironment#toRetractStream method return DataStream[(Boolean, T)]. the code looks like: tEnv.toRetractStream[Row](table).map(new MapFunction[(Boolean, Row), R] { override def map(value: (Boolean, Row)): R = ...
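The Tuple2[Boolean, Row] messages described above can be illustrated outside Flink. This is a plain-Python sketch (not Flink API code) of how a retract stream's (is_accumulate, row) tuples materialize into a final table, mirroring the DataStream[(Boolean, T)] returned by toRetractStream:

```python
# Plain-Python sketch of retract-stream semantics: True = accumulate (add
# the row), False = retract (remove a previously emitted row).
from collections import Counter

def materialize(retract_stream):
    """Apply accumulate (True) and retract (False) messages to a multiset."""
    state = Counter()
    for is_accumulate, row in retract_stream:
        if is_accumulate:
            state[row] += 1
        else:
            state[row] -= 1
            if state[row] <= 0:
                del state[row]
    return state

# e.g. a count-per-key query updating key "a" from 1 to 2 emits:
stream = [(True, ("a", 1)), (False, ("a", 1)), (True, ("a", 2))]
print(dict(materialize(stream)))  # {('a', 2): 1}
```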

Re: Any python example with json data from Kafka using flink-statefun

2020-06-16 Thread Sunil
Thanks Gordon. Really appreciate your detailed response and this definitely helps. On 2020/06/17 04:45:11, "Tzu-Li (Gordon) Tai" wrote: > (forwarding this to user@ as it is more suited to be located there) > > Hi Sunil, > > With remote functions (using the Python SDK), messages sent to / from

Re: Any python example with json data from Kafka using flink-statefun

2020-06-16 Thread Tzu-Li (Gordon) Tai
(forwarding this to user@ as it is more suited to be located there) Hi Sunil, With remote functions (using the Python SDK), messages sent to / from them must be Protobuf messages. This is a requirement since remote functions can be written in any language, and we use Protobuf as a means for cross

Re: Kinesis ProvisionedThroughputExceededException

2020-06-16 Thread M Singh
Thanks Roman for your response and advice. From my understanding, increasing shards will increase throughput, but still, if more than 5 requests are made per shard per second, and since we have 20 apps (and increasing), the exception might occur. Please let me know if I have missed anythin

Blink Planner Retracting Streams

2020-06-16 Thread John Mathews
Hello, I am working on migrating from the flink table-planner to the new blink one, and one problem I am running into is that it doesn't seem like Blink has a concept of a CRow, unlike the original table-planner. I am therefore struggling to figure out how to properly convert a retracting stream

Re: [ANNOUNCE] Yu Li is now part of the Flink PMC

2020-06-16 Thread Zhijiang
Congratulations Yu! Well deserved! Best, Zhijiang -- From: Dian Fu Send Time: Wed, Jun 17, 2020 (Wednesday) 10:48 To: dev Cc: Haibo Sun ; user ; user-zh Subject: Re: [ANNOUNCE] Yu Li is now part of the Flink PMC Congrats Yu! Regards, Dian > On 2

Re: [ANNOUNCE] Yu Li is now part of the Flink PMC

2020-06-16 Thread Dian Fu
Congrats Yu! Regards, Dian > On Jun 17, 2020, at 10:35 AM, Jark Wu wrote: > > Congratulations Yu! Well deserved! > > Best, > Jark > > On Wed, 17 Jun 2020 at 10:18, Haibo Sun wrote: > >> Congratulations Yu! >> >> Best, >> Haibo >> >> >> At 2020-06-17 09:15:02, "jincheng sun" wrote: >>> Hi all, >>>

Fwd: Flink Table program cannot be compiled when enable checkpoint of StreamExecutionEnvironment

2020-06-16 Thread 杜斌
-- Forwarded message - From: 杜斌 Date: Wed, Jun 17, 2020, 10:31 AM Subject: Re: Flink Table program cannot be compiled when enable checkpoint of StreamExecutionEnvironment To: add the full stack trace here: Caused by: org.apache.flink.shaded.guava18.com.google.common.util.concurrent.Un

Re: [ANNOUNCE] Yu Li is now part of the Flink PMC

2020-06-16 Thread Benchao Li
Congratulations Yu! Jark Wu wrote on Wed, Jun 17, 2020 at 10:36 AM: > Congratulations Yu! Well deserved! > > Best, > Jark > > On Wed, 17 Jun 2020 at 10:18, Haibo Sun wrote: > > > Congratulations Yu! > > > > Best, > > Haibo > > > > > > At 2020-06-17 09:15:02, "jincheng sun" wrote: > > >Hi all, > > > > > >On b

Re: [ANNOUNCE] Yu Li is now part of the Flink PMC

2020-06-16 Thread Jark Wu
Congratulations Yu! Well deserved! Best, Jark On Wed, 17 Jun 2020 at 10:18, Haibo Sun wrote: > Congratulations Yu! > > Best, > Haibo > > > At 2020-06-17 09:15:02, "jincheng sun" wrote: > >Hi all, > > > >On behalf of the Flink PMC, I'm happy to announce that Yu Li is now > >part of the Apache F

Re: [ANNOUNCE] Yu Li is now part of the Flink PMC

2020-06-16 Thread Haibo Sun
Congratulations Yu! Best, Haibo At 2020-06-17 09:15:02, "jincheng sun" wrote: >Hi all, > >On behalf of the Flink PMC, I'm happy to announce that Yu Li is now >part of the Apache Flink Project Management Committee (PMC). > >Yu Li has been very active on Flink's Statebackend component, working

what is the "Flink" recommended way of assigning a backfill to an average on an event time keyed windowed stream?

2020-06-16 Thread Marco Villalobos
I need to compute averages on time series data upon a 15 minute tumbling event time window that is backfilled. The time series data is a Tuple3 of name: String, value: double, event_time: Timestamp (Instant). I need to compute the average value of the name time series on a tumbling window of 1
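One way to picture the computation described above, independent of Flink's API: bucket events into 15-minute windows by event time, average each bucket, and back-fill empty windows by carrying the last average forward. This is a plain-Python sketch under that (assumed) reading of "backfill", not Flink code; the name `t1` and the numbers are made up:

```python
# 15-minute tumbling event-time windows; empty windows are back-filled by
# carrying the last computed average forward (one possible interpretation).
WINDOW_MS = 15 * 60 * 1000

def window_start(ts_ms):
    # Align a timestamp to the start of its tumbling window.
    return ts_ms - (ts_ms % WINDOW_MS)

def windowed_averages(events, first_window, last_window):
    """events: (name, value, event_time_ms) tuples, single name assumed."""
    buckets = {}
    for _name, value, ts in events:
        buckets.setdefault(window_start(ts), []).append(value)
    result, last_avg = [], None
    w = first_window
    while w <= last_window:
        if w in buckets:
            last_avg = sum(buckets[w]) / len(buckets[w])
        result.append((w, last_avg))  # carries forward when bucket is empty
        w += WINDOW_MS
    return result

events = [("t1", 10.0, 0), ("t1", 20.0, 1000), ("t1", 40.0, 2 * WINDOW_MS)]
print(windowed_averages(events, 0, 2 * WINDOW_MS))
# window 0 -> 15.0, window 1 -> back-filled 15.0, window 2 -> 40.0
```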

Re: How do I backfill time series data?

2020-06-16 Thread Marco Villalobos
Hi Robert, I believe that I cannot use a "ProcessFunction" because I key the stream, and I use TumblingEventTimeWindows, which does not allow for the use of "ProcessFunction" in that scenario. I compute the averages with a ProcessWindowFunction. I am going to follow up this question in a new

Re: [ANNOUNCE] Yu Li is now part of the Flink PMC

2020-06-16 Thread Leonard Xu
Congratulations Yu! Best, Leonard Xu > On Jun 17, 2020, at 09:50, Yangze Guo wrote: > > Congrats, Yu! > Best, > Yangze Guo > > On Wed, Jun 17, 2020 at 9:35 AM Xintong Song wrote: >> >> Congratulations Yu, well deserved~! >> >> Thank you~ >> >> Xintong Song >> >> >> >> On Wed, Jun 17, 2020 at 9:15

Re: [ANNOUNCE] Yu Li is now part of the Flink PMC

2020-06-16 Thread Yangze Guo
Congrats, Yu! Best, Yangze Guo On Wed, Jun 17, 2020 at 9:35 AM Xintong Song wrote: > > Congratulations Yu, well deserved~! > > Thank you~ > > Xintong Song > > > > On Wed, Jun 17, 2020 at 9:15 AM jincheng sun wrote: >> >> Hi all, >> >> On behalf of the Flink PMC, I'm happy to announce that Yu Li

Re: [ANNOUNCE] Yu Li is now part of the Flink PMC

2020-06-16 Thread Xintong Song
Congratulations Yu, well deserved~! Thank you~ Xintong Song On Wed, Jun 17, 2020 at 9:15 AM jincheng sun wrote: > Hi all, > > On behalf of the Flink PMC, I'm happy to announce that Yu Li is now > part of the Apache Flink Project Management Committee (PMC). > > Yu Li has been very active on F

[ANNOUNCE] Yu Li is now part of the Flink PMC

2020-06-16 Thread jincheng sun
Hi all, On behalf of the Flink PMC, I'm happy to announce that Yu Li is now part of the Apache Flink Project Management Committee (PMC). Yu Li has been very active on Flink's Statebackend component, working on various improvements, for example the RocksDB memory management for 1.10. and keeps che

Re: Reading from AVRO files

2020-06-16 Thread Arvid Heise
Hi Lorenzo, I didn't mean to dismiss the issue, but it's not a matter of incompatibility, it's a matter of unsound generated code. It will break independently of Flink, since it apparently is a bug in the Avro compiler 1.8.2, so our options to fix it are limited. What we should do is to bump the A

Re: Reading from AVRO files

2020-06-16 Thread Lorenzo Nicora
Hi Arvid, Sorry, but saying the AVRO compiler setup is "broken" sounds like an easy way of dismissing a problem ;) I am using the official AVRO 1.8.2 Maven plugin with no customisation to generate the code. There might be some legit AVRO configurations that are incompatible with Flink or somethin

Re: Kinesis ProvisionedThroughputExceededException

2020-06-16 Thread Roman Grebennikov
Hi, usually this exception is thrown by aws-java-sdk and means that your Kinesis stream is hitting a throughput limit (what a surprise). We experienced the same thing when we had a single "event-bus" style stream and multiple Flink apps reading from it. Each Kinesis partition has a limit of 5
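The arithmetic behind that limit is worth spelling out. Each Kinesis shard allows 5 GetRecords calls per second, shared across all consumers; a back-of-the-envelope sketch in plain Python (the 200 ms polling interval below is an illustrative assumption, not a measurement):

```python
# Per-shard read budget: 5 GetRecords calls/second, shared by every
# consumer reading the stream.
SHARD_READ_LIMIT_PER_SEC = 5

def reads_per_shard_per_sec(num_apps, poll_interval_ms):
    """Each app polls each shard once per poll interval."""
    return num_apps * (1000 / poll_interval_ms)

# 20 apps each polling every 200 ms vastly exceeds the per-shard budget,
# which explains the ProvisionedThroughputExceededException regardless of
# how many shards are added:
load = reads_per_shard_per_sec(20, 200)
print(load, load > SHARD_READ_LIMIT_PER_SEC)  # 100.0 True
```

Adding shards spreads record volume but does not change the per-shard call budget, so every consumer still competes on every shard it reads.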

Re: [EXTERNAL] Flink Count of Events using metric

2020-06-16 Thread Slotterback, Chris
As the answer on SO suggests, Prometheus comes with lots of functionality to do what you’re requesting using just a simple count metric: https://prometheus.io/docs/prometheus/latest/querying/functions/ If you want to implement the function on your own inside flink, you can make your own metrics
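The Prometheus-side computation the reply points at can be sketched in plain Python: an hourly count is just the increase of a monotonic counter over the hour, with a drop between samples treated as a counter reset (e.g. after a job restart). Illustrative only, not Prometheus internals:

```python
# Sum deltas between consecutive counter samples; a drop means the counter
# was reset, so the new value itself is the increase since the reset.
def hourly_increase(samples):
    """samples: counter values sampled within the hour, oldest first."""
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:           # counter reset: count from zero
            total += cur
    return total

# counter restarts between 150 and 30:
print(hourly_increase([100, 150, 30, 80]))  # 50 + 30 + 50 = 130
```

In practice PromQL's `increase(metric[1h])` does this (plus extrapolation), so inside Flink a plain ever-increasing Counter metric is enough.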

Flink Count of Events using metric

2020-06-16 Thread aj
Please help me with this: https://stackoverflow.com/questions/62297467/flink-count-of-events-using-metric I have a topic in Kafka where I am getting multiple types of events in JSON format. I have created a file stream sink to write these events to S3 with bucketing. Now I want to publish an hou

Re: Flink on yarn : yarn-session understanding

2020-06-16 Thread Vikash Dat
YARN will assign a random port when Flink is deployed. To get the port, run yarn application -list and look at the tracking URL assigned to your Flink cluster. The port in that URL is the one to use for the REST API. On Tue, Jun 16, 2020 at 08:49 aj wrote: > Ok, thanks for

Re: Does Flink support reading files or CSV files from java.io.InputStream instead of file paths?

2020-06-16 Thread Marco Villalobos
Okay, it is not supported. I thought about this more and I disagree that this would break "distributability". Currently, the API accepts a String which is a path, whether it be a path to a remote URL or a local file. However, after the URL is parsed, ultimately what ends up happening is that

Re: Issue connecting pyflink to Elasticsearch 5.4

2020-06-16 Thread Jark Wu
Hi, As far as I know, Flink 1.10 has no official SQL connector for Elasticsearch 5.x. Best, Jark On Tue, 16 Jun 2020 at 16:08, Dian Fu wrote: > Could you send the complete exception? > > On Jun 16, 2020, at 3:45 PM, jack wrote: > > I have already changed the version to 5 locally, and the following error occurred: > >> st_env.connect( > >> Elasticsearch() > >> .version("5") > >>

Writing to S3 parquet files in Blink batch mode. Flink 1.10

2020-06-16 Thread Dmytro Dragan
Hi guys, In our use case we are considering writing data to AWS S3 in parquet format using Blink batch mode. As far as I can see, one valid approach to writing parquet files is to use StreamingFileSink with the Parquet bulk-encoded format, but based on the documentation and tests it works only with OnCheckp

Re: EventTimeSessionWindow firing too soon

2020-06-16 Thread Ori Popowski
Hi @aljoscha The watermark metrics look fine (screenshot attached). This is the extractor: class TimestampExtractor[A, B <: AbstractEvent] extends BoundedOutOfOrdernessTimestampExtractor[(A, B)](Time.minutes(5)) { override def extractTimestamp(element: (A, B)): Long = Ins
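The extractor quoted above has simple semantics that are easy to restate outside Flink: track the maximum timestamp seen and let the watermark trail it by the bound (5 minutes here). A plain-Python sketch of that logic, not the Flink class itself:

```python
# Bounded-out-of-orderness watermarking: the watermark always equals the
# maximum event timestamp seen so far minus the allowed lateness bound.
FIVE_MINUTES_MS = 5 * 60 * 1000

class BoundedOutOfOrderness:
    def __init__(self, max_out_of_orderness_ms):
        self.bound = max_out_of_orderness_ms
        self.max_ts = float("-inf")

    def on_event(self, timestamp_ms):
        self.max_ts = max(self.max_ts, timestamp_ms)

    def current_watermark(self):
        return self.max_ts - self.bound

wm = BoundedOutOfOrderness(FIVE_MINUTES_MS)
for ts in [1_000_000, 1_200_000, 1_100_000]:  # third event is out of order
    wm.on_event(ts)
print(wm.current_watermark())  # 1_200_000 - 300_000 = 900000
```

This is why historical data makes session windows fire quickly: the watermark advances with the stream's timestamps, not with the wall clock.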

Flink ML

2020-06-16 Thread Dimitris Vogiatzidakis
Hello, I'm a CS student currently working on my Bachelor's thesis. I've used Flink to extract features out of some datasets, and I would like to use them together with another dataset of (1,0) (node exists or doesn't) to perform a logistic regression. I have found that FLIP-39 has been accepted a

Re: Flink on yarn : yarn-session understanding

2020-06-16 Thread aj
Ok, thanks for the clarification on yarn session. I am trying to connect to the job manager on 8081 but it's not connecting (screenshot attached). This is the address shown on my Flink job UI, and I am trying to connect to the REST address on 8081 but it refuses the connection. On Tue, Jun 9, 2020 at 1:03

Re: EventTimeSessionWindow firing too soon

2020-06-16 Thread Aljoscha Krettek
Did you look at the watermark metrics? Do you know what the current watermark is when the windows are firing? You could also get the current watermark when using a ProcessWindowFunction and also emit that in the records that you're printing, for debugging. What is that TimestampAssigner you're

Re: EventTimeSessionWindow firing too soon

2020-06-16 Thread Aljoscha Krettek
Sorry, I now saw that this thread diverged. My mail client didn't pick it up because someone messed up the subject of the thread. On 16.06.20 14:06, Aljoscha Krettek wrote: Hi, what is the timescale of your data in Kafka. If you have data in there that spans more than ~30 minutes I would expe

Re: EventTimeSessionWindow firing too soon

2020-06-16 Thread Ori Popowski
Okay, so I created a simple stream (similar to the original stream), where I just write the timestamps of each evaluated window to S3. The session gap is 30 minutes, and this is one of the sessions: (first-event, last-event, num-events) 11:23-11:23 11 events 11:25-11:26 51 events 11:28-11:29 74 ev

Re: EventTimeSessionWindow firing too soon

2020-06-16 Thread Aljoscha Krettek
Hi, what is the timescale of your data in Kafka. If you have data in there that spans more than ~30 minutes I would expect your windows to fire very soon after the job is started. Event time does not depend on a wall clock but instead advances with the time in the stream. As Flink advances th

Re: Does Flink support reading files or CSV files from java.io.InputStream instead of file paths?

2020-06-16 Thread Aljoscha Krettek
Hi Marco, this is not possible since Flink is designed mostly to read files from a distributed filesystem, where paths are used to refer to those files. If you read from files on the classpath you could just use plain old Java code and won't need a distributed processing system such as Flink.

Re: Improved performance when using incremental checkpoints

2020-06-16 Thread Aljoscha Krettek
Hi, it might be that the operations that Flink performs on RocksDB during checkpointing will "poke" RocksDB somehow and make it clean up it's internal hierarchies of storage more. Other than that, I'm also a bit surprised by this. Maybe Yun Tang will come up with another idea. Best, Aljosch

Re: MapState bad performance

2020-06-16 Thread Yun Tang
Hi Nick From my experience, it's not easy to tune this without code to reproduce. Could you please provide code with a fake source to reproduce the issue, so that we can help you? If CPU usage is 100% in RocksDB-related methods, it might be because RocksDB is accessed too often. If the CPU usage is not 100%

Re: EventTimeSessionWindow firing too soon

2020-06-16 Thread Ori Popowski
Hi, thanks for answering. > I guess you consume from Kafka from the earliest offset, so you consume historical data and Flink is catching-up. Yes, it's what's happening. But Kafka is partitioned on sessionId, so skew between partitions cannot explain it. I think the only way it can happen is when

Re: Improved performance when using incremental checkpoints

2020-06-16 Thread nick toker
Hi, We used both flink versions 1.9.1 and 1.10.1 We used rocksDB default configuration. The streaming pipeline is very simple. 1. Kafka consumer 2. Process function 3. Kafka producer The code of the process function is listed below: private transient MapState testMapState; @Override public

Re: MapState bad performance

2020-06-16 Thread nick toker
Hi, We are using Flink version 1.10.1. The task manager memory is 16GB. The number of slots is 32 but the job parallelism is 1. We used the default configuration for RocksDB. We checked the disk speed on the machine running the task manager: write 300MB and read 1GB. BR, Nick On Tue, 16 Jun

Re: MapState bad performance

2020-06-16 Thread Yun Tang
Hi Nick As you might know, RocksDB does not perform well for iterator-like operations, because it needs to merge-sort across multiple levels. [1] Unfortunately, rocksDBMapState.isEmpty() needs to call iterator and seek operations over RocksDB [2], and rocksDBMapState.clear() needs to iterator
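A common workaround for the cost described above is to avoid asking the RocksDB-backed map whether it is empty at all, and instead maintain an explicit entry count alongside it (e.g. in a small ValueState), so emptiness becomes an O(1) lookup. A plain-Python sketch of that idea, not Flink code:

```python
# Keep a counter in lock-step with the map so is_empty() never has to open
# a (merge-sorting) iterator over the underlying store.
class CountedMapState:
    def __init__(self):
        self._map = {}      # stands in for a RocksDB-backed MapState
        self._count = 0     # stands in for a small ValueState counter

    def put(self, key, value):
        if key not in self._map:
            self._count += 1
        self._map[key] = value

    def remove(self, key):
        if key in self._map:
            del self._map[key]
            self._count -= 1

    def is_empty(self):
        return self._count == 0  # O(1), no iteration

s = CountedMapState()
print(s.is_empty())  # True
s.put("k", 1)
print(s.is_empty())  # False
```

The trade-off is one extra state update per put/remove in exchange for constant-time emptiness checks.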

Re: EventTimeSessionWindow firing too soon

2020-06-16 Thread Rafi Aroch
Hi Ori, I guess you consume from Kafka from the earliest offset, so you consume historical data and Flink is catching-up. Regarding: *My event-time timestamps also do not have big gaps* Just to verify, if you do keyBy sessionId, do you check the gaps of events from the same session? Rafi On T

Re: Improved performance when using incremental checkpoints

2020-06-16 Thread Yun Tang
Hi Nick It's really strange that performance could improve when checkpointing is enabled. In general, enabling checkpointing might bring a slight performance downside to the whole job. Could you give more details, e.g. Flink version, RocksDB configuration, and simple code which could reproduce this prob

Re: [External] Measuring Kafka consumer lag

2020-06-16 Thread Theo Diefenthal
Hi Padarn, We configure our Flink KafkaConsumer with setCommitOffsetsOnCheckpoints(true). In this case, the offsets are committed on each checkpoint for the conumer group of the application. We have an external monitoring on our kafka consumer groups (Just a small script) which writes kafka in
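The lag computation such a monitoring script performs is straightforward: for each partition, lag is the broker's log-end offset minus the group's committed offset (committed here on each checkpoint via setCommitOffsetsOnCheckpoints). A plain-Python sketch with made-up numbers:

```python
# Lag per partition = broker log-end offset - consumer group's committed
# offset; a partition with no commit yet counts from offset 0.
def consumer_lag(log_end_offsets, committed_offsets):
    """Both dicts map partition -> offset."""
    return {
        p: end - committed_offsets.get(p, 0)
        for p, end in log_end_offsets.items()
    }

end = {0: 1_000, 1: 500}
committed = {0: 990, 1: 500}
print(consumer_lag(end, committed))  # {0: 10, 1: 0}
```

One caveat implied by the checkpoint-based commit: between checkpoints the committed offset stands still, so the reported lag saw-tooths with the checkpoint interval even when the job keeps up.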

Re: Issue connecting pyflink to Elasticsearch 5.4

2020-06-16 Thread Dian Fu
Could you send the complete exception? > On Jun 16, 2020, at 3:45 PM, jack wrote: > > I have already changed the version to 5 locally, and the following error occurred: > >> st_env.connect( > >> Elasticsearch() > >> .version("5") > >> .host("localhost", 9200, "http") > >> .index("taxiid-cnts") > >> .document_type('taxiidcnt')

Re: Latency tracking together with broadcast state can cause job failure

2020-06-16 Thread Arvid Heise
Hi Lasse, your reported issue [1] will be fixed in the next release of 1.10 and the upcoming 1.11. Thank you for your detailed report. [1] https://issues.apache.org/jira/browse/FLINK-17322 On Wed, Apr 22, 2020 at 12:54 PM Lasse Nedergaard < lassenedergaardfl...@gmail.com> wrote: > Hi Yun > > Th

Re: Re: Issue connecting pyflink to Elasticsearch 5.4

2020-06-16 Thread jack
I have already changed the version to 5 locally, and the following error occurred: >> st_env.connect( >> Elasticsearch() >> .version("5") >> .host("localhost", 9200, "http") >> .index("taxiid-cnts") >> .document_type('taxiidcnt') >> .key_delimiter("$")) \ On 2020-06-1

Improved performance when using incremental checkpoints

2020-06-16 Thread nick toker
Hello, We are using RocksDB as the backend state. At first we didn't enable the checkpoint mechanism. We observed the following behaviour and we are wondering why. When using RocksDB *without* checkpoints the performance was extremely bad. And when we enabled the checkpoint the perform

Re: Issue connecting pyflink to Elasticsearch 5.4

2020-06-16 Thread Dian Fu
I guess it's because the ES version specified in the job is `6`, however, the jar used is `5`. > On Jun 16, 2020, at 1:47 PM, jack wrote: > > I'm using a pyflink example that connects to ES; my ES version is 5.4.1 and pyflink is 1.10.1. The connector jar I'm using is flink-sql-connector-elasticsearch5_2.11-1.10.1.jar; I also downloaded the kafka and json connector jars, and connecting to Kafka tested successfully. > When connecting to ES

MapState bad performance

2020-06-16 Thread nick toker
Hello, We wrote a very simple streaming pipeline containing: 1. Kafka consumer 2. Process function 3. Kafka producer The code of the process function is listed below: private transient MapState testMapState; @Override public void processElement(Map value, Context ctx, Collector> out) throws