Congratulations Yu, well deserved!
Best,
Jingsong
On Wed, Jun 17, 2020 at 1:42 PM Yuan Mei wrote:
> Congrats, Yu!
>
> GXGX & well deserved!!
>
> Best Regards,
>
> Yuan
>
> On Wed, Jun 17, 2020 at 9:15 AM jincheng sun
> wrote:
>
>> Hi all,
>>
>> On behalf of the Flink PMC, I'm happy to announce
Congrats, Yu!
GXGX & well deserved!!
Best Regards,
Yuan
On Wed, Jun 17, 2020 at 9:15 AM jincheng sun
wrote:
> Hi all,
>
> On behalf of the Flink PMC, I'm happy to announce that Yu Li is now
> part of the Apache Flink Project Management Committee (PMC).
>
> Yu Li has been very active on Flink'
Thanks Arvid,
now it makes sense.
Unfortunately, the problematic schema comes from a third party we cannot
control; we have to ingest it and do some work with it before being able to
map out of it.
But at least now the boundary of the problem is clear.
Thanks to the whole community
Lorenzo
On Tue, 16
Hi John,
You can use Tuple2[Boolean, Row] to replace CRow; the
StreamTableEnvironment#toRetractStream method returns DataStream[(Boolean,
T)].
The code looks like:
tEnv.toRetractStream[Row](table).map(new MapFunction[(Boolean, Row), R] {
  override def map(value: (Boolean, Row)): R = ...
})
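For reference, a minimal, self-contained sketch of the same conversion
(assuming Flink 1.10's Scala API; the "clicks" table and its fields are
invented for illustration):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.scala._
import org.apache.flink.types.Row

val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = StreamTableEnvironment.create(env)

// A small in-memory table; a grouped aggregation over it produces retractions.
val clicks = env.fromElements(("u1", "a"), ("u1", "b"), ("u2", "a"))
tEnv.createTemporaryView("clicks", clicks, 'user_id, 'page)

val table = tEnv.sqlQuery("SELECT user_id, COUNT(*) AS cnt FROM clicks GROUP BY user_id")

// Each element is (accumulateFlag, row): true = add/update, false = retract.
val retractStream: DataStream[(Boolean, Row)] = tEnv.toRetractStream[Row](table)

retractStream
  .filter(_._1) // keep only accumulate messages, dropping retractions
  .map(_._2)    // unwrap the Row, playing the role CRow did in the old planner
  .print()

env.execute("retract-stream-example")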
Thanks Gordon.
Really appreciate your detailed response and this definitely helps.
On 2020/06/17 04:45:11, "Tzu-Li (Gordon) Tai" wrote:
> (forwarding this to user@ as it is more suited to be located there)
>
> Hi Sunil,
>
> With remote functions (using the Python SDK), messages sent to / from
(forwarding this to user@ as it is more suited to be located there)
Hi Sunil,
With remote functions (using the Python SDK), messages sent to / from them
must be Protobuf messages.
This is a requirement since remote functions can be written in any
language, and we use Protobuf as a means for cross
Thanks Roman for your response and advice.
From my understanding, increasing shards will increase throughput, but the
exception might still occur if more than 5 requests are made per shard per
second, since we have 20 apps (and increasing).
Please let me know if I have missed anything.
Hello,
I am working on migrating from the Flink table-planner to the new Blink
one, and one problem I am running into is that Blink doesn't seem to have
a concept of a CRow, unlike the original table-planner.
I am therefore struggling to figure out how to properly convert a
retracting stream
Congratulations Yu! Well deserved!
Best,
Zhijiang
--
From:Dian Fu
Send Time: Wednesday, Jun 17, 2020, 10:48
To:dev
Cc:Haibo Sun ; user ; user-zh
Subject:Re: [ANNOUNCE] Yu Li is now part of the Flink PMC
Congrats Yu!
Regards,
Dian
> On 2
Congrats Yu!
Regards,
Dian
> On Jun 17, 2020, at 10:35 AM, Jark Wu wrote:
>
> Congratulations Yu! Well deserved!
>
> Best,
> Jark
>
> On Wed, 17 Jun 2020 at 10:18, Haibo Sun wrote:
>
>> Congratulations Yu!
>>
>> Best,
>> Haibo
>>
>>
>> At 2020-06-17 09:15:02, "jincheng sun" wrote:
>>> Hi all,
>>>
---------- Forwarded message ---------
From: 杜斌
Date: Wed, Jun 17, 2020, 10:31 AM
Subject: Re: Flink Table program cannot be compiled when enable checkpoint
of StreamExecutionEnvironment
To:
Adding the full stack trace here:
Caused by:
org.apache.flink.shaded.guava18.com.google.common.util.concurrent.Un
Congratulations Yu!
On Wed, Jun 17, 2020 at 10:36 AM Jark Wu wrote:
> Congratulations Yu! Well deserved!
>
> Best,
> Jark
>
> On Wed, 17 Jun 2020 at 10:18, Haibo Sun wrote:
>
> > Congratulations Yu!
> >
> > Best,
> > Haibo
> >
> >
> > At 2020-06-17 09:15:02, "jincheng sun" wrote:
> > >Hi all,
> > >
> > >On b
Congratulations Yu! Well deserved!
Best,
Jark
On Wed, 17 Jun 2020 at 10:18, Haibo Sun wrote:
> Congratulations Yu!
>
> Best,
> Haibo
>
>
> At 2020-06-17 09:15:02, "jincheng sun" wrote:
> >Hi all,
> >
> >On behalf of the Flink PMC, I'm happy to announce that Yu Li is now
> >part of the Apache F
Congratulations Yu!
Best,
Haibo
At 2020-06-17 09:15:02, "jincheng sun" wrote:
>Hi all,
>
>On behalf of the Flink PMC, I'm happy to announce that Yu Li is now
>part of the Apache Flink Project Management Committee (PMC).
>
>Yu Li has been very active on Flink's Statebackend component, working
I need to compute averages on time series data over a 15-minute tumbling event-time
window that is backfilled.
The time series data is a Tuple3 of name: String, value: double, event_time:
Timestamp (Instant).
I need to compute the average value of the name time series on a tumbling
window of 15 minutes.
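For illustration, a minimal sketch of such a 15-minute tumbling event-time
average (Flink 1.10 Scala API; the Reading type and sample values are
invented):

import java.time.Instant
import org.apache.flink.api.common.functions.AggregateFunction
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

case class Reading(name: String, value: Double, eventTime: Instant)

// Incremental average: the accumulator carries (sum, count).
class AvgFn extends AggregateFunction[Reading, (Double, Long), Double] {
  override def createAccumulator(): (Double, Long) = (0.0, 0L)
  override def add(r: Reading, acc: (Double, Long)): (Double, Long) =
    (acc._1 + r.value, acc._2 + 1)
  override def getResult(acc: (Double, Long)): Double = acc._1 / acc._2
  override def merge(a: (Double, Long), b: (Double, Long)): (Double, Long) =
    (a._1 + b._1, a._2 + b._2)
}

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

val readings = env
  .fromElements(
    Reading("series-a", 1.0, Instant.parse("2020-06-17T00:01:00Z")),
    Reading("series-a", 3.0, Instant.parse("2020-06-17T00:05:00Z")))
  .assignAscendingTimestamps(_.eventTime.toEpochMilli)

readings
  .keyBy(_.name)
  .window(TumblingEventTimeWindows.of(Time.minutes(15)))
  .aggregate(new AvgFn)
  .print()

env.execute("tumbling-average")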
Hi Robert,
I believe that I cannot use a "ProcessFunction" because I key the stream, and I
use TumblingEventTimeWindows, which does not allow for the use of
"ProcessFunction" in that scenario.
I compute the averages with a ProcessWindowFunction.
I am going to follow up this question in a new
Congratulations Yu !
Best,
Leonard Xu
> On Jun 17, 2020, at 09:50, Yangze Guo wrote:
>
> Congrats, Yu!
> Best,
> Yangze Guo
>
> On Wed, Jun 17, 2020 at 9:35 AM Xintong Song wrote:
>>
>> Congratulations Yu, well deserved~!
>>
>> Thank you~
>>
>> Xintong Song
>>
>>
>>
>> On Wed, Jun 17, 2020 at 9:15
Congrats, Yu!
Best,
Yangze Guo
On Wed, Jun 17, 2020 at 9:35 AM Xintong Song wrote:
>
> Congratulations Yu, well deserved~!
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Jun 17, 2020 at 9:15 AM jincheng sun wrote:
>>
>> Hi all,
>>
>> On behalf of the Flink PMC, I'm happy to announce that Yu Li
Congratulations Yu, well deserved~!
Thank you~
Xintong Song
On Wed, Jun 17, 2020 at 9:15 AM jincheng sun
wrote:
> Hi all,
>
> On behalf of the Flink PMC, I'm happy to announce that Yu Li is now
> part of the Apache Flink Project Management Committee (PMC).
>
> Yu Li has been very active on F
Hi all,
On behalf of the Flink PMC, I'm happy to announce that Yu Li is now
part of the Apache Flink Project Management Committee (PMC).
Yu Li has been very active on Flink's Statebackend component, working on
various improvements, for example the RocksDB memory management for 1.10.
and keeps che
Hi Lorenzo,
I didn't mean to dismiss the issue, but it's not a matter of
incompatibility, it's a matter of unsound generated code. It will break
independently of Flink, since it apparently is a bug in the Avro compiler
1.8.2, so our options to fix it are limited.
What we should do is to bump the A
Hi Arvid,
Sorry, but saying the AVRO compiler setup is "broken" sounds like an easy
way of dismissing a problem ;)
I am using the official AVRO 1.8.2 Maven plugin with no customisation to
generate the code.
There might be some legit AVRO configurations that are incompatible with
Flink or something
Hi,
usually this exception is thrown by the aws-java-sdk and means that your Kinesis
stream is hitting a throughput limit (what a surprise). We experienced the same
thing when we had a single "event-bus" style stream and multiple Flink apps
reading from it.
Each Kinesis partition has a limit of 5 reads per second
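One mitigation (a sketch, not from this thread) is to poll each shard less
often and fetch more records per call through the consumer properties; the
stream name and values below are assumptions:

import java.util.Properties
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer
import org.apache.flink.streaming.connectors.kinesis.config.{AWSConfigConstants, ConsumerConfigConstants}

val props = new Properties()
props.setProperty(AWSConfigConstants.AWS_REGION, "us-east-1")
// Poll each shard every 2s and fetch up to 10000 records per call, so that
// several consumers stay under the shared 5-reads-per-second shard limit.
props.setProperty(ConsumerConfigConstants.SHARD_GETRECORDS_INTERVAL_MILLIS, "2000")
props.setProperty(ConsumerConfigConstants.SHARD_GETRECORDS_MAX, "10000")

val env = StreamExecutionEnvironment.getExecutionEnvironment
env
  .addSource(new FlinkKinesisConsumer[String]("event-bus", new SimpleStringSchema(), props))
  .print()

env.execute("kinesis-consumer")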
As the answer on SO suggests, Prometheus comes with lots of functionality to do
what you’re requesting using just a simple count metric:
https://prometheus.io/docs/prometheus/latest/querying/functions/
If you want to implement the function on your own inside Flink, you can make
your own metrics
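For example, a hedged sketch of a user-defined counter registered through
Flink's metric API (the operator and metric names are made up):

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.metrics.Counter

// Counts every element passing through; with the Prometheus reporter enabled,
// rate()/increase() can then be applied to this counter on the Prometheus side.
class CountingMap[T] extends RichMapFunction[T, T] {
  @transient private var eventCounter: Counter = _

  override def open(parameters: Configuration): Unit = {
    eventCounter = getRuntimeContext.getMetricGroup.counter("numEventsSeen")
  }

  override def map(value: T): T = {
    eventCounter.inc()
    value
  }
}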
Please help me with this:
https://stackoverflow.com/questions/62297467/flink-count-of-events-using-metric
I have a topic in Kafka where I am getting multiple types of events in JSON
format. I have created a file stream sink to write these events to S3 with
bucketing.
Now I want to publish an hou
YARN will assign a random port when Flink is deployed. To get the port you
need to run `yarn application -list` and look at the tracking URL assigned to
your Flink cluster. The port in that URL will be the port you need to use
for the REST API.
On Tue, Jun 16, 2020 at 08:49 aj wrote:
> Ok, thanks for
Okay, it is not supported.
I thought about this more and I disagree that this would break
"distributability".
Currently, the API accepts a String which is a path, whether it be a path to a
remote URL or a local file.
However, after the URL is parsed, ultimately what ends up happening is that
Hi,
As far as I know, Flink 1.10 does not officially support a SQL connector for Elasticsearch 5.x.
Best,
Jark
On Tue, 16 Jun 2020 at 16:08, Dian Fu wrote:
> Could you post the full exception?
>
> On Jun 16, 2020, at 3:45 PM, jack wrote:
>
> I have changed the connector version to 5 locally; the following error occurred:
>
> >> st_env.connect(
> >> Elasticsearch()
> >> .version("5")
> >>
Hi guys,
In our use case, we are considering writing data to AWS S3 in Parquet format using
Blink batch mode.
As far as I can see, a valid approach to writing Parquet files is to use
StreamingFileSink with a Parquet bulk-encoded format, but
based on the documentation and tests it works only with OnCheckpointRollingPolicy
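A minimal sketch of that approach (the S3 path is a placeholder, and the
Metric type is invented and written via Avro reflection):

import org.apache.flink.core.fs.Path
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
import org.apache.flink.streaming.api.scala._

case class Metric(name: String, value: Double)

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Bulk formats require checkpointing: part files are finalized on checkpoints.
env.enableCheckpointing(60000)

val sink: StreamingFileSink[Metric] = StreamingFileSink
  .forBulkFormat(
    new Path("s3://my-bucket/output"),
    ParquetAvroWriters.forReflectRecord(classOf[Metric]))
  .build() // bulk mode always rolls with OnCheckpointRollingPolicy

env.fromElements(Metric("a", 1.0)).addSink(sink)
env.execute("parquet-sink")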
Hi @aljoscha
The watermark metrics look fine. (attached screenshot)
This is the extractor:
class TimestampExtractor[A, B <: AbstractEvent] extends
BoundedOutOfOrdernessTimestampExtractor[(A, B)](Time.minutes(5)) {
override def extractTimestamp(element: (A, B)): Long =
Ins
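The snippet is cut off above; a hedged reconstruction of a complete extractor
(the timestamp accessor on AbstractEvent is an assumption):

import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.windowing.time.Time

class TimestampExtractor[A, B <: AbstractEvent]
    extends BoundedOutOfOrdernessTimestampExtractor[(A, B)](Time.minutes(5)) {
  // Must return milliseconds since the epoch.
  override def extractTimestamp(element: (A, B)): Long =
    element._2.timestamp
}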
Hello,
I'm a cs student currently working on my Bachelor's thesis. I've used Flink
to extract features out of some datasets, and I would like to use them
together with another dataset of (1,0) (node exists or doesn't) to perform
a logistic regression. I have found that FLIP-39 has been accepted a
Ok, thanks for the clarification on yarn session.
I am trying to connect to the job manager on 8081, but it's not connecting.
So this is the address shown on my Flink job UI, and I am trying to connect
to the REST address on 8081, but it's refusing the connection.
On Tue, Jun 9, 2020 at 1:03
Did you look at the watermark metrics? Do you know what the current
watermark is when the windows are firing? You could also get the current
watermark when using a ProcessWindowFunction and also emit it in the
records that you're printing, for debugging.
What is that TimestampAssigner you're
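For example, a sketch of a ProcessWindowFunction that emits the current
watermark next to each window result (types and names are hypothetical):

import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

// Emits (key, window end, current watermark) so firing behaviour can be
// inspected in the printed records.
class DebugWindowFn
    extends ProcessWindowFunction[Double, (String, Long, Long), String, TimeWindow] {
  override def process(
      key: String,
      context: Context,
      elements: Iterable[Double],
      out: Collector[(String, Long, Long)]): Unit = {
    out.collect((key, context.window.getEnd, context.currentWatermark))
  }
}

Applied after .window(...) via .process(new DebugWindowFn), the emitted
watermark shows whether event time is actually advancing past the window end.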
Sorry, I now saw that this thread diverged. My mail client didn't pick
it up because someone messed up the subject of the thread.
On 16.06.20 14:06, Aljoscha Krettek wrote:
Hi,
what is the timescale of your data in Kafka? If you have data in there
that spans more than ~30 minutes I would expe
Okay, so I created a simple stream (similar to the original stream), where
I just write the timestamps of each evaluated window to S3.
The session gap is 30 minutes, and this is one of the sessions:
(first-event, last-event, num-events)
11:23-11:23 11 events
11:25-11:26 51 events
11:28-11:29 74 events
Hi,
what is the timescale of your data in Kafka? If you have data in there
that spans more than ~30 minutes I would expect your windows to fire
very soon after the job is started. Event time does not depend on a wall
clock but instead advances with the time in the stream. As Flink
advances th
Hi Marco,
this is not possible since Flink is designed mostly to read files from a
distributed filesystem, where paths are used to refer to those files. If
you read from files on the classpath you could just use plain old Java
code and won't need a distributed processing system such as Flink.
Hi,
it might be that the operations that Flink performs on RocksDB during
checkpointing will "poke" RocksDB somehow and make it clean up its
internal hierarchies of storage more. Other than that, I'm also a bit
surprised by this.
Maybe Yun Tang will come up with another idea.
Best,
Aljoscha
Hi Nick
From my experience, it's not easy to tune this without code to reproduce the
issue. Could you please provide code with a fake source that reproduces it,
so that we can help you?
If the CPU usage is 100% in RocksDB-related methods, it might be because we
access RocksDB too often. If the CPU usage is not 100%
Hi, thanks for answering.
> I guess you consume from Kafka from the earliest offset, so you consume
historical data and Flink is catching-up.
Yes, that's what's happening. But Kafka is partitioned on sessionId, so skew
between partitions cannot explain it.
I think the only way it can happen is when
Hi,
We used both Flink versions 1.9.1 and 1.10.1.
We used the default RocksDB configuration.
The streaming pipeline is very simple.
1. Kafka consumer
2. Process function
3. Kafka producer
The code of the process function is listed below:
private transient MapState testMapState;
@Override
public
Hi,
We are using Flink version 1.10.1.
The task manager memory is 16 GB.
The number of slots is 32, but the job parallelism is 1.
We used the default configuration for RocksDB.
We checked the disk speed on the machine running the task manager: write
300 MB/s and read 1 GB/s.
BR,
Nick
On Tue, Jun 16,
Hi Nick
As you might know, RocksDB suffers from not-so-good performance for iterator-like
operations, because it needs to do a merge sort across multiple levels. [1]
Unfortunately, rocksDBMapState.isEmpty() needs to call iterator and seek
operations over RocksDB [2], and rocksDBMapState.clear() needs to iterate
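One common workaround (a sketch with invented names, to be used on a keyed
stream) is to track the map size in a separate ValueState, so that emptiness
checks become a point lookup instead of a RocksDB iterator/seek:

import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.state.{MapState, MapStateDescriptor, ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.Collector

class TrackedMapFn extends RichFlatMapFunction[(String, String), String] {
  @transient private var entries: MapState[String, String] = _
  @transient private var size: ValueState[Long] = _

  override def open(parameters: Configuration): Unit = {
    entries = getRuntimeContext.getMapState(
      new MapStateDescriptor[String, String]("entries", classOf[String], classOf[String]))
    size = getRuntimeContext.getState(
      new ValueStateDescriptor[Long]("entryCount", classOf[Long]))
  }

  override def flatMap(in: (String, String), out: Collector[String]): Unit = {
    val count = size.value() // an unset state unboxes to 0L in Scala
    if (!entries.contains(in._1)) size.update(count + 1)
    entries.put(in._1, in._2)
    if (count == 0L) out.collect(s"first entry for this key: ${in._1}")
  }
}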
Hi Ori,
I guess you consume from Kafka from the earliest offset, so you consume
historical data and Flink is catching-up.
Regarding: *My event-time timestamps also do not have big gaps*
Just to verify, if you do keyBy sessionId, do you check the gaps of
events from the same session?
Rafi
On T
Hi Nick
It's really strange that performance could improve when checkpointing is enabled.
In general, enabling checkpointing might bring a slight performance downside to the
whole job.
Could you give more details, e.g. the Flink version, the RocksDB configuration, and
simple code which could reproduce this problem
Hi Padarn,
We configure our Flink KafkaConsumer with setCommitOffsetsOnCheckpoints(true).
In this case, the offsets are committed on each checkpoint for the consumer
group of the application. We have external monitoring on our Kafka consumer
groups (just a small script) which writes kafka in
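A minimal sketch of that setup (topic, group id, and broker address are
placeholders):

import java.util.Properties
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

val props = new Properties()
props.setProperty("bootstrap.servers", "localhost:9092")
props.setProperty("group.id", "my-app") // the group the monitoring script watches

val consumer = new FlinkKafkaConsumer[String]("events", new SimpleStringSchema(), props)
// Commit offsets back to Kafka whenever a checkpoint completes, so external
// consumer-group monitoring reflects the job's checkpointed progress.
consumer.setCommitOffsetsOnCheckpoints(true)

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.enableCheckpointing(60000) // required for the commits to happen
env.addSource(consumer).print()
env.execute("kafka-offsets-on-checkpoints")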
Could you post the full exception?
> On Jun 16, 2020, at 3:45 PM, jack wrote:
>
> I have changed the connector version to 5 locally; the following error occurred:
> >> st_env.connect(
> >> Elasticsearch()
> >> .version("5")
> >> .host("localhost", 9200, "http")
> >> .index("taxiid-cnts")
> >> .document_type('taxiidcnt')
Hi Lasse,
your reported issue [1] will be fixed in the next release of 1.10 and the
upcoming 1.11.
Thank you for your detailed report.
[1] https://issues.apache.org/jira/browse/FLINK-17322
On Wed, Apr 22, 2020 at 12:54 PM Lasse Nedergaard <
lassenedergaardfl...@gmail.com> wrote:
> Hi Yun
>
> Th
I have changed the connector version to 5 locally; the following error occurred:
>> st_env.connect(
>> Elasticsearch()
>> .version("5")
>> .host("localhost", 9200, "http")
>> .index("taxiid-cnts")
>> .document_type('taxiidcnt')
>> .key_delimiter("$")) \
On 2020-06-1
Hello,
We are using RocksDB as the state backend.
At first we didn't enable the checkpointing mechanism.
We observed the following behaviour and we are wondering why.
When using RocksDB *without* checkpoints, the performance was
extremely bad.
And when we enabled checkpoints, the performance
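For reference, a minimal way to enable checkpointing with the RocksDB backend
(the checkpoint URI and interval are placeholders):

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Incremental checkpoints (second argument) every 60s.
env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true))
env.enableCheckpointing(60000)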
I guess it's because the ES version specified in the job is `6`, while the
jar used is `5`.
> On Jun 16, 2020, at 1:47 PM, jack wrote:
>
> I am using a PyFlink example connecting to ES. The ES version on my side is 5.4.1
> and PyFlink is 1.10.1. The connector jar I am using is
> flink-sql-connector-elasticsearch5_2.11-1.10.1.jar; I also downloaded the Kafka and
> JSON connector jars, and the Kafka connection test succeeded.
> When connecting to ES
Hello,
We wrote a very simple streaming pipeline containing:
1. Kafka consumer
2. Process function
3. Kafka producer
The code of the process function is listed below:
private transient MapState testMapState;
@Override
public void processElement(Map value, Context ctx,
Collector> out) throws