I'd like to thank, I'm learning Flink with the new book "Stream
> Processing with Apache Flink". :) Thanks for your amazing efforts on
> publishing nice book!
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
>
> On Mon, Aug 5, 2019 at 10:21 PM Fabian Hueske wr
ler. The latest timestamp
> will be handled first ?
>
>
>
>
>
> BTW I tried to use a ContinuousEventTimeTrigger to make sure the window
> is calculated ? and got the processing to trigger multiple times so I’m
> not sure exactly how this type of trigger works..
>
>
>
> Tha
Hi Vinod,
This sounds like a watermark issue to me.
The commonly used watermark strategies (like bounded out-of-order) are only
advancing when there is a new record.
Moreover, the current watermark is the minimum of the current watermarks of
all input partitions.
So, the watermark only moves forwa
Hi,
Can you share a few more details about the data source?
Are you continuously ingesting files from a folder?
You are correct, that the parallelism should not affect the results, but
there are a few things that can affect that:
1) non-determnistic keys
2) out-of-order data with inappropriate wa
gt; My key fields is array of multiple type, in this case is string and long.
> The result that i'm posting is just represents sampling of output dataset.
>
> Thank you in advance !
>
> Anissa
>
> Le jeu. 22 août 2019 à 11:24, Fabian Hueske a écrit :
>
>> H
Hi Anissa,
This looks strange. If I understand your code correctly, your GroupReduce
function is summing up a field.
Looking at the results that you posted, it seems as if there is some data
missing (the total sum does not seem to match).
For groupReduce it is important that the grouping keys are
Hi Sung,
There is no switch to configure the WM to be the max of both streams and it
would also in fact violate the core principles of the mechanism.
Watermarks are used to track the progress of event time in streams.
The implementations of operators rely on the fact that (almost) all records
tha
Hi Dongwon,
I'm not super familiar with Flink's MATCH_RECOGNIZE support, but Dawid (in
CC) might have some ideas about it.
Best,
Fabian
Am Mi., 21. Aug. 2019 um 07:23 Uhr schrieb Dongwon Kim <
eastcirc...@gmail.com>:
> Hi,
>
> Flink relational apis with MATCH_RECOGNITION looks very attractive a
Hi Manvi,
A NoSuchMethodError typically indicates a version mismatch.
I would check if the Flink versions of your program, the client, and the
cluster are the same.
Best, Fabian
Am Di., 20. Aug. 2019 um 21:09 Uhr schrieb manvmali :
> Hi, I am facing the issue of writing the data stream result i
Hi Anissa,
Are you using combineGroup or reduceGroup?
Your question refers to combineGroup, but the code only shows reduceGroup.
combineGroup is non-deterministic by design to enable efficient partial
results without network and disk IO.
reduceGroup is deterministic given a deterministic key extr
Great!
Thanks for the feedback.
Cheers, Fabian
Am Mo., 19. Aug. 2019 um 17:12 Uhr schrieb Ahmad Hassan <
ahmad.has...@gmail.com>:
>
> Thank you Fabian. This works really well.
>
> Best Regards,
>
> On Fri, 16 Aug 2019 at 09:22, Fabian Hueske wrote:
>
>> Hi
dows.
>
> Thanks.
>
> Best,
>
> On 2 Aug 2019, at 12:49, Fabian Hueske wrote:
>
> Ok, I won't go into the implementation detail.
>
> The idea is to track all products that were observed in the last five
> minutes (i.e., unique product ids) in a five minute tumb
Hi Tony,
I'm sorry I cannot help you with this issue, but Becket (in CC) might have
an idea what went wrong here.
Best, Fabian
Am Mi., 14. Aug. 2019 um 07:00 Uhr schrieb Tony Wei :
> Hi,
>
> Currently, I was trying to update our kafka cluster with larger `
> transaction.max.timeout.ms`. The
> o
Hi Theo,
The main problem is that the semantics of your join (Join all events that
happened on the same day) are not well-supported by Flink yet.
In terms of true streaming joins, Flink supports the time-windowed join
(with the BETWEEN predicate) and the time-versioned table join (which does
not
Hi Padarn,
What you describe is essentially publishing Flink's watermarks to an
outside system.
Flink processes time windows, by waiting for a watermark that's past the
window end time. When it receives such a WM it processes and emits all
ended windows and forwards the watermark.
When a sink rece
sult of that queries taking into account only the last
> values of each row. The result is inserted/updated in a in-memory K-V
> database for fast access.
>
>
>
> Thanks in advance!
>
>
>
> Best
>
>
>
> *De: *Fabian Hueske
> *Fecha: *miércoles, 7 de agost
ess a
> checkpoint I can change the join strategy.
>
> and if you do, do you have any toy example of this?
>
> Thanks,
> Felipe
> *--*
> *-- Felipe Gutierrez*
>
> *-- skype: felipe.o.gutierrez*
> *--* *https://felipeogutierrez.blogspot.com
> <https://felipe
Hi,
Just to clarify. You cannot dynamically switch the join strategy while a
job is running.
What Hequn suggested was to have a util method Util.joinDynamically(ds1,
ds2) that chooses the join strategy when the program is generated (before
it is submitted for execution).
The problem is that distr
Congrats Andrey!
Am Do., 15. Aug. 2019 um 07:58 Uhr schrieb Gary Yao :
> Congratulations Andrey, well deserved!
>
> Best,
> Gary
>
> On Thu, Aug 15, 2019 at 7:50 AM Bowen Li wrote:
>
> > Congratulations Andrey!
> >
> > On Wed, Aug 14, 2019 at 10:18 PM Rong Rong wrote:
> >
> >> Congratulations A
Thanks for reporting this issue.
It is already discussed on Flink's dev mailing list in this thread:
->
https://lists.apache.org/thread.html/10f0f3aefd51444d1198c65f44ffdf2d78ca3359423dbc1c168c9731@%3Cdev.flink.apache.org%3E
Please continue the discussion there.
Thanks, Fabian
Am Di., 13. Aug.
x this issue could be
> pretty simple .
>
> Thanks
> Jacky Du
>
> Fabian Hueske 于2019年8月2日周五 下午12:07写道:
>
>> Thanks for the bug report Jacky!
>>
>> Would you mind opening a Jira issue, preferably with a code snippet that
>> reproduces the bug?
>
Hi Vincent,
I don't think there is such a flag in Flink.
However, this sounds like a really good idea.
Would you mind creating a Jira ticket for this?
Thank you,
Fabian
Am Di., 6. Aug. 2019 um 17:53 Uhr schrieb Vincent Cai <
caidezhi...@foxmail.com>:
> Hi Users,
> In Spark, we can invoke Data
Congratulations Hequn!
Am Mi., 7. Aug. 2019 um 14:50 Uhr schrieb Robert Metzger <
rmetz...@apache.org>:
> Congratulations!
>
> On Wed, Aug 7, 2019 at 1:09 PM highfei2...@126.com
> wrote:
>
> > Congrats Hequn!
> >
> > Best,
> > Jeff Yang
> >
> >
> > Original Message
> > Subject:
>> how much state the query will need to maintain.
>>
>>
>> I am not sure to understand the problem. If i have to append-only table
>> and perform some join on it, what's the issue ?
>>
>>
>> On Tue, Aug 6, 2019 at 8:03 PM Maatary Okouya
&
Hi Jungtaek,
I would recommend to implement the logic in a ProcessFunction and avoid
Flink's windowing API.
IMO, the windowing API is difficult to use, because there are many pieces
like WindowAssigner, Window, Trigger, Evictor, WindowFunction that are
orchestrated by Flink.
This makes it very har
Thanks for the bug report Jacky!
Would you mind opening a Jira issue, preferably with a code snippet that
reproduces the bug?
Thank you,
Fabian
Am Fr., 2. Aug. 2019 um 16:01 Uhr schrieb Jacky Du :
> Hi, All
>
> Just find that Flink Table API have some issue if define nested object in
> an objec
>
>
> However, with your proposed solution, how would we be able to achieve this
> sliding window mechanism of emitting 24 hour window every 5 minute using
> processfunction ?
>
>
> Best,
>
>
> On Fri, 2 Aug 2019 at 09:48, Fabian Hueske wrote:
>
>> Hi Ahmad,
>&g
Hi Joern,
Thanks for sharing your connectors!
The Flink community is currently working on a website that collects and
lists externally maintained connectors and libraries for Flink.
We are still figuring out some details, but hope that it can go live soon.
Would be great to have your repositories
Hi,
Regarding step 3, it is sufficient to check that you got on message from
each parallel task of the previous operator. That's because a task
processes the timers of all keys before moving forward.
Timers are always processed per key, but you could deduplicate on the
parallel task id and check t
Hi,
Flink does not distinguish between streams and tables. For the Table API /
SQL, there are only tables that are changing over time, i.e., dynamic
tables.
A Stream in the Kafka Streams or KSQL sense, is in Flink a Table with
append-only changes, i.e., records are only inserted and never deleted
. 2019 um 20:00 Uhr schrieb Ahmad Hassan <
ahmad.has...@gmail.com>:
>
> Hi Fabian,
>
> > On 4 Jul 2018, at 11:39, Fabian Hueske wrote:
> >
> > - Pre-aggregate records in a 5 minute Tumbling window. However,
> pre-aggregation does not work for FoldFunctions.
&g
Hi Boxiu,
This sounds like a good feature.
Please have a look at our contribution guidelines [1].
To propose a feature, you should open a Jira issue [2] and start a
discussion there.
Please note that the feature freeze for the Flink 1.9 release happened a
few weeks ago.
The community is currentl
Hi Oytun,
Is QS enabled in your Docker image or did you enable QS by copying/moving
flink-queryable-state-runtime_2.11-1.8.0.jar from ./opt to ./lib [1]?
Best, Fabian
[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/queryable_state.html#activating-queryable-state
Am M
Hi Oytun,
Thanks for your input and feature request!
The right way to propose a feature and contribute it is described here [1].
Basically, you should open a Jira issue and start a discussion about the
feature there.
If it is a bigger features, you should also bring it to the dev@f.a.o
mailing li
If you need an outer join, the only solution is to convert the table into a
retraction stream and correctly handle the retraction messages.
Btw. even then this might not perform as you would like it to be.
The query will store all input tables completely in state. So you might run
out of space soon
Hi everyone,
I'm happy to announce the program of the Flink Forward EU 2019 conference.
The conference takes place in the Berlin Congress Center (bcc) from October
7th to 9th.
On the first day, we'll have four training sessions [1]:
* Apache Flink Developer Training
* Apache Flink Operations Trai
Sure:
/--> AsyncIO --\
STREAM --> ProcessFunc -- -- Union -- WindowFunc
\--/
ProcessFunc keeps track of the unique keys per window duration and emits
each
Hi Pedro,
each pattern gets translated into one or more Flink operators. Hence, your
Flink program becomes *very* large and requires much more time to be
deployed.
Hence, the timeout.
I'd try to limit the size your job by grouping your patterns and creating
an own job for each group.
You can also
AsyncDataStream should provide a first-class support to keyed streams
> (and thus perform a single call per key and window..). What do you think?
>
> On Tue, Jul 23, 2019 at 12:56 PM Fabian Hueske wrote:
>
>> Hi Flavio,
>>
>> Not sure I understood the requirements c
our production clustered crippled like this.
>
> Richard
>
> On Tue, Jul 23, 2019 at 12:47 PM Fabian Hueske wrote:
>
>> Hi Richard,
>>
>> I hope you could resolve the problem in the meantime.
>>
>> Nonetheless, maybe Till (in CC) has an idea what could h
Hi,
Right now it is not possible to mix batch and streaming environments in a
job.
You would need to implement the batch logic via the streaming API which is
not always straightforward.
However, the Flink community is spending a lot of effort on unifying batch
and stream processing. So this will
Hi Dongwon,
regarding the question about the conversion: If you keep using the Row type
and not adding/removing fields, the conversion is pretty much for free
right now.
It will be a MapFunction (sometimes even not function at all) that should
be chained with the other operators. Hence, it should
Hi Tim,
One thing that might be interesting is that Flink might emit results more
than once when a job recovers from a failure.
It is up to the receiver to deal with that.
Depending on the type of results this might be easy (idempotent updates) or
impossible.
Best, Fabian
Am Fr., 19. Juli 2019
Hi Flavio,
Not sure I understood the requirements correctly.
Couldn't you just collect and bundle all records with a regular window
operator and forward one record for each key-window to an AsyncIO operator?
Best, Fabian
Am Do., 18. Juli 2019 um 12:20 Uhr schrieb Flavio Pompermaier <
pomperma...
Hi Richard,
I hope you could resolve the problem in the meantime.
Nonetheless, maybe Till (in CC) has an idea what could have gone wrong.
Best, Fabian
Am Mi., 17. Juli 2019 um 19:50 Uhr schrieb Richard Deurwaarder <
rich...@xeli.eu>:
> Hello,
>
> I've got a problem with our flink cluster where
Hi John,
You could implement your own n-ary Either type.
It's a bit of work because you'd need also a custom TypeInfo & Serializer
but rather straightforward if you follow the implementation of Either.
Best,
Fabian
Am Mi., 17. Juli 2019 um 16:28 Uhr schrieb John Tipper <
john_tip...@hotmail.com>
Hi Peter,
The performance drops probably be due to de/serialization.
When tasks are chained, records are simply forwarded as Java objects via
method calls.
When a task chain in broken into multiple operators, the records (Java
objects) are serialized by the sending task, possibly shipped over the
Hi Yitzchak,
Thanks for reaching out.
I'm not an expert on the Kafka consumer, but I think the number of
partitions and the number of source tasks might be interesting to know.
Maybe Gordon (in CC) has an idea of what's going wrong here.
Best, Fabian
Am Di., 23. Juli 2019 um 08:50 Uhr schrieb Y
Hi Juan,
Which Flink version do you use?
Best, Fabian
Am Di., 23. Juli 2019 um 06:49 Uhr schrieb Juan Rodríguez Hortalá <
juan.rodriguez.hort...@gmail.com>:
> Hi,
>
> I'm trying to use AbstractTestBase in a test in order to use the mini
> cluster. I'm using specs2 with Scala, so I cannot extend
Hi Fanbin,
The delay is most likely caused by the watermark delay.
A window is computed when the watermark passes the end of the window. If
you configured the watermark to be 10 minutes before the current max
timestamp (probably to account for out of order data), then the window will
be computed w
ics, if a
> processor or window trigger registers with a ProcessingTime and EventTime
> timers - they will all fire when the appropriate watermarks arrive.
>
> Thanks again.
>
> On Thursday, July 11, 2019, 05:41:54 AM EDT, Fabian Hueske <
> fhue...@gmail.com> wrote:
>
>
Hi everyone,
I'm very happy to announce that Rong Rong accepted the offer of the Flink
PMC to become a committer of the Flink project.
Rong has been contributing to Flink for many years, mainly working on SQL
and Yarn security features. He's also frequently helping out on the
user@f.a.o mailing l
Hi Mans,
IngestionTime is uses the same internal mechanisms as EventTime (record
timestamps and watermarks).
The difference is that instead of extracting a timestamp from the record
(using a custom timestamp extractor & wm assigner), Flink will assign
timestamps based on the machine clock of the
Hi,
I'd suggest to implement your own custom deserialization schema for example
by extending JSONKeyValueDeserializationSchema.
Then you can implement whatever logic you need to handle incorrectly
formatted messages.
Best, Fabian
Am Mi., 10. Juli 2019 um 04:29 Uhr schrieb Zhechao Ma <
mazhechaom
Hi John,
let's say Flink performed a checkpoint after the 2nd record (by injecting a
checkpoint marker into the data flow) and the sink fails on the 5th record.
When Flink restarts the application, it resets the offset after the 2nd
record (it will read the 3rd record first). Hence, the 3rd and 4t
Hi,
AFAIK Flink should remove temporary files automatically when they are not
needed anymore.
However, I'm not 100% sure that there are not corner cases when a TM
crashes.
In general it is a good idea to properly configure the directories that
Flink uses for spilling, logging, blob storage, etc.
Hi Vishwas,
Sorry for the late response.
Are you still facing the issue?
I have no experience with EMC ECS, but the exception suggests an issue with
the host name:
1378 Caused by: java.net.UnknownHostException:
aip-featuretoolkit.SU73ECSG1P1d.***.COM
1379 at java.net.InetAddress.getAl
Hi,
Kafka offsets are only managed by the Flink Kafka Consumer. All following
operators do not care whether the events were read from Kafka, files,
Kinesis or whichever source.
It is the responsibility of the source to include its reading position (in
case of Kafka the partition offsets) in a chec
Hi,
What kind of function do you use to implement the operator that has the
blocking call?
Did you have a look at the AsyncIO operator? It was designed for exactly
such use cases.
It issues multiple asynchronous requests to an external service and waits
for the response.
Best, Fabian
Am Mo., 24.
d up using xStream as a 'base' while
> I needed to use yStream as a 'base'. I.e. yStream.element - 60 min <=
> xStream.element <= yStream.element + 30 min. Interchanging both datastreams
> fixed this issue.
>
> Thanks anyways.
>
> Cheers, Wouter
>
Hi Wouter,
Not sure what is going wrong there, but something that you could try is to
use a custom watemark assigner and always return a watermark of 0.
When the source finished serving the watermarks, it emits a final
Long.MAX_VALUE watermark.
Hence the join should consume all events and store th
Hi Ben,
Flink's Kafka consumers track their progress independent of any worker.
They keep track of the reading offset for themselves (committing progress
to Kafka is optional and only necessary to have progress monitoring in
Kafka's metrics).
As soon as a consumer reads and forwards an event, it i
Hi Visha,
If I remember correctly, the behavior of the Kafka consumer was changed in
Flink 1.8 to account for such situations.
Please check the release notes [1] and the corresponding Jira issue [2].
If this is not the behavior you need, please feel free to create a new Jira
issue and start a dis
Hi Syed,
The build fails because Maven could not download the required dependency
com.mapr.hadoop:maprfs:jar:5.2.1-mapr.
The dependency is hosted on MapR's Maven repository. Maybe the service was
not available for some time.
I checked it right now and it seems to be working.
I'd suggest to try it
Hi,
Yes, multiple instances of the same De/SerializationSchema can be executed
in the same JVM.
Regarding 2. I'm not 100%, but would suspect that one
De/SerializationSchema instance handles multiple partitions.
Gordon (in CC) should know this for sure.
Best,
Fabian
Am Mo., 10. Juni 2019 um 05:25
Hi Debasish,
No, I don't think there's a particular reason.
There a few Jira issues that propose adding an Avro Serialization Schema
for Confluent Schema Registry [1] [2].
Please check them out and add a new one if they don't describe what you are
looking for.
Cheers,
Fabian
[1] https://issues.a
t;> )
>>
>> GROUP BY user_id, event_month, event_year
>>
>>
>>
>> We are also using idle state retention time to clean up unused state, but
>> that is much longer (a week or month depending on the usecase). We will
>> switch to count(DISTINCT) as soo
Hi,
There are two ways:
1. make the non-serializable member variable transient (meaning that it
won't be serialized) and check in the aggregate call if it has been
initialized or not.
2. implement your own serialization logic by overriding readObject() and
writeObject() [1].
Best, Fabian
[1]
ht
Hi,
I found a few issues in Jira that are related to not deleted checkpoint
directories, but only FLINK-10855 [1] seems to be a possible reason in your
case.
Is it possible that the checkpoints of the remaining directories failed?
If that's not the case, would you mind creating a Jira issue and d
Hi,
The networking libraries that Flink uses (Netty & Akka) support seem to
support IPv6.
So, it might work.
However, I'm not aware of anybody running Flink on IPv6.
Maybe somebody with more info could help out here?
Best, Fabian
Am Do., 6. Juni 2019 um 16:25 Uhr schrieb Siew Wai Yow :
> Hi gu
Hi Sergey,
I would not consider this to be a topology change (the sink operator would
still be a Kafka producer).
It seems that dynamic topic selection is possible with a
KeyedSerializationSchema (Check "Advanced Serialization Schema: [1]).
Best,
Fabian
[1]
https://ci.apache.org/projects/flink/f
Hi Ben,
Flink correctly maintains the offsets of all partitions that are read by a
Kafka consumer.
A checkpoint is only complete when all functions successful checkpoint
their state. For a Kafka consumer, this state is the current reading offset.
In case of a failure the offsets and the state of a
Hi,
There are a few things to point out about your example:
1. The the CoFlatMapFunction is probably executed in parallel. The
configuration is only applied to one of the parallel function instances.
You probably want to broadcast the configuration changes to all function
instances. Have a look a
Hi Yu,
When you register a DataStream as a Table, you can create a new attribute
that contains the event timestamp of the DataStream records.
For that, you would need to assign timestamps and generate watermarks
before registering the stream:
FlinkKafkaConsumer kafkaConsumer =
new FlinkKa
Hi Vinod,
IIRC, support for DISTINCT aggregates was added in Flink 1.6.0 (released
August, 9th 2018) [1].
Also note that by default, this query will accumulate more and more state,
i.e., for each grouping key it will hold all unique event_ids.
You could configure an idle state retention time to cl
Hi folks,
Next Tuesday (June, 4th), Vasia and I will be speaking at a meetup in
Munich about Flink and how we wrote our book "Stream Processing with Apache
Flink".
We will also raffle a few copies of the book.
Please RSVP if you'd like to attend:
-> https://www.meetup.com/inovex-munich/events/26
I see, that's unfortunate.
Both classes are also tagged with @Public, making them unchangeable until
Flink 2.0.
Nonetheless, feel free to open a Jira issue to improve the situation for a
future release.
Best, Fabian
Am Mo., 27. Mai 2019 um 16:55 Uhr schrieb spoganshev :
> I've tried that, but t
Configuring the split assigner wasn't a common requirement so far.
You can just implement your own format extending from FileInputFormat (or
any of its subclasses) and override the getInputSplitAssigner() method.
Best, Fabian
Am Mo., 27. Mai 2019 um 15:30 Uhr schrieb spoganshev :
> Why is FileIn
Hi Wayne,
Long story short, this is not possible with Flink yet.
I posted a more detailed answer to your question on SO.
Best, Fabian
Am Di., 21. Mai 2019 um 19:24 Uhr schrieb Wayne Heaney <
wayne.hea...@gmail.com>:
> I'm trying to build a Dynamic table that will be updated when records
> haven
Hi Abhishek,
Your observation is correct. Right now, the Flink ML module is in a
half-baked state and is only supported in batch mode.
It is not integrated with the DataStream API. FLIP-23 proposes a feature
that allows to evaluated an externally trained model (stored as PMML) on a
stream of data.
Hi,
I'm afraid I don't see another solution than touching the Flink code for
this and adding a try catch block around the timestamp conversion.
It would be great if you could create a Jira issue reporting this problem.
IMO, we should have a configuration switch (either per Table or query) to
eith
Thanks for sharing your solution Wouter!
Best, Fabian
Am Mi., 15. Mai 2019 um 15:28 Uhr schrieb Wouter Zorgdrager <
w.d.zorgdra...@tudelft.nl>:
> Hi all,
>
> To answer my own questions I worked on the following solution:
>
> 1) Custom Docker image which pulls the Flink image and moves Prometheus
Hi Shannon,
That's a good observation. To be honest, I know why the Scala AsyncFunction
does not implement RichFunction.
Maybe this was not intentional and just overlooked when porting the
functionality to Scala.
Would you mind creating a Jira ticket for this?
Thank you,
Fabian
Am Di., 14. Mai
cParameter for some reason. Also, how would you create the serializer
>>>> for the type info? can i reuse some builtin Kryo functionality?
>>>>
>>>> Thanks
>>>>
>>>>
>>>> [1]
>>>> https://ci.apache.org/projects/flink/flink-d
marks, it creates watermarks from it's received data. Since it doesn't
> receive any data, it doesn't create any watermarks. D couldn't make
> progress because one of its inputs, C2, doesn't make progress. Is this
> understand correct?
>
> Yes, I think that
g checkpointing location.
>
>
> Boris Lublinsky
> FDP Architect
> boris.lublin...@lightbend.com
> https://www.lightbend.com/
>
> On May 10, 2019, at 2:47 AM, Fabian Hueske wrote:
>
> Hi Boris,
>
> Is your question is in the context of replacing Zookeeper by a different
&
method?
>
> Thanks a lot for your help.
>
> Regards,
> Averell
>
>
> On Fri, May 10, 2019 at 8:52 PM Fabian Hueske wrote:
>
>> Hi Averell,
>>
>> Ah, sorry. I had assumed the toggle events where broadcasted anyway.
>> Since you had both streams keyed, your c
context.applyToKeyedState(toggleStateDescriptor, (k: Key, s:
> ValueState[Boolean]) =>
> if (s != null) context.output(outputTag, (k, s.value(
>}
> }
>
> Thanks for your help.
> Regards,
> Averell
>
> On Thu, May 9, 2019 at 7:31 PM Fabian Hueske wrote:
&g
any point in time. Also Flink does does not give any guarantees
about how keys (or rather key groups) are assigned to tasks. If you rescale
the application to a parallelism of 3, the active key group might be
scheduled to C.2 or C.3.
Long story short, D makes progress in event time because watermarks
Hi Frank,
By default, Flink does not remove any state. It is the responsibility of
the developer to ensure that an application does not leak state.
Typically, you would use timers [1] to discard state that expired and is
not useful anymore.
In the last release 1.8, we added lazy cleanup strategie
reatment of operator state documented anywhere?
>
> On 2019/05/09 07:39:34, Fabian Hueske wrote:
> > Hi,
> >
> > Yes, IMO it is more clear.
> > However, you should be aware that operator state is maintained on heap
> only
> > (not in RocksDB).
> >
&g
Hi Boris,
Is your question is in the context of replacing Zookeeper by a different
service for highly-available setups or are you setting up a regular Flink
cluster?
Best, Fabian
Am Mi., 8. Mai 2019 um 06:20 Uhr schrieb Congxian Qiu <
qcx978132...@gmail.com>:
> Hi, Boris
>
> TM will also need
guishing receiving (different) watermarks and
emitting (the same) watermarks.
Best, Fabian
> On 2019/05/03 07:32:07, Fabian Hueske wrote:
> > Hi,
> >
> > this should be covered here:
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/event_time
Hi,
Passing a Context through a DataStream definitely does not work.
You'd need to have the keyed state that you want to scan over in the
KeyedBroadcastProcessFunction.
For the toggle filter use case, you would need to have a unioned stream
with Toggle and StateReport events.
For the output, you
Hi,
The KeyedBroadcastProcessFunction has a method to iterate over all keys of
a keyed state.
This function is available via the Context object of the processBroadcast()
method.
Hence you need a broadcasted message to trigger the operation.
Best, Fabian
Am Do., 9. Mai 2019 um 08:46 Uhr schrieb C
Hi,
you can use the value construction function ROW to create a nested row (or
object).
However, you have to explicitly reference all attributes that you will add.
If you have a table Cars with (year, modelName) a query could look like
this:
SELECT
ROW(year, modelName) AS car,
enrich(year, m
Hi,
Yes, IMO it is more clear.
However, you should be aware that operator state is maintained on heap only
(not in RocksDB).
Best, Fabian
Am Mi., 8. Mai 2019 um 20:44 Uhr schrieb an0 :
> I switched to using operator list state. It is more clear. It is also
> supported by RocksDBKeyedStateBacke
Hi,
I created FLINK-12460 to update the documentation.
Cheers, Fabian
Am Mi., 8. Mai 2019 um 17:48 Uhr schrieb Flavio Pompermaier <
pomperma...@okkam.it>:
> Great, thanks Till!
>
> On Wed, May 8, 2019 at 4:20 PM Till Rohrmann wrote:
>
>> Hi Flavio,
>>
>> taskmanager.tmp.dirs is the deprecated
The window operator cannot configured to use the max timestamp of the
events in the window as the timestamp of the output record.
The reason is that such a behavior can produce late records.
If you want to do that, you have to track the max timestamp and assign it
yourself with a timestamp assigne
@Override
> public DataSet getDataSet(ExecutionEnvironment execEnv) {
> return execEnv.fromCollection(resourceIterator, modelClass);
> }
>
> @Override
> public TypeInformation getReturnType() {
> return TypeInformation.of(modelClass);
> }
>
> @Overri
101 - 200 of 1728 matches
Mail list logo