Hi Averell,
Yes, timestamps and watermarks do not (completely) move together.
The watermark should always be lower than the timestamps of the currently
processed records.
Otherwise, the records might be processed as late records (depending on the
logic).
The easiest way to check the timestamp of
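For illustration, a minimal sketch of such a check (hypothetical class, not from the original mail): a ProcessFunction can compare each record's timestamp to the current watermark.

import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// Sketch: log records whose timestamp is not ahead of the current watermark,
// i.e., records that event-time operators would treat as late.
public class TimestampCheck<T> extends ProcessFunction<T, T> {
    @Override
    public void processElement(T value, Context ctx, Collector<T> out) {
        long watermark = ctx.timerService().currentWatermark();
        Long timestamp = ctx.timestamp(); // null if no timestamp is assigned
        if (timestamp != null && timestamp <= watermark) {
            System.out.println("late record: ts=" + timestamp + ", wm=" + watermark);
        }
        out.collect(value);
    }
}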
> Is this important rule documented anywhere in the official document?
>
> On 2019/04/30 08:47:29, Fabian Hueske wrote:
> > An operator task broadcasts its current watermark to all downstream tasks
> > that might receive its records.
> > If you have the following code:
> >
> Hi Wouter,
>
> I've met the same issue and finally managed to use operator states to back
> the accumulators, so they can be restored after restarts.
> The downside is that we have to update the values in both accumulators and
> states to make them consistent. FYI.
>
>
e40762e98f8d010/src/main/scala/org/codefeedr/experimental/stats/CommitsStatsProcess.scala#L34
>
>
>
On Thu, May 2, 2019 at 09:36, Fabian Hueske wrote:
>
>> Hi Wouter,
>>
>> The DataStream API accumulators of the AggregateFunction [1] are stored
>> in state and
Hi Averell,
The watermark of a stream is always the low watermark of all its input
streams. If one of the input streams does not have watermarks, Flink does
not compute a watermark for the merged stream.
If you do not need time-based operations on streams 3 and 4, setting the
watermark to MAX_WATE
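A hedged sketch of that idea (assuming the old AssignerWithPeriodicWatermarks interface): the inputs that need no time-based operations always report the maximum watermark, so they never hold back the merged stream.

import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

// Sketch: streams 3 and 4 would use this assigner so that the low watermark
// of the merged stream is determined by streams 1 and 2 alone.
public class MaxWatermarkAssigner<T> implements AssignerWithPeriodicWatermarks<T> {

    @Override
    public Watermark getCurrentWatermark() {
        return new Watermark(Long.MAX_VALUE);
    }

    @Override
    public long extractTimestamp(T element, long previousElementTimestamp) {
        return previousElementTimestamp; // keep whatever timestamp the record carries
    }
}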
Hi Josh,
Does your TableSource also implement ProjectableTableSource?
If yes, you need to make sure that the filter information is also forwarded
if ProjectableTableSource.projectFields() is called after
FilterableTableSource.applyPredicate().
Also make sure to correctly implement
FilterableTableS
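To illustrate the interplay, a sketch with hypothetical names and a hard-coded schema (a real source would also implement Stream- or BatchTableSource): the key point is that projectFields() copies the previously pushed-down predicates into the new instance.

import java.util.List;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.expressions.Expression;
import org.apache.flink.table.sources.FilterableTableSource;
import org.apache.flink.table.sources.ProjectableTableSource;
import org.apache.flink.table.sources.TableSource;
import org.apache.flink.types.Row;

public class MyFilterableSource
        implements TableSource<Row>, FilterableTableSource<Row>, ProjectableTableSource<Row> {

    private final List<Expression> predicates; // pushed-down filters, may be null
    private final int[] fields;                // projected field indexes, may be null

    public MyFilterableSource(List<Expression> predicates, int[] fields) {
        this.predicates = predicates;
        this.fields = fields;
    }

    @Override
    public TableSource<Row> applyPredicate(List<Expression> predicates) {
        return new MyFilterableSource(predicates, fields);
    }

    @Override
    public boolean isFilterPushedDown() {
        return predicates != null;
    }

    @Override
    public TableSource<Row> projectFields(int[] fields) {
        // crucial detail: forward the previously applied predicates as well
        return new MyFilterableSource(predicates, fields);
    }

    @Override
    public TypeInformation<Row> getReturnType() {
        return Types.ROW(Types.STRING, Types.LONG);
    }

    @Override
    public TableSchema getTableSchema() {
        return new TableSchema(
                new String[] {"name", "cnt"},
                new TypeInformation<?>[] {Types.STRING, Types.LONG});
    }

    @Override
    public String explainSource() {
        return "MyFilterableSource";
    }
}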
Hi,
The SQL client can be started with
> ./bin/sql-client.sh embedded
Best, Fabian
On Tue, Apr 30, 2019 at 20:13, Rad Rad wrote:
> Thanks, Fabian.
>
> The problem was incorrect java path. Now, everything works fine.
>
> I would ask about the command for running sql-client.sh
>
> These
Hi Wouter,
The DataStream API accumulators of the AggregateFunction [1] are stored in
state and should be recovered in case of a failure as well.
If this does not work, it would be a serious bug.
What's the type of your accumulator?
Can you maybe share the code?
How do you apply the AggregateFunc
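For reference, a minimal sketch of an AggregateFunction with a POJO accumulator (illustrative only, not the code from this thread):

import org.apache.flink.api.common.functions.AggregateFunction;

// Sketch: the accumulator is a POJO, so Flink can snapshot and restore it
// as part of the window state.
public class AvgAggregate implements AggregateFunction<Double, AvgAggregate.AvgAcc, Double> {

    public static class AvgAcc {
        public long count;
        public double sum;
    }

    @Override
    public AvgAcc createAccumulator() {
        return new AvgAcc();
    }

    @Override
    public AvgAcc add(Double value, AvgAcc acc) {
        acc.count++;
        acc.sum += value;
        return acc;
    }

    @Override
    public Double getResult(AvgAcc acc) {
        return acc.count == 0 ? 0.0 : acc.sum / acc.count;
    }

    @Override
    public AvgAcc merge(AvgAcc a, AvgAcc b) {
        a.count += b.count;
        a.sum += b.sum;
        return a;
    }
}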
Hi,
With Flink 1.5.0, we introduced a new distributed architecture (see release
announcement [1] and FLIP-6 [2]).
From what you describe, I cannot tell what is going wrong.
How do you submit your application?
Which action resulted in the error message you shared?
Btw. why do you go for Flink 1.
Hi,
I had a look but couldn't find an ORC writer in flink-orc, only an
InputFormat and TableSource to read ORC data into DataSet programs or Table
/ SQL queries.
Where did you find the ORC writer?
Thanks, Fabian
On Tue, Apr 30, 2019 at 09:09, Hai wrote:
> Hi,
>
>
> I found flink now supp
Hi,
Actually, all operators should preserve record timestamps if you set the
correct TimeCharacteristic to event time.
A window operator will set the timestamp of all emitted records to the
end-timestamp of the window.
Not sure what happens if you use a processing time window in an event time
applicati
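The setup this refers to, as a tiny sketch (Flink 1.x API):

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EventTimeSetup {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // with event time, operators preserve record timestamps; a window
        // operator stamps emitted records with the window's end timestamp
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
    }
}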
Hi,
Stateful streaming applications are typically designed to run continuously
(i.e., until forever or until they are not needed anymore or replaced).
Many jobs run for weeks or months.
IMO, using CEP for "simple" equality matches would add too much complexity
for a use case that can be easily sol
Hi Sean,
I was looking for the max-parallelism value in the UI, but couldn't find
it. Also the REST API does not seem to provide it.
Would you mind opening a Jira issue for adding it to the REST API and the
Web UI?
Thank you,
Fabian
On Tue, Apr 30, 2019 at 06:36, Sean Bollin wrote:
> Tha
An operator task broadcasts its current watermark to all downstream tasks
that might receive its records.
If you have the following code:
DataStream a = ...
a.map(A).map(B).keyBy().window(C)
and execute this with parallelism 2, your plan looks like this
A.1 -- B.1 --\--/-- C.1
You could implement aggregation functions that just do AVG, COUNT, etc. and
a parameterizable aggregation function that can be configured to call the
avg, count, etc. functions.
When configuring, you would specify the input and output, for example like
this:
input: [int, int, double]
key: input.1
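A hedged sketch of such a parameterizable function (hypothetical configuration, not a Flink API): the field index and the aggregate kind come from the configuration.

import org.apache.flink.api.common.functions.AggregateFunction;

public class ConfigurableAgg implements AggregateFunction<double[], double[], Double> {

    public enum Kind { SUM, COUNT, AVG }

    private final int field; // which input field to aggregate, e.g. input.1
    private final Kind kind; // which aggregate to compute

    public ConfigurableAgg(int field, Kind kind) {
        this.field = field;
        this.kind = kind;
    }

    @Override
    public double[] createAccumulator() {
        return new double[] {0.0, 0.0}; // {sum, count}
    }

    @Override
    public double[] add(double[] input, double[] acc) {
        acc[0] += input[field];
        acc[1] += 1;
        return acc;
    }

    @Override
    public Double getResult(double[] acc) {
        switch (kind) {
            case SUM:   return acc[0];
            case COUNT: return acc[1];
            default:    return acc[1] == 0 ? 0.0 : acc[0] / acc[1]; // AVG
        }
    }

    @Override
    public double[] merge(double[] a, double[] b) {
        a[0] += b[0];
        a[1] += b[1];
        return a;
    }
}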
sing whether a Dataset could benefit from a
> rebalance or not could be VERY nice (at least for batch) but I fear this
> would be very hard to implement... am I wrong?
>
> On Mon, Apr 29, 2019 at 3:10 PM Fabian Hueske wrote:
>
>> Hi Flavio,
>>
>> These types of race c
Hi Sergey,
You are right, keys are managed in key groups. Each key belongs to a key
group and one or more key groups are assigned to each parallel task of an
operator.
Key groups are not exposed to users and the assignments of keys ->
key-groups and key-groups -> tasks cannot be changed without ch
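For the curious, the assignment can be inspected with Flink's internal utility (internal API, shown only for illustration):

import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

public class KeyGroupDemo {
    public static void main(String[] args) {
        int maxParallelism = 128;
        int parallelism = 4;
        String key = "user-42";
        // key -> key group (hash-based)
        int keyGroup = KeyGroupRangeAssignment.assignToKeyGroup(key, maxParallelism);
        // key -> parallel task that processes its key group
        int task = KeyGroupRangeAssignment.assignKeyToParallelOperator(key, maxParallelism, parallelism);
        System.out.println("key group: " + keyGroup + ", task: " + task);
    }
}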
> will be read by a slot (according to the job parallelism) for applying the
>>> map logic.
>>>
>>> In reading from HDFS I read this
>>> <https://stackoverflow.com/a/39153402/8110607> answer by Fabian Hueske
>>> <https://stackoverflow.com/users/3609571/fabian-hueske> and i want to
>>> know: is that still the Flink strategy for reading from a distributed
>>> file system?
>>>
>>> thanks
>>>
>>
>
>
Nice!
Thanks for the confirmation :-)
On Mon, Apr 29, 2019 at 13:21, Avi Levi wrote:
> Thanks! Works like a charm :)
>
> On Mon, Apr 29, 2019 at 12:11 PM Fabian Hueske wrote:
>
from local file system I guess every line of the file will
> be read by a slot (according to the job parallelism) for applying the map
> logic.
>
> In reading from HDFS I read this
> <https://stackoverflow.com/a/39153402/8110607> answer by Fabian Hueske
> <https://stackover
Hi Mans,
I don't know if that would work or not. Would need to dig into the source
code for that.
TBH, I would recommend checking if you can implement the logic using a
(Keyed-)ProcessFunction.
IMO, process functions are a lot easier to reason about than Flink's
windowing framework.
You can manag
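As a flavor of what that looks like, a minimal sketch (hypothetical logic): a per-key count whose state is cleaned up by a timer one hour after the key's last event.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class CountWithCleanup extends KeyedProcessFunction<String, String, Long> {

    private transient ValueState<Long> count;
    private transient ValueState<Long> cleanupTimer;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Long.class));
        cleanupTimer = getRuntimeContext().getState(new ValueStateDescriptor<>("cleanup", Long.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<Long> out) throws Exception {
        Long current = count.value();
        count.update(current == null ? 1L : current + 1);
        out.collect(count.value());
        // move the cleanup timer one hour past the latest event
        // (assumes event-time timestamps are assigned)
        Long previous = cleanupTimer.value();
        if (previous != null) {
            ctx.timerService().deleteEventTimeTimer(previous);
        }
        long cleanupAt = ctx.timestamp() + 3_600_000L;
        ctx.timerService().registerEventTimeTimer(cleanupAt);
        cleanupTimer.update(cleanupAt);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Long> out) throws Exception {
        count.clear();
        cleanupTimer.clear();
    }
}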
Hi Sergei,
It depends whether you want to process the file with the DataSet (batch) or
DataStream (stream) API.
Averell's answer was addressing the DataStream API part.
The DataSet API does not have any built-in support to distinguish files (or
file splits) by folders and process them in order.
F
Hi Avi,
I'm not sure that you cannot emit data from the keyed state when you receive
a broadcast message.
The Context parameter of the processBroadcastElement() method in the
KeyedBroadcastProcessFunction has the applyToKeyedState() method.
The method takes a KeyedStateFunction that is applied to
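A hedged sketch of that pattern (hypothetical state and types):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.runtime.state.KeyedStateFunction;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class EmitOnBroadcast extends KeyedBroadcastProcessFunction<String, String, String, String> {

    private final ValueStateDescriptor<String> latestDesc =
            new ValueStateDescriptor<>("latest", String.class);

    @Override
    public void processElement(String value, ReadOnlyContext ctx, Collector<String> out) throws Exception {
        getRuntimeContext().getState(latestDesc).update(value); // keyed state
    }

    @Override
    public void processBroadcastElement(String msg, Context ctx, Collector<String> out) throws Exception {
        // applyToKeyedState() visits the state of every key, so keyed state
        // can be emitted when a broadcast message arrives
        ctx.applyToKeyedState(latestDesc, new KeyedStateFunction<String, ValueState<String>>() {
            @Override
            public void process(String key, ValueState<String> state) throws Exception {
                out.collect(key + ": " + state.value());
            }
        });
    }
}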
Hi,
I don't think that (the current state of) Flink SQL is a good fit for your
requirements.
Each query will be executed as an independent job. So there won't be any
sharing of intermediate results.
You can do some of this manually if you use the Table API, but even then it
won't allow for early r
Hi Juan,
count() and collect() trigger the execution of a job.
Since Flink does not cache intermediate results (yet), all operations from
the sink (count()/collect()) to the sources are executed.
So in a sense a DataSet is immutable (given that the input of the sources
does not change) but completel
Hi Lifei,
This sounds to me like you need an OVER window aggregation.
OVER is a standard SQL clause to compute aggregates for each row over a
group of surrounding rows (defined by ordering and partitioning).
Check out the documentation [1].
The example only shows ROW based windows, but Flink also
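An illustrative RANGE-based example (hypothetical table and columns): a per-row average over the preceding hour.

SELECT
  user_id,
  AVG(amount) OVER (
    PARTITION BY user_id
    ORDER BY rowtime
    RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
  ) AS hourly_avg
FROM payments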
se/FLINK-12198
>
>
>
> Best,
>
> Konstantinos
>
>
>
> *From:* Papadopoulos, Konstantinos
>
> *Sent:* Monday, April 15, 2019 12:30 PM
> *To:* Fabian Hueske
> *Cc:* Rong Rong ; user
> *Subject:* RE: Flink JDBC: Disable auto-commit mode
>
>
>
> Hi F
That's great!
Thank you.
Let me know if you have any questions.
Fabian
On Mon, Apr 15, 2019 at 11:32, Hai wrote:
> Hi Fabian:
>
>
> OK ,I am glad to do that.
>
>
> Regards
>
> Original Message
> *Sender:* Fabian Hueske
> *Recipient:* hai
> *Cc:
Hi,
The Jira issue is still unassigned.
Would you be up to work on a fix?
Best, Fabian
On Fri, Apr 12, 2019 at 05:07, hai wrote:
> Hi, Tang:
>
>
> Thanks for your reply, will this issue be fixed soon? I don't think putting the
> flink-hadoop-compatibility
> jar under FLINK_HOME/lib is an elegant soluti
Hi Konstantinos,
This sounds like a useful extension to me.
Would you like to create a Jira issue and contribute the improvement?
In the meantime, you can just fork the code of JDBCInputFormat and adjust
it to your needs.
Best, Fabian
On Mon, Apr 15, 2019 at 08:53, Papadopoulos, Kon
Hi Reminia,
What Hequn said is correct.
However, I would *not* use a regular join but model the problem as a
time-versioned table join.
A regular join will materialize both inputs, which is probably not what you
want to do for a stream.
For a time-versioned table join, only the time-versioned table wou
s-release-1.8/dev/stream/state/broadcast_state.html)
> and still not sure how that will help ?
>
> Thanks for your help.
>
> Mans
>
>
>
>
>
> On Thursday, April 11, 2019, 3:53:59 AM EDT, Fabian Hueske <
> fhue...@gmail.com> wrote:
>
>
> Hi,
>
> you would s
is Long.MAX_VALUE, and that is my concern. So, is
> there any other way of clean up the now purged global windows ?
>
> Thanks again.
>
>
>
> On Thursday, April 11, 2019, 4:16:24 AM EDT, Fabian Hueske <
> fhue...@gmail.com> wrote:
>
>
> Hi,
>
> As far as I
Hi Morven,
You posted the same question a few days ago and it was also answered
correctly.
Please do not repost the same question again.
You can reply to the earlier thread if you have a follow-up question.
To answer your question briefly:
No, Flink does not trigger a MapReduce job.
The whole job
Hi all,
Flink Forward Europe returns to Berlin on October 7-9th, 2019.
We are happy to announce that the Call for Presentations is open!
Please submit a proposal if you'd like to present your Apache Flink
experience, best practices, new features, or use cases in front of an
international audience
Hi Esa,
Flink's implementation of SQL MATCH_RECOGNIZE is based on its CEP library,
i.e., they share the same implementation.
Best, Fabian
On Thu, Apr 11, 2019 at 10:29, Esa Heikkinen (TAU) <
esa.heikki...@tuni.fi>:
> Hi
>
>
>
> Is SQL CEP based (old) FlinkCEP at all and are SQL CEP
Hi,
As far as I know, a window is only completely removed when time (event or
processing time, depending on the window type) passes the window's end
timestamp.
Since GlobalWindow's end timestamp is Long.MAX_VALUE, it is never
completely removed.
I'm not 100% sure what state is kept around. It mig
Hi,
you would simply pass multiple MapStateDescriptors to the broadcast method:
MapStateDescriptor bcState1 = ...
MapStateDescriptor bcState2 = ...
DataStream stream = ...
BroadcastStream bcStream = stream.broadcast(bcState1, bcState2);
Best,
Fabian
On Wed, Apr 10, 2019 at 19:44,
Hi Min,
I think the pool size is per parallel sink task, i.e., it should be
independent of the parallelism of the sink operator.
From my understanding a pool size of 5 should be fine if the maximum number
of concurrent checkpoints is 1.
Running out of connections would mean that there are 5 in-fl
Hi Felipe,
three comments:
1) applying rebalance(), shuffle(), or rescale() before a keyBy() has no
effect:
keyBy() introduces a hash partitioning such that any data partitioning that
you do immediately before keyBy() is destroyed.
You only change the distribution for the call of the key extracto
Hi,
Flink's Hadoop compatibility functions just wrap functions that were
implemented against Hadoop's interfaces in wrapper functions that are
implemented against Flink's interfaces.
There is no Hadoop cluster started or MapReduce job being executed.
Job is just a class of the Hadoop API. It does
Hi,
Packaging the flink-hadoop-compatibility dependency with your code into a
"fat" job jar should work as well.
Best,
Fabian
On Wed, Apr 10, 2019 at 15:08, Morven Huang <
morven.hu...@gmail.com>:
> Hi,
>
>
>
> I’m using Flink 1.5.6 and Hadoop 2.7.1.
>
>
>
> *My requirement is to re
Congrats to everyone!
Thanks Aljoscha and all contributors.
Cheers, Fabian
On Wed, Apr 10, 2019 at 11:54, Congxian Qiu <
qcx978132...@gmail.com>:
> Cool!
>
> Thanks Aljoscha a lot for being our release manager, and all the others
> who make this release possible.
>
> Best, Congxian
s again, and congrats on an awesome conference, I had learned a lot
> Shahar
>
> From: Fabian Hueske
> Sent: Monday, April 8, 02:54
> Subject: Re: Schema Evolution on Dynamic Schema
> To: Shahar Cizer Kobrinsky
> Cc: Rong Rong, user
>
>
> Hi Shahar,
>
> Sorry for
Hi Boris,
ZooKeeper is also used by the JobManager to store metadata about the
running job.
The JM writes information like the JobGraph, JAR file, checkpoint metadata
to a persistent storage (like HDFS, S3, ...) and a pointer to this
information to ZooKeeper.
In case of a recovery, the new JM look
Hi Henry,
It seems that the optimizer is not handling this case well.
The search space might be too large (or rather the optimizer explores too
much of the search space).
Can you share the query? Did you add any optimization rules?
Best, Fabian
On Wed, Apr 3, 2019 at 12:36, 徐涛 wrote:
> Hi
Hi Patrick,
In general, you could also implement the 2PC logic in a regular operator.
It does not have to be a sink.
You would need to add the logic of TwoPhaseCommitSinkFunction to your
operator.
However, the TwoPhaseCommitSinkFunction does not work well with JDBC. The
problem is that you would n
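For orientation, the 2PC hooks look roughly like this (skeletal sketch, hypothetical transaction type); it is this logic that would have to be ported into a custom operator:

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeutils.base.VoidSerializer;
import org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer;
import org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction;

public class TwoPhaseSink extends TwoPhaseCommitSinkFunction<String, TwoPhaseSink.Txn, Void> {

    public static class Txn { /* handle to an open transaction */ }

    public TwoPhaseSink() {
        super(new KryoSerializer<>(Txn.class, new ExecutionConfig()), VoidSerializer.INSTANCE);
    }

    @Override
    protected Txn beginTransaction() {
        return new Txn(); // open a new transaction
    }

    @Override
    protected void invoke(Txn txn, String value, Context context) {
        // write the value into the open transaction
    }

    @Override
    protected void preCommit(Txn txn) {
        // flush; called when the checkpoint barrier arrives
    }

    @Override
    protected void commit(Txn txn) {
        // commit; called once the checkpoint completed
    }

    @Override
    protected void abort(Txn txn) {
        // roll the transaction back
    }
}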
the map elements
> when doing *SELECT map['a', sum(a), 'b', sum(b).. ] FROM.. group by .. *?
>
> On Wed, Mar 20, 2019 at 2:00 AM Fabian Hueske wrote:
>
>> Hi,
>>
>> I think this would work.
>> However, you should be aware that all keys
o success.
>
> I got the same NotSerializableException.
>
>
>
> Best,
>
> Konstantinos
>
>
>
> *From:* Fabian Hueske
> *Sent:* Saturday, April 6, 2019 2:26 AM
> *To:* Papadopoulos, Konstantinos
>
> *Cc:* Chesnay Schepler ; user
> *Subject:* Re: InvalidProgramExcep
Hi Min,
Guowei is right, the comment in the documentation about exactly-once in
embarrassingly parallel data flows refers to exactly-once *state
consistency*, not *end-to-end* exactly-once.
However, in strictly forwarding pipelines, enabling exactly-once
checkpoints should not have drawbacks compa
Hi Davood,
Flink uses hash partitioning to assign keys to key groups. Each key group
is then assigned to a task for processing (a task might process multiple
key groups).
There is no way to directly assign a key to a particular key group or task.
All you can do is to experiment with different cust
Hi,
Your POJO should implement the Serializable interface.
Otherwise, it's not considered serializable.
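A tiny sketch of what that means (illustrative fields, not the actual ThresholdAcvFact definition):

import java.io.Serializable;

public class ThresholdAcvFact implements Serializable {
    private static final long serialVersionUID = 1L;

    public long productId; // hypothetical fields
    public double acv;

    public ThresholdAcvFact() {} // POJOs also need a public no-arg constructor
}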
Best, Fabian
Papadopoulos, Konstantinos
wrote on Wed, Apr 3, 2019, 07:22:
> Hi Chesnay,
>
>
>
> Thanks for your support. ThresholdAcvFact class is a simple POJO with the
> following d
Hi,
Konstantin is right.
reinterpretAsKeyedStream only works if you call it on a DataStream that
was keyBy'ed before (with the same parallelism).
Flink cannot reuse the partitioning of another system like Kafka.
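A minimal sketch of the supported pattern (illustrative only):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamUtils;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ReinterpretDemo {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // the stream was partitioned by Flink's own keyBy() and the
        // parallelism did not change in between
        DataStream<String> alreadyKeyed = env.fromElements("a", "b")
                .keyBy(v -> v)
                .map(v -> v);
        KeyedStream<String, String> keyed =
                DataStreamUtils.reinterpretAsKeyedStream(alreadyKeyed, v -> v);
    }
}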
Best, Fabian
Adrienne Kole wrote on Thu, Apr 4, 2019, 14:33:
> Thanks a lot for
Hi Piyush,
Custom triggers (or early firing) are currently not supported by SQL or the
Table API.
It is also not on the roadmap [1].
Currently, most efforts on the relational API are focused on restructuring
the code and working towards the integration of the Blink contribution [2].
AFAIK, there a
Hi everyone,
*Flink Forward San Francisco 2019 will take place in a few days on April
1st and 2nd.*
If you haven't done so already and are planning to attend, you should
register soon at:
-> https://sf-2019.flink-forward.org/register
Don't forget to use the 25% discount code *MailingList* for ma
Hi Dongwon,
Couldn't you just return a tuple from the aggregation function and extract
the fields from the nested tuple using a value access function [1]?
Table table2 = table1
.window(Slide.over("3.rows").every("1.rows").on("time").as("w"))
.groupBy("w, name")
.select("name, my
p doesn't work for me.
>> Trying something like
>> Select a, map(sum(b) as metric_sum_c ,sum(c) as metric_sum_c) as
>> metric_map
>> group by a
>>
>> results with "Non-query expression encountered in illegal context"
>> is my train of thought the rig
mber in
> Flink may be compelling in batch processing.
>
> Could you help explain a bit more on what work needs to be done, so
> Flink can support custom partition numbers? We would be willing to
> help improve this area.
>
> Thanks,
> Qi
>
> On Mar 15, 2019,
Hi,
Restarting a changed query from a savepoint is currently not supported.
In general this is a very difficult problem as new queries might result in
completely different execution plans.
The special case of adding and removing aggregates is easier to solve, but
the schema of the stored state cha
Hi,
Flink works a bit differently than Spark.
By default, Flink uses pipelined shuffles which push results of the sender
immediately to the receivers (btw. this is one of the building blocks for
stream processing).
However, pipelined shuffles require that all receivers are online. Hence,
there num
Hi,
This is not possible with Flink. Events in transport channels cannot be
reordered and a function cannot pick which input to read from.
There are some upcoming changes for the unified batch-stream integration
that enable choosing which input to read from, but this is not there yet,
AFAIK.
Best,
Thanks Flavio!
On Tue, Mar 5, 2019 at 11:23, Flavio Pompermaier <
pomperma...@okkam.it>:
> I discovered that now (in Flink 1.7.2) the queryable state server is enabled
> if the queryable state client is found on the classpath, i.e.:
>
>
> org.apache.flink
> flink-queryable-state-client-java_$
Hi Wouter,
We are using Docker Compose (Flink JM, Flink TM, Kafka, Zookeeper) setups
for our trainings and it is working very well.
We have an additional container that feeds a Kafka topic via the
commandline producer to simulate a somewhat realistic behavior.
Of course, you can do it without Kafk
Hi,
The answer is in fact no.
Flink hash-partitions keys into Key Groups [1] which are uniformly assigned
to tasks, i.e., a task can process more than one key group.
AFAIK, there are no plans to change this behavior.
Stefan (in CC) might be able to give more details on this.
Something that might
Hi,
Watermarks of streams are independent as long as the streams are not
connected with each other.
When you union, join, or connect two streams in any other way, their
watermarks are fused, which means that they are synced to the "slower"
stream, i.e., the stream with the earlier watermarks.
Bes
Hi Artur,
In order to subscribe to Flink's user mailing list you need to send a mail
to user-subscr...@flink.apache.org
Best, Fabian
On Mon, Feb 18, 2019 at 20:34, Artur Mrozowski wrote:
> art...@gmail.com
>
Hi Flavio,
I'm not aware of any particular plan to add sampling operators to the Table
API or SQL.
However, I agree. It would be a good feature.
Best, Fabian
On Mon, Feb 18, 2019 at 15:44, Flavio Pompermaier <
pomperma...@okkam.it>:
> Hi to all,
> is there any plan to support differ
Hi Paul,
Which components (Flink, JDK, Docker base image, ...) are you upgrading and
which versions do you come from?
I think it would be good to check how (and with which options) the JVM in
the container is started.
Best, Fabian
On Fri, Feb 15, 2019 at 09:50, Paul Lam wrote:
> Hi all,
Hi Eric,
I did a quick search in our Jira to check if this is a known issue but
didn't find anything.
Maybe Gordon (in CC) knows a bit more about this problem.
Best, Fabian
On Fri, Feb 15, 2019 at 11:08, Eric Troies wrote:
> Hi, I'm having the exact same issue with flink 1.4.0 using scal
Thanks for pointing this out!
This is indeed a bug in the documentation.
I'll fix that.
Thank you,
Fabian
On Wed, Feb 13, 2019 at 02:04, yinhua.dai <
yinhua.2...@outlook.com>:
> OK, thanks.
> It might be better to update the document which has the following example
> that confused m
On Mon, Feb 11, 2019 at 11:29, Stephen Connolly <
stephen.alan.conno...@gmail.com>:
>
>
> On Mon, 11 Feb 2019 at 09:42, Fabian Hueske wrote:
>
>> Hi Stephen,
>>
>> A window is created with the first record that is assigned to it.
>> If the wind
Hi François,
I had a look at the code and the GenericTypeInfo checks equality by
comparing the classes they represent (Class == Class).
Class does not override the default implementation of equals, so this is an
instance equality check. The check can evaluate to false if Map was loaded
by two diff
Thank you Pablo!
On Fri, Feb 15, 2019 at 20:42, Pablo Estrada wrote:
> Hello everyone,
> There is an upcoming meetup happening in the Google Seattle office, on
> February 21st, starting at 5:30pm:
> https://www.meetup.com/seattle-apache-flink/events/258723322/
>
> People will be chatting a
roduce a DataSet
> from a single geojson file.
> This doesn't sound compatible with a custom InputFormat, don't you?
>
> Thanks in advance for any addition hint, all the best
>
> François
>
> On Mon, Feb 4, 2019 at 12:10, Fabian Hueske wrote:
>
>> Hi
Hi,
I would not use a window for that.
Implementing the logic with a ProcessFunction seems more straightforward.
The function simply collects all events between 00:00 and 01:00 in a
ListState and emits them when the time passes 01:00.
All other records are simply forwarded.
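A hedged sketch of that pattern (hypothetical hour arithmetic, assuming event-time timestamps):

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class HoldFirstHour extends KeyedProcessFunction<String, String, String> {

    private static final long DAY = 86_400_000L;
    private static final long HOUR = 3_600_000L;

    private transient ListState<String> buffer;

    @Override
    public void open(Configuration parameters) {
        buffer = getRuntimeContext().getListState(new ListStateDescriptor<>("buffer", String.class));
    }

    @Override
    public void processElement(String event, Context ctx, Collector<String> out) throws Exception {
        long ts = ctx.timestamp();
        if (ts % DAY < HOUR) { // between 00:00 and 01:00 (UTC)
            buffer.add(event);
            // fire once the watermark passes 01:00 of that day
            ctx.timerService().registerEventTimeTimer(ts - (ts % DAY) + HOUR);
        } else {
            out.collect(event); // all other records are simply forwarded
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
        for (String e : buffer.get()) {
            out.collect(e);
        }
        buffer.clear();
    }
}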
Best, Fabian
On Fri.,
Hi,
I like the idea of putting the roadmap on the website because it is much
more visible (and IMO more credible, obligatory) there.
However, I share the concerns about frequent updates.
I think it would be great to update the "official" roadmap on the website
once per release (-bugfix releases)
On Tue, Feb 12, 2019 at 16:26, Fabian Hueske wrote:
> Hi everyone,
>
> We announced the program of Flink Forward San Francisco 2019.
> The conference takes place at the Hotel Nikko in San Francisco on April
> 1st and 2nd.
>
> On the first day we offer three
Hi everyone,
We announced the program of Flink Forward San Francisco 2019.
The conference takes place at the Hotel Nikko in San Francisco on April 1st
and 2nd.
On the first day we offer three training sessions [1]:
* Introduction to Streaming with Apache Flink
* Analyzing Streaming Data with Flin
Hi everyone,
On behalf of the Flink PMC I am happy to announce Thomas Weise as a new
member of the Apache Flink PMC.
Thomas is a long time contributor and member of our community.
He is starting and participating in lots of discussions on our mailing
lists, working on topics that are of joint int
Hi,
It's as the error message says.
LIMIT 10 without ORDER BY would pick 10 random rows and hence lead to
non-deterministic results.
That's why it is not supported yet.
Best, Fabian
On Tue, Feb 12, 2019 at 07:02, yinhua.dai <
yinhua.2...@outlook.com>:
> Why flink said "Limiting the
Hi Vishal,
Kostas (in CC) should be able to help here.
Best, Fabian
On Mon, Feb 11, 2019 at 00:05, Vishal Santoshi <
vishal.santo...@gmail.com>:
> Any one ?
>
> On Sun, Feb 10, 2019 at 2:07 PM Vishal Santoshi
> wrote:
>
>> You don't have to. Thank you for the input.
>>
>> On Sun, F
Hi Stephen,
First of all, yes, windows computing and emitting at the same time can
cause pressure on the downstream system.
There are a few ways you can achieve this:
* use a custom window assigner. A window assigner decides into which window
a record is assigned. This is the approach you sug
Hi Stephen,
A window is created with the first record that is assigned to it.
If the windows are based on time and a key, then no window will be created
(and no space occupied) if there is no first record for a key and
time interval.
Anyway, if tracking the number of open files & average o
Hi William,
Does the cache need to be fault tolerant?
If not, you could use a regular in-memory map as cache (+ some LRU cleaning).
Or do you expect the cache to grow too large for memory?
Best, Fabian
On Mon, Feb 4, 2019 at 18:00, William Saar wrote:
> Hi,
> I am trying to implement
rtStreamKeyed.window(TumblingProcessingTimeWindowsetParallelism(1).name("Aggregate
> events");
>
> Thanks
> David
>
> On 2019/02/04 13:54:14, Fabian Hueske wrote:
> > Hi,
> >
> > A WindowAll is executed in a single task. If you sort the data bef
Key)? My understanding is that this will further partition the
> already partitioned input stream (from 1 above) and will not help me, as I
> need to process all LargeMessages for a given MyKey in order.
>
>
>
> Is there an implicit assumption here that the flatMap operation (2) a
nos.papadopou...@iriworldwide.com>:
> Hi Fabian,
>
>
>
> Do you know if there is any plan for the Flink core framework to support such
> functionality?
>
>
>
> Best,
>
> Konstantinos
>
>
>
> *From:* Fabian Hueske
> *Sent:* Monday, February 4, 2019 3:49 PM
> *To:*
Hi,
Calling keyBy twice will not work, because the second call overrides the
first.
You can keyBy on a composite key (MyKey, LargeMessageId).
You can do the following
InputStream
.keyBy((MyKey, LargeMessageId)) // use composite key
.flatMap(new MyReassemblyFunction())
.keyBy(MyKey)
.?
Hi Alexey,
I think you are right. It does not seem to be possible to provide a
TypeInformation for side outputs to a TestHarness.
This sounds like a useful addition.
Would you mind creating a Jira issue for that?
Thank you,
Fabian
On Sun, Feb 3, 2019 at 19:13, Alexey Trenikhun wrote:
>
Hi,
A WindowAll is executed in a single task. If you sort the data before the
window, the sorting must also happen in a single task, i.e., with
parallelism 1.
The reason is that an operator somewhat randomly merges multiple input
partitions. So even if each input partition is sorted, the merging
Hi Konstantinos,
Writing headers to files is currently not supported by the underlying
TextOutputFormat.
You can implement a custom OutputFormat by extending TextOutputFormat to
add this functionality.
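A hedged sketch of that extension (hypothetical class name):

import java.io.IOException;
import org.apache.flink.api.java.io.TextOutputFormat;
import org.apache.flink.core.fs.Path;

public class HeaderTextOutputFormat extends TextOutputFormat<String> {

    private final String header;

    public HeaderTextOutputFormat(Path outputPath, String header) {
        super(outputPath);
        this.header = header;
    }

    @Override
    public void open(int taskNumber, int numTasks) throws IOException {
        super.open(taskNumber, numTasks);
        writeRecord(header); // write the header before any records
    }
}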
Best, Fabian
On Fri, Feb 1, 2019 at 16:04, Papadopoulos, Konstantinos <
konstantin
> before to be processed or will all be streamed?
>
> All the best
> François
>
> On Tue, Jan 29, 2019 at 22:20, Fabian Hueske wrote:
>
>> Hi,
>>
>> You can point a file-based input format to a directory and the input
>> format should read all fil
Hi,
You can point a file-based input format to a directory and the input format
should read all files in that directory.
That works as well for TableSources that internally use file-based
input formats.
Is that what you are looking for?
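For instance (hypothetical path; sketch only):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class ReadDirectory {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // pointing the format at a directory reads all files in it
        DataSet<String> lines = env.readTextFile("hdfs:///data/input-dir");
        lines.first(10).print();
    }
}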
Best, Fabian
On Mon, Jan 28, 2019 at 17:22,
The problem is that the table "lineitem" does not have a field
"l_returnflag".
The fields in "lineitem" are named [TMP_2, TMP_5, TMP_1, TMP_0, TMP_4,
TMP_6, TMP_3].
I guess it depends on how you obtained lineitem.
Best, Fabian
On Tue, Jan 29, 2019 at 16:38, Soheil Pourbafrani <
soheil
I think this is very hard to build in a generic way.
The common approach here would be to get access to the changelog stream of
the table, writing it to a message queue / event log (like Kafka, Pulsar,
Kinesis, ...) and ingesting the changes from the event log into a Flink
application.
You can of
Hi Jonny,
I think this is a good use case for event time stream processing.
The idea of taking a savepoint, stopping and later resuming the job is good
as it frees the resources that would otherwise be occupied by the idle job.
In that sense it would behave like a batch job.
However, in contrast to
Hi,
The problem is that you are using processing time which is
non-deterministic.
Both inputs are consumed at the same time and joined based on which record
arrived first. The result depends on a race condition.
If you change the input table to have event time attributes and use these
to register
Hi Chesnay,
Thank you for the proposal.
I think this is a good idea.
We follow a similar approach already for Hadoop dependencies and connectors
(although in application space).
+1
Fabian
On Fri, Jan 18, 2019 at 10:59, Chesnay Schepler <
ches...@apache.org>:
> Hello,
>
> the binary
Hi Harshith,
No, you don't need to restart the whole cluster. Flink only needs enough
processing slots to recover the job.
If you have a standby TM, the job should restart immediately (according to
its restart policy). Otherwise, you have to start a new TM to provide more
slots. Once the slots are
Hi Vipul,
I'm not aware of a way to do this.
You could have a list of all registered timers per key as state to be able
to delete them.
However, the problem is to identify in user code when an application was
restarted, i.e., to know when to delete timers.
Also, timer deletion would need to be don
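A hedged sketch of that bookkeeping (hypothetical types, assuming event-time timestamps):

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class TimerBookkeeping extends KeyedProcessFunction<String, String, String> {

    private transient ListState<Long> timers;

    @Override
    public void open(Configuration parameters) {
        timers = getRuntimeContext().getListState(new ListStateDescriptor<>("timers", Long.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        long ts = ctx.timestamp() + 60_000L;
        ctx.timerService().registerEventTimeTimer(ts);
        timers.add(ts); // remember the timer so it can be deleted later
    }

    // would be called from the code path that detects the restart
    private void deleteAllTimers(Context ctx) throws Exception {
        for (Long ts : timers.get()) {
            ctx.timerService().deleteEventTimeTimer(ts);
        }
        timers.clear();
    }
}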