I'm afraid you cannot do that. Inputs with the same key must be processed by
the same CEP operator instance; otherwise the results will be nondeterministic
and wrong.
Regards,
Dian
> On Jan 24, 2019, at 2:56 PM, dhanuka ranasinghe wrote:
>
> In this example the key will be the same. I am using 1 million
In this example the key will be the same. I am using 1 million messages with
the same key for performance testing, but I still want to process them in
parallel. Can't I use the Split function and get a SplitStream for that purpose?
On Thu, Jan 24, 2019 at 2:49 PM Dian Fu wrote:
> Hi Dhanuka,
>
> Does the
Hi Dhanuka,
Does the KeySelector of Event::getTriggerID generate the same key for all the
inputs, or only a very few key values that happen to be hashed to the same
downstream operator? You can print the results of Event::getTriggerID to check
whether that is the case.
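For instance, something as simple as this shows the key distribution (a
minimal sketch; "input" stands for the DataStream<Event> in your job):

    // Print each element's key to see how many distinct trigger IDs
    // there actually are.
    input.map(event -> event.getTriggerID()).print();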
Regards,
Dian
When we started a high-parallelism (1,600) job without any
checkpoint/savepoint, the job struggled to be deployed. After a few restarts
it eventually got deployed and was running fine after the initial struggle.
The JobManager was very busy and the web UI was very slow. I saw these two
exceptions/failures
Whether to use a KeyedStream depends on the logic of your job, i.e., whether
you are looking for patterns within some partitions, e.g., patterns for a
particular user. If so, you should partition the input data before the CEP
operator; otherwise, the input data should not be partitioned.
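For illustration, a minimal sketch of the two options ("input" and "pattern"
stand for your stream and pattern; getUserId() is a hypothetical key selector):

    // Per-partition patterns (e.g. per user): key the input before CEP.
    PatternStream<Event> perUser =
        CEP.pattern(input.keyBy(Event::getUserId), pattern);

    // Global patterns: apply CEP on the non-keyed stream instead
    // (note this limits the CEP operator to parallelism 1).
    PatternStream<Event> global = CEP.pattern(input, pattern);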
Regards,
Dian
Hi Dhanuka,
From the code you shared, it seems that you're using event time. The processing
of elements is triggered by watermarks in event time, so you should define how
to generate the watermark, e.g. with DataStream.assignTimestampsAndWatermarks.
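A minimal sketch, assuming your Event carries an epoch-millisecond timestamp
via a (hypothetical) getTimestamp() accessor and arrives at most 5 seconds out
of order:

    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;

    DataStream<Event> withWatermarks = input.assignTimestampsAndWatermarks(
        // Watermarks trail the largest timestamp seen by 5 seconds.
        new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.seconds(5)) {
            @Override
            public long extractTimestamp(Event event) {
                return event.getTimestamp();
            }
        });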
Regards,
Dian
Hi Dhanuka,
In order to make the CEP operator run in parallel, the input stream should be
a KeyedStream. You can refer to [1] for detailed information.
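A minimal sketch of what that looks like (the pattern itself is a placeholder
and isAlert() a hypothetical predicate; see [1] for the full API):

    import org.apache.flink.cep.CEP;
    import org.apache.flink.cep.PatternStream;
    import org.apache.flink.cep.pattern.Pattern;
    import org.apache.flink.cep.pattern.conditions.SimpleCondition;

    // Placeholder pattern on your Event type.
    Pattern<Event, ?> pattern = Pattern.<Event>begin("start")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event event) {
                return event.isAlert();
            }
        });

    // Keying the input lets the CEP operator run one instance per key
    // group, so it can actually use parallelism > 1.
    PatternStream<Event> matches =
        CEP.pattern(input.keyBy(Event::getTriggerID), pattern);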
Regards,
Dian
[1]:
https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#detecting-patterns
> On Jan 24, 2019, at 10:18 AM, dhanuka
Thank you for the clarification.
On Thu, 24 Jan 2019, 12:44 Dian Fu wrote:
> Hi Dhanuka,
>
> From the code you shared, it seems that you're using event time. The
> processing of elements is triggered by watermarks in event time, so you
> should define how to generate the watermark, e.g. with
>
Hi Dian,
I tried that, but then the Kafka producer only produces to a single partition
and only a single Flink host does any work while the rest do not contribute to
the processing. I will share the code and a screenshot.
Cheers
Dhanuka
On Thu, 24 Jan 2019, 12:31 Dian Fu wrote:
> Hi Dhanuka,
>
> In order to make the CEP operator
Hi,
I got it. Thanks!
Best
Ben
On Wed, Jan 23, 2019 at 10:31 PM Kien Truong wrote:
> Hi,
>
> As of Flink 1.7, the savepoint should not be deleted until after the
> first checkpoint has been successfully taken.
>
>
>
>
Hi Ufuk,
One more update: I tried copying all the Hadoop native `.so` files (mainly
`libhadoop.so`) into `/lib` and I am still experiencing the issue I reported.
I also tried naively adding the `.so` files to the jar with the Flink
application and am still experiencing the issue I reported.
Hi Ufuk,
Two updates:
1. As suggested in the ticket, I naively copied every `.so` in
`hadoop-3.0.0/lib/native/` into `/lib/` and this did not seem to help. My
knowledge of how shared libs get picked up is hazy, so I'm not sure if
blindly copying them like that should work. I did check what
Thanks Gordon,
I get the same exception in the JM logs and that looks like it's causing the
job failure.
Thanks for the logs.
Is the job restore actually failing? If yes, there should be an exception
for the exact cause of the failure.
Otherwise, the AvroSerializer warnings in the taskmanager logs are actually
expected behaviour when restoring from savepoints taken with versions before
1.7.x, and shouldn't
There is not much in the log, as this happens immediately when I start the
job. I attached one of the taskmanager logs. The first error message I see is
"Could not read a requested serializer. Replaced with a
UnloadableDummyTypeSerializer." and the exception is
taskmanager.log
+1 for trimming the size by default and offering the fat distribution as an
alternative download.
On Wed, Jan 23, 2019 at 8:35 AM Till Rohrmann wrote:
> Ufuk's proposal (having a lean default release and a user convenience
> tarball) sounds good to me. That way advanced users won't be bothered by
Hi Kien Truong,
Thank you for your answer. I have another question, please!
If I count the number of messages processed for a given key j (denoted c_j), is
there a way to retrieve max{c_j}, min{c_j}?
Thanks
From: Kien Truong [mailto:duckientru...@gmail.com]
Sent: Wednesday, January 23, 2019
Hi,
Thanks for reporting this.
Could you provide more details (error message, exception stack trace) about
what you are getting?
This is unexpected, as the changes to flink-avro serializers in 1.7.x
should be backwards compatible.
More details on how the restore failed will be helpful here.
Cheers,
Hi,
This is quite confusing.
I submitted the same stateless job twice (actually I uploaded it once).
However, when I place a message on Kafka, it seems that both jobs consume it
and publish the same result (we publish the result to another Kafka topic, so
I actually see the message duplicated on Kafka
Ufuk's proposal (having a lean default release and a user convenience
tarball) sounds good to me. That way advanced users won't be bothered by an
unnecessarily large release and new users can benefit from having many
useful extensions bundled in one tarball.
Cheers,
Till
On Wed, Jan 23, 2019 at
Hi Nhan,
Logically, the total number of processed events before a given event cannot be
accurately calculated unless event processing is synchronized. This is not
scalable, so naturally I don't think Flink supports it. That said, I suppose
you can get an approximate count by using a non-keyed
Hi Oliver,
Try replacing the Global Window with a KeyedProcessFunction. Store all the
items received between CalcStart and CalcEnd inside a ListState, then process
them when CalcEnd is received.
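A minimal sketch of that approach (Item, isCalcEnd() and the Result type are
hypothetical stand-ins for your actual types):

    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    public class CalcWindowFunction
            extends KeyedProcessFunction<String, Item, Result> {

        private transient ListState<Item> buffer;

        @Override
        public void open(Configuration parameters) {
            buffer = getRuntimeContext().getListState(
                new ListStateDescriptor<>("buffered-items", Item.class));
        }

        @Override
        public void processElement(Item item, Context ctx,
                Collector<Result> out) throws Exception {
            if (item.isCalcEnd()) {
                // CalcEnd received: process everything buffered so far.
                out.collect(Result.from(buffer.get()));
                buffer.clear();
            } else {
                buffer.add(item); // CalcStart and everything after it
            }
        }
    }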
Regards,
Kien
On 1/17/2019 1:06 AM, Oliver Buckley-Salmon wrote:
Hi,
I have a Flink job where I
On Wed, Jan 23, 2019 at 11:01 AM Timo Walther wrote:
> I think what is more important than a big dist bundle is a helpful
> "Downloads" page where users can easily find available filesystems,
> connectors, metric reporters. Not everyone checks Maven Central for
> available JAR files. I just saw
Hi,
As of Flink 1.7, the savepoint should not be deleted until after the
first checkpoint has been successfully taken.
https://ci.apache.org/projects/flink/flink-docs-release-1.7/release-notes/flink-1.7.html#savepoints-being-used-for-recovery
Regards,
Kien
On 1/23/2019 6:57 PM, Ben Yan
Hello People,
I'm conducting a study for my PhD about applications using data stream
processing, and I would like to investigate the following questions:
- How do you test and validate data stream software?
- Are there specific testing frameworks, tools, or testing environments?
- What are
I am trying to migrate from Flink 1.6.3 to 1.7.1 but am not able to restore
the job from a savepoint taken in 1.6.3.
We are using an AsyncFunction to publish Avro records to SQS. The state for
the AsyncWaitOperator cannot be restored because of serializer changes in
flink-avro from 1.6.3 to
Hi,
Let's say I have a Flink Kafka consumer reading from 3 topics, [T-1, T-2, T-3]:
- T-1 and T-2 each have 100 partitions
- T-3 has 60 partitions
- Flink task parallelism is 50
How will Flink prioritize the Kafka topics? If T-3 has more lag than the other
topics, will Flink
Hi,
Can I delete the savepoint directory immediately after the job resumes
running from it?
Best
Ben
The common way is to read in the CDC stream; writing a generic operator won't
be easy.
On Wed, Jan 23, 2019 at 12:45 PM Manjusha Vuyyuru
wrote:
> But 'JDBCInputFormat' will exit once it's done reading all the data. I need
> something that keeps polling MySQL and fetches if there are any
> updates or
Hi Gary,
Thanks for your support.
I use Flink 1.7.0. I will try to test without that -n.
Below are the JM log (on server .82) and TM log (on server .88). I'm sorry
that I missed that TM log before asking; I thought it would not be relevant.
I just fixed the issue with the connection to
Hi Averell,
What Flink version are you using? Can you attach the full logs from JM and
TMs? Since Flink 1.5, the -n parameter (number of taskmanagers) should be
omitted unless you are in legacy mode [1].
> As per that screenshot, it looks like there are 2 tasks manager still
> running (one on
+1 for Stephan's suggestion. For example, SQL connectors have never been
part of the main distribution and nobody complained about this so far. I
think what is more important than a big dist bundle is a helpful
"Downloads" page where users can easily find available filesystems,
connectors,
I think this is very hard to build in a generic way.
The common approach here would be to get access to the changelog stream of
the table, write it to a message queue / event log (like Kafka, Pulsar,
Kinesis, ...), and ingest the changes from the event log into a Flink
application.
You can of
I like the idea of a leaner binary distribution. At the same time I
agree with Jamie that the current binary is quite convenient and
connection speeds should not be that big of a deal. Since the binary
distribution is one of the first entry points for users, I'd like to
keep it as user-friendly as
Windowing and triggering on a keyed stream is done independently for each
key. So for each key, your custom trigger is observing when the lunumState
changes from null to a production cycle number, but it will never change
again -- because only those stream elements with the same key will be
There are some points where a leaner approach could help.
There are many libraries and connectors currently being added to Flink, which
makes the "include all" approach not completely feasible in the long run:
- Connectors: For a proper experience with the Shell/CLI (for example for
SQL)