Re: Parallel CEP

2019-01-23 Thread Dian Fu
I'm afraid you cannot do that. Inputs with the same key must be processed by the same CEP operator; otherwise the results will be nondeterministic and wrong. Regards, Dian > On 24 Jan 2019, at 2:56 PM, dhanuka ranasinghe wrote: > > In this example key will be same. I am using 1 million

Re: Parallel CEP

2019-01-23 Thread dhanuka ranasinghe
In this example the key will be the same. I am using 1 million messages with the same key for performance testing, but I still want to process them in parallel. Can't I use the Split function and get a SplitStream for that purpose? On Thu, Jan 24, 2019 at 2:49 PM Dian Fu wrote: > Hi Dhanuka, > > Does the

Re: Parallel CEP

2019-01-23 Thread Dian Fu
Hi Dhanuka, Does the KeySelector of Event::getTriggerID generate the same key for all the inputs, or does it generate only very few key values that happen to be hashed to the same downstream operator? You can print the results of Event::getTriggerID to check if that's the case. Regards,
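
The effect Dian describes, a few distinct keys all landing on the same downstream operator instance, can be illustrated with a simple hash-partitioning simulation. This is a plain-Python sketch, not the Flink API (Flink's actual routing uses murmur-hashed key groups):

```python
# Conceptual sketch of hash partitioning: each key is routed to one
# operator subtask, so a stream with few distinct keys occupies few
# subtasks. Illustrative only; Flink's key-group hashing differs.

def subtask_for_key(key, parallelism):
    """Route a key to a subtask index by hashing (stand-in for Flink's scheme)."""
    return hash(key) % parallelism

def used_subtasks(keys, parallelism):
    """Return the set of subtasks that actually receive any data."""
    return {subtask_for_key(k, parallelism) for k in keys}

# One million events that all share a single key occupy exactly one
# subtask, no matter how high the parallelism is.
single_key_events = ["trigger-42"] * 1_000_000
print(len(used_subtasks(single_key_events, parallelism=8)))  # 1
```

With a single key value, raising the operator parallelism cannot help; only a key space with many distinct values spreads work across subtasks.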

[Flink 1.7.0] initial failures with starting high-parallelism job without checkpoint/savepoint

2019-01-23 Thread Steven Wu
When we start a high-parallelism (1,600) job without any checkpoint/savepoint, the job struggles to be deployed. After a few restarts, it eventually got deployed and was running fine after the initial struggle. The jobmanager was very busy and the Web UI was very slow. I saw these two exceptions/failures

Re: Parallel CEP

2019-01-23 Thread Dian Fu
Whether to use KeyedStream depends on the logic of your job, i.e., whether you are looking for patterns within partitions, e.g., patterns for a particular user. If so, you should partition the input data before the CEP operator. Otherwise, the input data should not be partitioned. Regards, Dian

Re: Flink CEP : Doesn't generate output

2019-01-23 Thread Dian Fu
Hi Dhanuka, From the code you shared, it seems that you're using event time. In event time, the processing of elements is triggered by the watermark, so you should define how to generate the watermark, e.g. with DataStream.assignTimestampsAndWatermarks. Regards, Dian > On
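
The bounded-out-of-orderness watermarking idea can be sketched as follows. This is a plain-Python simulation of the concept, not Flink's `assignTimestampsAndWatermarks` API:

```python
# Conceptual simulation of a bounded-out-of-orderness watermark generator:
# the watermark trails the highest timestamp seen by a fixed delay, and
# event-time operators (CEP, windows) only emit results once the watermark
# has passed the relevant time. Illustrative only; not the Flink API.

class BoundedOutOfOrdernessWatermarks:
    def __init__(self, max_out_of_orderness_ms):
        self.delay = max_out_of_orderness_ms
        self.max_ts = float("-inf")

    def on_event(self, timestamp_ms):
        # Track the highest event timestamp observed so far.
        self.max_ts = max(self.max_ts, timestamp_ms)

    def current_watermark(self):
        # Watermark = highest timestamp minus the allowed lateness.
        return self.max_ts - self.delay

wm = BoundedOutOfOrdernessWatermarks(max_out_of_orderness_ms=5)
for ts in [100, 103, 101, 110]:
    wm.on_event(ts)
print(wm.current_watermark())  # 105
```

Without a watermark assigner, event-time CEP never knows that time has advanced, which is exactly why the job in this thread produced no output.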

Re: Parallel CEP

2019-01-23 Thread Dian Fu
Hi Dhanuka, In order to make the CEP operator run in parallel, the input stream should be a KeyedStream. You can refer to [1] for detailed information. Regards, Dian [1]: https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#detecting-patterns > On 24 Jan 2019, at 10:18 AM, dhanuka

Re: Flink CEP : Doesn't generate output

2019-01-23 Thread dhanuka ranasinghe
Thank you for the clarification. On Thu, 24 Jan 2019, 12:44 Dian Fu wrote: Hi Dhanuka, > > From the code you shared, it seems that you're using event time. The > processing of elements is triggered by watermark in event time and so you > should define how to generate the watermark, i.e with >

Re: Parallel CEP

2019-01-23 Thread dhanuka ranasinghe
Hi Dian, I tried that, but then the Kafka producer only produces to a single partition, and only a single Flink host does any work while the rest do not contribute to processing. I will share the code and a screenshot. Cheers Dhanuka On Thu, 24 Jan 2019, 12:31 Dian Fu wrote: Hi Dhanuka, > > In order to make the CEP operator

Re: When can the savepoint directory be deleted?

2019-01-23 Thread Ben Yan
Hi, I got it. Thanks! Best Ben Kien Truong wrote on Wed, 23 Jan 2019 at 10:31 PM: > Hi, > > As of Flink 1.7, the savepoint should not be deleted until after the > first checkpoint has been successfully taken. > > > >

Re: `env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-23 Thread Aaron Levin
Hi Ufuk, One more update: I tried copying all the Hadoop native `.so` files (mainly `libhadoop.so`) into `/lib` and I am still experiencing the issue I reported. I also tried naively adding the `.so` files to the jar with the Flink application and am still experiencing the issue I reported

Re: `env.java.opts` not persisting after job canceled or failed and then restarted

2019-01-23 Thread Aaron Levin
Hi Ufuk, Two updates: 1. As suggested in the ticket, I naively copied every `.so` in `hadoop-3.0.0/lib/native/` into `/lib/` and this did not seem to help. My knowledge of how shared libs get picked up is hazy, so I'm not sure if blindly copying them like that should work. I did check what

Re: Trouble migrating state from 1.6.3 to 1.7.1

2019-01-23 Thread pwestermann
Thanks Gordon, I get the same exception in the JM logs and that looks like it's causing the job failure. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Trouble migrating state from 1.6.3 to 1.7.1

2019-01-23 Thread Tzu-Li (Gordon) Tai
Thanks for the logs. Is the job restore actually failing? If yes, there should be an exception for the exact cause of the failure. Otherwise, the AvroSerializer warnings in the taskmanager logs are actually expected behaviour when restoring from savepoint versions before 1.7.x, and shouldn't

Re: Trouble migrating state from 1.6.3 to 1.7.1

2019-01-23 Thread pwestermann
There is not much in the log, as this happens immediately when I start the job. I attached one of the taskmanager logs. The first error message I see is /Could not read a requested serializer. Replaced with a UnloadableDummyTypeSerializer./ and the exception is in the attached taskmanager.log

Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Thomas Weise
+1 for trimming the size by default and offering the fat distribution as alternative download On Wed, Jan 23, 2019 at 8:35 AM Till Rohrmann wrote: > Ufuk's proposal (having a lean default release and a user convenience > tarball) sounds good to me. That way advanced users won't be bothered by

RE: [Flink 1.6] How to get current total number of processed events

2019-01-23 Thread Thanh-Nhan Vo
Hi Kien Truong, Thank you for your answer. I have another question, please! If I count the number of messages processed for a given key j (denoted c_j), is there a way to retrieve max{c_j} and min{c_j}? Thanks From: Kien Truong [mailto:duckientru...@gmail.com] Sent: Wednesday, 23 January 2019

Re: Trouble migrating state from 1.6.3 to 1.7.1

2019-01-23 Thread Tzu-Li (Gordon) Tai
Hi, Thanks for reporting this. Could you provide more details (error message, exception stack trace)? This is unexpected, as the changes to the flink-avro serializers in 1.7.x should be backwards compatible. More details on how the restore failed will be helpful here. Cheers,

getting duplicate messages from duplicate jobs

2019-01-23 Thread Avi Levi
Hi, This is quite confusing. I submitted the same stateless job twice (actually I uploaded it once). However, when I place a message on Kafka, it seems that both jobs consume it and publish the same result (we publish the result to another Kafka topic, so I actually see the message duplicated on Kafka

Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Till Rohrmann
Ufuk's proposal (having a lean default release and a user convenience tarball) sounds good to me. That way advanced users won't be bothered by an unnecessarily large release and new users can benefit from having many useful extensions bundled in one tarball. Cheers, Till On Wed, Jan 23, 2019 at

Re: [Flink 1.6] How to get current total number of processed events

2019-01-23 Thread Kien Truong
Hi Nhan, Logically, the total number of processed events before an event cannot be accurately calculated unless event processing is synchronized. This is not scalable, so naturally I don't think Flink supports it. However, I suppose you can get an approximate count by using a non-keyed
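
The per-key counting that Nhan asks about (and the max/min over those counts) can be sketched in a few lines. This is a plain-Python illustration: in Flink the count c_j would live in keyed state, and the max/min across keys would require a separate downstream aggregation, since a keyed operator only sees its own key's state.

```python
# Sketch of per-key event counting and aggregation over the counts.
# A keyed operator can maintain c_j per key; computing max{c_j}/min{c_j}
# needs a global view, here simulated with an in-memory Counter.
from collections import Counter

def per_key_counts(events):
    """Count processed events per key: c_j for each key j."""
    return Counter(key for key, _payload in events)

events = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("c", 5)]
counts = per_key_counts(events)
print(max(counts.values()), min(counts.values()))  # 3 1
```

In a real job this global aggregation is only approximate unless processing is synchronized, which is exactly the caveat raised above.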

Re: How to trigger a Global Window with a different Message from the window message

2019-01-23 Thread Kien Truong
Hi Oliver, Try replacing the GlobalWindow with a KeyedProcessFunction. Store all the items received between CalcStart and CalcEnd inside a ListState, then process them when CalcEnd is received. Regards, Kien On 1/17/2019 1:06 AM, Oliver Buckley-Salmon wrote: Hi, I have a Flink job where I
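
The buffering pattern Kien suggests can be sketched as follows. In Flink the buffer would be a ListState inside a KeyedProcessFunction; here it is a plain Python list, with the marker names (`CalcStart`, `CalcEnd`) taken from the thread:

```python
# Sketch of buffering between start/end markers: collect every item that
# arrives between CalcStart and CalcEnd, emit the batch on CalcEnd, then
# clear the buffer (like ListState.clear() in a KeyedProcessFunction).

def process_marked_stream(stream, on_batch):
    buffer = None  # None = not currently inside a CalcStart..CalcEnd cycle
    for element in stream:
        if element == "CalcStart":
            buffer = []                 # start collecting
        elif element == "CalcEnd":
            if buffer is not None:
                on_batch(buffer)        # process the collected batch
                buffer = None           # clear state for the next cycle
        elif buffer is not None:
            buffer.append(element)      # item inside the cycle

batches = []
process_marked_stream(
    ["x", "CalcStart", "a", "b", "CalcEnd", "y", "CalcStart", "c", "CalcEnd"],
    batches.append,
)
print(batches)  # [['a', 'b'], ['c']]
```

Unlike a GlobalWindow with a custom trigger, this makes the buffering and firing logic explicit and easy to test.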

Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Ufuk Celebi
On Wed, Jan 23, 2019 at 11:01 AM Timo Walther wrote: > I think what is more important than a big dist bundle is a helpful > "Downloads" page where users can easily find available filesystems, > connectors, metric reporters. Not everyone checks Maven central for > available JAR files. I just saw

Re: When can the savepoint directory be deleted?

2019-01-23 Thread Kien Truong
Hi, As of Flink 1.7, the savepoint should not be deleted until after the first checkpoint has been successfully taken. https://ci.apache.org/projects/flink/flink-docs-release-1.7/release-notes/flink-1.7.html#savepoints-being-used-for-recovery Regards, Kien On 1/23/2019 6:57 PM, Ben Yan

How to test and validate data stream software?

2019-01-23 Thread Alexandre Strapacao Guedes Vianna
Hello People, I'm conducting a study for my PhD about applications using data stream processing, and I would like to investigate the following questions: - How to test and validate data stream software? - Are there specific testing frameworks, tools, or testing environments? - What are

Trouble migrating state from 1.6.3 to 1.7.1

2019-01-23 Thread pwestermann
I am trying to migrate from Flink 1.6.3 to 1.7.1 but am not able to restore the job from a savepoint taken in 1.6.3. We are using an AsyncFunction to publish Avro records to SQS. The state for the AsyncWaitOperator cannot be restored because of serializer changes in flink-avro from 1.6.3 to

How does Flink prioritise reads from Kafka topics and partitions?

2019-01-23 Thread sohimankotia
Hi, Let's say I have a Flink Kafka consumer reading from 3 topics, [T-1, T-2, T-3]. - T-1 and T-2 have 100 partitions each - T-3 has 60 partitions - Flink task parallelism is 50 How will Flink prioritize the Kafka topics? If T-3 has more lag than the other topics, will Flink
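
A simplified model of how the 260 partitions above (100 + 100 + 60) could be spread over 50 source subtasks is sketched below. Flink's real assignment in the Kafka consumer is deterministic per topic-partition but differs in detail; the point of the sketch is that each subtask polls its own partitions independently, so there is no cross-topic priority or lag-based ordering:

```python
# Simplified round-robin model of partition-to-subtask assignment.
# Illustrative only: Flink's actual Kafka connector assignment logic
# differs, but like this model it gives every subtask its own fixed set
# of partitions, which it consumes without prioritization.

def assign_partitions(topic_partition_counts, parallelism):
    assignment = {i: [] for i in range(parallelism)}
    index = 0
    for topic, n_partitions in topic_partition_counts.items():
        for p in range(n_partitions):
            assignment[index % parallelism].append((topic, p))
            index += 1
    return assignment

assignment = assign_partitions({"T-1": 100, "T-2": 100, "T-3": 60}, parallelism=50)
# Each subtask owns 5 or 6 of the 260 partitions.
print(min(len(v) for v in assignment.values()),
      max(len(v) for v in assignment.values()))  # 5 6
```

Because consumption is per-partition and unprioritized, lag on T-3 would not cause Flink to read it ahead of the other topics.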

When can the savepoint directory be deleted?

2019-01-23 Thread Ben Yan
Hi, Can I delete the savepoint directory immediately after the job resumes running from it? Best Ben

Re: Flink Jdbc streaming source support in 1.7.1 or in future?

2019-01-23 Thread Puneet Kinra
The common way is to read in the CDC stream; writing a generic operator won't be easy. On Wed, Jan 23, 2019 at 12:45 PM Manjusha Vuyyuru wrote: > But 'JDBCInputFormat' will exit once it's done reading all data. I need > something that keeps polling MySQL and fetches if there are any > updates or

Re: No resource available error while testing HA

2019-01-23 Thread Averell
Hi Gary, Thanks for your support. I use Flink 1.7.0. I will try to test without that -n. Below are the JM log (on server .82) and TM log (on server .88). I'm sorry that I missed that TM log before asking; I thought it would not be relevant. I just fixed the issue with connection to

Re: No resource available error while testing HA

2019-01-23 Thread Gary Yao
Hi Averell, What Flink version are you using? Can you attach the full logs from JM and TMs? Since Flink 1.5, the -n parameter (number of taskmanagers) should be omitted unless you are in legacy mode [1]. > As per that screenshot, it looks like there are 2 tasks manager still > running (one on

Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Timo Walther
+1 for Stephan's suggestion. For example, SQL connectors have never been part of the main distribution and nobody complained about this so far. I think what is more important than a big dist bundle is a helpful "Downloads" page where users can easily find available filesystems, connectors,

Re: Flink Jdbc streaming source support in 1.7.1 or in future?

2019-01-23 Thread Fabian Hueske
I think this is very hard to build in a generic way. The common approach here would be to get access to the changelog stream of the table, write it to a message queue / event log (like Kafka, Pulsar, Kinesis, ...), and ingest the changes from the event log into a Flink application. You can of

Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Ufuk Celebi
I like the idea of a leaner binary distribution. At the same time I agree with Jamie that the current binary is quite convenient and connection speeds should not be that big of a deal. Since the binary distribution is one of the first entry points for users, I'd like to keep it as user-friendly as

Re: GlobalWindow with custom tigger doesn't work correctly

2019-01-23 Thread David Anderson
Windowing and triggering on a keyed stream is done independently for each key. So for each key, your custom trigger observes when the lunumState changes from null to a production cycle number, but it will never change again, because only those stream elements with the same key will be
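
The per-key state isolation David describes can be sketched with a small simulation. This is plain Python, not the Flink Trigger API; the point is that state scoped to a key only ever sees elements of that key, so a value that is part of the key itself can never appear to change:

```python
# Sketch of per-key trigger state: each key gets its own state slot, and a
# "change" can only be observed within a single key's sequence of elements.
from collections import defaultdict

def observe_changes_per_key(keyed_elements):
    """Record, per key, whether the stored value ever changes after being set."""
    state = {}
    changed = defaultdict(bool)
    for key, value in keyed_elements:
        if key in state and state[key] != value:
            changed[key] = True
        state[key] = value
    return dict(changed)

# Keying BY the cycle number itself: each key only ever sees one value,
# so no per-key "change" from one cycle to the next is ever observed.
elements = [(1, 1), (1, 1), (2, 2), (2, 2)]
print(observe_changes_per_key(elements))  # {}
```

To detect the transition between cycles, the stream would have to be keyed by something other than the cycle number, so that both the old and the new value pass through the same key's state.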

Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Stephan Ewen
There are some points where a leaner approach could help. There are many libraries and connectors currently being added to Flink, which makes the "include all" approach not completely feasible in the long run: - Connectors: For a proper experience with the Shell/CLI (for example for SQL)