Re: Efficiency with different approaches of aggregation in Flink

2018-04-18 Thread Puneet Kinra
Hi Teena If you are proceeding with point 3, no doubt it will add some overhead but major significance is that you are persisting the state as per some key. so there will not be data loss in case of the job failure. On Thu, Apr 19, 2018 at 11:45 AM, Teena Kappen // BPRISE < teena.kap...@bprise.

Efficiency with different approaches of aggregation in Flink

2018-04-18 Thread Teena Kappen // BPRISE
Hi, If I have to aggregate a value in a stream of records, which one of the below approaches will be the most/least efficient? 1. Using a Global Window to aggregate the value and emit the record when it reaches a particular threshold value. 2. Using a FlatMap with a State Variable which

Confusing debug level log output with Flink 1.5

2018-04-18 Thread Ken Krugler
Hi Till, I just saw https://issues.apache.org/jira/browse/FLINK-9215 I’ve been trying out 1.5, and noticed similar output in my logs, e.g. 18/04/18 17:33:47 DEBUG slotpool.SlotPool:751 - Releasing slot with slot request id 2c9fba45bd28940c4650

Re: Help with OneInputStreamOperatorTestHarness

2018-04-18 Thread Chris Schneider
Hi Ted, I should have written that we’re using Flink 1.4.0. Thanks for the suggestion re: FLINK-8268 ; it could well be the issue (though the pull request appears fairly complex so I’ll need som

Help with OneInputStreamOperatorTestHarness

2018-04-18 Thread Chris Schneider
Hi Gang, I’m having trouble getting my streaming unit test to work. The following code: @Test public void testDemo() throws Throwable { OneInputStreamOperatorTestHarness testHarness = new KeyedOneInputStreamOperatorTestHarness( new StreamFlatMap<>(new

debug for Flink

2018-04-18 Thread Qian Ye
Hi I’m wondering if new debugging methods/tools are urgent for Flink development. I know there already exists some debug methods for Flink, e.g., remote debugging of flink clusters(https://cwiki.apache.org/confluence/display/FLINK/Remote+Debugging+of+Flink+Clusters

Re: FlinkML

2018-04-18 Thread Christophe Salperwyck
Hi, You could try to plug MOA/Weka library too. I did some preliminary work with that: https://moa.cms.waikato.ac.nz/moa-with-apache-flink/ but then it is not anymore FlinkML algorithms. Best regards, Christophe 2018-04-18 21:13 GMT+02:00 shashank734 : > There are no active discussions or gui

Re: why doesn't the over-window-aggregation sort the element(considering watermark) before processing?

2018-04-18 Thread Yan Zhou [FDS Science]
nvm, I figure it out. The event is not process once it's arrived. It's registered to processed in event time. It make sense. best Yan From: Yan Zhou [FDS Science] Sent: Wednesday, April 18, 2018 12:56:58 PM To: Fabian Hueske Cc: user Subject: Re: why doesn't t

Re: why doesn't the over-window-aggregation sort the element(considering watermark) before processing?

2018-04-18 Thread Yan Zhou [FDS Science]
Hi Fabian, Thanks for the reply. I think here is the problem. Currently, the timestamp of an event is compared with previous processed element's timestamp, instead of watermark, to determine if it's late. To my understanding, even the order of emitted event in preceding operator is perfec

Re: FlinkML

2018-04-18 Thread shashank734
There are no active discussions or guide on that. But I found this example in the repo : https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/ml/IncrementalLearningSkeleton.java

Re: FlinkML

2018-04-18 Thread Christophe Jolif
Szymon, The short answer is no. See: http://mail-archives.apache.org/mod_mbox/flink-user/201802.mbox/%3ccaadrtt39ciiec1uzwthzgnbkjxs-_h5yfzowhzph_zbidux...@mail.gmail.com%3E On Mon, Apr 16, 2018 at 11:25 PM, Szymon Szczypiński wrote: > Hi, > > i wonder if there are possibility to build FlinkML

Re: Tracking deserialization errors

2018-04-18 Thread Elias Levy
Either proposal would work. In the later case, at a minimum we'd need a way to identify the source within the metric. The basic error metric would then allow us to go into the logs to determine the cause of the error, as we already record the message causing trouble in the log. On Mon, Apr 16,

Re: Substasks - Uneven allocation

2018-04-18 Thread Ken Krugler
Hi Pedro, That’s interesting, and something we’d like to be able to control as well. I did a little research, and it seems like (with some stunts) there could be a way to achieve this via CoLocationConstraint

Substasks - Uneven allocation

2018-04-18 Thread PedroMrChaves
Hello, I have a job that has one async operational node (i.e. implements AsyncFunction). This Operational node will spawn multiple threads that perform heavy tasks (cpu bound). I have a Flink Standalone cluster deployed on two machines of 32 cores and 128 gb of RAM, each machine has one task man

Re: State-machine-based search logic in Flink ?

2018-04-18 Thread Fabian Hueske
As I said before, this is work in progress and there is a pending pull request (PR) to add this feature. So no, MATCH_RECOGNIZE is not supported by Flink yet and hence also not documented. Best, Fabian 2018-04-18 10:12 GMT+02:00 Esa Heikkinen : > Hi > > > > I did mean like “finding series of con

Jars uploaded to jobmanager are deleted but not free'ed by OS

2018-04-18 Thread Jeroen Steggink | knowsy
Sorry, I meant the jobmanager, not the taskmanager. On 18-Apr-18 15:44, Jeroen Steggink | knowsy wrote: Hi, I'm having some troubles running the Flink taskmanager in a Docker container (OpenShift). The container's internal storage is filling up because the deleted jar files in blob storage a

Jars uploaded to taskmanager are deleted but not free'ed by OS

2018-04-18 Thread Jeroen Steggink | knowsy
Hi, I'm having some troubles running the Flink taskmanager in a Docker container (OpenShift). The container's internal storage is filling up because the deleted jar files in blob storage are probably still in use and therefore resources are not free'ed. We are using Apache Beam to start an A

Re: Tracking deserialization errors

2018-04-18 Thread Alexander Smirnov
ouch, i forgot to mention I opened https://issues.apache.org/jira/browse/FLINK-9155 to track this. Should it be a duplicate of 9204 then? On Wed, Apr 18, 2018 at 3:32 PM Tzu-Li (Gordon) Tai wrote: > Hi, > > These are valid concerns. And yes, AFAIK users have been writing to logs > within the des

Re: Flink & Kafka multi-node config

2018-04-18 Thread Tzu-Li (Gordon) Tai
Hi, The partition-to-subtask assignment of is not locality aware. There were discussions to expose functionality for custom user-defined assignment methods, which it might be possible to leverage that for a locality aware assignment. Unfortunately, this feature is not implemented, yet. The rrela

Outputting the content of in flight session windows

2018-04-18 Thread jelmer
I defined a session window and I would like to write the contents of the window to storage before the window closes Initially I was doing this by setting a CountTrigger.of(1) on the session window. But this leads to very frequent writes. To remedy this i switched to a ContinuousProcessingTimeTrig

Re: Flink job testing with

2018-04-18 Thread Tzu-Li (Gordon) Tai
Hi, The docs here [1] provide some example snippets of using the Kafka connector to consume from / write to Kafka topics. Once you consumed a `DataStream` from a Kafka topic using the Kafka consumer, you can use Flink transformations such as map, flatMap, etc. to perform processing on the records

Re: Tracking deserialization errors

2018-04-18 Thread Tzu-Li (Gordon) Tai
Hi, These are valid concerns. And yes, AFAIK users have been writing to logs within the deserialization schema to track this. The connectors as of now have no logging themselves in case of a skipped record. I think we can implement both logging and metrics to track this, most of which you have

Re: Managing state migrations with Flink and Avro

2018-04-18 Thread Timo Walther
Thank you. Maybe we already identified the issue (see https://issues.apache.org/jira/browse/FLINK-9202). I will use your code to verify it. Regards, Timo Am 18.04.18 um 14:07 schrieb Petter Arvidsson: Hi Timo, Please find the generated class (for the second schema) attached. Regards, Pette

Re: Managing state migrations with Flink and Avro

2018-04-18 Thread Petter Arvidsson
Hi Timo, Please find the generated class (for the second schema) attached. Regards, Petter On Wed, Apr 18, 2018 at 11:32 AM, Timo Walther wrote: > Hi Petter, > > could you share the source code of the class that Avro generates out of > this schema? > > Thank you. > > Regards, > Timo > > Am 18.

Re: CaseClassSerializer and/or TraversableSerializer may still not be threadsafe?

2018-04-18 Thread Stefan Richter
Hi, I agree that this looks like a serializer is shared between two threads, one of them being the event processing loop. I am doubting that the problem is with the async fs backend, because there is code in place that will duplicate all serializers for the async snapshot thread and this is als

Re: rest.port is reset to 0 by YarnEntrypointUtils

2018-04-18 Thread Dongwon Kim
Hi Gary, Thanks a lot for replay. Hope the issue is resolved soon. I have a suggestion regarding the rest port. Considering the role of dispatcher, it needs to have its own port range that is not shared by job managers spawned by dispatcher. If I understand FLIP-6 correctly, only a few dispatche

Re: rest.port is reset to 0 by YarnEntrypointUtils

2018-04-18 Thread Gary Yao
Hi Dongwon, I think the rationale was to avoid conflicts between multiple Flink instances running on the same YARN cluster. There is a ticket that proposes to allow configuring a port range instead [1]. Best, Gary [1] https://issues.apache.org/jira/browse/FLINK-5758 On Tue, Apr 17, 2018 at 9:56

Re: Managing state migrations with Flink and Avro

2018-04-18 Thread Timo Walther
Hi Petter, could you share the source code of the class that Avro generates out of this schema? Thank you. Regards, Timo Am 18.04.18 um 11:00 schrieb Petter Arvidsson: Hello everyone, I am trying to figure out how to set up Flink with Avro for state management (especially the content of s

Managing state migrations with Flink and Avro

2018-04-18 Thread Petter Arvidsson
Hello everyone, I am trying to figure out how to set up Flink with Avro for state management (especially the content of snapshots) to enable state migrations (e.g. adding a nullable fields to a class). So far, in version 1.4.2, I tried to explicitly provide an instance of "new AvroTypeInfo(Accumul

RE: State-machine-based search logic in Flink ?

2018-04-18 Thread Esa Heikkinen
Hi I did mean like “finding series of consecutive events”, as it was described in [2]. Are these features already in Flink and how well they are documented ? Can I use Scala or only Java ? I would like some example codes, it they are exist ? Best, Esa From: Fabian Hueske Sent: Tuesday, Apri

Re: why doesn't the over-window-aggregation sort the element(considering watermark) before processing?

2018-04-18 Thread Fabian Hueske
The over window operates on an unbounded stream of data. Hence it is not possible to sort the complete stream. Instead we can sort ranges of the stream. Flink uses watermarks to define these ranges. The operator processes the records in timestamp order that are not late, i.e., have timestamps larg

Re: assign time attribute after first window group when using Flink SQL

2018-04-18 Thread Fabian Hueske
This sounds like a windowed join between the raw stream and the aggregated stream. It might be possible to do the "lookup" in the second raw stream with another windowed join. If not, you can fall back to the DataStream API / ProcessFunction and implement the lookup logic as you need it. Best, Fab