Re: Using Kafka 0.10.x timestamps as a record value in Flink Streaming

2017-05-17 Thread Jia Teoh
I'll take a look into the ProcessFunction, thanks for the suggestion. -Jia On Wed, May 17, 2017 at 12:33 AM, Tzu-Li (Gordon) Tai wrote: > Hi Jia, > > Actually just realized you can access the timestamp of records via the > more powerful `ProcessFunction` [1]. > That’ll be

Re: [BUG?] Cannot Load User Class on Local Environment

2017-05-17 Thread Matt
Check the repo at [1]. The important step which I think is what you missed is running an Ignite node on your computer so the Java code, which launches an Ignite client on the JVM, connects to it and executes Flink on that node on a local environment. Be aware "peerClassLoadingEnabled" should be

Re: [BUG?] Cannot Load User Class on Local Environment

2017-05-17 Thread Matt
Thanks for your help Till. I will create a self contained test case in a moment and send you the link, wait for it. Cheers, Matt On Wed, May 17, 2017 at 4:38 AM, Till Rohrmann wrote: > Hi Matt, > > alright, then we have to look into it again. I tried to run your example,

FlinkKafkaConsumer using Kafka-GroupID?

2017-05-17 Thread Valentin
Hi there, As far as I understood, Flink Kafka Connectors don’t use the consumer group management feature from Kafka. Here the post I got the info from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-kafka-group-question-td8185.html#none

Re: FlinkCEP latency/throughput

2017-05-17 Thread Dean Wampler
On Wed, May 17, 2017 at 10:34 AM, Kostas Kloudas < k.klou...@data-artisans.com> wrote: > Hello Alfred, > > As a first general remark, Flink was not optimized for multicore > deployments > but rather for distributed environments. This implies overheads > (serialization, > communication etc), when

Re: FlinkCEP latency/throughput

2017-05-17 Thread Kostas Kloudas
Hello Alfred, As a first general remark, Flink was not optimized for multicore deployments but rather for distributed environments. This implies overheads (serialization, communication etc), when compared to libs optimized for multicores. So there may be libraries that are better optimized for

Re: Question about jobmanager.web.upload.dir

2017-05-17 Thread Chesnay Schepler
I don't know why we delete it either, but my guess is that at one point this was a temporary directory that we properly cleaned up, and later allowed it to be configurable. Currently, this directory must be in the local file system. We could change it to also allow non-local paths, which

Re: Question about jobmanager.web.upload.dir

2017-05-17 Thread Timo Walther
Hey Mauro, I'm not aware of any reason for that. I loop in Chesnay, maybe he knows why. @Chesnay wouldn't it be helpful to also archive the jars using the HistoryServer? Timo Am 17.05.17 um 12:31 schrieb Mauro Cortellazzi: Hi Flink comunity, is there a particular reason to delete the

Re: Stateful streaming question

2017-05-17 Thread Flavio Pompermaier
Hi to all, there are a lot of useful discussion points :) I'll try to answer to everybody. @Ankit: - right now we're using Parquet on HDFS to store thrift objects. Those objects are essentially structured like - key - alternative_key - list of tuples (representing the

Re: State in Custom Tumble Window Class

2017-05-17 Thread rizhashmi
Thanks Walther, I am not keeping my window forever.if the event is arrived after graced period(lateness) i update event time to current time or say last event time. That essentially solve infinite issue. 1.3 is not stable yet? -- View this message in context:

Re: Timer fault tolerance in Flink

2017-05-17 Thread Kostas Kloudas
Hi Rahul, The timers are fault tolerant and their timestamp is the absolute value of when to fire. This means that if you are at time t = 10 and you register a timer “10 ms from now”, the timer will have a firing timestamp of 20. This is checkpointed, so the “new machine” that takes over the

Timer fault tolerance in Flink

2017-05-17 Thread Rahul Kavale
I am looking at timers in apache flink and wanted to confirm if the timers in flink are fault tolerant. eg. when a timer registered with processFunction, of say 20 sec is running on a node and after 15 seconds (since the timer started), the node failed for some reason. Does flink guarantee that

Question about jobmanager.web.upload.dir

2017-05-17 Thread Mauro Cortellazzi
Hi Flink comunity, is there a particular reason to delete the jobmanager.web.upload.dir content when the JobManager restart? Could it be interesting have the jar previously uploaded when JM restart? Could the jars be saved to HDFS for HA mode? Thank you for your support. Mauro

Bushy plan execution

2017-05-17 Thread Mathias Eriksen Otkjær
Hi We have attempted to create a bushy logical plan for Flink, but we are not certain whether it actually gets executed in parallel or in a linear fashion inside Flink (we are certain that it works, as we get the same results now as we did using SQL in the table API). How can we confirm that

Re: Stateful streaming question

2017-05-17 Thread Kostas Kloudas
Hi Flavio, For setting the retries, unfortunately there is no such setting yet and, if I am not wrong, in case of a failure of a request, an exception will be thrown and the job will restart. I am also including Till in the thread as he may know better. For consistency guarantees and

Re: State in Custom Tumble Window Class

2017-05-17 Thread Timo Walther
Hi, in general, a class level variable is not managed by Flink if it is not defined as state or the function does not implemented ListCheckpointed interface. Allowing infinite lateness also means that your window content has to be stored infinitely. I'm not sure if I understand your

Re: [BUG?] Cannot Load User Class on Local Environment

2017-05-17 Thread Till Rohrmann
Hi Matt, alright, then we have to look into it again. I tried to run your example, however, it does not seem to be self-contained. Using Ignite 2.0.0 with -DIGNITE_QUIET=false -Xms512m the Ignite object seems to be stuck in Ignite#start. In the logs I see the following warning: May 17, 2017

Re: Stateful streaming question

2017-05-17 Thread Fabian Hueske
Hi Ankit, just a brief comment on the batch job is easier than streaming job argument. I'm not sure about that. I can see that just the batch job might seem easier to implement, but this is only one part of the whole story. The operational side of using batch is more complex IMO. You need a tool

Re: Using Kafka 0.10.x timestamps as a record value in Flink Streaming

2017-05-17 Thread Tzu-Li (Gordon) Tai
Hi Jia, Actually just realized you can access the timestamp of records via the more powerful `ProcessFunction` [1]. That’ll be a bit easier to use than implementing your own custom operator, which is quite low-level. Cheers, Gordon [1] 

Re: Flink issue

2017-05-17 Thread Till Rohrmann
Hi Francisco, is there anything suspicious in the logs files of the Taskmanager and the JobManager? Cheers, Till On Tue, May 16, 2017 at 5:49 PM, Francisco Alves wrote: > Hi, > > I'm running flink in a yarn session, and when a try to cancel a running > program with parameters

Re: Kafka 0.10.x event time with windowing

2017-05-17 Thread Jia Teoh
Hi Gordon, Thank you for the clarification. In that case, it sounds like I should be using a custom extractor to emit watermarks in addition to the Kafka timestamps. I see that the doc already has code demonstrating how to emit the timestamps so I will start there. Thanks, Jia On Tue, May 16,