Amazon Athena

2017-05-30 Thread Madhukar Thota
Has anyone used Amazon Athena with Apache Flink? I have a use case where I want to write streaming data (which is in Avro format) from Kafka to S3, converting it into Parquet format, and to update the S3 location with daily partitions on an Athena table. Any guidance is appreciated.
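
A rough sketch of one way to approach the S3 side, assuming Flink's BucketingSink with a DateTimeBucketer for day-based directories; MyAvroRecord and ParquetAvroWriter are hypothetical names, and an actual Parquet Writer implementation (e.g. wrapping AvroParquetWriter) would still have to be provided, since BucketingSink does not ship one:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
    import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

    // Sketch: write one S3 directory per day so each directory can be registered
    // as a daily partition of the Athena table. MyAvroRecord and ParquetAvroWriter
    // are hypothetical placeholders.
    static void sinkToS3WithDailyBuckets(DataStream<MyAvroRecord> records) {
        BucketingSink<MyAvroRecord> sink = new BucketingSink<>("s3://my-bucket/events");
        sink.setBucketer(new DateTimeBucketer<MyAvroRecord>("yyyy-MM-dd")); // one folder per day
        sink.setWriter(new ParquetAvroWriter<MyAvroRecord>());              // hypothetical Parquet writer
        sink.setBatchSize(128 * 1024 * 1024);                               // roll files at ~128 MB
        records.addSink(sink);
    }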

HTTP listener source

2017-05-30 Thread Madhukar Thota
Hi, has anyone implemented an HTTP listener as a Flink source, acting as a REST API that receives a JSON payload via the POST method and writes it to Kafka, Kinesis, or any other sink? Any guidance or sample snippet would be appreciated.
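
A minimal sketch of one possible approach, assuming an embedded JDK HttpServer inside a custom source; the class name, port 8080 and the /ingest path are made up, and a real implementation would still need authentication, a bounded queue and back-pressure handling:

    import com.sun.net.httpserver.HttpServer;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

    import java.io.InputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;
    import java.util.Scanner;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    // Hypothetical HTTP "listener" source: an embedded HttpServer accepts POSTed
    // JSON payloads and hands them to the run() loop. With parallelism > 1 each
    // subtask would need its own port.
    public class HttpJsonSource extends RichSourceFunction<String> {

        private transient HttpServer server;
        private transient BlockingQueue<String> queue;
        private volatile boolean running = true;

        @Override
        public void open(Configuration parameters) throws Exception {
            queue = new LinkedBlockingQueue<>();
            server = HttpServer.create(new InetSocketAddress(8080), 0); // port is arbitrary
            server.createContext("/ingest", exchange -> {
                try (InputStream in = exchange.getRequestBody();
                     Scanner s = new Scanner(in, StandardCharsets.UTF_8.name()).useDelimiter("\\A")) {
                    queue.offer(s.hasNext() ? s.next() : "");
                }
                exchange.sendResponseHeaders(202, -1); // accepted, no response body
                exchange.close();
            });
            server.start();
        }

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            while (running) {
                String json = queue.poll(100, TimeUnit.MILLISECONDS);
                if (json != null) {
                    synchronized (ctx.getCheckpointLock()) {
                        ctx.collect(json); // downstream operators can then write to Kafka/Kinesis sinks
                    }
                }
            }
        }

        @Override
        public void cancel() {
            running = false;
        }

        @Override
        public void close() throws Exception {
            if (server != null) {
                server.stop(0);
            }
        }
    }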

Re: Porting batch percentile computation to streaming window

2017-05-30 Thread William Saar
Nice! The solution is actually starting to look quite clean with this in place. Finally, does Flink offer functionality to retrieve information about the current window that a rich function is running on? I don't see anything in the RuntimeContext classes about the current window... As you point
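
For reference (not part of this reply): as of Flink 1.3 a ProcessWindowFunction exposes the current window through its Context; a minimal sketch, where the Long/String types are placeholders:

    import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    // Sketch: the Context handed to process() exposes the current window, which
    // is not reachable through RuntimeContext.
    public class WindowAwareFunction extends ProcessWindowFunction<Long, String, String, TimeWindow> {
        @Override
        public void process(String key, Context ctx, Iterable<Long> elements, Collector<String> out) {
            TimeWindow window = ctx.window();
            out.collect(key + " -> window [" + window.getStart() + ", " + window.getEnd() + ")");
        }
    }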

Does job restart resume from last known internal checkpoint?

2017-05-30 Thread Moiz S Jinia
In a checkpointed Flink job, will a graceful restart make it resume from the last known internal checkpoint? Or are all checkpoints discarded when the job is stopped? If they are discarded, what will be the resume point? Moiz
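
For reference (not from this message): by default checkpoints are discarded when a job is cancelled, but Flink 1.2+ can retain "externalized" checkpoints and resume from one with flink run -s <checkpoint-path>. A minimal sketch, with an illustrative interval:

    import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(60_000); // checkpoint every 60 s (illustrative)
    // Keep the latest checkpoint even when the job is cancelled, so it can serve
    // as the resume point via: flink run -s <checkpoint-path> ...
    env.getCheckpointConfig()
       .enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);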

Re: Kafka partitions -> task slots? (keyed stream)

2017-05-30 Thread Moiz S Jinia
I have just 1 job (that has a ProcessFunction with timers). You're saying that giving more task slots to my job than the number of partitions on the source topic is not going to help. This implies that 1 partition cannot be assigned to more than 1 task slot. That makes sense as otherwise ordering

Re: Kafka partitions -> task slots? (keyed stream)

2017-05-30 Thread Stefan Richter
Hi, it is not restricting the parallelism of your job; it is just that increasing the parallelism of your job's sources beyond 5 will not bring any improvement. All other operators could still benefit from a higher parallelism. > On 30.05.2017 at 09:49, Moiz S Jinia wrote: > > For a keyed stream
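
To illustrate, a sketch that caps only the source at the partition count and gives the downstream keyed operator the higher parallelism; the topic, broker, key extraction and the Kafka 0.10 connector are assumptions:

    import java.util.Properties;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "broker:9092");
    props.setProperty("group.id", "my-group");

    DataStream<String> source = env
        .addSource(new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props))
        .setParallelism(5); // matches the 5 partitions; extra source subtasks would sit idle

    source
        .keyBy(new KeySelector<String, String>() {
            @Override
            public String getKey(String value) {
                return value; // placeholder key extraction
            }
        })
        .map(new MapFunction<String, String>() {
            @Override
            public String map(String value) {
                return value.toUpperCase(); // placeholder for the real per-key logic
            }
        })
        .setParallelism(12) // downstream operators can still use all 12 slots
        .print();

    env.execute("parallelism sketch");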

Re: Problems submitting Flink to Yarn with Kerberos

2017-05-30 Thread Dominique Rondé
Hi Gordon, we use Flink 1.2.0 bundled with Hadoop 2.6 and Scala 2.11, built on 2017-02-02. Cheers, Dominique. On 30.05.2017 at 16:31, Tzu-Li (Gordon) Tai wrote: > Hi Dominique, > > Could you tell us the version / build commit of Flink that you’re using? > > Cheers, > Gordon > > > On 30 May

Re: Problems submitting Flink to Yarn with Kerberos

2017-05-30 Thread Tzu-Li (Gordon) Tai
Hi Dominique, Could you tell us the version / build commit of Flink that you’re using? Cheers, Gordon On 30 May 2017 at 4:29:08 PM, Dominique Rondé (dominique.ro...@allsecur.de) wrote: Hi folks, I just ran into the need to bring Flink into a YARN system that is configured with Kerberos.

Problems submitting Flink to Yarn with Kerberos

2017-05-30 Thread Dominique Rondé
Hi folks, I just ran into the need to bring Flink into a YARN system that is configured with Kerberos. According to the documentation, I changed flink-conf.yaml like this: security.kerberos.login.use-ticket-cache: true security.kerberos.login.contexts: Client I know that providing a keyt
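
For reference (not from this message), a keytab-based setup typically looks roughly like the following in flink-conf.yaml; the principal, keytab path and contexts list are placeholders:

    security.kerberos.login.use-ticket-cache: false
    security.kerberos.login.keytab: /path/to/flink.keytab
    security.kerberos.login.principal: flink-user@EXAMPLE.COM
    security.kerberos.login.contexts: Client,KafkaClient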

Re: Porting batch percentile computation to streaming window

2017-05-30 Thread Gyula Fóra
I think you could actually do a window operation to get the tDigestStream from windowMetricsByIp: windowMetricsByIp.allWindow(SameWindowAsTumblingTimeWindow).fold(...) This way the watermark mechanism should ensure you get all partial results before flushing the global window. Gyula William Saa
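
A rough Java sketch of that idea; TDigest, IpAgg, Partial, PercentileResult and the 1-minute tumbling window are made-up placeholders, and tDigestStream / windowMetricsByIp are the two streams discussed in this thread. The point is only that both partial results for the same window meet in one windowAll before the watermark closes it:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.windowing.AllWindowFunction;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    // Partial is a hypothetical wrapper carrying either the global digest or one
    // per-IP aggregate of the same window.
    DataStream<Partial> combined = tDigestStream
        .map(new MapFunction<TDigest, Partial>() {
            @Override
            public Partial map(TDigest digest) {
                return Partial.ofDigest(digest);
            }
        })
        .union(windowMetricsByIp.map(new MapFunction<IpAgg, Partial>() {
            @Override
            public Partial map(IpAgg agg) {
                return Partial.ofIpAgg(agg);
            }
        }));

    combined
        // the same tumbling window as the two upstream aggregations
        .windowAll(TumblingEventTimeWindows.of(Time.minutes(1)))
        .apply(new AllWindowFunction<Partial, PercentileResult, TimeWindow>() {
            @Override
            public void apply(TimeWindow window, Iterable<Partial> partials, Collector<PercentileResult> out) {
                // once the watermark passes the window end, the global digest and all
                // per-IP aggregates for this window are available here together
            }
        });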

Re: Porting batch percentile computation to streaming window

2017-05-30 Thread William Saar
> This logic now assumes that you get the TDigest result before getting any groupBy metric, which will probably not be the case so you could do some custom buffering in state. Depending on the rate of the stream this might or might not be feasible :) Unfortunately, I think this assumption is a dea

Re: State in Custom Tumble Window Class

2017-05-30 Thread rhashmi
Thanks Aljoscha Krettek. So the results will not be deterministic for late events. For an idempotent update, I would need to derive an additional key based on the current event time if events are late and attach it to the aggregator, which is probably possible by doing some function(maxEventTime, actualEventTime)

Re: How can I increase Flink managed memory?

2017-05-30 Thread Nico Kruber
By default, Flink allocates a fraction of 0.7 (taskmanager.memory.fraction) of the free memory (total memory configured via taskmanager.heap.mb minus memory used for network buffers) for its managed memory. An absolute value may be set using taskmanager.memory.size (overrides the fraction parame
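
An illustrative flink-conf.yaml sketch of the two options (the numbers are made up): with a 4096 MB heap and, say, 500 MB reserved for network buffers, the default fraction yields roughly 0.7 * (4096 - 500) ≈ 2517 MB of managed memory.

    taskmanager.heap.mb: 4096
    # either tune the fraction of the remaining heap...
    taskmanager.memory.fraction: 0.7
    # ...or pin an absolute amount (in MB), which overrides the fraction:
    # taskmanager.memory.size: 2048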

[ANNOUNCE] Flink Forward Berlin (11-13 Sep 2017) Call for Submissions is open now

2017-05-30 Thread Robert Metzger
Dear Flink Community, The Call for Submissions for Flink Forward Berlin 2017 is open now! Since we believe in collaboration, participation and exchange of ideas, we are inviting the Flink community to submit a session. Share your knowledge, applications, use cases and best practices and shape the

Re: Porting batch percentile computation to streaming window

2017-05-30 Thread Gyula Fóra
Hi William, I think the feature you are basically looking for is side inputs, which are not implemented yet, but let me try to give a workaround that might work. If I understand correctly, you have two windowed computations: TDigestStream = allMetrics.windowAll(...).reduce() windowMetricsByIP = allM

Re: Gelly and degree filtering

2017-05-30 Thread Nico Kruber
Does Martin's answer to a similar thread help? https://lists.apache.org/thread.html/000af2fb17a883b60f4a2359ebbeca42e3160c2167a88995c2ee28c2@%3Cuser.flink.apache.org%3E On Monday, 29 May 2017 19:38:20 CEST Martin Junghanns wrote: > Hi Ali :) > > You could compute the degrees beforehand (e.g. u
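
A small sketch of the "compute the degrees beforehand" idea, assuming a Gelly version where Graph#outDegrees() yields Tuple2<K, LongValue>; the graph type and the degree threshold are placeholders:

    import org.apache.flink.api.common.functions.FilterFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.graph.Graph;
    import org.apache.flink.types.LongValue;
    import org.apache.flink.types.NullValue;

    // Degrees are computed up front and filtered; the surviving vertex ids can then
    // be joined back against the vertex set or used to build a sub-graph.
    static DataSet<Tuple2<Long, LongValue>> highDegreeVertices(Graph<Long, NullValue, NullValue> graph) {
        return graph.outDegrees()
            .filter(new FilterFunction<Tuple2<Long, LongValue>>() {
                @Override
                public boolean filter(Tuple2<Long, LongValue> degree) {
                    return degree.f1.getValue() >= 2; // threshold is arbitrary
                }
            });
    }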

Re: No Alerts with FinkCEP

2017-05-30 Thread Dawid Wysakowicz
Hi Biplob, the message you mention should not be a problem here. It just says you can't use your events as POJOs (e.g. you can't use keyBy("chargedAccount")). Your code seems fine, and without some example data I think it will be hard to help you. As for PART 2 of your first email: in 1.3 we
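
For reference, a small sketch of keying by that field with a KeySelector instead of the string-based keyBy; the Event type, its getter and the assumed upstream DataStream<Event> named input are placeholders:

    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.streaming.api.datastream.KeyedStream;

    // input is an assumed DataStream<Event>. keyBy("chargedAccount") needs POJO
    // field access; a KeySelector works for any type.
    KeyedStream<Event, String> keyed = input.keyBy(new KeySelector<Event, String>() {
        @Override
        public String getKey(Event event) {
            return event.getChargedAccount(); // hypothetical getter
        }
    });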

Re: New "Powered by Flink" success case

2017-05-30 Thread Rosellini, Luca
Thank you Fabian! KEEDIO, Luca Rosellini, +34 667 24 38 57, www.keedio.com, C/ Virgilio 25, Pozuelo de Alarcón. 2017-05-27 0:08 GMT+02:00 Fabian Hueske: > Hi Luca, > > thanks for sharing this exciting use case! > I added KEEDIO to Flink

Kafka partitions -> task slots? (keyed stream)

2017-05-30 Thread Moiz S Jinia
For a keyed stream (where the key is also the message key in the source Kafka topic), is the parallelism of the job restricted to the number of partitions in the topic? The source topic has 5 partitions, but the available task slots are 12 (3 task managers, each with 4 slots). Moiz

Re: Duplicated data when using Externalized Checkpoints in a Flink Highly Available cluster

2017-05-30 Thread F.Amara
Hi Gordon, thanks a lot for the reply. The events are produced using a KafkaProducer, submitted to a topic, and thereby consumed by the Flink application using a FlinkKafkaConsumer. I verified that during a failure recovery scenario (of the Flink application) the KafkaProducer was not interrupted, r