Re: S3 recovery and checkpoint directories exhibit explosive growth

2017-07-20 Thread prashantnayak
Wanted to add: we took some stack traces and memory dumps. We will post them or send them to you, but the stack trace indicates that the appmaster is spending a lot of time in the AWS S3 library trying to list an S3 directory (recovery?). Thanks, Prashant

Re: S3 recovery and checkpoint directories exhibit explosive growth

2017-07-20 Thread prashantnayak
Hi Xiaogang and Stephan, Thank you for your response. Sorry about the delay in responding (I was traveling). We've been trying to figure out what triggers this, but your points about the master not being able to delete files "in time" seem to be correct. We've been testing out two different environ

Re: Window Function on AllWindowed Stream - Combining Kafka Topics

2017-07-20 Thread Meera
We couldn't put the map phase in between when working with the stream transformation classes, as it created a dangling Mapper, but doing the partitioner/transformation with the window operator worked. WindowOperator operator = ... KeyGroupStreamPartitioner partitioner = new KeyGroupStreamPartitioner(new Dim

Re: Flink Anomaly Detection

2017-07-20 Thread Suneel Marthi
FWIW, we have built a similar log aggregator internally using an Apache NiFi + KFC stack (KFC = Kafka, Flink, Cassandra): Apache NiFi ingests logs from OpenStack via rsyslog and writes them out to Kafka topics -> Flink Streaming + CEP for detecting anomalous patterns -> persist the pat

RE: Flink Anomaly Detection

2017-07-20 Thread Branham, Jeremy [IT]
Raj, I'm looking for the same thing. As the ML library doesn't support the DataStream API, I'm tossing around ideas, maybe using the windowing function to build up a model that changes over time. Jeremy D. Branham

Flink Anomaly Detection

2017-07-20 Thread Raj Kumar
Hi, I don't see much discussion on anomaly detection using Flink. We are working on a project where we need to monitor server logs in real time. If there is any sudden (unusual) spike in the number of transactions or server errors, we need to create an alert. 1. How can we best achieve this? 2.
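One simple way to think about the "sudden spike" alert described above is to compare each per-window event count against a rolling baseline of recent counts. The sketch below is plain Java with no Flink dependency; `SpikeDetector`, its parameters, and the mean-plus-k-standard-deviations rule are all assumptions made for illustration, not something proposed in the thread. In a Flink job, the counts would presumably come from a windowed aggregation over the log stream.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper (not part of Flink): flags a per-window count as a spike
 * when it exceeds mean + k * stddev of the most recent window counts.
 */
final class SpikeDetector {
    // Floor on the standard deviation so a perfectly flat history does not
    // turn every tiny increase into an alert (assumed value, tune as needed).
    private static final double MIN_STD_DEV = 1.0;

    private final int historySize;
    private final double thresholdSigmas;
    private final Deque<Long> history = new ArrayDeque<>();

    SpikeDetector(int historySize, double thresholdSigmas) {
        if (historySize <= 0) {
            throw new IllegalArgumentException("historySize must be positive");
        }
        this.historySize = historySize;
        this.thresholdSigmas = thresholdSigmas;
    }

    /** Returns true if the count is a spike relative to the history, then records it. */
    boolean observe(long count) {
        boolean spike = false;
        // Only judge once a full baseline of historySize counts is available.
        if (history.size() == historySize) {
            double mean = 0.0;
            for (long c : history) {
                mean += c;
            }
            mean /= history.size();

            double variance = 0.0;
            for (long c : history) {
                variance += (c - mean) * (c - mean);
            }
            double stdDev = Math.max(Math.sqrt(variance / history.size()), MIN_STD_DEV);

            spike = count > mean + thresholdSigmas * stdDev;
            history.removeFirst(); // slide the baseline forward by one window
        }
        history.addLast(count);
        return spike;
    }
}
```

Note that this sketch records spikes in the baseline too, so a sustained surge stops alerting after a few windows; whether that is desirable depends on the alerting requirements.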

Re: Detached execution API

2017-07-20 Thread nragon
Yes, something like that. Although, this would be an MVP for this purpose, with minimal impact on upcoming releases. Moreover, I think clients are too specific to be included as API because each user will have their own way to implement a stop, start, or whatever (of course a base client is welcom

Re: Detached execution API

2017-07-20 Thread Aljoscha Krettek
Hi, I think you might find those two issues interesting: - https://issues.apache.org/jira/browse/FLINK-2313: Change Streaming Driver Execution Model - https://issues.apache.org/jira/browse/FLINK-4272: Create a JobClient for job control and monitoring Best, Aljoscha > On 20. Jul 2017, at 17:51

Re: Problem with Flink restoring from checkpoints

2017-07-20 Thread Aljoscha Krettek
You said you cancel and restart the job. How do you then restart the Job? From a savepoint or externalised checkpoint? Do you also see missing data when using an externalised checkpoint or a savepoint? Best, Aljoscha > On 20. Jul 2017, at 16:15, Francisco Blaya > wrote: > > Forgot to add tha

Detached execution API

2017-07-20 Thread nragon
It would be nice to let users deploy detached jobs through the API. For instance, in *StreamExecutionEnvironment*: public JobExecutionResult execute() throws Exception { return execute(DEFAULT_JOB_NAME, false); } Which keeps backward compatibility. public abstract JobExecutionResult execute(Strin

Re: Kafka control source in addition to Kafka data source

2017-07-20 Thread Gabriele Di Bernardo
Hi Konstantin, Thank you so much for your answer. Yes, I think this is exactly what I need. Thank you. Best, Gabriele > On 18 Jul 2017, at 21:27, Konstantin Knauf > wrote: > > Hi Gabriele, > > I think this is actually a quite common pattern. Generally, you can > `join` the two streams a

Re: Re: Problem with Flink restoring from checkpoints

2017-07-20 Thread Francisco Blaya
Forgot to add that when a job gets cancelled via the UI (this is not the case when the Yarn session is killed) a part file ending in ".pending" does appear in S3, but that never seems to be promoted to finished upon restart of the job On 20 July 2017 at 11:41, Francisco Blaya wrote: > Hi, > > Th

Re: global window trigger

2017-07-20 Thread Aljoscha Krettek
Hi, Yes, you can have state in a WindowFunction if you use Flink’s state abstraction that you can access from a RichWindowFunction using the RuntimeContext. (Or by using a ProcessWindowFunction). Trigger purging behaviour makes a difference if the Trigger fires repeatedly before the watermark

Azkaban Job Type Plugin for Flink

2017-07-20 Thread Yann Pauly
Hi all, We want to integrate our Flink instances with our Azkaban scheduler. For that we will have to create a custom job type plugin for Azkaban. Has anyone already started creating something like that? If not, I guess we will try to! Best, Yann

Re: Re: Problem with Flink restoring from checkpoints

2017-07-20 Thread Francisco Blaya
Hi, Thanks for your answers. @Fabian. I can see that Flink's consumer commits the offsets to Kafka, no problem there. However I'd expect everything that gets read from Kafka to appear in S3 at some point, even if the job gets stopped/killed before flushing and then restarted. And that's what is n

Re: global window trigger

2017-07-20 Thread Aljoscha Krettek
Hi, I’m afraid this will not work well because a WindowAssigner should be stateless, i.e. it should not keep any state in fields. The reason is that there can be several WindowAssigners used on the different partitions and the order in which a WindowAssigner sees the incoming elements is also n
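To illustrate what "stateless" means for window assignment, here is a minimal sketch in plain Java. `StatelessTumblingAssigner` is a hypothetical class written for this example; it is not Flink's `WindowAssigner` interface and does not reproduce its signatures. The point it shows is that the window is derived only from the element's timestamp and a fixed size, so the result is the same regardless of which parallel instance handles the element or in what order elements arrive.

```java
/**
 * Hypothetical sketch (not Flink's WindowAssigner): assigns tumbling windows
 * purely from the element timestamp, keeping no per-element state in fields.
 */
final class StatelessTumblingAssigner {
    // Immutable configuration; never updated while elements are processed.
    private final long sizeMillis;

    StatelessTumblingAssigner(long sizeMillis) {
        if (sizeMillis <= 0) {
            throw new IllegalArgumentException("sizeMillis must be positive");
        }
        this.sizeMillis = sizeMillis;
    }

    /** Inclusive start of the window containing the given timestamp. */
    long windowStart(long timestampMillis) {
        // floorMod keeps the result correct for negative timestamps as well.
        return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }

    /** Exclusive end of the window containing the given timestamp. */
    long windowEnd(long timestampMillis) {
        return windowStart(timestampMillis) + sizeMillis;
    }
}
```

Any logic that needs to remember earlier elements would, following the advice above, belong in Flink-managed state rather than in fields of the assigner.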

Re: cannot use ElasticsearchSink in Flink1.3.0

2017-07-20 Thread Tzu-Li (Gordon) Tai
Hi, There was an issue with the ES 5 release in 1.3.0, and the artifacts were not released to Maven Central. Please use 1.3.1 instead. Cheers, Gordon On 20 July 2017 at 3:31:39 PM, ZalaCheung (gzzhangdesh...@corp.netease.com) wrote: Hi all, I am using Flink 1.3.0 and following instructions here

cannot use ElasticsearchSink in Flink1.3.0

2017-07-20 Thread ZalaCheung
Hi all, I am using Flink 1.3.0 and following the instructions here to add Elasticsearch as a sink: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/elasticsearch.html I follow