Flink Kafka connector 0.8 not updating the offsets in Zookeeper

2016-10-11 Thread Anchit Jatana
Hi All, I'm using the Flink Kafka connector 0.8. I need to check/monitor the offsets of my Flink application's Kafka consumer. When running this: bin/kafka-consumer-groups.sh --zookeeper --describe --group I get the message: No topic available for consumer group provided. Why is the consumer no

Re: Listening to timed-out patterns in Flink CEP

2016-10-11 Thread Sameer W
Try this. Your WMs need to move forward. Also, don't use the system timestamp. Use the timestamp of the element seen as the reference, as the elements are most likely lagging behind the system timestamp. DataStream withTimestampsAndWatermarks = tuples .assignTimestampsAndWatermarks(new AssignerWithPer
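The watermark discipline described here (track the maximum event timestamp seen, lagged by a fixed bound, instead of reading the system clock) can be sketched in plain Java. The class and method names below are illustrative stand-ins for Flink's periodic-watermark assigner, not the actual API:

```java
// Illustrative sketch, not Flink's AssignerWithPeriodicWatermarks itself:
// the watermark follows the maximum event timestamp seen so far, minus a
// fixed allowed lag, so it moves forward with the data rather than the clock.
public class BoundedLagWatermark {
    private final long maxLagMillis;
    private long maxSeenTimestamp = Long.MIN_VALUE;

    public BoundedLagWatermark(long maxLagMillis) {
        this.maxLagMillis = maxLagMillis;
    }

    // Called per element: remember the largest event timestamp seen.
    public long extractTimestamp(long elementTimestamp) {
        maxSeenTimestamp = Math.max(maxSeenTimestamp, elementTimestamp);
        return elementTimestamp;
    }

    // Called periodically: watermark = max seen event timestamp - allowed lag.
    public long getCurrentWatermark() {
        return maxSeenTimestamp == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : maxSeenTimestamp - maxLagMillis;
    }
}
```

Note that a late element (smaller timestamp than the maximum) does not move the watermark backwards.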

Re: Listening to timed-out patterns in Flink CEP

2016-10-11 Thread David Koch
Hello, I tried setting the watermark to System.currentTimeMillis() - 5000L, event timestamps are System.currentTimeMillis(). I do not observe the expected behaviour of the PatternTimeoutFunction firing once the watermark moves past the timeout "anchored" by a pattern match. Here is the complete t

Re: What is the best way to load/add patterns dynamically (at runtime) with Flink?

2016-10-11 Thread Sameer W
This is one of my challenges too: 1. The JavaScript rules are only applicable inside one operator (next, followedBy, notFollowedBy), and the JavaScript rules can apply only to the event in that operator. I make it a little more dynamic by creating a Rules HashMap and adding rules with the names "firs

Re: What is the best way to load/add patterns dynamically (at runtime) with Flink?

2016-10-11 Thread lgfmt
Hi Sameer, I just replied to the earlier post, but I will copy it here: We also have the same requirement - we want to allow the user to change the matching patterns and have them take effect immediately. I'm wondering whether the proposed trigger DSL takes us one step closer: (I don't think it so

Re: What is the best way to load/add patterns dynamically (at runtime) with Flink?

2016-10-11 Thread lgfmt
We also have the same requirement - we want to allow the user to change the matching patterns and have them take effect immediately. I'm wondering whether the proposed trigger DSL takes us one step closer (I don't think it solves the problem), or whether we have to dynamically generate the Flink job JAR file

mapreduce.HadoopOutputFormat config value issue

2016-10-11 Thread Shannon Carey
In Flink 1.1.1, I am seeing what looks like a serialization issue of org.apache.hadoop.conf.Configuration when used with org.apache.flink.api.scala.hadoop.mapreduce.HadoopOutputFormat. When I use the mapred.HadoopOutputFormat version, it works just fine. Specifically, the job fails because "

Re: What is the best way to load/add patterns dynamically (at runtime) with Flink?

2016-10-11 Thread Sameer W
I have used a JavaScript engine in my CEP to evaluate my patterns. Each event is a list of named attributes (HashMap like). And event is attached to a list of rules expressed as JavaScript code (See example below with one rule but I can match as many rules). The rules are distributed over a connec
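A minimal sketch of the rules-HashMap idea from these posts. The original evaluates JavaScript snippets with a script engine; to keep the example self-contained, plain Java predicates stand in for the scripted rules, and all class and rule names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative rules registry: events are maps of named attributes, and
// named rules (Java predicates standing in for JavaScript snippets) are
// kept in a HashMap so they can be added or swapped at runtime.
public class RuleRegistry {
    private final Map<String, Predicate<Map<String, Object>>> rules = new HashMap<>();

    // Register (or replace) a rule under a name, e.g. "first", "second".
    public void addRule(String name, Predicate<Map<String, Object>> rule) {
        rules.put(name, rule);
    }

    // Evaluate one named rule against an event; unknown rules never match.
    public boolean matches(String ruleName, Map<String, Object> event) {
        Predicate<Map<String, Object>> rule = rules.get(ruleName);
        return rule != null && rule.test(event);
    }
}
```

In the scripted variant, `addRule` would compile a JavaScript expression instead of taking a predicate; the lookup-by-name structure is the same.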

What is the best way to load/add patterns dynamically (at runtime) with Flink?

2016-10-11 Thread PedroMrChaves
Hello, I am new to Apache Flink and am trying to build a CEP application using Flink's API. One of the requirements is the ability to add/change patterns at runtime for anomaly detection (maintaining the system's availability). Any ideas of how I could do that? For instance, if I have a stream of security eve

Re: Listening to timed-out patterns in Flink CEP

2016-10-11 Thread Sameer W
Assuming an element arrives with a timestamp earlier than the last emitted watermark, would it just be dropped because the PatternStream does not have a max-allowed-lateness method? In that case it appears that CEP cannot handle late events out of the box yet. If we do want to support late ev

Re: Exception when restoring state from RocksDB - how to recover?

2016-10-11 Thread Josh
Ah ok great, thanks! I will try upgrading sometime this week then. Cheers, Josh

Re: Exception when restoring state from RocksDB - how to recover?

2016-10-11 Thread Stephan Ewen
Hi Josh! I think the master has gotten more stable with respect to that. The issue you mentioned should be fixed. Another big set of changes (the last big batch) is going in over the next few days - this time for re-sharding timers (window operator) and other state that is not organized by key. If you

Re: CEP and slightly out of order elements

2016-10-11 Thread Sameer W
Thanks Till - This is helpful to know. Sameer

Re: Listening to timed-out patterns in Flink CEP

2016-10-11 Thread David Koch
I will give it a try; my current time/watermark assigner extends AscendingTimestampExtractor, so I can't override setting the watermark to the last seen event timestamp. Thanks for your replies. /David

Re: CEP and slightly out of order elements

2016-10-11 Thread Till Rohrmann
Hi Sameer, the CEP operator will take care of ordering the elements. Internally what happens is that the elements are buffered before being applied to the state machine. The operator only applies the elements after it has seen a watermark which is greater than the timestamps of the elements being
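The buffering described here can be sketched as follows. This is an illustrative model of the behavior, not Flink's internal implementation; it assumes elements with timestamps at or below the watermark are released in timestamp order:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative model of the CEP operator's buffering: elements are held
// until a watermark covering their timestamps arrives, then released in
// timestamp order. Elements are modeled as (timestamp, payloadId) pairs.
public class WatermarkBuffer {
    private final PriorityQueue<long[]> buffer =
            new PriorityQueue<>(Comparator.comparingLong((long[] e) -> e[0]));

    public void addElement(long timestamp, long payloadId) {
        buffer.add(new long[] {timestamp, payloadId});
    }

    // On a watermark, release all buffered elements with timestamp <= watermark,
    // already sorted by timestamp; later elements stay buffered.
    public List<Long> onWatermark(long watermark) {
        List<Long> released = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek()[0] <= watermark) {
            released.add(buffer.poll()[1]);
        }
        return released;
    }
}
```

This is why out-of-order elements within the watermark bound are still applied to the state machine in order.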

Re: Listening to timed-out patterns in Flink CEP

2016-10-11 Thread Till Rohrmann
But then no element later than the last emitted watermark must be issued by the sources. If that is the case, then this solution should work. Cheers, Till On Tue, Oct 11, 2016 at 4:50 PM, Sameer W wrote: > Hi, > > If you know that the events are arriving in order and a consistent lag, > why not

Re: PathIsNotEmptyDirectoryException in Namenode HDFS log when using Jobmanager HA in YARN

2016-10-11 Thread Stephan Ewen
Hi! I think to some extent this is expected. There is some cleanup code that deletes files and then issues parent-directory remove requests. It relies on the fact that the parent directory is only removed if it is empty (after the last file was deleted). Is this a problem right now, or just a co

Re: more complex patterns for CEP (was: CEP two transitions to the same state)

2016-10-11 Thread lgfmt
Thanks, Till. I will wait for your response. - LF

Re: Exception when restoring state from RocksDB - how to recover?

2016-10-11 Thread Josh
Hi Stephan, Thanks, that sounds good! I'm planning to upgrade to Flink 1.2-SNAPSHOT as soon as possible - I was delaying upgrading due to the issues with restoring operator state you mentioned on my other thread here: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-job-f

PathIsNotEmptyDirectoryException in Namenode HDFS log when using Jobmanager HA in YARN

2016-10-11 Thread static-max
Hi, I get many (multiple times per minute) errors in my Namenode HDFS logfile:
2016-10-11 17:17:07,596 INFO ipc.Server (Server.java:logException(2401)) - IPC Server handler 295 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from datanode1:34872 Call#2361 Retry#0 org.apache.h

Re: Exception when restoring state from RocksDB - how to recover?

2016-10-11 Thread Stephan Ewen
Hi Josh! There are two ways to improve the RocksDB / S3 behavior: (1) Use the FullyAsync mode. It stores the data in one file, not in a directory. Since directories are the "eventually consistent" part of S3, this prevents many issues. (2) Flink 1.2-SNAPSHOT has some additional fixes that circumven

Re: Listening to timed-out patterns in Flink CEP

2016-10-11 Thread Sameer W
Hi, If you know that the events are arriving in order and a consistent lag, why not just increment the watermark time every time the getCurrentWatermark() method is invoked based on the autoWatermarkInterval (or less to be conservative). You can check if the watermark has changed since the arriva
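The suggestion above (keep advancing the watermark even when no new elements arrive, so that timeouts can still fire) might look roughly like this in plain Java. The class is a hypothetical stand-in, not Flink's assigner interface:

```java
// Hypothetical sketch: if no element has arrived since the last
// getCurrentWatermark() call, advance the watermark by (roughly) the
// auto-watermark interval anyway, so event time keeps moving forward.
public class SelfAdvancingWatermark {
    private final long advanceStepMillis;
    private long lastEventTimestamp = Long.MIN_VALUE;
    private long currentWatermark = Long.MIN_VALUE;

    public SelfAdvancingWatermark(long advanceStepMillis) {
        this.advanceStepMillis = advanceStepMillis;
    }

    // Called per element: remember the newest event timestamp seen.
    public void onElement(long eventTimestamp) {
        lastEventTimestamp = Math.max(lastEventTimestamp, eventTimestamp);
    }

    // Called periodically: catch up to the newest event, or, if nothing new
    // arrived, push the watermark forward by one step.
    public long getCurrentWatermark() {
        if (lastEventTimestamp > currentWatermark) {
            currentWatermark = lastEventTimestamp;
        } else {
            currentWatermark += advanceStepMillis;
        }
        return currentWatermark;
    }
}
```

As noted in the thread, this only works if the sources never emit an element with a timestamp behind the already-advanced watermark.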

Re: Exception when restoring state from RocksDB - how to recover?

2016-10-11 Thread Josh
Hi Aljoscha, Yeah I'm using S3. Is this a known problem when using S3? Do you have any ideas on how to restore my job from this state, or prevent it from happening again? Thanks, Josh

Re: Handling decompression exceptions

2016-10-11 Thread Yassine MARZOUGUI
Thank you Fabian and Flavio for your help. Best, Yassine

Re: About Sliding window

2016-10-11 Thread Zhangrucong
Hi Kostas: Thank you for your rapid response! My use-case is this: for every incoming event, we want to age out the out-of-date events, count the events in the window and send the result. For example, the events arrive as follows: [inline image] We want the following result: [inline image]
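Assuming the use-case is "on every incoming event, drop events older than the window length and emit the current count", a plain-Java sketch could look like the following. This is not a built-in Flink window; all names are illustrative:

```java
import java.util.ArrayDeque;

// Illustrative per-event sliding count: each incoming event evicts events
// older than the window length and returns the count of the remainder.
// Assumes events arrive in timestamp order.
public class PerEventSlidingCount {
    private final long windowMillis;
    private final ArrayDeque<Long> timestamps = new ArrayDeque<>();

    public PerEventSlidingCount(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Returns the number of events within [eventTime - windowMillis, eventTime].
    public int onEvent(long eventTime) {
        timestamps.addLast(eventTime);
        while (timestamps.peekFirst() < eventTime - windowMillis) {
            timestamps.pollFirst(); // age out the out-of-date events
        }
        return timestamps.size();
    }
}
```

In Flink this logic would live inside a stateful function, since, as Kostas notes, sliding windows only support time-based slides out of the box.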

Re: Merging N parallel/partitioned WindowedStreams together, one-to-one, into a global window stream

2016-10-11 Thread Aljoscha Krettek
Hi, are you windowing based on event time? Cheers, Aljoscha On Fri, 7 Oct 2016 at 09:28 Fabian Hueske wrote: > If you are using time windows, you can access the TimeWindow parameter of > the WindowFunction.apply() method. > The TimeWindow contains the start and end timestamp of a window (as Lon

Re: Exception when restoring state from RocksDB - how to recover?

2016-10-11 Thread Aljoscha Krettek
Hi, you are using S3 to store the checkpoints, right? It might be that you're running into a problem with S3 "directory listings" not being consistent. Cheers, Aljoscha On Tue, 11 Oct 2016 at 12:40 Josh wrote: Hi all, I just have a couple of questions about checkpointing and restoring state f

Re: Handling decompression exceptions

2016-10-11 Thread Flavio Pompermaier
I posted a workaround for that at https://github.com/okkam-it/flink-examples/blob/master/src/main/java/it/okkam/datalinks/batch/flink/datasourcemanager/importers/Csv2RowExample.java

Re: Handling decompression exceptions

2016-10-11 Thread Fabian Hueske
Hi, Flink's string parser does not support escaped quotes. Your data contains a doubled double quote (""). The parser identifies this as the end of the string field. As a workaround, you can read the file as a regular text file, line by line, and do the parsing in a MapFunction. Best, Fabian
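The workaround above (read each line as text and parse it in a MapFunction) needs a small parser that treats a doubled quote inside a quoted field as an escaped quote. A self-contained sketch of such a helper, with illustrative names, assuming RFC 4180-style escaping:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative line parser for a MapFunction body: splits a CSV line on the
// delimiter, honoring quoted fields where "" means a literal quote character.
public class QuotedCsvLineParser {
    public static List<String> parse(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"'); // "" inside quotes -> literal quote
                    i++;               // skip the second quote
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == delimiter && !inQuotes) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }
}
```

For example, the line `a,"b""c",d` parses to the three fields `a`, `b"c`, `d`.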

CEP and slightly out of order elements

2016-10-11 Thread Sameer W
Hi, if using CEP with event time I have events which can be slightly out of order, and I want to sort them by timestamp within their time windows before applying CEP. For example, if using 5 second windows and I use the following ds2 = ds.keyBy.window(TumblingWindow(10 seconds).apply(/*Sort by Ti
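The sort step sketched above, inside the window-apply function, could look like this in plain Java. Elements are modeled as (timestamp, payload-id) pairs and the names are illustrative, not a real WindowFunction:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative body of a window-apply function: buffer the window's
// elements and emit them sorted by timestamp before handing them to CEP.
public class SortByTimestamp {
    // Each element: {timestamp, payloadId}. Returns payload ids in time order.
    public static List<Long> sortWindow(List<long[]> windowElements) {
        List<long[]> copy = new ArrayList<>(windowElements);
        copy.sort(Comparator.comparingLong((long[] e) -> e[0]));
        List<Long> ordered = new ArrayList<>();
        for (long[] e : copy) {
            ordered.add(e[1]);
        }
        return ordered;
    }
}
```

As Till's reply in this thread points out, the CEP operator already buffers and orders elements by watermark, so this pre-sort may be unnecessary.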

Re: About Sliding window

2016-10-11 Thread Kostas Kloudas
Hi Zhangrucong, Sliding windows only support time-based slide. So your use-case is not supported out-of-the-box. But, if you describe a bit more what you want to do, we may be able to find a way together to do your job using the currently offered functionality. Kostas > On Oct 11, 2016, at 1

Re: Handling decompression exceptions

2016-10-11 Thread Yassine MARZOUGUI
Forgot to add parseQuotedStrings('"'). After adding it I'm getting the same exception with the second code too.

Re: Handling decompression exceptions

2016-10-11 Thread Yassine MARZOUGUI
Hi Fabian, I tried to debug the code, and it turns out a line in my csv data is causing the ArrayIndexOutOfBoundsException. Here is the exception stacktrace:
java.lang.ArrayIndexOutOfBoundsException: -1
  at org.apache.flink.types.parser.StringParser.parseField(StringParser.java:49)
  at org.apache.f

About Sliding window

2016-10-11 Thread Zhangrucong
Hello everyone: I want to use the DataStream sliding window API. Looking at the API I have a question: does the sliding time window support sliding by every incoming event? Thanks in advance!

Exception when restoring state from RocksDB - how to recover?

2016-10-11 Thread Josh
Hi all, I just have a couple of questions about checkpointing and restoring state from RocksDB. 1) In some cases, I find that it is impossible to restore a job from a checkpoint, due to an exception such as the one pasted below[*]. In this case, it appears that the last checkpoint is somehow co

Re: Listening to timed-out patterns in Flink CEP

2016-10-11 Thread Till Rohrmann
Hi David, the problem is still that there is no corresponding watermark saying that 4 seconds have now passed. With your code, watermarks will be periodically emitted but the same watermark will be emitted until a new element arrives which will reset the watermark. Thus, the system can never know

Re: jdbc.JDBCInputFormat

2016-10-11 Thread Alberto Ramón
I will check it tonight. Thanks.

Re: Create stream from multiple files using "env.readFile(format, input1, FileProcessingMode.PROCESS_CONTINUOUSLY, 1000, FilePathFilter.createDefaultFilter())" ?

2016-10-11 Thread Aljoscha Krettek
Hi, how does "doesn't work" manifest? Cheers, Aljoscha On Wed, 28 Sep 2016 at 22:54 Anchit Jatana wrote: > Hi All, > > I have a use case where in need to create multiple source streams from > multiple files and monitor the files for any changes using the " > FileProcessingMode.PROCESS_CONTINUOU

Re: more complex patterns for CEP (was: CEP two transitions to the same state)

2016-10-11 Thread Till Rohrmann
The timeline is hard to predict to be honest. It depends a little bit on how fast the community can proceed with these things. At the moment I'm personally involved in other issues and, thus, cannot work on the CEP library. I hope to get back to it soon. Cheers, Till On Sat, Oct 8, 2016 at 12:42

Re: jdbc.JDBCInputFormat

2016-10-11 Thread Timo Walther
I have opened a PR (https://github.com/apache/flink/pull/2619). Would be great if you could try it and comment if it solves your problem. Timo On 10/10/16 at 17:48, Timo Walther wrote: I could reproduce the error locally. I will prepare a fix for it. Timo On 10/10/16 at 11:54, Alberto

Re: Handling decompression exceptions

2016-10-11 Thread Fabian Hueske
Hi Yassine, I ran your code without problems and got the correct result. Can you provide the stacktrace of the exception? Thanks, Fabian 2016-10-10 10:57 GMT+02:00 Yassine MARZOUGUI : > Thank you Fabian and Stephan for the suggestions. > I couldn't override "readLine()" because it's final, so w

"Slow ReadProcessor" warnings when using BucketSink

2016-10-11 Thread static-max
Hi, I have a low-throughput job (approx. 1000 messages per minute) that consumes from Kafka and writes directly to HDFS. After an hour or so, I get the following warnings in the Task Manager log: 2016-10-10 01:59:44,635 WARN org.apache.hadoop.hdfs.DFSClient - Slow ReadProcessor

Re: Current alternatives for async I/O

2016-10-11 Thread Fabian Hueske
Hi Ken, I think your solution should work. You need to make sure, though, that you properly manage the state of your function, i.e., memorize all records which have been received but haven't been emitted yet. Otherwise records might get lost in case of a failure. Alternatively, you can implement thi