Re: Python API not working

2018-01-04 Thread Yassine MARZOUGUI
Hi all, Any ideas on this? 2017-12-15 15:10 GMT+01:00 Yassine MARZOUGUI : > Hi Ufuk, > > Thanks for your response. Unfortunately specifying `streaming` or `batch` > doesn't work; it looks like the mode should be either "plan" or "operator", > and t

Re: Python API not working

2017-12-15 Thread Yassine MARZOUGUI
lebi : > Hey Yassine, > > let me include Chesnay (cc'd) who worked on the Python API. > > I'm not familiar with the API and what it expects, but try entering > `streaming` or `batch` for the mode. Chesnay probably has the details. > > – Ufuk > > >

Python API not working

2017-12-15 Thread Yassine MARZOUGUI
Hi All, I'm trying to use Flink with the python API, and started with the wordcount example from the documentation. I'm using Flink 1.4 and python 2.7. When running env.execute(local=True), the comm

Re: notNext() and next(negation) not yielding same output in Flink CEP

2017-07-24 Thread Yassine MARZOUGUI
be “c a1 a2 a3 d” which is what > you expect, I think. In the other version with notNext(A) a partial match > “c a1” will be discarded after “a2” as the notNext says that after the A’s > there should be no A. > > I hope this helps understanding how notNext works. > > Regards, >

notNext() and next(negation) not yielding same output in Flink CEP

2017-07-22 Thread Yassine MARZOUGUI
Hi all, I would like to match the maximal consecutive sequences of events of type A in a stream. I'm using the following: Pattern.begin("start").where(event is not A) .next("middle").where(event is A).oneOrMore().consecutive() .next("not").where(event is not A) This gives the output I want.
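The matching logic this CEP pattern expresses can be sketched in plain Java (not Flink CEP code; the event type is reduced to a single char for brevity): collect the maximal consecutive runs of A events that are bounded by a non-A event on both sides.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the pattern above: a run of A's counts only if it is
// preceded ("start": not A) and followed ("not": not A) by a non-A event,
// which is exactly what makes the run maximal.
public class MaximalRuns {
    public static List<String> maximalARuns(String events) {
        List<String> runs = new ArrayList<>();
        int i = 0;
        while (i < events.length()) {
            if (events.charAt(i) == 'A') {
                int start = i;
                while (i < events.length() && events.charAt(i) == 'A') i++;
                // bounded on both sides by a non-A event
                if (start > 0 && i < events.length()) {
                    runs.add(events.substring(start, i));
                }
            } else {
                i++;
            }
        }
        return runs;
    }
}
```

A run touching the start or end of the input is not emitted, mirroring the fact that the pattern requires both the "start" and the "not" step to match.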

Re: Behaviour of the BucketingSink when checkpoints fail

2017-04-28 Thread Yassine MARZOUGUI
t you know are correct > to “final”? > > Best, > Aljoscha > > On 28. Apr 2017, at 11:22, Yassine MARZOUGUI > wrote: > > Hi all, > > I have a failed job containing a BucketingSink. The last successful > checkpoint was before the source started emitting data.

Behaviour of the BucketingSink when checkpoints fail

2017-04-28 Thread Yassine MARZOUGUI
Hi all, I have a failed job containing a BucketingSink. The last successful checkpoint was before the source started emitting data. The following checkpoints all failed due to the long timeout as I mentioned here: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpoints-v

Re: Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine MARZOUGUI
obmanager_heap_mb: 1024 >> >> Basic program structure: >> >> 1) read batch from Kinesis >> >> 2) Split batch and shuffle using custom partitioner (consistent hashing). >> >> 3) enrich using external REST service >> >> 4) Write to database
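The "custom partitioner (consistent hashing)" in step 2 of the quoted pipeline can be sketched as follows. This is a hedged plain-Java illustration, not code from the thread: node names, the virtual-node count, and the use of String.hashCode are all assumptions.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring: a key maps to the first node clockwise on the
// ring, so adding or removing a node only remaps a fraction of the keys.
public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String node) {
        // virtual nodes smooth the key distribution across physical nodes
        for (int replica = 0; replica < 16; replica++) {
            ring.put((node + "#" + replica).hashCode(), node);
        }
    }

    public String nodeFor(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(key.hashCode());
        // wrap around to the first ring position when past the last node
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }
}
```

The stability property (the same key always lands on the same node while membership is unchanged) is what makes this usable as a partitioner.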

Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine Marzougui
Hi all, I have a streaming pipeline as follows: 1 - read a folder continuously from HDFS 2 - filter duplicates (using keyBy(x->x) and keeping a state per key indicating whether it is seen) 3 - schedule some future actions on the stream using ProcessFunction and processing time timers (elements a
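The deduplication logic in step 2 can be sketched in plain Java: keyBy(x -> x) partitions the stream by the record itself, and the per-key "seen" flag then reduces to set membership. This shows the logic only, not the Flink ValueState-based operator.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A HashSet stands in for the per-key keyed state: a record passes the filter
// only the first time its key is observed.
public class Dedup {
    private final Set<String> seen = new HashSet<>();

    public List<String> filterDuplicates(List<String> records) {
        List<String> out = new ArrayList<>();
        for (String r : records) {
            if (seen.add(r)) {   // add() returns false if already present
                out.add(r);
            }
        }
        return out;
    }
}
```

In the actual job the state is scoped per key by the runtime, which is why a single boolean per key is enough; the set here simulates that scoping in one process.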

Fwd: Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine MARZOUGUI
-- Forwarded message -- From: "Yassine MARZOUGUI" Date: Apr 23, 2017 20:53 Subject: Checkpoints very slow with high backpressure To: Cc: Hi all, I have a streaming pipeline as follows: 1 - read a folder continuously from HDFS 2 - filter duplicates (using keyBy

Re: Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine MARZOUGUI
I'm sorry guys if you received multiple instances of this mail, I kept trying to send it yesterday, but it looks like the mailing list was stuck and didn't dispatch it until now. Sorry for the disturbance. On Apr 23, 2017 20:53, "Yassine MARZOUGUI" wrote: > Hi all, > > I have

Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine MARZOUGUI
Hi all, I have a streaming pipeline as follows: 1 - read a folder continuously from HDFS 2 - filter duplicates (using keyBy(x->x) and keeping a state per key indicating whether it is seen) 3 - schedule some future actions on the stream using ProcessFunction and processing time timers (elements a

Re: Flink Checkpoint runs slow for low load stream

2017-04-24 Thread Yassine MARZOUGUI
Hi all, I am experiencing a similar problem but with HDFS as a source instead of Kafka. I have a streaming pipeline as follows: 1 - read a folder continuously from HDFS 2 - filter duplicates (using keyBy(x->x) and keeping a state per key indicating whether it is seen) 3 - schedule some future ac

Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine MARZOUGUI
Re-sending as it looks like the previous mail wasn't correctly sent --- Hi all, I have a streaming pipeline as follows: 1 - read a folder continuously from HDFS 2 - filter duplicates (using keyBy(x->x) and keeping a state per key indicating whether it is seen) 3

Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine MARZOUGUI

Checkpoints very slow with high backpressure

2017-04-23 Thread Yassine Marzougui

Checkpoints very slow with high backpressure

2017-04-23 Thread Yassine MARZOUGUI
Hi all, I have a streaming pipeline as follows: 1 - read a folder continuously from HDFS 2 - filter duplicates (using keyBy(x->x) and keeping a state per key indicating whether it is seen) 3 - schedule some future actions on the stream using ProcessFunction and processing time timers (elements a

Re: Windows emit results at the end of the stream

2017-03-24 Thread Yassine MARZOUGUI
Hi Sonex, I don't know Scala as well as I know Java, but I guess it should be correct if no error is raised. The behaviour you described seems weird to me and should not happen. I'm unfortunately unable to identify an apparent cause; maybe someone on the mailing list can shed light on that. Best,

Re: Windows emit results at the end of the stream

2017-03-23 Thread Yassine MARZOUGUI
Hi Sonex, When using readTextFile(...) with event time, only one watermark with the value Long.MAX_VALUE is sent at the end of the stream, which explains why the windows are stored until the whole file is processed. In order to have periodic watermarks, you need to process the file continuously as
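The periodic-watermark behaviour contrasted here with the single end-of-stream watermark can be sketched in plain Java. This follows the rule Flink's BoundedOutOfOrdernessTimestampExtractor applies, but is not the actual Flink interface: the watermark trails the highest timestamp seen so far by a fixed out-of-orderness bound.

```java
// Each call to onEvent advances the max timestamp; currentWatermark is what a
// periodic watermark emitter would report, letting windows that end at or
// before that time fire while the stream is still being read.
public class BoundedOutOfOrdernessWatermark {
    private final long maxOutOfOrdernessMs;
    private long maxTimestamp = Long.MIN_VALUE;

    public BoundedOutOfOrdernessWatermark(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    public void onEvent(long timestampMs) {
        maxTimestamp = Math.max(maxTimestamp, timestampMs);
    }

    public long currentWatermark() {
        return maxTimestamp - maxOutOfOrdernessMs;
    }
}
```

With readTextFile(...) no such intermediate watermarks are produced, so every window waits for the final Long.MAX_VALUE watermark.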

ProcessingTimeTimer in ProcessFunction after a savepoint

2017-03-17 Thread Yassine MARZOUGUI
Hi all, How does the processing time timer behave when a job is taken down with a savepoint and then restarted after the timer was supposed to fire? Will the timer fire at restart because it was missed during the savepoint? I'm wondering because I would like to schedule periodic timers in the fut

Re: Appropriate State to use to buffer events in ProcessFunction

2017-03-16 Thread Yassine MARZOUGUI
ds are used. But given that the state may grow up to ~100GB, > RocksDB statebackends are strongly recommended. > > May the information helps you. > > Regards, > Xiaogang > > 2017-03-09 23:19 GMT+08:00 Yassine MARZOUGUI : > >> Hi Timo, >> >> I thought abo

Re: Late Events with BoundedOutOfOrdernessTimestampExtractor and allowed lateness

2017-03-15 Thread Yassine MARZOUGUI
Hi Nico, You might check Fabian's answer on a similar question I posted previously on the mailing list; it can be helpful: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/BoundedOutOfOrdernessTimestampExtractor-and-allowedlateness-td9583.html Best, Yassine On Mar 15, 2017 1

Re: Question about processElement(...) and onTimer(...) in ProcessFunction

2017-03-15 Thread Yassine MARZOUGUI
red by > watermarks which are internally handled as records) and I'm pretty sure > that the behavior is the same for processing time timers. > @Kostas (in CC) please correct me if I'm wrong here. > > Best, Fabian > > 2017-03-14 8:04 GMT+01:00 Yassine MARZOUGUI : &

Question about processElement(...) and onTimer(...) in ProcessFunction

2017-03-14 Thread Yassine MARZOUGUI
Hi all, In ProcessFunction, does processElement() still get called on incoming elements when onTimer() is called, or are elements buffered until onTimer() returns? I am wondering because both processElement() and onTimer() can access and manipulate the state, so if for example state.clear() is call

Re: Appropriate State to use to buffer events in ProcessFunction

2017-03-09 Thread Yassine MARZOUGUI
has > arrived. > If you use a RocksDB as state backend, 100+ GB of state should not be a > problem. Have you thought about using Flink's CEP library? It might fit to > your needs without implementing a custom process function. > > I hope that helps. > > Timo > > &g

Appropriate State to use to buffer events in ProcessFunction

2017-03-07 Thread Yassine MARZOUGUI
Hi all, I want to label events in a stream based on a condition on some future events. For example my stream contains events of type A and B, and I would like to assign a label 1 to an event E of type A if an event of type B happens within a duration x of E. I am using event time and my events
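The labelling rule described here can be sketched in plain Java on already time-ordered arrays instead of a stream: an event of type A gets label 1 if an event of type B occurs within x time units after it, else 0. This is a hedged illustration of the rule only; the streaming job would instead buffer pending A events in keyed state inside a ProcessFunction and resolve them as B events (or timers) arrive.

```java
import java.util.ArrayList;
import java.util.List;

// type[i] is the event type ('A' or 'B'), ts[i] its timestamp; both arrays
// are assumed sorted by timestamp, as event time ordering would provide.
public class FutureLabeler {
    public static List<Integer> labelA(char[] type, long[] ts, long x) {
        List<Integer> labels = new ArrayList<>();
        for (int i = 0; i < type.length; i++) {
            if (type[i] != 'A') continue;
            int label = 0;
            for (int j = i + 1; j < type.length; j++) {
                if (type[j] == 'B' && ts[j] - ts[i] <= x) {
                    label = 1;
                    break;
                }
            }
            labels.add(label);
        }
        return labels;
    }
}
```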

Re: AsyncIO/QueryableStateClient hanging with high parallelism

2017-03-06 Thread Yassine MARZOUGUI
nce I increased the parallelism) the job simply doesn't finish. I solved the problem by using resultFuture.get(), which raised the appropriate exceptions when they happened and failed the job. Best, Yassine 2017-03-06 15:53 GMT+01:00 Yassine MARZOUGUI : > Hi all, > > I set up a job with sim

AsyncIO/QueryableStateClient hanging with high parallelism

2017-03-06 Thread Yassine MARZOUGUI
Hi all, I set up a job with a simple queryable state sink and tried to query it from another job using the new Async I/O API. Everything worked as expected, except when I tried to increase the parallelism of the querying job it hung. As you can see in the attached image, when the parallelism is 5 (e

OutOfMemory error (Direct buffer memory) while allocating the TaskManager off-heap memory

2017-03-03 Thread Yassine MARZOUGUI
Hi all, I tried starting a local Flink 1.2.0 cluster using start-local.sh, with the following settings for the taskmanager memory: taskmanager.heap.mb: 16384 taskmanager.memory.off-heap: true taskmanager.memory.preallocate: true That throws an OOM error: Caused by: java.lang.Exception: OutOfMem

Re: Http Requests from Flink

2017-03-02 Thread Yassine MARZOUGUI
Hi Ulf, I've done HTTP requests in Flink using the OkHttp library. I found it practical and easy to use. Here is how I used it to make a POST request for each incoming element in the stream and output the response: DataStream stream = stream.map(new RichMapFunct

Re: Flink requesting external web service with rate limited requests

2017-02-28 Thread Yassine MARZOUGUI
Hi Fabian, I have a related question regarding throttling at the source: If there is a sleep in the source as in ContinuousFileMonitoringFunction.java

RE: Aggregation problem.

2017-02-18 Thread Yassine MARZOUGUI
Hi, I think this is an expected output and not necessarily a bug. To get the element having the maximum value, maxBy() should be used instead of max(). See this answer for more details: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Wrong-and-non-consistent-behavior-of-max-t
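The difference between max() and maxBy() can be sketched in plain Java over (id, value) pairs. This is a hedged reduction of the rolling-aggregation semantics to ordinary loops, not Flink code: max() keeps the running record and only replaces the aggregated field, while maxBy() keeps the whole record that holds the maximum.

```java
// records[i] = {id, value}; the aggregation is over the value field (index 1).
public class MaxVsMaxBy {
    // max(): the non-aggregated id field stays from the first record,
    // which is the surprising behaviour the thread describes.
    public static long[] max(long[][] records) {
        long[] acc = records[0].clone();
        for (int i = 1; i < records.length; i++) {
            acc[1] = Math.max(acc[1], records[i][1]);
        }
        return acc;
    }

    // maxBy(): the entire record containing the maximum value is kept.
    public static long[] maxBy(long[][] records) {
        long[] acc = records[0];
        for (int i = 1; i < records.length; i++) {
            if (records[i][1] > acc[1]) acc = records[i];
        }
        return acc.clone();
    }
}
```

With input {1,5}, {2,9}, {3,7}, max() yields a mixed record {1,9} while maxBy() yields the actual maximal element {2,9}.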

Re: JavaDoc 404

2017-02-14 Thread Yassine MARZOUGUI
release-1.2/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html The build for master should be done in 30 minutes. On Wed, Feb 8, 2017 at 10:49 AM, Yassine MARZOUGUI < y.marzou...@mindlytix.com> wrote: > Thanks Robert and Ufuk for the update. > > 2017-02-07

Re: Flink 1.2 Maven dependency

2017-02-09 Thread Yassine MARZOUGUI
Hi, I could find the dependency here: https://search.maven.org/#artifactdetails%7Corg.apache.flink%7Cflink-core%7C1.2.0%7Cjar , I wonder why it still doesn't show in http://mvnrepository.com/artifact/org.apache.flink/flink-core. The dependency version for Flink 1.2 is 1.2.0. org.apache.fli

Re: JavaDoc 404

2017-02-08 Thread Yassine MARZOUGUI
asap. Sorry for the inconvenience. >> >> On Mon, Feb 6, 2017 at 4:43 PM, Ufuk Celebi wrote: >> >>> Thanks for reporting this. I think Robert (cc'd) is working in fixing >>> this, correct? >>> >>> On Sat, Feb 4, 2017 at 12:12 PM, Yassine

JavaDoc 404

2017-02-04 Thread Yassine MARZOUGUI
Hi, The JavaDoc link of BucketingSink in this page[1] leads to a 404 error. I couldn't find the correct url. The broken link: https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html Other pages in the JavaDoc, like

Re: Externalized Checkpoints vs Periodic Checkpoints

2017-02-02 Thread Yassine MARZOUGUI
rdinator only retains the last > successfully completed checkpoint. This means that whenever a new > checkpoint completes then the last completed checkpoint will be discarded. > This also applies to externalized checkpoints. > > Cheers, > Till > > On Wed, Feb 1, 2017 at 2:03 PM, Y

Externalized Checkpoints vs Periodic Checkpoints

2017-02-01 Thread Yassine MARZOUGUI
Hi all, Could someone clarify the difference between externalized checkpoints[1] and regular periodic checkpoints[2]? Moreover, I have a question regarding the retention of checkpoints: For regular checkpoints, does the last checkpoint discard the previous ones? If yes, is that the case too for th

Re: Rate-limit processing

2017-01-20 Thread Yassine MARZOUGUI
Hi, You might find this similar thread from the mailing list archive helpful: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/throttled-stream-td6138.html . Best, Yassine 2017-01-20 10:53 GMT+01:00 Florian König : > Hi, > > I need to limit the rate of processing in a Flink

Re: Continuous File monitoring not reading nested files

2017-01-09 Thread Yassine MARZOUGUI
. @Kostas, If you haven't already started working on a fix for this, I would happily contribute a fix for it if you like. Best, Yassine 2017-01-09 17:23 GMT+01:00 Yassine MARZOUGUI : > Hi Kostas, > > I debugged the code and the nestedFileEnumeration parameter was always > true during

Re: Continuous File monitoring not reading nested files

2017-01-09 Thread Yassine MARZOUGUI
rameter is still true? > > I am asking in order to pin down if the problem is in the way we ship the > code to the tasks or in reading the > nested files. > > Thanks, > Kostas > > On Jan 9, 2017, at 12:56 PM, Yassine MARZOUGUI > wrote: > > Hi, > > Any upd

Re: Continuous File monitoring not reading nested files

2017-01-09 Thread Yassine MARZOUGUI
Hi, Any updates on this issue? Thank you. Best, Yassine On Dec 20, 2016 6:15 PM, "Aljoscha Krettek" wrote: +kostas, who probably has the most experience with this by now. Do you have an idea what might be going on? On Fri, 16 Dec 2016 at 15:45 Yassine MARZOUGUI wrote: > Looks

Re: Triggering a savepoint failed the Job

2017-01-05 Thread Yassine MARZOUGUI
/ > jira/browse/FLINK-5407 > > We'll look into it as part of the 1.2 release testing. If you have any > more details that may help diagnose/fix that, would be great if you could > share them with us. > > Thanks, > Stephan > > > On Wed, Jan 4, 2017 at 10:52 AM, Y

Triggering a savepoint failed the Job

2017-01-04 Thread Yassine MARZOUGUI
Hi all, I tried to trigger a savepoint for a streaming job, both the savepoint and the job failed. The job failed with the following exception: java.lang.RuntimeException: Error while triggering checkpoint for IterationSource-7 (1/1) at org.apache.flink.runtime.taskmanager.Task$3.run(Tas

Re: Continuous File monitoring not reading nested files

2016-12-16 Thread Yassine MARZOUGUI
Looks like this is not specific to the continuous file monitoring, I'm having the same issue (files in nested directories are not read) when using: env.readFile(fileInputFormat, "hdfs:///shared/mydir", FileProcessingMode.PROCESS_ONCE, -1L) 2016-12-16 11:12 GMT+01:00 Yassine MARZOU

Continuous File monitoring not reading nested files

2016-12-16 Thread Yassine MARZOUGUI
Hi all, I'm using the following code to continuously process files from a directory "mydir". final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); FileInputFormat fileInputFormat = new TextInputFormat(new Path("hdfs:///shared/mydir")); fileInputFormat.setNe

Re: In 1.2-SNAPSHOT, EventTimeSessionWindows are not firing until the whole stream is processed

2016-12-14 Thread Yassine MARZOUGUI
o reproduce the problem. The cause is that we recently changed >> how the timer service is being cleaned up and now the watermark timers are >> not firing anymore. >> >> I'll keep you posted and hope to find a solution fast. >> >> Cheers, >> Aljoscha

Re: Incremental aggregations - Example not working

2016-12-12 Thread Yassine MARZOUGUI
0 Matt : > I'm using 1.2-SNAPSHOT, should it work in that version? > > On Mon, Dec 12, 2016 at 12:18 PM, Yassine MARZOUGUI < > y.marzou...@mindlytix.com> wrote: > >> Hi Matt, >> >> What version of Flink are you using? >> The incremental agregation

Re: Incremental aggregations - Example not working

2016-12-12 Thread Yassine MARZOUGUI
Hi Matt, What version of Flink are you using? The incremental aggregation with fold(ACC, FoldFunction, WindowFunction) is a new change that will be part of Flink 1.2; for Flink 1.1 the correct way to perform incremental aggregations is: apply(ACC, FoldFunction, WindowFunction) (see the docs for
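The incremental aggregation being discussed can be sketched in plain Java (a hedged illustration, not the Flink API): the fold function updates a small accumulator per element so the window never buffers its elements, and a second step formats the final accumulator when the window fires, as the WindowFunction would.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

public class IncrementalFold {
    // foldFn is applied once per element (the incremental part);
    // windowFn is applied once to the final accumulator when the window fires.
    public static <T, ACC, R> R foldWindow(List<T> windowElements,
                                           ACC initial,
                                           BiFunction<ACC, T, ACC> foldFn,
                                           Function<ACC, R> windowFn) {
        ACC acc = initial;
        for (T element : windowElements) {
            acc = foldFn.apply(acc, element);
        }
        return windowFn.apply(acc);
    }
}
```

The point of the API shape is that only the accumulator is kept as state, not the window contents.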

Re: In 1.2-SNAPSHOT, EventTimeSessionWindows are not firing until the whole stream is processed

2016-12-11 Thread Yassine MARZOUGUI
heers, > Aljoscha > > On Mon, 5 Dec 2016 at 18:54 Robert Metzger wrote: > > I'll add Aljoscha and Kostas Kloudas to the conversation. They have the > best overview over the changes to the window operator between 1.1. and 1.2. > > On Mon, Dec 5, 2016 at 11:33 AM, Yass

Re: In 1.2-SNAPSHOT, EventTimeSessionWindows are not firing until the whole stream is processed

2016-12-05 Thread Yassine MARZOUGUI
I forgot to mention: the watermark extractor is the one included in the Flink API. 2016-12-05 11:31 GMT+01:00 Yassine MARZOUGUI : > Hi Robert, > > Yes, I am using the same code, just switching the version in pom.xml to > 1.2-SNAPSHOT and the cluster binaries to the compiled latest mast

Re: In 1.2-SNAPSHOT, EventTimeSessionWindows are not firing until the whole stream is processed

2016-12-05 Thread Yassine MARZOUGUI
bit like the watermarks for the 1.2 code are not > generated correctly. > > Regards, > Robert > > > On Sat, Dec 3, 2016 at 9:01 AM, Yassine MARZOUGUI < > y.marzou...@mindlytix.com> wrote: > >> Hi all, >> >> With 1.1-SNAPSHOT, EventTimeSessionWindows fire

In 1.2-SNAPSHOT, EventTimeSessionWindows are not firing until the whole stream is processed

2016-12-03 Thread Yassine MARZOUGUI
Hi all, With 1.1-SNAPSHOT, EventTimeSessionWindows fire as soon as the window boundaries are detected, but with 1.2-SNAPSHOT the state keeps increasing in memory and the window results are not emitted until the whole stream is processed. Is this a temporary behaviour due to the developments in 1

Re: Sink not switched to "RUNNING" even though a task slot is available

2016-11-23 Thread Yassine MARZOUGUI
batch mode, an operator is switching to > RUNNING once it received the first record. In your case, there are some > operations (reduce, join) before the sink that first need to consume all > data and process it. > If you wait long enough, you should see the sink to become active. &

Sink not switched to "RUNNING" even though a task slot is available

2016-11-23 Thread Yassine MARZOUGUI
Hi all, My batch job has the following plan in the end (figure attached). I have a total of 32 task slots, and I have set the parallelism of the last two operators before the sink to 31. The sink parallelism is 1. The last operator before the sink is a MapOperator, so it doesn't need to buffer

Re: Csv to windows?

2016-11-07 Thread Yassine MARZOUGUI
Hi Felix, As I see in kddcup.newtestdata_small_unlabeled_index, the first field of connectionRecords (splits[0]) is unique for each record, therefore when a

Re: Questions about FoldFunction and WindowFunction

2016-11-02 Thread Yassine MARZOUGUI
+01:00 Aljoscha Krettek : > Would you be interested in contributing a fix for that? Otherwise I'll > probably fix work on that in the coming weeks. > > On Wed, 2 Nov 2016 at 13:38 Yassine MARZOUGUI > wrote: > >> Thank you Aljoscha for your quick response. >> >

Re: Questions about FoldFunction and WindowFunction

2016-11-02 Thread Yassine MARZOUGUI
y but right at the end of this issue I'm > proposing a different solution that we'll hopefully get in for the next > release. > > Cheers, > Aljoscha > > On Wed, 2 Nov 2016 at 10:42 Yassine MARZOUGUI > wrote: > >> Hi all, >> >> I have a couple questions

Questions about FoldFunction and WindowFunction

2016-11-02 Thread Yassine MARZOUGUI
Hi all, I have a couple questions about FoldFunction and WindowFunction: 1. When using a RichFoldFunction after a window as in keyedStream.window().fold(new RichFoldFunction()), is the close() method called after each window or after all the windows for that key are fired? 2. When applying a Fol

Re: Bug: Plan generation for Unions picked a ship strategy between binary plan operators.

2016-10-25 Thread Yassine MARZOUGUI
g a few weeks ago, but apparently the fix > did not catch all cases. > Can you please reopen FLINK-2662 and post the program to reproduce the bug > there? > > Thanks, > Fabian > > [1] https://issues.apache.org/jira/browse/FLINK-2662 > > 2016-10-25 12:33 GMT+02:00 Ya

Bug: Plan generation for Unions picked a ship strategy between binary plan operators.

2016-10-25 Thread Yassine MARZOUGUI
Hi all, My job fails with the following exception: CompilerException: Bug: Plan generation for Unions picked a ship strategy between binary plan operators. The exception happens when adding partitionByRange(1).sortPartition(1, Order.DESCENDING) to the union of datasets. I made a smaller version t

Re: NoClassDefFoundError on cluster with httpclient 4.5.2

2016-10-21 Thread Yassine MARZOUGUI
; Till > > On Wed, Oct 19, 2016 at 6:41 PM, Yassine MARZOUGUI < > y.marzou...@mindlytix.com> wrote: > >> Hi all, >> >> I'm using httpclient with the following dependency: >> >> >> org.apache.httpcomponents >> httpclient >> 4.5

NoClassDefFoundError on cluster with httpclient 4.5.2

2016-10-19 Thread Yassine MARZOUGUI
Hi all, I'm using httpclient with the following dependency: org.apache.httpcomponents httpclient 4.5.2 On local mode, the program works correctly, but when executed on the cluster, I get the following exception: java.lang.Exception: The user defined 'open(Configuration)' method in class org.m

Re: BoundedOutOfOrdernessTimestampExtractor and allowedlateness

2016-10-17 Thread Yassine MARZOUGUI
d. >> >> So, watermarks tell Flink what time it is and allowed lateness tells >> the system when state should be discarded and all later arriving data be >> ignored. >> These issues are related but not exactly the same thing. For instance you >> can counter late data

BoundedOutOfOrdernessTimestampExtractor and allowedlateness

2016-10-17 Thread Yassine MARZOUGUI
Hi, I'm a bit confused about how Flink deals with late elements after the introduction of allowedLateness to windows. What is the difference between using a BoundedOutOfOrdernessTimestampExtractor(Time.seconds(X)) and allowedLateness(Time.seconds(X))? What if one is used and the other is not? and

Re: "java.net.SocketException: No buffer space available (maximum connections reached?)" when reading from HDFS

2016-10-17 Thread Yassine MARZOUGUI
;; you will get the code that is the > same as the 1.1.3 release, plus the patch to this problem. > > Greetings, > Stephan > > > On Sat, Oct 15, 2016 at 10:11 PM, Yassine MARZOUGUI < > y.marzou...@mindlytix.com> wrote: > >> Hi all, >> >> I'm readi

"java.net.SocketException: No buffer space available (maximum connections reached?)" when reading from HDFS

2016-10-15 Thread Yassine MARZOUGUI
Hi all, I'm reading a large number of small files from HDFS in batch mode (about 20 directories, each directory contains about 3000 files, using recursive.file.enumeration=true), and each time, at about 200 GB of received data, my job fails with the following exception: java.io.IOException: Error

Re: Handling decompression exceptions

2016-10-11 Thread Yassine MARZOUGUI
g field. >> As a workaround, you can read the file as a regular text file, line by >> line and do the parsing in a MapFunction. >> >> Best, Fabian >> >> 2016-10-11 13:37 GMT+02:00 Yassine MARZOUGUI : >> >>> Forgot to add parseQuotedStrings(&

Re: Handling decompression exceptions

2016-10-11 Thread Yassine MARZOUGUI
Forgot to add parseQuotedStrings('"'). After adding it I'm getting the same exception with the second code too. 2016-10-11 13:29 GMT+02:00 Yassine MARZOUGUI : > Hi Fabian, > > I tried to debug the code, and it turns out a line in my csv data is > causing the Arra

Re: Handling decompression exceptions

2016-10-11 Thread Yassine MARZOUGUI
> I ran your code without problems and got the correct result. > Can you provide the Stacktrace of the Exception? > > Thanks, Fabian > > 2016-10-10 10:57 GMT+02:00 Yassine MARZOUGUI : > >> Thank you Fabian and Stephan for the suggestions. >> I couldn't overrid

Re: Handling decompression exceptions

2016-10-10 Thread Yassine MARZOUGUI
> CsvInputFormat and forwards all calls to the wrapped CsvIF. >> The wrapper would also catch and ignore the EOFException. >> >> If you do that, you would not be able to use the env.readCsvFile() >> shortcut but would need to create an instance of your own InputFormat and >

Handling decompression exceptions

2016-10-04 Thread Yassine MARZOUGUI
Hi all, I am reading a large number of GZip compressed csv files, nested in a HDFS directory: Configuration parameters = new Configuration(); parameters.setBoolean("recursive.file.enumeration", true); DataSet> hist = env.readCsvFile("hdfs:///shared/logs/") .ignoreFirstLine()

Re: Simple batch job hangs if run twice

2016-09-23 Thread Yassine MARZOUGUI
ete stacktrace. Most IDEs > have a feature to take a stacktrace while they are executing a program. > > 2016-09-23 11:43 GMT+02:00 Yassine MARZOUGUI : > >> Hi Fabian, >> >> Not sure if this answers your question, here is the stack I got when >> debuggin

Re: Simple batch job hangs if run twice

2016-09-23 Thread Yassine MARZOUGUI
rators.BatchTask.run(BatchTask.java:486) at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584) at java.lang.Thread.run(Thread.java:745) Best, Yassine 2016-09-23 11:28 GMT+02:00 Yassine MARZOUGUI : > Hi Fabian, > > Is

Re: Simple batch job hangs if run twice

2016-09-19 Thread Yassine MARZOUGUI
Flink itself. >> >> On Sep 17, 2016 15:16, "Aljoscha Krettek" wrote: >> >>> Hi, >>> when is the "first time". It seems you have tried this repeatedly so >>> what differentiates a "first time" from the other times? Are you c

Re: Simple batch job hangs if run twice

2016-09-17 Thread Yassine MARZOUGUI
> differentiates a "first time" from the other times? Are you closing your > IDE in-between or do you mean running the job a second time within the same > program? > > Cheers, > Aljoscha > > On Fri, 9 Sep 2016 at 16:40 Yassine MARZOUGUI > wrote: > >>

Simple batch job hangs if run twice

2016-09-09 Thread Yassine MARZOUGUI
Hi all, When I run the following batch job inside the IDE for the first time, it outputs results and switches to FINISHED, but when I run it again it is stuck in the state RUNNING. The csv file size is 160 MB. What could be the reason for this behaviour? public class BatchJob { public static

Re: Handle deserialization error

2016-08-26 Thread Yassine Marzougui
Hi Jack, As Robert Metzger mentioned in a previous thread, there's an ongoing discussion about the issue in this JIRA: https://issues.apache.org/jira/browse/FLINK-3679. A possible workaround is to use a SimpleStringSchema in the Kafka source and chain it with a flatMap operator where you can use
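The workaround described here can be sketched in plain Java: read raw strings and do the real parsing in a flatMap-like step that emits nothing for malformed input instead of failing the job. Integer.parseInt stands in for a JSON deserializer (an assumption for illustration; a real job would use e.g. a JSON ObjectMapper in the same try/catch pattern).

```java
import java.util.ArrayList;
import java.util.List;

// flatMap semantics: zero or one output per input, so a malformed record is
// simply dropped (optionally logged) rather than throwing and killing the job.
public class SafeDeserialize {
    public static List<Integer> flatMapParse(List<String> raw) {
        List<Integer> out = new ArrayList<>();
        for (String s : raw) {
            try {
                out.add(Integer.parseInt(s));  // "deserialize" the raw string
            } catch (NumberFormatException e) {
                // malformed record: emit nothing
            }
        }
        return out;
    }
}
```

Because the source only hands over raw strings, deserialization failures never occur inside the source itself, which is what makes them handleable.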

Re: Handling Kafka DeserializationSchema() exceptions

2016-08-25 Thread Yassine Marzougui
option. > There are workarounds to the issue, but I think they are all not nice. > > On Thu, Aug 25, 2016 at 8:05 PM, Yassine Marzougui > wrote: > >> Hi all, >> >> Is there a way to handle hafka deserialization exceptions, when a JSON >> message is

Handling Kafka DeserializationSchema() exceptions

2016-08-25 Thread Yassine Marzougui
Hi all, Is there a way to handle Kafka deserialization exceptions, when a JSON message is malformed for example? I thought about extending the DeserializationSchema to emit a null or any other value, but that may cause an NPE when using a subsequent TimestampExtractor. The other solution would be

Re: how to get rid of duplicate rows group by in DataStream

2016-08-24 Thread Yassine Marzougui
Sorry I mistyped the code, it should be *timeWindow(Time.minutes(10))* instead of *window(Time.minutes(10))*. On Wed, Aug 24, 2016 at 9:29 PM, Yassine Marzougui wrote: > Hi subash, > > A stream is infinite, hence it has no notion of "final" count. To get > distinct cou

Re: how to get rid of duplicate rows group by in DataStream

2016-08-24 Thread Yassine Marzougui
Hi subash, A stream is infinite, hence it has no notion of a "final" count. To get distinct counts you need to define a period (= a window [1]) over which you count elements and emit a result, by adding a window operator before the reduce. For example the following will emit distinct counts every 10
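The suggestion above can be sketched in plain Java (a hedged illustration, not Flink code): a tumbling window bounds the count, so "distinct count" becomes the number of unique keys per window. Window assignment mirrors a 10-minute tumbling window: start = ts - ts % size.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

public class DistinctPerWindow {
    static final long WINDOW_MS = 10 * 60 * 1000L;  // 10-minute tumbling window

    // ts[i] is the event timestamp in ms, keys[i] the value to count distinctly;
    // the result maps each window start to its distinct-key count.
    public static Map<Long, Integer> distinctCounts(long[] ts, String[] keys) {
        Map<Long, HashSet<String>> perWindow = new HashMap<>();
        for (int i = 0; i < ts.length; i++) {
            long windowStart = ts[i] - (ts[i] % WINDOW_MS);
            perWindow.computeIfAbsent(windowStart, w -> new HashSet<>()).add(keys[i]);
        }
        Map<Long, Integer> counts = new HashMap<>();
        perWindow.forEach((w, set) -> counts.put(w, set.size()));
        return counts;
    }
}
```

Each window emits its own result, which is the "count per period" the reply recommends instead of a global count.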

FLINK-4329 fix version

2016-08-23 Thread Yassine Marzougui
Hi all, The fix version of FLINK-4329 in JIRA is set to 1.1.1, but 1.1.1 is already released. Should I expect it to be fixed in the next release? and will a patch be available meanwhile? Thanks. Yassine

Re: Kafka topic with an empty parition : parallelism issue and Timestamp monotony violated

2016-08-15 Thread Yassine Marzougui
On Mon, Aug 15, 2016 at 12:38 PM, Yassine Marzougui wrote: > Hi all, > > I have a Kafka topic with two partitions, messages within each partition > are ordered in ascending timestamps. > > The following code works correctly (I'm running this on my local machine, > the de

Re: Kafka topic with an empty parition : parallelism issue and Timestamp monotony violated

2016-08-15 Thread Yassine Marzougui
chains) in order to have the same parallelism, but I didn't succeed. Isn't doing env.addSource().setParallelism(n).startNewChain().map(...).disableChaining() equivalent to setting the source and map parallelism to the same value? On Mon, Aug 15, 2016 at 12:38 PM, Yassine Marzougui wrote:

Kafka topic with an empty parition : parallelism issue and Timestamp monotony violated

2016-08-15 Thread Yassine Marzougui
Hi all, I have a Kafka topic with two partitions, messages within each partition are ordered in ascending timestamps. The following code works correctly (I'm running this on my local machine, the default parallelism is the number of cores=8): stream = env.addSource(myFlinkKafkaConsumer09) stream

Re: No output when using event time with multiple Kafka partitions

2016-08-15 Thread Yassine Marzougui
Hi Aljoscha, Sorry for the late response, I was busy and couldn't make time to work on this again until now. Indeed, it turns out only one of the partitions is not receiving elements. The reason is that the producer will stick to a partition for topic.metadata.refresh.interval.ms (defaults t

Re: Window not emitting output after upgrade to Flink 1.1.1

2016-08-12 Thread Yassine Marzougui
f yes, then the problem can be related to this: > > https://issues.apache.org/jira/browse/FLINK-4329 > > Is this the case? > > Best, > Kostas > > On Aug 12, 2016, at 10:30 AM, Yassine Marzougui > wrote: > > Hi all, > > The following code works under Flink

Window not emitting output after upgrade to Flink 1.1.1

2016-08-12 Thread Yassine Marzougui
Hi all, The following code works under Flink 1.0.3, but under 1.1.1 it just switches to FINISHED and doesn't output any result. stream.map(new RichMapFunction() { private ObjectMapper objectMapper; @Override public void open(Configuration parameters) { object