Re: Python API not working

2018-01-04 Thread Yassine MARZOUGUI
Hi all, Any ideas on this? 2017-12-15 15:10 GMT+01:00 Yassine MARZOUGUI <y.marzou...@mindlytix.com>: > Hi Ufuk, > > Thanks for your response. Unfortunately, specifying `streaming` or `batch` > doesn't work; it looks like mode should be either "plan" or "o

Re: Python API not working

2017-12-15 Thread Yassine MARZOUGUI
...@apache.org>: > Hey Yassine, > > let me include Chesnay (cc'd) who worked on the Python API. > > I'm not familiar with the API and what it expects, but try entering > `streaming` or `batch` for the mode. Chesnay probably has the details. > > – Ufuk > > > On F

Python API not working

2017-12-15 Thread Yassine MARZOUGUI
Hi All, I'm trying to use Flink with the Python API, and started with the wordcount example from the documentation. I'm using Flink 1.4 and Python 2.7. When running env.execute(local=True), the

Re: notNext() and next(negation) not yielding same output in Flink CEP

2017-07-24 Thread Yassine MARZOUGUI
works. > > Regards, > Dawid > > > On 22 Jul 2017, at 20:32, Yassine MARZOUGUI <y.marzou...@mindlytix.com> > wrote: > > > > Hi all, > > > > I would like to match the maximal consecutive sequences of events of > type A in a stream. > >

notNext() and next(negation) not yielding same output in Flink CEP

2017-07-22 Thread Yassine MARZOUGUI
Hi all, I would like to match the maximal consecutive sequences of events of type A in a stream. I'm using the following: Pattern.begin("start").where(event is not A) .next("middle").where(event is A).oneOrMore().consecutive() .next("not").where(event is not A) This gives the output I want.
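The matching goal described in this thread — every maximal consecutive run of type-A events, bounded by non-A events or the end of the stream — can be sketched in plain Java, independent of Flink CEP. This is only an illustration of the intended semantics, not CEP code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the matching semantics from the thread: collect every maximal
// consecutive run of "A" events. A run is closed (and therefore maximal)
// by any non-A event, or by the end of the stream.
class MaximalRuns {
    static List<List<String>> maximalRunsOfA(List<String> events) {
        List<List<String>> runs = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String e : events) {
            if (e.equals("A")) {
                current.add(e);
            } else if (!current.isEmpty()) {
                runs.add(current);            // non-A event closes the run
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) runs.add(current); // run ending at end-of-stream
        return runs;
    }

    public static void main(String[] args) {
        // B A A B A -> two maximal runs: [A, A] and [A]
        System.out.println(maximalRunsOfA(Arrays.asList("B", "A", "A", "B", "A")));
    }
}
```

In CEP terms, the `next("not")` step plays the role of the closing non-A event here; the subtlety in the thread is how notNext() differs from matching an explicit negated next() step.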

Re: Behaviour of the BucketingSink when checkpoints fail

2017-04-28 Thread Yassine MARZOUGUI
y move the pending files that you know are correct > to “final”? > > Best, > Aljoscha > > On 28. Apr 2017, at 11:22, Yassine MARZOUGUI <y.marzou...@mindlytix.com> > wrote: > > Hi all, > > I have a failed job containing a BucketingSink. The last successful

Behaviour of the BucketingSink when checkpoints fail

2017-04-28 Thread Yassine MARZOUGUI
Hi all, I have a failed job containing a BucketingSink. The last successful checkpoint was before the source started emitting data. The following checkpoints all failed due to the long timeout, as I mentioned here:

Re: Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine MARZOUGUI
>> 4 taskmanager_heap_mb: 4096 jobmanager_heap_mb: 1024 >> >> Basic program structure: >> >> 1) read batch from Kinesis >> >> 2) Split batch and shuffle using custom partitioner (consistent hashing). >> >> 3) enrich using external REST service

Fwd: Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine MARZOUGUI
-- Forwarded message -- From: "Yassine MARZOUGUI" <y.marzou...@mindlytix.com> Date: Apr 23, 2017 20:53 Subject: Checkpoints very slow with high backpressure To: <user@flink.apache.org> Cc: Hi all, I have a Streaming pipeline as follows: 1 - read a folder con

Re: Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine MARZOUGUI
I'm sorry guys if you received multiple instances of this mail; I kept trying to send it yesterday, but it looks like the mailing list was stuck and didn't dispatch it until now. Sorry for the disturbance. On Apr 23, 2017 20:53, "Yassine MARZOUGUI" <y.marzou...@mindlytix.com> wrote: >

Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine MARZOUGUI
Hi all, I have a streaming pipeline as follows: 1 - read a folder continuously from HDFS 2 - filter duplicates (using keyBy(x->x) and keeping a state per key indicating whether it is seen) 3 - schedule some future actions on the stream using ProcessFunction and processing time timers (elements
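The dedup step described in this pipeline — keyBy(x->x) plus a per-key boolean "seen" state — reduces to remembering which keys have appeared. In Flink that state would live in a keyed ValueState<Boolean>; as a minimal plain-Java sketch of the filtering semantics, a set can stand in for it:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Plain-Java stand-in for the dedup step: keyBy(x -> x) with a per-key
// "seen" flag means an element passes only the first time its key appears.
// The HashSet here plays the role of Flink's keyed state.
class DedupFilter {
    static Predicate<String> firstOccurrenceOnly() {
        Set<String> seen = new HashSet<>();
        // Set.add() returns false when the element was already present,
        // which is exactly the "already seen" check.
        return seen::add;
    }
}
```

Note that, as the thread's backpressure problem suggests, this state grows without bound unless some TTL or cleanup is applied, in Flink as in this sketch.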

Re: Flink Checkpoint runs slow for low load stream

2017-04-24 Thread Yassine MARZOUGUI
Hi all, I am experiencing a similar problem but with HDFS as a source instead of Kafka. I have a streaming pipeline as follows: 1 - read a folder continuously from HDFS 2 - filter duplicates (using keyBy(x->x) and keeping a state per key indicating whether it is seen) 3 - schedule some future

Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine MARZOUGUI
Re-sending as it looks like the previous mail wasn't correctly sent --- Hi all, I have a streaming pipeline as follows: 1 - read a folder continuously from HDFS 2 - filter duplicates (using keyBy(x->x) and keeping a state per key indicating whether it is seen)

Checkpoints very slow with high backpressure

2017-04-24 Thread Yassine MARZOUGUI
Hi all, I have a streaming pipeline as follows: 1 - read a folder continuously from HDFS 2 - filter duplicates (using keyBy(x->x) and keeping a state per key indicating whether it is seen) 3 - schedule some future actions on the stream using ProcessFunction and processing time timers (elements

Re: Windows emit results at the end of the stream

2017-03-23 Thread Yassine MARZOUGUI
Hi Sonex, When using readTextFile(...) with event time, only one watermark with the value Long.MAX_VALUE is sent at the end of the stream, which explains why the windows are stored until the whole file is processed. In order to have periodic watermarks, you need to process the file continuously

ProcessingTimeTimer in ProcessFunction after a savepoint

2017-03-17 Thread Yassine MARZOUGUI
Hi all, How does the processing time timer behave when a job is taken down with a savepoint and then restarted after the timer was supposed to fire? Will the timer fire at restart because it was missed during the savepoint? I'm wondering because I would like to schedule periodic timers in the

Re: Appropriate State to use to buffer events in ProcessFunction

2017-03-16 Thread Yassine MARZOUGUI
when Heap > statebackends are used. But given that the state may grow up to ~100GB, > RocksDB statebackends are strongly recommended. > > May the information helps you. > > Regards, > Xiaogang > > 2017-03-09 23:19 GMT+08:00 Yassine MARZOUGUI <y.marzou...@mindlytix.com

Re: Late Events with BoundedOutOfOrdernessTimestampExtractor and allowed lateness

2017-03-15 Thread Yassine MARZOUGUI
Hi Nico, You might check Fabian's answer on a similar question I posted previously on the mailing list; it can be helpful: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/BoundedOutOfOrdernessTimestampExtractor-and-allowedlateness-td9583.html Best, Yassine On Mar 15, 2017

Re: Question about processElement(...) and onTimer(...) in ProcessFunction

2017-03-15 Thread Yassine MARZOUGUI
time timers (they are triggered by > watermarks which are internally handled as records) and I'm pretty sure > that the behavior is the same for processing time timers. > @Kostas (in CC) please correct me if I'm wrong here. > > Best, Fabian > > 2017-03-14 8:04 GMT+01:00 Yassin

Question about processElement(...) and onTimer(...) in ProcessFunction

2017-03-14 Thread Yassine MARZOUGUI
Hi all, In ProcessFuction, does processElement() still get called on incoming elements when onTimer() is called, or are elements buffered until onTimer() returns? I am wondering because both processElement() and onTimer() can access and manipulate the state, so if for example state.clear() is

Re: Appropriate State to use to buffer events in ProcessFunction

2017-03-09 Thread Yassine MARZOUGUI
vent has > arrived. > If you use a RocksDB as state backend, 100+ GB of state should not be a > problem. Have you thought about using Flink's CEP library? It might fit to > your needs without implementing a custom process function. > > I hope that helps. > > Timo > >

Appropriate State to use to buffer events in ProcessFunction

2017-03-07 Thread Yassine MARZOUGUI
Hi all, I want to label events in a stream based on a condition on some future events. For example, my stream contains events of type A and B, and I would like to assign a label 1 to an event E of type A if an event of type B happens within a duration x of E. I am using event time and my events
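The labeling goal here can be sketched in plain Java. In a Flink job the pending A events would be buffered in keyed state (the thread's question is which state type fits) and resolved when a B arrives or a timer expires; a single pass over a timestamp-sorted list illustrates the same semantics:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the labeling rule from the thread: an event E of type A gets
// label 1 (true) if some event of type B occurs within x time units after E,
// otherwise label 0 (false). Input is sorted by timestamp.
class FutureLabeler {
    // types.get(i) is "A" or "B"; ts.get(i) is its event timestamp.
    static Map<Long, Boolean> labelA(List<String> types, List<Long> ts, long x) {
        Map<Long, Boolean> labels = new LinkedHashMap<>(); // A-timestamp -> label
        List<Long> pendingA = new ArrayList<>();           // stands in for keyed state
        for (int i = 0; i < types.size(); i++) {
            if (types.get(i).equals("A")) {
                labels.put(ts.get(i), false);  // default 0 until a B confirms
                pendingA.add(ts.get(i));
            } else if (types.get(i).equals("B")) {
                for (long aTs : pendingA) {
                    // B confirms every buffered A that it follows within x
                    if (ts.get(i) >= aTs && ts.get(i) - aTs <= x) labels.put(aTs, true);
                }
            }
        }
        return labels;
    }
}
```

The buffered `pendingA` list is exactly the state whose choice (ListState, MapState keyed by timestamp, ...) the thread is about; entries older than x could be evicted on each watermark advance.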

Re: AsyncIO/QueryableStateClient hanging with high parallelism

2017-03-06 Thread Yassine MARZOUGUI
increased the parallelism) the job simply doesn't finish. I solved the problem by using resultFuture.get(), which raised the appropriate exceptions when they happened and failed the job. Best, Yassine 2017-03-06 15:53 GMT+01:00 Yassine MARZOUGUI <y.marzou...@mindlytix.com>: > Hi all, > >

AsyncIO/QueryableStateClient hanging with high parallelism

2017-03-06 Thread Yassine MARZOUGUI
Hi all, I set up a job with a simple queryable state sink and tried to query it from another job using the new Async I/O API. Everything worked as expected, except when I tried to increase the parallelism of the querying job it hung. As you can see in the attached image, when the parallelism is 5

OutOfMemory error (Direct buffer memory) while allocating the TaskManager off-heap memory

2017-03-03 Thread Yassine MARZOUGUI
Hi all, I tried starting a local Flink 1.2.0 cluster using start-local.sh, with the following settings for the taskmanager memory: taskmanager.heap.mb: 16384 taskmanager.memory.off-heap: true taskmanager.memory.preallocate: true That throws an OOM error: Caused by: java.lang.Exception:

Re: Http Requests from Flink

2017-03-02 Thread Yassine MARZOUGUI
Hi Ulf, I've done HTTP requests in Flink using the OkHttp library. I found it practical and easy to use. Here is how I used it to make a POST request for each incoming element in the stream and output the response: DataStream stream = stream.map(new
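The thread uses OkHttp inside a map function; as a dependency-free sketch of the same per-element pattern (one POST per stream element, emit the response body), here is the JDK's built-in HttpURLConnection. In a real Flink job this logic would sit inside a RichMapFunction's map() method, with the client set up in open():

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// One synchronous POST per element, returning the response body as a String.
// Stand-in for the OkHttp call in the thread; no external jars needed.
class HttpPostMapper {
    static String post(String urlString, String body) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(urlString).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8)); // request payload
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (InputStream is = conn.getInputStream()) {       // read the response
            byte[] chunk = new byte[4096];
            int n;
            while ((n = is.read(chunk)) != -1) buf.write(chunk, 0, n);
        }
        conn.disconnect();
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

A blocking call per element limits throughput to one in-flight request per subtask; the later AsyncIO thread in this archive is the usual answer to that.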

Re: Flink requesting external web service with rate limited requests

2017-02-28 Thread Yassine MARZOUGUI
Hi Fabian, I have a related question regarding throttling at the source: If there is a sleep in the source as in ContinuousFileMonitoringFunction.java

RE: Aggregation problem.

2017-02-18 Thread Yassine MARZOUGUI
Hi, I think this is an expected output and not necessarily a bug. To get the element having the maximum value, maxBy() should be used instead of max(). See this answer for more details:
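The max() vs maxBy() distinction can be simulated without Flink. Flink's max(field) only guarantees the maximum of the aggregated field; the other fields of the returned record are not taken from the element that holds the maximum, whereas maxBy(field) returns the whole winning element. A sketch on (name, value) pairs:

```java
import java.util.List;

// Simulates the two aggregations on rows of the form {name, value}.
class MaxVsMaxBy {
    // max()-style: update only the aggregated field, other fields stay
    // as in the first row seen (hence the surprising output in the thread).
    static String[] maxAggregate(List<String[]> rows) {
        String[] acc = rows.get(0).clone();
        for (String[] r : rows) {
            if (Integer.parseInt(r[1]) > Integer.parseInt(acc[1])) acc[1] = r[1];
        }
        return acc;
    }

    // maxBy()-style: return the complete element holding the maximum field.
    static String[] maxBy(List<String[]> rows) {
        String[] best = rows.get(0);
        for (String[] r : rows) {
            if (Integer.parseInt(r[1]) > Integer.parseInt(best[1])) best = r;
        }
        return best;
    }
}
```

So for input {("a",1), ("b",3)}, the max()-style result pairs "a" with 3 (a row that never existed), while maxBy() returns ("b",3).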

Re: JavaDoc 404

2017-02-14 Thread Yassine MARZOUGUI
flink/flink-docs-release-1.2/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html The build for master should be done in 30 minutes. On Wed, Feb 8, 2017 at 10:49 AM, Yassine MARZOUGUI < y.marzou...@mindlytix.com> wrote: > Thanks Robert and Ufuk for the update.

Re: Flink 1.2 Maven dependency

2017-02-09 Thread Yassine MARZOUGUI
Hi, I could find the dependency here: https://search.maven.org/#artifactdetails%7Corg.apache.flink%7Cflink-core%7C1.2.0%7Cjar , I wonder why it still doesn't show in http://mvnrepository.com/artifact/org.apache.flink/flink-core. The dependency version for Flink 1.2 is 1.2.0.

Re: JavaDoc 404

2017-02-08 Thread Yassine MARZOUGUI
; wrote: > >> Yes, I'll try to fix it asap. Sorry for the inconvenience. >> >> On Mon, Feb 6, 2017 at 4:43 PM, Ufuk Celebi <u...@apache.org> wrote: >> >>> Thanks for reporting this. I think Robert (cc'd) is working in fixing >>> this, cor

JavaDoc 404

2017-02-04 Thread Yassine MARZOUGUI
Hi, The JavaDoc link of BucketingSink in this page[1] yields a 404 error. I couldn't find the correct url. The broken link: https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html Other pages in the JavaDoc, like

Re: Externalized Checkpoints vs Periodic Checkpoints

2017-02-02 Thread Yassine MARZOUGUI
link's checkpoint coordinator only retains the last > successfully completed checkpoint. This means that whenever a new > checkpoint completes then the last completed checkpoint will be discarded. > This also applies to externalized checkpoints. > > Cheers, > Till > > On Wed, Feb

Externalized Checkpoints vs Periodic Checkpoints

2017-02-01 Thread Yassine MARZOUGUI
Hi all, Could someone clarify the difference between externalized checkpoints[1] and regular periodic checkpoints[2]? Moreover, I have a question regarding the retention of checkpoints: For regular checkpoints, does the last checkpoint discard the previous ones? If yes, is that the case too for

Re: Rate-limit processing

2017-01-20 Thread Yassine MARZOUGUI
Hi, You might find this similar thread from the mailing list archive helpful: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/throttled-stream-td6138.html . Best, Yassine 2017-01-20 10:53 GMT+01:00 Florian König : > Hi, > > I need to limit the
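The throttled-stream approach referenced here boils down to pacing emissions in the source. A minimal sketch of such a limiter, with sleeps sized so that at most N permits per second are handed out (names illustrative, not a Flink API):

```java
// Simple permits-per-second limiter: acquire() blocks just long enough
// between calls so the long-run rate never exceeds permitsPerSecond.
// A source's run() loop would call acquire() before each collect().
class SimpleRateLimiter {
    private final long intervalNanos;
    private long nextFreeAt = 0; // nanoTime at which the next permit is free

    SimpleRateLimiter(double permitsPerSecond) {
        this.intervalNanos = (long) (1_000_000_000L / permitsPerSecond);
    }

    synchronized void acquire() throws InterruptedException {
        long now = System.nanoTime();
        if (nextFreeAt > now) {
            long waitNanos = nextFreeAt - now;
            Thread.sleep(waitNanos / 1_000_000, (int) (waitNanos % 1_000_000));
        }
        nextFreeAt = Math.max(now, nextFreeAt) + intervalNanos;
    }
}
```

Note that throttling the source only limits ingestion; it does not rate-limit retries of a downstream external call, which needs its own limiter in that operator.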

Re: Continuous File monitoring not reading nested files

2017-01-09 Thread Yassine MARZOUGUI
. @Kostas, If you haven't already started working on a fix for this, I would happily contribute a fix for it if you like. Best, Yassine 2017-01-09 17:23 GMT+01:00 Yassine MARZOUGUI <y.marzou...@mindlytix.com>: > Hi Kostas, > > I debugged the code and the nestedFileEnumeration param

Re: Continuous File monitoring not reading nested files

2017-01-09 Thread Yassine MARZOUGUI
is executed by the tasks, the > nestedFileEnumeration parameter is still true? > > I am asking in order to pin down if the problem is in the way we ship the > code to the tasks or in reading the > nested files. > > Thanks, > Kostas > > On Jan 9, 2017, at 12:56 PM, Yassine MARZOUGUI <y.

Re: Continuous File monitoring not reading nested files

2017-01-09 Thread Yassine MARZOUGUI
Hi, Any updates on this issue? Thank you. Best, Yassine On Dec 20, 2016 6:15 PM, "Aljoscha Krettek" <aljos...@apache.org> wrote: +kostas, who probably has the most experience with this by now. Do you have an idea what might be going on? On Fri, 16 Dec 2016 at 15:45 Y

Triggering a savepoint failed the Job

2017-01-04 Thread Yassine MARZOUGUI
Hi all, I tried to trigger a savepoint for a streaming job; both the savepoint and the job failed. The job failed with the following exception: java.lang.RuntimeException: Error while triggering checkpoint for IterationSource-7 (1/1) at

Re: Continuous File monitoring not reading nested files

2016-12-16 Thread Yassine MARZOUGUI
Looks like this is not specific to the continuous file monitoring, I'm having the same issue (files in nested directories are not read) when using: env.readFile(fileInputFormat, "hdfs:///shared/mydir", FileProcessingMode.PROCESS_ONCE, -1L) 2016-12-16 11:12 GMT+01:00 Yassine MARZOUGUI

Continuous File monitoring not reading nested files

2016-12-16 Thread Yassine MARZOUGUI
Hi all, I'm using the following code to continuously process files from a directory "mydir". final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); FileInputFormat fileInputFormat = new TextInputFormat(new Path("hdfs:///shared/mydir"));

Re: In 1.2-SNAPSHOT, EventTimeSessionWindows are not firing until the whole stream is processed

2016-12-14 Thread Yassine MARZOUGUI
e: > >> Hi Yassine, >> I managed to reproduce the problem. The cause is that we recently changed >> how the timer service is being cleaned up and now the watermark timers are >> not firing anymore. >> >> I'll keep you posted and hope to find a solution fast.

Re: Incremental aggregations - Example not working

2016-12-12 Thread Yassine MARZOUGUI
e3<String, Long, Integer>>(). Best, Yassine 2016-12-12 16:38 GMT+01:00 Matt <dromitl...@gmail.com>: > I'm using 1.2-SNAPSHOT, should it work in that version? > > On Mon, Dec 12, 2016 at 12:18 PM, Yassine MARZOUGUI < > y.marzou...@mindlytix.com> wrote: > &g

Re: Incremental aggregations - Example not working

2016-12-12 Thread Yassine MARZOUGUI
Hi Matt, What version of Flink are you using? The incremental aggregation with fold(ACC, FoldFunction, WindowFunction) is a new change that will be part of Flink 1.2; for Flink 1.1 the correct way to perform incremental aggregations is: apply(ACC, FoldFunction, WindowFunction) (see the docs

Re: In 1.2-SNAPSHOT, EventTimeSessionWindows are not firing until the whole stream is processed

2016-12-11 Thread Yassine MARZOUGUI
> you can use it like this: > input.transform("WatermarkDebugger", input.getType(), new > WatermarkDebugger<Tuple2<String, Integer>>()); > > That should give us something to work with. > > Cheers, > Aljoscha > > On Mon, 5 Dec 2016 at 18:54 Robert Metz

Re: In 1.2-SNAPSHOT, EventTimeSessionWindows are not firing until the whole stream is processed

2016-12-05 Thread Yassine MARZOUGUI
I forgot to mention: the watermark extractor is the one included in the Flink API. 2016-12-05 11:31 GMT+01:00 Yassine MARZOUGUI <y.marzou...@mindlytix.com>: > Hi Robert, > > Yes, I am using the same code, just switching the version in pom.xml to > 1.2-SNAPSHOT and th

Re: In 1.2-SNAPSHOT, EventTimeSessionWindows are not firing until the whole stream is processed

2016-12-05 Thread Yassine MARZOUGUI
sure your watermark extractor is the same between the two > versions. It sounds a bit like the watermarks for the 1.2 code are not > generated correctly. > > Regards, > Robert > > > On Sat, Dec 3, 2016 at 9:01 AM, Yassine MARZOUGUI < > y.marzou...@mindlytix.com> wrote: >

In 1.2-SNAPSHOT, EventTimeSessionWindows are not firing until the whole stream is processed

2016-12-03 Thread Yassine MARZOUGUI
Hi all, With 1.1-SNAPSHOT, EventTimeSessionWindows fire as soon as the window boundaries are detected, but with 1.2-SNAPSHOT the state keeps increasing in memory and the window results are not emitted until the whole stream is processed. Is this a temporary behaviour due to the developments in

Re: Sink not switched to "RUNNING" even though a task slot is available

2016-11-23 Thread Yassine MARZOUGUI
; > Regards, > Robert > > > > On Wed, Nov 23, 2016 at 2:03 PM, Yassine MARZOUGUI < > y.marzou...@mindlytix.com> wrote: > >> Hi all, >> >> My batch job has the following plan in the end (figure attached): >> >> I have

Sink not switched to "RUNNING" even though a task slot is available

2016-11-23 Thread Yassine MARZOUGUI
Hi all, My batch job has the following plan in the end (figure attached): I have a total of 32 task slots, and I have set the parallelism of the last two operators before the sink to 31. The sink parallelism is 1. The last operator before the sink is a MapOperator, so it doesn't need to

Re: Csv to windows?

2016-11-07 Thread Yassine MARZOUGUI
Hi Felix, As I see in kddcup.newtestdata_small_unlabeled_index, the first field of connectionRecords (splits[0]) is unique for each record, therefore when

Re: Questions about FoldFunction and WindowFunction

2016-11-02 Thread Yassine MARZOUGUI
+01:00 Aljoscha Krettek <aljos...@apache.org>: > Would you be interested in contributing a fix for that? Otherwise I'll > probably fix work on that in the coming weeks. > > On Wed, 2 Nov 2016 at 13:38 Yassine MARZOUGUI <y.marzou...@mindlytix.com> > wrote: > >&g

Re: Questions about FoldFunction and WindowFunction

2016-11-02 Thread Yassine MARZOUGUI
API instability but right at the end of this issue I'm > proposing a different solution that we'll hopefully get in for the next > release. > > Cheers, > Aljoscha > > On Wed, 2 Nov 2016 at 10:42 Yassine MARZOUGUI <y.marzou...@mindlytix.com> > wrote: > >> Hi all, >

Questions about FoldFunction and WindowFunction

2016-11-02 Thread Yassine MARZOUGUI
Hi all, I have a couple questions about FoldFunction and WindowFunction: 1. When using a RichFoldFunction after a window as in keyedStream.window().fold(new RichFoldFunction()), is the close() method called after each window or after all the windows for that key are fired? 2. When applying a

Re: Bug: Plan generation for Unions picked a ship strategy between binary plan operators.

2016-10-25 Thread Yassine MARZOUGUI
ght I had fixed that bug a few weeks a ago, but apparently the fix > did not catch all cases. > Can you please reopen FLINK-2662 and post the program to reproduce the bug > there? > > Thanks, > Fabian > > [1] https://issues.apache.org/jira/browse/FLINK-2662 > > 2016-10-2

Bug: Plan generation for Unions picked a ship strategy between binary plan operators.

2016-10-25 Thread Yassine MARZOUGUI
Hi all, My job fails with the following exception: CompilerException: Bug: Plan generation for Unions picked a ship strategy between binary plan operators. The exception happens when adding partitionByRange(1).sortPartition(1, Order.DESCENDING) to the union of datasets. I made a smaller version

Re: NoClassDefFoundError on cluster with httpclient 4.5.2

2016-10-21 Thread Yassine MARZOUGUI
; > Cheers, > Till > > On Wed, Oct 19, 2016 at 6:41 PM, Yassine MARZOUGUI < > y.marzou...@mindlytix.com> wrote: > >> Hi all, >> >> I'm using httpclient with the following dependency: >> >> >> org.apache.httpcomponents >> httpclien

NoClassDefFoundError on cluster with httpclient 4.5.2

2016-10-19 Thread Yassine MARZOUGUI
Hi all, I'm using httpclient with the following dependency: org.apache.httpcomponents httpclient 4.5.2 In local mode, the program works correctly, but when executed on the cluster, I get the following exception: java.lang.Exception: The user defined 'open(Configuration)' method in class

Re: BoundedOutOfOrdernessTimestampExtractor and allowedlateness

2016-10-17 Thread Yassine MARZOUGUI
be >> ignored. >> These issue are related but not exactly the same thing. For instance you >> can counter late data by increasing the bound or the lateness parameter. >> Increasing the watermark bound will yield higher latencies as windows are >> evaluated later. >> Configuring allowe

BoundedOutOfOrdernessTimestampExtractor and allowedlateness

2016-10-17 Thread Yassine MARZOUGUI
Hi, I'm a bit confused about how Flink deals with late elements after the introduction of allowedlateness to windows. What is the difference between using a BoundedOutOfOrdernessTimestampExtractor(Time.seconds(X)) and allowedlateness(Time.seconds(X))? What if one is used and the other is not? and
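The two mechanisms asked about here act at different points, and the distinction reduces to simple arithmetic. A BoundedOutOfOrdernessTimestampExtractor with bound X emits watermark = maxSeenTimestamp - X, so elements arriving behind the watermark are late; allowedLateness(Y) then keeps a window's state around so late elements are only dropped once the watermark reaches windowEnd + Y. A sketch of this model (a simplification, not Flink internals):

```java
// Model of the two lateness knobs discussed in the thread.
class LatenessModel {
    // BoundedOutOfOrdernessTimestampExtractor(X): the watermark trails the
    // largest timestamp seen so far by the out-of-orderness bound X.
    static long watermark(long maxSeenTs, long outOfOrdernessBound) {
        return maxSeenTs - outOfOrdernessBound;
    }

    // allowedLateness(Y): a window's state is kept until the watermark
    // reaches windowEnd + Y; only after that are late elements dropped.
    static boolean isDroppedByWindow(long windowEnd, long allowedLateness, long currentWatermark) {
        return currentWatermark >= windowEnd + allowedLateness;
    }
}
```

So increasing X delays all window firings (higher latency, fewer late elements), while increasing Y fires windows on time but re-fires them for stragglers; using one does not substitute for the other.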

Re: "java.net.SocketException: No buffer space available (maximum connections reached?)" when reading from HDFS

2016-10-17 Thread Yassine MARZOUGUI
pache/flink.git; you will get the code that is the > same as the 1.1.3 release, plus the patch to this problem. > > Greetings, > Stephan > > > On Sat, Oct 15, 2016 at 10:11 PM, Yassine MARZOUGUI < > y.marzou...@mindlytix.com> wrote: > >> Hi all, >> >>

"java.net.SocketException: No buffer space available (maximum connections reached?)" when reading from HDFS

2016-10-15 Thread Yassine MARZOUGUI
Hi all, I'm reading a large number of small files from HDFS in batch mode (about 20 directories, each directory contains about 3000 files, using recursive.file.enumeration=true), and each time, at about 200 GB of received data, my job fails with the following exception: java.io.IOException:

Re: Handling decompression exceptions

2016-10-11 Thread Yassine MARZOUGUI
identifies this as the end of the >> string field. >> As a workaround, you can read the file as a regular text file, line by >> line and do the parsing in a MapFunction. >> >> Best, Fabian >> >> 2016-10-11 13:37 GMT+02:00 Yassine MARZOUGUI <y.mar

Re: Handling decompression exceptions

2016-10-11 Thread Yassine MARZOUGUI
Forgot to add parseQuotedStrings('"'). After adding it I'm getting the same exception with the second code too. 2016-10-11 13:29 GMT+02:00 Yassine MARZOUGUI <y.marzou...@mindlytix.com>: > Hi Fabian, > > I tried to debug the code, and it turns out a line in my c

Re: Handling decompression exceptions

2016-10-11 Thread Yassine MARZOUGUI
Hueske <fhue...@gmail.com>: > Hi Yassine, > > I ran your code without problems and got the correct result. > Can you provide the Stacktrace of the Exception? > > Thanks, Fabian > > 2016-10-10 10:57 GMT+02:00 Yassine MARZOUGUI <y.marzou...@mindlytix.com>: > >>

Re: Handling decompression exceptions

2016-10-10 Thread Yassine MARZOUGUI
e the env.readCsvFile() >> shortcut but would need to create an instance of your own InputFormat and >> add it with >> env.readFile(yourIF). >> >> Hope this helps, >> Fabian >> >> 2016-10-04 17:43 GMT+02:00 Yassine MARZOUGUI <y.marzou...@m

Re: Simple batch job hangs if run twice

2016-09-23 Thread Yassine MARZOUGUI
rators.BatchTask.run(BatchTask.java:486) at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584) at java.lang.Thread.run(Thread.java:745) Best, Yassine 2016-09-23 11:28 GMT+02:00 Yassine MARZOUGUI <y.marzou...@mindl

Re: Simple batch job hangs if run twice

2016-09-17 Thread Yassine MARZOUGUI
ve tried this repeatedly so what > differentiates a "first time" from the other times? Are you closing your > IDE in-between or do you mean running the job a second time within the same > program? > > Cheers, > Aljoscha > > On Fri, 9 Sep 2016 at 16:40 Yassine MARZOUGUI

Re: Handle deserialization error

2016-08-26 Thread Yassine Marzougui
Hi Jack, As Robert Metzger mentioned in a previous thread, there's an ongoing discussion about the issue in this JIRA: https://issues.apache.org/jira/browse/FLINK-3679. A possible workaround is to use a SimpleStringSchema in the Kafka source, and chain it with a flatMap operator where you can
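The workaround described here — read raw strings with SimpleStringSchema and do the real deserialization in a chained flatMap, dropping corrupt records instead of failing the job — can be sketched as a safe-parse function. Integer parsing stands in for the actual deserialization (e.g. JSON):

```java
import java.util.Optional;

// Safe-parse stand-in for the flatMap in the workaround: a parse failure
// yields an empty result (record dropped / dead-lettered) rather than an
// exception that would fail the whole job.
class SafeParse {
    static Optional<Integer> tryParse(String raw) {
        try {
            return Optional.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            // Corrupt record: log-and-drop, per the discussion in FLINK-3679
            return Optional.empty();
        }
    }
}
```

In the Flink version, the flatMap would call out.collect(...) only for present values, optionally routing failures to a side sink for inspection.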

Re: how to get rid of duplicate rows group by in DataStream

2016-08-24 Thread Yassine Marzougui
Sorry, I mistyped the code: it should be timeWindow(Time.minutes(10)) instead of window(Time.minutes(10)). On Wed, Aug 24, 2016 at 9:29 PM, Yassine Marzougui <yassmar...@gmail.com> wrote: > Hi subash, > > A stream is infinite, hence it has no notion of "final"

Re: how to get rid of duplicate rows group by in DataStream

2016-08-24 Thread Yassine Marzougui
Hi subash, A stream is infinite, hence it has no notion of a "final" count. To get distinct counts you need to define a period (= a window [1]) over which you count elements and emit a result, by adding a window operator before the reduce. For example the following will emit distinct counts every
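The windowed-distinct idea suggested here can be sketched without Flink: assign each element to a tumbling window by its timestamp, and keep one set of keys per window, so each key counts once per window. A plain-Java illustration of the semantics of keyBy + timeWindow + a distinct-style aggregation:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Buckets (timestamp, key) events into tumbling windows of windowSize time
// units; the set per window holds each key at most once, so set sizes are
// the per-window distinct counts the thread is after.
class WindowedDistinct {
    // ts.get(i) is the event time of keys.get(i).
    static Map<Long, Set<String>> distinctPerWindow(List<Long> ts, List<String> keys, long windowSize) {
        Map<Long, Set<String>> buckets = new TreeMap<>(); // windowStart -> distinct keys
        for (int i = 0; i < ts.size(); i++) {
            long windowStart = ts.get(i) - (ts.get(i) % windowSize); // tumbling assignment
            buckets.computeIfAbsent(windowStart, w -> new HashSet<>()).add(keys.get(i));
        }
        return buckets;
    }
}
```

The windowStart formula is the same alignment Flink's tumbling windows use for non-negative timestamps; each window emits independently, so a key reappearing in a later window is counted again there.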

FLINK-4329 fix version

2016-08-23 Thread Yassine Marzougui
Hi all, The fix version of FLINK-4329 in JIRA is set to 1.1.1, but 1.1.1 is already released. Should I expect it to be fixed in the next release? and will a patch be available meanwhile? Thanks. Yassine

Re: Kafka topic with an empty parition : parallelism issue and Timestamp monotony violated

2016-08-15 Thread Yassine Marzougui
n, Aug 15, 2016 at 12:38 PM, Yassine Marzougui <yassmar...@gmail.com> wrote: > Hi all, > > I have a Kafka topic with two partitions, messages within each partition > are ordered in ascending timestamps. > > The following code works correctly (I'm running this on my local

Re: Kafka topic with an empty parition : parallelism issue and Timestamp monotony violated

2016-08-15 Thread Yassine Marzougui
tor-chains) in order to have the same parallelism, but I didn't succeed. Isn't doing env.addSource().setParallelism(n).startNewChain().map(...).disableChaining() equivalent to setting source and map parallelism to the same value? On Mon, Aug 15, 2016 at 12:38 PM, Yassine Marzougui <yassmar...@gma

Kafka topic with an empty parition : parallelism issue and Timestamp monotony violated

2016-08-15 Thread Yassine Marzougui
Hi all, I have a Kafka topic with two partitions, messages within each partition are ordered in ascending timestamps. The following code works correctly (I'm running this on my local machine, the default parallelism is the number of cores=8): stream = env.addSource(myFlinkKafkaConsumer09)

Re: No output when using event time with multiple Kafka partitions

2016-08-15 Thread Yassine Marzougui
Hi Aljoscha, Sorry for the late response, I was busy and couldn't make time to work on this again until now. Indeed, it turns out only one of the partitions is not receiving elements. The reason is that the producer will stick to a partition for topic.metadata.refresh.interval.ms (defaults

Re: Window not emitting output after upgrade to Flink 1.1.1

2016-08-12 Thread Yassine Marzougui
, then the problem can be related to this: > > https://issues.apache.org/jira/browse/FLINK-4329 > > Is this the case? > > Best, > Kostas > > On Aug 12, 2016, at 10:30 AM, Yassine Marzougui <yassmar...@gmail.com> > wrote: > > Hi all, > > The followin

Window not emitting output after upgrade to Flink 1.1.1

2016-08-12 Thread Yassine Marzougui
Hi all, The following code works under Flink 1.0.3, but under 1.1.1 it just switches to FINISHED and doesn't output any result. stream.map(new RichMapFunction() { private ObjectMapper objectMapper; @Override public void open(Configuration parameters) {