Re: Continuing from the stackoverflow post

2015-12-01 Thread Nirmalya Sengupta
Hello Fabian (), Many thanks for your encouraging words about the blogs. I want to make a sincere attempt. To summarise my understanding of the rule of removal of the elements from the window (after going through your last mail), here are two corollaries: 1) If my workflow has no triggers (and h

Running WebClient from Windows

2015-12-01 Thread Welly Tambunan
Hi All, Is there any way to run WebClient for uploading the job from windows ? I try to run that from mingw but has these error $ bin/start-webclient.sh /c/flink-0.10.0/bin/config.sh: line 261: conditional binary operator expected /c/flink-0.10.0/bin/config.sh: line 261: syntax error near `=~'

Re: Data Stream union error after upgrading from 0.9 to 0.10.1

2015-12-01 Thread Welly Tambunan
Ok Robert, Thanks a lot. Looking forward to it. Cheers On Wed, Dec 2, 2015 at 5:50 AM, Robert Metzger wrote: > No, its not yet merged into the source repo of Flink. > > You can find the code here: https://github.com/apache/flink/pull/1425 > You can also check out the code of the PR or downlo

Re: Question about flink message processing guarantee

2015-12-01 Thread Márton Balassi
Dear Jerry, If you do not enable checkpointing you get the at most once processing guarantee (some might call that no guarantee at all). When you enable checkpointing you can choose between exactly and at least once semantics. The latter provides better latency. Best, Marton On Tue, Dec 1, 2015

Re: Data Stream union error after upgrading from 0.9 to 0.10.1

2015-12-01 Thread Robert Metzger
No, its not yet merged into the source repo of Flink. You can find the code here: https://github.com/apache/flink/pull/1425 You can also check out the code of the PR or download the PR contents as a patch and apply it to the Flink source. I think the change will be merged tomorrow and then you'll

Re: Data Stream union error after upgrading from 0.9 to 0.10.1

2015-12-01 Thread Welly Tambunan
Hi Aljoscha, Is this fix has already been available on 0.10-SNAPSHOT ? Cheers On Tue, Dec 1, 2015 at 6:04 PM, Welly Tambunan wrote: > Thanks a lot Aljoscha. > > When it will be released ? > > Cheers > > On Tue, Dec 1, 2015 at 5:48 PM, Aljoscha Krettek > wrote: > >> Hi, >> I relaxed the restr

Question about flink message processing guarantee

2015-12-01 Thread Jerry Peng
Hello, I have a question regarding link streaming. I now if you enable checkpointing you can have exactly once processing guarantee. If I do not enable checkpointing what are the semantics of the processing? At least once? Best, Jerry

Re: Hello, the performance of apply function after join

2015-12-01 Thread Fabian Hueske
Hi Phil, an apply method after a join runs pipelined with the join, i.e., it starts processing when the first join result is emitted and finishes after it handled the last join result. Unless the logic in your apply function is not terribly complex, this should be OK. If you do not specify an appl

Including option for starting job and task managers in the foreground

2015-12-01 Thread Brian Chhun
Hi All, Is it possible to include a command line flag for starting job and task managers in the foreground? Currently, `bin/jobmanager.sh` and `bin/taskmanager.sh` rely on `bin/flink-daemon.sh`, which starts these things in the background. I'd like to execute these commands inside a docker contain

Hello, the performance of apply function after join

2015-12-01 Thread Philip Lee
Hello, the performance of apply function after join. Just for your information, I am running Flink job on the cluster consisted of 9 machine with each 48 cores. I am working on some benchmark with comparison of Flink, Spark-Sql, and Hive. I tried to optimize *join function with Hint* for better p

Re: Cleanup of OperatorStates?

2015-12-01 Thread Stephan Ewen
Concerning keeping all events in memory: I thought that is sort of a requirement by your application. All events need to go to the same file (which is determined by the time the session times out). If you relax that requirement that you only need to store some aggregate statistic about the session

Re: Working with protobuf wrappers

2015-12-01 Thread Robert Metzger
Hi Flavio, 1. you don't have to register serializers if its working for you. I would add a custom serializer if its not working or if the performance is poor. 2. I don't think that there is such a performance comparison. Tuples are a little faster than POJOs, other types (serialized with Kryo's st

Re: Cleanup of OperatorStates?

2015-12-01 Thread Ufuk Celebi
> On 01 Dec 2015, at 18:34, Stephan Ewen wrote: > > Hi! > > If you want to run with checkpoints (fault tolerance), you need to specify a > place to store the checkpoints to. > > By default, it is the master's memory (or zookeeper in HA), so we put a limit > on the size of the size of the sta

Re: Cleanup of OperatorStates?

2015-12-01 Thread Stephan Ewen
Hi! If you want to run with checkpoints (fault tolerance), you need to specify a place to store the checkpoints to. By default, it is the master's memory (or zookeeper in HA), so we put a limit on the size of the size of the state there. To use larger state, simply configure a different place to

Re: Question about DataStream serialization

2015-12-01 Thread Robert Metzger
Hi Radu, both emails reached the mailing list :) You can not reference to DataSets or DataStreams from inside user defined functions. Both are just abstractions for a data set or stream, so the elements are not really inside the set. We don't have any support for mixing the DataSet and DataStrea

Re: NPE with Flink Streaming from Kafka

2015-12-01 Thread Maximilian Michels
Thanks! I've linked the issue in JIRA. On Tue, Dec 1, 2015 at 5:39 PM, Robert Metzger wrote: > I think its this one https://issues.apache.org/jira/browse/KAFKA-824 > > On Tue, Dec 1, 2015 at 5:37 PM, Maximilian Michels wrote: >> >> I know this has been fixed already but, out of curiosity, could

Re: NPE with Flink Streaming from Kafka

2015-12-01 Thread Robert Metzger
I think its this one https://issues.apache.org/jira/browse/KAFKA-824 On Tue, Dec 1, 2015 at 5:37 PM, Maximilian Michels wrote: > I know this has been fixed already but, out of curiosity, could you > point me to the Kafka JIRA issue for this > bug? From the Flink issue it looks like this is a Zoo

Re: NPE with Flink Streaming from Kafka

2015-12-01 Thread Maximilian Michels
I know this has been fixed already but, out of curiosity, could you point me to the Kafka JIRA issue for this bug? From the Flink issue it looks like this is a Zookeeper version mismatch. On Tue, Dec 1, 2015 at 5:16 PM, Robert Metzger wrote: > Hi Gyula, > > no, I didn't ;) We still deploy 0.10-SN

Re: Cleanup of OperatorStates?

2015-12-01 Thread Niels Basjes
Hi, The first thing I noticed is that the Session object maintains a list of all events in memory. Your events are really small yet in my scenario the predicted number of events per session will be above 1000 and each is expected to be in the 512-1024 bytes range. This worried me yet I decided to

Re: NPE with Flink Streaming from Kafka

2015-12-01 Thread Robert Metzger
Hi Gyula, no, I didn't ;) We still deploy 0.10-SNAPSHOT versions from the "release-0.10" branch to Apache's maven snapshot repository. I don't think Mihail's code will run when he's compiling it against 1.0-SNAPSHOT. On Tue, Dec 1, 2015 at 5:13 PM, Gyula Fóra wrote: > Hi, > > I think Robert

Re: NPE with Flink Streaming from Kafka

2015-12-01 Thread Gyula Fóra
Hi, I think Robert meant to write setting the connector dependency to 1.0-SNAPSHOT. Cheers, Gyula Robert Metzger ezt írta (időpont: 2015. dec. 1., K, 17:10): > Hi Mihail, > > the issue is actually a bug in Kafka. We have a JIRA in Flink for this as > well: https://issues.apache.org/jira/browse

Re: NPE with Flink Streaming from Kafka

2015-12-01 Thread Robert Metzger
Hi Mihail, the issue is actually a bug in Kafka. We have a JIRA in Flink for this as well: https://issues.apache.org/jira/browse/FLINK-3067 Sadly, we haven't released a fix for it yet. Flink 0.10.2 will contain a fix. Since the kafka connector is not contained in the flink binary, you can just s

Question about DataStream serialization

2015-12-01 Thread Radu Tudoran
Hello, I am not sure if this message was received on the user list, if so I apologies for duplicate messages I have the following scenario · Reading a fixed set DataStream fixedset = env.readtextFile(… · Reading a continuous stream of data DataStream stream = …. I would need

NPE with Flink Streaming from Kafka

2015-12-01 Thread Vieru, Mihail
Hi, we get the following NullPointerException after ~50 minutes when running a streaming job with windowing and state that reads data from Kafka and writes the result to local FS. There are around 170 million messages to be processed, Flink 0.10.1 stops at ~8 million. Flink runs locally, started w

Re: Cleanup of OperatorStates?

2015-12-01 Thread Niels Basjes
Thanks! I'm going to study this code closely! Niels On Tue, Dec 1, 2015 at 2:50 PM, Stephan Ewen wrote: > Hi Niels! > > I have a pretty nice example for you here: > https://github.com/StephanEwen/sessionization > > It keeps only one state and has the structure: > > > (source) --> (window sessio

Re: Material on Apache flink internals

2015-12-01 Thread Fabian Hueske
Hi Madhu, checkout the following resources: - Apache Flink Blog: http://flink.apache.org/blog/index.html - Data Artisans Blog: http://data-artisans.com/blog/ - Flink Forward Conference website (Talk slides & recordings): http://flink-forward.org/?post_type=session - Flink Meetup talk recordings:

Re: Material on Apache flink internals

2015-12-01 Thread madhu phatak
Hi everyone, I am fascinated with flink core engine way of streaming of operators rather than typical map/reduce way that followed by hadoop or spark. Is any good documentation/blog/video avalable which talks about this internal. I am ok from a batch or streaming point of view. It will be great i

Re: Cleanup of OperatorStates?

2015-12-01 Thread Stephan Ewen
Hi Niels! I have a pretty nice example for you here: https://github.com/StephanEwen/sessionization It keeps only one state and has the structure: (source) --> (window sessions) ---> (real time sink) | +--> (15 minute files) The real time sink gets t

Re: Cleanup of OperatorStates?

2015-12-01 Thread Stephan Ewen
Just for clarification: The real-time results should also contain the visitId, correct? On Tue, Dec 1, 2015 at 12:06 PM, Stephan Ewen wrote: > Hi Niels! > > If you want to use the built-in windowing, you probably need two window: > - One for ID assignment (that immediately pipes elements throu

question about DataStream serialization

2015-12-01 Thread Radu Tudoran
Hello, I have the following scenario · Reading a fixed set DataStream fixedset = env.readtextFile(... · Reading a continuous stream of data DataStream stream = I would need that for each event read from the continuous stream to make some operations onit and on the fixedse

Re: Cleanup of OperatorStates?

2015-12-01 Thread Stephan Ewen
Hi Niels! If you want to use the built-in windowing, you probably need two window: - One for ID assignment (that immediately pipes elements through) - One for accumulating session elements, and then piping them into files upon session end. You may be able to use the rolling file sink (roll by

Re: Data Stream union error after upgrading from 0.9 to 0.10.1

2015-12-01 Thread Welly Tambunan
Thanks a lot Aljoscha. When it will be released ? Cheers On Tue, Dec 1, 2015 at 5:48 PM, Aljoscha Krettek wrote: > Hi, > I relaxed the restrictions on union. This should make it into an upcoming > 0.10.2 bugfix release. > > Cheers, > Aljoscha > > On 01 Dec 2015, at 11:23, Welly Tambunan wrote

Re: Data Stream union error after upgrading from 0.9 to 0.10.1

2015-12-01 Thread Aljoscha Krettek
Hi, I relaxed the restrictions on union. This should make it into an upcoming 0.10.2 bugfix release. Cheers, Aljoscha > On 01 Dec 2015, at 11:23, Welly Tambunan wrote: > > Hi All, > > After upgrading our system to the latest version from 0.9 to 0.10.1 we have > this following error. > > Ex

Re: Working with protobuf wrappers

2015-12-01 Thread Flavio Pompermaier
Sorry for the long question but I take advantage of this discussion to ask for something I've never fully understood.. Let's say I have for example a thrift/protobuf/avro object Person. 1. Do I have really to register a custom serializer? In my code I create a dataset from parquet-thrift but

Data Stream union error after upgrading from 0.9 to 0.10.1

2015-12-01 Thread Welly Tambunan
Hi All, After upgrading our system to the latest version from 0.9 to 0.10.1 we have this following error. Exception in thread "main" java.lang.UnsupportedOperationException: A DataStream cannot be unioned with itself Then i find the relevant JIRA for this one. https://issues.apache.org/jira/brow

Re: Cleanup of OperatorStates?

2015-12-01 Thread Niels Basjes
Hi Stephan, I created a first version of the Visit ID assignment like this: First I group by sessionid and I create a Window per visit. The custom Trigger for this window does a 'FIRE' after each element and sets an EventTimer on the 'next possible moment the visit can expire'. To avoid getting '

Re: Working with protobuf wrappers

2015-12-01 Thread Robert Metzger
Also, we don't add serializers automatically for DataStream programs. I've opened a JIRA for this a while ago. On Tue, Dec 1, 2015 at 10:20 AM, Till Rohrmann wrote: > Hi Kryzsztof, > > it's true that we once added the Protobuf serializer automatically. > However, due to versioning conflicts (see

Re: Working with protobuf wrappers

2015-12-01 Thread Till Rohrmann
Hi Kryzsztof, it's true that we once added the Protobuf serializer automatically. However, due to versioning conflicts (see https://issues.apache.org/jira/browse/FLINK-1635), we removed it again. Now you have to register the ProtobufSerializer manually: https://ci.apache.org/projects/flink/flink-d