New message processing time after recovery.

2017-06-27 Thread yunfan123
For example, my job failed at timestamp 1, and recovery from the checkpoint takes 600 seconds. So is the processing time of the new elements entering my stream 601?

Re: How to perform multiple stream join functionality

2017-06-27 Thread yunfan123
Flink 1.3? I'm using Flink 1.3; how can I implement this? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-perform-multiple-stream-join-functionality-tp7184p14031.html

Re: Window data retention - Appending to previous data and processing

2017-06-27 Thread G.S.Vijay Raajaa
Thanks a lot, it works fine! Regards, Vijay Raajaa GS On Mon, Jun 26, 2017 at 7:01 PM, Aljoscha Krettek wrote: > Hi, > > I think you should be able to do this by using: > > * GlobalWindows as your window assigner > * a custom Trigger that fires on every element, sets a
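A minimal sketch of the suggested setup, assuming the Flink 1.3 windowing API (the Event element type, key field, and MyWindowFunction are illustrative, not from the thread):

    import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows;
    import org.apache.flink.streaming.api.windowing.triggers.Trigger;
    import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
    import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;

    input
        .keyBy("id") // illustrative key field
        .window(GlobalWindows.create())
        .trigger(new Trigger<Event, GlobalWindow>() {
            @Override
            public TriggerResult onElement(Event element, long timestamp, GlobalWindow window, TriggerContext ctx) {
                return TriggerResult.FIRE; // re-evaluate the window on every element, retaining previous data
            }
            @Override
            public TriggerResult onProcessingTime(long time, GlobalWindow window, TriggerContext ctx) {
                return TriggerResult.CONTINUE;
            }
            @Override
            public TriggerResult onEventTime(long time, GlobalWindow window, TriggerContext ctx) {
                return TriggerResult.CONTINUE;
            }
            @Override
            public void clear(GlobalWindow window, TriggerContext ctx) {}
        })
        .apply(new MyWindowFunction()); // hypothetical WindowFunction over all retained elements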

Re: Problem with Summerization

2017-06-27 Thread Greg Hogan
Hi Ali, Could you print and include a gellyGraph which results in this error? Greg > On Jun 27, 2017, at 2:48 PM, rost...@informatik.uni-leipzig.de wrote: > > Dear All, > > I do not understand what the error in the following code can be? > > Graph gellyGraph =

Problem with Summerization

2017-06-27 Thread rostami
Dear All, I do not understand what the error in the following code can be: Graph gellyGraph = ... Graph g = gellyGraph.run(new Summarization());
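For reference, a hedged sketch of how Gelly's Summarization is typically invoked with explicit type parameters, which the raw types above omit (Long vertex IDs and String vertex/edge values are illustrative):

    import org.apache.flink.graph.Graph;
    import org.apache.flink.graph.library.Summarization;

    Graph<Long, String, String> gellyGraph = ...; // as above

    Graph<Long, Summarization.VertexValue<String>, Summarization.EdgeValue<String>> g =
        gellyGraph.run(new Summarization<Long, String, String>());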

Re: MapR libraries shading issue

2017-06-27 Thread ani.desh1512
Again, as I mentioned in the MapR thread: after some more digging, I found out that you can make Flink use the default Java truststore by passing -Djavax.net.ssl.trustStore=$JAVA_HOME/jre/lib/security/cacerts as JVM_ARGS for Flink. I tested this approach with AWS and Datadog along with MapR

Incremental aggregation using Fold and failure recovery

2017-06-27 Thread Ahmad Hassan
Hi All, I am collecting millions of events per hour for 'N' products, where 'N' can be 50k. I use the following fold mechanism with a sliding window: final DataStream eventStream = inputStream .keyBy(TENANT, CATEGORY) .window(SlidingProcessingTimeWindows.of(Time.hours(1), Time.minutes(5)))
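A hedged sketch of how that fold might continue, using the Flink 1.2/1.3 WindowedStream fold API (Event and WindowStats are hypothetical types standing in for the truncated snippet):

    import org.apache.flink.api.common.functions.FoldFunction;
    import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    final DataStream<WindowStats> eventStream = inputStream
        .keyBy(TENANT, CATEGORY)
        .window(SlidingProcessingTimeWindows.of(Time.hours(1), Time.minutes(5)))
        .fold(new WindowStats(), new FoldFunction<Event, WindowStats>() {
            @Override
            public WindowStats fold(WindowStats acc, Event event) {
                acc.add(event); // aggregate incrementally instead of buffering all events per window
                return acc;
            }
        });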

Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-27 Thread sohimankotia
So, in the following execution flow: source -> map -> partitioner -> flatmap -> sink, I am attaching the current time to the tuple while emitting from the map function, and then extracting that timestamp value from the tuple as the very first step in the flatmap. Then I am calculating the difference between the time attached
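A hedged sketch of that measurement, assuming Tuple3 records whose last field carries the emit time (the field layout and myPartitioner are illustrative):

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.util.Collector;

    source
        .map(new MapFunction<Tuple2<String, Integer>, Tuple3<String, Integer, Long>>() {
            @Override
            public Tuple3<String, Integer, Long> map(Tuple2<String, Integer> in) {
                return new Tuple3<>(in.f0, in.f1, System.currentTimeMillis()); // attach emit time
            }
        })
        .partitionCustom(myPartitioner, 0)
        .flatMap(new FlatMapFunction<Tuple3<String, Integer, Long>, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(Tuple3<String, Integer, Long> in, Collector<Tuple2<String, Integer>> out) {
                long latencyMillis = System.currentTimeMillis() - in.f2; // time spent crossing the partitioner
                out.collect(new Tuple2<>(in.f0, in.f1));
            }
        });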

Re: java.lang.IllegalArgumentException with 1.3.0 and Elastic Search connector

2017-06-27 Thread Aljoscha Krettek
Hi Victor, What are you using as a Source? The stack trace you posted indicates that the problem is happening while specifying the source. This might be caused by some interactions with the Elasticsearch dependency. Best, Aljoscha > On 17. Jun 2017, at 18:36, Victor Godoy Poluceno

Partition index from partitionCustom vs getIndexOfThisSubtask downstream

2017-06-27 Thread Urs Schoenenberger
Hi, if I use DataStream::partitionCustom, will the partition number that my custom Partitioner returns always be equal to getIndexOfThisSubtask in the following operator? A test case with different parallelisms seems to suggest this is true, but the Javadoc seems ambiguous to me since the
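A hedged sketch of the test being described (Integer keys on tuple field 0 are illustrative):

    import org.apache.flink.api.common.functions.Partitioner;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;

    Partitioner<Integer> partitioner = new Partitioner<Integer>() {
        @Override
        public int partition(Integer key, int numPartitions) {
            return key % numPartitions; // explicit partition index per key
        }
    };

    stream
        .partitionCustom(partitioner, 0)
        .map(new RichMapFunction<Tuple2<Integer, String>, Tuple2<Integer, String>>() {
            @Override
            public Tuple2<Integer, String> map(Tuple2<Integer, String> value) {
                // The question: is this always equal to partitioner.partition(value.f0, parallelism)?
                int subtask = getRuntimeContext().getIndexOfThisSubtask();
                return value;
            }
        });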

Re: How to perform multiple stream join functionality

2017-06-27 Thread Aljoscha Krettek
Hi, I’m afraid there is also no simple, built-in feature for doing this in Flink 1.3. Best, Aljoscha > On 27. Jun 2017, at 10:37, yunfan123 wrote: > > In flink release 1.3, can I do this in simple way? > > > > -- > View this message in context: >

Re: Combining streams with static data and using REST API as a sink

2017-06-27 Thread Aljoscha Krettek
A quick note on this: the side-input API is still ongoing work and it turns out it’s more complicated (obviously … ) and we will need quite a bit more work on other parts of Flink before we can provide a good built-in solution. In the meantime, you can check out the Async I/O operator [1]. I
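A hedged sketch of the Async I/O pattern for enriching a stream from a REST endpoint, using the Flink 1.3 API (MyRestClient, Event, and Enriched are hypothetical):

    import java.util.Collections;
    import java.util.concurrent.TimeUnit;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.AsyncDataStream;
    import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
    import org.apache.flink.streaming.api.functions.async.collector.AsyncCollector;

    DataStream<Enriched> enriched = AsyncDataStream.unorderedWait(
        input,
        new RichAsyncFunction<Event, Enriched>() {
            private transient MyRestClient client; // hypothetical non-blocking REST client

            @Override
            public void open(Configuration parameters) {
                client = new MyRestClient();
            }

            @Override
            public void asyncInvoke(Event event, AsyncCollector<Enriched> collector) {
                client.lookup(event.getKey(), response ->
                    collector.collect(Collections.singletonList(new Enriched(event, response))));
            }
        },
        1000, TimeUnit.MILLISECONDS, // per-request timeout
        100);                        // max concurrent in-flight requests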

Re: Session ID Error

2017-06-27 Thread Aljoscha Krettek
Hi Will, How did you configure Flink and what is the command that you’re using to submit your job/session? Best, Aljoscha > On 21. Jun 2017, at 01:44, Will Walters wrote: > > Hello, > > In attempting to submit a job via Yarn session on Hadoop cluster (using Flink >

Re: Using Custom Partitioner in Streaming with parallelism 1 adding latency

2017-06-27 Thread Aljoscha Krettek
Hi, That depends: how are you measuring, and what are your results? Best, Aljoscha > On 19. Jun 2017, at 06:23, sohimankotia wrote: > > Thanks for the pointers, Aljoscha. > > I was just wondering: since the custom partitioner will run in a separate thread, > is it possible that

Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-27 Thread Aljoscha Krettek
Hi, What do you mean by latency and how are you measuring this in your job? Best, Aljoscha > On 22. Jun 2017, at 14:23, sohimankotia wrote: > > Hi Chesnay, > > I have data categorized on some attribute(Key in partition ) which will be > having n possible values. As

Re: Performance Improvement on Flink 1.2.0

2017-06-27 Thread Aljoscha Krettek
Just a quick remark about memory and number of slots: with your configuration of 30 slots but only ~20 GB of RAM, each processing slot does not have a lot of memory to work with. For batch programs this can be a problem. I would suggest using fewer but bigger slots, even if the number of cores is

Re: Different Window Sizes in keyed stream

2017-06-27 Thread Aljoscha Krettek
Hi Ahmad, You could in fact do this by writing a custom WindowAssigner. Have a look at the assignWindows() method here:
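A hedged sketch of such an assigner, modeled on TumblingProcessingTimeWindows but with a per-element window size (Event and getWindowSizeMillis() are hypothetical):

    import java.util.Collection;
    import java.util.Collections;
    import org.apache.flink.api.common.ExecutionConfig;
    import org.apache.flink.api.common.typeutils.TypeSerializer;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.WindowAssigner;
    import org.apache.flink.streaming.api.windowing.triggers.ProcessingTimeTrigger;
    import org.apache.flink.streaming.api.windowing.triggers.Trigger;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

    public class PerKeySizeTumblingWindows extends WindowAssigner<Object, TimeWindow> {
        @Override
        public Collection<TimeWindow> assignWindows(Object element, long timestamp, WindowAssignerContext context) {
            long size = ((Event) element).getWindowSizeMillis(); // hypothetical: size chosen per element/key
            long now = context.getCurrentProcessingTime();
            long start = now - (now % size);
            return Collections.singletonList(new TimeWindow(start, start + size));
        }

        @Override
        public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
            return ProcessingTimeTrigger.create();
        }

        @Override
        public TypeSerializer<TimeWindow> getWindowSerializer(ExecutionConfig executionConfig) {
            return new TimeWindow.Serializer();
        }

        @Override
        public boolean isEventTime() {
            return false;
        }
    }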

Re: Default value - Time window expires with no data from source

2017-06-27 Thread Aljoscha Krettek
You mean you want to output some data when you know that you don’t have any counts for a given time window? This is not (easily) possible in Flink right now because this would require an operation with parallelism one that determines that there is no data across all keys. Best, Aljoscha > On

Re: Recursive Traversal of the Input Path Directory, Not working

2017-06-27 Thread Aljoscha Krettek
Hi, Hadoop FileInputFormats (by default) also include hidden files (files starting with “.” or “_”). You can override this behaviour in Flink by subclassing TextInputFormat and overriding the accept() method. You can use a custom input format with ExecutionEnvironment.readFile(). Regarding
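A hedged sketch of that override, assuming the hidden-file check sits in FileInputFormat's acceptFile() as in Flink 1.3 (the input path is illustrative):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.io.TextInputFormat;
    import org.apache.flink.core.fs.FileStatus;
    import org.apache.flink.core.fs.Path;

    public class NoHiddenFilesTextInputFormat extends TextInputFormat {
        public NoHiddenFilesTextInputFormat(Path filePath) {
            super(filePath);
        }

        @Override
        public boolean acceptFile(FileStatus fileStatus) {
            final String name = fileStatus.getPath().getName();
            return !name.startsWith(".") && !name.startsWith("_"); // filter out hidden files
        }
    }

    // Usage, with recursive directory traversal enabled:
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    NoHiddenFilesTextInputFormat format = new NoHiddenFilesTextInputFormat(new Path("/data/input"));
    format.setNestedFileEnumeration(true);
    DataSet<String> lines = env.readFile(format, "/data/input");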

Re: Checkpointing with RocksDB as statebackend

2017-06-27 Thread vinay patil
Hi Stephan, I am observing a similar issue with Flink 1.2.1: the memory is continuously increasing and data is not getting flushed to disk. I have attached a snapshot for reference. Also, the data processed till now is only 17 GB, yet more than 120 GB of memory is being used. Is there any change wrt

Re: Recursive Traversal of the Input Path Directory, Not working

2017-06-27 Thread Adarsh Jain
Thanks Stefan, my colleague Shashank has filed a bug for the same in jira https://issues.apache.org/jira/browse/FLINK-6993 Regards, Adarsh On Fri, Jun 23, 2017 at 8:19 PM, Stefan Richter wrote: > Hi, > > I suggest that you simply open an issue for this in our

Re: How to perform multiple stream join functionality

2017-06-27 Thread yunfan123
In Flink release 1.3, can I do this in a simple way? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-perform-multiple-stream-join-functionality-tp7184p14011.html