Re: multiple consumer of intermediate data set

2017-03-14 Thread Zhijiang(wangzhijiang999)
Hi lining, At the JobGraph level, it is a logical topology. There will be one IntermediateDataSet between each producer and consumer, as in the cases A-IntermediateDataSet-B and A-IntermediateDataSet-D in the left graph. The same holds for B-IntermediateDataSet-C and B-IntermediateDataSet-D,
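
The wiring described above can be sketched as a toy model (plain Python, not Flink code): each producer-consumer edge in the JobGraph gets its own IntermediateDataSet, so a vertex with two downstream consumers produces two distinct data sets. Names are illustrative.

```python
def build_edges(connections):
    """connections: list of (producer, consumer) vertex names.
    Returns (producer, dataset, consumer) triples, one dataset per edge."""
    edges = []
    for i, (producer, consumer) in enumerate(connections):
        dataset = f"IntermediateDataSet-{i}"
        edges.append((producer, dataset, consumer))
    return edges

# The topology from the message: A feeds B and D, B feeds C and D.
edges = build_edges([("A", "B"), ("A", "D"), ("B", "C"), ("B", "D")])
```

Note that A appears as producer of two separate data sets, one per consumer, matching the A-IntermediateDataSet-B / A-IntermediateDataSet-D description.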

Re: multiple consumer of intermediate data set

2017-03-14 Thread lining jing
Hi, if the output is the same, why isn't just one intermediate data set enough? 2017-03-14 14:36 GMT+08:00 Zhijiang(wangzhijiang999) < wangzhijiang...@aliyun.com>: > Hi, > > I think there is no difference between JobVertex(A) and JobVertex(B). > Because the JobVertex(C) is not shown in the

Re: Proper way to call a Python function in WindowFunction.apply()

2017-03-14 Thread 김동원
Alright, it works perfectly. I checked that my Python methods are properly executed inside RichWindowFunction. Thanks a lot! p.s. for those who wonder why I use Jep, refer to https://sushant-hiray.me/posts/python-in-scala-stack/ to grasp

Re: Source and Sink Flink

2017-03-14 Thread Robert Metzger
Hi Alberto, It should be possible. The IBM MQ supports the JMS standard, and we have a JMS compatible connector for Flink in Apache Bahir: http://bahir.apache.org/docs/flink/current/flink-streaming-activemq/ For writing files to HDFS, we have the bucketing sink in Flink

Re: Suggestion for top 'k' products

2017-03-14 Thread Meghashyam Sandeep V
Is there an equivalent of the Spark function 'takeOrdered' in Flink? If I implement a function to order messages in the stream, I'm not sure whether it is executed in distributed mode by splitting the data across the available task manager nodes and then evaluating the function. Thanks, Sandeep On Mon, Mar
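
Flink has no built-in takeOrdered, but the usual distributed pattern behind it is: compute a top-k per partition (each partial result has at most k elements), then merge the small partial lists. A plain-Python sketch of that idea, assuming "top" means largest (function names are illustrative, not Flink or Spark API):

```python
import heapq

def partition_top_k(partition, k):
    """Per-partition step: keep only the k largest elements.
    heapq.nlargest uses a bounded heap, so memory stays O(k)."""
    return heapq.nlargest(k, partition)

def distributed_top_k(partitions, k):
    """Merge step: the partial top-k lists are tiny, so the final
    merge on a single node is cheap even for large inputs."""
    partials = [partition_top_k(p, k) for p in partitions]
    return heapq.nlargest(k, (x for part in partials for x in part))

result = distributed_top_k([[5, 1, 9], [7, 3, 8]], 3)
```

This is why the operation parallelizes safely: ordering inside each partition is independent, and only k elements per partition cross the network.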

Source and Sink Flink

2017-03-14 Thread Alberto Ramón
Is this possible with Flink: read from IBM MQ, and write to HDFS using append?

Re: Checkpointing with RocksDB as statebackend

2017-03-14 Thread Stephan Ewen
The issue in Flink is https://issues.apache.org/jira/browse/FLINK-5756 On Tue, Mar 14, 2017 at 3:40 PM, Stefan Richter wrote: > Hi Vinay, > > I think the issue is tracked here: https://github.com/facebook/rocksdb/issues/1988. > > Best, > Stefan > > On 14.03.2017

Re: Proper way to call a Python function in WindowFunction.apply()

2017-03-14 Thread Chesnay Schepler
Hey, Naturally this would imply that your script is available on all nodes, so you will have to distribute it manually. On 14.03.2017 17:23, Chesnay Schepler wrote: Hello, I would suggest implementing the RichWindowFunction instead, and instantiate Jep within open(), or maybe do some

Re: Proper way to call a Python function in WindowFunction.apply()

2017-03-14 Thread Chesnay Schepler
Hello, I would suggest implementing the RichWindowFunction instead, and instantiate Jep within open(), or maybe do some lazy instantiation within apply. Regards, Chesnay On 14.03.2017 15:47, 김동원 wrote: Hi all, What is the proper way to call a Python function in WindowFunction.apply()?
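
Jep itself is a Java library for embedding Python in the JVM, but the pattern Chesnay describes is language-agnostic: create the expensive resource once per parallel task instance in open(), not once per element, with lazy instantiation in apply() as a fallback. A minimal sketch of that lifecycle (plain Python, illustrative class and method names, not the Flink API):

```python
class RichWindowFunctionSketch:
    """Illustrates init-once-per-task, as with a Jep interpreter in open()."""

    def __init__(self):
        self._interp = None  # heavyweight resource: NOT created per element
        self.open_calls = 0

    def open(self):
        # Called once per parallel task instance, before any apply().
        self.open_calls += 1
        self._interp = object()  # stand-in for e.g. a Jep interpreter

    def apply(self, window_values):
        if self._interp is None:
            # lazy fallback, mirroring "lazy instantiation within apply"
            self.open()
        return sum(window_values)  # stand-in for calling the Python function

fn = RichWindowFunctionSketch()
fn.open()
out = fn.apply([1, 2, 3])
```

The point is that however many windows fire, the interpreter is constructed exactly once per task instance.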

Proper way to call a Python function in WindowFunction.apply()

2017-03-14 Thread 김동원
Hi all, What is the proper way to call a Python function in WindowFunction.apply()? I want to apply a Python function to values in a fixed-size sliding window. I'm trying it because - I'm currently working on time-series prediction using deep learning, which is why I need a sliding window to

Re: Checkpointing with RocksDB as statebackend

2017-03-14 Thread Stefan Richter
Hi Vinay, I think the issue is tracked here: https://github.com/facebook/rocksdb/issues/1988. Best, Stefan > On 14.03.2017 at 15:31, Vishnu Viswanath wrote: > > Hi Stephan, > > Is there a ticket number/link

Re: Checkpointing with RocksDB as statebackend

2017-03-14 Thread Vishnu Viswanath
Hi Stephan, Is there a ticket number/link to track this? My job has all the conditions you mentioned. Thanks, Vishnu On Tue, Mar 14, 2017 at 7:13 AM, Stephan Ewen wrote: > Hi Vinay! > > We just discovered a bug in RocksDB. The bug affects windows without > reduce() or

Re: Checkpointing with RocksDB as statebackend

2017-03-14 Thread Stephan Ewen
Hi Vinay! We just discovered a bug in RocksDB. The bug affects windows without reduce() or fold(), windows with evictors, and ListState. A certain access pattern in RocksDB starts being so slow after a certain size-per-key that it basically brings down the streaming program and the snapshots.
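
The access pattern Stephan describes hits state that grows per key, e.g. the list-style state behind windows without reduce() or fold(). Where the aggregation allows it, incremental aggregation keeps per-key state constant-size instead. A plain-Python sketch of the difference (not Flink API, just the state-growth contrast):

```python
def buffering_window(values):
    """Non-reducing window: state accumulates every element for the key,
    so state size grows linearly with the window contents."""
    state = []
    for v in values:
        state.append(v)
    return state

def reducing_window(values):
    """Window with reduce(): state is a single running value per key,
    so state size stays constant no matter how many elements arrive."""
    state = 0
    for v in values:
        state += v  # the reduce step
    return state

buffered = buffering_window(range(1000))
reduced = reducing_window(range(1000))
```

Constant-size per-key state is exactly what avoids the "slow after a certain size-per-key" regime described above.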

Re: Isolate Tasks - Run Distinct Tasks in Different Task Managers

2017-03-14 Thread Stephan Ewen
Yes, simply set the number of slots per TaskManager in YARN to 1. That gives you the isolation. On Tue, Mar 14, 2017 at 11:58 AM, PedroMrChaves wrote: > Can YARN provide task isolation? > > > > > - > Best Regards, > Pedro Chaves > -- > View this message in
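
As a concrete illustration of the suggestion above, the YARN session CLI of Flink releases from that era accepts the slot count per TaskManager directly; a config sketch (container count and memory sizes are illustrative):

```shell
# 4 TaskManagers with 1 slot each: every task runs in its own JVM/container
./bin/yarn-session.sh -n 4 -s 1 -jm 1024 -tm 2048
```

With one slot per TaskManager, no two tasks share a JVM, which is the isolation being asked about.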

Re: Isolate Tasks - Run Distinct Tasks in Different Task Managers

2017-03-14 Thread PedroMrChaves
Can YARN provide task isolation? - Best Regards, Pedro Chaves

Re: Batch stream Sink delay ?

2017-03-14 Thread Fabian Hueske
Hi Paul, This might be an issue with the watermarks. A window operation can only compute and emit its results when the watermark time is later than the end time of the window. Each operator keeps track of the maximum timestamp of each of its input tasks and computes its own time as the minimum of
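
The time-tracking rule Fabian describes can be sketched directly: track the highest watermark seen per input channel, and take the operator's own time as the minimum across channels, so one slow or idle input holds back window firing. A plain-Python sketch (illustrative, not Flink API):

```python
class OperatorClock:
    """Models an operator's event time as min over per-input watermarks."""

    def __init__(self, num_inputs):
        # highest watermark seen so far on each input channel
        self.input_watermarks = [float("-inf")] * num_inputs

    def advance(self, input_idx, watermark):
        # watermarks only move forward per input
        self.input_watermarks[input_idx] = max(
            self.input_watermarks[input_idx], watermark
        )
        # the operator's own time is the minimum over all inputs
        return min(self.input_watermarks)

clock = OperatorClock(2)
clock.advance(0, 10)        # input 0 reaches time 10; input 1 still unknown
t1 = clock.advance(1, 5)    # operator time becomes 5, not 10
t2 = clock.advance(1, 20)   # now bounded by input 0 at 10
```

A window ending at time 15 would therefore fire only once *both* inputs have passed 15, which is why a lagging source can look like a sink delay.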

Question about processElement(...) and onTimer(...) in ProcessFunction

2017-03-14 Thread Yassine MARZOUGUI
Hi all, In ProcessFunction, does processElement() still get called on incoming elements when onTimer() is called, or are elements buffered until onTimer() returns? I am wondering because both processElement() and onTimer() can access and manipulate the state, so if for example state.clear() is
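
In Flink, a task's element processing and its timer callbacks are invoked by the same task thread (under the checkpoint lock), so processElement() and onTimer() never run concurrently. A plain-Python sketch of why serialized dispatch makes the state.clear() scenario safe (illustrative, not Flink API):

```python
def run_task(messages):
    """Dispatch elements and timer firings from one loop, as a single
    task thread would: the two callbacks can never interleave."""
    state = {}

    def process_element(value):
        state["last"] = value

    def on_timer():
        # Safe: no process_element() can run while we clear the state.
        state.clear()

    for kind, payload in messages:
        if kind == "element":
            process_element(payload)
        else:  # "timer"
            on_timer()
    return state

final_state = run_task([("element", 1), ("timer", None), ("element", 2)])
```

Because each callback runs to completion before the next message is dispatched, state.clear() can never tear a concurrent update in half; it only reorders relative to whole callbacks.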

Re: multiple consumer of intermediate data set

2017-03-14 Thread Zhijiang(wangzhijiang999)
Hi, I think there is no difference between JobVertex(A) and JobVertex(B). Because JobVertex(C) is not shown in the right graph, it may mislead you. There should be another intermediate result partition between JobVertex(B) and JobVertex(C) for each parallelism, and that is the same