Re: Best practices to maintain reference data for Flink Jobs

2017-05-18 Thread Tzu-Li (Gordon) Tai
Hi, Can the enriching data be keyed? Or is it something that has to be broadcasted to each operator? Either way, I think Side Inputs (an upcoming feature in the future) is the best fit for this. You can take a look at  https://issues.apache.org/jira/browse/FLINK-6131. Regarding the 3 options yo

Queryable state in a keyed stream not querying properly

2017-05-18 Thread Philip Doctor
Dear Flink Users, I’m getting started with Flink and I’ve bumped into a small problem. I have a keyed stream like this: val stream = env.addSource(consumer) .flatMap(new ValidationMap()).name("ValidationMap") .keyBy(x => (x.getObj.foo(), x.getObj.bar(), x.getObj.baz())) .flatMap(new Calcul

Best practices to maintain reference data for Flink Jobs

2017-05-18 Thread Sand Stone
Hi. Say I have a few reference data sets need to be used for a streaming job. The sizes range between 10M-10GB. The data is not static, will be refreshed at minutes and/or day intervals. With the new advancements in Flink, it seems there are quite a few options. A. Store all the data in an exte

Re: questions about Flink's HashJoin performance

2017-05-18 Thread Fabian Hueske
Hi, I'm not aware of a performance report for this feature. I don't think it is well known or used a lot. The classes to check out for prepartitioned / presorted data are SplitDataProperties [1], DataSource [2], and as an example PropertyDataSourceTest [3]. [1] https://github.com/apache/flink/blo

Re: questions about Flink's HashJoin performance

2017-05-18 Thread weijie tong
thanks for tip @Stephan. To [1] , there's a description about "I’ve got sooo much data to join, do I really need to ship it?" . How to configure Flink to touch that target? Is there a performance report ? [1] : https://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html

Re: Bushy plan execution

2017-05-18 Thread Fabian Hueske
Hi, Flink does not apply join order optimization (neither in the DataSet nor in the Table API). Joins are executed in the same order as they are specified. You can build bushy join plans for SQL by nesting queries: SELECT * FROM (SELECT * FROM X, Y WHERE x = y) AS t1, (SELECT * FROM U, V WHERE u

Re: Problem with Kafka Consumer

2017-05-18 Thread simone
Hi Kostas, As suggested I switched to version 1.3-SNAPSHOT and the project run without any problem. I will keep you informed if any other issue occurs. Thanks again for the help. Cheers, Simone. On 16/05/2017 16:36, Kostas Kloudas wrote: Hi Simone, Glad I could help ;) Actually it would b

Re: Disk I/O in Flink

2017-05-18 Thread Robert Schmidtke
Minor update: I have executed the flink-runtime tests on XFS, Lustre and DVS (Cray DataWarp), and I observe divergences on XFS and Lustre, but not on DVS. It turns out that cached reads are reported by the file systems as well, so I don't think caching is an issue here. There might still be some th

Re: FlinkKafkaConsumer using Kafka-GroupID?

2017-05-18 Thread Tzu-Li (Gordon) Tai
Hi Valentin! Your understanding is correct, the Kafka connectors do not use the consumer group functionality to distribute messages across multiple instances of a FlinkKafkaConsumer source. It’s basically determining which instances should be assigned which Kafka partitions based on a simple ro

RE: Failed checkpointing on HDFS : Flink don't use the right authentication

2017-05-18 Thread Tzu-Li (Gordon) Tai
Hi Bruno, Thanks for reporting this! And sorry for the stale response here, this one slipped out of my notice. As far as I can tell, this seems to have been fixed indirectly by  https://issues.apache.org/jira/browse/FLINK-5949. Cheers, Gordon On 18 May 2017 at 3:15:18 PM, Bruno Michelin Rakot

RE: Failed checkpointing on HDFS : Flink don't use the right authentication

2017-05-18 Thread Bruno Michelin Rakotondranaivo
Hi all, FYI, this issue seems to be fixed in flink 1.2.1. Regards, From: Bruno Michelin Rakotondranaivo [mailto:bruno.michelin.rakotondrana...@ericsson.com] Sent: vendredi 21 avril 2017 12:47 To: user@flink.apache.org Subject: Failed checkpointing on HDFS : Flink don't use the right authentica