Objects accessible from all Flink nodes

2017-01-12 Thread Matt
Hello, I have a stream of objects which I use to update the model of a classification algorithm and another stream with the objects I need to classify in real time. The problem is that the instances for training and evaluation are processed on potentially different Flink nodes, but the

Re: Flink snapshotting to S3 - Timeout waiting for connection from pool

2017-01-12 Thread Shannon Carey
Good to know someone else has had the same problem... What did you do about it? Did it resolve on its own? -Shannon On 1/12/17, 11:55 AM, "Chen Qin" wrote: >We have seen this issue back to Flink 1.0. Our finding back then was traffic >congestion to AWS in internal

1.1.4 on YARN - vcores change?

2017-01-12 Thread Shannon Carey
Did anything change in 1.1.4 with regard to YARN & vcores? I'm getting this error when deploying 1.1.4 to my test cluster. Only the Flink version changed. java.lang.RuntimeException: Couldn't deploy Yarn cluster at

Re: Getting key from keyed stream

2017-01-12 Thread Paul Joireman
Thanks Jamie, Just figured that out after some digging and a little trial and error, that works great. Paul From: Jamie Grier Sent: Thursday, January 12, 2017 4:59:43 PM To: user@flink.apache.org Subject: Re: Getting key from keyed

Re: Getting key from keyed stream

2017-01-12 Thread Jamie Grier
A simpler and more efficient approach would simply be the following: val stream = env.addSource(new FlinkKafkaConsumer(...)) stream .addSink(new FlinkKafkaProducer(new MyKeyedSerializationSchema(...))) env.execute() In MyKeyedSerializationSchema just override the getTargetTopic() method.

Re: Flink snapshotting to S3 - Timeout waiting for connection from pool

2017-01-12 Thread Shannon Carey
I can't predict when it will occur, but usually it's after Flink has been running for at least a week. Yes, I do believe we had several job restarts due to an exception due to a Cassandra node being down for maintenance and therefore a query failing to meet the QUORUM consistency level

Getting key from keyed stream

2017-01-12 Thread Paul Joireman
Hi all, Is there a simple way to read the key from a KeyedStream. Very simply I'm trying to read a message from Kafka, separate the incoming messages by a field in the message and write the original message back to Kafka using that field as a new topic. I chose to partition the incoming

Re: How to get help on ClassCastException when re-submitting a job

2017-01-12 Thread Yury Ruchin
Hi, I'd like to chime in since I've faced the same issue running Flink 1.1.4. I have a long-running YARN session which I use to run multiple streaming jobs concurrently. Once after cancelling and resubmitting the job I saw the "X cannot be cast to X" ClassCastException exception in logs. I

Re: Flink snapshotting to S3 - Timeout waiting for connection from pool

2017-01-12 Thread Chen Qin
We have seen this issue back to Flink 1.0. Our finding back then was traffic congestion to AWS in internal network. Many teams too dependent on S3 and bandwidth is shared, cause traffic congestion from time to time. Hope it helps! Thanks Chen > On Jan 12, 2017, at 03:30, Ufuk Celebi

Re: 答复: [DISCUSS] Apache Flink 1.2.0 RC0 (Non-voting testing release candidate)

2017-01-12 Thread Till Rohrmann
I'm wondering whether we should not depend the webserver encryption on the global encryption activation and activating it instead per default. On Thu, Jan 12, 2017 at 4:54 PM, Chesnay Schepler wrote: > FLINK-5470 is a duplicate of FLINK-5298 for which there is also an open

Kafka topic partition skewness causes watermark not being emitted

2017-01-12 Thread tao xiao
Hi team, I have a topic with 2 partitions in Kafka. I produced all data to partition 0 and no data to partition 1. I created a Flink job with parallelism to 1 that consumes that topic and count the events with session event window (5 seconds gap). It turned out that the session event window was

Re: 答复: [DISCUSS] Apache Flink 1.2.0 RC0 (Non-voting testing release candidate)

2017-01-12 Thread Chesnay Schepler
FLINK-5470 is a duplicate of FLINK-5298 for which there is also an open PR. FLINK-5472 is imo invalid since the webserver does support https, you just have to enable it as per the security documentation. On 12.01.2017 16:20, Till Rohrmann wrote: I also found an issue:

Re: 答复: [DISCUSS] Apache Flink 1.2.0 RC0 (Non-voting testing release candidate)

2017-01-12 Thread Till Rohrmann
I also found an issue: https://issues.apache.org/jira/browse/FLINK-5470 I also noticed that Flink's webserver does not support https requests. It might be worthwhile to add it, though. https://issues.apache.org/jira/browse/FLINK-5472 On Thu, Jan 12, 2017 at 11:24 AM, Robert Metzger

Flink on YARN: Cannot connect to JobManager

2017-01-12 Thread Malte Schwarzer
Hi all, I trying to run a Flink job on YARN via "$/bin/flink run -m yarn-cluster -yn 2 ..." with two nodes. But only one JobManager seems to be connected. Flinks hangs at this stage (look up message repeats every second): 017-01-11 15:12:13,653 DEBUG org.apache.flink.yarn.YarnClusterClient

Re: Flink snapshotting to S3 - Timeout waiting for connection from pool

2017-01-12 Thread Ufuk Celebi
Hey Shannon! Is this always reproducible and how long does it take to reproduce it? I've not seen this error before but as you say it indicates that some streams are not closed. Did the jobs do any restarts before this happened? Flink 1.1.4 contains fixes for more robust releasing of resources

Re: Increasing parallelism skews/increases overall job processing time linearly

2017-01-12 Thread Chakravarthy varaga
Hi Tim, Thanks for your response. The results are the same. 4 CPU (*8 cores in total) kafka partitions = 4 per topic parallesim for job = 3 task.slot / TM = 4 Basically this flink application consumes (kafka source) from 2 topics and produces (kafka

Re: Sliding Event Time Window Processing: Window Function inconsistent behavior

2017-01-12 Thread Aljoscha Krettek
Great! Thanks for letting us know. On Wed, 11 Jan 2017 at 12:44 Sujit Sakre wrote: > Hi Aljoscha, > > I have realized that the output stream is not defined separately in the > code below, and hence the input values are getting in the sink. After > defining a

Re: About delta awareness caches

2017-01-12 Thread Xingcan
Hi, Aljoscha Thanks for your explanation. About the Storm windows simulation, we had tried your suggestion and gave up due to its complexity and sort of "reinventing the wheel". Without considering the performance, most of our business-logic code have already been transformed to the "Flink

Re: Making batches of small messages

2017-01-12 Thread Kostas Kloudas
Hi, Fabian is right. The only thing I have to add is that if you have parallelism > 1 then each task will know its local “count” of messages it has buffered. In other words, with a parallelism of 2 and a batching threshold of 1000 messages, each one of the parallel tasks will have to reach

RE: Making batches of small messages

2017-01-12 Thread Gwenhael Pasquiers
Thanks, We are waiting for the 1.2 release eagerly ☺ From: Fabian Hueske [mailto:fhue...@gmail.com] Sent: mercredi 11 janvier 2017 18:32 To: user@flink.apache.org Subject: Re: Making batches of small messages Hi, I think this is a case for the ProcessFunction that was recently added and will

Re: 答复: [DISCUSS] Apache Flink 1.2.0 RC0 (Non-voting testing release candidate)

2017-01-12 Thread Fabian Hueske
I have another bugfix for 1.2.: https://issues.apache.org/jira/browse/FLINK-2662 (pending PR) 2017-01-10 15:16 GMT+01:00 Robert Metzger : > Hi, > > this depends a lot on the number of issues we find during the testing. > > > These are the issues I found so far: > >