Re: keyBy and parallelism

2018-04-11 Thread Hao Sun
From what I've learnt, you have to control parallelism yourself. You can set parallelism per operator or set the default through flink-conf.yaml. I might be wrong. On Wed, Apr 11, 2018 at 2:16 PM Christophe Jolif wrote: > Hi all, > > Imagine I have a default parallelism of 16

keyBy and parallelism

2018-04-11 Thread Christophe Jolif
Hi all, Imagine I have a default parallelism of 16 and I do something like stream.keyBy("something").flatMap() Now let's imagine I have fewer than 16 keys, maybe 8. How many parallel executions of the flatMap function will I get? 8 because I have 8 keys, or 16 because I have the default parallelism
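The distribution question can be sketched outside Flink: keyBy does not lower the operator's parallelism (there are still 16 flatMap subtasks), it only hashes each key to one subtask, so with 8 distinct keys at most 8 of the 16 subtasks receive data and the rest sit idle. A minimal stdlib simulation, using plain modulo hashing as a stand-in for Flink's real assignment (which murmur-hashes into key groups and maps groups onto subtasks):

```java
import java.util.HashSet;
import java.util.Set;

public class KeyDistribution {
    // Simplified stand-in for Flink's key -> subtask assignment.
    // Flink really hashes into key groups bounded by max parallelism;
    // plain modulo is used here only to illustrate the effect.
    static int subtaskFor(String key, int parallelism) {
        return Math.abs(key.hashCode() % parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 16;
        Set<Integer> busySubtasks = new HashSet<>();
        for (int i = 0; i < 8; i++) {
            busySubtasks.add(subtaskFor("key-" + i, parallelism));
        }
        // With 8 distinct keys, at most 8 of the 16 subtasks can see data
        // (fewer if two keys hash to the same subtask).
        System.out.println("subtasks receiving data: " + busySubtasks.size());
    }
}
```

Note that key collisions can make the busy-subtask count even smaller than the key count, which is why few-keys/high-parallelism jobs often look skewed.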

State management and heap usage

2018-04-11 Thread TechnoMage
I am pretty new to Flink and have an initial streaming job working both locally and remotely. But both ways, if the data volume is too high it runs out of heap. I am using RichMapFunction to process multiple streams of data. I assumed Flink would manage keeping state in RAM when possible,
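Two things usually matter here. First, plain member fields of a RichMapFunction are ordinary Java objects and always live on the heap; only state registered through Flink's keyed-state API (ValueState, MapState, etc., after a keyBy) is managed by the configured state backend. Second, the default backend in this era is heap-based; switching to RocksDB spills managed state to local disk. A hedged flink-conf.yaml sketch (the checkpoint path is a placeholder):

```yaml
# Keep keyed state in RocksDB on local disk instead of the JVM heap
state.backend: rocksdb
# Placeholder checkpoint location; adjust for your cluster
state.checkpoints.dir: hdfs:///flink/checkpoints
```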

Re: Old JobManager lost its leadership in zk

2018-04-11 Thread Steven Wu
From the Flink UI on the jobmanager, sometimes I saw the taskmanager connected and the heartbeat time updated, but then sometimes the taskmanager page became blank; maybe it disconnected. On Wed, Apr 11, 2018 at 1:31 PM, Steven Wu wrote: > Hi, > > After this error/exception, it seems

Old JobManager lost its leadership in zk

2018-04-11 Thread Steven Wu
Hi, After this error/exception, it seems that the taskmanager never connects to the jobmanager anymore. The job is stuck in a failed state because there are not enough slots to recover it. Let's assume there was a temporary glitch between the jobmanager and zk. Would it cause such a permanent failure in Flink? I

Re: Is Flink able to do real time stock market analysis?

2018-04-11 Thread TechnoMage
I am new to Flink, so others may have a more complete answer or correct me. If you are counting the events in a tumbling window, you will get output at the end of each tumbling window, i.e. a running count of events per window. It sounds like you want to compare the raw data to the smoothed data? You
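The tumbling-window counting semantics described above can be simulated with plain Java (this is a sketch of the behavior, not Flink API): each event timestamp is bucketed by the start of its window, and one count is emitted per bucket.

```java
import java.util.Map;
import java.util.TreeMap;

public class TumblingCount {
    // Assign each event timestamp to the start of its tumbling window.
    static long windowStart(long ts, long windowSizeMs) {
        return ts - (ts % windowSizeMs);
    }

    public static void main(String[] args) {
        long size = 10_000; // 10-second tumbling windows
        long[] events = {1_000, 2_500, 9_999, 10_000, 15_000, 25_000};
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : events) {
            counts.merge(windowStart(ts, size), 1, Integer::sum);
        }
        // One count per window, emitted when each window closes.
        System.out.println(counts); // {0=3, 10000=2, 20000=1}
    }
}
```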

Clarification on slots and efficiency

2018-04-11 Thread Derek VerLee
From the docs ( https://ci.apache.org/projects/flink/flink-docs-master/concepts/runtime.html ) By adjusting the number of task slots, users can define how subtasks are isolated from each other. Having one slot per TaskManager means each

Json KAFKA producer

2018-04-11 Thread Luigi Sgaglione
Hi, I'm trying to create a Flink example with a Kafka consumer and producer using the Json data format. In particular, I'm able to consume and process Json data published on a Kafka topic, but not to publish the results. The problem is that I don't know what is the serialization schema that should be
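For the producer side, the Flink Kafka producer takes a SerializationSchema; if the result is already a JSON string, the schema only needs to emit its UTF-8 bytes, which is essentially what `SimpleStringSchema` does. A stdlib sketch of that conversion (field names in the sample payload are made up):

```java
import java.nio.charset.StandardCharsets;

public class JsonBytes {
    // What a String-based SerializationSchema boils down to:
    // turn the JSON text into the byte[] Kafka expects.
    static byte[] serialize(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = serialize("{\"sensor\":\"s1\",\"value\":42}");
        System.out.println(payload.length + " bytes ready for the Kafka producer");
    }
}
```

In an actual job you would simply pass `new SimpleStringSchema()` to the FlinkKafkaProducer constructor rather than writing this yourself.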

Re: How to add new Kafka topic to consumer

2018-04-11 Thread Chesnay Schepler
In 1.3 and below I believe this is not possible. For 1.4 and above, please see https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/kafka.html#partition-discovery. On 11.04.2018 14:33, chandresh pancholi wrote: Hi, Is there a way to add Kafka topic dynamically to stream?
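Partition discovery in 1.4 is switched on through a consumer property; the key below is the documented `flink.partition-discovery.interval-millis` setting, while the interval value and broker address are placeholders:

```java
import java.util.Properties;

public class DiscoveryProps {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "demo");                    // placeholder
        // Check for new partitions (and, when subscribing by topic pattern,
        // new topics) every 30 seconds; discovery is off unless this key is set.
        props.setProperty("flink.partition-discovery.interval-millis", "30000");
        System.out.println(props.getProperty("flink.partition-discovery.interval-millis"));
    }
}
```

These Properties would then be handed to the FlinkKafkaConsumer constructor as usual.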

Re: How to add new Kafka topic to consumer

2018-04-11 Thread Alexander Smirnov
this feature has been implemented in 1.4.0, take a look at https://issues.apache.org/jira/browse/FLINK-4022 https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/kafka.html#kafka-consumers-topic-and-partition-discovery On Wed, Apr 11, 2018 at 3:33 PM chandresh pancholi <

How to add new Kafka topic to consumer

2018-04-11 Thread chandresh pancholi
Hi, Is there a way to add a Kafka topic dynamically to a stream? For example, there were Kafka topics named foo1, foo2, foo3 and the task manager will start consuming events from all 3 topics. Now I create another two topics, foo4 & bar1, in Kafka. How will FlinkKafkaConsumer read events from foo4 &
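With 1.4's topic-pattern subscription (see the replies above), the consumer is given a `java.util.regex.Pattern` instead of a fixed topic list, and discovery picks up new matching topics at runtime. A stdlib sketch of which of the topic names from this question such a pattern would select (the pattern text itself is an example):

```java
import java.util.List;
import java.util.regex.Pattern;

public class TopicPattern {
    public static void main(String[] args) {
        // The Pattern you would pass to the FlinkKafkaConsumer
        // constructor that accepts a topic pattern.
        Pattern topics = Pattern.compile("foo[0-9]+");
        for (String name : List.of("foo1", "foo2", "foo3", "foo4", "bar1")) {
            System.out.println(name + " -> " + topics.matcher(name).matches());
        }
        // foo1..foo4 match; bar1 does not, so it would need its own
        // subscription (or a broader pattern).
    }
}
```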

Re: Allowed lateness + side output late data, what's included?

2018-04-11 Thread Juho Autio
Thanks! On Wed, Apr 11, 2018 at 12:59 PM, Chesnay Schepler wrote: > Data that arrives within the allowed lateness should not be written to the > side output. > > > On 11.04.2018 11:12, Juho Autio wrote: > > If I use a non-zero value for allowedLateness and also

Re: Allowed lateness + side output late data, what's included?

2018-04-11 Thread Chesnay Schepler
Data that arrives within the allowed lateness should not be written to the side output. On 11.04.2018 11:12, Juho Autio wrote: If I use a non-zero value for allowedLateness and also sideOutputLateData, does the late data output contain also the events that were triggered in the bounds of

Allowed lateness + side output late data, what's included?

2018-04-11 Thread Juho Autio
If I use a non-zero value for allowedLateness and also sideOutputLateData, does the late data output contain also the events that were triggered in the bounds of allowed lateness? By looking at the docs I can't be sure which way it is. Code example: .timeWindow(Time.days(1))
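The semantics can be sketched outside Flink: for a window ending at `end`, a late element arriving while the watermark is still below `end + allowedLateness` updates the window (a late firing); only elements arriving after the watermark has passed that bound go to the side output. A stdlib simulation of that routing rule (enum and method names are made up for illustration):

```java
public class LatenessDemo {
    enum Route { ON_TIME, WINDOW_LATE_FIRING, SIDE_OUTPUT }

    // Where does an element belonging to a window ending at windowEnd go,
    // given the current watermark and the allowed lateness?
    static Route route(long windowEnd, long allowedLateness, long watermark) {
        if (watermark < windowEnd) return Route.ON_TIME; // window still open
        if (watermark < windowEnd + allowedLateness) {
            return Route.WINDOW_LATE_FIRING; // updates the window again
        }
        return Route.SIDE_OUTPUT; // beyond allowed lateness
    }

    public static void main(String[] args) {
        long end = 100, lateness = 20;
        System.out.println(route(end, lateness, 110)); // WINDOW_LATE_FIRING
        System.out.println(route(end, lateness, 130)); // SIDE_OUTPUT
    }
}
```

This matches the answer in the thread: elements within the allowed lateness are not written to the side output.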

Re: How to customize triggering of checkpoints?

2018-04-11 Thread Chesnay Schepler
Hello, there is no way to manually trigger checkpoints or configure irregular intervals. You will have to modify the CheckpointCoordinator and build Flink
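What is configurable is the fixed interval and related pacing knobs on the execution environment; a hedged 1.x-style fragment (not runnable on its own, requires the Flink streaming dependencies and a job around it):

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000); // one checkpoint per minute; intervals are fixed, not event-driven
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(30_000);
```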

Re: Window with recent messages

2018-04-11 Thread m@xi
Hello there Krzysztof! Thanks a lot for the answer. Sorry for the late reply. I can see the logic behind custom window processing in Flink. Once an incoming tuple arrives, you add a timer to it, which is going to fire after "RatingExpiration" time units, as shown in your code. This is made
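The per-element timer pattern discussed here can be simulated with a priority queue ordered by expiration time (in Flink itself this would be `registerEventTimeTimer` inside a ProcessFunction); the class and method names below are made up for the sketch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class ExpiringWindow {
    static final class Entry {
        final long expireAt; final String value;
        Entry(long expireAt, String value) { this.expireAt = expireAt; this.value = value; }
    }

    // Pop every timer whose expiration time has been reached by `now`.
    static List<String> fireTimers(PriorityQueue<Entry> timers, long now) {
        List<String> evicted = new ArrayList<>();
        while (!timers.isEmpty() && timers.peek().expireAt <= now) {
            evicted.add(timers.poll().value); // timer fired: drop from window
        }
        return evicted;
    }

    public static void main(String[] args) {
        long expirationMs = 100; // stands in for "RatingExpiration"
        PriorityQueue<Entry> timers =
                new PriorityQueue<>((a, b) -> Long.compare(a.expireAt, b.expireAt));
        // Each arriving element registers a timer expirationMs in the future.
        timers.add(new Entry(0 + expirationMs, "a"));
        timers.add(new Entry(50 + expirationMs, "b"));
        // When the simulated clock reaches 120, only "a" has expired.
        System.out.println(fireTimers(timers, 120)); // prints [a]
    }
}
```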