Re: Setting operator parallelism of a running job - Flink 1.2

2017-04-21 Thread Dominik Safaric
…flink-docs-release-1.3/setup/savepoints.html Best, Aljoscha. On 21 Apr 2017, at 15:22, Dominik Safaric <dominiksafa...@gmail.com> wrote: Hi all, is it possible to set the operator parallelism using the Flink CLI …

Setting operator parallelism of a running job - Flink 1.2

2017-04-21 Thread Dominik Safaric
Hi all, Is it possible to set the operator parallelism using the Flink CLI while a job is running? I have a cluster of 4 worker nodes, where each node has 4 CPUs, hence the number of task slots is set to 4 and parallelism.default to 16. However, if a worker fails while the jobs were …
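
In Flink 1.2 there is no CLI command that changes the parallelism of a job that is already running; the savepoint route referenced in the reply above - take a savepoint, cancel the job, and resubmit it with the desired parallelism (e.g. `flink run -s <savepointPath> -p 16 job.jar`) - is the usual workaround. Below is a minimal sketch of where the default and per-operator parallelism would be set in the resubmitted job; the socket source and the map step are placeholders:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Job-wide default for this program; overrides parallelism.default from flink-conf.yaml.
        env.setParallelism(16);

        DataStream<String> upperCased = env
                .socketTextStream("localhost", 9999) // placeholder source
                .map(new MapFunction<String, String>() {
                    @Override
                    public String map(String value) {
                        return value.toUpperCase();
                    }
                })
                // Individual operators can still pin their own parallelism.
                .setParallelism(4);

        upperCased.print();
        env.execute("parallelism sketch");
    }
}
```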

Re: Flink 1.2 time window operation

2017-03-30 Thread Dominik Safaric
…Cheers, Gordon [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/windows.html#window-assigners On March 30, 2017 at 2:37:41 PM, Dominik Safaric (dominiksafa...@gmail.com) wrote: …

Re: Flink 1.2 time window operation

2017-03-30 Thread Dominik Safaric
…Gordon. On March 27, 2017 at 11:25:49 PM, Dominik Safaric (dominiksafa...@gmail.com) wrote: Hi all, lately I’ve been investigating the performance characteristics of Flink as part of our internal …

Flink 1.2 time window operation

2017-03-27 Thread Dominik Safaric
Hi all, Lately I’ve been investigating the performance characteristics of Flink as part of our internal benchmark. As part of this, we’ve developed and deployed an application that polls data from Kafka and groups the data by key over a fixed time window of one minute. In total, the topic that …
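
A minimal sketch of the kind of job described above, assuming a tumbling one-minute processing-time window and a simple per-key count; the topic name, Kafka properties and the way the key is derived are placeholders:

```java
import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class OneMinuteWindowJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "benchmark");

        DataStream<String> records = env.addSource(
                new FlinkKafkaConsumer010<>("events", new SimpleStringSchema(), props));

        records
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) {
                        // Placeholder keying: use the raw record as the key, count 1.
                        return new Tuple2<>(value, 1);
                    }
                })
                .keyBy(0)                      // group by key
                .timeWindow(Time.minutes(1))   // fixed one-minute tumbling window
                .sum(1)                        // aggregate the counts per key and window
                .print();

        env.execute("one-minute window count");
    }
}
```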

Re: Benchmarking streaming frameworks

2017-03-23 Thread Dominik Safaric
Dear Giselle, Various stream processing engine benchmarks already exist. Here are a few of them that I believe are worth mentioning: http://ieeexplore.ieee.org/document/7530084/ https://www.usenix.org/node/188989

Re: flink/cancel & shutdown hooks

2017-03-08 Thread Dominik Safaric
I’m not using YARN, but instead am starting the cluster using bin/start-cluster.sh. On 8 Mar 2017, at 15:32, Ufuk Celebi <u...@apache.org> wrote: On Wed, Mar 8, 2017 at 3:19 PM, Dominik Safaric <dominiksafa...@gmail.com> wrote: The cluster consists …

Re: flink/cancel & shutdown hooks

2017-03-08 Thread Dominik Safaric
…wrote: Hi Dominik, did you take a look at the logs? Maybe the exception is not shown in the CLI but in the logs. Timo. On 07/03/17 at 23:58, Dominik Safaric wrote: Hi all, …

flink/cancel & shutdown hooks

2017-03-07 Thread Dominik Safaric
Hi all, I would appreciate any help or advice regarding default Java runtime shutdown hooks and cancelling Flink jobs. Namely, as part of my Flink application I am using a Kafka interceptor class that defines a shutdown hook thread. When stopping the Flink streaming job on my local machine …
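
For context, a minimal sketch of a Kafka producer interceptor that registers a JVM shutdown hook, roughly as described above; the class name and what the hook prints are made up. Note that cancelling a job does not terminate the TaskManager JVM, so a hook registered this way would normally only fire when the JVM itself shuts down:

```java
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Illustrative interceptor: registers a shutdown hook when it is configured.
public class ShutdownHookInterceptor implements ProducerInterceptor<byte[], byte[]> {

    @Override
    public void configure(Map<String, ?> configs) {
        // The hook only runs when the JVM terminates. Cancelling a Flink job keeps
        // the TaskManager JVM alive, so the hook will not fire on `flink cancel`.
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
                System.out.println("interceptor shutdown hook executed")));
    }

    @Override
    public ProducerRecord<byte[], byte[]> onSend(ProducerRecord<byte[], byte[]> record) {
        return record; // pass records through unchanged
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // no-op
    }

    @Override
    public void close() {
        // Called when the producer is closed; a safer place for cleanup than a JVM hook.
    }
}
```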

Re: FlinkKafkaConsumer010 - creating a data stream of type DataStream<ConsumerRecord<K,V>>

2017-03-07 Thread Dominik Safaric
…make sense to wrap it up again into a `ConsumerRecord`. The schema interface exposes all available metadata of the record, so it should be sufficient. Cheers, Gordon. On March 7, 2017 at 3:51:59 AM, Dominik Safaric (dominiksafa...@gmail.com) wrote: …
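
The schema interface mentioned here is presumably `KeyedDeserializationSchema`, whose `deserialize` method receives the topic, partition and offset of every record. A rough sketch that surfaces this metadata in a small POJO (the `RecordWithMetadata` type is made up for illustration); it would then be passed to the consumer, e.g. `new FlinkKafkaConsumer010<>("topic", new RecordWithMetadataSchema(), props)`:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema;

// Illustrative value type carrying the payload plus its Kafka metadata.
public class RecordWithMetadata {
    public String topic;
    public int partition;
    public long offset;
    public String value;
}

// Schema exposing partition and offset alongside the deserialized payload.
class RecordWithMetadataSchema implements KeyedDeserializationSchema<RecordWithMetadata> {

    @Override
    public RecordWithMetadata deserialize(byte[] messageKey, byte[] message,
                                          String topic, int partition, long offset) throws IOException {
        RecordWithMetadata record = new RecordWithMetadata();
        record.topic = topic;
        record.partition = partition;
        record.offset = offset;
        record.value = message == null ? null : new String(message, StandardCharsets.UTF_8);
        return record;
    }

    @Override
    public boolean isEndOfStream(RecordWithMetadata nextElement) {
        return false; // consume indefinitely
    }

    @Override
    public TypeInformation<RecordWithMetadata> getProducedType() {
        return TypeExtractor.getForClass(RecordWithMetadata.class);
    }
}
```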

FlinkKafkaConsumer010 - creating a data stream of type DataStream<ConsumerRecord<K,V>>

2017-03-06 Thread Dominik Safaric
Hi, Unfortunately I cannot find the option of using raw ConsumerRecord instances when creating a Kafka data stream. In general, I would like to use an instance of the mentioned type because our use case requires certain metadata such as record offset and partition. So far I’ve examined

TaskManager failure detection

2017-02-22 Thread Dominik Safaric
Hi, As I’m investigating Flink’s fault tolerance capabilities, I would like to know which component and class are in charge of TaskManager failure detection and checkpoint restoring? In addition, how does Flink actually determine that a TaskManager has failed due to e.g. hardware failures?

Debugging, logging and measuring operator subtask performance

2017-01-25 Thread Dominik Safaric
Hi, As I am experiencing certain performance degradations in a streaming job, I want to determine their root cause by measuring subtask performance in terms of resource utilisation - e.g. CPU utilisation of the thread. Is this somehow possible? Does Flink log scheduled and executed …
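
One way to approximate this from user code, assuming wrapping an operator in a rich function is acceptable, is to capture the subtask thread's id in `open()` and expose its CPU time through Flink's metrics system as a gauge; a rough sketch (the metric and class names are arbitrary):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Gauge;

// Pass-through map step that exposes the CPU time consumed by its subtask thread
// as a gauge, visible through whichever metrics reporter is configured (e.g. JMX).
public class CpuTimeMeasuringMapper extends RichMapFunction<String, String> {

    private transient ThreadMXBean threadMXBean;
    private transient long subtaskThreadId;

    @Override
    public void open(Configuration parameters) {
        threadMXBean = ManagementFactory.getThreadMXBean();
        // open() runs on the task thread, i.e. the thread doing the actual work.
        subtaskThreadId = Thread.currentThread().getId();

        getRuntimeContext().getMetricGroup().gauge("subtaskCpuTimeNanos", new Gauge<Long>() {
            @Override
            public Long getValue() {
                // Reporters call this from their own thread, so query by thread id;
                // returns -1 if CPU time measurement is unsupported.
                return threadMXBean.getThreadCpuTime(subtaskThreadId);
            }
        });
    }

    @Override
    public String map(String value) {
        return value; // pass-through; only the gauge above does the measuring
    }
}
```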

Re: benchmarking flink streaming

2017-01-25 Thread Dominik Safaric
Hi Stephan, As I’m already familiar with the latency markers of Flink 1.2, there is one question that bothers me in regard to them - how does Flink measure end-to-end latency when dealing with e.g. aggregations? Suppose you have a topology ingesting data from Kafka, and you want to output
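
For reference, the latency markers are emitted by the sources at a configurable interval and are forwarded past user functions, so they track transport and queueing delay rather than the time records spend buffered inside window aggregations. A minimal sketch of enabling them, assuming the `ExecutionConfig#setLatencyTrackingInterval` setter available in 1.2 (the interval value is arbitrary):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LatencyTrackingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Emit a latency marker from every source every 1000 ms; the measured
        // latencies are exposed per source/operator through the metrics system.
        env.getConfig().setLatencyTrackingInterval(1000L);

        // Placeholder pipeline so the sketch is executable on its own.
        env.fromElements("a", "b", "c").print();

        env.execute("latency tracking sketch");
    }
}
```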

Flink 1.1.3 RollingSink - understanding output blocks/parallelism

2016-12-14 Thread Dominik Safaric
Hi everyone, although this question might sound trivial, I’ve been curious about the following. Given a Flink topology with the parallelism level set to 6, for example, and outputting the data stream to HDFS using a RollingSink instance, how is the output file structured? By structured, I refer to …
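
For context on the layout: with parallelism 6 the RollingSink runs six parallel instances, and each subtask writes its own part files inside the bucket directory chosen by the bucketer, so the output typically looks like `<basePath>/<bucket>/part-<subtaskIndex>-<rollCounter>`. A minimal configuration sketch; the base path, bucket format and batch size are placeholders:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.DateTimeBucketer;
import org.apache.flink.streaming.connectors.fs.RollingSink;

public class RollingSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(6);

        DataStream<String> stream = env.fromElements("a", "b", "c"); // placeholder source

        RollingSink<String> sink = new RollingSink<String>("hdfs:///data/output")
                // One directory per time bucket; each of the 6 sink subtasks
                // writes its own part files inside the current bucket.
                .setBucketer(new DateTimeBucketer("yyyy-MM-dd--HH"))
                .setBatchSize(1024 * 1024 * 128); // roll to a new part file at ~128 MB

        stream.addSink(sink);
        env.execute("rolling sink sketch");
    }
}
```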

Flink 1.1.3 RollingSink - mismatch in the number of records consumed/produced

2016-12-12 Thread Dominik Safaric
Hi everyone, As I’ve implemented a RollingSink writing messages consumed from a Kafka log, I’ve observed that there is a significant mismatch between the number of messages consumed and the number written to the file system. Namely, the consumed Kafka topic contains 1,000,000 messages in total. The topology does …
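
One common explanation for such a mismatch, assuming checkpointing is not enabled: the RollingSink only finalizes its files when checkpoints complete, so at the moment the counts are compared a portion of the consumed records may still sit in in-progress or pending files. A minimal sketch of enabling checkpointing for that purpose; the interval is arbitrary:

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 10 seconds; on each completed checkpoint the RollingSink
        // promotes its pending part files to final, committed output files.
        env.enableCheckpointing(10_000L, CheckpointingMode.EXACTLY_ONCE);

        // Placeholder pipeline; the Kafka source -> RollingSink job would go here.
        env.fromElements("placeholder").print();

        env.execute("checkpointing sketch");
    }
}
```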

Partitioning operator state

2016-12-07 Thread Dominik Safaric
Hi everyone, In the case of scaling out a Flink cluster, how does Flink handle operator state partitioning of a staged topology? Regards, Dominik

Cannot connect to the JobManager - Flink 1.1.3 cluster mode

2016-11-23 Thread Dominik Safaric
Hi all, As I’ve been setting up a cluster comprised of three worker nodes and a master node, I’ve encountered the problem that the JobManager, although running, is unreachable. The master instance has SSH access to all worker nodes. The worker nodes, however, do not have SSH access to …

Re: Flink Material & Papers

2016-11-21 Thread Dominik Safaric
Hi Hanna, I would certainly recommend, if you haven’t already, checking the official docs of Flink at flink.apache.org. The documentation is comprehensive and understandable. Beyond that, I would recommend the following blog posts and academic papers: Apache Flink: Stream and Batch …

Running the JobManager and TaskManager on the same node in a cluster

2016-11-16 Thread Dominik Safaric
Hi, It is generally recommended for streaming engines, including Flink, to run a separate master node - in the case of Flink, the JobManager. But why should one run the JobManager on a separate node in Flink? Performance-wise, the JobManager isn’t resource-intensive, unlike of course …

TaskManager log thread

2016-11-11 Thread Dominik Safaric
If taskmanager.debug.memory.startLogThread is set to true, where does the task manager output the logs to? Unfortunately I couldn’t find this information in the documentation, hence the question. Thanks in advance, Dominik

Re: Flink Kafka 0.10.0.0 connector

2016-11-03 Thread Dominik Safaric
…addSource(new FlinkKafkaConsumer010<>(parameterTool.getRequired("topic"), new SimpleStringSchema(), parameterTool.getProperties())); // write kafka stream to standard out. messageStream.print(); env.execute("Read from Kafka example…

Re: Flink Kafka 0.10.0.0 connector

2016-11-03 Thread Dominik Safaric
…with Flink is to use the Kafka 0.10 connector in the current Flink master. You can probably copy the connector’s code into your own project and use the new connector from there. Regards, Robert. On Thu, Nov 3, 2016 at 8:05 AM, Dominik Safaric wrote: …

Flink Kafka 0.10.0.0 connector

2016-11-03 Thread Dominik Safaric
Dear all, Although Flink 1.2 will roll out a Flink Kafka 0.10.x connector, I want to use the Flink Kafka 0.9 connector in conjunction with the 0.10.x versions. The reason behind this is that we are currently evaluating Flink as part of empirical research, hence a stable release is …
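
A minimal sketch of wiring the 0.9 connector referred to above; the topic, bootstrap servers and group id are placeholders. Kafka brokers are generally backwards compatible with older clients, so the 0.9 client used by this connector is expected to be able to talk to 0.10.x brokers:

```java
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class Kafka09ConsumerSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // 0.10.x broker
        props.setProperty("group.id", "flink-benchmark");

        // The 0.9 connector ships with the stable 1.1.x releases, so it can be used
        // without depending on the (at the time) unreleased 0.10 connector.
        DataStream<String> messages = env.addSource(
                new FlinkKafkaConsumer09<>("benchmark-topic", new SimpleStringSchema(), props));

        messages.print();
        env.execute("Kafka 0.9 consumer against 0.10 brokers");
    }
}
```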