Re: ContinuousProcessingTimeTrigger on empty

2016-07-12 Thread Kostas Kloudas
Hi Xiang, According to your code, you just put all your elements (no splitting by key) into a single infinite window, and you apply your window function every 5min (after the first element had arrived). The combination of the two means that if you have elements arriving at steady pace of 1 ele

Re: Create window before the first event

2016-07-12 Thread Kostas Kloudas
Hi Xiang, I think this is a duplicate from the discussion you opened yesterday. I post the same answer here, in case somebody wants to contribute to the discussion. According to your code, you just put all your elements (no splitting by key) into a single infinite window, and you apply your w

error for building flink-runtime from source

2016-07-12 Thread Radu Tudoran
I am trying to build flink-runtime from source. I run mvn install and the compilation builds with the error below. Any though on this? [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project flink-runtime_2.10: Compilation failure [

Re: sampling function

2016-07-12 Thread Paris Carbone
Hey Do, I think that more sophisticated samplers could make a better fit in the ML library and not in the core API but I am not very familiar with the milestones there. Maybe the maintainers of the batch ML library could check if sampling techniques could be useful there I guess. Paris > On 1

Killing Yarn Session Leaves Lingering Flink Jobs

2016-07-12 Thread Konstantin Gregor
Hello everyone, I have a question concerning stopping Flink streaming processes that run in a detached Yarn session. Here's what we do: We start a Yarn session via yarn-session.sh -n 8 -d -jm 4096 -tm 1 -s 10 -qu flink_queue Then, we start our Flink streaming application via flink run -p 65

Re: Parameters to Control Intra-node Parallelism

2016-07-12 Thread Ovidiu-Cristian MARCU
Hi, Can you post your configuration parameters (exclude default settings) and cluster description? Best, Ovidiu > On 11 Jul 2016, at 17:49, Saliya Ekanayake wrote: > > Thank you Greg, I'll check if this was the cause for my TMs to disappear. > > On Mon, Jul 11, 2016 at 11:34 AM, Greg Hogan <

Re: error for building flink-runtime from source

2016-07-12 Thread Márton Balassi
Hi Radu, Which version of Flink are you building? Looking at the current master builds they are coming in green recently [1]. If you are solely building flink-runtime the issue might be that you are using different version of flink-core (a dependency of flink-runtime) and flink-runtime. Could yo

RE: error for building flink-runtime from source

2016-07-12 Thread Radu Tudoran
Hi, I am building the 1.1 snapshot (should be the latest release). I will try to build the whole project to check if it works Dr. Radu Tudoran Research Engineer - Big Data Expert IT R&D Division [cid:image007.jpg@01CD52EB.AD060EE0] HUAWEI TECHNOLOGIES Duesseldorf GmbH European Research Center Ri

Re: Killing Yarn Session Leaves Lingering Flink Jobs

2016-07-12 Thread Ufuk Celebi
Are you running in HA mode? If yes, that's the expected behaviour at the moment, because the ZooKeeper data is only cleaned up on a terminal state (FINISHED, FAILED, CANCELLED). You have to specify separate ZooKeeper root paths via "recovery.zookeeper.path.root". There is an issue which should be f

Re: Killing Yarn Session Leaves Lingering Flink Jobs

2016-07-12 Thread Stephan Ewen
I think there is a confusion between how Flink thinks about HA and job life cycle, and how many users think about it. Flink thinks that a killing of the YARN session is a failure of the job. So as soon as new Yarn resources become available, it tries to recover the job. Most users think that killi

Re: sampling function

2016-07-12 Thread Till Rohrmann
Stratified sampling would also be beneficial for the DataSet API. I think it would be best if this method is also added to DataSetUtils or made available via the flink-contrib module. Furthermore, I think that it would be easiest if you created the JIRA for this feature, because you know what you w

HDFS to Kafka

2016-07-12 Thread Dominique Rondé
Hi folks, on the first view I have a very simple problem. I like to get datasets out of some textfiles in HDFS and send them to a kafka topic. I use the following code to do that: DataStream hdfsDatasource = env.readTextFile("hdfs://" + parameterTool.getRequired("hdfs_env") + "/user/flink/"

[Discuss] Ordering of Records

2016-07-12 Thread vinay patil
Hi, Here are some of the queries I have : I have two different streams stream1 and stream2 in which the elements are in order. 1) Now when I do keyBy on each of these streams, will the order be maintained ? (Since every group here will be sent to one task manager only ) My understanding is that

Re: Parameters to Control Intra-node Parallelism

2016-07-12 Thread Saliya Ekanayake
Hi Ovidiu, Checking the /var/log/messages based on Greg's response revealed TMs were killed due to out of memory. Here's the node architecture. Each node has 128GB of RAM. I was trying to run 2 TMs per node binding each to 12 cores (or 1 socket). The total number of nodes were 16. I finally, manag

Issue with running Flink Python jobs on cluster

2016-07-12 Thread Geoffrey Mon
Hello all, I've set up Flink on a very small cluster of one master node and five worker nodes, following the instructions in the documentation ( https://ci.apache.org/projects/flink/flink-docs-master/setup/cluster_setup.html). I can run the included examples like WordCount and PageRank across the