Re: Multiple consumers and custom triggers

2016-12-14 Thread dromitlabs
Got it. Thanks! > On Dec 15, 2016, at 02:58, Jamie Grier wrote: > > Ahh, sorry, for #2: A single Flink job can have as many sources as you like. > They can be combined in multiple ways, via things like joins, or connect(), > etc. They can also be completely independent — in other words the dat

Updating a Tumbling Window every second?

2016-12-14 Thread Matt
Hello, I have a rather simple problem with a difficult explanation... I have 3 streams, one of objects of class A (stream A), one of class B (stream B) and one of class C (stream C). The elements of A are generated at a rate of about 3 times every second. Elements of type B encapsulates some key

Re: Multiple consumers and custom triggers

2016-12-14 Thread Jamie Grier
Ahh, sorry, for #2: A single Flink job can have as many sources as you like. They can be combined in multiple ways, via things like joins, or connect(), etc. They can also be completely independent — in other words the data flow graph can be completely disjoint. You never to need to call execute()

Re: Multiple consumers and custom triggers

2016-12-14 Thread Matt
Hey Jamie, Ok with #1. I guess #2 is just not possible. I got it about #3. I just checked the code for the tumbling window assigner and I noticed it's just its default trigger that gets overwritten when using a custom trigger, not the way it assigns windows, it makes sense now. Regarding #4, aft

Re: In 1.2-SNAPSHOT, EventTimeSessionWindows are not firing untill the whole stream is processed

2016-12-14 Thread Yassine MARZOUGUI
Hi Aljoscha, Thanks a lot for the explanation. Using readFile with PROCESS_CONTINUOUSLY solves it. Two more questions though: 1. Is it possible to gracefully stop the job once it has read the input once? 2. Does the watermark extraction period depend on the watch interval, or should any watch int

Flink 1.1.3 RollingSink - understanding output blocks/parallelism

2016-12-14 Thread Dominik Safaric
Hi everyone, although this question might sound trivial, I’ve been curious about the following. Given a Flink topology with parallelism level set to 6 for example and outputting the data stream to HDFS using an instance RollingSink, how is the output file structured? By structured, I refer to t

Re: Multiple consumers and custom triggers

2016-12-14 Thread Jamie Grier
For #1 there are a couple of ways to do this. The easiest is probably stream1.connect(stream2).map(...) where the MapFunction maps the two input types to a common type that you can then process uniformly. For #3 There must always be a WindowAssigner specified. There are some convenient ways to d

Multiple consumers and custom triggers

2016-12-14 Thread Matt
Hello people, I've written down some quick questions for which I couldn't find much or anything in the documentation. I hope you can answer some of them! *# Multiple consumers* *1.* Is it possible to .union() streams of different classes? It is useful to create a consumer that counts elements on

Re: Standalone cluster layout

2016-12-14 Thread Robert Metzger
Hi Avihai, 1. As much as possible (I would leave the operating system at least 1 GB of memory). It depends also on the workload you have. For streaming workload with very small state, you can use Flink with 1-2 GB of heap space and still get very good performance. 2. Yes, I would recommend to run

Re: PartitionedState and watermark of Window coGroup()

2016-12-14 Thread Robert Metzger
Hi, elements are coGrouped on the specified key. So only elements with the same key in both streams end up in the same group. Yes, the watermark uses the minimum of both streams. On Tue, Dec 13, 2016 at 7:02 PM, Sendoh wrote: > Hi Flink users, > > I'm a bit confused about how these two work

Re: checkpoint notifier not found?

2016-12-14 Thread Abhishek R. Singh
Is this more appropriate for dev list? Anyway here is my first: https://github.com/apache/flink/pull/3006 > On Dec 14, 2016, at 2:38 AM, Robert Metzger wrote: > > Hi Abhishek, > you can not push to the Flink repository directly. Only Flink committers

Re: How to retrieve values from yarn.taskmanager.env in a Job?

2016-12-14 Thread Robert Metzger
Looks like the issue was resolved in the JIRA issue: https://issues.apache.org/jira/browse/FLINK-5322 On Tue, Dec 13, 2016 at 7:32 PM, Shannon Carey wrote: > Till, > > Unfortunately, System.getenv() doesn't contain the expected variable even > within the UDFs, but thanks for the info! > > In the

Re: Checkpointing

2016-12-14 Thread Stefan Richter
Hi, for Flink 1.2 the ListCheckpointed interface is intended to replace Checkpointed. From a user perspective they work very similar, but the new interface allows you to break down the state into smaller, redistributable units (the list items) to support job rescaling. A proper documentation fo

Checkpointing

2016-12-14 Thread Mäki Hanna
Hi, I'm learning Flink and trying to calculate counters that are checkpointed at given intervals. Following the examples on page https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/state.html, except in Scala, I tried to create a function class CountersWithState extends

Re: use case for sliding windows

2016-12-14 Thread Aljoscha Krettek
I think so, you can use sliding windows and basically do a WordCount-like job that counts the occurences in each window. Then, you would have a filter afterwards that filters out those elements where the count is lower than a given threshold. Cheers, Aljoscha On Mon, 12 Dec 2016 at 22:30 Meghashy

Re: Flink 1.1.3 RollingSink - mismatch in the number of records consumed/produced

2016-12-14 Thread Aljoscha Krettek
Hi, how did you execute this? Did any failures occur in between? If yes, it can be that the sink writes stuff multiple times but marks the valid contents of files using a .valid-length file. Cheers, Aljoscha On Mon, 12 Dec 2016 at 19:54 Dominik Safaric wrote: > Hi everyone, > > As I’ve implemen

Re: WindowFunction-extension, WindowedStream apply signature mismatch

2016-12-14 Thread Aljoscha Krettek
Hi, are you using the WindowFunction in org.apache.flink.streaming.api.scala.function? It's a bit tricky because there is another WindowFunction in another package. Cheers, Aljoscha On Tue, 13 Dec 2016 at 13:48 MIkkel Islay wrote: > (The following is a cross-post of a Stack Overflow question at

Re: checkpoint notifier not found?

2016-12-14 Thread Robert Metzger
Hi Abhishek, you can not push to the Flink repository directly. Only Flink committers are allowed to do that. But you can fork the Flink repository on github to your own GitHub account and then push the changes to your Github. Then, you can create a pull request to offer those changes to the main F

[ANNOUNCE] New Flink community mailing list

2016-12-14 Thread Kostas Tzoumas
Hi everyone, We have created a new Flink mailing lists, commun...@flink.apache.org where we can post everything related to the broader Flink community including job offers, upcoming meetups and conferences, exciting reads, and everything else that is deemed worthy for the greater Flink community.

Re: In 1.2-SNAPSHOT, EventTimeSessionWindows are not firing untill the whole stream is processed

2016-12-14 Thread Aljoscha Krettek
Hi Yassine, for a bit more detailed explanation: We internally changed how the timer system works, this timer system is also used to periodically extract watermarks. Due to this change, in your case we don't extract watermarks anymore. Internally, your call resolves to something like this: Env.re