Re: Behaviour of CountWindowAll

2015-12-14 Thread Nirmalya Sengupta
Hello Aljoscha, Thanks again for taking the time to explain the behaviour of CountWindowAll(m,n). To be honest, the behaviour seems a bit sketchy to me - and probably it needs a revisit - but if that's the way it is, then that's the way it is! :-) -- Nirmalya -- Software Technologist http://www.l

Re: Using S3 as state backend

2015-12-14 Thread Thomas Götzinger
Hi Brian, can you give me a short summary of how to achieve this? On 14.12.2015 at 23:20, "Brian Chhun" wrote: > For anyone else looking, I was able to use the s3a filesystem which can > use IAM role based authentication as provided by the underlying AWS client > library. > > Thanks, > Brian > > On Thu,

flink streaming documentation

2015-12-14 Thread Radu Tudoran
Hi, I believe I found two small inconsistencies in the documentation for the description of Window Apply https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#windows-on-unkeyed-data-streams 1) In the example provided I believe it should be corrected to remove

Re: Published test artifacts for flink streaming

2015-12-14 Thread Nick Dimiduk
Hi Alex, How's your infra coming along? I'd love to up my unit testing game with your improvements :) -n On Mon, Nov 23, 2015 at 12:20 AM, lofifnc wrote: > Hi Nick, > > This is easily achievable using the framework I provide. > createDataStream(Input input) does actually return a > DataStreamS

Re: Specify jobmanager port in HA mode

2015-12-14 Thread Cory Monty
Ufuk, I'm a colleague of Brian. Unfortunately, we are not running YARN so I don't think that PR applies to us. We're trying to run a standalone cluster. Cheers, Cory On Mon, Dec 14, 2015 at 5:23 PM, Ufuk Celebi wrote: > This has been recently added to the YARN client by Robert [1]: > https://

Re: Specify jobmanager port in HA mode

2015-12-14 Thread Ufuk Celebi
This has been recently added to the YARN client by Robert [1]: https://ci.apache.org/projects/flink/flink-docs-master/setup/yarn_setup.html#running-flink-on-yarn-behind-firewalls Are you running YARN? – Ufuk [1] https://github.com/apache/flink/pull/1416 > On 15 Dec 2015, at 00:03, Ufuk Celebi

Re: Specify jobmanager port in HA mode

2015-12-14 Thread Ufuk Celebi
Hey Brian, I think that it is currently not possible. I will look into whether there is a workaround. In any case, this sounds like a useful thing and it shouldn’t be too complicated to add the desired behaviour. I’ve opened an issue [1] for it and will look into it tomorrow. Is this currently

Re: Using S3 as state backend

2015-12-14 Thread Brian Chhun
For anyone else looking, I was able to use the s3a filesystem which can use IAM role based authentication as provided by the underlying AWS client library. Thanks, Brian On Thu, Dec 10, 2015 at 4:28 PM, Brian Chhun wrote: > Thanks Ufuk, this did the trick. > > Thanks, > Brian > > On Wed, Dec 9,
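
A minimal configuration sketch for this setup (bucket name and paths are placeholders; it assumes the Hadoop S3A classes are on Flink's classpath and that flink-conf.yaml points at the Hadoop configuration directory via fs.hdfs.hadoopconf):

    # flink-conf.yaml
    state.backend: filesystem
    state.backend.fs.checkpointdir: s3a://my-bucket/flink/checkpoints

    <!-- core-site.xml -->
    <property>
      <name>fs.s3a.impl</name>
      <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    </property>

With IAM role based authentication there is no need to set fs.s3a.access.key / fs.s3a.secret.key; the underlying AWS client library falls back to the instance profile credentials on its own.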

Re: Accumulators/Metrics

2015-12-14 Thread Nick Dimiduk
Hi Christian, I've returned to this project and am interested in exploring options. Have you released any of your work yet? Have you considered an implementation where each Flink worker exposes its own metrics via a "well known interface" -- such as HTTP or JMX -- and letting an external process
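
A minimal sketch of the idea described here, assuming a hypothetical per-worker counter exposed as a plain JMX MBean so that an external process can poll it (the names and the ObjectName are made up):

    import java.lang.management.ManagementFactory
    import java.util.concurrent.atomic.AtomicLong
    import javax.management.ObjectName

    // JMX standard MBeans need an interface named <Class>MBean; a Scala trait compiles to one.
    trait WorkerStatsMBean {
      def getRecordsProcessed: Long
    }

    class WorkerStats extends WorkerStatsMBean {
      private val records = new AtomicLong()
      def inc(): Unit = records.incrementAndGet()
      override def getRecordsProcessed: Long = records.get()
    }

    object WorkerStats {
      // Register once per worker JVM; an external scraper (e.g. via JMX remote) can then read the attribute.
      def register(): WorkerStats = {
        val stats = new WorkerStats
        ManagementFactory.getPlatformMBeanServer
          .registerMBean(stats, new ObjectName("flink.worker:type=WorkerStats"))
        stats
      }
    }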

Specify jobmanager port in HA mode

2015-12-14 Thread Brian Chhun
Hello, Is it possible to set the job manager rpc port when running in HA mode? Or is there a workaround or solution if we're running task managers with a firewall? Thanks, Brian

Re: Streaming to db question

2015-12-14 Thread Stephan Ewen
Hi! If the sink that writes to the database is partitioned by the primary key, then this should naturally prevent row conflicts. Greetings, Stephan On Mon, Dec 14, 2015 at 11:32 AM, Flavio Pompermaier wrote: > Hi flinkers, > I was going to evaluate if Flink streaming could fit a use cas
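
A rough sketch of that pattern in the Scala DataStream API (the Record type, table, and JDBC URL are made up; batching, retries, and error handling are omitted). Keying by the primary key routes all updates for one row to the same parallel sink instance, so no two instances can write the same row concurrently:

    import java.sql.{Connection, DriverManager, PreparedStatement}

    import org.apache.flink.configuration.Configuration
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction
    import org.apache.flink.streaming.api.scala._

    case class Record(id: Long, payload: String)

    // Hypothetical upsert sink; the MERGE syntax shown is H2's, adapt it to your database.
    class UpsertSink(jdbcUrl: String) extends RichSinkFunction[Record] {
      @transient private var conn: Connection = _
      @transient private var stmt: PreparedStatement = _

      override def open(parameters: Configuration): Unit = {
        conn = DriverManager.getConnection(jdbcUrl)
        stmt = conn.prepareStatement("MERGE INTO records (id, payload) KEY (id) VALUES (?, ?)")
      }

      override def invoke(value: Record): Unit = {
        stmt.setLong(1, value.id)
        stmt.setString(2, value.payload)
        stmt.executeUpdate()
      }

      override def close(): Unit = {
        if (stmt != null) stmt.close()
        if (conn != null) conn.close()
      }
    }

    object StreamToDbSketch {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        val records = env.fromElements(Record(1, "a"), Record(2, "b"), Record(1, "a-updated"))

        // Partition by the primary key before the sink, as suggested above.
        records.keyBy(_.id).addSink(new UpsertSink("jdbc:h2:mem:test"))

        env.execute("stream-to-db sketch")
      }
    }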

Re: Tiny topology shows '0' for all stats.

2015-12-14 Thread Stephan Ewen
To sum this up, the web dashboard stats are Flink network stats. If your job has no network communication, it's all zero. On Mon, Dec 14, 2015 at 5:03 PM, Ufuk Celebi wrote: > > > On 14 Dec 2015, at 16:25, Niels Basjes wrote: > > > > Hi, > > > > I have a very small topology here. > > In fact this
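
In other words: if the whole pipeline is chained into a single task, nothing ever crosses the network and the counters stay at zero. A small sketch of how a shuffle makes them move (the data is a placeholder):

    import org.apache.flink.streaming.api.scala._

    object NetworkStatsSketch {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        // Fully chained: env.fromElements(1, 2, 3).print() would keep all counters at 0.
        // Inserting rebalance forces a network exchange between source and sink,
        // so the "bytes/records sent" and "received" stats start counting.
        env.fromElements(1, 2, 3).rebalance.print()

        env.execute("network stats sketch")
      }
    }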

Re: Behaviour of CountWindowAll

2015-12-14 Thread Aljoscha Krettek
Hi, the current behavior is in fact that the window will be triggered every “slide-size” elements and the computation will take into account the last “window-size” elements. So for a window with window-size 10 and slide-size 5 the window will be triggered every 5 elements. This means that your o
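
A concrete sketch of that (window-size 10, slide-size 5) case on an all-windowed stream, keeping a running max (the tuple layout is made up):

    import org.apache.flink.streaming.api.scala._

    object CountWindowAllSketch {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        val readings: DataStream[(String, Int)] =
          env.fromCollection((1 to 30).map(i => ("sensor-1", i)))

        // A result is emitted every 5 elements, computed over (at most)
        // the last 10 elements seen so far.
        readings
          .countWindowAll(10, 5)
          .max(1)
          .print()

        env.execute("countWindowAll sketch")
      }
    }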

Re: Tiny topology shows '0' for all stats.

2015-12-14 Thread Ufuk Celebi
> On 14 Dec 2015, at 16:25, Niels Basjes wrote: > > Hi, > > I have a very small topology here. > In fact this is a thing that generates synthetic data and puts it into Kafka. > When looking at the web UI I see that all counters (i.e. Bytes received, > Records received, Bytes sent, Records sen

Tiny topology shows '0' for all stats.

2015-12-14 Thread Niels Basjes
Hi, I have a very small topology here. In fact this is a thing that generates synthetic data and puts it into Kafka. When looking at the web UI I see that all counters (i.e. Bytes received, Records received, Bytes sent, Records sent) remain 0. I verified and I'm seeing thousands of records ar

Re: Behaviour of CountWindowAll

2015-12-14 Thread Nirmalya Sengupta
Hello Aljoscha, Thanks for the explanation about the semantics of CountWindowAll's parameters. However, I am thinking about it and what strikes me is this: If I call CountWindowAll(10,5) then what I am instructing Flink to do is to 1) Collect first 10 2) Call max() function *and, *then* begi

Re: Problems with using ZipWithIndex

2015-12-14 Thread Till Rohrmann
I just tested the zipWithIndex method with Flink 0.10.1 and it worked. I used the following code: import org.apache.flink.api.scala._ import org.apache.flink.api.scala.utils._ object Job { def main(args: Array[String]): Unit = { val env = ExecutionEnvironment.getExecutionEnvironment v
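
The archive cuts the snippet off; a complete minimal example along the same lines (with made-up input data) looks like this:

    import org.apache.flink.api.scala._
    import org.apache.flink.api.scala.utils._

    object Job {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment

        val input: DataSet[String] = env.fromElements("a", "b", "c")

        // zipWithIndex comes from the DataSetUtils implicits in api.scala.utils
        // and yields a DataSet[(Long, String)] of (index, element) pairs.
        val indexed: DataSet[(Long, String)] = input.zipWithIndex

        indexed.print()
      }
    }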

Streaming to db question

2015-12-14 Thread Flavio Pompermaier
Hi flinkers, I was going to evaluate if Flink streaming could fit a use case we have, where data comes into the system, gets transformed and then added to a db (a very common problem...). In such a use case you have to manage the merge of existing records as new data come in. How can you ensure that o

RE: Crash in a simple "mapper style" streaming app likely due to a memory leak ?

2015-12-14 Thread LINZ, Arnaud
Hi, I’ve just run into another exception, a java.lang.IndexOutOfBoundsException in the zlib library this time. Therefore I suspect a problem in Hadoop’s codec pool usage. I’m investigating, and will keep you informed. Thanks, Arnaud From: ewenstep...@gmail.com [mailto:ewenstep...@gmail.com

Re: streaming state

2015-12-14 Thread Stephan Ewen
Hi Alex! Right now, Flink would not reuse Kafka's partitioning for joins, but shuffle/partition data by itself. Flink is very fast at shuffling and adds very little latency on shuffles, so that is usually not an issue. The reason for that design is that we view a streaming program as something dynamic:

Re: Crash in a simple "mapper style" streaming app likely due to a memory leak ?

2015-12-14 Thread Stephan Ewen
Hi! That is curious. Can you tell us a bit more about your setup? - Did you set Flink to use off-heap memory in the config? - What parallelism do you run the job with? - What Java and Flink versions are you using? Even better, can you paste the first part of the TaskManager's log (where it

Re: Behaviour of CountWindowAll

2015-12-14 Thread Aljoscha Krettek
Hi Nirmalya, when using count windows, the window will trigger after “slide-size” elements have been received. So, since slide-size is set to 1 in your example, it will emit a new max for every element received, and once it has accumulated 4 elements it will start removing one element for every new el

Re: Serialisation problem

2015-12-14 Thread Aljoscha Krettek
Hi, the problem could be that GValue is not Comparable. Could you try making it extend Comparable (the Java Comparable)? Cheers, Aljoscha > On 12 Dec 2015, at 20:43, Robert Metzger wrote: > > Hi, > > Can you check the log output in your IDE or the log files of the Flink client > (./bin/flink)
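
For illustration, this is the kind of change being suggested, sketched on a made-up value type rather than the actual GValue:

    // Hypothetical value type; the point is only the Comparable bound.
    case class MyValue(weight: Double) extends Comparable[MyValue] {
      override def compareTo(other: MyValue): Int =
        java.lang.Double.compare(weight, other.weight)
    }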

RE: Crash in a simple "mapper style" streaming app likely due to a memory leak ?

2015-12-14 Thread LINZ, Arnaud
Hello, I did have an off-heap memory leak in my streaming application, due to https://issues.apache.org/jira/browse/HADOOP-12007. Now that I use the CodecPool to close that leak, I get the following error under load: org.apache.flink.runtime.io.network.netty.exception.LocalTransportException
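
For context, the CodecPool pattern mentioned above looks roughly like this (a sketch against the Hadoop compression API; codec choice and stream handling are simplified):

    import java.io.OutputStream

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.compress.{CodecPool, CompressionCodecFactory}

    object CodecPoolSketch {
      def write(raw: OutputStream, data: Array[Byte]): Unit = {
        val codec = new CompressionCodecFactory(new Configuration())
          .getCodecByClassName("org.apache.hadoop.io.compress.GzipCodec")

        // Borrow a compressor from the pool instead of letting the codec
        // allocate fresh native (off-heap) buffers on every call...
        val compressor = CodecPool.getCompressor(codec)
        try {
          val out = codec.createOutputStream(raw, compressor)
          out.write(data)
          out.finish()
        } finally {
          // ...and always return it so the native buffers are reused rather than leaked.
          CodecPool.returnCompressor(compressor)
        }
      }
    }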