Re: CountTrigger FIRE or FIRE_AND_PURGE

2016-08-29 Thread Fabian Hueske
Hi Paul, This blog post [1] includes an example of an early trigger that should pretty much do what you are looking for. This one [2] explains the windowing mechanics of Flink (window assigner, trigger, function, etc). Hope this helps, Fabian [1] https://www.mapr.com/blog/essential-guide-streami

CountTrigger FIRE or FIRE_AND_PURGE

2016-08-29 Thread Paul Joireman
Hi all, I'm attempting to use a long SlidingEventTime window (duration 24 hours), but I would like updates more frequently than the 24-hour length. I naively attempted to use a simple CountTrigger(10) to give me the window every time 10 samples are collected; however, the window processing func
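To make the FIRE versus FIRE_AND_PURGE distinction in the subject line concrete, here is a minimal sketch in plain Java. It is a toy model of a single window pane and is not Flink code: the class name `CountTriggerSketch`, the method `run`, and its parameters are hypothetical. It assumes only the documented behaviour that FIRE evaluates the window and keeps its contents, while FIRE_AND_PURGE evaluates the window and then clears it.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a count-based trigger on one window pane; not Flink code. */
final class CountTriggerSketch {

    /**
     * Feeds the elements 1..n into one pane and records what a window
     * function would see each time the trigger fires.
     *
     * @param count fire after this many new elements (like CountTrigger(count))
     * @param purge true models FIRE_AND_PURGE, false models FIRE
     */
    static List<List<Integer>> run(int n, int count, boolean purge) {
        List<Integer> pane = new ArrayList<>();
        List<List<Integer>> firings = new ArrayList<>();
        int sinceLastFire = 0;
        for (int element = 1; element <= n; element++) {
            pane.add(element);
            sinceLastFire++;
            if (sinceLastFire == count) {
                // The window function sees a snapshot of the current pane.
                firings.add(new ArrayList<>(pane));
                sinceLastFire = 0;
                if (purge) {
                    pane.clear(); // FIRE_AND_PURGE drops the buffered elements
                }
            }
        }
        return firings;
    }

    public static void main(String[] args) {
        System.out.println(run(6, 3, false)); // [[1, 2, 3], [1, 2, 3, 4, 5, 6]]
        System.out.println(run(6, 3, true));  // [[1, 2, 3], [4, 5, 6]]
    }
}
```

With FIRE each firing sees everything collected so far, which suits early, cumulative updates on a long window. With FIRE_AND_PURGE each firing sees only the elements that arrived since the previous firing.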

Re: Firing windows multiple times

2016-08-29 Thread Shannon Carey
Yes, let me describe an example use-case that I'm trying to implement efficiently within Flink. We've been asked to aggregate per-user data on a daily level, and from there produce aggregates on a variety of time frames. For example, 7 days, 30 days, 180 days, and 365 days. We can talk about t

Re: Data Transfer between TM should be encrypted

2016-08-29 Thread vinay patil
Hi Stephan, Thank you for your reply. When can I expect this feature to be integrated into master or a release version? We are going to get production data (financial data) at the end of October, so we want to have this feature before that. Regards, Vinay Patil On Mon, Aug 29, 2016 at 11:15 AM, Steph

Re: Resource isolation in flink among multiple jobs

2016-08-29 Thread Robert Metzger
Hi, this is the JIRA for Mesos support: https://issues.apache.org/jira/browse/FLINK-1984 The first part of it has been merged today: https://github.com/apache/flink/pull/2315 So it's likely going to be fully functional for the 1.2 release. You can run multiple TaskManager instances per machine, an

Re: Cannot pass objects with null-valued fields to the next operator

2016-08-29 Thread Stephan Ewen
Hi! Null is indeed not supported for some basic data types (tuples / case classes). Can you use Option for nullable fields? Stephan On Mon, Aug 29, 2016 at 8:04 PM, Jack Huang wrote: > Hi all, > > It seems like flink does not allow passing case class objects with > null-valued fields to the

Cannot pass objects with null-valued fields to the next operator

2016-08-29 Thread Jack Huang
Hi all, It seems like flink does not allow passing case class objects with null-valued fields to the next operators. I am getting the following error message: *Caused by: java.lang.RuntimeException: Could not forward element to next operator* at org.apache.flink.streaming.runtime.task

Re: Data Transfer between TM should be encrypted

2016-08-29 Thread Stephan Ewen
Hi! The way that the JIRA issue you linked will achieve this is by hooking into the network stream pipeline directly and encrypting the raw network byte stream. We built the network stack on Netty, and will use Netty's SSL/TLS handlers for that. That should be much more efficient than manual encryp

Data Transfer between TM should be encrypted

2016-08-29 Thread vinay patil
Hi Ufuk, This is regarding this issue: https://issues.apache.org/jira/browse/FLINK-4404 How can we achieve this? I am able to decrypt the data coming in from Kafka, but I want to make sure that the data is encrypted when flowing between TMs. One approach I can think of is to decrypt the data at

Re: Resource isolation in flink among multiple jobs

2016-08-29 Thread Abhishek Agarwal
Hi Robert, Thanks for your reply. A few follow-up questions: Is there any timeline for Mesos support? In standalone installations, how about having one task slot per TaskManager and multiple TaskManager instances per machine? On Mon, Aug 29, 2016 at 7:13 PM, Robert Metzger wrote: > Hi, > > for i

Re: Flink JMX

2016-08-29 Thread Sreejith S
Thank you very much Chesnay, your pointer on "java.rmi.server.hostname" solved the issue. Now I am able to get Flink metrics in 1.1.1. I used the "host" setting in the JMX reporter because I did not want to expose the JMX metrics publicly, since my Flink cluster is in the AWS cloud. So I tried to bind it to private

Re: Flink WebUI on YARN behind firewall

2016-08-29 Thread Shannon Carey
I've noticed this too. The UI incorrectly tries to load its resources (CSS, JS) from root. Try adding a trailing slash to the URL. -Shannon From: Trevor Grant <trevor.d.gr...@gmail.com> Date: Friday, August 26, 2016 at 12:52 PM To: user@flink.apache.org Subject: Flink WebUI on YA

Re: Flink JMX

2016-08-29 Thread Chesnay Schepler
Hello, That you can't access JMX in 1.0.3 even though you set all the JVM JMX options is unrelated to Flink. As such your JMX setup in general is broken. Note that in order to remotely access JMX you usually have to set "java.rmi.server.hostname" system-property on the host as well. Regarding

Gracefully Stopping Streaming Job Programmatically in LocalStreamEnvironment

2016-08-29 Thread Konstantin Knauf
Hi everyone, I have an integration test for which I use a LocalStreamEnvironment. Currently, the Flink job is started in a separate thread, which I interrupt after some time and then do some assertions. In this situation, is there a better way to stop/cancel a running job in LocalStreamEnvironmen

Re: Submitting watermarks through a Kinesis stream

2016-08-29 Thread Steffen Hausmann
That's just awesome! Thanks, Steffen On August 29, 2016 3:39:52 PM GMT+02:00, Stephan Ewen wrote: >You are thinking too complicated here ;-) because Flink internally >already >does all the logic of monitoring the minimum watermark across stream >partitions. >As long as you match the Flink source

Re: Resource isolation in flink among multiple jobs

2016-08-29 Thread Robert Metzger
Hi, for isolation, we recommend using YARN, or soon Mesos. For standalone installations, you'll need to manually set up multiple independent Flink clusters within one physical cluster if you want them to be isolated. Regards, Robert On Mon, Aug 29, 2016 at 1:41 PM, Abhishek Agarwal wrote: >

Re: Accessing state in connected streams

2016-08-29 Thread aris kol
Any other opinion on this? Thanks :) Aris From: aris kol Sent: Sunday, August 28, 2016 12:04 AM To: user@flink.apache.org Subject: Re: Accessing state in connected streams In the implementation I am passing just one CoFlatMapFunction, where flatMap1, which

Re: Programatically collect taskmanagers details from Job Manager

2016-08-29 Thread Robert Metzger
Hi, I think the JMX port is logged, but not accessible through the REST interface of the JobManager. I think it's a useful feature. If you want, you can file a JIRA for it. On Mon, Aug 29, 2016 at 12:14 PM, Sreejith S wrote: > Thank you Stephan! That helps, > > Is it possible to get the JMX_POR

Re: Flink JMX

2016-08-29 Thread Robert Metzger
Hi, I think in Flink 1.1.1 JMX will be started on port 8080, 8081 or 8082 (on the JM, 8081 is probably occupied by the web interface). On Mon, Aug 29, 2016 at 1:25 PM, Sreejith S wrote: > Hi Chesnay, > > I added the below configuration in flink-conf in each taskmanagers. (flink > 1.0.3 version

Re: Submitting watermarks through a Kinesis stream

2016-08-29 Thread Stephan Ewen
You are thinking too complicated here ;-) because Flink internally already does all the logic of monitoring the minimum watermark across stream partitions. As long as you match the Flink source parallelism to the number of Kinesis shards, that part is taken care of for you. You only need to publis
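The "minimum watermark across stream partitions" rule mentioned above can be illustrated with a small standalone sketch. This is a simplified model and not Flink's internal implementation; the class `MinWatermarkSketch` and the method `combined` are hypothetical names, and the timestamps are made-up values.

```java
import java.util.Arrays;

/** Simplified model of combining per-partition watermarks; not Flink internals. */
final class MinWatermarkSketch {

    /**
     * An operator reading several partitions can only advance its event time
     * to the smallest watermark it has received, because the slowest
     * partition may still deliver older events.
     */
    static long combined(long[] partitionWatermarks) {
        if (partitionWatermarks.length == 0) {
            throw new IllegalArgumentException("at least one partition is required");
        }
        return Arrays.stream(partitionWatermarks).min().getAsLong();
    }

    public static void main(String[] args) {
        // Hypothetical watermarks (epoch millis) from three shards.
        long[] watermarks = {1000L, 700L, 1200L};
        System.out.println(combined(watermarks)); // 700
    }
}
```

Under this model, one lagging shard holds back event time for the whole operator.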

Submitting watermarks through a Kinesis stream

2016-08-29 Thread Steffen Hausmann
Hi there, I'm feeding a Flink stream with events from a Kinesis stream and I'm looking for some guidance on how to enable event time in the Flink stream. I've read through the documentation and it seems like I want to add events that carry watermark information to the Kinesis stream and subs

Re: Dynamic scaling in flink

2016-08-29 Thread Robert Metzger
Hi, this JIRA is a good starting point: https://issues.apache.org/jira/browse/FLINK-3755 If you don't care about processing guarantees and you are using a stateless streaming job, you can implement a simple Kafka consumer that uses Kafka's consumer group mechanism. I recently implemented such a Ka

Re: different Kafka serialization for keyed and non keyed messages

2016-08-29 Thread Robert Metzger
Hi Rss, > why Flink implements different serialization schemes for keyed and non keyed messages for Kafka? The non-keyed serialization schema is a basic schema, which works for most use cases. For advanced users who need access to the key, offsets, the partition or topic, there's the keyed ser

Re: Kafka and Flink's partitions

2016-08-29 Thread rss rss
Hello, thanks for the answer. 1. There is currently no way to avoid the repartitioning. When you do a > keyBy(), Flink will shuffle the data through the network. What you would > need is a way to tell Flink that the data is already partitioned. If you > would use keyed state, you would also need t

Re: Flink long-running YARN configuration

2016-08-29 Thread Robert Metzger
The JobManager UI starts when running Flink on YARN. The address of the UI is registered at YARN, so you can also access it through YARN's command line tools or its web interface. On Fri, Aug 26, 2016 at 7:28 PM, Trevor Grant wrote: > Stephan, > > Will the jobmanager-UI exist? E.g. if I am runni

Re: How to set a custom JAVA_HOME when run flink on YARN?

2016-08-29 Thread Robert Metzger
The "env.java.home" variable is only evaluated by the start scripts, not the YARN code. The solution you've mentioned earlier is a good workaround in my opinion. On Fri, Aug 26, 2016 at 3:48 AM, Renkai wrote: > It seems that this config variant only effect local cluster and stand alone > clust

Re: Kafka and Flink's partitions

2016-08-29 Thread Robert Metzger
Hi rss, Concerning your questions: 1. There is currently no way to avoid the repartitioning. When you do a keyBy(), Flink will shuffle the data through the network. What you would need is a way to tell Flink that the data is already partitioned. If you would use keyed state, you would also need to

Parquet sink

2016-08-29 Thread Egor Mateshuk
Hi, Does anyone have an example of a working Parquet sink for the Flink DataStream API? In Flink 1.1 a new RollingSink with custom writers was released. According to [FLINK-3637] it was done to allow working with ORC and Parquet formats. Maybe someone has already used it for Parquet? I’ve found an

Re: How to share text file across tasks at run time in flink.

2016-08-29 Thread Robert Metzger
Hi, you could use Zookeeper if you want to dynamically change the DB name / credentials at runtime. The GlobalJobParameters are immutable at runtime, so you can not pass updates through it to the cluster. They are intended for parameters for all operators/the entire job and the web interface. Re

Resource isolation in flink among multiple jobs

2016-08-29 Thread Abhishek Agarwal
In a standalone Flink cluster, the TaskManager has multiple slots and can run different applications in different slots, but still within the same JVM. Given that two applications are running in the same process, how is CPU and memory isolation achieved? Are there any recommendations to achieve isol

Re: Flink JMX

2016-08-29 Thread Sreejith S
Hi Chesnay, I added the below configuration to flink-conf on each TaskManager (Flink version 1.0.3): # Enable JMX env.java.opts: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port= -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -D

Re: flink no class found error

2016-08-29 Thread Robert Metzger
Hi, this page explains how to relocate classes in a fat jar: https://maven.apache.org/plugins/maven-shade-plugin/examples/class-relocation.html Regards, Robert On Wed, Aug 10, 2016 at 10:31 PM, Janardhan Reddy < janardhan.re...@olacabs.com> wrote: > We don't use guava directly, we use another l

Re: Flink JMX

2016-08-29 Thread Chesnay Schepler
Hello, can you post the jmx config entries and give us more details on how you want to access it? Regards, Chesnay On 29.08.2016 12:09, Sreejith S wrote: Hi All, I am using Flink-1.1.1 and i enabled JMX metrics in configuration file. In the task manger log i can see JMX is running. Is th

Re: TimeWindowAll doeesn't assign properly

2016-08-29 Thread Sendoh
Hi, As far as I can tell, the problem was rebalance(), because the data arrives within 1 minute (it re-processes old events); it's a bit strange that it worked when the watermark was configured as 10 minutes. After removing rebalance(), it works as expected with a smaller watermark setting. DataStream streams = env.addS

Re: Programatically collect taskmanagers details from Job Manager

2016-08-29 Thread Sreejith S
Thank you Stephan! That helps. Is it possible to get details such as the JMX_PORT from the TaskManagers? My program will connect to the JobManager, get all TaskManager IPs and JMX ports, and create a JMX_URL automatically. This is what I am trying to achieve. Is it possible? Thanks, On Mon, Aug
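As a rough sketch of the "create a JMX_URL automatically" step, the standard JDK class `javax.management.remote.JMXServiceURL` can be built from a host and port using the usual RMI connector URL form. The class `JmxUrlSketch`, its methods, and the host and port values below are hypothetical; where the port comes from is left open, since the replies in this thread indicate it is not exposed through the JobManager's REST interface.

```java
import java.net.MalformedURLException;
import javax.management.remote.JMXServiceURL;

/** Builds JMX connector URLs from host/port pairs; names and values are illustrative. */
final class JmxUrlSketch {

    /** Standard RMI-based JMX connector URL for a JVM started with jmxremote.port. */
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /** Parses the URL with the JDK class so that malformed input fails early. */
    static JMXServiceURL toServiceUrl(String host, int port) {
        try {
            return new JMXServiceURL(jmxUrl(host, port));
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("invalid JMX endpoint: " + host + ":" + port, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical TaskManager address and JMX port.
        JMXServiceURL url = toServiceUrl("10.0.0.5", 9999);
        System.out.println(url);
        // A client would then call JMXConnectorFactory.connect(url) to open a connection.
    }
}
```

For remote access, the target JVM usually also needs the "java.rmi.server.hostname" system property set, as noted elsewhere in this thread.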

Flink JMX

2016-08-29 Thread Sreejith S
Hi All, I am using Flink-1.1.1 and I enabled JMX metrics in the configuration file. In the TaskManager log I can see that JMX is running. Are these metrics exposed only through Flink's metrics APIs? I tried to connect to the Flink JMX URL using plain javax, but the connections are refused. Thanks, -- *

Re: Dynamic scaling in flink

2016-08-29 Thread Abhishek Agarwal
Thanks Stephan. I understand that guaranteeing exactly-once semantics with dynamic scaling is tough. If I were to let go of the exactly-once requirement, is it not possible in the current version? It would be really great if you could point me to the JIRA tracking this work. On Mon, Aug 29, 2016 at 2:30

Re: flink datastream reduce

2016-08-29 Thread Gábor Gévay
Hello, The result contains (a,Map(3 -> rt)) because reduce prints all intermediate results (sometimes called a "rolling reduce"). It's designed this way, because Flink streams are generally infinite, so there is no last element where you could print the "final" results. However, you can use window
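The "rolling reduce" behaviour described above can be reproduced without Flink. The sketch below is a plain-Java model, not the DataStream API: the class `RollingReduceSketch` and its methods are hypothetical names. It uses the three input pairs from the original question and emits the running result for a key after every incoming element.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of a keyed rolling reduce; not the Flink DataStream API. */
final class RollingReduceSketch {

    /** Convenience constructor for one (key, single-entry map) input element. */
    static Map.Entry<String, Map<Integer, String>> pair(String key, int mapKey, String value) {
        return Map.entry(key, Map.of(mapKey, value));
    }

    /** Emits the merged map for the element's key after every input element. */
    static List<String> rollingReduce(List<Map.Entry<String, Map<Integer, String>>> input) {
        Map<String, Map<Integer, String>> state = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (Map.Entry<String, Map<Integer, String>> element : input) {
            // Mirrors (x1, x2) => (x1._1, x2._2 ++ x1._2): start from the new
            // element, then add the accumulated entries, which win on collisions.
            Map<Integer, String> merged = new TreeMap<>(element.getValue());
            Map<Integer, String> previous = state.get(element.getKey());
            if (previous != null) {
                merged.putAll(previous);
            }
            state.put(element.getKey(), merged);
            emitted.add("(" + element.getKey() + "," + merged + ")");
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(rollingReduce(List.of(
                pair("a", 3, "rt"), pair("a", 4, "yt"), pair("b", 5, "dfs"))));
        // [(a,{3=rt}), (a,{3=rt, 4=yt}), (b,{5=dfs})]
    }
}
```

The first emitted record for key "a" contains only the first element, which corresponds to the intermediate (a,Map(3 -> rt)) result discussed in the reply.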

Re: Apache siddhi into Flink

2016-08-29 Thread Hao Chen
+1 to supporting Siddhi as a Flink operator on DataStream. Siddhi supports rich CEP features beyond pattern matching, and also provides an extensible snapshot/restore interface for fault tolerance, which should be easy to integrate with Flink's state management (tested with some draft code). We (apache eagle comm

Re: Apache siddhi into Flink

2016-08-29 Thread Till Rohrmann
Hi Aparup, I haven't looked in detail at Siddhi's internals especially the way it handles distributed execution. I've only seen that it uses Hazelcast for a distributed in-memory cache. If a distributed cache for the communication between instances of the CEP operator is needed, then you would hav

Re: Programatically collect taskmanagers details from Job Manager

2016-08-29 Thread Stephan Ewen
You should be able to call the Monitor handler of the JobManager: http://jobmanagerhost:8081/taskmanagers That gives you a JSON response like this: { "taskmanagers": [ { "id" : "7c8835b89acf533cb8a5119dbcaf4b4f", "path" : "akka.tcp://flink@127.0.1.1:56343/user/taskmanager", "dataPort" : 5

flink datastream reduce

2016-08-29 Thread rimin515
Hi, in Flink the DataStream has a reduce transformation, but the result is not what I want. For example: val pairs2 = env.fromCollection(Array(("a", Map(3->"rt")), ("a", Map(4->"yt")), ("b", Map(5->"dfs")))); val re = pairs2.keyBy(0).reduce((x1, x2) => (x1._1, x2._2 ++ x1._2)); re.map

Re: Joda exclude in java quickstart maven archetype

2016-08-29 Thread Fabian Hueske
Hi Flavio, yes, Joda should not be excluded. This will be fixed in Flink 1.1.2. Cheers, Fabian 2016-08-29 11:00 GMT+02:00 Flavio Pompermaier : > Hi to all, > I've tried to upgrade from Flink 1.0.2 to 1.1.1 so I've copied the > excludes of the maven shade plugin from the java quickstart pom bu

Re: Dynamic scaling in flink

2016-08-29 Thread Stephan Ewen
Hi! There is a lot of work in progress on that feature, and it looks like you can expect the next version to have some upscale/downscale feature that maintains exactly-once semantics. Stephan On Mon, Aug 29, 2016 at 9:00 AM, Abhishek Agarwal wrote: > Is it possible to upscale or downscale a f

Joda exclude in java quickstart maven archetype

2016-08-29 Thread Flavio Pompermaier
Hi to all, I've tried to upgrade from Flink 1.0.2 to 1.1.1 so I've copied the excludes of the maven shade plugin from the java quickstart pom but it includes the exclude of joda (that is not contained in the flink-dist jar). That causes my job to fail. Shouldn't it be removed from the exclude lis

Re: Apache siddhi into Flink

2016-08-29 Thread Stephan Ewen
Nice idea! If you look at the current CEP library, it is simply a custom operator. Often, you can even get away with a custom FlatMapFunction that uses state: https://ci.apache.org/projects/flink/flink-docs-master/dev/state.html#using-the-keyvalue-state-interface Stephan On Mon, Aug 29, 2016 at

Re: Firing windows multiple times

2016-08-29 Thread Aljoscha Krettek
Hi, that would certainly be possible. What do you think can be gained from knowing the current watermark in the WindowFunction, perhaps in a specific case? Cheers, Aljoscha On Wed, 24 Aug 2016 at 23:21 Shannon Carey wrote: > What do you think about adding the current watermark to

Re: Apache siddhi into Flink

2016-08-29 Thread Aparup Banerjee (apbanerj)
I think Siddhi is a fairly mature CEP library. I am thinking it should co-exist with the existing CEP library. My thinking is we should be able to use Siddhi QL / Siddhi Patterns on top of Flink data streams. This can co-exist naturally with the existing Java / Scala based Flink CEP library. I am still

Re: Apache siddhi into Flink

2016-08-29 Thread Chesnay Schepler
Hello Aparup, could you provide more information about Siddhi? How mature is it; how is the community? How does it compare to Flink's CEP library? What should this integration look like? Are you proposing to replace the current CEP library, or will they co-exist with different use-cases fo

Dynamic scaling in flink

2016-08-29 Thread Abhishek Agarwal
Is it possible to upscale or downscale a Flink application without re-deploying (similar to rebalancing in Storm)? -- Regards, Abhishek Agarwal