Flink CEP state store

2017-12-15 Thread mahesh dhabade
Hello, Can anyone shed some light on the intermittent state store mechanism used by Flink CEP ? Based on the kind of pattern, the state-store may be overwhelmed with intermittent states. How does Flink CEP manage this ? Does it use a persistent store, or just an in-memory store with a cap on the n

Re: Flink 1.4.0 keytab is unreadable

2017-12-15 Thread Tzu-Li (Gordon) Tai
Hi 杨光, Thanks a lot for reporting and looking into this with such detail! Your observations are correct: the changes from 1.3.2 to 1.4.0 in the YarnTaskManagerRunner caused the local Keytab path in TMs to not be correctly set. Unfortunately, AFAIK I don’t think there is a possible workaround to

Job-level close()?

2017-12-15 Thread Andrew Roberts
Hello, I’m writing a Flink operator that connects to a database, and running it in parallel results in issues due to the singleton nature of the connection pool in the library I’m working with. The operator needs to close the connection pool when it’s done, but only when ALL parallel instances

Why did you pick Scala/Java for your project?

2017-12-15 Thread Julio Biason
Hey guys, We are in a point when we are almost good to go with our pipeline in Flink, but when building it, we found a bunch of problems with the docs about the Scala API (the mail I sent a few days ago was one of the many documentation problems I found so far). So, we are kinda wondering: Why ea

Setting jar file directory for Apache Flink

2017-12-15 Thread Soheil Pourbafrani
Hi, My Apache Flink code is using some Flink libraries that are not contained in the path $FLINK_HOME/lib. So as I want to run my code on a remote cluster, I need to set a path that copy dependency libraries there (instead of setting jars in ExecutionEnvirnmnet object like ExecutionEnvironment env

Re: consecutive stream aggregations

2017-12-15 Thread Plamen Paskov
In my case i have a lot of users with one session per user. What i'm thinking is to evenly distribute the users then accumulate and finally merge all accumulators. The problem is that i don't know how to achieve this. On 15.12.2017 17:52, Ufuk Celebi wrote: You can first aggregate the length

Re: consecutive stream aggregations

2017-12-15 Thread Ufuk Celebi
You can first aggregate the length per user and emit it downstream. Then you do the all window and average all lengths. Does that make sense? On Fri, Dec 15, 2017 at 4:48 PM, Plamen Paskov wrote: > I think i got your point. > What happens now: in order to use aggregate() i need an window but the

Re: consecutive stream aggregations

2017-12-15 Thread Ufuk Celebi
You have to specify a window for this to work: stream .keyBy() .timeWindow() .aggregate() On Fri, Dec 15, 2017 at 3:04 PM, Plamen Paskov wrote: > Hi Ufuk, > > Thanks for answer. It looks like in theory the accumulators are the solution > to my problem but as i'm working on KeyedStream it

Re: consecutive stream aggregations

2017-12-15 Thread Plamen Paskov
I think i got your point. What happens now: in order to use aggregate() i need an window but the window requires keyBy() if i want to parallelize the data. In my case it will not work because if i create keyBy("userId") then the average will be calculated per userId  but i want average across al

Re: Python API not working

2017-12-15 Thread Yassine MARZOUGUI
Hi Ufuk, Thanks for your response. Unfortunately specifying 'streaming` or `batch` doesn't work, it looks like mode should be either "plan" or "operator" , and then the program expects other inputs from the stdin (id, port, etc.). 2017-12-15 14:23 GMT+01:00 Ufuk Celebi : > Hey Yassine, > > let m

Re: consecutive stream aggregations

2017-12-15 Thread Plamen Paskov
Hi Ufuk, Thanks for answer. It looks like in theory the accumulators are the solution to my problem but as i'm working on KeyedStream it's not possible to call aggregate with AggregateFunction implementation. Am i missing something? On 15.12.2017 15:46, Ufuk Celebi wrote: Hey Plamen, I th

Re: consecutive stream aggregations

2017-12-15 Thread Ufuk Celebi
Hey Plamen, I think what you are looking for is the AggregateFunction. This you can use on keyed streams. The Javadoc [1] contains an example for your use case (averaging). – Ufuk [1] https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/functions/Aggr

Re: Flink 1.4.0 keytab is unreadable

2017-12-15 Thread Ufuk Celebi
Hey 杨光, thanks for looking into this in such a detail. Unfortunately, I'm not sure what the expected behaviour is (whether the change in behaviour was accidental or on purpose). Let me pull in Gordon who has worked quite a bit on the Kerberos related components in Flink. @Gordon: 1) Do you know

Re: Python API not working

2017-12-15 Thread Ufuk Celebi
Hey Yassine, let me include Chesnay (cc'd) who worked on the Python API. I'm not familiar with the API and what it expects, but try entering `streaming` or `batch` for the mode. Chesnay probably has the details. – Ufuk On Fri, Dec 15, 2017 at 11:05 AM, Yassine MARZOUGUI wrote: > Hi All, > > I

Re: docker-flink images and CI

2017-12-15 Thread Ufuk Celebi
I agree with Patrick's (cc'd) comment from the linked issue. What I understand from the linked issue is that Patrick will take care of the Docker image update for 1.4 manually. Is that ok with you Patrick? :-) Regarding the Flink release process question: I fully agree with the idea to integrate t

SANSA 0.3 (Scalable Semantic Analytics Stack) Released

2017-12-15 Thread Jens Lehmann
Dear all, The Smart Data Analytics group [1] is happy to announce SANSA 0.3 - the third release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink in order to allow scalable machine learning, inference and querying capabilities for large know

Re: Flink State monitoring

2017-12-15 Thread Ufuk Celebi
Hey Liron, unfortunately, there are no built-in metrics related to state. In general, exposing the actual values as metrics is problematic, but exposing summary statistics would be a good idea. I'm not aware of a good work around at the moment that would work in the general case (taking into accou

Re: Flink 1.4.0 can not override JAVA_HOME for single-job deployment on YARN

2017-12-15 Thread Stephan Ewen
Could you open an issue to add the old config keys as backwards supported "deprecated keys"? That should help making the transition smoother. On Fri, Dec 15, 2017 at 9:29 AM, Fabian Hueske wrote: > Thanks for reporting back! > > 2017-12-15 4:52 GMT+01:00 杨光 : > >> Yes , i'm using Java8 , and i f

consecutive stream aggregations

2017-12-15 Thread Plamen Paskov
Hi, I'm trying to calculate the running average of session length and i want to trigger the computation on a regular let's say 2 minutes interval. I'm trying to do it like this: package flink; import lombok.AllArgsConstructor; import lombok.NoArgsConstructor; import lombok.ToString; import o

Flink 1.4.0 keytab is unreadable

2017-12-15 Thread 杨光
Hi, I am using flink single-job mode on YARN to read data from a kafka cluster installation configured for Kerberos. When i upgrade flink to 1.4.0 , the yarn application can not run normally and logs th error like this: Exception in thread "main" java.lang.RuntimeException: org.apache.flink.confi

Python API not working

2017-12-15 Thread Yassine MARZOUGUI
Hi All, I'm trying to use Flink with the python API, and started with the wordcount exemple from the Documentation. I'm using Flink 1.4 and python 2.7. When running env.execute(local=True), the comm

Re: Consecutive windowed operations

2017-12-15 Thread Fabian Hueske
Hi Ron, chaining windows as shown in the example was also possible before 1.4.0. So you can keep using Flink 1.3.2 if this would be the only reason to update to 1.4.0. Best, Fabian 2017-12-15 1:14 GMT+01:00 Ron Crocker : > In the 1.4 docs I stumbled on this section: Consecutive windowed > opera

Re: Can flink aggregate in local TM,then aggregate in global TM?

2017-12-15 Thread Fabian Hueske
Hi, (copying my answer from Stack Overflow) The current release of Flink (Flink 1.4.0, Dec 2017) does not feature built-in support for pre-aggregations. However, there are efforts on the way to add this for the next release (1.5.0), see FLINK-7561 [4] You can implement a pre-aggregation operatio

Re: Flink 1.4.0 can not override JAVA_HOME for single-job deployment on YARN

2017-12-15 Thread Fabian Hueske
Thanks for reporting back! 2017-12-15 4:52 GMT+01:00 杨光 : > Yes , i'm using Java8 , and i found the 1.4 version provided new > parameters : "containerized.master.env.ENV_VAR1" and > "containerized.taskmanager.env". > I change my start command from "-yD yarn.taskmanager.env.JAVA_HOME" to > " -y