CEP with Kafka source

2017-08-04 Thread Björn Hedström
Hi, I am writing a small application which monitors a couple of directories for files which are read by Kafka and later consumed by Flink. Flink then performs some operations on the records (such as extracting the embedded timestamp) and tries to find a pattern using CEP. Since the data can be out

Re: CEP with Kafka source

2017-08-04 Thread Dawid Wysakowicz
Hi Björn, You are correct that the CEP library buffers events until a watermark with a greater timestamp arrives. This is because the order of events is crucial for CEP. Imagine a pattern like A next B and the sequence a(t=1) c(t=10) b(t=2). If we do not wait until the Watermark and sort the even
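
The buffering behaviour described in this reply can be illustrated with a small, Flink-free sketch. This is not the CEP library's code: the class `WatermarkSortingBuffer`, its `Event` type and the helper `names` are hypothetical names invented for the illustration, and the rule that a watermark w releases every buffered event with timestamp <= w is an assumption about the intended semantics.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not Flink's CEP implementation.
class WatermarkSortingBuffer {

    /** A named event carrying an event-time timestamp. */
    static final class Event {
        final String name;
        final long timestamp;

        Event(String name, long timestamp) {
            this.name = name;
            this.timestamp = timestamp;
        }
    }

    // Events wait here, ordered by timestamp, until a watermark releases them.
    private final PriorityQueue<Event> buffer =
            new PriorityQueue<>(Comparator.comparingLong((Event e) -> e.timestamp));

    /** Buffers an event that may have arrived out of order. */
    void onEvent(Event event) {
        buffer.add(event);
    }

    /**
     * Releases, in timestamp order, every buffered event whose timestamp is
     * less than or equal to the watermark (assumed semantics).
     */
    List<Event> onWatermark(long watermark) {
        List<Event> ready = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek().timestamp <= watermark) {
            ready.add(buffer.poll());
        }
        return ready;
    }

    /** Joins the names of the given events with single spaces. */
    static String names(List<Event> events) {
        StringBuilder sb = new StringBuilder();
        for (Event e : events) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.name);
        }
        return sb.toString();
    }
}
```

With the arrival order a(t=1), c(t=10), b(t=2) from the message, a watermark of 5 releases a and then b, so a pattern such as A next B sees them as adjacent; c stays buffered until a watermark of at least 10 arrives.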

Re: Cannot restore from savepoint after adding a sink operator

2017-08-04 Thread Stefan Richter
Hi, in Flink 1.2.x the restore will not succeed because it was mapping states on a task level, not at the operator level. This makes it impossible to add stateful operators somewhere to an operator chain, because Flink could not figure out which state belongs to which operator after such a modi

Re: State Backend

2017-08-04 Thread Stefan Richter
Hi, if the question is whether there are certain requirements for the filesystem that you use with the state backends, then I think there might be a small misconception. Currently, all state backends in Flink operate local to the task, i.e. either in memory (e.g. FsStateBackend) or also on the loc

Re: Akka Quarantine & Old YARN Versions

2017-08-04 Thread Nico Kruber
Hi Konstantin, I just checked the code and the configuration option is still there and should be working. Somehow, the backport for the 1.2 release branch did contain the documentation while the actual commit on master did not. Thanks for the info, let me create a hotfix to fix that. Nico On T

Re: Akka Quarantine & Old YARN Versions

2017-08-04 Thread Aljoscha Krettek
Hi Konstantin, If you can at all wait, I would suggest skipping the update to 1.3.1 and going directly to (the not yet released) 1.3.2. Flink 1.3.0 and 1.3.1 had a few critical bugs that are now fixed. Most notably, there was a problem in the Kafka consumer that could lead to state corruption/data du

Re: Flink -mesos-app master hang

2017-08-04 Thread Till Rohrmann
Hi Biswajit, are there any Mesos logs which might help us pinpoint the problem? I've actually never run Flink on Mesos with Docker images. But it could be that Flink does not set things up properly for running Docker images. I'll try to run Flink based on Docker images over the weekend in order

Re: Event-time and first watermark

2017-08-04 Thread Aljoscha Krettek
Hi, How are you defining the watermark, i.e. what kind of watermark extractor are you using? Best, Aljoscha > On 3. Aug 2017, at 17:45, Gwenhael Pasquiers > wrote: > > We're not using a Window but a more basic ProcessFunction to handle sessions. > We made this choice because we have to hand

Re: WaterMark & Eventwindow not fired correctly

2017-08-04 Thread Aljoscha Krettek
Hi, Could you please provide a snippet of code or some minimal example that would help us reproduce your problem? Best, Aljoscha > On 3. Aug 2017, at 17:41, aitozi wrote: > > > Hi, > > I have encountered a problem: I generate and assign watermarks on the > datastream, and then keyBy, a

k8s FileNotFoundException

2017-08-04 Thread Kaepke, Marc
Hi everyone, I always get a FileNotFoundException when following the Kubernetes setup guide [1]. I moved my jar and my input file onto the job manager pod. After that I join the job manager pod by using: kubectl exec -it - - /bin/bash With ls I can see both files. The WordCount example worked well.

Synchronized Kafka sources

2017-08-04 Thread Yunus Olgun
Hi, Is it possible to synchronize two Kafka sources, so that they consume from different Kafka topics at close enough event times? My use case is, I have two Kafka topics: A (very large) and B (large). There is a mapping of one to one or zero between A and B. Topology is simply join A and B in a tum

Null pointer exception while trying to serialize a protobuf message

2017-08-04 Thread Sridhar Chellappa
Folks, I wrote a custom data source to test my CEP logic. The custom data source looks like : public class CustomerDataSource extends RichParallelSourceFunction { private boolean running = true; private final Random random; public CustomerDataSource() { this.random = new Rand

Manually controlling time for integration test

2017-08-04 Thread Maksym Parkachov
Hi, I'm evaluating Flink as an alternative to Spark Streaming for a test project reading from Kafka and saving to Cassandra. Everything works, but I'm struggling with integration tests. I could not figure out how to manually move time in Flink. Basically, I write a message to Kafka with event time in the

Re: Flink Vs Google Cloud Dataflow?

2017-08-04 Thread Gábor Gévay
Hello, Have you seen these two blog posts? They explain the relationship between Apache Flink, Apache Beam, and Google Cloud Dataflow. https://data-artisans.com/blog/why-apache-beam https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective Best, Gábor On Mon, Jul 31,

Re: Null pointer exception while trying to serialize a protobuf message

2017-08-04 Thread Ted Yu
Can you show how CustomerMessage is defined ? Thanks On Fri, Aug 4, 2017 at 7:22 AM, Sridhar Chellappa wrote: > Folks, > > I wrote a custom Data source to test me CEP logic. The custom data source > looks like : > > public class CustomerDataSource extends RichParallelSourceFunction { > priv

RE: Event-time and first watermark

2017-08-04 Thread Gwenhael Pasquiers
We're using an AssignerWithPunctuatedWatermarks that extracts a timestamp from the data. It keeps the highest timestamp it has ever seen and returns a new Watermark every time that value grows. I know it's bad for performance, but for the moment that's not the issue, I want the most poss
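
The watermark logic described in this message can be sketched without any Flink dependency. The class `MaxTimestampWatermarkTracker` and its method `onElement` are hypothetical names made up for this illustration. In a real job the same check would sit inside the `checkAndGetNextWatermark` method of an `AssignerWithPunctuatedWatermarks`, which returns either a `Watermark` or `null`.

```java
// Hypothetical illustration only; this is not the poster's code and it does
// not use Flink's classes.
class MaxTimestampWatermarkTracker {

    // Highest timestamp observed so far; nothing has been seen initially.
    private long maxTimestamp = Long.MIN_VALUE;

    /**
     * Called once per element with its extracted timestamp. Returns the new
     * watermark value if the timestamp exceeds every timestamp seen so far,
     * or null if no new watermark should be emitted.
     */
    Long onElement(long extractedTimestamp) {
        if (extractedTimestamp > maxTimestamp) {
            maxTimestamp = extractedTimestamp;
            return extractedTimestamp;
        }
        return null;
    }
}
```

Late or duplicate timestamps produce no watermark in this sketch, so the emitted watermarks are strictly increasing. The cost noted in the message comes from potentially emitting one watermark per element when the input is already in order.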

Re: blob store defaults to /tmp and files get deleted

2017-08-04 Thread Shannon Carey
Stephan, Regarding your last reply to http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/blob-store-defaults-to-tmp-and-files-get-deleted-td11720.html You mention "Flink (via the user code class loader) actually holds a reference to the JAR files in "/tmp", so even if "/tmp" ge

Re: state inside functions

2017-08-04 Thread Fabian Hueske
Hi Peter, function objects (such as an instance of a class that extends MapFunction) that are used to construct a plan are serialized using Java serialization and shipped to the workers for execution. Therefore, function classes must be Serializable. In general it is recommended to configure funct

Re: Access Sliding window

2017-08-04 Thread Fabian Hueske
Hi Raj, you have to combine two streams. The first stream has the running avg + std-dev over the last 6 hours, the second stream has the 15 minute counts. Both streams emit one record every 15 minutes. What you want to do is to join the two records of both streams with the same timestamp. You do th

Re: blob store defaults to /tmp and files get deleted

2017-08-04 Thread Eron Wright
The directory referred to by `blob.storage.directory` is best described as a local cache. For recovery purposes the JARs are also stored in `high-availability.storageDir`. At least that's my reading of the code in 1.2. Maybe there's some YARN specific behavior too, sorry if this information

Re: Manually controlling time for integration test

2017-08-04 Thread Fabian Hueske
Hi Maxim, you could inject an AssignerWithPunctuatedWatermarks into your plan which emits a watermark for every record it sees. That way you can increment the logical time for every record. Best, Fabian 2017-08-04 16:27 GMT+02:00 Maksym Parkachov : > Hi, > > I'm evaluating Flink as alternative

Re: Proper way to establish bucket counts

2017-08-04 Thread Fabian Hueske
Hi Robert, That's right. The counts are at the per-operator level. I think you can get down to the task level, but counts per bucket are not tracked. Maybe Chesnay (in CC) can help here. He knows the metrics system best. @Chesnay, is there a way to expire metric counters? Alternatively, you cou

Re: Access Sliding window

2017-08-04 Thread Raj Kumar
Thanks Fabian. The incoming events have timestamps. Once I aggregate in the first stream to get counts and calculate the mean/standard deviation in the second, should the new timestamps be the window start time? How to tackle this issue? -- View this message in context: http://apache-flink-

Re: Access Sliding window

2017-08-04 Thread Fabian Hueske
TimeWindow.getStart() or TimeWindow.getEnd() -> https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/windows.html#incremental-window-aggregation-with-reducefunction 2017-08-04 22:43 GMT+02:00 Raj Kumar : > Thanks Fabian. > > The incoming events have the timestamps. Once I aggregate in

Re: Access Sliding window

2017-08-04 Thread Raj Kumar
Thanks Fabian. I do have one more question. When we connect the two streams and perform a co-process function, there are two separate methods, one for each stream. Which stream's state do we need to store, and will the co-process function trigger automatically once the other stream's data arrives, or should we set some tim

JMX stats reporter with all task manager/job manager stats aggregated?

2017-08-04 Thread Ajay Tripathy
Hi, I'm running flink jobmanagers/taskmanagers with yarn. I've turned on the JMX reporter in my flink-conf.yaml as follows: metrics.reporters: jmx metrics.reporter.jmx.class: org.apache.flink.metrics.jmx.JMXReporter I was wondering: Is there a JMX server with the aggregated stats across all jo

Re: Null pointer exception while trying to serialize a protobuf message

2017-08-04 Thread Ted Yu
I searched in Flink (and HBase) for GeneratedMessageV3 but didn't find any reference. Which version of protobuf did you use to generate the class? Please copy user@ in the future so that more people can help. On Fri, Aug 4, 2017 at 8:27 AM, Sridhar Chellappa wrote: > public final class Custom

Re: JMX stats reporter with all task manager/job manager stats aggregated?

2017-08-04 Thread Ajay Tripathy
Sorry: neglected to include the stack trace for JMX failing to instantiate from a taskmanager: 2017-08-05 00:59:09,388 INFO org.apache.flink.runtime.metrics.MetricRegistry - Configuring JMXReporter with {port=8789, class=org.apache.flink.metrics.jmx.JMXReporter}. 2017-08-05 00:59:09,4

Re: Flink -mesos-app master hang

2017-08-04 Thread Biswajit Das
Hi Till, Thank you for the reply. I have posted some logs with the initial email chain. I think the issue has more to do with the Docker private registry when there is authorization involved. I can run the job manager and task manager in Docker as separate Marathon tasks and connect via the RPC port. I wa