Flink typesafe configuration

2017-11-29 Thread Georg Heiler
Starting out with Flink from a Scala background, I would like to use Typesafe-style configuration like https://github.com/pureconfig/pureconfig; however, the best-practices page https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/best_practices.html recommends to set up:

Maintain heavy hitters in Flink application

2017-11-29 Thread m@xi
Hello everyone! I want to implement a streaming algorithm like Misra-Gries or Space Saving in Flink. The goal is to maintain the heavy hitters for my (possibly unbounded) input streams for as long as my app runs. More precisely, I want to have a non-stop running task that runs the Space
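For reference, the Misra-Gries summary mentioned in the question can be sketched in plain Scala with no Flink involved (a Flink version would keep the counters in keyed state). This is a sketch, not the poster's code: `k` bounds memory at `k - 1` counters, and any item occurring more than n/k times in a stream of length n is guaranteed to survive as an (under-counted) candidate.

```scala
import scala.collection.mutable

// Misra-Gries summary: maintains at most k-1 candidate heavy hitters.
// Counts are under-estimates; true heavy hitters (> n/k occurrences)
// are guaranteed to be among the surviving keys.
def misraGries[T](stream: Iterator[T], k: Int): Map[T, Long] = {
  val counters = mutable.Map.empty[T, Long]
  for (x <- stream) {
    if (counters.contains(x)) {
      counters(x) += 1                     // known candidate: bump its count
    } else if (counters.size < k - 1) {
      counters(x) = 1L                     // free slot: adopt as new candidate
    } else {
      // no slot free: decrement every counter, evicting those that hit zero
      for (key <- counters.keys.toList) {
        counters(key) -= 1
        if (counters(key) == 0L) counters.remove(key)
      }
    }
  }
  counters.toMap
}
```

In a Flink job the same update step would live in a stateful map/process function per key group, with the counter map held in managed state so it survives checkpoints.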

Issue with Checkpoint restore (Beam pipeline)

2017-11-29 Thread Jins George
Hi, I am running a Beam pipeline on Flink 1.2 and facing an issue in restoring a job from a checkpoint. If I modify my Beam pipeline to add a new operator and try to restore from the externalized checkpoint, I get the error java.lang.IllegalStateException: Invalid number of operator

Re: getting started with flink / scala

2017-11-29 Thread Georg Heiler
You would suggest https://github.com/ottogroup/flink-spector for unit tests? Georg Heiler wrote on Wed., Nov 29, 2017 at 22:33: > Thanks, this sounds like a good idea - can you recommend such a project? > > Jörn Franke wrote on Wed., Nov 29.

Re: getting started with flink / scala

2017-11-29 Thread Georg Heiler
Thanks, this sounds like a good idea - can you recommend such a project? Jörn Franke wrote on Wed., Nov 29, 2017 at 22:30: > If you want to really learn then I recommend you to start with a flink > project that contains unit tests and integration tests (maybe

Re: getting started with flink / scala

2017-11-29 Thread Jörn Franke
If you really want to learn, then I recommend starting with a Flink project that contains unit tests and integration tests (maybe augmented with https://wiki.apache.org/hadoop/HowToDevelopUnitTests to simulate an HDFS cluster during unit tests). It should also include coverage reporting.

getting started with flink / scala

2017-11-29 Thread Georg Heiler
Getting started with Flink / Scala, I wonder whether the Scala base library should be excluded as a best practice: https://github.com/tillrohrmann/flink-project/blob/master/build.sbt#L32 // exclude Scala library from assembly assemblyOption in assembly := (assemblyOption in
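For context, the snippet quoted above is the common sbt-assembly idiom for leaving the Scala runtime out of the fat jar; shown here as it typically appears (verify against your sbt-assembly version), since the Flink cluster already provides scala-library on its classpath:

```sbt
// sbt-assembly: exclude the Scala runtime from the fat jar; the Flink
// cluster classpath already ships a matching scala-library
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
```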

Re: Dataset using several count operator in the same environment

2017-11-29 Thread Timo Walther
Hi Ebru, the count() operator is a very simple utility function that calls execute() internally. If you want a more complex pipeline, you can take a look at how our WordCount [0] example works. The general concept is to emit a 1 for every record and sum the ones in parallel. If you
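The emit-a-1-and-sum pattern described above can be illustrated without Flink, using plain Scala collections; the Flink WordCount distributes the same logic with `map`/`keyBy`/`sum` over a DataSet. A minimal sketch:

```scala
// Word count via the emit-(word, 1)-then-sum pattern, on plain
// Scala collections instead of a Flink DataSet.
def wordCount(lines: Seq[String]): Map[String, Int] =
  lines
    .flatMap(_.toLowerCase.split("\\W+")) // tokenize
    .filter(_.nonEmpty)
    .map(w => (w, 1))                     // emit a 1 per word
    .groupBy(_._1)
    .map { case (w, ones) => (w, ones.map(_._2).sum) } // sum the ones
```

In Flink, this shape lets you express arbitrary aggregations in one pipeline and call execute() once at the end, instead of relying on count(), which triggers its own execution.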

Re: Classpath for execution of KafkaSerializer/Deserializer; java.lang.NoClassDefFoundError while class in job jar

2017-11-29 Thread Chesnay Schepler
This issue sounds strikingly similar to FLINK-6965. TL;DR: You must place classes loaded during serialization by the Kafka connector under /lib. On 29.11.2017 16:15, Timo Walther wrote: Hi Bart, usually, this error means that your Maven project configuration is not correct. Is your custom

Re: Classpath for execution of KafkaSerializer/Deserializer; java.lang.NoClassDefFoundError while class in job jar

2017-11-29 Thread Timo Walther
Hi Bart, usually, this error means that your Maven project configuration is not correct. Is your custom class included in the jar file that you submit to the cluster? It might make sense to share your pom.xml with us. Regards, Timo On 11/29/17 at 2:44 PM, Bart Kastermans wrote: I have

Re: Taskmanagers are quarantined

2017-11-29 Thread Stephan Ewen
We also saw issues in the failure detection/quarantining with some Hadoop versions because of a subtle runtime Netty version conflict. Flink 1.4 shades Flink's / Akka's Netty; in Flink 1.3 you may need to explicitly exclude the Netty dependency pulled in through Hadoop. Also, Hadoop version
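A sketch of the kind of Maven exclusion described above, assuming the conflicting artifact is the Netty 3.x coordinate (`io.netty:netty`) pulled in through a `hadoop-client` dependency; the exact group/artifact ids to exclude depend on your Hadoop version, so check `mvn dependency:tree` first:

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <!-- keep Hadoop's Netty off the classpath so it cannot clash with
         the Netty used by Flink/Akka (relevant for Flink 1.3 and earlier) -->
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```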

Dataset using several count operator in the same environment

2017-11-29 Thread ebru
Hi all, We are trying to use more than one count operator on a dataset, but it executes the first count and skips the other operations. We also call env.execute(). How can we solve this problem? -Ebru

Classpath for execution of KafkaSerializer/Deserializer; java.lang.NoClassDefFoundError while class in job jar

2017-11-29 Thread Bart Kastermans
I have a custom serializer for writing/reading from Kafka. I am setting this up in main with code as follows: val kafkaConsumerProps = new Properties() kafkaConsumerProps.setProperty("bootstrap.servers", kafka_bootstrap)
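A minimal, self-contained sketch of how such consumer properties are typically assembled; the broker address and group id below are placeholders, not values from the original mail:

```scala
import java.util.Properties

// Build the Properties object that is later handed to the Flink Kafka
// consumer; only java.util.Properties from the standard library is used.
def kafkaConsumerProps(bootstrap: String, groupId: String): Properties = {
  val props = new Properties()
  props.setProperty("bootstrap.servers", bootstrap) // Kafka broker list
  props.setProperty("group.id", groupId)            // consumer group id
  props
}
```

Note that, per the FLINK-6965 discussion in this thread, any custom (de)serializer classes referenced here must also be visible to the cluster classpath (e.g. under /lib), not only inside the job jar.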

Re: Taskmanagers are quarantined

2017-11-29 Thread Till Rohrmann
Hi, you could also try increasing the heartbeat timeout via `akka.watch.heartbeat.pause`. Maybe this helps to overcome the GC pauses. Cheers, Till On Wed, Nov 29, 2017 at 12:41 PM, T Obi wrote: > Warnings of Datanode appeared not in all cases of timeout. They seem > to be
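For reference, the setting mentioned above goes into flink-conf.yaml; the value below is illustrative only, so pick something comfortably larger than the GC pauses you actually observe:

```yaml
# flink-conf.yaml: allow a longer gap between Akka DeathWatch heartbeats
# before a TaskManager is quarantined (check your version's default)
akka.watch.heartbeat.pause: 120 s
```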

FlinkKafkaProducerXX

2017-11-29 Thread Mikhail Pryakhin
Hi all, I've just come across a FlinkKafkaProducer misconfiguration issue: when a FlinkKafkaProducer is created without specifying a Kafka partitioner, a FlinkFixedPartitioner instance is used and all messages end up in a single Kafka partition (in case I have a single task

Re: Taskmanagers are quarantined

2017-11-29 Thread T Obi
The Datanode warnings did not appear in all timeout cases; they seem to be raised only by timeouts while snapshotting. We output GC logs on the taskmanagers and found that something kicks System.gc() every hour. So a full GC runs every hour, and it takes about a minute or more in our case... When

Re: Question about Timestamp in Flink SQL

2017-11-29 Thread Timo Walther
Hi Wangsan, I opened an issue to document the behavior properly in the future (https://issues.apache.org/jira/browse/FLINK-8169). Basically, both your event-time and processing-time timestamps should be GMT. We plan to support offsets for windows in the future

Re: Are there plans to support Hadoop 2.9.0 on near future?

2017-11-29 Thread Kostas Kloudas
Hi Oriol, This estimation is not accurate and the whole plan is a bit outdated. It was based on a time-based release model that the community tried but without the expected results, so we changed it. You can follow the release voting for 1.4 on the dev mailing list. And the

Re: Are there plans to support Hadoop 2.9.0 on near future?

2017-11-29 Thread ORIOL LOPEZ SANCHEZ
Thanks, it helped a lot! But I've seen on https://cwiki.apache.org/confluence/display/FLINK/Flink+Release+and+Feature+Plan#FlinkReleaseandFeaturePlan-Flink1.4 that they estimated releasing 1.4 in September. Do you know if it will be released this year, or whether we may have to wait longer? Thanks a

Re: user driven stream processing

2017-11-29 Thread Fabian Hueske
Another example is King's RBEA platform [1], which was built on Flink. In a nutshell, RBEA runs a single large Flink job to which users can add queries that should be computed. Of course, the query language is restricted because the queries must match the structure of the running job. Hope

Re: How to perform efficient DataSet reuse between iterations

2017-11-29 Thread Fabian Hueske
The monitoring REST interface provides detailed stats about a job, its tasks, and processing vertices, including their start and end time [1]. However, it is not trivial to make sense of the execution times because Flink uses pipelined shuffles by default. That means that the execution of

Re: Are there plans to support Hadoop 2.9.0 on near future?

2017-11-29 Thread Kostas Kloudas
Hi Oriol, As you may have seen from the mailing list, we are currently in the process of releasing Flink 1.4. This is going to be a Hadoop-free distribution, which means that it should work with any Hadoop version, including Hadoop 2.9.0. Given this, I would recommend trying out the release

Re: Question about Timestamp in Flink SQL

2017-11-29 Thread wangsan
Hi Timo, What I am doing is extracting a timestamp field (which may be a string such as “2017-11-28 11:00:00” or a long value based on my current timezone) as the event-time attribute. So in timestampAndWatermarkAssigner, for the string format I should parse the date-time string using GMT, and for long

Re: Question about Timestamp in Flink SQL

2017-11-29 Thread Timo Walther
Hi Wangsan, currently the timestamps in Flink SQL do not depend on a timezone. All calculations happen on the UTC timestamp. This also guarantees that an input with Timestamp.valueOf("XXX") remains consistent when parsing and outputting it with toString(). Regards, Timo On 11/29/17 at 3:43
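The advice in this thread, to treat both parsing and output as UTC, can be sketched with java.time; the pattern string is an assumption matching the "2017-11-28 11:00:00" example earlier in the thread:

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

// Parse "yyyy-MM-dd HH:mm:ss" strings as UTC instants, mirroring
// Flink SQL's timezone-free (UTC-based) timestamp semantics.
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

def toEpochMillisUtc(s: String): Long =
  LocalDateTime.parse(s, fmt).toInstant(ZoneOffset.UTC).toEpochMilli
```

The resulting epoch millis would then be returned from the timestamp/watermark assigner, so that event-time windows line up with the UTC interpretation Flink SQL applies internally.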

Re: S3 Access in eu-central-1

2017-11-29 Thread Ufuk Celebi
Hey Dominik, yes, we should definitely add this to the docs. @Nico: You recently updated the Flink S3 setup docs. Would you mind adding these hints for eu-central-1 from Steve? I think that would be super helpful! Best, Ufuk On Tue, Nov 28, 2017 at 10:00 PM, Dominik Bruhn