Cannot load user classes

2017-12-12 Thread Soheil Pourbafrani
Hey, I wrote a code using Flink and creating fat jar using maven, I can errorlessly run it on a remote cluster. Trying to run it without creating a fat jar and directly from IDE I got the error Cannot load user class for not Flink core classes. For example `Cannot load user class: org.apache.flink.

Re: Calling an operator serially

2017-12-12 Thread Fabian Hueske
Hi, you are right. The purpose of a KeyedStream is to process all events/records with the same key by the same operator task (which runs in a single thread). The operator itself can have a greater parallelism, such that different keys are processed by different tasks. Best, Fabian 2017-12-13 1:0

Re: when does the timed window ends?

2017-12-12 Thread Jinhua Luo
Unless I generate event-time watermark continuously regardless of elements? Just like the doc does, it gives an example how to generate continuous watermark based on processing time (TimeLagWatermarkGenerator): https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/event_timestamps_waterm

Re: when does the timed window ends?

2017-12-12 Thread Jinhua Luo
If the window contains only one element, no more elements come in, then by default (with EventTimeTrigger), the window would be fired by next element if that element advances watermark which passes the end of the window, correct? That is, even if the window ends at 12:30, then if no more element co

Could not flush and close the file system output stream to s3a, is this fixed?

2017-12-12 Thread Hao Sun
https://issues.apache.org/jira/browse/FLINK-7590 I have a similar situation with Flink 1.3.2 on K8S = 2017-12-13 00:57:12,403 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermar

Re: [ANNOUNCE] Apache Flink 1.4.0 released

2017-12-12 Thread Vishal Santoshi
Awesome job folks, Congrats. A query as we shift to 1.4 are the flink_cep/flink-connector-kafka-0.11 for the 1.4 out on maven central ? Regards and congrats again. On Tue, Dec 12, 2017 at 9:44 AM, Hao Sun wrote: > Congratulations! Awesome work. > Two quick questions about the HDFS free featu

Re: hadoop error with flink mesos on startup

2017-12-12 Thread Eron Wright
Thanks for investigating this, Jared. I would summarize it as Flink-on-Mesos cannot be used in Hadoop-free mode in Flink 1.4.0. I filed an improvement bug to support this scenario: FLINK-8247 On Tue, Dec 12, 2017 at 11:46 AM, Jared Stehler < jared.steh...@intellifylearning.com> wrote: > I had

Re: what's the best practice to determine event-time watermark for unordered stream?

2017-12-12 Thread Eron Wright
A couple of other details worth mentioning: 1. Your choice of connector matters a lot. Some connectors provide limited ordering guarantees, especially across keys, which leads to a highly disordered stream of events (with respect to event time) from the perspective of the watermark generator.

Re: Calling an operator serially

2017-12-12 Thread Philip Doctor
I guess my logs suggest this is simply what a KeyedStream does by default, I guess I was just trying to find a doc that said that rather than relying on my logs. From: Philip Doctor Date: Tuesday, December 12, 2017 at 5:50 PM To: "user@flink.apache.org" Subject: Calling an operator serially I

Calling an operator serially

2017-12-12 Thread Philip Doctor
I’ve got a KeyedStream, I only want max parallelism (1) per key in the keyed stream (i.e. if key is OrganizationFoo then only 1 input at a time from OrganizationFoo is processed by this operator). I feel like this is obvious somehow, but I’m struggling to find the docs for this. Can anyone poi

Re: hadoop error with flink mesos on startup

2017-12-12 Thread Jared Stehler
I had been excluding all transitive dependencies from the lib dir; it seems to be working when I added the following deps: commons-configuration commons-configuration 1.7 commons-lang commons-lang 2.6 -- Jared Stehler Chief Architect - In

Re: Flink-Kafka connector - partition offsets for a given timestamp?

2017-12-12 Thread Yang, Connie
Thanks, Gordan! Will keep an eye on that! Connie From: "Tzu-Li (Gordon) Tai" Date: Monday, December 11, 2017 at 5:29 PM To: Connie Yang Cc: "user@flink.apache.org" Subject: Re: Flink-Kafka connector - partition offsets for a given timestamp? Hi Connie, We do have a pull request for the feat

Re: hadoop error with flink mesos on startup

2017-12-12 Thread Jared Stehler
The class is there; this issue is a static initializer error, probably from other missing classes. I’ll try using the uber jar to see if that helps any, and will report back. I’ve included the shaded jar as a maven dependency: org.apache.flink flink-shaded-hadoop2 ${flink

Re: hadoop error with flink mesos on startup

2017-12-12 Thread Chesnay Schepler
Could you look into the flink-shaded-hadoop jar to check whether the missing class is actually contained? Where did the flink-shaded-hadoop jar come from? I'm asking because when building flink-dist from source the jar is called flink-shaded-hadoop2-uber-1.4.0.jar, which does indeed contain th

hadoop error with flink mesos on startup

2017-12-12 Thread Jared Stehler
After upgrading to flink 1.4.0 using the hadoop-free build option, I’m seeing the following error on startup in the app master: 2017-12-12 18:23:15.473 [main] ERROR o.a.f.m.r.clusterframework.MesosApplicationMasterRunner - Mesos JobManager initialization failed

Flink Kafka Producer Exception

2017-12-12 Thread Navneeth Krishnan
Hi, I have a kafka source and sink in my pipeline and when I start my job I get this error and the job goes to failed state. I checked the kafka node and everything looks good. Any suggestion on what is happening here? Thanks. java.lang.Exception: Failed to send data to Kafka: The server disconne

substantial realistic and idiomatic example applications

2017-12-12 Thread Derek VerLee
We are new to working with Flink and now that we have some basics down, we are looking for some codebases for Flink applications of real-world complexity and size, that could additionally be considered idiomatic and generally good code. Can anyone rec

Re: what's the best practice to determine event-time watermark for unordered stream?

2017-12-12 Thread Vishal Santoshi
To add to Fabian's comment, What can be done ( and that may not be the norm ) is keep a 95-99% quantile ( using an Approximate Histogram such that the execution is not heavy ) of the diff between server ( or ingestion time ) and event time and use it as a max out of order ness.

Re: [ANNOUNCE] Apache Flink 1.4.0 released

2017-12-12 Thread Hao Sun
Congratulations! Awesome work. Two quick questions about the HDFS free feature. I am using S3 to store checkpoints, savepoints, and I know it is being done through hadoop-aws. - Do I have to include a hadoop-aws jar in my flatjar AND flink's lib directory to make it work for 1.4? Both or just the

Re: Off heap memory issue

2017-12-12 Thread Javier Lopez
Hi Piotr, We found out which one was the problem in the workers. After setting a value for XX:MaxMetaspaceSize we started to get OOM exceptions from the metaspace. We found out how Flink manages the User classes here https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/debugging_

Re: Kafka topic partition skewness causes watermark not being emitted

2017-12-12 Thread gerardg
I'm also affected by this behavior. There are no updates in FLINK-5479 but did you manage to find a way to workaround this? Thanks, Gerard -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: what's the best practice to determine event-time watermark for unordered stream?

2017-12-12 Thread Fabian Hueske
As I said before, you can solve that with a custom WatermarkAssigner. Collect a histogram, take the median out of X samples, ignore outliers, etc. 2017-12-12 13:37 GMT+01:00 Jinhua Luo : > Think about we have a normal ordered stream, if an abnormal event A > appears and thus advances the watermar

Re: [ANNOUNCE] Apache Flink 1.4.0 released

2017-12-12 Thread Flavio Pompermaier
Thanks Aljoscha! Just one question: is there any upgrade guideline? Or is the upgrade from 1.3.1 to 1.4 almost frictionless? On Tue, Dec 12, 2017 at 1:39 PM, Fabian Hueske wrote: > Thank you Aljoscha for managing the release! > > 2017-12-12 12:46 GMT+01:00 Aljoscha Krettek : > >> The Apache Flin

Re: [ANNOUNCE] Apache Flink 1.4.0 released

2017-12-12 Thread Fabian Hueske
Thank you Aljoscha for managing the release! 2017-12-12 12:46 GMT+01:00 Aljoscha Krettek : > The Apache Flink community is very happy to announce the release of Apache > Flink 1.4.0. > > Apache Flink® is an open-source stream processing framework for > distributed, high-performing, always-availab

Re: what's the best practice to determine event-time watermark for unordered stream?

2017-12-12 Thread Jinhua Luo
Think about we have a normal ordered stream, if an abnormal event A appears and thus advances the watermark, making all subsequent normal events (earlier than A) late, I think it's a mistake. The ways you listed cannot help this mistake. The normal events cannot be dropped, and the lateness may be

[ANNOUNCE] Apache Flink 1.4.0 released

2017-12-12 Thread Aljoscha Krettek
The Apache Flink community is very happy to announce the release of Apache Flink 1.4.0. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The release is available for download at: https:

Re: what's the best practice to determine event-time watermark for unordered stream?

2017-12-12 Thread Fabian Hueske
Early events are usually not an issue because the can be kept in state until they are ready to be processed. Also, depending on the watermark assigner often push the watermark ahead such that they are not early but all other events are late. Handling of late events depends on your use case and the

Re: what's the best practice to determine event-time watermark for unordered stream?

2017-12-12 Thread Jinhua Luo
Yes, I know flink is flexible. But I am thinking when the event sequence is mess (e,g, branches of time-series events interleaved, but each branch has completely different time periods), then it's hard to apply them into streaming api, because no matter which way you generate watermark, the waterm

Re: what's the best practice to determine event-time watermark for unordered stream?

2017-12-12 Thread Fabian Hueske
Hi, this depends on how you generate watermarks [1]. You could generate watermarks with a four hour delay and be fine (at the cost of a four hour latency) or have some checks that you don't increment a watermark by more than x minutes at a time. These considerations are quite use case specific, so

Re: Testing CoFlatMap correctness

2017-12-12 Thread Fabian Hueske
Hi Tovi, testing the behavior of a data flow with respect to the order of records from different sources is tricky. Source functions are working independently of each other and it is not easily possible to control the order in which records is shipped (and received) across source functions. You c

what's the best practice to determine event-time watermark for unordered stream?

2017-12-12 Thread Jinhua Luo
Hi All, The watermark is monotonous incremental in a stream, correct? Given a stream out-of-order extremely, e.g. e4(12:04:33) --> e3 (15:00:22) --> e2(12:04:21) --> e1 (12:03:01) Here e1 appears first, so watermark start from 12:03:01, so e3 is an early event, it would be placed in another wind

Re: when does the timed window ends?

2017-12-12 Thread Fabian Hueske
No, that's exactly what is mean by "a window is created when the first element arrives". Otherwise, you'd have to fire empty windows for all possible keys (in case of a window operator on a keyed stream) which is obviously not possible. 2017-12-12 9:30 GMT+01:00 Jinhua Luo : > OK, I see. > > But

Re: when does the timed window ends?

2017-12-12 Thread Jinhua Luo
OK, I see. But what if a window contains no elements? Is it still get fired and invoke the window function? 2017-12-12 15:42 GMT+08:00 Fabian Hueske : > Hi, > > this depends on the window type. Tumbling and Sliding Windows are (by > default) aligned with the epoch time (1970-01-01 00:00:00). > Fo