Re: Preserve accumulators after failure in DataStream API

2019-05-02 Thread Fabian Hueske
Hi Wouter, The DataStream API accumulators of the AggregateFunction [1] are stored in state and should be recovered in case of a failure as well. If this does not work, it would be a serious bug. What's the type of your accumulator? Can you maybe share the code? How do you apply the AggregateFunc
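The AggregateFunction mentioned above keeps its accumulator in managed state, which is why it survives recovery. As a self-contained illustration (not using the real Flink classes), here is a minimal mirror of the interface's shape with a sum/count accumulator for a running average; the `Mini*` names are hypothetical stand-ins for `org.apache.flink.api.common.functions.AggregateFunction`:

```java
// Hypothetical, simplified mirror of Flink's AggregateFunction contract.
// In Flink, the ACC value is held in managed state and restored on failure.
interface MiniAggregateFunction<IN, ACC, OUT> {
    ACC createAccumulator();
    ACC add(IN value, ACC acc);
    OUT getResult(ACC acc);
    ACC merge(ACC a, ACC b);
}

// Example accumulator type: a (sum, count) pair for a running average.
class AvgAcc {
    long sum;
    long count;
}

class AverageAggregate implements MiniAggregateFunction<Long, AvgAcc, Double> {
    public AvgAcc createAccumulator() { return new AvgAcc(); }
    public AvgAcc add(Long value, AvgAcc acc) {
        acc.sum += value;
        acc.count += 1;
        return acc;
    }
    public Double getResult(AvgAcc acc) { return (double) acc.sum / acc.count; }
    public AvgAcc merge(AvgAcc a, AvgAcc b) {
        a.sum += b.sum;
        a.count += b.count;
        return a;
    }
}
```

Because the accumulator is a plain value object, Flink can checkpoint it like any other state, which is the property Fabian is describing.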

Re: Flink dashboard+ program arguments

2019-05-02 Thread Fabian Hueske
Hi, The SQL client can be started with > ./bin/sql-client.sh embedded Best, Fabian On Tue., 30 Apr 2019 at 20:13, Rad Rad wrote: > Thanks, Fabian. > > The problem was incorrect java path. Now, everything works fine. > > I would ask about the command for running sql-client.sh > > These

Re: Filter push-down not working for a custom BatchTableSource

2019-05-02 Thread Fabian Hueske
Hi Josh, Does your TableSource also implement ProjectableTableSource? If yes, you need to make sure that the filter information is also forwarded if ProjectableTableSource.projectFields() is called after FilterableTableSource.applyPredicate(). Also make sure to correctly implement FilterableTableS
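The pitfall Fabian describes is that `projectFields()` returns a new table source copy, and if that copy is built without the filters that `applyPredicate()` already pushed down, the filter silently disappears. A self-contained sketch of the pattern (hypothetical simplified types, not the real Flink interfaces, which take `Expression` lists):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror of a TableSource implementing both
// FilterableTableSource and ProjectableTableSource. The key point: every
// copy-returning method must carry the already-pushed-down filters forward.
class MyTableSource {
    final List<String> pushedFilters;   // predicates already pushed down
    final int[] projectedFields;        // null means "all fields"

    MyTableSource(List<String> filters, int[] fields) {
        this.pushedFilters = filters;
        this.projectedFields = fields;
    }

    // Mirrors FilterableTableSource.applyPredicate: keep the projection.
    MyTableSource applyPredicate(List<String> predicates) {
        List<String> all = new ArrayList<>(pushedFilters);
        all.addAll(predicates);
        return new MyTableSource(all, projectedFields);
    }

    // Mirrors ProjectableTableSource.projectFields: the easy-to-miss part
    // is forwarding pushedFilters into the projected copy.
    MyTableSource projectFields(int[] fields) {
        return new MyTableSource(pushedFilters, fields);
    }

    boolean isFilterPushedDown() {
        return !pushedFilters.isEmpty();
    }
}
```

If `projectFields` instead returned `new MyTableSource(new ArrayList<>(), fields)`, `isFilterPushedDown()` would flip back to false after projection, which matches the symptom in this thread.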

Re: Timestamp and key preservation over operators

2019-05-02 Thread Fabian Hueske
Hi Averell, The watermark of a stream is always the low watermark of all its input streams. If one of the input streams does not have watermarks, Flink does not compute a watermark for the merged stream. If you do not need time-based operations on streams 3 and 4, setting the watermark to MAX_WATE
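The rule Fabian states — a merged stream's watermark is the minimum over all input watermarks — also explains the `MAX_WATERMARK` trick: an input pinned at `Long.MAX_VALUE` can never be the minimum, so it never holds the merged watermark back. A tiny standalone sketch of that arithmetic (the real logic lives inside Flink's stream input processors):

```java
import java.util.Arrays;

// Sketch of the watermark rule from the email: an operator's current
// watermark is the minimum of the watermarks of all its input streams.
// An input pinned at Long.MAX_VALUE (Flink's Watermark.MAX_WATERMARK)
// therefore never holds the merged watermark back.
class WatermarkMerge {
    static long mergedWatermark(long[] inputWatermarks) {
        return Arrays.stream(inputWatermarks).min().orElse(Long.MIN_VALUE);
    }
}
```

Conversely, an input that never emits a watermark behaves like one stuck at `Long.MIN_VALUE`, which stalls the merged watermark entirely — the failure mode described above.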

Re: Preserve accumulators after failure in DataStream API

2019-05-02 Thread Wouter Zorgdrager
Hi Fabian, Maybe I should clarify a bit, actually I'm using a (Long)Counter registered as an Accumulator in the RuntimeContext [1]. So I'm using a KeyedProcessFunction, not an AggregateFunction. This works properly, but it is not retained after a job restart. I'm not entirely sure whether I did this correctly.

Re: Preserve accumulators after failure in DataStream API

2019-05-02 Thread Fabian Hueske
Hi Wouter, OK, that explains it :-) Overloaded terms... The Table API / SQL documentation refers to the accumulator of an AggregateFunction [1]. The accumulators that are accessible via the RuntimeContext are a rather old part of the API that is mainly intended for batch jobs. I would not use th

Re: Preserve accumulators after failure in DataStream API

2019-05-02 Thread Paul Lam
Hi Wouter, I've met the same issue and finally managed to use operator states to back the accumulators, so they can be restored after restarts. The downside is that we have to update the values in both accumulators and states to keep them consistent. FYI. Best, Paul Lam Fabian Hueske wrote on 2 May 2019
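The pattern Paul describes — mirroring the accumulator in operator state so it survives restarts — can be sketched without Flink. In the real API this would be a `CheckpointedFunction` whose `snapshotState`/`initializeState` copy the counter into and out of a `ListState`; here the state backend is mocked as a plain field:

```java
// Self-contained sketch of an accumulator backed by (operator) state.
// Real Flink hook: CheckpointedFunction#snapshotState / #initializeState;
// the 'checkpointed' field stands in for the persisted state entry.
class RestorableCounter {
    private long count;          // live accumulator value
    private long checkpointed;   // stands in for the operator-state entry

    void add(long n) { count += n; }

    // Called on checkpoint: copy the live value into state.
    void snapshotState() { checkpointed = count; }

    // Called on (re)start: initialize the live value from state.
    void initializeState() { count = checkpointed; }

    long get() { return count; }
}
```

The downside Paul mentions is visible in the sketch: every update path has to keep both the live value and the state in sync, and anything added after the last snapshot is lost on restore.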

Re: Preserve accumulators after failure in DataStream API

2019-05-02 Thread Wouter Zorgdrager
+1, especially if you don't want to rely on an external metric reporter, this is a nice feature. On Thu, 2 May 2019 at 10:29, Fabian Hueske wrote: > Hi, > > Both of you seem to have the same requirement. > This is a good indication that "fault-tolerant metrics" are a missing > feature. > It might mak

Re: Preserve accumulators after failure in DataStream API

2019-05-02 Thread Fabian Hueske
Hi, Both of you seem to have the same requirement. This is a good indication that "fault-tolerant metrics" are a missing feature. It might make sense to think about a built-in mechanism to back metrics with state. Cheers, Fabian On Thu., 2 May 2019 at 10:25, Paul Lam wrote: > Hi Wouter,

Re: HA lock nodes, Checkpoints, and JobGraphs after failure

2019-05-02 Thread Till Rohrmann
Thanks for the update Dyana. I'm also not an expert in running one's own ZooKeeper cluster. It might be related to properly setting up the ZooKeeper cluster. Maybe someone else from the community has experience with this. Therefore, I'm cross-posting this thread to the user ML again to have a wider

Re: Ask about running Flink sql-client.sh

2019-05-02 Thread Radhya Sahal
Thanks a lot. On Thu, May 2, 2019, 9:59 AM David Anderson wrote: > There are some step-by-step instructions for setting up the sql client in > https://training.ververica.com/setup/sqlClient.html, plus some examples. >

Re: Ask about running Flink sql-client.sh

2019-05-02 Thread David Anderson
There are some step-by-step instructions for setting up the sql client in https://training.ververica.com/setup/sqlClient.html, plus some examples.

Re: Ask about Running Flink Jobs From Eclipse

2019-05-02 Thread David Anderson
When you run a Flink job from within an IDE, you end up running with a LocalStreamEnvironment (rather than a remote cluster), which by default does not provide the Web UI. If you want the Flink instance running in the IDE to have its own dashboard, you can enable it by adding the following to your application: Conf
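The preview is cut off at "Conf", which presumably starts a `Configuration`. A sketch of the usual way to do this in the Flink 1.8-era API (assuming the `flink-runtime-web` dependency is on the classpath; the port choice is illustrative):

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.RestOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Sketch: start a local environment that also serves the web dashboard.
// Requires flink-runtime-web on the classpath; without it the UI is absent.
Configuration conf = new Configuration();
conf.setInteger(RestOptions.PORT, 8081); // dashboard port (assumption)
StreamExecutionEnvironment env =
        StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);
```

This is a configuration fragment, not a runnable standalone program; the job topology and `env.execute()` call follow as usual.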

ClassNotFoundException on remote cluster

2019-05-02 Thread Abhishek Jain
Hi, I'm running into ClassNotFoundException only when I run my application on a standalone cluster (using flink cli). If i directly run the main class in my IDE, it's working fine. I've copied the stacktrace from flink standalone session logs here

Re: configuration of standalone cluster

2019-05-02 Thread Chesnay Schepler
Which java version are you using? On 01/05/2019 21:31, Günter Hipler wrote: Hi, For the first time I'm trying to set up a standalone cluster. My current configuration 4 server (1 jobmanger and 3 taskmanager) a) starting the cluster swissbib@sb-ust1:/swissbib_index/apps/flink/bin$ ./start-clu

Re: Type Erasure - Usage of class Tuple as a type is not allowed. Use a concrete subclass (e.g. Tuple1, Tuple2, etc.

2019-05-02 Thread Chesnay Schepler
I'm not sure what you're asking. If you have a deserialization schema that converts the data into a Map, then as I understand it you're done; what do you believe to be missing? If, for a given job, the number/types of fields are fixed, you could look into using Row. On 01/05/2019 22:40, Vijay Balak

Re: ClassNotFoundException on remote cluster

2019-05-02 Thread Chesnay Schepler
How are you packaging the jar that you submit? Specifically, are you ensuring that all your classes are actually contained within? On 02/05/2019 13:38, Abhishek Jain wrote: Hi, I'm running into ClassNotFoundException only when I run my application on a standalone cluster (using flink cli). If

Re: configuration of standalone cluster

2019-05-02 Thread Günter Hipler
swissbib@sb-ust1:~$ java -version openjdk version "11.0.2" 2019-01-15 OpenJDK Runtime Environment (build 11.0.2+9-Ubuntu-3ubuntu118.04.3) OpenJDK 64-Bit Server VM (build 11.0.2+9-Ubuntu-3ubuntu118.04.3, mixed mode, sharing) swissbib@sb-ust1:~$ Is version 8 more appropriate? Günter On 02.05.1

Re: configuration of standalone cluster

2019-05-02 Thread Abhishek Jain
Java version: "1.8.0_112" Java(TM) SE Runtime Environment (build 1.8.0_112-b15) Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed mode) On Thu, 2 May 2019 at 17:18, Chesnay Schepler wrote: > Which java version are you using? > > On 01/05/2019 21:31, Günter Hipler wrote: > > Hi, > > > >

Re: ClassNotFoundException on remote cluster

2019-05-02 Thread Abhishek Jain
This is a spring boot app that I've packaged using maven (Apache Maven 3.3.9). I've verified the class is present in the jar as well. On Thu, 2 May 2019 at 17:25, Chesnay Schepler wrote: > How are you packaging the jar that you submit? Specifically, are you > ensuring that all your classes are a

Re: configuration of standalone cluster

2019-05-02 Thread Chesnay Schepler
Flink still only works with Java 8 at the moment. It will be a while until we properly support Java 11. On 02/05/2019 13:58, Günter Hipler wrote: swissbib@sb-ust1:~$ java -version openjdk version "11.0.2" 2019-01-15 OpenJDK Runtime Environment (build 11.0.2+9-Ubuntu-3ubuntu118.04.3) OpenJDK 64-

Re: Can't build Flink for Scala 2.12

2019-05-02 Thread Chesnay Schepler
You can monitor https://issues.apache.org/jira/browse/FLINK-12392 for the compile issue. On 01/05/2019 22:05, Chesnay Schepler wrote: You are correct, that is a typo. Very well done for spotting it, will fix it right away. We can conclude that the current SNAPSHOT version does not build with

Re: ClassNotFoundException on remote cluster

2019-05-02 Thread Abhishek Jain
Also, the `execute()` call happens inside a `MyWikiAnalysis` Spring-managed bean on PostConstruct, but I don't think that should cause any issue. Any idea? Let me know if you need more info on my environment. On Thu, 2 May 2019 at 17:32, Abhishek Jain wrote: > This is a spring boot app that I've p

Re: Re: configuration of standalone cluster

2019-05-02 Thread guenterh.li...@bluewin.ch
Thanks a lot for the hint - this seems to solve the problem openjdk version "1.8.0_191" OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-2ubuntu0.18.04.1-b12) OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode) 2019-05-02 15:17:44,109 INFO org.apache.flink.runtime.taskexecutor.TaskExe

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-05-02 Thread an0
This explanation is exactly what I'm looking for, thanks! Is such an important rule documented anywhere in the official documentation? On 2019/04/30 08:47:29, Fabian Hueske wrote: > An operator task broadcasts its current watermark to all downstream tasks > that might receive its records. > If you h

Re: Exceptions when launching counts on a Flink DataSet concurrently

2019-05-02 Thread Juan Rodríguez Hortalá
Thanks for your answer Fabian. In my opinion this is not just a possible new feature for an optimization, but a bigger problem, because the client program crashes with an exception when concurrent counts or collects are triggered on the same data set, and this also happens non-deterministically dep

Re: Filter push-down not working for a custom BatchTableSource

2019-05-02 Thread Josh Bradt
Hi Fabian, Thanks for your reply. My custom table source does not implement ProjectableTableSource. I believe that isFilterPushedDown is implemented correctly since it's nearly identical to what's written in the OrcTableSource. I pasted a slightly simplified version of the implementation below. If

Re: Write batch function for apache flink

2019-05-02 Thread anurag
Thanks Averell, let me try this out and get back to you. On Sat, Apr 27, 2019 at 11:44 PM Averell wrote: > Hi Anurag, > > Something like this one: > https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/batch/ > Is it what you are looking for? > > Regards, > Averell > > > > > -- > Sent

DateTimeBucketAssigner using Element Timestamp

2019-05-02 Thread Peter Groesbeck
Hi all, I have an application that reads from various Kafka topics and writes parquet files to corresponding buckets on S3 using StreamingFileSink with DateTimeBucketAssigner. The upstream application that writes to Kafka also writes records as gzipped json files to date bucketed locations on S3 a
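Flink's `DateTimeBucketAssigner` derives the bucket id from processing time, so matching the upstream application's date-bucketed layout generally means writing a custom `BucketAssigner` that uses the element's own timestamp. The timestamp-to-bucket-id derivation can be sketched standalone (in a real assigner this logic would live in `BucketAssigner#getBucketId(element, context)`; `"yyyy-MM-dd--HH"` is `DateTimeBucketAssigner`'s default format, and UTC is an assumption):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Sketch of the bucket-id derivation a custom BucketAssigner would do if
// it used the element's event timestamp instead of processing time.
class EventTimeBucketId {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd--HH").withZone(ZoneOffset.UTC);

    static String bucketFor(long eventTimestampMillis) {
        return FMT.format(Instant.ofEpochMilli(eventTimestampMillis));
    }
}
```

The custom assigner would pull `eventTimestampMillis` from a field of the deserialized record (or from the record's assigned event timestamp) rather than from the wall clock.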

Re: Timestamp and key preservation over operators

2019-05-02 Thread Averell
Thank you Fabian. I have one more question about timestamps: In the previous email, you asked how I checked the timestamp - I don't have an answer. At that time I only checked the watermark, not the timestamp. I had the (wrong) assumption that watermarks advance along with timestamps. Today I played with