Programmatic creation of YARN sessions and deployment (running) Flink jobs on it.

2018-03-26 Thread kedar mhaswade
Typically, when one wants to run a Flink job on a Hadoop YARN installation, one creates a Yarn session (e.g. ./bin/yarn-session.sh -n 4 -qu test-yarn-queue) and runs intended Flink job(s) (e.g. ./bin/flink run -c MyFlinkApp -m job-manager-host:job-manager-port myapp.jar) on the Flink cluster whose

Re: Secure TLS/SSL ElasticSearch connector for current and future connector

2018-03-26 Thread Fritz Budiyanto
Hi Christophe, Thanks so much for the pointers. That helps. Looking at the latest update on https://issues.apache.org/jira/browse/FLINK-8101 , there was an issue related to HLR retry handling. If I read this correctly, there is a bug in ES/HLR

Re: Secure TLS/SSL ElasticSearch connector for current and future connector

2018-03-26 Thread Christophe Jolif
Hi Fritz, I think the High Level Rest Client implementation in this PR: https://github.com/apache/flink/pull/5374 should work. If you don't get the certificate properly available in your Java certs, you might want to redefine the createClient method to do something along those lines to get the con

Secure TLS/SSL ElasticSearch connector for current and future connector

2018-03-26 Thread Fritz Budiyanto
Hi All, Anyone know if Flink has TLS/SSL support for the current ES connector ? If yes, any sample configuration/code ? If not, would TLS/SSL be support in the upcoming ES connector using Java High Level client ? Thanks, Fritz

Table/SQL Kafka Sink Question

2018-03-26 Thread Pavel Ciorba
Hi everyone! Can I specify a *message key* using the Kafka sink in the Table/SQL API ? The goal is to sink each row as JSON along side with a message key into Kafka. I was achieving it using the Stream API by specifying a *KeyedSerializationSchema* using the *serializeKey() *method. Thanks in ad

Re: Query regarding to CountinousFileMonitoring operator

2018-03-26 Thread Puneet Kinra
Hi Kostas Thanks for the reply, Yep i am planning to implement the same. On Mon, Mar 26, 2018 at 7:53 PM, Kostas Kloudas wrote: > Hi Puneet, > > If you mean that after processing a file, you want to move it to another > directory outside the one containing > the data to be processed, then I

Keyby connect for a one to many relationship - DataStream API - Ride Enrichment (CoProcessFunction)

2018-03-26 Thread Dulce Morim
Hello, Following this exercise: http://training.data-artisans.com/exercises/rideEnrichment-processfunction.html I need to do something similar, but my data structure is something like: A Primary_key other fields B Primary_key Relation_Key other fields Where A and B relationship is one to more,

Re: Query regarding to CountinousFileMonitoring operator

2018-03-26 Thread Kostas Kloudas
Hi Puneet, If you mean that after processing a file, you want to move it to another directory outside the one containing the data to be processed, then I am afraid that this is currently not possible. This is because the whole logic of how to treat files is included in your FileInputFormat.

Re: "dynamic" bucketing sink

2018-03-26 Thread Christophe Jolif
Thanks Timo & Ashish for your input. I will definitely have a look at Kite SDK (was not aware of it). Otherwise I'll try to prototype something and share it with the community through a JIRA issue. -- Christophe On Mon, Mar 26, 2018 at 1:34 PM, ashish pok wrote: > Hi Christophe, > > Have you l

Re: How can I confirm a savepoint is used for a new job?

2018-03-26 Thread Hao Sun
Thanks Tim. On Mon, Mar 26, 2018, 03:04 Timo Walther wrote: > Hi Hao, > > I quickly checked that manually. There should be a message similar to > the one below in the JobManager log: > > INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - > Starting job from savepoint ... > > Rega

Re: Error running on Hadoop 2.7

2018-03-26 Thread ashish pok
Stephan, we are in 1.4.2. Thanks, -- Ashish On Mon, Mar 26, 2018 at 7:38 AM, Stephan Ewen wrote: If you are on Flink 1.4.0 or 1.4.1, please check if you accidentally have Hadoop in your application jar. That can mess up things with child-first classloading. 1.4.2 should handle Hadoop pro

Re: Query regarding to CountinousFileMonitoring operator

2018-03-26 Thread Puneet Kinra
Hi Timo FileInputFormat fileInputFormat = new TextInputFormat(new Path(fileSystem+this.path)); fileInputFormat.setNestedFileEnumeration(true); fileInputFormat.setFilesFilter(new UnicaFileFilter(".csv")); DataStreamvalue =this.execEnv.readFile(fileInputFormat, fileSystem+this.path, F

Re: "dynamic" bucketing sink

2018-03-26 Thread ashish pok
Hi Christophe, Have you looked at Kite SDK? We do something like this but using Gobblin and Kite SDK, which is a parallel pipeline to Flink. It feels like if you partition by something logical like topic name, you should be able to sink using Kite SDK. Kite allows you good ways to handle further

Re: Standalone cluster instability

2018-03-26 Thread Alexander Smirnov
Hi Piotr, I didn't find anything special in the logs before the failure. Here are the logs, please take a look: https://drive.google.com/drive/folders/1zlUDMpbO9xZjjJzf28lUX-bkn_x7QV59?usp=sharing The configuration is: 3 task managers: qafdsflinkw011.scl qafdsflinkw012.scl qafdsflinkw013.scl -

Re: Strange behavior on filter, group and reduce DataSets

2018-03-26 Thread simone
Hi Fabian, any update on this? Did you fix it? Best, Simone. On 22/03/2018 00:24, Fabian Hueske wrote: Hi, That was a bit too early. I found an issue with my approach. Will come back once I solved that. Best, Fabian 2018-03-21 23:45 GMT+01:00 Fabian Hueske >:

Re: Strange behavior on filter, group and reduce DataSets

2018-03-26 Thread Fabian Hueske
Hi, Yes, I've updated the PR. It needs a review and should be included in Flink 1.5. Cheers, Fabian simone schrieb am Mo., 26. März 2018, 12:01: > Hi Fabian, > > any update on this? Did you fix it? > > Best, Simone. > > On 22/03/2018 00:24, Fabian Hueske wrote: > > Hi, > > That was a bit too ea

Re: Error running on Hadoop 2.7

2018-03-26 Thread Stephan Ewen
If you are on Flink 1.4.0 or 1.4.1, please check if you accidentally have Hadoop in your application jar. That can mess up things with child-first classloading. 1.4.2 should handle Hadoop properly in any case. On Sun, Mar 25, 2018 at 3:26 PM, Ashish Pokharel wrote: > Hi Ken, > > Yes - we are on

Re: Out off memory when catching up

2018-03-26 Thread Timo Walther
Hi Lasse, in order to avoid OOM exception you should analyze your Flink job implementation. Are you creating a lot of objects within your Flink functions? Which state backend are you using? Maybe you can tell us a little bit more about your pipeline? Usually, there should be enough memory fo

Re: InterruptedException when async function is cancelled

2018-03-26 Thread Timo Walther
Hi Ken, as you can see here [1], Flink interrupts the timer service after a certain timeout. If you want to get rid of the exception, you should increase "task.cancellation.timers.timeout" in the configuration. Actually, the default is already set to 7 seconds. So your exception should not b

Re: How can I confirm a savepoint is used for a new job?

2018-03-26 Thread Timo Walther
Hi Hao, I quickly checked that manually. There should be a message similar to the one below in the JobManager log: INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Starting job from savepoint ... Regards, Timo Am 22.03.18 um 06:45 schrieb Hao Sun: Do we have any logs in J

Re: Query regarding to CountinousFileMonitoring operator

2018-03-26 Thread Timo Walther
Hi Puneet, can you share a little code example with us? I could not reproduce your problem. You have to keep in mind that a setParallelism() only affects the last operation. If you want to change the default parallelism of the entire pipeline, you have to change it in StreamExecutionEnvironm

Re: Issue in Flink/Zookeeper authentication via Kerberos

2018-03-26 Thread Timo Walther
Hi Sarthak, I'm not a Kerberos expert but maybe Eron or Shuyi are more familiar with the details? Would be great if somebody could help. Thanks, Timo Am 22.03.18 um 10:16 schrieb Sahu, Sarthak 1. (Nokia - IN/Bangalore): Hi Folks, *_Environment Setup:_* 1. I have configured KDC 5 server.

Re: "dynamic" bucketing sink

2018-03-26 Thread Timo Walther
Hi Christophe, I think this will require more effort. As far as I know there is no such "dynamic" feature. Have you looked in to the bucketing sink code? Maybe you can adapt it to your needs? Otherwise it might also make sense to open an issue for it to discuss a design for it. Maybe other c

Re: How to handle large lookup tables that update rarely in Apache Flink - Stack Overflow

2018-03-26 Thread Timo Walther
Hi Pete, you can find some basic examples about stream enrichment here [1]. I hope this helps a bit. Regards, Timo [1] http://training.data-artisans.com/exercises/rideEnrichment-flatmap.html [2] http://training.data-artisans.com/exercises/rideEnrichment-processfunction.html Am 25.03.18 um