Re: Using sensitive configuration/credentials

2018-08-08 Thread vino yang
Hi Matt, Flink is currently enhancing its security, such as the current data transmission can be configured with SSL mode[1]. However, some problems involving configuration and web ui display do exist, and they are still displayed in plain text. I think a temporary way to do this is to keep your

ProcessWindowFunction

2018-08-08 Thread 苗元君
I read the doc about ProcessWindowFunction But I the code on the flink demo is incorrect public class MyProcessWindowFunction extends ProcessWindowFunction, String, String, TimeWindow> { Tuple cannot have to parameter. I try to find a demo which ProcessWindowFunction used in window word count

Re: Could not cancel job (with savepoint) "Ask timed out"

2018-08-08 Thread vino yang
Hi Juho, This problem does exist, I suggest you separate these two steps to temporarily deal with this problem: 1) Trigger Savepoint separately; 2) execute the cancel command; Hi Till, Chesnay: Our internal environment and multiple users on the mailing list have encountered similar problems.

Using sensitive configuration/credentials

2018-08-08 Thread Matt Moore
I'm wondering what the best practice is for using secrets in a Flink program, and I can't find any info in the docs or posted anywhere else. I need to store an access token to one of my APIs for flink to use to dump results into, and right now I'm passing it through as a configuration

Re: VerifyError when running Python streaming job

2018-08-08 Thread Joe Malt
Hi everyone, Thanks for your help. I discovered that the WordCount example runs when the lib directory is empty - something I had in there was causing it to break (perhaps a version conflict?). I haven't yet figured out what the culprit was, but I'll post an update if I do. Thanks again, Joe

Re: State in the Scala DataStream API

2018-08-08 Thread Fabian Hueske
Hi Juan, The state will be purged if you return None instead of a Some. However, this only happens when the function is called for a specific key, i.e., state won't be automatically removed after some time. If this is your use case, you have to implement a ProcessFunction and use timers to

State in the Scala DataStream API

2018-08-08 Thread Juan Gentile
Hello, I'm looking at the following page of the documentation https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html particularly at this piece of code: val stream: DataStream[(String, Int)] = ... val counts: DataStream[(String, Int)] = stream .keyBy(_._1)

UTF-16 support for TextInputFormat

2018-08-08 Thread David Dreyfus
Hello - It does not appear that Flink supports a charset encoding of "UTF-16". It particular, it doesn't appear that Flink consumes the Byte Order Mark (BOM) to establish whether a UTF-16 file is UTF-16LE or UTF-16BE. Are there any plans to enhance Flink to handle UTF-16 with BOM? Thank you,

Re: Listing in Powered By Flink directory

2018-08-08 Thread Fabian Hueske
Thanks Amit! I've added Limeroad to the list with your description. Best, Fabian 2018-08-08 14:12 GMT+02:00 amit.jain : > Hi Fabian, > > We at Limeroad, are using Flink for multiple use-cases ranging from ETL > jobs, ClickStream data processing, real-time dashboard to CEP. Could you > list us

Could not cancel job (with savepoint) "Ask timed out"

2018-08-08 Thread Juho Autio
I was trying to cancel a job with savepoint, but the CLI command failed with "akka.pattern.AskTimeoutException: Ask timed out". The stack trace reveals that ask timeout is 10 seconds: Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/jobmanager_0#106635280]]

Re: Could not build the program from JAR file.

2018-08-08 Thread Gary Yao
Hi Florian, Thanks for following up. Does it consistently work for you if -ytm and -yjm are set to 2 GB? Can you enable DEBUG level logging, submit with 1GB of memory again, and send all TaskManager logs in addition? The output of yarn logs -applicationId should suffice. The Flink version that

Re: Listing in Powered By Flink directory

2018-08-08 Thread amit.jain
Hi Fabian, We at Limeroad, are using Flink for multiple use-cases ranging from ETL jobs, ClickStream data processing, real-time dashboard to CEP. Could you list us on given directory? Website: https://www.limeroad.com -- Thanks, Amit -- Sent from:

Re: JDBCInputFormat and SplitDataProperties

2018-08-08 Thread Alexis Sarda
Hi Fabian, Thanks for the clarification. I have a few remarks, but let me provide more concrete information. You can find the query I'm using, the JDBCInputFormat creation, and the execution plan in this github gist: https://gist.github.com/asardaes/8331a117210d4e08139c66c86e8c952d I cannot

Listing in Powered By Flink directory

2018-08-08 Thread Fabian Hueske
Hi everybody, The Flink community maintains a directory of organizations and projects that use Apache Flink [1]. Please reply to this thread if you'd like to add an entry to this list. Thanks, Fabian [1] https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink

Re: JDBCInputFormat and SplitDataProperties

2018-08-08 Thread Fabian Hueske
Hi Alexis, First of all, I think you leverage the partitioning and sorting properties of the data returned by the database using SplitDataProperties. However, please be aware that SplitDataProperties are a rather experimental feature. If used without query parameters, the JDBCInputFormat

Re: Need help regarding Flink Batch Application

2018-08-08 Thread Fabian Hueske
The code or the execution plan (ExecutionEnvironment.getExecutionPlan()) of the job would be interesting. 2018-08-08 10:26 GMT+02:00 Chesnay Schepler : > What have you tried so far to increase performance? (Did you try different > combinations of -yn and -ys?) > > Can you provide us with your

Re: VerifyError when running Python streaming job

2018-08-08 Thread Chesnay Schepler
I cannot reproduce the problem in 1.6-rc4 and 1.7-SNAPSHOT either :/ On 08.08.2018 10:33, Chesnay Schepler wrote: hmm, i was able to run it with 1.5.2 at least. Let's see what 1.6 says... On 08.08.2018 10:27, Chesnay Schepler wrote: I'll take a look, but it sounds like the source is the

Re: Filter-Join Ordering Issue

2018-08-08 Thread Fabian Hueske
I've created FLINK-10100 [1] to track the problem and suggest a solution and workaround. Best, Fabian [1] https://issues.apache.org/jira/browse/FLINK-10100 2018-08-08 10:39 GMT+02:00 Fabian Hueske : > Hi Dylan, > > Yes, that's a bug. > As you can see from the plan, the partitioning step is

Re: Accessing source table data from hive/Presto

2018-08-08 Thread Fabian Hueske
Do you want to read the data once or monitor a directory and process new files as they appear? Reading from S3 with Flink's current MonitoringFileSource implementation is not working reliably due to S3's eventual consistent list operation (see FLINK-9940 [1]). Reading a directory also has some

Re: Filter-Join Ordering Issue

2018-08-08 Thread Fabian Hueske
Hi Dylan, Yes, that's a bug. As you can see from the plan, the partitioning step is pushed past the Filter. This is possible, because the optimizer knows that a Filter function cannot modify the data (it only removes records). A workaround should be to implement the filter as a FlatMapFunction.

Re: VerifyError when running Python streaming job

2018-08-08 Thread Chesnay Schepler
hmm, i was able to run it with 1.5.2 at least. Let's see what 1.6 says... On 08.08.2018 10:27, Chesnay Schepler wrote: I'll take a look, but it sounds like the source is the issue? On 08.08.2018 09:34, vino yang wrote: Hi Joe, Did you try the word_count example from the flink codebase?[1]

Re: VerifyError when running Python streaming job

2018-08-08 Thread Chesnay Schepler
I'll take a look, but it sounds like the source is the issue? On 08.08.2018 09:34, vino yang wrote: Hi Joe, Did you try the word_count example from the flink codebase?[1] Recently, I tried this example, it works fine to me. An example of an official document may not guarantee your success

Re: Need help regarding Flink Batch Application

2018-08-08 Thread Chesnay Schepler
What have you tried so far to increase performance? (Did you try different combinations of -yn and -ys?) Can you provide us with your application? What source/sink are you using? On 08.08.2018 07:59, Ravi Bhushan Ratnakar wrote: Hi Everybody, Currently I am working on a project where i need

Re: Filter-Join Ordering Issue

2018-08-08 Thread Chesnay Schepler
I agree, please open a JIRA. On 08.08.2018 05:11, vino yang wrote: Hi Dylan, I roughly looked at your job program and the DAG of the job. It seems that the optimizer chose the wrong optimization execution plan. cc Till. Thanks, vino. Dylan Adams mailto:dylan.ad...@gmail.com>>

Re: Small-files source - partitioning based on prefix of file

2018-08-08 Thread Fabian Hueske
Hi Averall, As Vino said, checkpoints store the state of all operators of an application. The state of a monitoring source function is the position in the currently read split and all splits that have been received and are currently pending. In case of a recovery, the splits are recovered and

Re: VerifyError when running Python streaming job

2018-08-08 Thread vino yang
Hi Joe, Did you try the word_count example from the flink codebase?[1] Recently, I tried this example, it works fine to me. An example of an official document may not guarantee your success due to maintenance issues. cc @Chesnay [1]:

Re: Small-files source - partitioning based on prefix of file

2018-08-08 Thread vino yang
Hi Averell, You need to understand that Flink reflects the recovery of the state, not the recovery of the record. Of course, sometimes your record is state, but sometimes the intermediate result of your record is the state. It depends on your business logic and your operators. Thanks, vino.

Re: unsubscribtion

2018-08-08 Thread Timo Walther
Hi, see https://flink.apache.org/community.html#mailing-lists for unsubscribing: Use: user-unsubscr...@flink.apache.org Regards, Timo Am 08.08.18 um 08:18 schrieb 네이버: On 7 Aug 2018, at 19:42, Yan Zhou [FDS Science] > wrote: Thank you Vino. It is very

unsubscribtion

2018-08-08 Thread 네이버
> On 7 Aug 2018, at 19:42, Yan Zhou [FDS Science] wrote: > > Thank you Vino. It is very helpful. > From: vino yang > Sent: Tuesday, August 7, 2018 7:22:50 PM > To: Yan Zhou [FDS Science] > Cc: user > Subject: Re: checkpoint recovery behavior when kafka source is set to start > from timestamp

Need help regarding Flink Batch Application

2018-08-08 Thread Ravi Bhushan Ratnakar
Hi Everybody, Currently I am working on a project where i need to write a Flink Batch Application which has to process hourly data around 400GB of compressed sequence file. After processing, it has write it as compressed parquet format in S3. I have managed to write the application in Flink and