StreamingFileSink in version 1.8

2019-06-11 Thread Yitzchak Lieberman
Hi. I'm a bit confused: when launching my Flink streaming application on EMR release 5.24 (which has Flink 1.8) that writes Kafka messages to S3 Parquet files, I'm getting the exception below, but when I install Flink 1.8 on EMR myself as a custom installation it works. What could be the difference beh

How can I improve this Flink application for "Distinct Count of elements" in the data stream?

2019-06-11 Thread Felipe Gutierrez
Hi all, I have implemented a Flink data stream application to compute the distinct count of words. Flink does not have a built-in operator for this computation. I used a KeyedProcessFunction and I am saving the state in a ValueState descriptor. Could someone check if my implementation is the best
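For reference, a minimal sketch of the approach described (key by word, remember first occurrences in a ValueState, and count the emitted elements downstream). The class name DistinctEmitter and the downstream counting step are illustrative, not taken from the original post:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Emits a word only the first time it is seen for its key;
// a downstream operator then counts the emitted elements.
public class DistinctEmitter extends KeyedProcessFunction<String, String, String> {

    private transient ValueState<Boolean> seen;

    @Override
    public void open(Configuration parameters) {
        seen = getRuntimeContext().getState(
            new ValueStateDescriptor<>("seen", Boolean.class));
    }

    @Override
    public void processElement(String word, Context ctx, Collector<String> out) throws Exception {
        if (seen.value() == null) {
            seen.update(true);
            out.collect(word); // first occurrence only
        }
    }
}
```

In practice the state grows with the number of distinct keys, so timers or state TTL are needed to bound it; the later replies in this thread point out that the Table/SQL API's COUNT(DISTINCT ...) is another option.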

RE: No yarn option in self-built flink version

2019-06-11 Thread LINZ, Arnaud
Hello, Thanks a lot, it works. However, may I suggest that you update the documentation page: mvn clean install -DskipTests -Pvendor-repos -Dhadoop.version=2.6.0-cdh5.16.1 is pointless if you don't include Hadoop; that's why I thought that -Pvendor-repos was including the -Pinc

Oracle data streaming using Flink

2019-06-11 Thread Kailash Kota
Hi Users, I am new to the Flink world. Our requirement is to stream data from Oracle DB to Oracle DB in real time (more like data replication), and we wanted to understand if Flink is the right choice to achieve this. If Flink is a choice: 1. Are there any documents I can follow to perform this a

[DISCUSS] Deprecate previous Python APIs

2019-06-11 Thread Stephan Ewen
Hi all! I would suggest deprecating the existing Python APIs for the DataSet and DataStream API with the 1.9 release. Background is that there is a new Python API under development. The new Python API is initially against the Table API. Flink 1.9 will support Table API programs without UDFs, 1.10

Re: StreamingFileSink in version 1.8

2019-06-11 Thread Ken Krugler
The code in HadoopRecoverableWriter is: if (!"hdfs".equalsIgnoreCase(fs.getScheme()) || !HadoopUtils.isMinHadoopVersion(2, 7)) { throw new UnsupportedOperationException( "Recoverable writers on Hadoop are only suppor

Re: [DISCUSS] Deprecate previous Python APIs

2019-06-11 Thread Jeff Zhang
+1 On Tue, Jun 11, 2019 at 9:30 PM, Stephan Ewen wrote: > Hi all! > > I would suggest deprecating the existing Python APIs for the DataSet and > DataStream API with the 1.9 release. > > Background is that there is a new Python API under development. > The new Python API is initially against the Table API. Flink

Re: [DISCUSS] Deprecate previous Python APIs

2019-06-11 Thread zhijiang
It is reasonable as Stephan explained. +1 from my side! -- From: Jeff Zhang Send Time: Tuesday, June 11, 2019, 22:11 To: Stephan Ewen Cc: user ; dev Subject: Re: [DISCUSS] Deprecate previous Python APIs +1 On Tue, Jun 11, 2019 at 9:30 PM, Stephan Ewen wrote

Re: [DISCUSS] Deprecate previous Python APIs

2019-06-11 Thread Zili Chen
+1 Best, tison. On Tue, Jun 11, 2019 at 10:52 PM, zhijiang wrote: > It is reasonable as Stephan explained. +1 from my side! > > -- > From: Jeff Zhang > Send Time: Tuesday, June 11, 2019, 22:11 > To: Stephan Ewen > Cc: user ; dev > Subject: Re: [DISCUSS] De

Local blobStore not freed

2019-06-11 Thread Dede
Hi Team, I've been struggling for a while with a strange issue: the local blob store files are not actually deleted from the job manager/task manager in versions 1.7.2 and 1.8.0; an lsof command shows them like this: java6528 root 63r REG 202,16 162458786 1802248 /mnt/tmp1/blobStore-542f

Re: StreamingFileSink in version 1.8

2019-06-11 Thread Yitzchak Lieberman
Hi. I found that the problem is that I didn't have the flink-s3-fs-hadoop-.jar in Flink's lib directory; with that in place I can use the 's3a' protocol. On Tue, Jun 11, 2019 at 4:48 PM Ken Krugler wrote: > The code in HadoopRecoverableWriter is: > > if (!"hdfs".equalsIgnoreCase(fs.getScheme()) || > !HadoopUtils.
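For context, a minimal sketch of what such a job might look like once flink-s3-fs-hadoop is in lib/ and an s3a:// path is used. The Event class, bucket name, and source are placeholders, and this assumes the flink-parquet dependency is on the classpath:

```java
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class S3ParquetJob {

    // Simple POJO standing in for the real Kafka message type (illustrative only).
    public static class Event {
        public String id;
        public long ts;
        public Event() {}
        public Event(String id, long ts) { this.id = id; this.ts = ts; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // bulk formats roll part files on checkpoint

        // In the real job this would be a FlinkKafkaConsumer source.
        DataStream<Event> events = env.fromElements(new Event("a", 1L), new Event("b", 2L));

        StreamingFileSink<Event> sink = StreamingFileSink
            .forBulkFormat(
                new Path("s3a://my-bucket/parquet-output"),       // s3a served by flink-s3-fs-hadoop
                ParquetAvroWriters.forReflectRecord(Event.class)) // Parquet via Avro reflection
            .build();

        events.addSink(sink);
        env.execute("Kafka to S3 Parquet (sketch)");
    }
}
```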

Re: count(DISTINCT) in flink SQL

2019-06-11 Thread Fabian Hueske
Hi Vinod, Sorry for the late reply. Your approach looks good to me. A few things to note: * It is not possible to set different idle state retention timers for different parts of a query. All operators that support idle state retention use the same configuration. * The inner query with the SESSIO
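As an illustration of the single idle-state-retention setting Fabian mentions, a sketch against the Flink 1.8-era Table API; the table and column names (clicks, user_id, url) and retention intervals are made up:

```java
import org.apache.flink.api.common.time.Time;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.StreamQueryConfig;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class DistinctCountSql {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

        // One idle-state-retention interval applies to all stateful operators of the query.
        StreamQueryConfig qConfig = tEnv.queryConfig();
        qConfig.withIdleStateRetentionTime(Time.hours(12), Time.hours(24));

        DataStream<Tuple2<String, String>> clicks =
            env.fromElements(Tuple2.of("alice", "/home"), Tuple2.of("alice", "/cart"));
        tEnv.registerDataStream("clicks", clicks, "user_id, url");

        Table result = tEnv.sqlQuery(
            "SELECT user_id, COUNT(DISTINCT url) AS distinct_urls FROM clicks GROUP BY user_id");

        tEnv.toRetractStream(result, Row.class, qConfig).print();
        env.execute("COUNT DISTINCT sketch");
    }
}
```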

Re: Avro serde classes in Flink

2019-06-11 Thread Fabian Hueske
Hi Debasish, No, I don't think there's a particular reason. There are a few Jira issues that propose adding an Avro Serialization Schema for Confluent Schema Registry [1] [2]. Please check them out and add a new one if they don't describe what you are looking for. Cheers, Fabian [1] https://issues.a
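For the deserialization side that already exists, a sketch assuming the flink-avro-confluent-registry and universal Kafka connector dependencies are available; the topic name, registry URL, and reader schema are placeholders:

```java
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.formats.avro.registry.confluent.ConfluentRegistryAvroDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class ConfluentAvroSource {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder reader schema; in practice this comes from your Avro definition.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}");

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "avro-demo");

        // Reads Avro records and resolves writer schemas against the Confluent registry.
        FlinkKafkaConsumer<GenericRecord> consumer = new FlinkKafkaConsumer<>(
            "events", // placeholder topic
            ConfluentRegistryAvroDeserializationSchema.forGeneric(schema, "http://localhost:8081"),
            props);

        env.addSource(consumer).print();
        env.execute("Confluent Avro source sketch");
    }
}
```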

Re: About SerializationSchema/DeserializationSchema's concurrency safety

2019-06-11 Thread Fabian Hueske
Hi, Yes, multiple instances of the same De/SerializationSchema can be executed in the same JVM. Regarding 2., I'm not 100% sure, but I would suspect that one De/SerializationSchema instance handles multiple partitions. Gordon (in CC) should know this for sure. Best, Fabian On Mon., Jun 10, 2019 at 05:25

Re: Avro serde classes in Flink

2019-06-11 Thread Debasish Ghosh
Thanks Fabian .. I will take a look. On Tue, Jun 11, 2019 at 9:16 PM Fabian Hueske wrote: > Hi Debasish, > > No, I don't think there's a particular reason. > There are a few Jira issues that propose adding an Avro Serialization Schema > for Confluent Schema Registry [1] [2]. > Please check them out

Apache Flink - Disabling system metrics and collecting only specific metrics

2019-06-11 Thread M Singh
Hi: I am working on an application and need to collect application metrics. I would like to use Flink's metrics framework for my application metrics. From Flink's documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/metrics.html#system-metrics), it looks like F
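As an illustration of registering application-level metrics (separate from the system metrics the docs describe), a small sketch of a user-defined counter; the metric and class names are made up:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

// Counts processed records as a user-defined metric, independent of Flink's system metrics.
public class CountingMapper extends RichMapFunction<String, String> {

    private transient Counter processed;

    @Override
    public void open(Configuration parameters) {
        processed = getRuntimeContext()
            .getMetricGroup()
            .counter("processedRecords"); // reported under the operator's metric group
    }

    @Override
    public String map(String value) {
        processed.inc();
        return value;
    }
}
```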

How to config user for passwordless ssh?

2019-06-11 Thread John Smith
Hi, is it possible to change the default user from root to something else? When we run ./start-cluster.sh it tries to ssh using the root user. I see env.ssh.opts in the docs, but it doesn't say how to configure the options, or if that's even the right setting.

Savepoint status check fails with error Operation not found under key

2019-06-11 Thread anaray
Hi, I am using Flink 1.7.0 and checking the status of a savepoint fails with the error { "errors": [ "Operation not found under key: org.apache.flink.runtime.rest.handler.job.AsynchronousJobOperationKey@57b9711e" ] } I started the savepoint using the /jobs/:jobid/savepoints REST API, wh
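For context, the expected flow is to trigger the savepoint, keep the request-id returned by the trigger call, and poll the status with that same id (against the JobManager that accepted the trigger); a sketch with placeholder host, job id, and target directory:

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class SavepointStatusCheck {
    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8081";                   // placeholder JobManager address
        String jobId = "00000000000000000000000000000000";       // placeholder job id

        // 1) Trigger the savepoint and capture the returned request-id.
        HttpURLConnection trigger =
            (HttpURLConnection) new URL(base + "/jobs/" + jobId + "/savepoints").openConnection();
        trigger.setRequestMethod("POST");
        trigger.setDoOutput(true);
        trigger.setRequestProperty("Content-Type", "application/json");
        try (OutputStream os = trigger.getOutputStream()) {
            os.write("{\"target-directory\":\"s3://bucket/savepoints\",\"cancel-job\":false}"
                .getBytes(StandardCharsets.UTF_8));
        }
        String triggerResponse = read(trigger.getInputStream()); // e.g. {"request-id":"..."}
        String requestId = triggerResponse
            .replaceAll(".*\"request-id\"\\s*:\\s*\"([^\"]+)\".*", "$1");

        // 2) Poll the status with the same request-id.
        HttpURLConnection status = (HttpURLConnection) new URL(
            base + "/jobs/" + jobId + "/savepoints/" + requestId).openConnection();
        System.out.println(read(status.getInputStream()));
    }

    private static String read(InputStream in) {
        try (Scanner s = new Scanner(in, StandardCharsets.UTF_8.name()).useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        }
    }
}
```

One common cause of "Operation not found under key" is polling with an id the receiving JobManager does not know, for example after querying a different JobManager than the one that handled the trigger.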

Re: [DISCUSS] Deprecate previous Python APIs

2019-06-11 Thread jincheng sun
big +1 for the proposal. We will soon complete all of the Python API functional development for the 1.9 release, and then the development of UDFs will be carried out. After UDF support is completed, it will be very natural to support the DataStream API. If all of us agree with this proposal, I believe tha

RE: EXT :How to config user for passwordless ssh?

2019-06-11 Thread Martin, Nick
env.ssh.opts is the literal argument string passed to ssh, as you would enter it on the command line. Take a look at TMSlaves() in config.sh to see exactly how it’s being used. From: John Smith [mailto:java.dev@gmail.com] Sent: Tuesday, June 11, 2019 12:30 PM To: user Subject: EXT :How to config us

Re: [DISCUSS] Deprecate previous Python APIs

2019-06-11 Thread Dian Fu
+1 for this proposal. Regards, Dian > On Jun 12, 2019, at 8:24 AM, jincheng sun wrote: > > big +1 for the proposal. > > We will soon complete all of the Python API functional development for the 1.9 > release, and then the development of UDFs will be carried out. After the support of > UDFs is completed, it will be

Re: [DISCUSS] Deprecate previous Python APIs

2019-06-11 Thread Becket Qin
+1 on deprecating the old Python API in the 1.9 release. Thanks, Jiangjie (Becket) Qin On Wed, Jun 12, 2019 at 9:07 AM Dian Fu wrote: > +1 for this proposal. > > Regards, > Dian > > On Jun 12, 2019, at 8:24 AM, jincheng sun wrote: > > big +1 for the proposal. > > We will soon complete all of the Python API func

Re: Local blobStore not freed

2019-06-11 Thread Zili Chen
Hi Dan, You said "The files are removed after a restart of the process", so it sounds like Flink cleaned up the blobs properly. From your description I don't clearly understand in which case/situation you expected Flink to delete the blobs but it doesn't. Could you describe the difference between 1.4.2 and 1.7.2/1.8.0

Re: [DISCUSS] Deprecate previous Python APIs

2019-06-11 Thread Jark Wu
+1 and looking forward to the new Python API world. Best, Jark On Wed, 12 Jun 2019 at 09:22, Becket Qin wrote: > +1 on deprecating the old Python API in 1.9 release. > > Thanks, > > Jiangjie (Becket) Qin > > On Wed, Jun 12, 2019 at 9:07 AM Dian Fu wrote: > >> +1 for this proposal. >> >> Regard

Re: How can I improve this Flink application for "Distinct Count of elements" in the data stream?

2019-06-11 Thread Rong Rong
Hi Felipe, there are multiple ways to do DISTINCT COUNT in the Table/SQL API. In fact, there's already a recent thread on this [1]. Based on the description you provided, it seems like that might be a better API level to use. To answer your questions: - You should be able to use other TimeCharacteristi

java.io.FileNotFoundException in implementing exactly once

2019-06-11 Thread syed
Hi; I am trying to run the standard WordCount application under exactly-once and at-least-once processing guarantees, respectively. I successfully ran it under the at-least-once guarantee; however, while running under the exactly-once guarantee, I face the following exception. *Root exception:* java.io.FileNotFound

Re: [DISCUSS] Deprecate previous Python APIs

2019-06-11 Thread Hequn Cheng
+1 on the proposal! Maintaining only one Python API is helpful for users and contributors. Best, Hequn On Wed, Jun 12, 2019 at 9:41 AM Jark Wu wrote: > +1 and looking forward to the new Python API world. > > Best, > Jark > > On Wed, 12 Jun 2019 at 09:22, Becket Qin wrote: > >> +1 on deprecatin

Re: Apache Flink - Disabling system metrics and collecting only specific metrics

2019-06-11 Thread zhijiang
Hi Mans, AFAIK, we have no switch to disable the general system metrics, which are useful for monitoring status and performance tuning. Only some advanced system metrics can be configured on or off, and they are disabled by default, so you do not need to worry about them. Maybe you cou

Re: java.io.FileNotFoundException in implementing exactly once

2019-06-11 Thread zhijiang
For exactly-once mode before Flink 1.5, the temp dir is needed for spilling the blocked buffers during a checkpoint. The temp dir is configured via `io.tmp.dirs` and the default value is `java.io.tmpdir`. In your case, your temp dir prefixed with `/tmp/` has some problems (it might have been deleted), and yo

Re: Local blobStore not freed

2019-06-11 Thread Dede
Thanks Tison for looking into it - what I tried to say is that Flink keeps the files locked (hence, the space is still occupied) - this is visible with an lsof command. From my investigations, after the job finishes, the local (and HA) blob store is deleted - the operation succeeds in both cases,

Re: Controlling the amount of checkpoint files

2019-06-11 Thread Congxian Qiu
Hi Boris, For the configuration you gave, you can try to reduce the parallelism of the operator which contains state. Best, Congxian On Mon, Jun 10, 2019 at 9:43 PM, Boris Lublinsky wrote: > Here is the code enabling checkpointing > > // Enable checkpointing > env.enableCheckpointing(60000) // 1 min > env.getCheck
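A sketch of the kind of checkpoint configuration being discussed, with the suggestion applied by lowering the parallelism of the stateful operator; the intervals, state backend path, and the stand-in operator are illustrative:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConfigSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every minute, exactly-once, with a minimum pause between checkpoints.
        env.enableCheckpointing(60_000);
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(30_000);
        env.setStateBackend(new FsStateBackend("file:///tmp/checkpoints"));

        // Every parallel instance of a stateful operator writes its own state files,
        // so lowering that operator's parallelism reduces the number of checkpoint files.
        env.fromElements("a", "b", "a")
            .keyBy(value -> value)
            .map((MapFunction<String, String>) value -> value) // stand-in for the stateful operator
            .setParallelism(2)
            .print();

        env.execute("checkpoint config sketch");
    }
}
```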