Split Stream on a Split Stream

2019-02-26 Thread Taher Koitawala
Hi All, We are currently working with Flink version 1.7.2 and we get the exception below when doing a split on a split. SplitStream splitStream = stream1.split(new SomeSplitLogic()); DataStream select1 = splitStream.select("1"); DataStream select2 = splitStream.select("2");
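For reference, the workaround usually suggested on this list is to replace nested split()/select() with side outputs, which can be split again further downstream. A minimal sketch, assuming String records; the routing condition and tag ids are illustrative, and SomeSplitLogic's actual rules are not shown in the question:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// Tags identifying the two branches that split()/select("1"), select("2") used to produce.
final OutputTag<String> tag1 = new OutputTag<String>("1") {};
final OutputTag<String> tag2 = new OutputTag<String>("2") {};

SingleOutputStreamOperator<String> routed = stream1
    .process(new ProcessFunction<String, String>() {
        @Override
        public void processElement(String value, Context ctx, Collector<String> out) {
            // Route each record to a side output instead of using split(); the
            // condition below is a placeholder for the real split logic.
            if (value.startsWith("a")) {
                ctx.output(tag1, value);
            } else {
                ctx.output(tag2, value);
            }
        }
    });

DataStream<String> select1 = routed.getSideOutput(tag1);
DataStream<String> select2 = routed.getSideOutput(tag2);
// select1/select2 can themselves be processed and split again with further side outputs.
```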

submit job failed on Yarn HA

2019-02-26 Thread 孙森
Hi all: I run Flink (1.5.1 with Hadoop 2.7) on YARN and submit the job with "/usr/local/flink/bin/flink run -m jmhost:port my.jar", but the submission fails. The HA configuration is: high-availability: zookeeper high-availability.storageDir: hdfs:///flink/ha/

Re: flink list and flink run commands timeout

2019-02-26 Thread sen
Hi Aneesha: I am also facing the same problem. When I turn on HA on YARN, I get the same exception; when I turn off the HA configuration, it works fine. What did you do to deal with the problem? Thanks! Sen Sun

Errors running user jars with flink in docker flink x

2019-02-26 Thread Paroma Sengupta
> Hi, > I am trying to run my Flink application through Docker containers. For > that I made use of the code present here: flink_docker. > However, when I try to run the docker image, it

Re: Issues found while trying out Blink SQL

2019-02-26 Thread bigdatayunzhongyan
Sorry! 1. SQL parsing has a problem: formatted is not recognized, while desc xxx works. Flink SQL> desc formatted customer; [ERROR] Could not execute SQL statement. Reason: org.apache.flink.table.api.TableException: Table 'formatted customer' was not found. Flink SQL> desc customer; root  |-- name: c_customer_sk  |-- type: IntType  |--

Errors running user jars with flink in docker flink x

2019-02-26 Thread Paroma Sengupta
Hi, I am trying to run my Flink application through Docker containers. For that I made use of the code present here: flink_docker. However, when I try to run the docker image, it fails with

Re: left join failing with FlinkLogicalJoinConverter NPE

2019-02-26 Thread Xingcan Cui
Hi Karl, I think this is a bug and created FLINK-11769 to track it. Best, Xingcan > On Feb 26, 2019, at 2:02 PM, Karl Jin wrote: > > I removed the multiset<...> field and the join worked fine. > The field was created from a Kafka source

okio and okhttp not shaded in the Flink Uber Jar on EMR

2019-02-26 Thread Austin Cawley-Edwards
Hi, I recently experienced versioning clashes with okio and okhttp when trying to deploy a Flink 1.6.0 app to AWS EMR on Hadoop 2.8.4. After investigating and talking to the okio team (see this issue), I found that both okio and okhttp exist in the

Re: Issues found while trying out Blink SQL

2019-02-26 Thread Jark Wu
Hi, the mailing list does not support embedded images; please share links to the images instead. Thanks, Jark On Wed, 27 Feb 2019 at 11:17, bigdatayunzhongyan wrote: > > Hi all! > While trying out Blink SQL I found some issues, summarized as follows: > Hive version 2.3.4 > 1. SQL parsing problem: formatted is not recognized > > 2. Hive external tables are not supported > 3. Compatibility issues with Hive 1.x > >

Re: ProgramInvocationException when I submit job by 'flink run' after running Flink stand-alone more than 1 month?

2019-02-26 Thread Zhenghua Gao
It seems like there is something wrong with the RestServer and the RestClient didn't connect to it. You can check the standalonesession log to investigate the cause. By the way: the cause of "no cluster was found" is that your pid information was cleaned up for some reason. The pid information is stored in your TMP

Issues found while trying out Blink SQL

2019-02-26 Thread bigdatayunzhongyan
Hi all! While trying out Blink SQL I found some issues, summarized as follows: Hive version 2.3.4 1. SQL parsing problem: formatted is not recognized 2. Hive external tables are not supported 3. Compatibility issues with Hive 1.x

Re: ProgramInvocationException when I submit job by 'flink run' after running Flink stand-alone more than 1 month?

2019-02-26 Thread Benchao Li
Hi Son, According to your description, maybe it is caused by the '/tmp' file system retention policy, which removes tmp files regularly. Son Mai wrote on Wed, 27 Feb 2019 at 10:27 AM: > Hi, > I have a question regarding Flink. > I'm running Flink in stand-alone mode on 1 host (JobManager, TaskManager > on

ProgramInvocationException when I submit job by 'flink run' after running Flink stand-alone more than 1 month?

2019-02-26 Thread Son Mai
Hi, I have a question regarding Flink. I'm running Flink in stand-alone mode on 1 host (JobManager, TaskManager on the same host). At first, I was able to submit and cancel jobs normally; the jobs showed up in the web UI and ran. However, after ~1 month, when I canceled the old job and submitting

One source is much slower than the other side when join history data

2019-02-26 Thread 刘建刚
When consuming historical data in a join operator with event time, reading data from one source is much slower than from the other. As a result, the join operator caches a lot of data from the faster source while waiting for the slower source. The question is: how can I keep the difference of

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-26 Thread vino yang
Great job, Stephan! Best, Vino Jamie Grier wrote on Wed, 27 Feb 2019 at 2:27 AM: > This is awesome, Stephan! Thanks for doing this. > > -Jamie > > > On Tue, Feb 26, 2019 at 9:29 AM Stephan Ewen wrote: > >> Here is the pull request with a draft of the roadmap: >>

Re: Flink window triggering and timing on connected streams

2019-02-26 Thread Rong Rong
Hi Andrew, To add to the answers Till and Hequn already provided: the WindowOperator operates on a per-key-group basis, so as long as you only have one open session per partition key group, you should be able to manage the windowing using the watermark strategy Hequn mentioned. As Till mentioned,
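As an illustration of pairing event-time session windows with a watermark assigner (Hequn's actual suggestion is not shown in this excerpt), a minimal Flink 1.x sketch; the Event type, its timestampMillis/sessionKey fields, the out-of-orderness bound, and the reduce step are all illustrative assumptions:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// Assign event-time timestamps and watermarks before keying, then use session windows.
DataStream<Event> withTimestamps = events
    .assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.seconds(30)) {
            @Override
            public long extractTimestamp(Event e) {
                return e.timestampMillis; // event time in epoch millis (assumed field)
            }
        });

withTimestamps
    .keyBy(e -> e.sessionKey)                                  // one open session per key
    .window(EventTimeSessionWindows.withGap(Time.minutes(5)))  // session gap of 5 minutes
    .reduce((a, b) -> a.merge(b));                             // or an aggregate/process window function
```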

Re: long lived standalone job session cluster in kubernetes

2019-02-26 Thread Chunhui Shi
Hi Heath and Till, thanks for offering help on reviewing this feature. I just reassigned the JIRAs to myself after offline discussion with Jin. Let us work together to get kubernetes integrated natively with flink. Thanks. On Fri, Feb 15, 2019 at 12:19 AM Till Rohrmann wrote: > Alright, I'll

Re: StreamingFileSink on EMR

2019-02-26 Thread kb
Thanks! This fixed it.

Re: StreamingFileSink on EMR

2019-02-26 Thread Bruno Aranda
Hi, That jar must exist for all the 1.7 versions, but I was replacing the libs of the Flink provided by AWS EMR (1.7.0) with more recent ones. But you could download the 1.7.0 distribution and copy flink-s3-fs-hadoop-1.7.0.jar from there into the /usr/lib/flink/lib folder. But knowing

Re: StreamingFileSink on EMR

2019-02-26 Thread kb
Hi, So the 1.7.2 jar has the fix? Thanks, Kevin

Reducing the number of unique metrics

2019-02-26 Thread shkob1
Hey all, I just read the excellent monitoring blog post https://flink.apache.org/news/2019/02/25/monitoring-best-practices.html. I'm looking into reducing the number of unique metrics, especially where I can compromise by consolidating, such as using indices instead of ids. Specifically looking at

Re: left join failing with FlinkLogicalJoinConverter NPE

2019-02-26 Thread Karl Jin
I removed the multiset<...> field and the join worked fine. The field was created from a Kafka source through a query that looks like "select collect(data) as i_data from ... group by pk". Do you think this is a bug, or is this something I can get around using some configuration? On Tue, Feb 26, 2019

Re: StreamingFileSink on EMR

2019-02-26 Thread Bruno Aranda
Hey, Got it working: basically you need to add the flink-s3-fs-hadoop-1.7.2.jar library from the /opt folder of the Flink distribution into /usr/lib/flink/lib. That has done the trick for me. Cheers, Bruno On Tue, 26 Feb 2019 at 16:28, kb wrote: > Hi Bruno, > > Thanks for verifying. We

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-26 Thread Jamie Grier
This is awesome, Stephan! Thanks for doing this. -Jamie On Tue, Feb 26, 2019 at 9:29 AM Stephan Ewen wrote: > Here is the pull request with a draft of the roadmap: > https://github.com/apache/flink-web/pull/178 > > Best, > Stephan > > On Fri, Feb 22, 2019 at 5:18 AM Hequn Cheng wrote: > >>

Re: Is there a Flink DataSet equivalent to Spark's RDD.persist?

2019-02-26 Thread Andrey Zagrebin
Hi Frank, This feature is currently under discussion. You can follow it in this issue: https://issues.apache.org/jira/browse/FLINK-11199 Best, Andrey On Thu, Feb 21, 2019 at 7:41 PM Frank Grimes wrote: > Hi, > > I'm trying to port an existing Spark job to Flink and have gotten stuck on > the

Re: Submitting job to Flink on yarn timesout on flip-6 1.5.x

2019-02-26 Thread Richard Deurwaarder
Hello Gary, Thank you for your response. I'd like to use the new mode but it does not work for me. It seems I am running into a firewall issue, because the rest.port is random when running on YARN [1]. The machine I use to deploy the job can, in fact, start the Flink cluster, but it cannot

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-26 Thread Stephan Ewen
Here is the pull request with a draft of the roadmap: https://github.com/apache/flink-web/pull/178 Best, Stephan On Fri, Feb 22, 2019 at 5:18 AM Hequn Cheng wrote: > Hi Stephan, > > Thanks for summarizing the great roadmap! It is very helpful for users and > developers to track the direction

Re: StreamingFileSink on EMR

2019-02-26 Thread kb
Hi Bruno, Thanks for verifying. We are aiming for the same. Best, Kevin

Sftp source in flink

2019-02-26 Thread Siew Wai Yow
Hi guys, Can anyone share experience with an SFTP source? Should I use Hadoop's SFTPFileSystem, or can I simply use any SFTP Java library in a user-defined source? Thanks. Regards, Yow
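For what it's worth, the second option (a user-defined source wrapping an SFTP client library such as JSch) can look roughly like the sketch below. The host, credentials, and file path are placeholders, error handling and retries are omitted, and this reads a single file once rather than polling:

```java
import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

import java.io.BufferedReader;
import java.io.InputStreamReader;

/** Illustrative one-shot SFTP source: reads a single remote file and emits its lines. */
public class SftpFileSource extends RichSourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        JSch jsch = new JSch();
        Session session = jsch.getSession("user", "sftp.example.com", 22); // placeholder host/user
        session.setPassword("secret");                                     // placeholder credential
        session.setConfig("StrictHostKeyChecking", "no");
        session.connect();

        ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
        sftp.connect();

        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(sftp.get("/data/input.csv")))) {
            String line;
            while (running && (line = reader.readLine()) != null) {
                ctx.collect(line); // emit each line as a record
            }
        } finally {
            sftp.disconnect();
            session.disconnect();
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```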

Re: StreamingFileSink on EMR

2019-02-26 Thread Bruno Aranda
Hi, I am having the same issue, but it is related to what Kostas is pointing out. I was trying to stream to the "s3" scheme and not "hdfs", and was getting that exception. I have realised that somehow I need to reach the S3RecoverableWriter, and found out it is in a different library
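For context, the sink itself is set up the same way regardless of scheme; the path's scheme decides which FileSystem/RecoverableWriter implementation gets used, and writing to "s3://" requires flink-s3-fs-hadoop on the cluster's lib path (as resolved later in this thread). A minimal Flink 1.7-style sketch, with bucket and path purely illustrative and the stream assumed to be a DataStream<String>:

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

// Row-format sink; with an "s3://" path Flink picks up the S3 recoverable writer
// from flink-s3-fs-hadoop, with "hdfs://" it uses the Hadoop writer.
StreamingFileSink<String> sink = StreamingFileSink
    .forRowFormat(new Path("s3://my-bucket/flink-output"),
                  new SimpleStringEncoder<String>("UTF-8"))
    .build();

stream.addSink(sink);
```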

Re: StreamingFileSink on EMR

2019-02-26 Thread Kostas Kloudas
Hi Kevin, I cannot find anything obviously wrong from what you describe. Just to eliminate the obvious, you are specifying "hdfs" as the scheme for your file path, right? Cheers, Kostas On Tue, Feb 26, 2019 at 3:35 PM Till Rohrmann wrote: > Hmm good question, I've pulled in Kostas who worked

Re: Collapsing watermarks after keyby

2019-02-26 Thread Padarn Wilson
Okay. I think I still must misunderstand something here. I will work on building a unit test around this; hopefully this clears up my confusion. Thank you, Padarn On Tue, Feb 26, 2019 at 10:28 PM Till Rohrmann wrote: > Operators with multiple inputs emit the minimum of the inputs' watermarks

Re: StreamingFileSink on EMR

2019-02-26 Thread Till Rohrmann
Hmm, good question. I've pulled in Kostas, who worked on the StreamingFileSink. He might be able to tell you more in case there is some special behaviour with respect to the Hadoop file systems. Cheers, Till On Tue, Feb 26, 2019 at 3:29 PM kb wrote: > Hi Till, > > The only potential issue in the path I

Re: Share broadcast state between multiple operators

2019-02-26 Thread Till Rohrmann
On Tue, Feb 26, 2019 at 3:10 PM Richard Deurwaarder wrote: > Hello Till, > > So if I understand correctly, when messages get broadcast to multiple > operators, each operator will execute the processBroadcast() function and > store the state under a sort of operator scope? Even if they use the

Re: StreamingFileSink on EMR

2019-02-26 Thread kb
Hi Till, The only potential issue in the path I see is `/usr/share/aws/emr/emrfs/lib/emrfs-hadoop-assembly-2.29.0.jar`. I double checked my pom, the project is Hadoop-free. The JM log also shows `INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Hadoop version: 2.8.5-amzn-1`.

Re: Collapsing watermarks after keyby

2019-02-26 Thread Till Rohrmann
Operators with multiple inputs emit the minimum of the inputs' watermarks downstream. In the case of a keyBy, this means that the watermark is sent to all downstream consumers. Cheers, Till On Tue, Feb 26, 2019 at 1:47 PM Padarn Wilson wrote: > Just to add: by printing intermediate results I see

Re: Share broadcast state between multiple operators

2019-02-26 Thread Richard Deurwaarder
Hello Till, So if I understand correctly, when messages get broadcast to multiple operators, each operator will execute the processBroadcast() function and store the state under a sort of operator scope? Even if they use the same MapStateDescriptor? And if it replicates the state between

Re: Collapsing watermarks after keyby

2019-02-26 Thread Padarn Wilson
Just to add: by printing intermediate results I see that I definitely have more than five minutes of data, and by windowing without the session windows I see that event time watermarks do seem to be generated as expected. Thanks for your help and time. Padarn On Tue, 26 Feb 2019 at 8:43 PM,

Re: Share broadcast state between multiple operators

2019-02-26 Thread Till Rohrmann
Hi Richard, Flink does not support sharing state between multiple operators. Technically, even the broadcast state is not shared but replicated between subtasks belonging to the same operator. So what you can do is send the broadcast input to different operators, but they will all keep their
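To make the "send the broadcast input to different operators" option concrete, a rough sketch; the Rule type, the input streams, and the two process functions are illustrative placeholders, and each connected operator ends up with its own replicated copy of the rule map:

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.BroadcastStream;

// One descriptor reused for both downstream operators; the state itself is still
// replicated per operator instance, not shared between them.
MapStateDescriptor<String, Rule> ruleDescriptor = new MapStateDescriptor<>(
    "rules", BasicTypeInfo.STRING_TYPE_INFO, TypeInformation.of(Rule.class));

BroadcastStream<Rule> broadcastRules = ruleStream.broadcast(ruleDescriptor);

// Each of these operators receives every rule update and keeps its own copy.
ordersStream
    .keyBy(o -> o.customerId)
    .connect(broadcastRules)
    .process(new OrderEnrichmentFunction(ruleDescriptor));   // user-defined KeyedBroadcastProcessFunction

paymentsStream
    .keyBy(p -> p.customerId)
    .connect(broadcastRules)
    .process(new PaymentEnrichmentFunction(ruleDescriptor)); // user-defined KeyedBroadcastProcessFunction
```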

Re: Flink 1.6.4 signing key file in docker-flink repo?

2019-02-26 Thread Till Rohrmann
Hi William, where do you get the /KEYS file from? Have you imported the latest KEYS from here [1]? [1] https://dist.apache.org/repos/dist/release/flink/KEYS Cheers, Till On Mon, Feb 25, 2019 at 5:16 PM William Saar wrote: > Trying to build a new Docker image by replacing 1.6.3 with 1.6.4 in

Re: [Blink] sql client kafka sink failure

2019-02-26 Thread Zhenghua Gao
Try a clean environment (kill the standalone and sql client processes and clear their logs, then reproduce your problem), and then upload the corresponding standalonesession, taskexecutor, and sql client logs so we can take a look. On Tue, Feb 26, 2019 at 10:43 AM 张洪涛 wrote: > > > If I put the kafka connector shaded jar under the blink lib folder and then start, there is no problem, but passing it via the sql client > --jar parameter causes problems. > > > I tested a few more times and found that the class reported as not found is random

Re: StreamingFileSink on EMR

2019-02-26 Thread Till Rohrmann
Hi Kevin, could you check what's on the class path of the Flink cluster? You should see this in the jobmanager.log at the top. It seems as if there is a Hadoop dependency with a lower version. Which Hadoop version is your Flink 1.7 built against? You should make sure that you either use the

Re: What are blobstore files and why do they keep filling up /tmp directory?

2019-02-26 Thread Till Rohrmann
Hi Harshith, the blob store files are necessary to distribute the Flink job in your cluster. After the job has completed, they should be cleaned up; the cleanup should only fail to happen in the case of a cluster crash. Since Flink 1.4.2 is no longer actively supported, I would suggest to

Re: Collapsing watermarks after keyby

2019-02-26 Thread Till Rohrmann
Hi Padarn, Flink does not generate watermarks per key. At the moment, watermarks are always global. Therefore, I would suspect that it is rather a problem with generating watermarks at all. Could it be that your input data does not span a period longer than 5 minutes and also does not terminate? Another

Re: Flink window triggering and timing on connected streams

2019-02-26 Thread Till Rohrmann
Hi Andrew, when using connected streams (e.g. CoFlatMapFunction or CoMapFunction), the watermarks will be synchronized across both inputs. Concretely, the operator will always emit the minimum of the watermarks arriving on input channels 1 and 2. Take a look at AbstractStreamOperator.java:773-804.
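A simplified sketch of that minimum-watermark behaviour, purely illustrative (the real logic lives in AbstractStreamOperator, as referenced above):

```java
/** Simplified model of how a two-input operator combines watermarks (not Flink's actual code). */
class TwoInputWatermarkCombiner {

    private long input1Watermark = Long.MIN_VALUE;
    private long input2Watermark = Long.MIN_VALUE;
    private long lastEmitted = Long.MIN_VALUE;

    void onWatermarkFromInput1(long watermark) {
        input1Watermark = watermark;
        maybeEmit();
    }

    void onWatermarkFromInput2(long watermark) {
        input2Watermark = watermark;
        maybeEmit();
    }

    private void maybeEmit() {
        // The operator's own watermark only advances to the minimum of both inputs.
        long combined = Math.min(input1Watermark, input2Watermark);
        if (combined > lastEmitted) {
            lastEmitted = combined;
            System.out.println("emit watermark " + combined);
        }
    }
}
```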