Re: [DISCUSS] Create a Flink ecosystem website

2019-07-19 Thread Becket Qin
[Sorry for the incomplete message. Clicked send by mistake...] I agree with Marta that it might be good to have multi-language support as a mid-term goal. Jiangjie (Becket) Qin On Sat, Jul 20, 2019 at 11:22 AM Becket Qin wrote: > The website is awesome! I really like its conciseness and yet

Re: [DISCUSS] Create a Flink ecosystem website

2019-07-19 Thread Becket Qin
The website is awesome! I really like its conciseness and yet fairly useful information and functionalities. I cannot think of much to improve at the moment. Just one thought, do we need an "others" category, just in case a package does not fit into any of the current given categories? Thanks

Re: [Table API] ClassCastException when converting a table to DataStream

2019-07-19 Thread Dongwon Kim
Hi Rong, I have to dig deeper into the code to reproduce this error. This seems to > be a bug to me and will update once I find anything. Thanks a lot for spending your time on this. However from what you explained, if I understand correctly you can do all > of your processing within the

Re: Jython support for Flink

2019-07-19 Thread Zili Chen
Hi Dante, Both Jython and Jython support for Flink are out of development and maintain. As pointed out by Jeff, Flink 1.9 supports Python api via py4j[1] and the document page as posted. I guess your algorithms are written in CPython instead of Jython and want Jython only for interoperate, and

Re: Job submission timeout with no error info.

2019-07-19 Thread Fakrudeen Ali Ahmed
Hi Andrey, Flink version: 1.4.2 Please find the client log attached and job manager log is at: job manager log. Thanks, -Fakrudeen (define (sqrte n xn eph) (if (> eph (abs (- n (* xn xn xn (sqrte n (/ (+

From Kafka Stream to Flink

2019-07-19 Thread Maatary Okouya
Hi, I am a user of Kafka Stream so far. However, because i have been face with several limitation in particular in performing Join on KTable. I was wondering what is the appraoch in Flink to achieve (1) the concept of KTable, i.e. a Table that represent a changeLog, i.e. only the latest version

Re: Extending REST API with new endpoints

2019-07-19 Thread Oytun Tez
Yep, I scanned all of the issues in Jira and the codebase, I couldn't find a way to plug my new endpoint in. I am basically trying to open up an endpoint for queryable state client. I also read somewhere that this may cause some issues due to SSL communication within the cluster. Any pointers?

Extending REST API with new endpoints

2019-07-19 Thread Oytun Tez
Hi there, I am trying to add a new endpoint to the REST API, by extending AbstractRestHandler. But this new handler needs to be added in WebMonitorEndpoint, which has no interface for outside. Can I do this with 1.8? Any other way or plans to make this possible? --- Oytun Tez *M O T A W O R D*

Re: Apache Flink - Manipulating the state of an object in an evictor or trigger

2019-07-19 Thread M Singh
Biao: I am asking this question, just to understand the impact and best practices around it.  The state I am referring to it the objects that are passed to evictor. https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#evictors void evictBefore(Iterable>

Re: Apache Flink - Event time and process time timers with same timestamp

2019-07-19 Thread M Singh
Hi Bioa/Andrey: Just to clarify, can we register two timers (one for processing time and one for event time) with the same timestamp and if so, which one will fire. Also, is it ok to register an event time time and then deregister processing time time (or vice versa) ?  Here is the example I am

Re: Apache Flink - Side output time semantics for DataStream

2019-07-19 Thread M Singh
Hi Biao:  Thanks for your response. My question is that if i have streaming application and with timestamp assigned to the elements, and in one of the process functions I have side output, then there are two situations possible:1. The side output is the same type as main stream.2. The side

Re: Job submission timeout with no error info.

2019-07-19 Thread Andrey Zagrebin
Hi Fakrudeen, which Flink version do you use? could you share full client and job manager logs? Best, Andrey On Fri, Jul 19, 2019 at 7:00 PM Fakrudeen Ali Ahmed wrote: > Hi, > > > > We are submitting a Flink topology [YARN] and it fails during upload of > the jar with no error info. > > > >

Re: [DISCUSS] Create a Flink ecosystem website

2019-07-19 Thread Marta Paes Moreira
Hey, Robert. I will keep an eye on the overall progress and get started on the blog post to make the community announcement. Are there (mid-term) plans to translate/localize this website as well? It might be a point worth mentioning in the blogpost. Hats off to you and Daryl — this turned out

Re: Consuming data from dynamoDB streams to flink

2019-07-19 Thread Andrey Zagrebin
Hi Vinay, 1. I would assume it works similar to kinesis connector (correct me if wrong, people who actually developed it) 2. If you have activated just checkpointing, the checkpoints are gone if you externally kill the job. You might be interested in savepoints [1] 3. See paragraph in [2] about

Job submission timeout with no error info.

2019-07-19 Thread Fakrudeen Ali Ahmed
Hi, We are submitting a Flink topology [YARN] and it fails during upload of the jar with no error info. [main] INFO org.apache.flink.runtime.client.JobClient - Checking and uploading JAR files [main] ERROR org.apache.flink.client.CliFrontend - Error while running the command.

Re: Apache Flink - Event time and process time timers with same timestamp

2019-07-19 Thread Andrey Zagrebin
Hi, Event and processing time timers have independent state storage. You can use both independently, so I would expect two firings with different domains. `TimeCharacteristic` is for operations where you do not explicitly tell the time type, like windowing. Best, Andrey On Fri, Jul 19, 2019 at

Re: [Table API] ClassCastException when converting a table to DataStream

2019-07-19 Thread Rong Rong
Hi Dongwon, I have to dig deeper into the code to reproduce this error. This seems to be a bug to me and will update once I find anything. However from what you explained, if I understand correctly you can do all of your processing within the TableAPI scope without converting it back and forth

Re: Checkpoints timing out for no apparent reason

2019-07-19 Thread Andrey Zagrebin
Hi Sergei, If you want just to try increasing the timeouts, you could change the checkpoint timeout in env.getCheckpointConfig().setCheckpointTimeout(...) [1] or s3 client timeouts (see presto or hdfs for s3 configuration, there are some network timeouts) [2]. Otherwise it would be easier to

Re: Jython support for Flink

2019-07-19 Thread Jeff Zhang
Hi Dante, Flink 1.9 support python api, which may be what you want. See https://ci.apache.org/projects/flink/flink-docs-master/tutorials/python_table_api.html Dante Van den Broeke 于2019年7月19日周五 下午10:40写道: > Dear, > > > I'm a student currently working on a project involving apache kafka and >

Jython support for Flink

2019-07-19 Thread Dante Van den Broeke
Dear, I'm a student currently working on a project involving apache kafka and flink. The project itself is revolved around path prediction and machine learning for websites. To test a prove of concept I setup a kafka server locally (goal is to expend this to a google cloud server or similar

Re: apache flink: Why checkpoint coordinator takes long time to get completion

2019-07-19 Thread Xiangyu Su
btw. it seems like this issue has been fixed in 1.8.1 On Fri, 19 Jul 2019 at 12:21, Xiangyu Su wrote: > Ok, thanks. > > and this time-consuming until now always happens after 3rd checkpointing, > and this unexpected time-consuming was always consistent (~ 4 min by under > 4G/min incoming

Re: Consuming data from dynamoDB streams to flink

2019-07-19 Thread Vinay Patil
Hi, I am using this consumer for processing records from DynamoDb Streams , few questions on this : 1. How does checkpointing works with Dstreams, since this class is extending FlinkKinesisConsumer, I am assuming it will start from the last successful checkpoint in case of failure, right ? 2.

Re: apache flink: Why checkpoint coordinator takes long time to get completion

2019-07-19 Thread Xiangyu Su
Ok, thanks. and this time-consuming until now always happens after 3rd checkpointing, and this unexpected time-consuming was always consistent (~ 4 min by under 4G/min incoming traffic). On Fri, 19 Jul 2019 at 11:06, Biao Liu wrote: > Hi Xiangyu, > > Just took a glance at the relevant codes.

Re: left join failing with FlinkLogicalJoinConverter NPE

2019-07-19 Thread Tony Wei
Hi, I also found the similar issue here [1]. Best, Tony Wei [1] https://issues.apache.org/jira/browse/FLINK-11433 Tony Wei 於 2019年7月19日 週五 下午5:38寫道: > Hi, > > Is there any update for this issue? I have had the same problem just like > Karl's. > After I remove query like "select collect(data)

Re: left join failing with FlinkLogicalJoinConverter NPE

2019-07-19 Thread Tony Wei
Hi, Is there any update for this issue? I have had the same problem just like Karl's. After I remove query like "select collect(data) ..." from one of the joined tables, the sql can be executed correctly without throwing any NPE. Best regards, Tony Wei Xingcan Cui 於 2019年2月27日 週三 下午12:53寫道: >

Re: apache flink: Why checkpoint coordinator takes long time to get completion

2019-07-19 Thread Biao Liu
Hi Xiangyu, Just took a glance at the relevant codes. There is a gap between calculating the duration and logging it out. I guess the checkpoint 4 is finished in 1 minute, but there is an unexpected time-consuming operation during that time. But I can't tell which part it is. Xiangyu Su

Parallelism issue

2019-07-19 Thread Sung Gon Yi
Hello. I wrote below codes. It works extraordinarily. Processing performs after SourceFunction generates all data and quit. If SourceFunction works infinitely, processing is never performed. But, it works well when parallelismForTimestamp is given other value (eg. 3), I want to know the

Re:Re: Writing Flink logs into specific file

2019-07-19 Thread Haibo Sun
Hi, Soheil Placing the log configuration file in the resource directory of the job's jar will not be used by Flink, because the log configuration is explicitly specified by the script under the bin directory of Flink and the bootstrap code (for example the BootstrapTools class). If you want

Re: Re: Re: Flink 的 log 文件夹下产生了 44G 日志

2019-07-19 Thread Caizhi Weng
Hi Henry, LOG.error(e.getLocalizedMessage()); running = true; 这里写错了吧,应该是 running = false; Henry 于2019年7月19日周五 下午4:04写道: > > > 谢谢你的帮助哈! 我也是觉得 source 里的问题,但是呢,木有找到错误的地方。下面这个是我那个自定义的 source > 代码,但是里面没有写log里报的哪个错的提示。 > package com.JavaCustoms; > import

apache flink: Why checkpoint coordinator takes long time to get completion

2019-07-19 Thread Xiangyu Su
Dear flink community, We are POC flink(1.8) to process data in real time, and using global checkpointing(S3) and local checkpointing(EBS), deploy cluster on EKS. Our application is consuming data from Kinesis. For my test e.g I am using checkpointing interval 5min. and minimum pause 2min. The

Fwd: Issue running basic example locally

2019-07-19 Thread Biao Liu
Just forward it to user mailing list. it's not a development issue. -- Forwarded message - 发件人: Caizhi Weng Date: 2019年7月19日周五 上午8:56 Subject: Re: Issue running basic example locally To: Hi Andres, `provided` of flink-streaming-java seems suspicious, can you remove it and see

Re: Re: Flink 的 log 文件夹下产生了 44G 日志

2019-07-19 Thread Biao Liu
最根本的解法当然是去掉打日志的地方,这 source 不是 Flink 内置的,Flink 当然不能控制你们自定义 source 的行为。 你可以考虑自己改一下 log4j.properties,手动关掉这个 logger, Flink 内置的 log4j.properties 里有 example,参考着改一下 log4j.logger.org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline=ERROR, file 改成

Re: Re: Flink 的 log 文件夹下产生了 44G 日志

2019-07-19 Thread Caizhi Weng
Hi Henry 你的意思是不想让 Flink 写 log 吗?那只能通过 `log4j.rootLogger=OFF` (log4j) 或者 ` ` (logback) 把 log 关掉,或者把 log 等级设成更高的 FATAL... 但我感觉问题还是自定义的 source 里写 log 的时候死循环了... Henry 于2019年7月19日周五 下午2:20写道: > > > > 你好,谢谢!是的,这个Source是用JMS实现的自定义Source。目前还在查原因,但是怎么能够让Flink不这样爆炸写log日志呢?20分钟就能写满磁盘,写了40G多。 > > > > >

Re: Apache Flink - Side output time semantics for DataStream

2019-07-19 Thread Biao Liu
Hi, I'm not sure what you exactly mean. Could you describe more about your requirements? M Singh 于2019年7月14日周日 上午9:33写道: > Hi: > > I wanted to find out what is the timestamp associated with the elements of > a stream side output with different stream time characteristics. > > Thanks > > Man >

Re:Re: Flink 的 log 文件夹下产生了 44G 日志

2019-07-19 Thread Henry
你好,谢谢!是的,这个Source是用JMS实现的自定义Source。目前还在查原因,但是怎么能够让Flink不这样爆炸写log日志呢?20分钟就能写满磁盘,写了40G多。 在 2019-07-19 11:11:37,"Caizhi Weng" 写道: >Hi Henry, > >这个 source 看起来不像是 Flink 提供的 source,应该是 source 本身实现的问题。你可能需要修改 source >的源码让它出错后关闭或者进行其它处理... > >Henry 于2019年7月19日周五 上午9:31写道: > >>

Re: Apache Flink - Event time and process time timers with same timestamp

2019-07-19 Thread Biao Liu
Hi, Is it possible to support two different `TimeCharacteristic` in one job at the same time? I guess the answer is no. So I don't think there exists such a scenario. M Singh 于2019年7月19日周五 上午12:19写道: > Hey Folks - Just checking if you have any pointers for me. Thanks for > your advice. > >

Re: Apache Flink - Manipulating the state of an object in an evictor or trigger

2019-07-19 Thread Biao Liu
Hi, I don't find any official document about it. There are several state relevant methods in `TriggerContext`. I believe it's absolutely safe to use state in `Trigger` through `TriggerContext`. Regarding to `Evictor`, there is no such methods in `EvictorContext`. After taking a glance on

多滑动窗口问题

2019-07-19 Thread aegean0...@163.com
您好: 请问, 在进行流计算时, source相同, 处理逻辑相同, 但要计算不同的滑动时间窗口, 比如 每分钟统计最近 5m,15m,30m 以及 每15分钟计算, 1h, 3h , 12h的数据 除去每种窗口写一个程序外, 有其他更加便捷的解决方式吗 ? 谢谢