How does Flink coordinate and connect Jobs with each other?

2019-06-18 Thread 徐涛
Hi all, I am building a real-time data warehouse project. As the number of Flink jobs grows, I find a lot of duplicated code between jobs, and I urgently need an intermediate-layer solution, similar to what batch pipelines have, to reduce redundancy and make the overall structure clearer and more layered. The solution I can think of is to use Kafka as the link between jobs. For example, with two jobs: Job_1 reads data from the original Kafka topic TOPIC_ORIG, applies some business logic, and writes the result to another Kafka topic, TOPIC_JOB_1_SINK. Note: ① a retract Kafka sink needs to be implemented
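For illustration only, a minimal sketch of Job_1 under those assumptions: it uses the universal Kafka connector classes (a version-specific variant such as FlinkKafkaConsumer011 may apply instead), a placeholder broker address, group id, and map transformation, and it does not show the retract Kafka sink mentioned above.

```java
import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class Job1 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.setProperty("group.id", "job_1");                   // placeholder group id

        // Job_1: read from the original topic ...
        DataStream<String> source =
                env.addSource(new FlinkKafkaConsumer<>("TOPIC_ORIG", new SimpleStringSchema(), props));

        // ... apply the job's business logic (placeholder transformation) ...
        DataStream<String> transformed = source.map(new MapFunction<String, String>() {
            @Override
            public String map(String value) {
                return value.toUpperCase();
            }
        });

        // ... and write to the intermediate topic that Job_2 then consumes.
        transformed.addSink(new FlinkKafkaProducer<>("TOPIC_JOB_1_SINK", new SimpleStringSchema(), props));

        env.execute("Job_1");
    }
}
```

Job_2 would then read TOPIC_JOB_1_SINK the same way Job_1 reads TOPIC_ORIG, so each layer of the warehouse is decoupled through a topic.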

Re: Role of Job Manager

2019-06-18 Thread Biao Liu
Hi Pankaj, That's really a good question. There was a refactoring of the architecture a while ago [1], so some descriptions may still use the outdated concepts. Before the refactoring, the Job Manager was a centralized role: it controlled the whole cluster and all jobs, which matches your interpretation 1.

unsubscribe

2019-06-18 Thread Sheel Pancholi

Re: [EXTERNAL] Re: How to restart/recover on reboot?

2019-06-18 Thread John Smith
Ah ok, we need to pass --host. The command line help says jobmanager.sh ?!?! If I recall. I have to go check tomorrow... On Tue., Jun. 18, 2019, 10:05 p.m. PoolakkalMukkath, Shakir, < shakir_poolakkalmukk...@comcast.com> wrote: > Hi Nick, > It works that way by explicitly setting the --host.

Re: [EXTERNAL] Re: How to restart/recover on reboot?

2019-06-18 Thread PoolakkalMukkath, Shakir
Hi Nick, It works that way by explicitly setting the --host. I got misled by the word "only" in the doc and did not try. Thanks for the help. Thanks, Shakir From: "Martin, Nick" Date: Tuesday, June 18, 2019 at 6:31 PM To: "PoolakkalMukkath, Shakir" , Till Rohrmann , John Smith Cc: user Subject:

Re: How to build dependencies and connections between stream jobs?

2019-06-18 Thread 徐涛
Hi Knauf, The solution that I can think of to coordinate between different streaming jobs is: for example, there are two streaming jobs, Job_1 and Job_2. Job_1: receive data from the original kafka topic, TOPIC_ORIG for example, and sink the data to another kafka topic,

RE: [EXTERNAL] Re: How to restart/recover on reboot?

2019-06-18 Thread Martin, Nick
Jobmanager.sh takes an optional argument for the hostname to bind to, and start-cluster uses it. If you leave it blank, the script will use whatever is in flink-conf.yaml (localhost is the default value that ships with Flink). The dockerized version of Flink runs pretty much the way you're

Reply: About calling another system's API from a Flink process function

2019-06-18 Thread 季 鸿飞
Hi everyone! The question about calling another system's API from a Flink process function has been solved! Thanks, everyone! Sent from Mail for Windows 10 From: 季 鸿飞 Sent: Monday, June 17, 2019 9:40:54 PM To: user-zh@flink.apache.org Subject: About calling another system's API from a Flink process function Hi everyone:

Question about Flink TPS (processing speed)

2019-06-18 Thread haibin
Hello everyone: I am doing real-time ETL with a pipeline like source(kafka)->map->filter->flatmap->map->sink(kafka), and the processing speed is very slow. Is there any good way to speed it up? There are 25 jobs consuming the same topic (32 partitions) at the same time; could that cause performance problems? -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: [EXTERNAL] Re: How to restart/recover on reboot?

2019-06-18 Thread PoolakkalMukkath, Shakir
Hi Till, John, I do agree with the issue John mentioned and have the same problem. We can only start a standalone HA cluster with the ./start-cluster.sh script. And then, when there are failures, we can restart those components individually by calling jobmanager.sh / taskmanager.sh. This works great

Re: How to restart/recover on reboot?

2019-06-18 Thread Till Rohrmann
I guess it should work if you installed a systemd service which simply calls `jobmanager.sh start` or `taskmanager.sh start`. Cheers, Till On Tue, Jun 18, 2019 at 4:29 PM John Smith wrote: > Yes, that is understood. But I don't see why we cannot call jobmanager.sh > and taskmanager.sh to build

Re: Need for user class path accessibility on all nodes

2019-06-18 Thread Abdul Qadeer
Thanks Biao/Till, that answers my question. On Tue, 18 Jun 2019 at 01:41, Till Rohrmann wrote: > Hi Abdul, > > as Biao said the `--classpath` option should only be used if you want to > make dependencies available which are not included in the submitted user > code jar. E.g. if you have

Re: Role of Job Manager

2019-06-18 Thread Eduardo Winpenny Tejedor
Hi Pankaj, I have no experience with Hadoop, but from the book I gathered there's one Job Manager per application, i.e. per jar (as in the example in the first chapter). This is not to say there's one Job Manager per job. Actually I don't think the word Job is defined in the book; I've seen Task

Flink error handling

2019-06-18 Thread Steven Nelson
Hello! We are internally having a debate on how best to handle exceptions within our operators. Some advocate for wrapping maps/flatMaps inside a processfunction and sending the error to a side output. Other options are returning a custom Either that gets filtered and mapped into different
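For reference, a minimal sketch of the side-output approach described above; the String input, the parse step, and the output types are assumptions, not the poster's actual operators.

```java
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// Routes records that fail a (placeholder) parse step to a side output
// instead of failing the job; healthy records go to the main output.
public class SafeParseFunction extends ProcessFunction<String, Long> {

    // Declared as an anonymous subclass so the element type information is retained.
    public static final OutputTag<String> ERRORS = new OutputTag<String>("parse-errors") {};

    @Override
    public void processElement(String value, Context ctx, Collector<Long> out) {
        try {
            out.collect(Long.parseLong(value));   // placeholder "business logic"
        } catch (NumberFormatException e) {
            ctx.output(ERRORS, value);            // send the bad record to the side output
        }
    }
}
```

The error stream is then obtained with getSideOutput(SafeParseFunction.ERRORS) on the stream returned by process(...), and can be sent to its own sink for inspection or replay.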

Side output in ProcessFunction.onTimer

2019-06-18 Thread Frank Wilson
Hi, Is there a way to make side outputs in an onTimer callback in ProcessFunction? I want to side output events that belong to a session that was below the minimum duration threshold. Currently these events are just discarded but I’d like more traceability. Thanks, Frank
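The OnTimerContext passed to onTimer extends the same Context used in processElement, so ctx.output(...) is available there too. A minimal sketch under assumed types: String events, a list state buffer, and a placeholder "below minimum duration" decision.

```java
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// Emits the buffered events of a "too short" session to a side output when the timer fires.
public class ShortSessionFunction extends KeyedProcessFunction<String, String, String> {

    public static final OutputTag<String> TOO_SHORT = new OutputTag<String>("short-sessions") {};

    private transient ListState<String> buffered;

    @Override
    public void open(Configuration parameters) {
        buffered = getRuntimeContext().getListState(
                new ListStateDescriptor<>("buffered-events", String.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        buffered.add(value);
        // Hypothetical session gap: fire a timer 30s after this element's processing time.
        ctx.timerService().registerProcessingTimeTimer(
                ctx.timerService().currentProcessingTime() + 30_000L);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
        // Placeholder decision: treat everything still buffered as a below-threshold session.
        for (String event : buffered.get()) {
            ctx.output(TOO_SHORT, event);   // side output emitted from inside onTimer
        }
        buffered.clear();
    }
}
```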

Re: How to restart/recover on reboot?

2019-06-18 Thread John Smith
Yes, that is understood. But I don't see why we cannot call jobmanager.sh and taskmanager.sh to build the cluster and have them run as systemd units. I looked at start-cluster.sh and all it does is SSH and call jobmanager.sh, which then cascades to taskmanager.sh. I just have to pinpoint what's

Re: Calculate a 4 hour time window for sum,count with sum and count output every 5 mins

2019-06-18 Thread Felipe Gutierrez
Hi Vijay, I managed by using "ctx.timerService().registerProcessingTimeTimer(timeoutTime);" in the processElement method and clearing the state in the onTimer method. This is my program [1]. [1]
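A minimal sketch of the pattern Felipe describes (register a processing-time timer in processElement, emit and clear the state in onTimer); the types, the 5-minute flush interval, and the count aggregation are assumptions, not his actual program [1].

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Keeps a running count per key and flushes it every 5 minutes of processing time.
public class PeriodicCountFunction extends KeyedProcessFunction<String, String, Long> {

    private static final long INTERVAL_MS = 5 * 60 * 1000L; // assumed 5-minute flush interval

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<Long> out) throws Exception {
        Long current = count.value();
        if (current == null) {
            current = 0L;
            // First element for this key in this interval: schedule the flush timer.
            ctx.timerService().registerProcessingTimeTimer(
                    ctx.timerService().currentProcessingTime() + INTERVAL_MS);
        }
        count.update(current + 1);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Long> out) throws Exception {
        Long current = count.value();
        if (current != null) {
            out.collect(current); // emit the partial result for this interval
        }
        count.clear();            // clearing the state starts a fresh interval
    }
}
```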

Re: Why the current time of my window never reaches the end time when I use KeyedProcessFunction?

2019-06-18 Thread Felipe Gutierrez
I achieved some enhancement based on [1]. My code is here [2]. Basically I am using "ctx.timerService().registerProcessingTimeTimer(timeoutTime);" inside the processElement method to trigger the onTimer method. And when the onTimer method is triggered I clean the state using

Re: Why the current time of my window never reaches the end time when I use KeyedProcessFunction?

2019-06-18 Thread Felipe Gutierrez
I am sorry, I wanted to point to this reference https://stackoverflow.com/a/47071833/2096986 which implements a window on a ProcessFunction in Flink. -- Felipe Gutierrez -- skype: felipe.o.gutierrez -- https://felipeogutierrez.blogspot.com On

Flink job exits with an error after running for a long time: PartitionRequestQueue - Encountered error while consuming partitions

2019-06-18 Thread 罗学焕/予之
Hi all: In a Flink application, data is written to Kafka at about 100 transactions per second (not a large volume). The Flink job receives and processes the data, involving joins across roughly 20 stream tables and a large number of asynchronous reads of HBase dimension tables. After running for 1-2 hours, the Flink application stops and reports an error (the key part of the stack trace is below; the flink.shaded.netty frames are omitted). I observed no memory overflow, and the network load was not high. I don't know the cause; could anyone help take a look? Main error: ERROR org.apache.flink.runtime.io.network.netty.PartitionRequestQueue - Encountered error

Re: Need for user class path accessibility on all nodes

2019-06-18 Thread Till Rohrmann
Hi Abdul, as Biao said the `--classpath` option should only be used if you want to make dependencies available which are not included in the submitted user code jar. E.g. if you have installed a large library which is too costly to ship every time you submit a job. Usually, you would not need to

Re: How to restart/recover on reboot?

2019-06-18 Thread Till Rohrmann
When a single machine fails you should rather call `taskmanager.sh start`/`jobmanager.sh start` to start a single process. `start-cluster.sh` will start multiple processes on different machines. Cheers, Till On Mon, Jun 17, 2019 at 4:30 PM John Smith wrote: > Well some reasons, machine

Re: Checkpointing & File stream with

2019-06-18 Thread Sung Gon Yi
It works well now with the following code: —— TextInputFormat specFileFormat = new TextInputFormat(new Path(specFile)); specFileFormat.setFilesFilter(FilePathFilter.createDefaultFilter()); DataStream specificationFileStream = env .readFile(specFileFormat, specFile,

Role of Job Manager

2019-06-18 Thread Pankaj Chand
I am trying to understand the role of the Job Manager in Flink, and have come across two possibly distinct interpretations. 1. The online documentation for v1.8 indicates that there is at least one Job Manager in a cluster, and that it is closely tied to the cluster of machines, managing all jobs in that

About calling another system's API from a Flink process function

2019-06-18 Thread 季 鸿飞
Hi everyone: I have a broadcast stream, running in standalone mode (from IDEA). Inside the connect().process(new function) method I request data from another business system. While debugging, I found that execution stops at the line that requests the data and never proceeds; even if the parameters are wrong, no error is thrown. New data arriving in the stream also blocks at that line of code. In a standalone Test.java the call works, and wrong parameters immediately raise an error. I hope someone can help. Best regards. Sent from Mail for Windows 10
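One common Flink pattern for calling an external system without blocking the operator thread is async I/O (AsyncDataStream plus an AsyncFunction). The thread does not say how the original issue was actually resolved, nor whether async I/O fits the broadcast/process setup above, so treat this only as a hedged sketch with a hypothetical external client.

```java
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

// Calls an external service for each element on a separate thread,
// so a slow or hanging call does not block the Flink operator thread.
public class ExternalLookupFunction extends RichAsyncFunction<String, String> {

    @Override
    public void asyncInvoke(String key, ResultFuture<String> resultFuture) {
        CompletableFuture
                .supplyAsync(() -> callExternalSystem(key))        // hypothetical blocking client call
                .whenComplete((response, error) -> {
                    if (error != null) {
                        resultFuture.completeExceptionally(error); // surface errors instead of hanging silently
                    } else {
                        resultFuture.complete(Collections.singleton(response));
                    }
                });
    }

    private String callExternalSystem(String key) {
        return "response-for-" + key; // placeholder for the real API client
    }

    // Wiring: the 10s timeout and capacity of 100 in-flight requests are assumptions.
    public static DataStream<String> enrich(DataStream<String> input) {
        return AsyncDataStream.unorderedWait(
                input, new ExternalLookupFunction(), 10, TimeUnit.SECONDS, 100);
    }
}
```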

Re: Has Flink a kafka processing location strategy?

2019-06-18 Thread Konstantin Knauf
Hi Theo, no, sorry, the Kafka partitions that each subtask is assigned to are determined only by the index of the subtask. Best, Konstantin On Mon, Jun 17, 2019 at 2:57 PM Theo Diefenthal < theo.diefent...@scoop-software.de> wrote: > Hi, > We have a Hadoop/YARN Cluster with Kafka and

Re: Why the current time of my window never reaches the end time when I use KeyedProcessFunction?

2019-06-18 Thread Piotr Nowojski
Hi, Isn't your problem that the source is constantly emitting data and bumping your timers? Keep in mind that the code you are basing yours on has the following characteristic: > In the following example a KeyedProcessFunction maintains counts per key, and > emits a key/count pair whenever

Re: Checkpointing & File stream with

2019-06-18 Thread Yun Tang
Hi Sung, how about using FileProcessingMode.PROCESS_CONTINUOUSLY [1] as the watch type when reading data from HDFS? FileProcessingMode.PROCESS_CONTINUOUSLY periodically monitors the source, while the default FileProcessingMode.PROCESS_ONCE processes the data only once and then exits. [1]
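A minimal sketch of that suggestion, assuming a text file on HDFS; the path and the 60-second monitoring interval are placeholders.

```java
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class ContinuousFileRead {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        String specFile = "hdfs:///path/to/spec.txt";          // placeholder path
        TextInputFormat format = new TextInputFormat(new Path(specFile));

        // PROCESS_CONTINUOUSLY re-scans the path every 60s, so the file source
        // stays RUNNING and keeps participating in checkpoints instead of
        // switching to FINISHED after a single pass (PROCESS_ONCE).
        DataStream<String> spec = env.readFile(
                format, specFile, FileProcessingMode.PROCESS_CONTINUOUSLY, 60_000L);

        spec.print();
        env.execute("continuous-file-read");
    }
}
```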

Checkpointing & File stream with

2019-06-18 Thread Sung Gon Yi
Hello, I am working on joining two streams, one from Kafka and another from a file (small size). The stream processing works well, but checkpointing fails with the following message. The file has fewer than 100 lines, and the part of the pipeline that reads the file finishes with the "FINISHED" state as soon