求助flink Buffer pool is destroyed异常

2019-04-08 Thread 应聘程序员 北京邮电大学
hi, 大家好! 今天运行flink时抛出了Buffer pool is destroyed异常,数据源是kafka;消费前kafka队列中堆积了8G左右的数据。详细报错如下: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator at

Re: Flink Pipeline - CICD

2019-04-08 Thread Lifei Chen
There is a go cli for automating deploying and udpating flink jobs, you can integrate Jenkins pipeline with it, maybe it helps. https://github.com/ing-bank/flink-deployer Navneeth Krishnan 于2019年4月9日周二 上午10:34写道: > Hi All, > > We have some streaming jobs in production and today we manually

附件好像发不过去,补充部分日志//回复: 回复: blink提交yarn卡在一直重复分配container

2019-04-08 Thread 苏 欣
2019-04-09 09:58:03.012 [flink-akka.actor.default-dispatcher-3] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - SourceConversion(table:[builtin, default, sourceStreamTable, source: [KafkaJsonTableSource]], fields:(TI, EV, CS_HOST, DCS_ID)) -> Calc(select: (TI, EV, DCS_ID)) ->

回复: 答复: blink提交yarn卡在一直重复分配container

2019-04-08 Thread 苏 欣
sean...@live.com 发件人: 苏 欣 发送时间: 2019-04-09 10:30 收件人: user-zh@flink.apache.org 主题: 答复: blink提交yarn卡在一直重复分配container 不好意思,已补充yarn的日志文件。 出现问题的原因我已经找到了,在配置flink-conf.yaml中的下面三项后,会出现分配不了资源的问题

Flink Pipeline - CICD

2019-04-08 Thread Navneeth Krishnan
Hi All, We have some streaming jobs in production and today we manually deploy the flink jobs in each region/environment. Before we start automating it I just wanted to check if anyone has already created a CICD script for Jenkins or other CICD tools to deploy the latest JAR on to running flink

答复: blink提交yarn卡在一直重复分配container

2019-04-08 Thread 苏 欣
不好意思,已补充yarn的日志文件。 出现问题的原因我已经找到了,在配置flink-conf.yaml中的下面三项后,会出现分配不了资源的问题 security.kerberos.login.use-ticket-cache: false security.kerberos.login.keytab: /home/hive.keytab security.kerberos.login.principal: hive/cdh129135@MYCDH 如果在客户机使用kinit命令后再提交,yarn资源可以正常分配。 现在我有几个问题请教大佬们: 1、

Re: Schema Evolution on Dynamic Schema

2019-04-08 Thread Shahar Cizer Kobrinsky
That makes sense Fabian! So I want to make sure I fully understand how this should look. Would the same expression look like: custom_groupby(my_group_fields, map[ 'a', sum(a)...]) ? Will I be able to use the builtin aggregation function internally such as sum/avg etc? or would I need to

Re: Adding metadata to the jar

2019-04-08 Thread Timothy Victor
One approach I use is to write the git commit sha to the jars manifest while compiling it (I don't use semantic versioning but rather use commit sha). Then at runtime I read the implementationVersion (class.getPackage().getImplementationVersion()), and print that in the job name. Tim On Mon,

Re: Adding metadata to the jar

2019-04-08 Thread Bruno Aranda
Hi Avi, Don't know if there are better ways, but we store the version of the job running and other metadata as part of the "User configuration" of the job, so it shows in the UI when you go to the job Configuration tab inside the job. To do so, when we create the job: val buildInfo = new

Re: [Discuss] Semantics of event time for state TTL

2019-04-08 Thread Elias Levy
Hasn't this been always the end goal? It's certainly what we have been waiting on for job with very large TTLed state. Beyond timer storage, timer processing to simply expire stale data that may not be accessed otherwise is expensive. On Mon, Apr 8, 2019 at 7:11 AM Aljoscha Krettek wrote: > I

Cannot instantiate user function, unread block data

2019-04-08 Thread Metla,Sujitha
Hello, We are trying to run Beam application with Flink runner. We are using Beam API-2.10, Kafka-0.10.2.1 and Flink-1.7. Using direct-runner we made sure that the pipeline works fine end-to-end. But when we are trying to run the application using flink-runner we are facing this issue. I am

Re: Flink 1.7.2 UI : Jobs removed from Completed Jobs section

2019-04-08 Thread Timothy Victor
I face the same issue in Flink 1.7.1. Would be good to know a solution. Tim On Mon, Apr 8, 2019, 12:45 PM Jins George wrote: > Hi, > > > > I am facing a weird problem in which jobs from ‘Completed Jobs’ section in > Flink 1.7.2 UI disappear. Looking at the job manager logs, I see the job >

Flink 1.7.2 UI : Jobs removed from Completed Jobs section

2019-04-08 Thread Jins George
Hi, I am facing a weird problem in which jobs from ‘Completed Jobs’ section in Flink 1.7.2 UI disappear. Looking at the job manager logs, I see the job was failed and restarted ‘restart-strategy.fixed-delay.attempts’ times and the JobMaster was stopped. I was able to see the job in Completed

Adding metadata to the jar

2019-04-08 Thread Avi Levi
Is there a way to add some metadata to the jar and see it on dashboard ? I couldn't find a way to do so but I think it very useful. Consider that you want to know which version is actually running in the job manager (not just which jar is uploaded which is not necessary being running at the moment

Re: Partitioning key range

2019-04-08 Thread Davood Rafiei
Hi all, Thanks a lot for the replies! On Mon, Apr 8, 2019 at 5:15 PM Ken Krugler wrote: > Hi Davood, > > We have done some explicit partitioning in the past, but it’s pretty > fragile. > > See FlinkUtils#makeKeyForOperatorIndex >

Re: Partitioning key range

2019-04-08 Thread Ken Krugler
Hi Davood, We have done some explicit partitioning in the past, but it’s pretty fragile. See FlinkUtils#makeKeyForOperatorIndex Though I haven’t tried this

Re: print() method does not always print on the taskmanager.out file

2019-04-08 Thread Felipe Gutierrez
Ok, I figured out what was happening. I was not passing the IP of the virtual machine which generates the source using MQTT protocol. So, I was seeing results only if the operator was placed on the machine that was generating the data (the virtual machine). If the operator was placed on the other

RE: InvalidProgramException when trying to sort a group within a dataset

2019-04-08 Thread Papadopoulos, Konstantinos
Thanks, Fabian. Problem solved after implementing the Serializable interface from all the services of the stack or making transient the ones not needed. Best, Konstantinos From: Fabian Hueske Sent: Δευτέρα, 8 Απριλίου 2019 11:37 πμ To: Papadopoulos, Konstantinos Cc: Chesnay Schepler ; user

Re: Flink 1.7.2 extremely unstable and losing jobs in prod

2019-04-08 Thread Till Rohrmann
Hi Bruno, first of all good to hear that you could resolve some of the problems. Slots get removed if a TaskManager gets unregistered from the SlotPool. This usually happens if a TaskManager closes its connection or its heartbeat with the ResourceManager times out. So you could look for messages

Re: [Discuss] Semantics of event time for state TTL

2019-04-08 Thread Aljoscha Krettek
I had a discussion with Andrey and now think that also the case event-time-timestamp/watermark-cleanup is a valid case. If you don’t need this for regulatory compliance but just for cleaning up old state, in case where you have re-processing of old data. I think the discussion about whether to

Re: HA and zookeeper

2019-04-08 Thread Konstantin Knauf
Hi Boris, I am not aware of documentation describing this in detail. There is an open JIRA for a High Availability Service based on etcd [1]. Cheers, Konstantin [1] https://issues.apache.org/jira/browse/FLINK-11105 On Mon, Apr 8, 2019 at 3:20 PM Boris Lublinsky <

Re: HA and zookeeper

2019-04-08 Thread Boris Lublinsky
Thanks. Is there: 1. Documentation, describing this? 2. Any proposals/work trying to store it elsewhere? The reason for this question is kubernetes deployment, where the use of zookeeper seems an overkill, but it will not work without zookeeper, see

Default Kafka producers pool size for FlinkKafkaProducer.Semantic.EXACTLY_ONCE

2019-04-08 Thread min.tan
Hi, I keep getting exceptions "org.apache.flink.streaming.connectors.kafka.FlinkKafkaException: Too many ongoing snapshots. Increase kafka producers pool size or decrease number of concurrent checkpoints." I understand that DEFAULT_KAFKA_PRODUCERS_POOL_SIZE is 5 and need to increase this

Re: Flink SQL hangs in StreamTableEnvironment.sqlUpdate, keeps executing and seems never stop, finally lead to java.lang.OutOfMemoryError: GC overhead limit exceeded

2019-04-08 Thread 徐涛
Hi Fabian, No, I did not add any optimization rules. I have created two JIRAs about this issue, because when I modify the SQL a little, the error turns to StackOverflowError then: https://issues.apache.org/jira/browse/FLINK-12097

Re: End-to-end exactly-once semantics in simple streaming app

2019-04-08 Thread Piotr Nowojski
Hi, Regarding the JDBC and Two-Phase commit (2PC) protocol. As Fabian mentioned it is not supported by the JDBC standard out of the box. With some workarounds I guess you could make it work by for example following one of the ideas: 1. Write records using JDBC with at-least-once semantics, by

FlinkException: The assigned slot was removed

2019-04-08 Thread Papadopoulos, Konstantinos
Hi all, When I execute my Flink job using IntelliJ IDEA stand-alone mode, the job is executed successfully, but when I try to attach it to a stand-alone Flink cluster, my job fails with a Flink exception that "the assigned slot was removed". Does anyone have any idea why I am facing this

Re: Flink 1.7.2 extremely unstable and losing jobs in prod

2019-04-08 Thread Bruno Aranda
Hi Till, Many thanks for your reply and don't worry. We understand this is tricky and you are busy. We have been experiencing some issues, and a couple of them have been addressed, so the logs probably were not relevant anymore. About losing jobs on restart -> it seems that YARN was killing the

Re: HA and zookeeper

2019-04-08 Thread Fabian Hueske
Hi Boris, ZooKeeper is also used by the JobManager to store metadata about the running job. The JM writes information like the JobGraph, JAR file, checkpoint metadata to a persistent storage (like HDFS, S3, ...) and a pointer to this information to ZooKeeper. In case of a recovery, the new JM

Re: Flink SQL hangs in StreamTableEnvironment.sqlUpdate, keeps executing and seems never stop, finally lead to java.lang.OutOfMemoryError: GC overhead limit exceeded

2019-04-08 Thread Fabian Hueske
Hi Henry, It seem that the optimizer is not handling this case well. The search space might be too large (or rather the optimizer explores too much of the search space). Can you share the query? Did you add any optimization rules? Best, Fabian Am Mi., 3. Apr. 2019 um 12:36 Uhr schrieb 徐涛 : >

Re: checkpointing when yarn session crashed

2019-04-08 Thread Guowei Ma
Could you give more details? Such as which flink version do you use? which Statebackend do you use? Does there has any successful checkpoint? and so on.. I can't reproduce your problem. (I used BucketingSinkTestProgram(enable external checkpoint) + Flink 1.7.2 and default StateBackend ) Best,

Re: Enabling JMX Reporter on a Local Mini Cluster

2019-04-08 Thread Frank Wilson
Originally I had: *public static void *main(String[] args) { StreamExecutionEnvironment env = StreamExecutionEnvironment. *getExecutionEnvironment*(); … This was nice because this would work just as well in my kubernetes deployment as it did when I just ran the main directly. However

Re: End-to-end exactly-once semantics in simple streaming app

2019-04-08 Thread Fabian Hueske
Hi Patrick, In general, you could also implement the 2PC logic in a regular operator. It does not have to be a sink. You would need to add the logic of TwoPhaseCommitSinkFunction to your operator. However, the TwoPhaseCommitSinkFunction does not work well with JDBC. The problem is that you would

Re: [Discuss] Semantics of event time for state TTL

2019-04-08 Thread Kostas Kloudas
Hi all, For GDPR: I am not sure about the regulatory requirements of GDPR but I would assume that the time for deletion starts counting from the time an organisation received the data (i.e. the wall-clock ingestion time of the data), and not the "event time" of the data. In other case, an

Re: Schema Evolution on Dynamic Schema

2019-04-08 Thread Fabian Hueske
Hi Shahar, Sorry for the late response. The problem is not with the type of the retract stream, but with the GROUP BY aggregation operator. The optimizer is converting the plan into an aggregation operator that computes all aggregates followed by a projection that inserts the aggregation results

Re: Flink 1.7.2 extremely unstable and losing jobs in prod

2019-04-08 Thread Till Rohrmann
Hi Bruno, sorry for getting back to you so late. I just tried to access your logs to investigate the problem but transfer.sh tells me that they are no longer there. Could you maybe re-upload them or directly send them to my mail address. Sorry for not taking faster a look at your problem and the

Re: InvalidProgramException when trying to sort a group within a dataset

2019-04-08 Thread Fabian Hueske
Hi, If you have a closer look at the excecption, you'll see that the issue is cause by com.iri.aa.etl.rgm.service.ThresholdAcvCalcServiceImpl not being serializable. It seems that you have a reference to this class somewhere. Flink requires that all function classes (like KeySelector) are

Re: [Discuss] Semantics of event time for state TTL

2019-04-08 Thread Aljoscha Krettek
Oh boy, this is an interesting pickle. For *last-access-timestamp*, I think only *event-time-of-current-record* makes sense. I’m looking at this from a GDPR/regulatory compliance perspective. If you update a state, by say storing the event you just received in state, you want to use the exact

Re: Lantency caised Flink Checkpoint on EXACTLY_ONCE mode

2019-04-08 Thread Fabian Hueske
Hi Min, Guowei is right, the comment in the documentation about exactly-once in embarrassingly parallel data flows refers to exactly-once *state consistency*, not *end-to-end* exactly-once. However, in strictly forwarding pipelines, enabling exactly-once checkpoints should not have drawbacks

Re: Partitioning key range

2019-04-08 Thread Fabian Hueske
Hi Davood, Flink uses hash partitioning to assign keys to key groups. Each key group is then assigned to a task for processing (a task might process multiple key groups). There is no way to directly assign a key to a particular key group or task. All you can do is to experiment with different

Re: flink on yarn 启动任务失败

2019-04-08 Thread Biao Liu
Hi, “Queue's AM resource limit exceeded” -> 这个应该是 YARN 对 AM 的使用资源进行了限制吧,上限是 4096M 内存?你启动的应该是 job mode 吧,每个 job 都会启动单独的 AM,每个 AM 占用 2048M 内存?如果按这样算的话确实只够启动两个 1900 <575209...@qq.com> 于2019年4月4日周四 下午4:54写道: > 目前整体采用flink on yarn ha 部署,flink版本为社区版1.7.2,hadoop版本为社区版2.8.5 > > >

Re: 写HBase慢造成消息堆积,有没有异步IO可以用于sink或者outputFormat的方法?

2019-04-08 Thread Yang Peng
有没有参考flink官方源码示例中的这个例子: org.apache.flink.addons.hbase.example.HBaseWriteExample 这个类写的就是flink插入HBase 效率很高 我们实际生产也用到了插入HBase但是效率很高,你可以看一下这个源码; 张作峰 于2019年4月6日周六 下午4:38写道: > 业务场景中,需要将处理后的消息写入到HBase中,由于写入HBase慢,引起消息堆积。 > 通过Stream API 有没有方法可以异步批量发送? > 谢谢!

Re: flink on yarn 模式 日志问题

2019-04-08 Thread Yang Peng
flink的historyserver 貌似只能查看completed jobs 不能查看日志,这个跟spark的historyserver有差别吧 Biao Liu 于2019年4月8日周一 下午3:43写道: > 1. 这个日志确实会存在,如果你觉得5秒打印两行不能接受的话,我能想到的几种解决方法 > 1.1. 加大 checkpoint 间隔 > 1.2. 单独指定该 logger 的 level,修改 > >

Re: 求助,blink资源配置的问题,为什么资源还不足啊。。。

2019-04-08 Thread Biao Liu
Hi,可以提供更详细的信息吗?例如 1. 版本号 2. 完整的日志 3. 完整的集群配置文件 4. 集群是 on YARN 还是 standalone? 启动集群命令? 5. 完整的 job 信息?启动 job 的命令? 邓成刚【qq】 于2019年4月4日周四 下午6:13写道: > 求助,blink资源配置的问题,为什么资源还不足啊。。。 > 盼回复,谢谢! > > 为什么 > > 2019-04-04 17:49:32,495 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPool -

Re: 请教关于Keyed() 方法的问题。

2019-04-08 Thread Biao Liu
Hi, 尝试理解fli一下你的疑问 “其中,每个具体mapFunc处理的数据,应该是相同的key数据。不知理解是否正确” -> keyby 只会保证相同 key 的数据会被分在相同 mapFunc 中,每个 mapFunc 可能会处理不同 key 的数据,详见官网文档: https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/operators/ Yaoting Gong 于2019年4月4日周四 下午2:00写道: > 大家好, > >

Re: flink on yarn 模式 日志问题

2019-04-08 Thread Biao Liu
1. 这个日志确实会存在,如果你觉得5秒打印两行不能接受的话,我能想到的几种解决方法 1.1. 加大 checkpoint 间隔 1.2. 单独指定该 logger 的 level,修改 log4j.properties,增加一行:log4j.logger.org.apache.flink.runtime.checkpoint.CheckpointCoordinator=WARN 1.3. 修改源代码重新编译 2. 确实在 YARN 模式下,日志的位置不固定,和你的需求不匹配,standalone 模式可能更友好些。硬核一点的方法,可以扩展 log4j

RE: InvalidProgramException when trying to sort a group within a dataset

2019-04-08 Thread Papadopoulos, Konstantinos
Hi Fabian, Thanks for your support. I updated my POJO to implement the Serializable interface with no success. I got the same NotSerializableException. Best, Konstantinos From: Fabian Hueske Sent: Σάββατο, 6 Απριλίου 2019 2:26 πμ To: Papadopoulos, Konstantinos Cc: Chesnay Schepler ; user