Re: Flink 1.6 Job fails with IllegalStateException: Buffer pool is destroyed.

2018-09-07 Thread Zhijiang(wangzhijiang999)
, Zhijiang -- From: 杨力 Sent: 2018-09-07 (Fri) 13:09 To: user Subject: Flink 1.6 Job fails with IllegalStateException: Buffer pool is destroyed. Hi all, I am encountering a weird problem when running Flink 1.6 in YARN per-job clusters. The job

Re: Backpressure? for Batches

2018-08-29 Thread Zhijiang(wangzhijiang999)
You can check the log for the stack trace of the OOM; that may help confirm the cause. Alternatively, you can dump the heap to analyze the memory usage after the OOM. Best, Zhijiang -- From: Darshan Singh Sent: 2018-08-29 (Wed) 19:22 To

Re: Backpressure? for Batches

2018-08-29 Thread Zhijiang(wangzhijiang999)
I remember, that means the downstream is scheduled after the upstream finishes, so a slower downstream will not block the upstream, and backpressure may not exist in this case. Best, Zhijiang -- From: Darshan Singh Sent

Re: Kryo Serialization Issue

2018-08-28 Thread Zhijiang(wangzhijiang999)
buffers in record serializers. If the record size is large and the downstream parallelism is high, serialization may cause an OOM. Could you show the stack trace of the OOM? If this is the case, the following [1] can solve it; it is a work in progress. Zhijiang [1] https

Re: Network PartitionNotFoundException when run on multi nodes

2018-07-23 Thread Zhijiang(wangzhijiang999)
askManager received the task deployment from the JobManager late, or some operations in upstream task initialization unexpectedly took more time before registering the result partition. Best, Zhijiang -- From: Steffen Wohlers Sent: 2018-07-22 (Sun)

Re: Flink job hangs/deadlocks (possibly related to out of memory)

2018-07-17 Thread Zhijiang(wangzhijiang999)
for a lock that is also held by the task output process. As you mentioned, it makes sense to check the data structure of the output record and reduce its size or make it more lightweight to handle. Best, Zhijiang -- From: Gerard Garcia

Re: Flink job hangs/deadlocks (possibly related to out of memory)

2018-07-13 Thread Zhijiang(wangzhijiang999)
framework. You can also monitor the GC status to check the full GC delay. Best, Zhijiang -- From: Gerard Garcia Sent: 2018-07-13 (Fri) 16:22 To: wangzhijiang999 Cc: user Subject: Re: Flink job hangs/deadlocks (possibly related to out of m

Re: Limiting in flight data

2018-07-08 Thread Zhijiang(wangzhijiang999)
trics for some help. -- From: Vishal Santoshi Sent: 2018-07-06 (Fri) 22:05 To: Zhijiang(wangzhijiang999) Cc: user Subject: Re: Limiting in flight data Further, if there are metrics that allow us to chart delays per pipe on n/w buffers, that would be immensely help

Re: Handling back pressure in Flink.

2018-07-05 Thread Zhijiang(wangzhijiang999)
(will not cause OOM). I think you should not worry about that. Normally it is better to consider the TPS of both sides and set a proper parallelism to avoid back pressure to some extent. Zhijiang -- From: Mich Talebzadeh Sent: 2018-07-04 (Wed

Re: Limiting in flight data

2018-07-05 Thread Zhijiang(wangzhijiang999)
and taskmanager.network.memory.floating-buffers-per-gate. If you have other questions about them, let me know and I can explain. Zhijiang -- From: Vishal Santoshi Sent: 2018-07-05 (Thu) 22:28 To: user Subject: Limiting in flight data "Yes,

Re: Flink job hangs/deadlocks (possibly related to out of memory)

2018-07-02 Thread Zhijiang(wangzhijiang999)
to trigger restarting the job. Zhijiang -- From: Gerard Garcia Sent: 2018-07-02 (Mon) 18:29 To: wangzhijiang999 Cc: user Subject: Re: Flink job hangs/deadlocks (possibly related to out of memory) Thanks Zhijiang, We haven't found any other

Re: Flink job hangs/deadlocks (possibly related to out of memory)

2018-07-02 Thread Zhijiang(wangzhijiang999)
whether and where the OOM occurred. Maybe check the task failure logs. Zhijiang -- From: gerardg Sent: 2018-06-30 (Sat) 00:12 To: user Subject: Re: Flink job hangs/deadlocks (possibly related to out of memory) (fixed formatting) Hello

Re: DataSet with Multiple reduce Actions

2018-06-27 Thread Zhijiang(wangzhijiang999)
Hi Osh, As far as I know, one dataset source currently cannot be consumed by several different vertices, and the API does not let you construct the topology you are asking for. I think your approach of merging the different reduce functions into one UDF is feasible. Maybe someone has a better solution. :) zhijiang

Re: Checkpoints very slow with high backpressure

2018-04-07 Thread Zhijiang(wangzhijiang999)
to improve barrier alignment, which has already been verified to greatly decrease the alignment time in backpressure scenarios. zhijiang -- From: Piotr Nowojski <pi...@data-artisans.com> Sent: 2018-04-06 (Fri) 00:06 To: Edward

Re: An addition to Netty's memory footprint

2017-06-30 Thread Zhijiang(wangzhijiang999)
memory usage by netty PooledByteBuffer can be largely reduced and easily kept under control. cheers, zhijiang -- From: Kurt Young <k...@apache.org> Sent: 2017-06-30 (Fri) 15:51 To: dev <d...@flink.apache.org>; user <user@flink

Re: Question regarding configuring number of network buffers

2017-06-07 Thread Zhijiang(wangzhijiang999)
it can help you. Cheers, Zhijiang -- From: Ray Ruvinskiy <ray.ruvins...@arcticwolf.com> Sent: 2017-06-07 (Wed) 23:59 To: user@flink.apache.org <user@flink.apache.org> Subject: Question regarding configuring number of net

Re: Multiple consumers on a subpartition

2017-04-25 Thread Zhijiang(wangzhijiang999)
cheers, zhijiang -- From: albertjonathan <alb...@cs.umn.edu> Sent: 2017-04-26 (Wed) 02:37 To: user <user@flink.apache.org> Subject: Multiple consumers on a subpartition Hello, Is there a way Flink allows a (pipelined) subpartit

Re: Yarn terminating TM for pmem limit cascades causing all jobs to fail

2017-04-19 Thread Zhijiang(wangzhijiang999)
native memory, so you can try upgrading the version as Stephan suggested. Good luck! Cheers, zhijiang -- From: Stephan Ewen <se...@apache.org> Sent: 2017-04-19 (Wed) 21:25 To: Shannon Carey <sca...@expedia.com> Cc: user@flin

Re: Re: Changing timeout for cancel command

2017-04-14 Thread Zhijiang(wangzhijiang999)
for ack in HDFS. cheers, zhijiang -- From: Jürgen Thomann <juergen.thom...@innogames.com> Sent: 2017-04-13 (Thu) 15:32 To: user <user@flink.apache.org> Subject: Re: Re: Changing timeout for cancel command Hi zhijiang,

Re: Changing timeout for cancel command

2017-04-12 Thread Zhijiang(wangzhijiang999)
Hi Jürgen, You can set the timeout in the configuration with the key "akka.ask.timeout"; the current default value is 10 s. Hope it helps. cheers, zhijiang -- From: Jürgen Thomann <juergen.thom...@innog

Re: PartitionNotFoundException on deploying streaming job

2017-04-04 Thread Zhijiang(wangzhijiang999)
when responding with PartitionNotFound, to track down the reason. Looking forward to your further findings! Cheers, Zhijiang -- From: Kamil Dziublinski <kamil.dziublin...@gmail.com> Sent: 2017-04-04 (Tue) 17:20 To: user <user@flink.apach

Re: question about record

2017-03-27 Thread Zhijiang(wangzhijiang999)
buffers. Cheers, Zhijiang -- From: lining jing <jinglini...@gmail.com> Sent: 2017-03-27 (Mon) 15:46 To: user <user@flink.apache.org> Subject: question about record Hi All, data transmission is achieved through the buffer. If recor

Re: multiple consumer of intermediate data set

2017-03-14 Thread Zhijiang(wangzhijiang999)
-B1, A1-IntermediateResultPartition-B2, A2-IntermediateResultPartition-B1, A2-IntermediateResultPartition-B2 in the right graph. Cheers, Zhijiang -- From: lining jing <jinglini...@gmail.com> Sent: 2017-03-15 (Wed) 10:54 To: user

Re: multiple consumer of intermediate data set

2017-03-14 Thread Zhijiang(wangzhijiang999)
case with JobVertex(A). Cheers, Zhijiang -- From: 윤형덕 <ynoo...@naver.com> Sent: 2017-03-13 (Mon) 12:43 To: user <user@flink.apache.org> Subject: multiple consumer of intermediate data set Hi All, figure1 https://ci.apache.org/projects

Re: TaskManager failure detection

2017-02-22 Thread Zhijiang(wangzhijiang999)
onent in the JobManager is in charge of recovering state from a completed checkpoint, and the state is then set on the Execution in the ExecutionGraph. Best, Zhijiang For yarn cluster mode, -- From: Dominik Safaric <dominiksafa...@gmail.com> Sent: 201
