How can I get this heap size?
------------------ Original Message ------------------
From: "Alberto Ramón" <a.ramonporto...@gmail.com>
Sent: February 14, 2017 (Tuesday) 0:17
To: "user" <user@kylin.apache.org>
Subject: Re: kylin job stop accidentally and can resume success!

Sounds like a problem with the YARN Resource Manager (RM). Check the heap size for the RM.
Kylin loses connectivity with the RM.

2017-02-13 17:00 GMT+01:00 <452652...@qq.com>:

Hello, Kylin community!

Sometimes my jobs stop accidentally. They can stop at any step.

The Kylin log looks like this:

2017-02-13 23:27:01,549 DEBUG [pool-8-thread-8] hbase.HBaseResourceStore:262 : Update row /execute_output/48dee96e-10fd-472b-b466-39505b6e57c0-02 from oldTs: 1486999611524, to newTs: 1486999621545, operation result: true
2017-02-13 23:27:13,384 INFO [pool-8-thread-8] ipc.Client:842 : Retrying connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-02-13 23:27:14,387 INFO [pool-8-thread-8] ipc.Client:842 : Retrying connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-02-13 23:27:15,388 INFO [pool-8-thread-8] ipc.Client:842 : Retrying connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2017-02-13 23:27:15,495 INFO [pool-8-thread-8] mapred.ClientServiceDelegate:273 : Application state is completed. FinalApplicationStatus=KILLED. Redirecting to job history server
2017-02-13 23:27:15,539 DEBUG [pool-8-thread-8] dao.ExecutableDao:210 : updating job output, id: 48dee96e-10fd-472b-b466-39505b6e57c0-02

The CM log looks like this:

Job Name: Kylin_Cube_Builder_user_all_cube_2_only_msisdn
User Name: tmn
Queue: root.tmn
State: KILLED
Uberized: false
Submitted: Sun Feb 12 19:19:24 CST 2017
Started: Sun Feb 12 19:19:38 CST 2017
Finished: Sun Feb 12 20:30:13 CST 2017
Elapsed: 1hrs, 10mins, 35sec
Diagnostics: Kill job job_1486825738076_4205 received from tmn (auth:SIMPLE) at 10.180.212.38 Job received Kill while in RUNNING state.
Average Map Time: 24mins, 48sec

The MapReduce job log shows:

Task KILL is received. Killing attempt!

When this happens, resuming the job succeeds, so the job is not being stopped by an error. What is the problem? My Hadoop cluster is very busy, and this situation happens very often. Can I set the number of retries and the retry interval?
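
For reference, the retry settings currently in effect can be read from the ipc.Client lines in the Kylin log above. Below is a minimal Python sketch that extracts them, assuming the log format is exactly as pasted in this message. The function name parse_retry_policy and the field names in the returned dictionary are made up for illustration and are not part of Kylin or Hadoop.

```python
import re

# Matches the retry-policy text as it appears in the ipc.Client log lines
# quoted in this message (the format is assumed from those lines only).
RETRY_RE = re.compile(
    r"Already tried (?P<tried>\d+) time\(s\); retry policy is "
    r"(?P<policy>\w+)\(maxRetries=(?P<max_retries>\d+), "
    r"sleepTime=(?P<sleep>\d+) (?P<unit>\w+)\)"
)


def parse_retry_policy(line):
    """Return the retry settings found in one log line, or None if absent."""
    match = RETRY_RE.search(line)
    if match is None:
        return None
    return {
        "tried": int(match.group("tried")),
        "policy": match.group("policy"),
        "max_retries": int(match.group("max_retries")),
        "sleep_time": int(match.group("sleep")),
        "unit": match.group("unit"),
    }


# One of the log lines from this message, copied verbatim.
sample = (
    "2017-02-13 23:27:15,388 INFO [pool-8-thread-8] ipc.Client:842 : "
    "Retrying connect to server: jxhdp1datanode29/10.180.212.61:50504. "
    "Already tried 2 time(s); retry policy is "
    "RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)"
)

print(parse_retry_policy(sample))
```

On the lines above this reports a maximum of 3 retries with a fixed sleep of 1000 milliseconds, which is the behaviour I would like to tune.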