On 1/27/15, 9:24 PM, [email protected] wrote:
I tested again and found that if I set mapreduce.map.cpu.vcores > 1, the job will hang.
This looks very similar
to https://issues.apache.org/jira/browse/TEZ-704

I suspect this might be a YARN scheduler bug.

Are you using the FairScheduler or the CapacityScheduler?

I cannot reproduce this issue on my YARN-CS cluster, but I suspect the CapacityScheduler there is automatically set up to do dominant resource scheduling.

Are you on FS + Fair instead of FS + DRF or CS?
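
For reference, switching to dominant-resource scheduling is a one-property change on either scheduler; a minimal sketch with stock Hadoop 2.x property names (values and file locations to be adapted to your cluster). For the CapacityScheduler, in capacity-scheduler.xml:

<property>
  <!-- Consider vcores as well as memory when matching container requests -->
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>

For the FairScheduler, in fair-scheduler.xml:

<allocations>
  <!-- Default all queues to DRF instead of memory-only fair sharing -->
  <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
</allocations>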

Cheers,
Gopal

From: [email protected]
Date: 2015-01-28 10:29
To: user
Subject: Re: Re: tez map task and reduce task stay pending forever
Oh yeah, I fixed the problem.
I added this config to my hive-site.xml:
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.cpu-vcores</name>
  <value>1</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>1</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.cpu.vcores</name>
  <value>1</value>
</property>
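(Note: the -Xmx value here is a plain byte count: 825955249 bytes is roughly 788 MiB, so the JVM heap fits inside the 1024 MB containers configured above.)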
And I configured my tez-site.xml with just:
<property>
  <name>tez.lib.uris</name>
  <value>${fs.defaultFS}/apps/tez-0.5.3/tez-0.5.3-minimal.tar.gz</value>
</property>
<property>
  <name>tez.use.cluster.hadoop-libs</name>
  <value>true</value>
</property>

Everything is OK now.
I think some of the config values in my cluster were too large.



[email protected]

From: [email protected]
Date: 2015-01-28 10:24
To: user
Subject: Re: Re: tez map task and reduce task stay pending forever
No. With set hive.execution.engine=mr, it still hangs...



[email protected]

From: Jianfeng (Jeff) Zhang
Date: 2015-01-28 10:11
To: user
Subject: Re: Re: tez map task and reduce task stay pending forever
Can you run this query successfully using Hive on MR?
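That is, something like the following (using the query from your earlier mail):

hive> set hive.execution.engine=mr;
hive> select * from p_city order by id;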



Best Regards,
Jeff Zhang


On Wed, Jan 28, 2015 at 10:01 AM, [email protected] <[email protected]> wrote:

I checked the Tez documentation on the HDP page:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.7/bk_installing_manually_book/content/rpm-chap-tez_configure_tez.html

The default value of tez.am.resource.memory.mb is 1536.
My Hadoop yarn.app.mapreduce.am.resource.mb value is 5734 MiB.

Could this configuration mismatch cause the problem?
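
If the mismatch is the cause, one quick experiment (a sketch; it assumes your queue can grant a container of this size) would be to pin the Tez AM container explicitly in tez-site.xml:

<property>
  <!-- Memory, in MB, requested for the Tez ApplicationMaster container -->
  <name>tez.am.resource.memory.mb</name>
  <value>1024</value>
</property>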


[email protected]

From: [email protected]
Date: 2015-01-27 17:59
To: user
Subject: Re: Re: tez map task and reduce task stay pending forever
Sorry Gopal V, I made a mistake: my mapreduce.map.memory.mb config is 2867.



[email protected]

From: [email protected]
Date: 2015-01-27 17:58
To: user
Subject: Re: Re: tez map task and reduce task stay pending forever
Hello Gopal V,
I checked my CDH config and found mapreduce.map.memory.mb is 2876.
[email protected]

From: [email protected]
Date: 2015-01-27 17:31
To: user
Subject: Re: Re: tez map task and reduce task stay pending forever

I checked hivetez.log. There is no kill request triggered by Hive.


[email protected]

From: Gopal V
Date: 2015-01-27 17:17
To: user
Cc: [email protected]
Subject: Re: tez map task and reduce task stay pending forever
On 1/27/15, 12:50 AM, [email protected] wrote:
hive 0.14.0, tez 0.5.3, hadoop 2.3.0-cdh5.0.2
hive> select * from p_city order by id;
Query ID = zhoushugang_20150127163434_da70d957-6ac4-4b8b-a484-42b593838076
...
--------------------------------------------------------------------------------
VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1         INITED      1          0        0        1       0       0
Reducer 2     INITED      1          0        0        1       0       0

Looks like all container requests are pending/unresponsive.

I see a container request in the log with

2015-01-27 15:43:15,434 INFO [TaskSchedulerEventHandlerThread]
rm.YarnTaskSchedulerService: Allocation request for task:
attempt_1419300485749_371785_1_00_000000_0 with request:
Capability[<memory:2867, vCores:3>]Priority[2] host:
yhd-jqhadoop11.int.yihaodian.com rack: null
...
2015-01-27 15:43:17,635 INFO [DelayedContainerManager]
rm.YarnTaskSchedulerService: Releasing held container as either there
are pending but unmatched requests or this is not a session,
containerId=container_1419300485749_371785_01_000002, pendingTasks=1,
isSession=true. isNew=true

That seems to indicate that a container allocation request was made, but
the YARN ResourceManager never responded with a container (or gave the
wrong container?).

Does the container size of 2867 suggest any idea of what that might be?
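
One thing worth checking against that Capability[<memory:2867, vCores:3>] request is the scheduler's maximum allocation; if either limit sits below the request, the scheduler can refuse to ever grant it. A sketch of the yarn-site.xml properties to inspect (values here are illustrative defaults, not your cluster's):

<property>
  <!-- Largest memory allocation, in MB, the scheduler will grant per container -->
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <!-- Largest number of vcores the scheduler will grant per container -->
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>4</value>
</property>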

Cheers,
Gopal


