Hello
My cluster's mapreduce.map.cpu.vcores setting is 3.
[email protected]
From: Hitesh Shah
Date: 2015-01-29 05:36
To: user
Subject: Re: tez map task and reduce task stay pending forever
Hello
Thanks for tracking down the issue to the vcores setting. Let me dig into that.
Some initial questions:
- Do you know if YARN has been configured to schedule on both memory and
vcores, i.e. using the DominantResourceCalculator?
- I am assuming that the max vcores per container is 1 but the job is
configured to request more, hence the hang. The most likely fix for this is
probably to check the max resource settings allowed by YARN before allowing a
Job/DAG to be submitted. Does the problem still show up if YARN is configured
to allow containers with 2 vcores, i.e. after raising the max allocation
setting for vcores in the RM? (A sketch of the relevant settings follows.)
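
For reference, a minimal sketch of scheduling on both resources with the
Capacity Scheduler; the property names are stock YARN, but the values are
illustrative rather than taken from this thread:

<!-- capacity-scheduler.xml: account for vcores as well as memory -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>

<!-- yarn-site.xml: per-container vcore cap; raise it to allow 2-vcore containers -->
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>2</value>
</property>
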
thanks
— Hitesh
On Jan 27, 2015, at 9:24 PM, [email protected] wrote:
> I tested again and found that if I set mapreduce.map.cpu.vcores > 1, the job
> will hang. Very similar to https://issues.apache.org/jira/browse/TEZ-704
>
> [email protected]
>
> From: [email protected]
> Date: 2015-01-28 10:29
> To: user
> Subject: Re: Re: tez map task and reduce task stay pending forever
> Oh yeah, I fixed the problem.
> I added this config to my hive-site.xml:
> <property>
>   <name>yarn.app.mapreduce.am.resource.mb</name>
>   <value>1024</value>
> </property>
> <property>
>   <name>yarn.app.mapreduce.am.resource.cpu-vcores</name>
>   <value>1</value>
> </property>
> <property>
>   <name>yarn.app.mapreduce.am.command-opts</name>
>   <value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value>
> </property>
> <property>
>   <name>mapreduce.map.java.opts</name>
>   <value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value>
> </property>
> <property>
>   <name>mapreduce.reduce.java.opts</name>
>   <value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value>
> </property>
> <property>
>   <name>mapreduce.map.memory.mb</name>
>   <value>1024</value>
> </property>
> <property>
>   <name>mapreduce.map.cpu.vcores</name>
>   <value>1</value>
> </property>
> <property>
>   <name>mapreduce.reduce.memory.mb</name>
>   <value>1024</value>
> </property>
> <property>
>   <name>mapreduce.reduce.cpu.vcores</name>
>   <value>1</value>
> </property>
> And configured my tez-site.xml with just:
> <property>
>   <name>tez.lib.uris</name>
>   <value>${fs.defaultFS}/apps/tez-0.5.3/tez-0.5.3-minimal.tar.gz</value>
> </property>
> <property>
>   <name>tez.use.cluster.hadoop-libs</name>
>   <value>true</value>
> </property>
>
> Everything is OK now.
> I think some of the configs in my cluster were too large.
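>
> As an aside (a common rule of thumb, not something stated in this thread):
> the JVM heap in the *.java.opts settings is usually kept at roughly 80% of
> the matching *.memory.mb container size, leaving headroom for off-heap
> usage so YARN does not kill the container for exceeding its limit. For the
> 1024 MiB containers above, that pairing would look like:
>
> <property>
>   <name>mapreduce.map.java.opts</name>
>   <!-- ~80% of mapreduce.map.memory.mb (1024 MiB); illustrative value -->
>   <value>-Djava.net.preferIPv4Stack=true -Xmx819m</value>
> </property>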
>
> [email protected]
>
> From: [email protected]
> Date: 2015-01-28 10:24
> To: user
> Subject: Re: Re: tez map task and reduce task stay pending forever
> No. With set hive.execution.engine=mr, it still hangs...
>
> [email protected]
>
> From: Jianfeng (Jeff) Zhang
> Date: 2015-01-28 10:11
> To: user
> Subject: Re: Re: tez map task and reduce task stay pending forever
> Can you run this query successfully using Hive on MR?
>
>
>
> Best Regards,
> Jeff Zhang
>
>
> On Wed, Jan 28, 2015 at 10:01 AM, [email protected] <[email protected]>
> wrote:
>
> I checked the Tez documentation on the HDP page
> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.7/bk_installing_manually_book/content/rpm-chap-tez_configure_tez.html.
>
> The tez.am.resource.memory.mb default value is 1536.
> My Hadoop yarn.app.mapreduce.am.resource.mb value is 5734 MiB.
>
> Could this configuration mismatch be causing the problem?
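>
> If the AM size is indeed the issue, a minimal sketch of pinning it down
> explicitly in tez-site.xml, rather than inheriting a large MR-derived value
> (1536 matches the documented default; pick a value your queues can grant):
>
> <property>
>   <name>tez.am.resource.memory.mb</name>
>   <value>1536</value>
> </property>
>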
> [email protected]
>
> From: [email protected]
> Date: 2015-01-27 17:59
> To: user
> Subject: Re: Re: tez map task and reduce task stay pending forever
> Sorry Gopal V, I made a mistake: my mapreduce.map.memory.mb config is 2867.
>
> [email protected]
>
> From: [email protected]
> Date: 2015-01-27 17:58
> To: user
> Subject: Re: Re: tez map task and reduce task stay pending forever
> Hello Gopal V,
> I checked my CDH config and found mapreduce.map.memory.mb is 2876.
> [email protected]
>
> From: [email protected]
> Date: 2015-01-27 17:31
> To: user
> Subject: Re: Re: tez map task and reduce task stay pending forever
>
> I checked the hivetez.log. No kill request was triggered by Hive.
> [email protected]
>
> From: Gopal V
> Date: 2015-01-27 17:17
> To: user
> Cc: [email protected]
> Subject: Re: tez map task and reduce task stay pending forever
> On 1/27/15, 12:50 AM, [email protected] wrote:
> > hive 0.14.0, tez 0.5.3, hadoop 2.3.0-cdh5.0.2
> > hive> select * from p_city order by id;
> > Query ID = zhoushugang_20150127163434_da70d957-6ac4-4b8b-a484-42b593838076
> ...
> > --------------------------------------------------------------------------
> >   VERTICES     STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> > --------------------------------------------------------------------------
> >   Map 1        INITED      1          0        0        1       0       0
> >   Reducer 2    INITED      1          0        0        1       0       0
>
> It looks like all container requests are pending/unresponsive.
>
> I see a container request in the log with
>
> 2015-01-27 15:43:15,434 INFO [TaskSchedulerEventHandlerThread]
> rm.YarnTaskSchedulerService: Allocation request for task:
> attempt_1419300485749_371785_1_00_000000_0 with request:
> Capability[<memory:2867, vCores:3>]Priority[2] host:
> yhd-jqhadoop11.int.yihaodian.com rack: null
> ...
> 2015-01-27 15:43:17,635 INFO [DelayedContainerManager]
> rm.YarnTaskSchedulerService: Releasing held container as either there
> are pending but unmatched requests or this is not a session,
> containerId=container_1419300485749_371785_01_000002, pendingTasks=1,
> isSession=true. isNew=true
>
> That seems to indicate that a container allocation request was made, but the
> YARN ResourceManager never responded with a container (or gave back the
> wrong container?).
>
> Does the container size of 2867 suggest where that value might be coming from?
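>
> For context: if either number in Capability[<memory:2867, vCores:3>] exceeds
> the RM's per-container maximum allocation, the request can sit pending and
> never be granted. Those caps live in yarn-site.xml; the values below are
> illustrative, not taken from this cluster:
>
> <property>
>   <name>yarn.scheduler.maximum-allocation-mb</name>
>   <value>8192</value>
> </property>
> <property>
>   <name>yarn.scheduler.maximum-allocation-vcores</name>
>   <value>4</value>
> </property>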
>
> Cheers,
> Gopal
>