Hi Jason
The default value of
yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent is 0.1,
i.e. 10% of the queue's resources may be used by Application Masters.
If you want more apps to run concurrently in that queue, you need to raise
this limit so that more Application Masters can be activated.
In your case that 10% works out to the <memory:3072, vCores:1> shown as
"Max Application Master Resources" (roughly 10% of the queue's ~24 GB
capacity, normalized up to the scheduler's allocation granularity), and
with the running AMs already at that limit there is no headroom left to
activate another <memory:2048> AM.
Be careful not to set the value too high, though, or you may end up with
too many Application Masters and too few worker containers.
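
For example, to let the queue use up to 50% of its resources for AMs, you
could add something like the following to capacity-scheduler.xml (the
root.default queue path and the 0.5 value are only illustrations; use your
own queue path and a value that fits your workload) and then refresh the
queues:

  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-am-resource-percent</name>
    <value>0.5</value>
  </property>

  $ yarn rmadmin -refreshQueues

There is also a cluster-wide default,
yarn.scheduler.capacity.maximum-am-resource-percent, if you want to change
the limit for all queues at once.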

Thanks
Sunil





On Wed, Jul 10, 2019 at 11:11 PM Jason Laughman <
ja...@bernetechconsulting.com> wrote:

> I added a couple of bigger servers and now I see multiple containers
> running, but I still can’t get a job to run.  The job details now say:
>
> Diagnostics:    [Wed Jul 10 17:27:59 +0000 2019] Application is added to
> the scheduler and is not yet activated. Queue's AM resource limit exceeded.
> Details : AM Partition = <DEFAULT_PARTITION>; AM Resource Request =
> <memory:2048, vCores:1>; Queue Resource Limit for AM = <memory:3072,
> vCores:1>; User AM Resource Limit of the queue = <memory:3072, vCores:1>;
> Queue AM Resource Usage = <memory:2048, vCores:2>;
>
> I understand WHAT that’s saying, but I don’t understand WHY.  Here’s what
> my scheduler details look like; I don’t see why it’s complaining about the
> AM, unless something isn’t talking to something else correctly:
>
> Queue State:    RUNNING
> Used Capacity:  12.5%
> Configured Capacity:    100.0%
> Configured Max Capacity:        100.0%
> Absolute Used Capacity: 12.5%
> Absolute Configured Capacity:   100.0%
> Absolute Configured Max Capacity:       100.0%
> Used Resources: <memory:3072, vCores:3>
> Configured Max Application Master Limit:        10.0
> Max Application Master Resources:       <memory:3072, vCores:1>
> Used Application Master Resources:      <memory:3072, vCores:3>
> Max Application Master Resources Per User:      <memory:3072, vCores:1>
> Num Schedulable Applications:   3
> Num Non-Schedulable Applications:       38
> Num Containers: 3
> Max Applications:       10000
> Max Applications Per User:      10000
> Configured Minimum User Limit Percent:  100%
> Configured User Limit Factor:   1.0
> Accessible Node Labels: *
> Ordering Policy:        FifoOrderingPolicy
> Preemption:     disabled
> Intra-queue Preemption: disabled
> Default Node Label Expression:  <DEFAULT_PARTITION>
> Default Application Priority:   0
>
> User Name: hdfs
>   Max Resource: <memory:0, vCores:0>    Weight: 1.0
>   Used Resource: <memory:0, vCores:0>
>   Max AM Resource: <memory:3072, vCores:1>    Used AM Resource: <memory:0, vCores:0>
>   Schedulable Apps: 0    Non-Schedulable Apps: 1
>
> User Name: dr.who
>   Max Resource: <memory:24576, vCores:1>    Weight: 1.0
>   Used Resource: <memory:3072, vCores:3>
>   Max AM Resource: <memory:3072, vCores:1>    Used AM Resource: <memory:3072, vCores:3>
>   Schedulable Apps: 3    Non-Schedulable Apps: 37
>
> > On Jul 10, 2019, at 3:37 AM, yangtao.yt <yangtao...@alibaba-inc.com>
> wrote:
> >
> > Hi, Jason.
> >
> > According to the information you provided, your cluster has two nodes,
> > each with <memory:1732, vCores:2>, and the single running container is
> > the AM container, which already occupies <memory:1024, vCores:1>.
> > One possible cause is that the available resources of your cluster are
> > not sufficient to satisfy new container requests; please refer to the
> > application attempt UI
> > (http://<RM-HOST>:<RM-HTTP-PORT>/cluster/appattempt/<APP-ATTEMPT-ID>),
> > where you can find the outstanding requests and their required resources.
> > Another possible cause is a queue/user limit; you can check the scheduler
> > UI (http://<RM-HOST>:<RM-HTTP-PORT>/cluster/scheduler) for the queue's
> > resource quotas and current usage.
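> >
> > If you prefer the command line, something like the following (the queue
> > name "default" below is just an example; use your own queue name) will
> > also print the queue's state, configured capacity and current capacity,
> > which is a quick way to spot a queue-level limit:
> >
> >   $ yarn queue -status default
> >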
> > Hope it helps.
> >
> > Best,
> > Tao Yang
> >
> >> On Jul 10, 2019, at 8:23 AM, Jason Laughman <ja...@bernetechconsulting.com> wrote:
> >>
> >> I’ve been setting up a Hadoop 2.9.1 cluster and have data replicating
> through HDFS, but when I try to run a job via Hive (I see that it’s
> deprecated, but it’s what I’m working with for now) it never gets out of
> accepted state in the web tool.  I’ve done some Googling and the general
> consensus is that it’s resource constraints, so can someone tell me if I’ve
> got enough horsepower here?
> >>
> >> I’ve got one small name server, three small data servers, and two
> >> larger data servers.  I figured out that the small data servers were too
> >> small, because even if I tried to tweak YARN parameters for RAM and CPU
> >> the resource managers would immediately shut down.  I added the two
> >> larger data servers, and now I see two active nodes but only with a
> >> total of one container:
> >>
> >> $ yarn node -list
> >> 19/07/09 23:54:11 INFO client.RMProxy: Connecting to ResourceManager at <resource_manager>:8032
> >> Total Nodes:2
> >>         Node-Id    Node-State    Node-Http-Address    Number-of-Running-Containers
> >>     node1:40079       RUNNING           node1:8042                               1
> >>     node2:36311       RUNNING           node2:8042                               0
> >>
> >> There are a ton of what look like automated jobs backed up on there, and
> >> when I try to run anything through Hive it just sits there and eventually
> >> times out (I do see it get accepted).  My larger nodes have 4 GB RAM and
> >> 2 vcores, and I set YARN to do automatic resource allocation with
> >> yarn.nodemanager.resource.detect-hardware-capabilities.  Is that enough
> >> to even get a POC lab working?  I don’t care about having the three
> >> smaller servers running as resource nodes, but I’d like a better
> >> understanding of what’s going on with the larger servers, because it
> >> seems like they’re close to working.
> >>
> >> Here’s the metrics data from the web UI; hopefully somebody can parse it.
> >> Cluster Metrics
> >>   Apps Submitted: 292      Apps Pending: 284    Apps Running: 1    Apps Completed: 7
> >>   Containers Running: 1    Memory Used: 1 GB    Memory Total: 3.38 GB    Memory Reserved: 0 B
> >>   VCores Used: 1           VCores Total: 4      VCores Reserved: 0
> >> Cluster Nodes Metrics
> >>   Active Nodes: 2       Decommissioning Nodes: 0    Decommissioned Nodes: 0    Lost Nodes: 0
> >>   Unhealthy Nodes: 0    Rebooted Nodes: 0           Shutdown Nodes: 4
> >> Scheduler Metrics
> >>   Scheduler Type: Capacity Scheduler    Scheduling Resource Type: [MEMORY]
> >>   Minimum Allocation: <memory:1024, vCores:1>    Maximum Allocation: <memory:1732, vCores:2>
> >>   Maximum Cluster Application Priority: 0
> >
>
>
>
