Hi
We are trying to improve LLAP performance on our cluster, but we've
noticed that even though the LLAP daemon containers get the configured
memory, they get only 1 vcore per container.
We are running 10 LLAP daemons using Slider. No other containers run on
the nodes hosting the LLAP daemons, and on those nodes there is no memory
available, yet 43 vcores are sitting idle.
I see the following lines in the Slider logs, so I suspect SliderAppMaster
isn't requesting more than 1 vcore per container from YARN:
2018-10-16 18:38:42,503 [AmExecutor-006] INFO appmaster.SliderAppMaster - Registered service under /users/hive/services/org-apache-slider/llap0; absolute path /registry/users/hive/services/org-apache-slider/llap0
2018-10-16 18:38:42,510 [AmExecutor-006] INFO state.AppState - Reviewing RoleStatus{name='LLAP', group=LLAP, key=1, desired=10, actual=0, requested=0, releasing=0, failed=0, startFailed=0, started=0, completed=0, totalRequested=0, preempted=0, nodeFailed=0, failedRecently=0, limitsExceeded=0, resourceRequirements=<memory:445440, vCores:1>, isAntiAffinePlacement=false, failureMessage='', providerRole=ProviderRole{name='LLAP', group=LLAP, id=1, placementPolicy=0, nodeFailureThreshold=3, placementTimeoutSeconds=30, labelExpression='null'}, failedContainers=[], healthThresholdMonitorEnabled=true} :
2018-10-16 18:38:42,510 [AmExecutor-006] INFO state.AppState - LLAP: Asking for 10 more nodes(s) for a total of 10
2018-10-16 18:38:42,512 [AmExecutor-006] INFO state.AppState - Container ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label = null
2018-10-16 18:38:42,513 [AmExecutor-006] INFO state.AppState - Container ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label = null
2018-10-16 18:38:42,513 [AmExecutor-006] INFO state.AppState - Container ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label = null
2018-10-16 18:38:42,513 [AmExecutor-006] INFO state.AppState - Container ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label = null
2018-10-16 18:38:42,513 [AmExecutor-006] INFO state.AppState - Container ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label = null
2018-10-16 18:38:42,513 [AmExecutor-006] INFO state.AppState - Container ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label = null
2018-10-16 18:38:42,513 [AmExecutor-006] INFO state.AppState - Container ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label = null
2018-10-16 18:38:42,513 [AmExecutor-006] INFO state.AppState - Container ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label = null
2018-10-16 18:38:42,514 [AmExecutor-006] INFO state.AppState - Container ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label = null
2018-10-16 18:38:42,514 [AmExecutor-006] INFO state.AppState - Container ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label = null
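To double-check what the AM actually asks for across a longer log, the "Container ask" lines can be tallied with a small script. This is only a sketch: `summarize_container_asks` is a hypothetical helper, the regex assumes each `Capability[<memory:..., vCores:...>]` fragment sits on one line as in the excerpt above, and the idle-vcore figure assumes one LLAP container per node with 44 vcores per NodeManager.

```python
import re

# Matches the capability in Slider AppState "Container ask" lines, e.g.
# "Container ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label = null"
# The RoleStatus line ("resourceRequirements=<memory:...>") is deliberately not matched.
ASK_RE = re.compile(r"Capability\[<memory:(\d+), vCores:(\d+)>\]")


def summarize_container_asks(log_text, node_vcores):
    """Count container asks and list the distinct (memory MB, vcores) shapes requested.

    Also returns, per requested vcore count, how many vcores would stay idle on a node
    with `node_vcores` vcores, assuming one LLAP container per node (an assumption).
    """
    asks = [(int(mem), int(vc)) for mem, vc in ASK_RE.findall(log_text)]
    shapes = sorted(set(asks))
    idle = {vc: node_vcores - vc for _, vc in shapes}
    return len(asks), shapes, idle


if __name__ == "__main__":
    # Ten copies of the ask line from the log excerpt above.
    sample = "\n".join(
        "2018-10-16 18:38:42,513 [AmExecutor-006] INFO state.AppState - Container "
        "ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label = null"
        for _ in range(10)
    )
    count, shapes, idle = summarize_container_asks(sample, node_vcores=44)
    print(count, shapes, idle)  # 10 [(445440, 1)] {1: 43}
```

With the excerpt above this reports 10 asks of a single shape, 1 vcore each, which leaves 43 of 44 vcores idle per node, matching what we see on the cluster.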
And here is the configuration output from the same log file that might be
relevant:
"credentials" : { },
"components" : {
"LLAP" : {
"yarn.container.health.threshold.init.delay.secs" : "400",
"yarn.role.priority" : "1",
"yarn.component.instances" : "10",
"yarn.memory" : "445440",
"yarn.resource.normalization.enabled" : "false",
"yarn.container.health.threshold.window.secs" : "300",
"yarn.component.placement.policy" : "0",
"yarn.container.health.threshold.percent" : "80"
},
"slider-appmaster" : {
"yarn.vcores" : "1",
"yarn.component.instances" : "1",
"yarn.memory" : "1024"
}
}
},
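One thing that stands out in this output is that `slider-appmaster` sets `yarn.vcores` explicitly while the `LLAP` component does not. A quick sketch to list components without an explicit `yarn.vcores`, using the components map copied from the log above (the helper name `components_without_vcores` is hypothetical, and whether a missing key is what produces the `vCores:1` asks is an assumption, not something the log confirms):

```python
import json

# Components map copied from the Slider AM log above (Slider prints values as strings).
COMPONENTS_JSON = """
{
  "LLAP": {
    "yarn.container.health.threshold.init.delay.secs": "400",
    "yarn.role.priority": "1",
    "yarn.component.instances": "10",
    "yarn.memory": "445440",
    "yarn.resource.normalization.enabled": "false",
    "yarn.container.health.threshold.window.secs": "300",
    "yarn.component.placement.policy": "0",
    "yarn.container.health.threshold.percent": "80"
  },
  "slider-appmaster": {
    "yarn.vcores": "1",
    "yarn.component.instances": "1",
    "yarn.memory": "1024"
  }
}
"""


def components_without_vcores(components):
    """Return the names of components whose options do not set yarn.vcores explicitly."""
    return sorted(name for name, opts in components.items() if "yarn.vcores" not in opts)


components = json.loads(COMPONENTS_JSON)
print(components_without_vcores(components))  # ['LLAP']
```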
yarn.nodemanager.resource.cpu-vcores,
yarn.scheduler.maximum-allocation-vcores,
hive.llap.daemon.vcpus.per.instance, and
hive.llap.daemon.num.executors are all set to 44.
We can confirm on the LLAP daemon web UI that 44 executors are running per instance.
We are using HDP 2.7.3.2.6.4.0-91 with YARN 2.7.3, Hive 1.2.1000, Slider
0.92.0.
Any ideas on how to get the LLAP daemons to use more CPU?
Thanks.