We have seen the two cases below, where it would have been nice if
Apex allowed us to exclude a node from the cluster for an application.
1. A node in the cluster had gone bad (it was randomly rebooting), so an
Apex app should not use it. Other apps could use it, as they were batch jobs.
2. A n
We do have this feature in Yarn, but that applies to all applications. I am
not sure if Yarn has anti-affinity. This feature may be used, but in
general there is danger in an application taking over resource allocation.
Another quirk is that big data apps should ideally be node-neutral. This is
a g
I think “exclude nodes” and such is really the job of the resource manager i.e.
Yarn. So I am not sure taking over some of these tasks in Apex would be very
useful.
I agree with Amol that apps should be node neutral. Resource management in Yarn
together with fault tolerance in Apex should minim
But then, what's the solution to the 2 problem scenarios that Milind
describes?
Ram
On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare
wrote:
> I think “exclude nodes” and such is really the job of the resource manager
> i.e. Yarn. So I am not sure taking over some of these tasks in Apex would
>
To me both use cases appear to be generic resource management use cases. For
example, a randomly rebooting node is not good for any purpose esp. long
running apps so it is a bit of a stretch to imagine that these nodes will be
acceptable for some batch jobs in Yarn. So such a node should be mark
I agree, a randomly rebooting node is a Yarn issue. Even anti-affinity between
apps should be in Yarn in the long run. We could contribute the above jira.
Thks
Amol
On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare
wrote:
> To me both use cases appear to be generic resource management use cases.
> For exam
Not sure if this is what Milind had in mind but we often run into
situations where the dev group working with Apex has no control over
cluster configuration -- to make any changes to the cluster they need to
go through an elaborate process that can take many days.
Meanwhile, if they notice that a
Apex has automatic blacklisting of troublesome nodes; please take a
look at the following attributes:
MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
https://www.datatorrent.com/docs/apidocs/com/datatorrent/api/Context.DAGContext.html#MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
BLACKLIS
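The idea behind an attribute such as MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST can be sketched roughly as follows. This is only an illustrative model and not Apex's actual code: the class ConsecutiveFailureBlacklist and its method names are hypothetical, and it assumes that a successful container run resets a host's failure count.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of failure-driven blacklisting; not Apex's implementation. */
class ConsecutiveFailureBlacklist {
    private final int maxConsecutiveFailures;
    private final Map<String, Integer> consecutiveFailures = new HashMap<>();
    private final Set<String> blacklisted = new HashSet<>();

    ConsecutiveFailureBlacklist(int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /** Records a container failure; blacklists the host once the threshold is reached. */
    void containerFailed(String host) {
        int count = consecutiveFailures.merge(host, 1, Integer::sum);
        if (count >= maxConsecutiveFailures) {
            blacklisted.add(host);
        }
    }

    /** Assumption: a successful run breaks the streak of consecutive failures. */
    void containerSucceeded(String host) {
        consecutiveFailures.remove(host);
    }

    boolean isBlacklisted(String host) {
        return blacklisted.contains(host);
    }

    public static void main(String[] args) {
        ConsecutiveFailureBlacklist tracker = new ConsecutiveFailureBlacklist(3);
        for (int i = 0; i < 3; i++) {
            tracker.containerFailed("node20");
        }
        System.out.println("node20 blacklisted: " + tracker.isBlacklisted("node20"));
    }
}
```

In a model like this a host enters the blacklist only after it has already failed the configured number of times in a row, so it cannot express an up-front exclusion.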
This is a practical scenario where developers would be required to exclude
certain nodes because those nodes might be needed for some mission-critical
applications. It would be good to have this feature.
I understand that Stram should not get into resourcing and still rely on
Yarn, however, as the App Maste
Yes, Ram explained to me that in practice this would be a useful feature for
Apex devops who typically have no control over Hadoop/Yarn cluster.
On 11/30/16, 9:22 PM, "Mohit Jotwani" wrote:
This is a practical scenario where developers would be required to exclude
certain nodes as they
I have created a jira for adding the list of blacklisted nodes:
https://issues.apache.org/jira/browse/APEXCORE-584
On Wed, Nov 30, 2016 at 11:06 PM Sanjay Pujare
wrote:
> Yes, Ram explained to me that in practice this would be a useful feature
> for Apex devops who typically have no control ove
Shouldn't this already be covered by anti-affinity? Today users can specify
multiple affinity rules; for each rule they can specify positive or
negative affinity, locality and operator selection. If an affinity rule
specifying negative affinity, node locality and all operators does not
work then l
Pramod,
How do you specify "don't deploy any operators on Node20" using anti-affinity?
I don't see any examples here:
http://apex.apache.org/docs/apex/application_development/#affinity-rules
On Thu, Dec 1, 2016 at 11:31 AM Pramod Immaneni
wrote:
> Shouldn't this be already covered by anti-affin
I see a host locality available as an attribute in DAG for individual
operators. If affinity doesn't support this today, we could probably add
it. You could also make setting a blacklist directly a convenience function
on top of affinity.
On Thu, Dec 1, 2016 at 11:58 AM, Sandesh Hegde
wrote:
> P
I agree, this should be built on top of the affinity work.
Thks
Amol
On Thu, Dec 1, 2016 at 1:01 PM, Pramod Immaneni
wrote:
> I see a host locality available as an attribute in DAG for individual
> operators. If affinity doesn't support this today, we could probably add
> it. You could also make setting a
Hi,
Can't we make use of the existing Node Label + queue feature in Yarn to achieve
this? Though we will have to redeploy the cluster, it's still possible to
exclude nodes.
https://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/NodeLabel.html
Thanks,
Ajay
On Fri, Dec 2, 2016 at 5:57 AM, Amo
As suggested by Sandesh, the parameter
MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do exactly what
is needed.
Why would this not work?
~ Bhupesh
It only takes effect after failures -- no way to exclude from the get-go.
Ram
On Dec 1, 2016 7:15 PM, "Bhupesh Chawda" wrote:
> As suggested by Sandesh, the parameter
> MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do exactly what
> is needed.
> Why would this not work?
>
> ~ Bhupes
Okay, I think that serves an alternate purpose of detecting a node that has
newly gone bad and excluding it.
+1 for covering the original scenario under anti-affinity.
~ Bhupesh
On Fri, Dec 2, 2016 at 9:14 AM, Munagala Ramanath
wrote:
> It only takes effect after failures -- no way to exclude from th
While it is possible to extend anti-affinity to take care of this, I feel
it will cause confusion from a user perspective. As a user, when I think
about anti-affinity, what comes to mind right away is a relative relation
between operators.
On the other hand, the current ask is not that, but a rela
My previous mail explains it, but I forgot to add: -1 to covering this
under anti-affinity.
On Fri, Dec 2, 2016 at 12:46 PM, Milind Barve wrote:
> While it is possible to extend anti-affinity to take care of this, I feel
> it will cause confusion from a user perspective. As a user, when I think
Additionally, this would apply to Stram as well, i.e. the master should also
not be deployed on these nodes. Not sure if anti-affinity goes beyond
operators.
On Fri, Dec 2, 2016 at 12:47 PM, Milind Barve wrote:
> My previous mail explains it, but just forgot to add : -1 to cover this
> under anti
I would agree with Milind.
Regards,
Mohit
On Fri, Dec 2, 2016 at 12:49 PM, Milind Barve wrote:
> Additionally, this would apply to Stram as well i.e. the master should also
> not be deployed on these nodes. Not sure if anti-affinity goes beyond
> operators.
>
> On Fri, Dec 2, 2016 at 12:47 PM,
Yarn will deploy the AM (Stram) on a node of its choice, thereby rendering any
attribute within the app unenforceable in terms of not deploying the master on
a node.
Thks
Amol
On Thu, Dec 1, 2016 at 11:19 PM, Milind Barve wrote:
> Additionally, this would apply to Stram as well i.e. the master should
Yarn allows the AppMaster to run on a selected node, and Apex shouldn't
select the blacklisted nodes, so it is possible to avoid running the
Apex containers on certain nodes.
http://stackoverflow.com/questions/29302659/run-my-own-application-master-on-a-specific-node-in-a-yarn-cluster
On Thu
So all Apex will need to do is make sure, as part of the initial
configuration validations, that the node selected to run the master is not
part of the "excludeNode" list.
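A minimal sketch of such a validation step, under stated assumptions, might look like the following. The class ExcludeNodeValidator, the comma-separated format of the "excludeNode" list, and the case-insensitive host comparison are all hypothetical choices made for illustration; none of them is an existing Apex or Yarn API.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical pre-launch check; the "excludeNode" format here is an assumption. */
class ExcludeNodeValidator {

    /** Parses a comma-separated host list, trimming entries and ignoring blanks and case. */
    static Set<String> parseExcludeNodes(String csv) {
        Set<String> hosts = new HashSet<>();
        if (csv == null) {
            return hosts;
        }
        for (String part : csv.split(",")) {
            String host = part.trim().toLowerCase(Locale.ROOT);
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return hosts;
    }

    /** Throws if the host chosen for the master appears in the exclude list. */
    static void validateMasterHost(String masterHost, Set<String> excludeNodes) {
        if (masterHost == null || masterHost.trim().isEmpty()) {
            throw new IllegalArgumentException("master host must be specified");
        }
        String normalized = masterHost.trim().toLowerCase(Locale.ROOT);
        if (excludeNodes.contains(normalized)) {
            throw new IllegalStateException(
                "Master host '" + masterHost + "' is in the excludeNode list");
        }
    }

    public static void main(String[] args) {
        Set<String> excluded = parseExcludeNodes("node20, node21");
        validateMasterHost("node07", excluded);
        System.out.println("node07 accepted as master host");
        try {
            validateMasterHost("Node20", excluded);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

A check like this only helps when the master's node is actually chosen by the launching client; it cannot constrain a node that Yarn picks on its own.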
On Fri, Dec 2, 2016 at 1:47 PM, Sandesh Hegde
wrote:
> Yarn allows the AppMaster to run on the selected node, Apex sh
Could STRAM include a poison pill where it simply exits with a diagnostic if
its host name is blacklisted?
Ram
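To make the poison-pill idea concrete, here is a rough sketch of a startup check that exits with a diagnostic when the local host name is on a blacklist. It is an assumption-laden illustration only: the class HostPoisonPill, the diagnosticFor helper, and passing the blacklist as command-line arguments are hypothetical, and nothing here reflects how STRAM actually starts up.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Collection;
import java.util.Optional;

/** Hypothetical startup guard; not part of STRAM. */
class HostPoisonPill {

    /** Returns a diagnostic message if localHost is blacklisted, otherwise empty. */
    static Optional<String> diagnosticFor(String localHost, Collection<String> blacklist) {
        for (String blocked : blacklist) {
            // Host names are compared case-insensitively (an assumption of this sketch).
            if (blocked.trim().equalsIgnoreCase(localHost)) {
                return Optional.of(
                    "Exiting: host '" + localHost + "' is blacklisted for this application");
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws UnknownHostException {
        // The blacklist is taken from the command line purely for demonstration.
        String localHost = InetAddress.getLocalHost().getHostName();
        Optional<String> diagnostic = diagnosticFor(localHost, Arrays.asList(args));
        if (diagnostic.isPresent()) {
            System.err.println(diagnostic.get());
            System.exit(1);
        }
        System.out.println("Host '" + localHost + "' is not blacklisted; continuing startup");
    }
}
```

Keeping the decision in a pure helper separate from the System.exit call makes the check easy to test without terminating the JVM.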
On Thu, Dec 1, 2016 at 11:52 PM, Amol Kekre wrote:
> Yarn will deploy AM (Stram) on a node of its choice, thereby rendering any
> attribute within the app un-enforceable in terms of not
Excluding nodes for Stram should be done via Yarn; a poison pill is not a good
way, as it induces a termination for the wrong reasons.
Thks
Amol
On Fri, Dec 2, 2016 at 7:13 AM, Munagala Ramanath
wrote:
> Could STRAM include a poison pill where it simply exits with diagnostic if
> its host name is blacklisted ?
>
>
The OP is claiming (in the comment to the first response) that he actually
tried the proposed solution and it did not work for him and shows the RM
code fragment that is clobbering his preference.
Ram
On Fri, Dec 2, 2016 at 12:17 AM, Sandesh Hegde
wrote:
> Yarn allows the AppMaster to run on th
Agree it should be via YARN; the poison pill would be the final barrier in
the event all other mechanisms have failed -- sort of like an API call which
documents that a parameter should be non-null but nevertheless checks it
internally and throws an exception if it finds null.
Additionally, it als