Hello Chris,

There is currently no single "node is healthy" metric, but each node exposes
many low-level metrics such as CPU usage, memory usage, and job
execution/waiting times, which you can combine to define your own
criteria for a "healthy node". These metrics are available cluster-wide and
contain information for each node; see the ClusterGroup#metrics() and
ClusterNode#metrics() methods.
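A minimal sketch of one possible criterion, written in plain Java so it runs without a cluster. It flags a node if its CPU is saturated or if its average job execution time is far above the cluster median. The NodeSample class, the findOutliers/median methods and the threshold values are hypothetical. In a real grid the two inputs could be read per node from ClusterNode#metrics(), e.g. ClusterMetrics#getCurrentCpuLoad() and ClusterMetrics#getAverageJobExecuteTime().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class HealthCheck {

    /** Hypothetical per-node snapshot of the two metrics this sketch uses. */
    public static final class NodeSample {
        final String nodeId;
        final double cpuLoad;             // 0.0 to 1.0, as with ClusterMetrics#getCurrentCpuLoad()
        final double avgJobExecuteTimeMs; // as with ClusterMetrics#getAverageJobExecuteTime()

        public NodeSample(String nodeId, double cpuLoad, double avgJobExecuteTimeMs) {
            this.nodeId = nodeId;
            this.cpuLoad = cpuLoad;
            this.avgJobExecuteTimeMs = avgJobExecuteTimeMs;
        }
    }

    /** Median of a non-empty list of values. */
    public static double median(List<Double> values) {
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1
                ? sorted.get(n / 2)
                : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    /**
     * Returns the IDs of nodes whose CPU load is at or above cpuThreshold, or whose
     * average job execution time exceeds slowFactor times the median over all nodes.
     */
    public static Set<String> findOutliers(List<NodeSample> samples,
                                           double cpuThreshold, double slowFactor) {
        Set<String> outliers = new LinkedHashSet<>();
        if (samples.isEmpty()) {
            return outliers;
        }
        List<Double> times = new ArrayList<>();
        for (NodeSample s : samples) {
            times.add(s.avgJobExecuteTimeMs);
        }
        double medianTime = median(times);
        for (NodeSample s : samples) {
            boolean cpuSaturated = s.cpuLoad >= cpuThreshold;
            boolean tooSlow = s.avgJobExecuteTimeMs > slowFactor * medianTime;
            if (cpuSaturated || tooSlow) {
                outliers.add(s.nodeId);
            }
        }
        return outliers;
    }

    public static void main(String[] args) {
        List<NodeSample> samples = Arrays.asList(
                new NodeSample("node-a", 0.40, 120.0),
                new NodeSample("node-b", 0.45, 130.0),
                new NodeSample("node-c", 1.00, 900.0), // the "noisy neighbour"
                new NodeSample("node-d", 0.50, 125.0));
        System.out.println(findOutliers(samples, 0.95, 2.0)); // prints [node-c]
    }
}
```

The median is used as the baseline rather than the mean so that a single very slow node does not drag the "normal" execution time up and hide itself. The resulting IDs could then be excluded from the group passed to ignite.compute(...), for example with a ClusterGroup#forPredicate filter.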


On Wed, 5 Sep 2018 at 0:39, Chris Berry <chriswbe...@gmail.com> wrote:

> Hi,
>
> We are using an Ignite ComputeGrid, and it is mostly working nicely.
>
> Recently we had a Node with "Noisy Neighbors" in AWS that wreaked havoc in
> our ComputeGrid.
> Even though that Node was quite slow, it was never removed from the
> map/reduce, which slowed down all computes.
>
> We have already built a system that allows us to add/remove Nodes to/from the
> ComputeGrid based on when they are actually “ready to compute”,
> because our Nodes take considerable time to be truly ready for computation
> (i.e. quite a bit of preparation is required).
> So, to accomplish this, we use a dynamic Ignite ClusterGroup when we create
> the compute.
>
> ```
> ClusterGroup readyNodes =
> readyForComputeMonitor.getNodesReadyForCompute(ignite.cluster());
> log.debug(dumpClusterGroup(readyNodes));
> return ignite.compute(readyNodes);
> ```
>
> So, my question:
> Does Ignite keep any information that we can use to determine if a Node is
> healthy?
> I.e., is there some way we can locate outliers in the ComputeGrid?
>
> For example, the Node in our recent incident was at 100% CPU and was much,
> much slower in the reduce phase.
>
> Any help/advice would be much appreciated.
>
> Thanks,
> -- Chris
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
