Hello Chris,

There is currently no single "node is healthy" metric, but each node exposes many low-level metrics, such as CPU usage, memory usage, and job execution and waiting times, which you can combine to define your own criteria for a "healthy node". These metrics are available cluster-wide and contain information for each node; see the ClusterGroup#metrics() and ClusterNode#metrics() methods.
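As a rough illustration of combining such metrics into your own criterion, here is a minimal sketch in plain Java. The class name `NodeHealth`, both thresholds, and the idea of comparing each node against the group median are assumptions made for illustration, not part of Ignite. The inputs are meant to come from the per-node metrics, for example `node.metrics().getCurrentCpuLoad()` and `node.metrics().getAverageJobExecuteTime()`.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Ignite: decides whether a node looks
// healthy from a few metric values read elsewhere (e.g. ClusterNode#metrics()).
final class NodeHealth {
    // Assumed thresholds; tune them for your own workload.
    static final double MAX_CPU_LOAD = 0.90; // CPU load as a fraction in [0, 1]
    static final double MAX_SLOWDOWN = 3.0;  // allowed multiple of the group median

    private NodeHealth() {
    }

    // Median of the given values (average of the two middle values for an
    // even count). Returns 0 for an empty array.
    static double median(double[] values) {
        if (values.length == 0) {
            return 0.0;
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // cpuLoad:           current CPU load of the node; a negative value is
    //                    treated here as "not available" and is ignored.
    // avgJobExecuteMs:   average job execution time on this node.
    // groupMedianExecMs: median of that average across the candidate nodes;
    //                    a non-positive value disables the outlier check.
    static boolean isHealthy(double cpuLoad, double avgJobExecuteMs, double groupMedianExecMs) {
        if (cpuLoad >= 0 && cpuLoad > MAX_CPU_LOAD) {
            return false; // node is CPU-saturated
        }
        if (groupMedianExecMs > 0 && avgJobExecuteMs > MAX_SLOWDOWN * groupMedianExecMs) {
            return false; // node is an outlier compared to its peers
        }
        return true;
    }
}
```

A check like this could then be used as one more filter on the ClusterGroup you already build, for example through `ClusterGroup#forPredicate(...)`. Keep in mind that node metrics are refreshed periodically, so the values may lag slightly behind the real state of the node.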
Wed, Sep 5, 2018 at 0:39, Chris Berry <chriswbe...@gmail.com>:

> Hi,
>
> We are using an Ignite ComputeGrid, and it is mostly working nicely.
>
> Recently we had a Node with "Noisy Neighbors" in AWS that wreaked havoc in
> our ComputeGrid.
> Even though that Node was quite slow, it was never removed from the
> map/reduce, slowing down all computes.
>
> We have already built a system that allows us to add/subtract Nodes to the
> ComputeGrid based on when they are actually “ready to compute”,
> because our Nodes take considerable time to be truly ready for computation
> (i.e. quite a bit of preparation is required).
> So, to accomplish this, we use a dynamic Ignite ClusterGroup when we create
> the compute.
>
> ```
> ClusterGroup readyNodes =
>     readyForComputeMonitor.getNodesReadyForCompute(ignite.cluster());
> log.debug(dumpClusterGroup(readyNodes));
> return ignite.compute(readyNodes);
> ```
>
> So. My question.
> Does Ignite keep any information that we can use to determine if a Node is
> healthy?
> I.e. some way that we can locate any outliers in the ComputeGrid?
>
> For example, the Node in our recent incident was at 100% CPU and was much,
> much slower in the reduce phase.
>
> Any help/advice would be much appreciated.
>
> Thanks,
> -- Chris
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/