Hi,

We are using an Ignite ComputeGrid, and it is mostly working nicely.
Recently we had a Node with "Noisy Neighbors" in AWS that wreaked havoc in our ComputeGrid. Even though that Node was quite slow, it was never removed from the map/reduce, so it slowed down all computes.

We have already built a system that lets us add Nodes to and remove Nodes from the ComputeGrid based on when they are actually "ready to compute", because our Nodes take considerable time to be truly ready for computation (i.e. quite a bit of preparation is required). To accomplish this, we use a dynamic Ignite ClusterGroup when we create the compute:

```
ClusterGroup readyNodes = readyForComputeMonitor.getNodesReadyForCompute(ignite.cluster());
log.debug(dumpClusterGroup(readyNodes));
return ignite.compute(readyNodes);
```

So, my question: does Ignite keep any information that we can use to determine whether a Node is healthy? That is, is there some way to locate outliers in the ComputeGrid? For example, the Node in our recent incident was at 100% CPU and was much, much slower in the reduce phase.

Any help/advice would be much appreciated.

Thanks,
-- Chris
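P.S. To make the question concrete, here is a rough sketch of the kind of outlier check we have in mind. It is plain Java and does not call Ignite at all: the input is assumed to be a per-node average job time in milliseconds, which we would hope to fill from whatever Ignite exposes (possibly the `ClusterMetrics` returned by `ClusterNode.metrics()`, though we have not verified that those values are suitable). The names `NodeOutliers` and `slowNodes`, and the threshold factor, are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: flags nodes whose average job time is far above the median.
final class NodeOutliers {
    private NodeOutliers() {}

    // Median of a non-empty list; the input list is not modified.
    static double median(List<Double> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("no values");
        }
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1
            ? sorted.get(n / 2)
            : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    // Returns the IDs of nodes whose average job time exceeds factor * median.
    // avgJobTimeMs maps a node ID to its average job time (assumed input).
    static Set<String> slowNodes(Map<String, Double> avgJobTimeMs, double factor) {
        Set<String> slow = new TreeSet<>();
        if (avgJobTimeMs.isEmpty()) {
            return slow;
        }
        double med = median(new ArrayList<>(avgJobTimeMs.values()));
        for (Map.Entry<String, Double> e : avgJobTimeMs.entrySet()) {
            if (e.getValue() > factor * med) {
                slow.add(e.getKey());
            }
        }
        return slow;
    }
}
```

The idea would be to exclude whatever `slowNodes` returns from `readyNodes` before calling `ignite.compute(...)`. We compare against the median rather than the mean so that one very slow Node does not drag the baseline up with it.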