Arun Suresh created YARN-8849:
---------------------------------

             Summary: DynoYARN: A simulation and testing infrastructure for YARN clusters
                 Key: YARN-8849
                 URL: https://issues.apache.org/jira/browse/YARN-8849
             Project: Hadoop YARN
          Issue Type: New Feature
            Reporter: Arun Suresh

Traditionally, YARN workload simulation is performed using SLS (Scheduler Load Simulator), which is packaged with YARN. Essentially, it starts a full-fledged *ResourceManager* but runs simulators for the *NodeManager* and the *ApplicationMaster* containers. These simulators are lightweight and run in a threadpool. The NM simulators do not open any external ports and send (in-process) heartbeats to the ResourceManager.

There are a couple of drawbacks with using the SLS:
* It might be difficult to simulate really large clusters without access to a very beefy box, since the NMs are launched as tasks in a threadpool and each NM has to send periodic heartbeats to the RM.
* Certain features (like YARN-1011) require changes to the NodeManager. Aspects such as queuing and selectively killing containers have to be incorporated into the existing NM simulator, which might make the simulator a bit heavyweight, since there is a need for locking and synchronization.
* Since the NM and AM are simulations, only the Scheduler is faithfully tested. It does not really perform an end-to-end test of a cluster.

Therefore, drawing inspiration from [Dynamometer|https://github.com/linkedin/dynamometer], we propose a framework for a YARN-deployable YARN cluster, *DynoYARN*, for testing, with the following features:
* The NM already has hooks to plug in a custom *ContainerExecutor* and *NodeResourceMonitor*. If we can plug in a custom monitoring thread for *ContainersMonitorImpl* (and other modules like the LocalizationService), we can probably inject an Executor that does not actually launch containers, and a Node and Container resource monitor that reports synthetic, pre-specified utilization metrics back to the RM.
* Since we are launching fake containers, we cannot run normal AM containers. We can therefore use *Unmanaged AMs* to launch synthetic jobs.

Essentially, a test workflow would look like this:
* Launch a DynoYARN cluster.
* Use the Unmanaged AM feature to directly negotiate with the DynoYARN Resource Manager for container tokens.
* Use the container tokens from the RM to directly ask the DynoYARN Node Managers to start fake containers.
* The DynoYARN Node Managers will start the fake containers and report synthetically generated resource utilization for them to the DynoYARN Resource Manager. The utilization values will be injected via the *ContainerLaunchContext* and parsed by the plugged-in Container Executor.
* The Scheduler will use the utilization report to schedule containers, so we will be able to test allocation of {{Opportunistic}} containers based on resource utilization.
* Since the DynoYARN Node Managers run the actual code paths, all preemption and queuing logic will be faithfully executed.
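As a rough illustration of the fake-executor step in the workflow above, here is a toy Java model. It is not Hadoop code and does not use any YARN API: the class names, the {{DYNO_CPU_PERCENT}} and {{DYNO_MEM_MB}} keys, and the plain map standing in for the environment of a *ContainerLaunchContext* are all assumptions made for this sketch. It only shows the shape of the idea: "launching" a container records the synthetic utilization it carries, and the node-level figure is the sum over running fake containers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only: none of these types are Hadoop classes. */
class DynoYarnSketch {

    /** Hypothetical environment keys carried in the launch context. */
    static final String CPU_KEY = "DYNO_CPU_PERCENT";
    static final String MEM_KEY = "DYNO_MEM_MB";

    /** Synthetic utilization for one fake container, or for a whole node. */
    static final class Utilization {
        final int cpuPercent;
        final long memMb;

        Utilization(int cpuPercent, long memMb) {
            this.cpuPercent = cpuPercent;
            this.memMb = memMb;
        }
    }

    /** Stand-in for a plugged-in executor that never starts a process. */
    static final class FakeContainerExecutor {
        private final Map<String, Utilization> running = new LinkedHashMap<>();

        /** "Launches" a container by recording the utilization found in its environment. */
        void launch(String containerId, Map<String, String> launchEnv) {
            // Missing keys default to zero utilization.
            int cpu = Integer.parseInt(launchEnv.getOrDefault(CPU_KEY, "0"));
            long mem = Long.parseLong(launchEnv.getOrDefault(MEM_KEY, "0"));
            running.put(containerId, new Utilization(cpu, mem));
        }

        /** Removes a fake container, as a kill or completion would. */
        void stop(String containerId) {
            running.remove(containerId);
        }

        /** Sums per-container values into the node-level figure a heartbeat could carry. */
        Utilization nodeUtilization() {
            int cpu = 0;
            long mem = 0;
            for (Utilization u : running.values()) {
                cpu += u.cpuPercent;
                mem += u.memMb;
            }
            return new Utilization(cpu, mem);
        }
    }

    public static void main(String[] args) {
        FakeContainerExecutor executor = new FakeContainerExecutor();
        Map<String, String> env = new LinkedHashMap<>();
        env.put(CPU_KEY, "40");
        env.put(MEM_KEY, "2048");
        executor.launch("container_1", env);

        Utilization node = executor.nodeUtilization();
        System.out.println("cpu=" + node.cpuPercent + "% mem=" + node.memMb + "MB");
    }
}
```

In a real DynoYARN Node Manager the equivalent logic would presumably live behind the existing *ContainerExecutor* and resource-monitor plug-in points; how the values are encoded in the launch context is left open by the proposal.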