Dear all,
Currently I am running a Spark standalone cluster with ~100 nodes. Multiple users connect to the cluster via spark-shell or the PySpark shell. However, I can't find an effective way to control resources across multiple users.

On the server side I can set "spark.deploy.defaultCores" to limit the number of CPU cores for each application, but I cannot limit the memory used by each application: a user can set "spark.executor.memory 10g" and "spark.python.worker.memory 10g" in ./conf/spark-defaults.conf on the client side, which means users control how much of the cluster's resources they get. How can I enforce resource limits on the server side?

Meanwhile, the Fair Scheduler requires the user to set the scheduler pool explicitly. What happens if a user doesn't set it? Can I maintain the user-to-pool mapping on the server side instead?

Thanks!
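P.S. To make the client-side override concrete, this is the kind of snippet any user can drop into ./conf/spark-defaults.conf on their own machine (the 10g values are just an illustration):

```properties
# Client-side conf/spark-defaults.conf -- each user can set these
# for themselves, overriding whatever the admin intended.
spark.executor.memory        10g
spark.python.worker.memory   10g
```

And the Fair Scheduler pool selection I am referring to is likewise done by the client, e.g. from spark-shell ("production" is a hypothetical pool name):

```scala
// The client picks its own pool; I see no way to enforce
// or default this mapping from the server side.
sc.setLocalProperty("spark.scheduler.pool", "production")
```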