Our quantitative team uses Spark as part of their daily work. One of the more common problems we run into is that people unintentionally leave their shells open throughout the day. This eats up cluster memory and leaves others with limited resources to run their jobs.
With something like Hive, or with most SQL database clients, this isn't really an issue, but with Spark it's a significant inconvenience for non-technical users. Someone ends up posting in chat throughout the day, checking whether people are actually using their shells or telling them to 'get off the cluster'. Has anyone else experienced this kind of issue, and how are you managing it? One idea we've had is to implement an 'idle timeout' monitor for the shell, though on the surface this appears quite challenging.
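For context, the closest built-in mechanism we've found so far is Spark's dynamic allocation, which releases executors that sit idle past a timeout even while the driver (the shell) stays open. It doesn't close the shell itself, but it returns most of the cluster resources. A rough sketch of the relevant `spark-defaults.conf` settings, with timeout values chosen arbitrarily for illustration:

```properties
# Sketch only -- timeout values here are placeholders, tune for your cluster.
spark.dynamicAllocation.enabled                    true
# Release executors with no running tasks after this long:
spark.dynamicAllocation.executorIdleTimeout        120s
# Executors holding cached data get a separate (longer) timeout:
spark.dynamicAllocation.cachedExecutorIdleTimeout  600s
# Allow scaling all the way down to zero executors:
spark.dynamicAllocation.minExecutors               0
# On YARN, dynamic allocation needs the external shuffle service
# so shuffle files survive executor removal:
spark.shuffle.service.enabled                      true
```

This still leaves the driver's own memory allocated, so it only partially solves the problem, but in our case the executors are where most of the contention is.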