Hello, I would like to expose Apache Spark to untrusted users (through Livy, and with a direct JDBC connection).
However, there appear to be a variety of avenues by which one of these untrusted users can execute arbitrary code (by design): PySpark, SparkR, jar uploads, various UDFs, etc. I would like to prevent my untrusted users from executing arbitrary remote code.

I have found small bits of information relating to this [0][1], but nothing comprehensive or prescriptive. I understand that this is not exactly Spark's intended use case, but any thoughts or opinions would be appreciated, especially if there is an established process for handling this scenario.

Thanks,
Jack

0: https://stackoverflow.com/questions/38333873/securely-running-a-spark-application-inside-a-sandbox
1: https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.builtin.udf.whitelist
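P.S. For what it's worth, the closest thing to a lockdown mechanism I have found so far is the Hive built-in UDF whitelist from [1]. A sketch of how I imagine configuring it in hive-site.xml follows; the UDF names are only illustrative, and I am not sure whether Spark's Thrift Server honours this property the way HiveServer2 does:

```xml
<!-- hive-site.xml: restrict which built-in UDFs the server will accept
     in queries. An empty value means all built-ins are allowed, so an
     explicit list is required to actually restrict anything.
     The list below is an illustrative sample, not a vetted set. -->
<property>
  <name>hive.server2.builtin.udf.whitelist</name>
  <value>concat,substr,upper,lower,count,sum,avg</value>
</property>
```

Even if this works as I hope, it would only cover built-in UDFs reached over JDBC, and not the other avenues (Livy sessions, PySpark/SparkR code, jar uploads).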