Yanbo Liang created SPARK-17428:
-----------------------------------

    Summary: SparkR executors/workers support virtualenv
    Key: SPARK-17428
    URL: https://issues.apache.org/jira/browse/SPARK-17428
    Project: Spark
    Issue Type: New Feature
    Components: SparkR
    Reporter: Yanbo Liang

Many users need to use third-party R packages on executors/workers, but SparkR cannot satisfy this requirement elegantly. For example, users have to ask the cluster's IT/administrators to deploy these R packages on every executor/worker node, which is very inflexible.

I think we should support third-party R packages for SparkR users, as we do for jar packages, in the following two scenarios:
1. Users can install R packages from CRAN or a custom CRAN-like repository on each executor.
2. Users can load their local R packages and install them on each executor.

To achieve this goal, the first step is to make SparkR executors support virtualenv, like conda for Python. I have investigated and found that packrat is one candidate for providing virtualenv support for R. Packrat is a dependency management system for R that can isolate the dependent R packages in its own private package library. SparkR users could then install third-party packages at application scope (destroyed after the application exits) without asking IT/administrators to install these packages manually.
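
To make the packrat idea concrete, here is a minimal sketch of an application-scoped private library on a single machine. It is not part of SparkR and says nothing about how executors would be wired up; the directory name `app_dir` and the example package "jsonlite" are illustrative assumptions, and the packrat steps only run if packrat is already installed.

```r
# Sketch only: application-scoped package isolation with packrat.
# `app_dir` and the "jsonlite" package are hypothetical, chosen for illustration.
app_dir <- file.path(tempdir(), "sparkr-app-env")
dir.create(app_dir, recursive = TRUE, showWarnings = FALSE)

if (requireNamespace("packrat", quietly = TRUE)) {
  # Create a private library under app_dir without entering it yet.
  packrat::init(project = app_dir, enter = FALSE)

  # Enter packrat mode: packages are now installed into the private library.
  packrat::on(project = app_dir)
  install.packages("jsonlite", repos = "https://cloud.r-project.org")

  # Leave packrat mode and return to the user's normal library paths.
  packrat::off(project = app_dir)
}

# Destroy the private environment when the application exits.
unlink(app_dir, recursive = TRUE)
```

Nothing is installed into the system-wide R library here, which is the property the proposal relies on to avoid involving IT/administrators.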