Hi Clifford,

To use a remote Spark cluster, pass passthrough command-line arguments on the CLI, e.g.

    pio train -- --master spark://your_master_url

Anything after a lone -- will be passed to spark-submit verbatim. For more information, try "pio help".

To use a remote Elasticsearch cluster, please refer to the examples in "conf/pio-env.sh", where you will find variables to set the remote host name or IP of your ES cluster.

Regards,
Donald

On Tue, Feb 28, 2017 at 12:57 PM Miller, Clifford <[email protected]> wrote:

> I currently have a Cloudera cluster (Hadoop, Spark, HBase...) set up on AWS.
> I have PredictionIO installed on a different EC2 instance. I've been able
> to successfully configure it to use HDFS for model storage and to store
> events in HBase from the cluster. Spark and Elasticsearch are installed
> locally on the PredictionIO EC2 instance. I have the following questions:
>
> How can I configure PredictionIO to utilize the Spark on the Cloudera
> cluster?
>
> How can I configure PredictionIO to utilize a remote Elasticsearch
> domain? I'd like to use the AWS Elasticsearch service if possible.
>
> Thanks
>
> --
> Clifford Miller
> Mobile | 321.431.9089
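[Editor's sketch] Donald's two suggestions can be combined as below. This is an illustrative fragment only: the exact variable names are documented in the comments of your own conf/pio-env.sh and can differ between PredictionIO versions, and the Elasticsearch hostname and Spark master URL are placeholders, not real endpoints.

```shell
# conf/pio-env.sh -- illustrative fragment; check the comments in your own
# pio-env.sh for the exact variable names your PredictionIO version uses.

# Point the Elasticsearch storage source at a remote cluster instead of
# localhost (hostname and port below are placeholders).
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=your-es-domain.us-east-1.es.amazonaws.com
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200

# Then submit training to the remote Spark master; everything after the
# lone "--" is passed to spark-submit verbatim:
#   pio train -- --master spark://your_master_url
```

Note that the AWS Elasticsearch service exposes HTTPS on port 443 rather than the default 9200, so the port value may need adjusting for that setup.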
