Re: PredictionIO with remote Spark and Elasticsearch

Pat Ferrel Thu, 02 Mar 2017 14:44:10 -0800

I think it will ship with the upcoming release. We are still deciding whether and
how to modify the sbt build, so I'd wait if you can. It's on the feature/es5
branch, but the config is also still somewhat in flux.



On Mar 2, 2017, at 2:15 PM, Miller, Clifford wrote:

I probably should have asked whether the Elasticsearch 5.x-compatible branch is
in a state where I could clone and build it. If it is, where can I find it?

On Thu, Mar 2, 2017 at 5:06 PM, Miller, Clifford wrote:
Actually, AWS currently offers three versions: 1.5, 2.3, and 5.1. So a
5.x-compatible version should work. When will it be available?

On Thu, Mar 2, 2017 at 5:02 PM, Pat Ferrel wrote:
Yes, PIO uses the TransportClient, which Elasticsearch is deprecating. PIO has a
feature branch that adds ES5 support using only the REST client. I'm not sure
this will help, though, since I suspect AWS is not on ES5 yet.
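
For reference, a rough sketch of what an HTTP-only setup might look like in
conf/pio-env.sh once ES5 support lands. The variable names follow the
Elasticsearch 5 example in later PredictionIO pio-env.sh templates and may not
match the feature/es5 branch as it stands, since that config is still in flux.
The endpoint is a hypothetical placeholder, and this assumes the domain's access
policy accepts unsigned requests from the PIO host, as the curl test in this
thread suggests.

```bash
# conf/pio-env.sh (fragment, sketch): Elasticsearch 5.x over HTTPS via the REST client.
# Assumption: variable names taken from later PredictionIO templates;
# the host below is a hypothetical AWS domain endpoint.
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=search-mydomain-abc123.us-east-1.es.amazonaws.com
# AWS Elasticsearch Service endpoints speak HTTP(S), not the 9300 transport protocol.
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=443
PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=https
```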


On Mar 2, 2017, at 1:10 PM, Miller, Clifford wrote:

I found some old references to people having the same issue as me. They
indicated that the AWS Elasticsearch Service supports only HTTP, not TCP.
If that is true, AWS Elasticsearch is of very limited use. Has anyone else
run into this?


On Thu, Mar 2, 2017 at 1:26 PM, Miller, Clifford wrote:
I'm able to run pio train, although pio train -- --master
spark://your_master_url did not work. I'm using Spark on YARN, so I was able
to get pio train -- --master yarn://URL to work after I copied the
Elasticsearch configuration from my CDH cluster.
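
For anyone else on Spark on YARN, a minimal sketch of the relevant
conf/pio-env.sh settings, assuming a CDH parcel layout (both paths are
assumptions; adjust them to your install). Spark documents the YARN master as
plain "yarn": spark-submit finds the ResourceManager through the Hadoop
configuration files in HADOOP_CONF_DIR rather than from a URL.

```bash
# conf/pio-env.sh (fragment, sketch). Paths assume a CDH parcel install.
SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
# Directory holding core-site.xml, yarn-site.xml, etc. copied from the cluster.
HADOOP_CONF_DIR=/etc/hadoop/conf

# Training is then submitted to YARN with:
#   pio train -- --master yarn
```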

I'm still struggling to integrate this with AWS Elasticsearch. Does anyone have
an example of how this should be configured?

FYI, the EC2 instance I'm running PredictionIO on can reach it from the
command line: "curl -X GET <AWS Elasticsearch endpoint URL>".


On Wed, Mar 1, 2017 at 11:44 AM, Donald Szeto wrote:
Hi Clifford,

To use a remote Spark cluster, use pass-through command-line arguments on the
CLI, e.g.

pio train -- --master spark://your_master_url

Anything after a lone -- will be passed to spark-submit verbatim. For more
information, try "pio help".
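
To illustrate the lone -- convention, here is a small, hypothetical bash sketch
of how a wrapper can split its arguments. It is not PIO's actual
implementation, only the same idea: everything before the first lone -- stays
with the wrapper, and everything after it is collected unchanged for
spark-submit. The function name and host name are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the "lone --" convention; not PIO's code.
# Fills two global arrays: OWN_ARGS (before the first lone --) and
# PASSTHROUGH_ARGS (after it, forwarded verbatim).
split_passthrough() {
  OWN_ARGS=()
  PASSTHROUGH_ARGS=()
  local seen_separator=0
  local arg
  for arg in "$@"; do
    if [[ $seen_separator -eq 0 && "$arg" == "--" ]]; then
      seen_separator=1              # the first lone -- is consumed, not forwarded
    elif [[ $seen_separator -eq 0 ]]; then
      OWN_ARGS+=("$arg")            # arguments meant for the wrapper itself
    else
      PASSTHROUGH_ARGS+=("$arg")    # later tokens, including any --, pass through
    fi
  done
}

split_passthrough train -- --master spark://master-host:7077 --executor-memory 4g
printf 'own: %s\n' "${OWN_ARGS[*]}"
printf 'spark-submit: %s\n' "${PASSTHROUGH_ARGS[*]}"
```

Running it prints "own: train" followed by
"spark-submit: --master spark://master-host:7077 --executor-memory 4g".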

To use a remote Elasticsearch cluster, refer to the examples in
"conf/pio-env.sh", where you can find a variable for the host name or IP
address of your remote ES cluster.
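
As a sketch of that setup, assuming the Elasticsearch 1.x/2.x TransportClient
that PIO uses at this point: the host and cluster names below are hypothetical
placeholders, and the variable names follow the Elasticsearch example in the
stock pio-env.sh template. Note that the TransportClient connects over TCP
port 9300, not the HTTP port 9200.

```bash
# conf/pio-env.sh (fragment, sketch): remote Elasticsearch for PIO metadata.
# Host and cluster names are hypothetical placeholders.
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
# Must match cluster.name on the remote cluster.
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=my-es-cluster
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=es-node1.example.internal
# TransportClient (TCP) port, not the 9200 HTTP port.
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
```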

Regards,
Donald

On Tue, Feb 28, 2017 at 12:57 PM, Miller, Clifford wrote:
I currently have a Cloudera cluster (Hadoop, Spark, HBase...) set up on AWS. I
have PredictionIO installed on a separate EC2 instance. I've successfully
configured it to use HDFS for model storage and to store events in HBase on the
cluster. Spark and Elasticsearch are installed locally on the PredictionIO EC2
instance. I have the following questions:

How can I configure PredictionIO to use Spark on the Cloudera cluster?
How can I configure PredictionIO to use a remote Elasticsearch domain? I'd
like to use the AWS Elasticsearch Service if possible.

Thanks


-- 
Clifford Miller
Mobile | 321.431.9089 <tel:321.431.9089>