I was able to get Spark and Mahout installed on an EMR cluster as bootstrap
actions, and to run the spark-itemsimilarity job via an EMR step. This took
some modifications to the mahout script: defining SPARK_HOME, and making sure
CLASSPATH is not picked up from the invoking script (Amazon's script-runner).
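For illustration, here is a minimal sketch of the same two changes applied from
a Python wrapper rather than by editing the mahout script itself. The Spark
path below is a hypothetical placeholder, not a value taken from the EMR
bootstrap action; adjust it to wherever Spark was actually installed.

```python
import os

# Hypothetical Spark install path; adjust to the bootstrap action's location.
DEFAULT_SPARK_HOME = "/home/hadoop/spark"


def mahout_env(base_env, spark_home=DEFAULT_SPARK_HOME):
    """Build an environment for launching the mahout script from an EMR step."""
    env = dict(base_env)  # copy so the caller's environment is left untouched
    env["SPARK_HOME"] = spark_home
    # Drop any CLASSPATH inherited from the invoking script (e.g. script-runner)
    env.pop("CLASSPATH", None)
    return env


if __name__ == "__main__":
    env = mahout_env(os.environ)
    print("SPARK_HOME =", env["SPARK_HOME"])
    print("CLASSPATH inherited:", "CLASSPATH" in env)
```

The resulting dict can be passed as the env argument of subprocess.run when
invoking mahout spark-itemsimilarity, so the step runs with SPARK_HOME set and
without the caller's CLASSPATH.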

I was only able to run this job in yarn-client mode (in yarn-cluster mode the
job cannot be submitted to the resource manager).

In yarn-client mode the driver program runs in the client process and submits
jobs to executors allocated through YARN, so my question is: how much memory
does this driver need?
Will the memory requirement vary with the size of the input to
spark-itemsimilarity?

Thanks.

-----Original Message-----
From: Pasmanik, Paul [mailto:paul.pasma...@danteinc.com] 
Sent: Thursday, January 15, 2015 12:46 PM
To: user@mahout.apache.org
Subject: mahout 1.0 on EMR with spark

Has anyone tried running Mahout 1.0 on EMR with Spark?
I've used the instructions at
https://github.com/awslabs/emr-bootstrap-actions/tree/master/spark to get an
EMR cluster running Spark. I am now able to deploy an EMR cluster with Spark
using the AWS Java APIs.
EMR allows running a custom script as a bootstrap action, which I can use to
install Mahout.
What I am trying to figure out is whether I would need to build Mahout every
time I start an EMR cluster, or whether I should have pre-built artifacts and
develop a script similar to the one awslabs uses to install Spark.

Thanks.



