Thanks, Pat.
I am using HDP with Spark 1.1.0:
http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/

Spark examples run without issues.  For Mahout I had to create a couple of env 
vars (HADOOP_HOME, SPARK_HOME, MAHOUT_HOME).  Also, to run on the YARN cluster 
with HDP, -ma yarn-cluster needs to be passed in.
The default memory allocated to YARN was not enough out of the box (2 GB); I 
increased it to 3 GB and am now restarting and trying again.
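For reference, here is roughly what my setup looks like — the paths below are guesses based on HDP sandbox defaults and will likely differ on other installs, and the input/output paths are just placeholders:

```shell
# Env vars Mahout's Spark driver expects; adjust paths to your HDP sandbox layout.
export HADOOP_HOME=/usr/hdp/current/hadoop-client
export SPARK_HOME=/usr/hdp/current/spark-client
export MAHOUT_HOME=/opt/mahout

# Run against the YARN cluster; -i/-o paths here are hypothetical.
$MAHOUT_HOME/bin/mahout spark-itemsimilarity \
  -i /user/guest/input.csv \
  -o /user/guest/output \
  -ma yarn-cluster
```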

-----Original Message-----
From: Pat Ferrel [mailto:p...@occamsmachete.com] 
Sent: Tuesday, January 06, 2015 12:58 PM
To: user@mahout.apache.org
Subject: Re: running spark-itemsimilarity against HDP sandbox with Spark

There are some issues with using Mahout on Windows, so you’ll have to run on a 
‘nix machine or VM. There shouldn’t be any problem with using VMs as long as 
your Spark install is set up correctly.

Currently you have to build Spark first and then Mahout from source. Mahout 
uses Spark 1.1. You’ll need to build Spark from source using “mvn install” 
rather than their recommended “mvn package”, because there were some problems 
in the Spark artifacts when running from the binary release. Check Mahout’s 
Spark FAQ for some pointers: http://mahout.apache.org/users/sparkbindings/faq.html
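The build order above might look something like this — the version tag and skip-tests flag are assumptions on my part, so check the FAQ for the exact incantation:

```shell
# Build Spark with "mvn install" so its artifacts land in the local ~/.m2 repo,
# where the Mahout build can resolve them.
cd spark
git checkout v1.1.1
mvn install -DskipTests

# Then build Mahout against those locally installed artifacts.
cd ../mahout
mvn install -DskipTests
```

The key point is “install” vs. “package”: “package” only produces jars in each module’s target/ directory, while “install” also copies them into the local Maven repository so downstream builds like Mahout can find them.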

Verify Spark is running correctly by trying their sample SparkPi job. 
http://spark.apache.org/docs/1.1.1/submitting-applications.html
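A minimal SparkPi smoke test on YARN could look like the following — the examples jar path and the argument (number of partitions) are assumptions, so adjust them to your build:

```shell
# Submit the bundled SparkPi example to YARN; jar location varies by install.
$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  $SPARK_HOME/lib/spark-examples-*.jar \
  10
```

If this job completes, the Spark-on-YARN plumbing is sound and any remaining failures are on the Mahout side.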

Spark in general, and spark-itemsimilarity especially, likes lots of memory, so 
you may have to play with the -sem option to spark-itemsimilarity.
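For example, bumping executor memory with -sem might look like this — the 4g value and the input/output paths are purely illustrative:

```shell
# Same driver invocation, with Spark executor memory raised via -sem.
mahout spark-itemsimilarity \
  -i /user/guest/input.csv \
  -o /user/guest/output \
  -ma yarn-cluster \
  -sem 4g
```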

On Jan 6, 2015, at 8:07 AM, Pasmanik, Paul <paul.pasma...@danteinc.com> wrote:

Hi, I've been trying to run spark-itemsimilarity against Hortonworks Sandbox 
with Spark running in a VM, but have not succeeded yet.

Do I need to install mahout and run within a VM or is there a way to run 
remotely against a VM where spark and hadoop are running?

I tried running the Scala ItemSimilaritySuite test with some modifications 
pointing HDFS and Spark to the sandbox, but I am getting various errors; the 
latest is a ShuffleMapTask failure with an HDFS missing-block exception while 
trying to read an input file that I uploaded to the HDFS cluster.


________________________________
The information contained in this electronic transmission is intended only for 
the use of the recipient and may be confidential and privileged. Unauthorized 
use, disclosure, or reproduction is strictly prohibited and may be unlawful. If 
you have received this electronic transmission in error, please notify the 
sender immediately.
