Re: running spark-itemsimilarity against HDP sandbox with Spark

2015-01-06 Thread Pat Ferrel
Did you build Spark from source and deploy it to the cluster? When you build Mahout it’s running its tests against the artifacts it gets from maven repos. When you run mahout on a cluster it is running from the artifacts on the cluster. These may not be the same and there have been problems that

RE: running spark-itemsimilarity against HDP sandbox with Spark

2015-01-06 Thread Pasmanik, Paul
So, when I follow the examples from Hortonworks and run the Spark Pi example using spark-submit, everything works. I can run mahout spark-itemsimilarity without specifying the master parameter, which means it is running in local mode (right?), and it works. But if I try to run mahout using -ma (master

RE: running spark-itemsimilarity against HDP sandbox with Spark

2015-01-06 Thread Pasmanik, Paul
Thanks, Pat. I am using HDP with spark 1.1.0: http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/ Spark examples run without issues. For mahout I had to create a couple of env vars: (HADOOP_HOME, SPARK_HOME, MAHOUT_HOME). Also, to run using yarn cluster with HDP -ma yarn-cluster

Re: example of hashing vectorizer for text data using mapreduce code

2015-01-06 Thread chirag lakhani
I believe I may have found a solution to this problem, which I will eventually try to put on GitHub, but now I am not sure how to run this on the cluster. I created the code in my Eclipse IDE as a Maven project and then copied the jar file to the Hadoop cluster (vectorCode-1.0.jar). I now try

Re: java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 1 in Mahout Itemsimilarity

2015-01-06 Thread Pat Ferrel
Just a guess: itemsimilarity takes a CSV of user and item IDs. The IDs must be non-negative row and column numbers. The Hadoop version of this job expects you to translate your IDs into row and column numbers in the overall matrix it will create from the individual lines of the CSV. On Jan 6, 2015, at 1:36 AM,
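
As a rough illustration of the ID translation described above, here is a minimal Python sketch. It assumes the input is a CSV whose first two columns are a user ID and an item ID, with any remaining columns (such as a preference value) passed through unchanged. The column layout, the sample data, and the helper name `translate_ids` are assumptions made for this example and are not part of Mahout.

```python
import csv
import io


def translate_ids(lines):
    """Map arbitrary user/item IDs to non-negative integer indices.

    Hypothetical helper: returns (rows, user_index, item_index), where rows
    holds the rewritten records and the two dicts map each original ID to
    the index assigned to it.
    """
    user_index = {}
    item_index = {}
    rows = []
    for record in csv.reader(lines):
        if len(record) < 2:
            # Skip blank or malformed lines that lack an item column.
            continue
        user, item = record[0].strip(), record[1].strip()
        # setdefault assigns the next unused index the first time an ID is seen.
        u = user_index.setdefault(user, len(user_index))
        i = item_index.setdefault(item, len(item_index))
        rows.append([str(u), str(i)] + [field.strip() for field in record[2:]])
    return rows, user_index, item_index


# Made-up sample input: userID,itemID,preference (assumed layout).
sample = io.StringIO("alice,book-17,4.0\nbob,book-17,3.5\nalice,dvd-2,5.0\n")
rows, users, items = translate_ids(sample)
for row in rows:
    print(",".join(row))
```

Keeping the two dictionaries around lets you map the row and column numbers in the job's output back to the original IDs afterwards.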

Re: running spark-itemsimilarity against HDP sandbox with Spark

2015-01-06 Thread Pat Ferrel
There are some issues with using Mahout on Windows so you’ll have to run on a ‘nix machine or VM. There shouldn’t be any problem with using VMs as long as your Spark install is set up correctly. Currently you have to build Spark first and then Mahout from source. Mahout uses Spark 1.1. You’ll ne

running spark-itemsimilarity against HDP sandbox with Spark

2015-01-06 Thread Pasmanik, Paul
Hi, I've been trying to run spark-itemsimilarity against Hortonworks Sandbox with Spark running in a VM, but have not succeeded yet. Do I need to install mahout and run within a VM or is there a way to run remotely against a VM where spark and hadoop are running? I tried running a scala ItemSim

Re: java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 1 in Mahout Itemsimilarity

2015-01-06 Thread unmesha sreeveni
Does anyone know what could be the reason? On Tue, Jan 6, 2015 at 2:08 PM, unmesha sreeveni wrote: > I am trying to run Itemsimilarity in mahout instead of FP-Growth. > > But once I run it I get > java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.mapred.L

java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 1 in Mahout Itemsimilarity

2015-01-06 Thread unmesha sreeveni
I am trying to run Itemsimilarity in mahout instead of FP-Growth. But once I run it I get java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJ