PS: a Spark shell with all the proper imports is also supported natively in Mahout (the mahout spark-shell command). See M-1489 for specifics. There's also a tutorial somewhere, but I suspect it has not yet been finished or published via a public link. Again, you need trunk to use the Spark shell there.
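
For a stock spark-shell, here is a minimal sketch of the read discussed in the quoted thread. It assumes the shell was started with the Mahout jar(s) that contain VectorWritable actually visible to the REPL, and that `sc` is the SparkContext the shell provides. The path and the helper name `toArray` are placeholders of mine, not anything from Mahout or Spark. I have not run this against Mahout 0.7 and Spark 0.9, so treat it as a starting point rather than a verified recipe.

```scala
// Paste into spark-shell; `sc` is the SparkContext created by the shell.
import org.apache.hadoop.io.Text
import org.apache.mahout.math.{Vector => MahoutVector, VectorWritable}

// Hypothetical helper: copy a Mahout vector into a plain Array[Double].
// Only size() and get(i) are used, which are part of Mahout's Vector interface.
def toArray(v: MahoutVector): Array[Double] =
  Array.tabulate(v.size())(i => v.get(i))

// Placeholder path; substitute the directory Mahout wrote.
val path = "/path/to/KMeans_dataset_seq"

// Keys are Hadoop Text and values are Mahout VectorWritable, so both
// classes must be imported (hence the "not found: type Text" error otherwise).
val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])

// Hadoop record readers reuse Writable instances, so copy each vector
// out right away instead of holding on to the VectorWritable itself.
val points = raw.map { case (_, vw) => toArray(vw.get()) }

// Quick sanity check: print the first point.
points.take(1).foreach(p => println(p.mkString(",")))
```

Copying into Array[Double] keeps the result independent of Mahout's types; whether the MLlib KMeans in your Spark version takes arrays or its own vector type is worth checking against that version's API docs.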

On Wed, May 14, 2014 at 12:43 AM, Stuti Awasthi <stutiawas...@hcl.com> wrote:

> Hi Xiangrui,
>
> Thanks for the response. I tried a few ways to include the mahout-math jar
> while launching the Spark shell, but with no success. Can you please point
> out what I am doing wrong?
>
> 1. mahout-math.jar exported in CLASSPATH and PATH
> 2. Tried launching the Spark shell with:
>    MASTER=spark://<HOSTNAME>:<PORT> ADD_JARS=~/installations/work-space/mahout-math-0.7.jar park-0.9.0/bin/spark-shell
>
> After launching, I checked the environment details on the web UI. It looks
> like the mahout-math jar is included:
> spark.jars /home/hduser/installations/work-space/mahout-math-0.7.jar
>
> Then I try:
> scala> import org.apache.mahout.math.VectorWritable
> <console>:10: error: object mahout is not a member of package org.apache
>        import org.apache.mahout.math.VectorWritable
>
> scala> val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])
> <console>:12: error: not found: type Text
>        val data = sc.sequenceFile("/stuti/ML/Clustering/KMeans/HAR/KMeans_dataset_seq/part-r-00000", classOf[Text], classOf[VectorWritable])
>                   ^
>
> I'm using Spark 0.9, Hadoop 1.0.4, and Mahout 0.7.
>
> Thanks
> Stuti
>
> -----Original Message-----
> From: Xiangrui Meng [mailto:men...@gmail.com]
> Sent: Wednesday, May 14, 2014 11:56 AM
> To: user@spark.apache.org
> Subject: Re: How to use Mahout VectorWritable in Spark.
>
> You need
>
> > val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])
>
> to load the data. After that, you can do
>
> > val data = raw.values.map(_.get)
>
> to get an RDD of Mahout's Vector. You can use `--jars mahout-math.jar` when
> you launch spark-shell to include mahout-math.
>
> Best,
> Xiangrui
>
> On Tue, May 13, 2014 at 10:37 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
> > Hi All,
> >
> > I am very new to Spark and trying to play around with MLlib, hence
> > apologies for the basic question.
> >
> > I am trying to run the KMeans algorithm using Mahout and Spark MLlib to
> > compare the performance. The initial data size was 10 GB. Mahout converts
> > the data into a sequence file <Text,VectorWritable>, which is used for
> > KMeans clustering. The sequence file created was ~6 GB in size.
> >
> > Now I want to know whether the Mahout sequence file can be used in
> > Spark MLlib for KMeans. I have read that SparkContext.sequenceFile
> > may be used here. Hence I tried to read my sequence file as below, but I
> > am getting the error:
> >
> > Command on Spark shell:
> >
> > scala> val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
> >
> > <console>:12: error: not found: type VectorWritable
> >        val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
> >
> > Here I have 2 questions:
> >
> > 1. Mahout has “Text” as the key, but Spark is printing “not found: type: Text”,
> >    hence I changed it to String. Is this correct?
> >
> > 2. How will VectorWritable be found in Spark? Do I need to include the
> >    Mahout jar in the classpath, or is there any other option?
> >
> > Please suggest.
> >
> > Regards,
> > Stuti Awasthi