RE: How to use Mahout VectorWritable in Spark.

Stuti Awasthi Wed, 14 May 2014 02:25:32 -0700

The "<console>:12: error: not found: type Text" issue is resolved by adding an 
import statement, but I am still unable to import VectorWritable.
The mahout-math jar has been added to the classpath, as I can confirm both in 
the web UI and in the shell:


scala> System.getenv
res1: java.util.Map[String,String] = {TERM=xterm, 
JAVA_HOME=/usr/lib/jvm/java-6-openjdk, SHLVL=2, 
SHELL_JARS=/home/hduser/installations/work-space/mahout-math-0.7.jar, 
SPARK_MASTER_WEBUI_PORT=5050, LESSCLOSE=/usr/bin/lesspipe %s %s, 
SSH_CLIENT=10.112.67.149 55123 22, 
SPARK_HOME=/home/hduser/installations/spark-0.9.0, MAIL=/var/mail/hduser, 
SPARK_WORKER_DIR=/tmp/spark-hduser-worklogs/work, 
XDG_SESSION_COOKIE=fbd2e4304c8c75dd606c361000000186-1400039480.256868-916349946,
 https_proxy=https://DS-1078D2486320:3128/, NICKNAME=vm01, JAVA_OPTS=  
-Djava.library.path= -Xms512m -Xmx512m, 
PWD=/home/hduser/installations/work-space/KMeansClustering_1, 
SSH_TTY=/dev/pts/0, SPARK_MASTER_PORT=7077, LOGNAME=hduser, 
MASTER=spark://VM-52540048731A:7077, SPARK_WORKER_MEMORY=2g, 
HADOOP_HOME=/usr/lib/hadoop, SS...

I am still not able to import the Mahout classes. Any ideas?

Thanks
Stuti Awasthi
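
One way to narrow this down is to ask the JVM directly whether the class can be 
loaded from inside the shell, independent of the import statement. The sketch 
below is only a diagnostic aid; the helper name isOnClasspath is made up for 
illustration and is not part of Spark or Mahout.

```scala
// Hypothetical helper: returns true if the named class can be loaded
// by the class loader of the calling code, false if it cannot be found.
def isOnClasspath(className: String): Boolean =
  try {
    Class.forName(className)
    true
  } catch {
    case _: ClassNotFoundException => false
  }

// A JDK class, used as a sanity check that the helper itself works.
println(isOnClasspath("java.lang.String"))

// The class the import is failing on; the result depends on your classpath.
println(isOnClasspath("org.apache.mahout.math.VectorWritable"))
```

If the second call reports false inside spark-shell, the jar is probably not 
reaching the shell's own classpath even though it is listed under spark.jars, 
which would point at how the shell was launched rather than at the import.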

-----Original Message-----
From: Stuti Awasthi 
Sent: Wednesday, May 14, 2014 1:13 PM
To: user@spark.apache.org
Subject: RE: How to use Mahout VectorWritable in Spark.

Hi Xiangrui,
Thanks for the response. I tried a few ways to include the mahout-math jar 
while launching the Spark shell, but with no success. Can you please point out 
what I am doing wrong?

1. Exported mahout-math.jar in CLASSPATH and PATH.
2. Tried launching the Spark shell with:
   MASTER=spark://<HOSTNAME>:<PORT> ADD_JARS=~/installations/work-space/mahout-math-0.7.jar spark-0.9.0/bin/spark-shell

After launching, I checked the environment details in the web UI, and it looks 
like the mahout-math jar is included:
spark.jars      /home/hduser/installations/work-space/mahout-math-0.7.jar

Then I tried:
scala> import org.apache.mahout.math.VectorWritable
<console>:10: error: object mahout is not a member of package org.apache
       import org.apache.mahout.math.VectorWritable

scala> val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])  
<console>:12: error: not found: type Text
       val data = sc.sequenceFile("/stuti/ML/Clustering/KMeans/HAR/KMeans_dataset_seq/part-r-00000", classOf[Text], classOf[VectorWritable])
                                                                                                             ^

I'm using Spark 0.9, Hadoop 1.0.4, and Mahout 0.7.

Thanks
Stuti 



-----Original Message-----
From: Xiangrui Meng [mailto:men...@gmail.com]
Sent: Wednesday, May 14, 2014 11:56 AM
To: user@spark.apache.org
Subject: Re: How to use Mahout VectorWritable in Spark.

You need

> val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])

to load the data. After that, you can do

> val data = raw.values.map(_.get)

to get an RDD of Mahout's Vector. You can use `--jars mahout-math.jar` when you 
launch spark-shell to include mahout-math.

Best,
Xiangrui
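
Putting the pieces of this reply together, a minimal sketch might look like the 
following. It assumes the mahout-math jar is already visible to the shell, that 
the keys are Hadoop Text (imported from org.apache.hadoop.io), and that dense 
Array[Double] rows are acceptable downstream; the function name 
loadMahoutVectors is hypothetical. Each Mahout vector is copied into a fresh 
array because Hadoop record readers may reuse the same Writable instance 
across records.

```scala
import org.apache.hadoop.io.Text
import org.apache.mahout.math.VectorWritable
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper: reads a Mahout <Text, VectorWritable> sequence file
// and returns one dense Array[Double] per record.
def loadMahoutVectors(sc: SparkContext, path: String): RDD[Array[Double]] = {
  val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])
  raw.map { case (_, vectorWritable) =>
    val v = vectorWritable.get
    // Copy element by element so the result does not alias the reused Writable.
    Array.tabulate(v.size)(i => v.get(i))
  }
}
```

In spark-shell, where sc is predefined, this could be called as 
val data = loadMahoutVectors(sc, path). For very wide sparse vectors the dense 
copy may be expensive, so treat this as a starting point rather than a tuned 
implementation.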

On Tue, May 13, 2014 at 10:37 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
> Hi All,
>
> I am very new to Spark and am trying to play around with MLlib, hence 
> apologies for the basic question.
>
>
>
> I am trying to run the KMeans algorithm using Mahout and Spark MLlib to 
> see how they perform. The initial data size was 10 GB. Mahout converts 
> the data into a sequence file <Text,VectorWritable>, which is used for KMeans 
> clustering.
> The sequence file created was ~6 GB in size.
>
>
>
> Now I want to know whether I can use the Mahout sequence file as input to 
> Spark MLlib for KMeans. I have read that SparkContext.sequenceFile 
> may be used here. Hence I tried to read my sequence file as below, but I am 
> getting the following error:
>
>
>
> Command on Spark Shell :
>
> scala> val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
>
> <console>:12: error: not found: type VectorWritable
>
>        val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
>
>
>
> Here I have 2 questions:
>
> 1. Mahout has “Text” as the key, but Spark prints “not found: type Text”,
> so I changed it to String. Is this correct?
>
> 2. How will VectorWritable be found in Spark? Do I need to include the 
> Mahout jar in the classpath, or is there another option?
>
>
>
> Please suggest.
>
>
>
> Regards
>
> Stuti Awasthi
