The "<console>:12: error: not found: type Text" issue is resolved by adding the import statement, but I am still facing an issue with importing VectorWritable. The Mahout math jar is added to the classpath, as I can see on the WebUI as well as in the shell:

scala> System.getenv
res1: java.util.Map[String,String] = {TERM=xterm, JAVA_HOME=/usr/lib/jvm/java-6-openjdk, SHLVL=2, SHELL_JARS=/home/hduser/installations/work-space/mahout-math-0.7.jar, SPARK_MASTER_WEBUI_PORT=5050, LESSCLOSE=/usr/bin/lesspipe %s %s, SSH_CLIENT=10.112.67.149 55123 22, SPARK_HOME=/home/hduser/installations/spark-0.9.0, MAIL=/var/mail/hduser, SPARK_WORKER_DIR=/tmp/spark-hduser-worklogs/work, XDG_SESSION_COOKIE=fbd2e4304c8c75dd606c361000000186-1400039480.256868-916349946, https_proxy=https://DS-1078D2486320:3128/, NICKNAME=vm01, JAVA_OPTS= -Djava.library.path= -Xms512m -Xmx512m, PWD=/home/hduser/installations/work-space/KMeansClustering_1, SSH_TTY=/dev/pts/0, SPARK_MASTER_PORT=7077, LOGNAME=hduser, MASTER=spark://VM-52540048731A:7077, SPARK_WORKER_MEMORY=2g, HADOOP_HOME=/usr/lib/hadoop, SS...

I am still not able to import the Mahout classes. Any ideas?

Thanks
Stuti Awasthi

-----Original Message-----
From: Stuti Awasthi
Sent: Wednesday, May 14, 2014 1:13 PM
To: user@spark.apache.org
Subject: RE: How to use Mahout VectorWritable in Spark.

Hi Xiangrui,

Thanks for the response. I tried a few ways to include the mahout-math jar while launching the Spark shell, but with no success. Can you please point out what I am doing wrong?

1. mahout-math.jar exported in CLASSPATH and PATH
2. Tried launching the Spark shell with:
   MASTER=spark://<HOSTNAME>:<PORT> ADD_JARS=~/installations/work-space/mahout-math-0.7.jar park-0.9.0/bin/spark-shell

After launching, I checked the environment details on the WebUI. It looks like the mahout-math jar is included:

spark.jars    /home/hduser/installations/work-space/mahout-math-0.7.jar

Then I try:

scala> import org.apache.mahout.math.VectorWritable
<console>:10: error: object mahout is not a member of package org.apache
       import org.apache.mahout.math.VectorWritable

scala> val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])
<console>:12: error: not found: type Text
       val data = sc.sequenceFile("/stuti/ML/Clustering/KMeans/HAR/KMeans_dataset_seq/part-r-00000", classOf[Text], classOf[VectorWritable])
       ^

I'm using Spark 0.9, Hadoop 1.0.4 and Mahout 0.7.

Thanks
Stuti

-----Original Message-----
From: Xiangrui Meng [mailto:men...@gmail.com]
Sent: Wednesday, May 14, 2014 11:56 AM
To: user@spark.apache.org
Subject: Re: How to use Mahout VectorWritable in Spark.

You need

> val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])

to load the data. After that, you can do

> val data = raw.values.map(_.get)

to get an RDD of Mahout's Vector. You can use `--jar mahout-math.jar` when you launch spark-shell to include mahout-math.

Best,
Xiangrui

On Tue, May 13, 2014 at 10:37 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
> Hi All,
>
> I am very new to Spark and trying to play around with MLlib, hence
> apologies for the basic question.
>
> I am trying to run the KMeans algorithm using Mahout and Spark MLlib to
> see the performance. The initial data size was 10 GB. Mahout converts
> the data into a Sequence File <Text,VectorWritable>, which is used for
> KMeans clustering. The Sequence File created was ~6 GB in size.
>
> Now I wanted to know if I can use the Mahout Sequence File in
> Spark MLlib for KMeans. I have read that SparkContext.sequenceFile
> may be used here.
> Hence I tried to read my sequence file as below, but I am getting
> the error:
>
> Command on Spark Shell:
>
> scala> val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
> <console>:12: error: not found: type VectorWritable
>        val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
>
> Here I have 2 questions:
>
> 1. Mahout has “Text” as the key, but Spark is printing “not found: type:Text”,
>    hence I changed it to String. Is this correct?
> 2. How will VectorWritable be found in Spark? Do I need to include the
>    Mahout jar in the classpath, or is there any other option?
>
> Please suggest.
>
> Regards
> Stuti Awasthi
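
One way to narrow down the failing import discussed in this thread is to check, from the same shell session, whether the class can be loaded by the JVM at all. The following is only a minimal sketch using the Scala and Java standard libraries. The helper names `isLoadable` and `classpathEntries` are made up for illustration and are not part of Spark or Mahout, and the sketch assumes that the `java.class.path` system property reflects how the shell was launched, which may not hold for jars added dynamically.

```scala
// Hypothetical helper (not a Spark or Mahout API): returns true if the named
// class can be loaded, without initializing it, by the context class loader.
def isLoadable(className: String): Boolean = {
  val loader = Option(Thread.currentThread.getContextClassLoader)
    .getOrElse(ClassLoader.getSystemClassLoader)
  try {
    Class.forName(className, false, loader)
    true
  } catch {
    // ClassNotFoundException: the class itself is missing.
    // NoClassDefFoundError: the class was found but one of its dependencies is missing.
    case _: ClassNotFoundException | _: NoClassDefFoundError => false
  }
}

// Hypothetical helper: entries of the JVM's java.class.path property
// that contain the given fragment (for example "mahout").
def classpathEntries(fragment: String): Seq[String] =
  sys.props.getOrElse("java.class.path", "")
    .split(java.io.File.pathSeparator)
    .filter(_.contains(fragment))
    .toSeq

// Class name taken from the thread above.
println(isLoadable("org.apache.mahout.math.VectorWritable"))
println(classpathEntries("mahout"))
```

If `isLoadable` returns false in the shell, the jar is probably not visible to the driver JVM, even though `spark.jars` lists it on the WebUI. If it returns true while the `import` still fails, that would suggest the jar is reaching the runtime class loader but not the classpath the REPL compiler uses. Neither outcome is conclusive on its own.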