RE: Mahout on Spark: random forest

2014-08-11 Thread Sameer Tilak
To: user@mahout.apache.org There is no Random Forest implementation on Spark in Mahout yet. MLlib has a Random Forests implementation; why can't you use that instead? On Tue, Aug 12, 2014 at 2:19 AM, Sameer Tilak ssti...@live.com wrote: Hi All, We are currently using Weka. I looked

RE: Mahout on Spark: random forest

2014-08-11 Thread Sameer Tilak
, Sameer Tilak ssti...@live.com wrote: Hi All, We are currently using Weka. I looked at the site and read briefly about the experimental nature of Mahout on Spark. I was wondering how much of Mahout's functionality is currently available. For example, I am

Pig local mode issue

2014-01-22 Thread Sameer Tilak
Hi All, My script runs fine in MapReduce mode, but I get the following error when I run it in local mode. I have made sure that the input file exists. I am not sure why MapReduce comes into the picture in local mode. pig -x local myscript.pig 2014-01-22 16:14:02,771 [main] INFO

mahout itemsimilarity problem

2013-12-30 Thread Sameer Tilak
Hi All, I am getting the following error while executing this job: -bash-4.1$ ./mahout itemsimilarity -i /scratch/SimilartyInput -o /scratch/SimilartyOutput -s SIMILARITY_COOCCURRENCE --maxSimilaritiesPerItem 10 13/12/30 10:30:29 INFO common.AbstractJob: Command line arguments:

item similarity result interpretation

2013-12-27 Thread Sameer Tilak
Hi All, I was able to successfully run the item similarity algorithm on my dataset. My input data had the following format: userid itemid…… I used the following command: ./mahout itemsimilarity -i /scratch/SimilartyInput -o /scratch/SimilartyOutput -s SIMILARITY_COOCCURRENCE --maxSimilaritiesPerItem
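As a reading aid for the result-interpretation question: ItemSimilarityJob writes plain-text lines which, to my understanding, have the form itemA<TAB>itemB<TAB>similarity; with SIMILARITY_COOCCURRENCE the score is the number of users who interacted with both items. The Java sketch below (hypothetical class name) groups such lines per item and sorts each item's neighbors by score. The tab-separated layout is an assumption; check it against your own output files.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: groups itemsimilarity output lines by their first item. */
final class SimilarityOutputReader {

    /** One neighboring item and its similarity score. */
    static final class Neighbor {
        final long itemId;
        final double similarity;

        Neighbor(long itemId, double similarity) {
            this.itemId = itemId;
            this.similarity = similarity;
        }
    }

    /**
     * Parses lines assumed to look like "itemA<TAB>itemB<TAB>similarity" and
     * returns each itemA's neighbors, most similar first.
     */
    static Map<Long, List<Neighbor>> parse(List<String> lines) {
        Map<Long, List<Neighbor>> byItem = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            String[] parts = trimmed.split("\t");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
            long itemA = Long.parseLong(parts[0]);
            long itemB = Long.parseLong(parts[1]);
            double sim = Double.parseDouble(parts[2]);
            byItem.computeIfAbsent(itemA, k -> new ArrayList<>()).add(new Neighbor(itemB, sim));
        }
        // Sort each neighbor list by descending similarity.
        for (List<Neighbor> neighbors : byItem.values()) {
            neighbors.sort(Comparator.comparingDouble((Neighbor n) -> n.similarity).reversed());
        }
        return byItem;
    }
}
```

Reading the first few entries of each list then answers "which items are most similar to item X" directly.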

Mahout itemsimilarity error

2013-12-27 Thread Sameer Tilak
Hi All, I am running the following command to process quite a large dataset. I want to mention up front that my input file does contain a few blank lines. Any thoughts on why this might be happening? ./mahout itemsimilarity -i /scratch/SimilartyInput -o /scratch/SimilartyOutput -s
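Since the input contains a few blank lines, one option is to clean it before running the job. This sketch assumes each valid line is userID,itemID or userID,itemID,preference, separated by a comma or a tab, with numeric IDs (Mahout's recommender code parses IDs as longs); the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical pre-processing step: drop lines an ID-based job cannot parse. */
final class InputCleaner {
    // Assumption: fields are separated by a comma or a tab.
    private static final Pattern DELIMITER = Pattern.compile("[\t,]");

    static List<String> clean(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank line
            }
            String[] fields = DELIMITER.split(trimmed);
            if (fields.length < 2 || fields.length > 3) {
                continue; // wrong number of fields
            }
            try {
                Long.parseLong(fields[0].trim());
                Long.parseLong(fields[1].trim());
                if (fields.length == 3) {
                    Double.parseDouble(fields[2].trim());
                }
            } catch (NumberFormatException e) {
                continue; // non-numeric field
            }
            kept.add(trimmed);
        }
        return kept;
    }
}
```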

itemsimilarity Exception in thread: java.io.FileNotFoundException: File does not exist: numUsers.bin

2013-12-27 Thread Sameer Tilak
Hi All, I am having another issue with item similarity. For some reason the numUsers.bin file does not get generated. I am copying the command here: ./mahout itemsimilarity -i /scratch/SimilartyInput -o /scratch/SimilartyOutput --tempDir /scratch/Similartytemp -s SIMILARITY_COOCCURRENCE

K-means: No input clusters found

2013-12-24 Thread Sameer Tilak
Hi all, I get the following problem when I run k-means clustering on my real data. Any help with this would be great! Here is the data that I read out of the SequenceFile: 022960 value:
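K-means needs initial centroids before its first iteration. In Mahout, the kmeans job reads them from the directory given with -c; as far as I know, passing -k makes it sample k random input vectors into that directory instead, so an empty -c directory without -k would lead to this kind of error. The plain-Java sketch below (hypothetical class, not Mahout's implementation) shows Lloyd's algorithm with explicitly supplied seeds and the same fail-fast check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of k-means (Lloyd's algorithm) with explicit seeds. */
final class TinyKMeans {

    /** Picks k distinct input points as initial centroids, similar in spirit to -k. */
    static double[][] randomSeeds(double[][] points, int k, long seed) {
        if (k <= 0 || k > points.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        double[][] centroids = new double[k][];
        for (int i = 0; i < k; i++) {
            centroids[i] = points[indices.get(i)].clone();
        }
        return centroids;
    }

    /** Alternates assignment and update steps; updates {@code centroids} in place. */
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        if (centroids.length == 0) {
            throw new IllegalStateException("No input clusters found");
        }
        int dim = centroids[0].length;
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: nearest centroid by squared Euclidean distance.
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                double bestDistance = Double.MAX_VALUE;
                for (int c = 0; c < centroids.length; c++) {
                    double d = squaredDistance(points[p], centroids[c]);
                    if (d < bestDistance) {
                        bestDistance = d;
                        best = c;
                    }
                }
                if (assignment[p] != best) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged
            }
            // Update step: each centroid moves to the mean of its points.
            double[][] sums = new double[centroids.length][dim];
            int[] counts = new int[centroids.length];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[p]][d] += points[p][d];
                }
            }
            for (int c = 0; c < centroids.length; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int d = 0; d < dim; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }
}
```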

Vectorizing data in mapreduce mode

2013-12-23 Thread Sameer Tilak
Hi everyone, My Pig script generates the following -- results are stored in part-m-0 to part-m-4 files. -bash-4.1$ hadoop dfs -ls /scratch/ItemIds Found 7 items -rw-r--r-- 1 userid supergroup 0 2013-12-23 11:13 /scratch/ItemIds/_SUCCESS drwxr-xr-x - userid supergroup

clusterdump

2013-12-20 Thread Sameer Tilak
Hi All, I was able to do the clustering and need some help with viewing the result. I get the following problem. ./mahout clusterdump -i /scratch/dummyvectoroutput/clusters-*-final -d /scratch/dummyvectorfinalclusters MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath. Warning:

RE: KMeansDriver and distributed cache

2013-12-20 Thread Sameer Tilak
Hi All, I was able to resolve this issue by adding the following to my code: DistributedCache.addFileToClassPath(new Path("/scratch/mahout-math-0.9-SNAPSHOT.jar"), conf, fs); DistributedCache.addFileToClassPath(new Path("/scratch/mahout-core-0.9-SNAPSHOT.jar"), conf, fs);

RE: clusterdump

2013-12-20 Thread Sameer Tilak
, December 20, 2013 5:33 PM, Sameer Tilak ssti...@live.com wrote: Hi All, I was able to do the clustering and need some help with viewing the result. I get the following problem. ./mahout clusterdump -i /scratch/dummyvectoroutput/clusters-*-final -d /scratch/dummyvectorfinalclusters

RE: KMeansDriver and distributed cache

2013-12-20 Thread Sameer Tilak
. However, using DistributedCache.addFileToClassPath I was able to have them seen in worker nodes. From: kkrugler_li...@transpac.com Subject: Re: KMeansDriver and distributed cache Date: Fri, 20 Dec 2013 14:47:13 -0800 To: user@mahout.apache.org On Dec 20, 2013, at 2:35pm, Sameer Tilak

RE: clusterdump

2013-12-20 Thread Sameer Tilak
@mahout.apache.org I would investigate all of those 'Unable to add .' messages first. Check out the latest code and run a clean build. On Friday, December 20, 2013 5:58 PM, Sameer Tilak ssti...@live.com wrote: Suneel: Yes, I am working off of trunk. I saw that example. In my case the data

RE: Exception in thread main java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector

2013-12-19 Thread Sameer Tilak
, 2013 1:04 PM, Sameer Tilak ssti...@live.com wrote: Hi everyone, I used the following commands to generate the jar file: javac -d /apps/analytics/myanalytics -classpath .:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar:/users/p529444/software/hadoop-1.0.3/hadoop-core-1.0.3.jar:/apps

More on Exception in thread main java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector

2013-12-19 Thread Sameer Tilak
Hi Everyone, I have added the math jar file to my javac command. As per the documentation, Include the JAR in the “-libjars” command line option of the `hadoop jar …` command. The jar will be placed in distributed cache and will be made available to all of the job’s task attempts. Ideally, this

RE: More on Exception in thread main java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector

2013-12-19 Thread Sameer Tilak
Hi, I was able to resolve this problem by adding the jar files to HADOOP_CLASSPATH. Here is a command sequence: export
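The two mechanisms in this thread take the same jars in different formats. HADOOP_CLASSPATH is read by the hadoop launcher for the client-side JVM (where "Exception in thread main" is raised) and uses the platform path separator (':' on Linux), while -libjars expects a comma-separated list and is only honored when the driver parses generic options, for example via ToolRunner. A small hypothetical helper makes the difference concrete.

```java
import java.io.File;
import java.util.List;

/** Hypothetical helper that formats one jar list for both mechanisms. */
final class JarListFormatter {

    /** Value for HADOOP_CLASSPATH: entries joined with the platform path separator. */
    static String forHadoopClasspath(List<String> jars) {
        return String.join(File.pathSeparator, jars);
    }

    /** Value for -libjars: entries joined with commas. */
    static String forLibjars(List<String> jars) {
        return String.join(",", jars);
    }
}
```

For example, passing the core jar path quoted in this thread (/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar) together with a math jar (its path is an assumption, e.g. under /apps/mahout/trunk/math/target/) yields a colon-separated string for the environment variable and a comma-separated one for -libjars.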

KMeansDriver and distributed cache

2013-12-19 Thread Sameer Tilak
Hi All, I am trying to execute the following command: hadoop jar /apps/analytics/myanalytics.jar myanalytics.SimpleKMeansClustering -libjars /apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar

Data Vectorization

2013-12-16 Thread Sameer Tilak
Hi All, I have some questions regarding vectorization. Here is my Pig script snippet. AU = FOREACH A GENERATE myparser.myUDF(param1, param2); STORE AU into '/scratch/AU'; AU has the following format: (userid, (item_view_history)) (27,(0,1,1,0,0))(28,(0,0,1,0,0))(29,(0,0,1,0,1))(30,(1,0,1,0,1))
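The stored tuples have the form (userid,(v1,...,vn)), run together as in the snippet. Before they can be clustered, each history has to become a numeric vector; Mahout's clustering jobs read vectors from SequenceFiles with VectorWritable values. This hypothetical parser covers only the first step, turning the text into a double[] per user, which could then be wrapped in a vector type such as DenseVector.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for "(userid,(v1,v2,...))" tuples. */
final class ViewHistoryParser {
    // Matches one tuple; group 1 is the user id, group 2 the comma-separated values.
    private static final Pattern TUPLE = Pattern.compile("\\((\\d+),\\s*\\(([^)]*)\\)\\)");

    /** Returns user id -> view-history vector, in input order. */
    static Map<Long, double[]> parse(String text) {
        Map<Long, double[]> rows = new LinkedHashMap<>();
        Matcher m = TUPLE.matcher(text);
        while (m.find()) {
            long userId = Long.parseLong(m.group(1));
            String[] cells = m.group(2).split(",");
            double[] vector = new double[cells.length];
            for (int i = 0; i < cells.length; i++) {
                vector[i] = Double.parseDouble(cells[i].trim());
            }
            rows.put(userId, vector);
        }
        return rows;
    }
}
```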

RE: Data Vectorization

2013-12-16 Thread Sameer Tilak
Trying to figure that out now. Will keep you posted. Date: Mon, 16 Dec 2013 12:13:52 -0800 Subject: Re: Data Vectorization From: andrew.mussel...@gmail.com To: user@mahout.apache.org Looks reasonable. Does it work? On Mon, Dec 16, 2013 at 12:09 PM, Sameer Tilak ssti...@live.com wrote

RE: Data Vectorization

2013-12-16 Thread Sameer Tilak
Dec 2013 12:13:52 -0800 Subject: Re: Data Vectorization From: andrew.mussel...@gmail.com To: user@mahout.apache.org Looks reasonable. Does it work? On Mon, Dec 16, 2013 at 12:09 PM, Sameer Tilak ssti...@live.com wrote: Hi All, I have some questions regarding vectorization. Here

RE: Data Vectorization

2013-12-16 Thread Sameer Tilak
. On Monday, December 16, 2013 3:58 PM, Sameer Tilak ssti...@live.com wrote: It does not seem to work :(. Here is how I use the generated sequence file (described in my last email) for clustering. ./mahout seqdirectory -i /scratch/VectorizedInput -o /scratch/VectorizedOutputSeqdir -c UTF-8

RE: Data Vectorization

2013-12-16 Thread Sameer Tilak
9.288379669189453 Date: Mon, 16 Dec 2013 12:13:52 -0800 Subject: Re: Data Vectorization From: andrew.mussel...@gmail.com To: user@mahout.apache.org Looks reasonable. Does it work? On Mon, Dec 16, 2013 at 12:09 PM, Sameer Tilak ssti...@live.com wrote: Hi All, I have

K-means clustering: clusterdump

2013-12-12 Thread Sameer Tilak
Hi, I am running K-means clustering following the script on the Wiki: https://cwiki.apache.org/confluence/download/attachments/75159/quickstart-kmeans.sh?version=2modificationDate=1286718326000 It looks like the command-line options have changed in the newer version of Mahout. For example I get the

Elephant-Bird, Pig, and Mahout

2013-12-05 Thread Sameer Tilak
Hi All, I have some questions about using EB's VectorWritableConverter in my Pig script for data vectorization. I am generating the tuples using a UDF; however, for simplicity I am loading the data from a file in the following code. My UDF returns tuples of the form (1,0,1,1...) etc. My map.dat

Pig vector project

2013-12-02 Thread Sameer Tilak
Hi All, We are using Pig to build our data pipeline. I came across the following: https://github.com/tdunning/pig-vector The last commit was 2 years ago. Is there any information on whether there will be any further work on this project?

Mahout for clustering

2013-12-02 Thread Sameer Tilak
Hi All, We are using Apache Pig for building our data pipeline. We have data in the following fashion: userid, age, items {code 1, code 2, ….}, few other features... Each item has a unique alphanumeric code. I would like to use Mahout for clustering it. Based on my current reading I see

RE: Pig vector project

2013-12-02 Thread Sameer Tilak
. https://github.com/kevinweil/elephant-bird On Mon, Dec 2, 2013 at 4:10 PM, Sameer Tilak ssti...@live.com wrote: Hi All, We are using Pig to build our data pipeline. I came across the following: https://github.com/tdunning/pig-vector The last commit was 2 years ago. Any information

RE: Mahout for clustering

2013-12-02 Thread Sameer Tilak
I am looking for some input on how to vectorize my data. From: ssti...@live.com To: user@mahout.apache.org Subject: Mahout for clustering Date: Mon, 2 Dec 2013 16:22:03 -0800 Hi All,We are using Apache Pig for building our data pipeline. We have data in the following fashion:
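One common way to vectorize records like userid, age, items {code 1, code 2, ….} is to give each distinct item code its own column, set to 1 when the user has that item, and keep numeric features such as age in their own columns. The sketch below (hypothetical class; column 0 reserved for age by assumption) builds such a sparse row as a sorted column-to-value map. In practice age would likely need scaling so it does not dominate distance computations against the 0/1 item columns.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical vectorizer: column 0 = age, one binary column per item code. */
final class UserVectorizer {
    private static final int FIRST_ITEM_COLUMN = 1;
    private final Map<String, Integer> itemColumns = new HashMap<>();

    /** Returns a stable column for an item code, assigning the next free one on first sight. */
    int columnFor(String itemCode) {
        Integer column = itemColumns.get(itemCode);
        if (column == null) {
            column = FIRST_ITEM_COLUMN + itemColumns.size();
            itemColumns.put(itemCode, column);
        }
        return column;
    }

    /** Sparse row: only non-zero columns are stored, in column order. */
    TreeMap<Integer, Double> vectorize(double age, List<String> itemCodes) {
        TreeMap<Integer, Double> row = new TreeMap<>();
        row.put(0, age);
        for (String code : itemCodes) {
            row.put(columnFor(code), 1.0); // user has this item
        }
        return row;
    }

    /** Total number of columns seen so far (the vector cardinality). */
    int cardinality() {
        return FIRST_ITEM_COLUMN + itemColumns.size();
    }
}
```

The same dictionary instance has to be used for every record so that a given item code always lands in the same column.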

RE: Mahout fpg

2013-11-21 Thread Sameer Tilak
. --sebastian On 21.11.2013 00:28, Sameer Tilak wrote: Yes, changing A1234567 to 1234567 resolves that issue trivially. However, itemcode (input: userid, itemcode) is alphanumeric and not just numeric. I am sure ItemSimilarityJob will be able to handle that case; however, I need to know how to supply
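Because the job parses IDs as numbers (which is why changing A1234567 to 1234567 helps), alphanumeric item codes can be mapped to numeric IDs before the run and mapped back afterwards. A minimal sketch, assuming userid,itemcode input lines with numeric user IDs; the class and method names are hypothetical, and the dictionary would need to be saved alongside the output to translate results back.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical dictionary assigning dense numeric IDs to alphanumeric item codes. */
final class ItemCodeDictionary {
    private final Map<String, Long> idsByCode = new HashMap<>();
    private final List<String> codesById = new ArrayList<>();

    /** Returns the numeric ID for a code, assigning the next ID on first sight. */
    long idFor(String code) {
        Long id = idsByCode.get(code);
        if (id == null) {
            id = (long) codesById.size();
            idsByCode.put(code, id);
            codesById.add(code);
        }
        return id;
    }

    /** Reverse lookup, used to translate job output back to item codes. */
    String codeFor(long id) {
        return codesById.get((int) id);
    }

    /** Rewrites "userid,itemcode" (comma or tab separated) into "userid,numericItemId". */
    String translateLine(String line) {
        String[] fields = line.trim().split("[\t,]");
        if (fields.length != 2) {
            throw new IllegalArgumentException("Expected userid,itemcode: " + line);
        }
        return fields[0].trim() + "," + idFor(fields[1].trim());
    }
}
```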

RE: Mahout fpg

2013-11-20 Thread Sameer Tilak
To: user@mahout.apache.org Subject: Re: Mahout fpg You can use ItemSimilarityJob to find sets of items that co-occur in your users' interactions. --sebastian On 20.11.2013 08:11, Sameer Tilak wrote: Hi Sunil, Thanks for your reply. We can benefit a lot from
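Co-occurrence, as used by SIMILARITY_COOCCURRENCE, counts how many users interacted with both items of a pair. This in-memory Java sketch (hypothetical class, only suitable for small samples, unlike the distributed job) shows the computation, which can be handy for sanity-checking the job's output on a toy input.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical in-memory co-occurrence counter for small samples. */
final class Cooccurrence {

    /** Maps "smallerId:largerId" to the number of users who interacted with both items. */
    static Map<String, Integer> count(Map<Long, List<Long>> itemsByUser) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<Long> items : itemsByUser.values()) {
            // Deduplicate and sort so each unordered pair is counted once per user.
            Long[] sorted = new TreeSet<>(items).toArray(new Long[0]);
            for (int i = 0; i < sorted.length; i++) {
                for (int j = i + 1; j < sorted.length; j++) {
                    counts.merge(sorted[i] + ":" + sorted[j], 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```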

RE: Mahout fpg

2013-11-20 Thread Sameer Tilak
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) Obviously, the input's incorrect. On Wednesday, November 20, 2013 6:02 PM, Sameer Tilak ssti...@live.com wrote: Dear Sebastian, I tried using ItemSimilarityJob. My data has the following format: Each line

Parallel Frequent Pattern Mining input format

2013-11-19 Thread Sameer Tilak
Hi everyone, I am interested in using Mahout for analyzing data, in particular frequent pattern mining using Mahout's FPG algorithm. My data can be expressed as an M×N matrix. Each row represents a given user, whereas columns represent the items (1 if a given user has viewed a particular item
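Frequent-pattern miners such as FP-growth work on transactions, i.e. one set of items per line, rather than on the full M×N 0/1 matrix. Converting each user row into the list of column indices set to 1 gives that shape. The space separator and the hypothetical class name are assumptions; check which separator your FPG version expects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical converter from a 0/1 user-item matrix to transaction lines. */
final class MatrixToTransactions {

    /** Turns one 0/1 row into a space-separated list of the columns set to 1. */
    static String toTransaction(int[] row) {
        StringJoiner joiner = new StringJoiner(" ");
        for (int col = 0; col < row.length; col++) {
            if (row[col] == 1) {
                joiner.add(Integer.toString(col));
            }
        }
        return joiner.toString();
    }

    /** One transaction per user; users with no views contribute no line. */
    static List<String> toTransactions(int[][] matrix) {
        List<String> lines = new ArrayList<>();
        for (int[] row : matrix) {
            String transaction = toTransaction(row);
            if (!transaction.isEmpty()) {
                lines.add(transaction);
            }
        }
        return lines;
    }
}
```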

Mahout fpg

2013-11-19 Thread Sameer Tilak
Hi everyone, I downloaded the latest version of Mahout and did mvn install. When I try to run fpg, I get the following errors. Do I need to download and compile FPG separately? It looks like it has somehow not been included in the list of valid programs. 13/11/19 17:49:19 WARN driver.MahoutDriver:

RE: Mahout fpg

2013-11-19 Thread Sameer Tilak
...@yahoo.com Subject: Re: Mahout fpg To: user@mahout.apache.org FPG has been removed from the codebase as it will not be supported. On Tuesday, November 19, 2013 8:56 PM, Sameer Tilak ssti...@live.com wrote: Hi everyone, I downloaded the latest version of Mahout and did mvn install. When