Re: mahout svd OOM error

2013-12-20 Thread Suneel Marthi
DistributedLanczosSolver has been deprecated (and the blog post you mention is old). Use Stochastic SVD (SSVD) instead. On Friday, December 20, 2013 12:41 AM, Partha Pratim Talukdar partha.taluk...@cs.cmu.edu wrote: Hello, I am running mahout (v0.8) svd over a sparse matrix of size

Re: [ANNOUNCE] Machine Learning using Apache Mahout training course

2013-12-20 Thread David G
Really interesting, I would like to have that in Paris :) On 20 December 2013 07:47, Michael Wechner michael.wech...@wyona.com wrote: Hi Are you also considering to tell this (or maybe a shorter version) at ApacheCon? Thanks Michael Am 20.12.13 03:50, schrieb Koji Sekiguchi: I'm

Re: [ANNOUNCE] Machine Learning using Apache Mahout training course

2013-12-20 Thread Koji Sekiguchi
Uh, interesting idea that I've never thought of. Sorry but I don't have a plan to go to ApacheCon. koji (13/12/20 15:47), Michael Wechner wrote: Hi Are you also considering to tell this (or maybe a shorter version) at ApacheCon? Thanks Michael Am 20.12.13 03:50, schrieb Koji

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-20 Thread Scott C. Cote
Suneel and others, I am still getting the strange results when I do the tour. Suneel: I manually wiped out the temp folder and also deleted the reuters-XXX folders. Also, per your advice I added the -ow option to all of the commands. NOTE: The step to create a matrix would NOT take a -ow option

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-20 Thread Suneel Marthi
Sorry Scott I should have looked at this more closely. I apologize. 1. You are doing a seqdumper of the matrix (which is generated from the rowid job and is not the output of the rowsimilarity job). Rowid Job generates a MxN matrix where M - no. of documents and N - terms associated with

clusterdump

2013-12-20 Thread Sameer Tilak
Hi All, I was able to do the clustering and need some help with viewing the result. I get the following problem. ./mahout clusterdump -i /scratch/dummyvectoroutput/clusters-*-final -d /scratch/dummyvectorfinalclusters MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath. Warning:

RE: KMeansDriver and distributed cache

2013-12-20 Thread Sameer Tilak
Hi All, I was able to resolve this issue by adding the following to my code: DistributedCache.addFileToClassPath(new Path("/scratch/mahout-math-0.9-SNAPSHOT.jar"), conf, fs); DistributedCache.addFileToClassPath(new Path("/scratch/mahout-core-0.9-SNAPSHOT.jar"), conf, fs);

Re: clusterdump

2013-12-20 Thread Suneel Marthi
Are you working off of trunk? 'clusterdump' is being used in examples/bin/cluster-reuters.sh. On Friday, December 20, 2013 5:33 PM, Sameer Tilak ssti...@live.com wrote: Hi All, I was able to do the clustering and need some help with viewing the result. I get the following problem.

Re: KMeansDriver and distributed cache

2013-12-20 Thread Ken Krugler
On Dec 20, 2013, at 2:35pm, Sameer Tilak ssti...@live.com wrote: Hi All, I was able to resolve this issue by adding the following to my code: DistributedCache.addFileToClassPath(new Path("/scratch/mahout-math-0.9-SNAPSHOT.jar"), conf, fs);

RE: clusterdump

2013-12-20 Thread Sameer Tilak
Suneel: Yes, I am working off of trunk. I saw that example. In my case the data is numeric -- I assume that means no need for a dictionary etc. I am not sure what is going on, but I still get the following errors: ./mahout clusterdump -i /scratch/dummyvectoroutput/clusters-*-final -o

RE: KMeansDriver and distributed cache

2013-12-20 Thread Sameer Tilak
Hi Ken, Thanks. I was going through that route. I was wondering if there is any advantage to the approach that uses Tool and calls ToolRunner.run() over the one that uses DistributedCache.addFileToClassPath. Maybe the former is more generic and can help you with things other than adding jar files.

Re: clusterdump

2013-12-20 Thread Suneel Marthi
I would investigate all of those 'Unable to add .' messages first. Check out the latest code and run a clean build. On Friday, December 20, 2013 5:58 PM, Sameer Tilak ssti...@live.com wrote: Suneel: Yes, I am working off of trunk. I saw that example. In my case the data is numeric -- I

RE: clusterdump

2013-12-20 Thread Sameer Tilak
Hi All, My HADOOP_CLASSPATH was interfering somehow. Things seem to work fine now. -bash-4.1$ export HADOOP_CLASSPATH= ./mahout clusterdump -i /scratch/dummyvectoroutput/clusters-*-final --pointsDir /scratch/clusterdump MAHOUT-JOB:
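The fix reported above (clearing HADOOP_CLASSPATH before running clusterdump) can be sketched as a small wrapper. This is only an illustration: the function name run_without_hadoop_classpath is hypothetical and is not part of Mahout or Hadoop, and it assumes the standard env utility is available. It empties the variable for a single command instead of changing it for the whole shell session.

```shell
# Hypothetical helper (not part of Mahout or Hadoop): run one command with
# HADOOP_CLASSPATH set to the empty string for that command only, leaving
# the caller's own value untouched.
run_without_hadoop_classpath() {
  env HADOOP_CLASSPATH= "$@"
}
```

For instance, the command from the message above could be run as run_without_hadoop_classpath ./mahout clusterdump -i /scratch/dummyvectoroutput/clusters-*-final --pointsDir /scratch/clusterdump. Scoping the change to one command keeps any other jobs that rely on HADOOP_CLASSPATH working as before.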

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-20 Thread Scott C. Cote
Suneel, Thank you for your help. :) Thought I was completely in the ditch. If you are interested: inline with your comments are demonstrations that I finally have it (and the commands that I used)…. YAQ (Yet another question): How do I see with the dumper the documents that belong in a given

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-20 Thread Scott C. Cote
What does the data in cdump.txt represent? Can you point me in the right direction? SCott On 12/20/13 4:30 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Sorry Scott I should have looked at this more closely. I apologize. 1. You are doing a seqdumper of the matrix (which is generated from

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-20 Thread Suneel Marthi
You could use clusterdump to see the output of your clusters. Eg:   $MAHOUT clusterdump \     -i ${WORK_DIR}/reuters-kmeans/clusters-*-final \     -o ${WORK_DIR}/reuters-kmeans/clusterdump \     -d ${WORK_DIR}/reuters-out-seqdir-sparse-kmeans/dictionary.file-0 \     -dt sequencefile -b 100 -n

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-20 Thread Suneel Marthi
Which cdump.txt ? On Friday, December 20, 2013 7:29 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: You could use clusterdump to see the output of your clusters. Eg:   $MAHOUT clusterdump \     -i ${WORK_DIR}/reuters-kmeans/clusters-*-final \     -o

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-20 Thread Scott C. Cote
Suneel, I think I have it :) Pls confirm this understanding: I'm looking at the cdump.out that comes from clusterdump. It has the 20 clusters, each of the top words in the cluster, and each of the vectors that are members of the cluster. Do I have it? Am I getting this? Thanks, SCott

Re: unexpected results in seqdump of reuters-matrix in quick tour of text analysis

2013-12-20 Thread Suneel Marthi
You got it. On Friday, December 20, 2013 7:36 PM, Scott C. Cote scottcc...@gmail.com wrote: Suneel, I think I have it :) Pls confirm this understanding: I'm looking at the cdump.out that comes from clusterdump.  It has the 20 clusters, each of the top words in the cluster, and each of