Heap space

2014-03-09 Thread Mahmood Naderan
Hello, I ran this command: ./bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64 but got this error: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space There are many web pages regarding this and the solution is

ClusteringUtils for Kmeans output

2014-03-09 Thread Bikash Gupta
Hi, I want to use ClusteringUtils on Kmeans clusteredPoints to get summarizeClusterDistances, daviesBouldinIndex, and dunnIndex. Is there any sample or example of how to use these features? -- Thanks Regards Bikash Kumar Gupta

Re: ClusteringUtils for Kmeans output

2014-03-09 Thread Suneel Marthi
U could call ClusterQualitySummarizer which then calls ClusteringUtils to spew out the different metrics u had specified. For an example, see the Streaming Kmeans section in examples/bin/cluster-reuters.sh.  It calls 'qualcluster' with options -i tf-idf vectors generated from seq2sparse -c

Re: Heap space

2014-03-09 Thread Mahmood Naderan
OK, I found that I have to add this property to mapred-site.xml: <property> <name>mapred.child.java.opts</name> <value>-Xmx2048m</value> </property> Regards, Mahmood On Sunday, March 9, 2014 11:39 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Hello, I ran this command ./bin/mahout

Re: ClusteringUtils for Kmeans output

2014-03-09 Thread Bikash Gupta
I am successfully able to run ClusteringUtils on Kmeans (I still need to check the scenario which you have mentioned). However, I am getting an error from the TDigest class: Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.collect.Queues.newArrayDeque()Ljava/util/ArrayDeque; at

Re: ClusteringUtils for Kmeans output

2014-03-09 Thread Suneel Marthi
Darn. U r the second guy to report that this week. Change that line to what ted suggested. The issue is with guava incompatibility with Hadoop's antiquated guava version. Sent from my iPhone On Mar 9, 2014, at 6:10 AM, Bikash Gupta bikash.gupt...@gmail.com wrote: I am successfully able

Re: Heap space

2014-03-09 Thread Mahmood Naderan
Excuse me, I added the -Xmx option and restarted the hadoop services using sbin/stop-all.sh and sbin/start-all.sh; however, I still get the heap size error. How can I find the correct and needed heap size? Regards, Mahmood On Sunday, March 9, 2014 1:37 PM, Mahmood Naderan nt_mahm...@yahoo.com
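
Editor's note: one way to see which heap limit a JVM actually received is to ask the runtime from inside that JVM. The sketch below is only an illustration; the class name HeapCheck is hypothetical and is not part of Mahout or Hadoop. Running it with the same -Xmx value as the failing process shows whether the option reached the JVM at all. It does not tell you which configuration file a given Hadoop or Mahout process reads.

```java
// Hypothetical helper, not part of Mahout or Hadoop: reports the heap ceiling
// that the current JVM was started with.
public class HeapCheck {

    /** Maximum amount of memory the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Example invocation: java -Xmx2048m HeapCheck
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

The printed value is usually slightly below the -Xmx figure, because the JVM excludes some internal regions from what it reports.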

Re: ClusteringUtils for Kmeans output

2014-03-09 Thread Bikash Gupta
Just FYI... downgrading guava to 11.0.2 has fixed the build error in mahout-math as suggested by Ted however it is causing some other build error in mahout-core [INFO] - [ERROR]

Re: ClusteringUtils for Kmeans output

2014-03-09 Thread Bikash Gupta
Info for everyone: I have successfully forced Mahout to build with Guava 11.0.2. Errors and fixes as mentioned below. 1. Class: org.apache.mahout.math.stats.GroupTree - Change Line No 171 to - stack = new ArrayDeque<GroupTree>(); - Import package java.util.ArrayDeque; 2. Class:
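
Editor's note: the first fix above replaces a Guava factory call with the plain JDK constructor. The sketch below shows the same substitution in isolation. The class DequeWithoutGuava and the String element type are hypothetical stand-ins for the GroupTree code, which is not reproduced in this thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of the substitution described above; the real
// change is in org.apache.mahout.math.stats.GroupTree.
public class DequeWithoutGuava {

    /**
     * Builds the deque with the JDK constructor instead of Guava's
     * Queues.newArrayDeque(), the method reported missing at runtime in this
     * thread when Hadoop's older Guava is on the classpath.
     */
    static Deque<String> newStack() {
        return new ArrayDeque<String>();
    }

    public static void main(String[] args) {
        Deque<String> stack = newStack();
        stack.push("first");
        stack.push("second");
        // ArrayDeque.push/pop behave as a LIFO stack.
        System.out.println(stack.pop());
    }
}
```

Because java.util.ArrayDeque is part of the JDK, this form does not depend on which Guava version ends up on the classpath.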

Re: ClusteringUtils for Kmeans output

2014-03-09 Thread Ted Dunning
Can you file a JIRA and attach your patch? On Sun, Mar 9, 2014 at 8:03 AM, Bikash Gupta bikash.gupt...@gmail.com wrote: Info for everyone I have successfully forced Mahout to build with Guava 11.0.2. Error and fixes as mentioned below 1. Class: org.apache.mahout.math.stats.GroupTree -

Re: ClusteringUtils for Kmeans output

2014-03-09 Thread Bikash Gupta
MAHOUT-1442 has been created. Will submit the patch too. On Sun, Mar 9, 2014 at 9:03 PM, Ted Dunning ted.dunn...@gmail.com wrote: Can you file a JIRA and attach your patch? On Sun, Mar 9, 2014 at 8:03 AM, Bikash Gupta bikash.gupt...@gmail.com wrote: Info for everyone I have

Re: [blog post] Comparing Document Classification Functions of Lucene and Mahout

2014-03-09 Thread Sebastian Schelter
Hi Koji, I've added a link to your article to our website: https://mahout.apache.org/general/books-tutorials-and-talks.html On 03/07/2014 03:29 AM, Koji Sekiguchi wrote: Hello, I just posted an article on Comparing Document Classification Functions of Lucene and Mahout.

Re: Heap space

2014-03-09 Thread Sebastian Schelter
I usually do trial and error. Start with some very large value and do a binary search :) --sebastian On 03/09/2014 01:30 PM, Mahmood Naderan wrote: Excuse me, I added the -Xmx option and restarted the hadoop services using sbin/stop-all.sh sbin/start-all.sh however still I get heap size
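
Editor's note: the trial-and-error approach described above can be sketched as a binary search over heap sizes. Everything in the sketch is an assumption made for illustration: the class HeapSizeSearch, the megabyte bounds, and the succeedsWith predicate, which stands in for "rerun the job with this -Xmx value and see whether it finishes". It also assumes that a job which succeeds with some heap size succeeds with every larger one.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch of "start with a large value and binary search".
public class HeapSizeSearch {

    /**
     * Returns the smallest heap size in [lowMb, highMb] for which succeedsWith
     * is true. Precondition: succeedsWith.test(highMb) is true, and success is
     * monotonic in the heap size.
     */
    static int smallestWorkingHeapMb(int lowMb, int highMb, IntPredicate succeedsWith) {
        int lo = lowMb;
        int hi = highMb;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (succeedsWith.test(mid)) {
                hi = mid;       // mid works, so the answer is mid or smaller
            } else {
                lo = mid + 1;   // mid fails, so the answer is larger
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Stand-in for a real run: pretend the job needs at least 3000 MB.
        IntPredicate pretendJob = mb -> mb >= 3000;
        System.out.println(smallestWorkingHeapMb(256, 16384, pretendJob)); // prints 3000
    }
}
```

In practice each probe is a full job run, so a coarse search that stops once the range is small enough is usually sufficient.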

RE: Heap space

2014-03-09 Thread Jason Xin
Hello, Sebastian, Can you help me remove my email account from this list? I tried several times with unsubscribe but to no avail. Thanks. The email is jason@sas.com Best Regards Jason Xin -Original Message- From: Sebastian Schelter [mailto:s...@apache.org] Sent: Sunday, March 09,

Re: Heap space

2014-03-09 Thread Suneel Marthi
Mahmood, Firstly, thanks for starting this email thread and for highlighting the issues with the Wikipedia example. Since you raised this issue, I updated the new Wikipedia examples page at http://mahout.apache.org/users/classification/wikipedia-bayes-example.html and also responded to a similar

Re: ClusteringUtils for Kmeans output

2014-03-09 Thread Suneel Marthi
Thinking out loud here. If this is indeed a build error that u r seeing, a better fix would be to exclude Hadoop's Guava 11 transitive dependency in the pom, as opposed to having to downgrade Mahout code to be Guava 11 compatible. We might have missed excluding Hadoop's Guava 11 jar during the recent

Re: [blog post] Comparing Document Classification Functions of Lucene and Mahout

2014-03-09 Thread Koji Sekiguchi
Cool, thanks Sebastian! koji (14/03/10 5:21), Sebastian Schelter wrote: Hi Koji, I've added a link to your article to our website: https://mahout.apache.org/general/books-tutorials-and-talks.html On 03/07/2014 03:29 AM, Koji Sekiguchi wrote: Hello, I just posted an article on