Re: OutOfMemoryError with BreimanExample on 4GB of data?

2012-12-21 Thread Adam Baron
ead limit exceeded > > Is due to too much CPU time spent in GC, as opposed to not enough heap allocation. Decreasing your heap allocation may in fact help, as GC is more efficient on a smaller heap. You may have to consider GC tuning.
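To make the distinction above concrete: on HotSpot JVMs this error is, by default, raised when roughly 98% of recent time goes to garbage collection while less than about 2% of the heap is recovered. The sketch below is not from the thread; it is a minimal illustration using the standard `java.lang.management` beans to estimate how much of a JVM's uptime has gone to GC. The class and method names are hypothetical, and the lifetime ratio it prints is only a rough proxy for the JVM's own check.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Mahout or Hadoop.
class GcOverheadSketch {

    /** Sums the approximate accumulated collection time (ms) over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Ratio of GC time to elapsed time, clamped to the range [0, 1]. */
    static double gcFraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, Math.max(0.0, (double) gcMillis / uptimeMillis));
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC fraction of uptime: %.3f%n",
                gcFraction(totalGcMillis(), uptime));
    }
}
```

A fraction that stays close to 1 under load would be consistent with the "too much time in GC" explanation rather than a plain shortage of heap.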

Parallel MapReduce Classification Examples?

2012-12-28 Thread Adam Baron
I'm trying to get familiar with the parallel MapReduce Classification algorithms offered in Mahout. So far I've run through examples for Naïve Bayes (classify-20newsgroups.sh) and Random Forest (https://cwiki.apache.org/MAHOUT/partial-implementation.html). Looking at the algorithm list (htt

Re: Parallel MapReduce Classification Examples?

2012-12-31 Thread Adam Baron
g on every possible label value. Thanks, Adam On Fri, Dec 28, 2012 at 6:54 PM, Ted Dunning wrote: > On Fri, Dec 28, 2012 at 5:30 PM, Adam Baron wrote: > > I'm trying to get familiar with the parallel MapReduce Classification algorithms offered in Mah

Re: How to segment seq2sparse output into predefined training set and test set?

2013-02-06 Thread Adam Baron
; > That class is responsible for the partitioning and you can probably just copy that class and replace the map() so that you look at the year from the text somehow. > > So, while it's not exactly code-free, it's better than writing a new

Re: How to classify an individual file after training

2013-03-14 Thread Adam Baron
Frederic, Adding the ability to classify new text on a go-forward basis against an existing Naïve Bayes model would be a very helpful addition to Mahout. I found your blog post informative and I'm sure many other classification users of Mahout have faced similar challenges to what

Set Random Forest Max Depth?

2013-05-17 Thread Adam Baron
Is there a way to set the maximum depth of a Random Forest? Browsing the code, it looks like the answer is no since the maxDepth() function is merely an informative property of the Node class (and derived classes). Just wanted to confirm with more experienced foresters before giving up on trying

ForestVisualizer OutOfMemoryError?

2013-06-24 Thread Adam Baron
I'm trying to visualize a Random Forest using ForestVisualizer (with the output redirected to a file) and am getting an OutOfMemoryError: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2882) at java.lang.AbstractStringB
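The stack trace above points at a string builder growing until the heap runs out, which suggests the whole textual dump is assembled in memory before any of it is written. The sketch below is not ForestVisualizer's code; it is a minimal illustration, under that assumption, of the alternative: emitting each node to a writer as it is visited so that memory use does not grow with the size of the output. `TreeNode` and `StreamingTreePrinter` are hypothetical names and do not correspond to Mahout's `Node` classes.

```java
import java.io.PrintWriter;

// Hypothetical sketch; TreeNode is NOT Mahout's Node class.
class StreamingTreePrinter {

    /** Minimal binary tree node used only for illustration. */
    static final class TreeNode {
        final String label;
        final TreeNode left;
        final TreeNode right;

        TreeNode(String label, TreeNode left, TreeNode right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }
    }

    /** Writes the tree in pre-order, one node per line, indented by depth. */
    static void print(TreeNode node, int depth, PrintWriter out) {
        if (node == null) {
            return;
        }
        for (int i = 0; i < depth; i++) {
            out.print("  ");
        }
        out.print(node.label);
        out.print('\n');
        print(node.left, depth + 1, out);
        print(node.right, depth + 1, out);
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode("root",
                new TreeNode("a", null, null),
                new TreeNode("b", null, null));
        PrintWriter out = new PrintWriter(System.out);
        print(root, 0, out);
        out.flush();
    }
}
```

Because each line goes straight to the writer, redirecting the output to a file keeps only the current recursion path in memory rather than the full dump.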

Run more than one mapper for TestForest?

2013-07-05 Thread Adam Baron
I'm attempting to run org.apache.mahout.classifier.df.mapreduce.TestForest on a CSV with 200,000 rows that have 500,000 features per row. However, TestForest is running extremely slowly, likely because only 1 mapper was assigned to the job. This seems strange because the org.apache.mahout.classifi

Re: Run more than one mapper for TestForest?

2013-07-29 Thread Adam Baron
log.info(String.format("min: %,d block: %,d max: %,d split: %,d",
>     minSize, blockSize, maxSize, splitSize));
> return splitSize;
> }
>
> It seems like there should be a more straightforward way to do this, but it works for me and I've used it on a l
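The quoted fragment logs the four values that decide how a file is cut into input splits, and therefore how many mappers run. As an illustration only (the full method from the thread is not shown here), the sketch below reproduces the rule that, to my knowledge, Hadoop's `FileInputFormat.computeSplitSize` applies: the block size is capped by the maximum and floored by the minimum. The class name, the `estimateSplits` helper, and the example sizes are assumptions for the demo.

```java
// Hypothetical standalone sketch; it does not depend on Hadoop.
class SplitSizeSketch {

    /** Block size capped by maxSize and floored by minSize. */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /**
     * Rough number of splits for a file, using ceiling division.
     * Hadoop's real count can differ slightly because it lets the
     * last split run a little over the split size.
     */
    static long estimateSplits(long fileBytes, long splitSize) {
        return (fileBytes + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb;    // assumed HDFS block size
        long fileBytes = 4096 * mb;  // assumed 4 GB input file

        long defaultSplit = computeSplitSize(blockSize, 1, Long.MAX_VALUE);
        long smallerSplit = computeSplitSize(blockSize, 1, 16 * mb);

        System.out.println(String.format("split: %,d bytes, about %,d mappers",
                defaultSplit, estimateSplits(fileBytes, defaultSplit)));
        System.out.println(String.format("split: %,d bytes, about %,d mappers",
                smallerSplit, estimateSplits(fileBytes, smallerSplit)));
    }
}
```

Under this rule, lowering the maximum split size below the block size is what produces more, smaller splits and so more than one mapper for an input that fits in a single block.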

Re: How to get human-readable output for large clustering?

2013-08-05 Thread Adam Baron
Florian, Any luck finding an answer over the past 5 months? I'm also dealing with similar out of memory errors when I run clusterdump. I'm using 50,000 features and tried k=500. The kmeans command ran fine, but then I got the dreaded OutOfMemory error with the clusterdump command: 2013-08-0

Re: How to get human-readable output for large clustering?

2013-08-06 Thread Adam Baron
erdump by limiting the number of terms from clusterdump, by specifying -n 20 (outputs the 20 top terms)? > > ________ > From: Adam Baron > To: user@mahout.apache.org > Sent: Monday, August 5, 2013 8:03 PM > Subject: Re: How to get hum

Re: How to get human-readable output for large clustering?

2013-08-07 Thread Adam Baron
. Has the ClusterDumper code changed in 0.8? Regards, Adam On Tue, Aug 6, 2013 at 9:00 PM, Suneel Marthi wrote: > Adam, > > Pardon my asking again if this has already been answered - Are you running against Mahout 0.8? > > -----

Re: How to get human-readable output for large clustering?

2013-10-07 Thread Adam Baron
> Mahout is a library. You can link against any version you like and still have a perfectly valid Hadoop program. > > On Wed, Aug 7, 2013 at 11:51 AM, Adam Baron wrote: > > > Suneel, > > > > Unfortunately no, we're still on Mahout 0.7.

Re: Examine Individual Trees in Random Forest - Mahout 0.8

2013-11-05 Thread Adam Baron
Tim, Try using org.apache.mahout.classifier.df.tools.ForestVisualizer: http://shawnwan.wordpress.com/2012/06/01/mahout-0-7-random-forest-examples/ Regards, Adam On Fri, Nov 1, 2013 at 3:33 AM, Tim Peut wrote: > Hi all, > > I'm building a random forest in Mahout 0.8 with org.apach